Ridge regression for Statistical Arbitrage in Crypto (pairs trading)

I previously wrote an article for Dorian Trader that explored using ridge regression to develop a trading strategy. The article showed how ridge regression helped prevent overfitting compared with ordinary least-squares regression, and how it was often uncannily good at predicting the peaks and troughs of trading data. See the original article here.

I also wrote a follow-up article for Crypto News testing the same strategy on cryptocurrencies.

In those articles, the strategy used only a single asset. That is, the input signals were calculated only from the price history of the asset being traded. In this article, we'll explore whether the price history of closely related assets contains valuable predictive information. These could be, for example, stocks from the same industry, or different cryptocurrencies, both of which tend to move together. This is usually called statistical arbitrage or pairs trading (despite the name, it can be done with any number of related assets, not just two).

The basic mechanism

Pairs trading or statistical arbitrage exploits the tendency of two assets to move together. It may mean that when one asset moves, the other is more likely to follow suit. It may also mean that the further apart the two assets drift, the more likely they are to converge again.

When we applied ridge regression to the single-asset case, we used moving averages and regression lines of various lengths as our main signals. Here we will do the same, except that we add a second set of signals generated from the second asset. We will use three moving averages and three regression lines of different lengths, so the regression can give the most weight to whichever proves most predictive. You could also use more than three, especially at first, if you are unsure how much time might elapse between the divergence and convergence of the related assets.
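As a rough sketch of how these signals could be computed, the code below builds three moving averages and three regression lines per asset with pandas. It assumes one-minute bars, and it assumes that a "regression line" signal means the least-squares line fitted over the trailing window, evaluated at the most recent bar; the function names are hypothetical and the prices are synthetic random walks, not the data used in this article.

```python
import numpy as np
import pandas as pd

WINDOWS = (5, 20, 50)  # lookback lengths in bars (one-minute bars assumed)


def regline_endpoint(values: np.ndarray) -> float:
    """Least-squares line through the window, evaluated at the window's last bar."""
    x = np.arange(len(values))
    slope, intercept = np.polyfit(x, values, 1)
    return float(slope * x[-1] + intercept)


def base_signals(close: pd.Series, suffix: str = "") -> pd.DataFrame:
    """Regression-line endpoints and simple moving averages for each window length."""
    out = {}
    for w in WINDOWS:
        out[f"RegLine{w}{suffix}"] = close.rolling(w).apply(regline_endpoint, raw=True)
    for w in WINDOWS:
        out[f"MovAvg{w}{suffix}"] = close.rolling(w).mean()
    return pd.DataFrame(out, index=close.index)


# Usage on synthetic random-walk prices (placeholders, not real market data).
rng = np.random.default_rng(0)
idx = pd.date_range("2024-01-01", periods=200, freq="min")
eth = pd.Series(100 + rng.standard_normal(200).cumsum(), index=idx)
btc = pd.Series(2000 + 20 * rng.standard_normal(200).cumsum(), index=idx)

# Own-asset signals plus the same signals for the second asset, with a suffix.
signals = pd.concat([base_signals(eth), base_signals(btc, "_BTC-USD")], axis=1).dropna()
print(signals.columns.tolist())
```

The first 49 rows are dropped because the 50-bar window is not yet full; in a live setting you would simply wait for enough history.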

The mechanism here is that by including moving averages of different lengths, a linear model like ridge regression can implicitly include their differences as well, so that it can capture how much each asset has diverged (up or down) compared to its earlier values. It also implicitly includes the difference between the divergence of the first asset and the divergence of the second asset, so that it can capture how much one asset has diverged relative to the other.
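To make this concrete, write $M^{A}_{n}(t)$ for the $n$-bar moving average of the traded asset A (here ETH) and $M^{B}_{n}(t)$ for that of the second asset B (here BTC). Suppose the fitted model gives weights $w_1, \dots, w_4$ to four of its moving-average inputs:

```latex
\hat{y}(t) = \dots + w_1 M^{A}_{5}(t) + w_2 M^{A}_{50}(t) + w_3 M^{B}_{5}(t) + w_4 M^{B}_{50}(t) + \dots

% If the fit happens to choose w_1 = a, w_2 = -a, w_3 = -b, w_4 = b, these four terms collapse to
a \left[ M^{A}_{5}(t) - M^{A}_{50}(t) \right] - b \left[ M^{B}_{5}(t) - M^{B}_{50}(t) \right]
```

The first bracket measures how far A has moved away from its longer-run average, the second does the same for B, and the weighted difference measures how far A has diverged relative to B. The ratio $b/a$ is free, so it can also absorb a roughly constant difference in price scale between the two assets. This is an illustrative special case; the fitted weights need not take exactly this pattern.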

If we believe there is a lag between divergence and convergence, we could consider introducing a lag between the time the signals are calculated and the time whose price the ridge regression tries to predict. However, because the signals include moving averages over a variety of time windows, the model can effectively construct a few lagged moving averages anyway, since certain lagged moving averages are exact linear combinations of the existing ones.
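A quick numerical check of that claim, on a synthetic price series: the sum over the last 20 bars minus the sum over the last 5 bars is the sum over bars 6 to 20 back, so a 15-bar moving average lagged by 5 bars equals $(20 M_{20} - 5 M_{5})/15$ at the current bar. Likewise a 30-bar average lagged by 20 bars equals $(50 M_{50} - 20 M_{20})/30$. Both use only the 5, 20 and 50-bar windows already in the model.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
close = pd.Series(100 + rng.standard_normal(300).cumsum())  # synthetic price path

ma = {w: close.rolling(w).mean() for w in (5, 20, 50)}

# 15-bar moving average, lagged by 5 bars ...
lagged_ma15 = close.rolling(15).mean().shift(5)
# ... is a fixed linear combination of the current 20- and 5-bar averages.
combo15 = (20 * ma[20] - 5 * ma[5]) / 15

# 30-bar moving average, lagged by 20 bars, from the current 50- and 20-bar averages.
lagged_ma30 = close.rolling(30).mean().shift(20)
combo30 = (50 * ma[50] - 20 * ma[20]) / 30

# Maximum absolute discrepancy (NaNs from the warm-up period are skipped).
max_err15 = (lagged_ma15 - combo15).abs().max()
max_err30 = (lagged_ma30 - combo30).abs().max()
print(max_err15, max_err30)  # both at floating-point rounding level
```

Because the combination is linear, ridge regression can reach it simply by choosing the corresponding coefficients on the unlagged moving averages.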

An additional issue in the two-asset case is that the two assets may have very different scales, and the ratio between them could drift over time. For this reason, one might consider taking various ratios of the signals and including them as additional input signals:

For each asset, we can generate additional signals by taking the ratios of moving averages of different lengths, and the ratios of regression lines of different lengths.
We'll also add the ratios of the same signal between the two assets. This allows the ridge regression to track how much the assets have diverged in terms of their ratio.
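A minimal sketch of both kinds of ratio, assuming the column-naming style of the coefficient listing later in the article (`MovAvg5/20`, `RegLine5/20_BTC-USD`, `MovAvg5/5`). The helper names are hypothetical, the regression-line signal is assumed to be the endpoint of a trailing least-squares fit, and the prices are synthetic.

```python
from itertools import combinations

import numpy as np
import pandas as pd

WINDOWS = (5, 20, 50)


def regline_endpoint(values: np.ndarray) -> float:
    """Least-squares line through the window, evaluated at its last bar."""
    x = np.arange(len(values))
    slope, intercept = np.polyfit(x, values, 1)
    return float(slope * x[-1] + intercept)


def level_signals(close: pd.Series, suffix: str = "") -> pd.DataFrame:
    """Moving averages and regression-line endpoints for each window."""
    out = {}
    for w in WINDOWS:
        out[f"RegLine{w}{suffix}"] = close.rolling(w).apply(regline_endpoint, raw=True)
        out[f"MovAvg{w}{suffix}"] = close.rolling(w).mean()
    return pd.DataFrame(out, index=close.index)


def within_ratios(sig: pd.DataFrame, suffix: str = "") -> pd.DataFrame:
    """Short/long ratios of the same signal type, e.g. MovAvg5/20 = MovAvg5 / MovAvg20."""
    out = {}
    for kind in ("MovAvg", "RegLine"):
        for short, long in combinations(WINDOWS, 2):  # (5, 20), (5, 50), (20, 50)
            out[f"{kind}{short}/{long}{suffix}"] = (
                sig[f"{kind}{short}{suffix}"] / sig[f"{kind}{long}{suffix}"]
            )
    return pd.DataFrame(out, index=sig.index)


def cross_ratios(sig_a: pd.DataFrame, sig_b: pd.DataFrame, suffix_b: str) -> pd.DataFrame:
    """Traded asset's moving average over the other asset's, e.g. MovAvg5/5."""
    return pd.DataFrame(
        {f"MovAvg{w}/{w}": sig_a[f"MovAvg{w}"] / sig_b[f"MovAvg{w}{suffix_b}"] for w in WINDOWS},
        index=sig_a.index,
    )


rng = np.random.default_rng(2)
eth = pd.Series(100 + rng.standard_normal(200).cumsum())          # synthetic
btc = pd.Series(2000 + 20 * rng.standard_normal(200).cumsum())    # synthetic
eth_sig, btc_sig = level_signals(eth), level_signals(btc, "_BTC-USD")
features = pd.concat(
    [eth_sig, within_ratios(eth_sig), btc_sig, within_ratios(btc_sig, "_BTC-USD"),
     cross_ratios(eth_sig, btc_sig, "_BTC-USD")],
    axis=1,
).dropna()
```

This gives 27 input columns; the listing below also includes the raw BTC close as one more input.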

In reality, if the data window is not too large, the scales of the assets won't change much over the data set, so the relevant ratios can be treated as constant and will be captured automatically by the coefficients the ridge regression chooses. One could therefore argue that the ratios aren't necessary when fitting over a relatively short window (we will use one week). However, I've chosen to include all the ratios mentioned anyway. If nothing else, seeing the coefficients for these signals gives more insight into how the ridge regression is working.

For now, the model trains only on the most recent week of data. Although more data might seem better, recent data is more relevant and lets the model adapt to current market conditions.

Cryptocurrencies

We use one week of data ending on the 14th of August 2025, with the test set being approximately the final three hours of data.
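Here is one way the fitting setup could look in code: keep only the most recent week, hold out the final three hours as the test set, scale the signals and fit ridge regression with scikit-learn. The prediction target (the next bar's price change), the `alpha` value and the function name are assumptions for illustration; the demo uses synthetic minute data rather than the ETH and BTC prices from this article.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_recent_week(signals: pd.DataFrame, close: pd.Series,
                    test_hours: float = 3.0, alpha: float = 1.0):
    """Fit scaled ridge on the latest week, holding out the final `test_hours`."""
    # Assumed target: price change from this bar to the next one.
    target = (close.shift(-1) - close).rename("target")
    data = signals.join(target).dropna()
    end = data.index[-1]
    data = data[data.index > end - pd.Timedelta(days=7)]  # most recent week only
    split = end - pd.Timedelta(hours=test_hours)
    train, test = data[data.index <= split], data[data.index > split]
    cols = list(signals.columns)
    model = make_pipeline(StandardScaler(), Ridge(alpha=alpha))  # scale, then ridge
    model.fit(train[cols], train["target"])
    return model, train.index, test.index, model.predict(test[cols])


# Synthetic demo: eight days of one-minute bars with four placeholder signals.
rng = np.random.default_rng(3)
idx = pd.date_range("2024-01-01", periods=8 * 24 * 60, freq="min")
close = pd.Series(100 + rng.standard_normal(len(idx)).cumsum(), index=idx)
signals = pd.DataFrame(rng.standard_normal((len(idx), 4)), index=idx, columns=list("abcd"))

model, train_idx, test_idx, preds = fit_recent_week(signals, close)
print(len(train_idx), len(test_idx))
```

Scaling inside the pipeline means the test rows are transformed with the means and standard deviations learned from the training rows only.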

Movements in the big coins are more likely to presage movements in the smaller coins than the other way round. For this reason, it makes most sense to use BTC as an input when trying to predict ETH. Below are the coefficients the ridge regression assigned to each of our signals.

RegLine5 = -0.707
RegLine20 = 0.509
RegLine50 = -0.403
MovAvg5 = -0.323
MovAvg20 = 0.185
MovAvg50 = 1.127
MovAvg5/20 = -0.442
MovAvg5/50 = 1.025
MovAvg20/50 = -0.674
RegLine5/20 = 0.055
RegLine5/50 = -0.205
RegLine20/50 = 0.129
Close_BTC-USD = 0.014
RegLine5_BTC-USD = -0.046
RegLine20_BTC-USD = 0.542
RegLine50_BTC-USD = -0.142
MovAvg5_BTC-USD = -0.051
MovAvg20_BTC-USD = -0.192
MovAvg50_BTC-USD = -0.235
MovAvg5/20_BTC-USD = 0.249
MovAvg5/50_BTC-USD = -0.496
MovAvg20/50_BTC-USD = 0.291
RegLine5/20_BTC-USD = 0.096
RegLine5/50_BTC-USD = -0.18
RegLine20/50_BTC-USD = 0.117
MovAvg5/5 = -0.555
MovAvg20/20 = 0.139
MovAvg50/50 = 0.116
Strategy profit = 107.596
Buy Hold profit = 19.161

The first 12 signals are all calculated from ETH price data. They consist of the linear regression lines (over three windows of different lengths), the moving averages, the ratios of the moving averages and the ratios of the linear regression lines. The next 13 signals are the BTC closing price followed by the same 12 signals calculated from BTC price data. The final three signals are the ratios of the ETH moving averages to the corresponding BTC moving averages.

Since ridge regression keeps the coefficients small whenever doing so doesn't harm predictive power too much, and the signals are scaled before fitting, one can, to an extent, judge the significance of each signal by the magnitude of its coefficient.
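The sketch below illustrates why scaling matters for this kind of reading. Three synthetic signals live on very different scales (a price-like level, a ratio near 1 and a small spread); after standardization with scikit-learn's `StandardScaler`, the ridge coefficients rank them by their actual influence on the target rather than by their units. The signal names, the target construction and `alpha=1.0` are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
n = 2000
# Synthetic signals on very different scales.
X = pd.DataFrame({
    "level": 4000 + 50 * rng.standard_normal(n),
    "ratio": 1 + 0.01 * rng.standard_normal(n),
    "spread": 0.1 * rng.standard_normal(n),
})
# Target depends strongly on "ratio", weakly on "level"; "spread" is pure noise.
y = 3.0 * (X["ratio"] - 1) / 0.01 + 0.5 * (X["level"] - 4000) / 50 + rng.standard_normal(n)

model = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X, y)
# Coefficients on the standardized inputs: one unit = one standard deviation of the signal.
coefs = pd.Series(model.named_steps["ridge"].coef_, index=X.columns)
ranking = coefs.abs().sort_values(ascending=False)
print(ranking)
```

Without the scaler, the raw coefficient on `ratio` would be about 100 times larger than its standardized value simply because the signal's units are small, which is why coefficient size is only meaningful after scaling.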

The most significant ETH signals are the 50-minute moving average (1.127) and the ratio of the 5-minute to the 50-minute moving average (1.025). The other moving average ratios, and the regression lines, are also significant.
The BTC signals generally have smaller coefficients than the ETH signals, as you would expect. The most significant are the 20-minute regression line (0.542) and the ratio of the 5-minute to the 50-minute moving average (-0.496).
The ratio of the 5-minute ETH moving average to the 5-minute BTC moving average (-0.555) also makes a notable contribution.

Looking at the plot, we see what we often see with ridge regression: it can be very good at placing sell markers at the peaks and buy markers at the troughs. This gives a visual sense of why the strategy profit is 107.6 while the buy and hold profit is only 19.2.
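The exact trading rules behind these profit figures aren't spelled out here, so the following is only a hypothetical long/flat sketch of how a strategy profit and a buy and hold profit could be compared: hold one unit over the next bar whenever the predicted change is positive, stay flat otherwise, and measure both profits in price units for one unit of the asset. Transaction costs and slippage are ignored.

```python
import numpy as np


def backtest_long_flat(close, predicted_change):
    """Return (strategy profit, buy-and-hold profit) for one unit of the asset.

    A position of one unit is held from bar t to bar t+1 whenever the prediction
    made at bar t is positive. Costs and slippage are ignored.
    """
    close = np.asarray(close, dtype=float)
    position = (np.asarray(predicted_change) > 0).astype(float)[:-1]  # decision at each bar
    bar_changes = np.diff(close)                                      # realised move to next bar
    strategy = float(np.sum(position * bar_changes))
    buy_hold = float(close[-1] - close[0])
    return strategy, buy_hold


# Toy example with perfect foresight of the next move (the last prediction is unused).
prices = np.array([100.0, 101.0, 99.0, 102.0])
perfect = np.array([1.0, -2.0, 3.0, 0.0])
print(backtest_long_flat(prices, perfect))  # (4.0, 2.0)
```

In this setup, a buy marker corresponds to the position switching from 0 to 1 and a sell marker to it switching back, so well-placed markers at troughs and peaks translate directly into the strategy capturing the up moves while skipping the down moves.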

But how much impact is the second asset, BTC, having here?

Isolating the impact of the second asset

We naturally want to know how much predictive power is coming from the second asset (BTC), and how much is coming from ETH’s own price history. We can see from the size of the coefficients that the ETH price history is more significant than the BTC price history.

To further evaluate this question, we can omit the BTC signals and see how much the prediction / profitability degrades.

When we do this exercise, it's important to use the same data for fitting and testing. When the test set lies outside the data seen during fitting, the result can be essentially arbitrary and could favour any one of the models simply by chance, which makes comparing models very difficult. This might be mitigated by using a very large test set, which would have to be assumed representative of all future possibilities. However, that requires obtaining a much larger amount of data, and the fact that market regimes change over time raises its own challenges. By using overlapping test and fit sets, we can test which of the models matches the data most successfully, albeit without considering potential overfitting. In this case, I've used the entire week's dataset for both fitting and testing (which is why the numbers differ from those in the previous section). For reference, the buy and hold profit was 945.3 in this case.
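A minimal sketch of the comparison, assuming scikit-learn: fit the same scaled ridge pipeline once on the ETH columns only and once on all columns, and score each on exactly the rows it was fitted to. The column names (`ETH_a`, `BTC_a`, ...) and the synthetic target, which deliberately depends on one BTC signal, are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def in_sample_r2(X: pd.DataFrame, y: pd.Series, columns, alpha: float = 1.0) -> float:
    """Fit scaled ridge on `columns` and report R^2 on the same rows it was fitted to."""
    model = make_pipeline(StandardScaler(), Ridge(alpha=alpha)).fit(X[columns], y)
    return model.score(X[columns], y)


rng = np.random.default_rng(5)
n = 5000
X = pd.DataFrame(rng.standard_normal((n, 4)), columns=["ETH_a", "ETH_b", "BTC_a", "BTC_b"])
y = X["ETH_a"] + X["BTC_a"] + rng.standard_normal(n)  # synthetic target

eth_cols = [c for c in X.columns if c.startswith("ETH")]
r2_eth_only = in_sample_r2(X, y, eth_cols)
r2_both = in_sample_r2(X, y, list(X.columns))
print(round(r2_eth_only, 3), round(r2_both, 3))
```

Because both models are judged on their own fitting data, extra columns are never penalized here; any gain could partly be overfitting, which is the caveat discussed next.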

Fit and backtest the model using only ETH signals: Strategy profit = 1343.0, R-squared = 0.011
Fit and backtest the model using both ETH and BTC signals: Strategy profit = 1556.5, R-squared = 0.014

These numbers make a plausible case that including the BTC signals improves the strategy. However, the case isn't watertight, because including additional signals increases the opportunity for overfitting, which a comparison on the fitting data itself cannot detect.

More than two assets

Statistical arbitrage, or the principles behind pairs trading, can work with any number of assets. For example, one could use a whole basket of stocks from the same industry. In crypto trading, we can consider using other coins in our prediction, although movements in smaller coins are unlikely to have as big an impact on the price of ETH as Bitcoin does. Let's repeat the exercise, adding the next four largest coins by market capitalization: XRP, BNB, SOL and DOGE.

Using all five coins (BTC, XRP, BNB, SOL and DOGE) as additional signals: Strategy profit = 2051.7, R-squared = 0.015

The strategy profit is distinctly higher than with ETH alone or with ETH and BTC. All of the coins had signals with sizeable coefficients, suggesting that the ridge regression found them significant. However, as mentioned already, overfitting is a concern here.
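Extending the signal set to several related assets is mostly a matter of looping. The sketch below, with hypothetical helper names and synthetic prices, builds moving averages and their ratios for the traded asset and for each extra asset, plus the cross-asset ratios. Regression-line signals would be added the same way and are left out to keep it short. With more than one extra asset, the cross-ratio columns carry the other asset's ticker as a suffix so their names stay unique.

```python
from itertools import combinations

import numpy as np
import pandas as pd

WINDOWS = (5, 20, 50)


def ma_block(close: pd.Series, suffix: str) -> pd.DataFrame:
    """Moving averages and their short/long ratios for one asset."""
    out = {f"MovAvg{w}{suffix}": close.rolling(w).mean() for w in WINDOWS}
    for short, long in combinations(WINDOWS, 2):
        out[f"MovAvg{short}/{long}{suffix}"] = out[f"MovAvg{short}{suffix}"] / out[f"MovAvg{long}{suffix}"]
    return pd.DataFrame(out)


def multi_asset_signals(target: pd.Series, others: dict) -> pd.DataFrame:
    """Signals for the traded asset, each related asset, and target/other ratios."""
    blocks = [ma_block(target, "")]
    for ticker, close in others.items():
        sfx = f"_{ticker}"
        block = ma_block(close, sfx)
        for w in WINDOWS:
            # e.g. MovAvg5/5_SOL-USD = traded asset's MovAvg5 / SOL's MovAvg5
            block[f"MovAvg{w}/{w}{sfx}"] = blocks[0][f"MovAvg{w}"] / block[f"MovAvg{w}{sfx}"]
        blocks.append(block)
    return pd.concat(blocks, axis=1)


rng = np.random.default_rng(7)
n = 300
eth = pd.Series(1000 + rng.standard_normal(n).cumsum())  # synthetic
others = {t: pd.Series(1000 + rng.standard_normal(n).cumsum())
          for t in ("BTC-USD", "XRP-USD", "BNB-USD", "SOL-USD", "DOGE-USD")}
feats = multi_asset_signals(eth, others)
print(feats.shape)
```

Each extra asset adds nine columns here, so the input count grows quickly with the basket size, which is one reason the overfitting concern grows with it.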

Risks of multi-asset strategies

It's important to consider that a strategy using the signals of other assets may be riskier than one using only a single asset. Although a group of assets may often move similarly, their behaviour can diverge significantly for extended periods. For example, two crypto coins may often move together because both are affected by general sentiment about crypto. But if one coin were to experience significant growth or decline because of events specific to that asset, the relationship may cease to hold. Safeguards need to be added to the strategy so that it can disregard the signals of other assets under certain circumstances.
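One possible safeguard, offered only as a sketch: monitor the trailing correlation of the two assets' one-bar log returns, and fall back to the single-asset model whenever it drops below a threshold. The 240-bar window, the 0.5 threshold and the function names are hypothetical choices, and the demo uses synthetic prices whose relationship breaks halfway through.

```python
import numpy as np
import pandas as pd


def cross_asset_gate(target: pd.Series, other: pd.Series,
                     window: int = 240, min_corr: float = 0.5) -> pd.Series:
    """True while the trailing correlation of one-bar log returns is at least `min_corr`."""
    corr = np.log(target).diff().rolling(window).corr(np.log(other).diff())
    return corr >= min_corr  # NaN during warm-up compares as False, so the gate starts closed


def gated_prediction(pred_multi: pd.Series, pred_single: pd.Series, gate: pd.Series) -> pd.Series:
    """Use the multi-asset model only while the gate is open, else the own-asset model."""
    return pred_multi.where(gate, pred_single)


rng = np.random.default_rng(6)
n = 2000
common = rng.standard_normal(n)
r_eth = common + 0.3 * rng.standard_normal(n)
# Second asset tracks the first for half the sample, then decouples completely.
r_btc = np.where(np.arange(n) < n // 2, common + 0.3 * rng.standard_normal(n), rng.standard_normal(n))
eth = pd.Series(100 * np.exp(np.cumsum(0.001 * r_eth)))
btc = pd.Series(100 * np.exp(np.cumsum(0.001 * r_btc)))

gate = cross_asset_gate(eth, btc)
blended = gated_prediction(pd.Series(np.ones(n)), pd.Series(-np.ones(n)), gate)
print(gate.iloc[300:1000].mean(), gate.iloc[1300:].mean())
```

Correlation is a lagging indicator, so a gate like this will react to a break in the relationship only after some delay; it reduces, rather than removes, the risk described above.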

Genius Mathematics Consultants

Quantitative Consulting Services from PhDs