How Genius Mathematics Consultants Compares to Big Financial Consulting Firms

When you think about financial consulting firms, you probably think about huge firms like EY, Deloitte, KPMG, PwC and McKinsey. But did you know that it’s possible to get superior expertise, more conveniently and with faster execution, all at dramatically lower cost than big consulting firms?

This consulting practice is deliberately different. We specialise in all technical and quantitative work, in both financial services and in science and engineering, right up to PhD research level — delivered personally, efficiently, and without the overheads of a large corporate machine.

Value for money

Big consulting firms:

  • High overheads due to layers of partners, managers, office infrastructure and expensive real estate.
  • Day rates often reflect branding and corporate structure, not actual work.
  • You may meet a senior expert at the proposal stage, but most work is done by juniors paid a small fraction of the fee you pay.

Our consulting practice:

  • Lean structure with no inflated corporate costs and no real estate costs.
  • You pay only for the hours worked by PhD-qualified experts.
  • No outsourcing to cheaper or junior employees.

What this means for you:
Dramatically lower costs, yet more experience and expertise.

Quality and depth of expertise

Big consulting firms:

  • Rely more on branding, image and politics than on rigorous work.
  • Technical work often handled by consultants with limited specialist training.
  • Reliance on frameworks and templates rather than deep analysis.
  • Documentation frequently incomplete, confusing or written in obscure legalistic language, or missing altogether.

Our consulting practice:

  • Fully specialised and PhD-qualified in mathematics, coding, problem solving and quantitative finance.
  • Tailored, mathematically rigorous solutions rather than generic frameworks.
  • Clear and organised documentation.

What this means for you:
You get bespoke, focused and technically accurate solutions for complex problems, explained and documented clearly.

Convenience, speed, and lack of bureaucracy

Big consulting firms:

  • Complex onboarding, resourcing, and reporting processes.
  • Slow adaptation to changing project needs or new information.
  • Multiple communication layers between the client and actual modeller.
  • The person you speak with may not be the person producing the work.
  • Delays of weeks or months while other work is prioritised.
  • Small, specialised technical tasks are often uneconomical for them.

Our consulting practice:

  • Direct, fast, and responsive — you deal with the person actually doing the work.
  • Flexible and able to pivot quickly as requirements evolve.
  • No unnecessary bureaucracy or internal approval cycles.
  • Clear accountability and ownership of work delivered.
  • Ideal for both small targeted projects and complex long-term engagements.

What this means for you:
Faster turnaround, clear communication, and you get exactly the expertise you need, in the format that suits your business.

Independence and objectivity

Big consulting firms:

  • May have partnerships or commercial agreements with software vendors.
  • Recommendations can sometimes be influenced by internal business interests, and politics that serve the consulting firm rather than the needs of your business.

Our consulting practice:

  • Fully independent, with no vendor alliances or incentives.
  • No internal politics – only objective advice driven purely by a desire to help you succeed.

What this means for you:
Objective, unbiased solutions designed solely around your needs.

The obvious choice

Looking to partner with a consulting firm? Contact Us Today to get the ball rolling.

Artificial Intelligence Consulting for Mathematical Problem Solving, Coding, and Quantitative Finance

At Genius Mathematics Consultants, we help businesses, researchers, and financial professionals harness the power of artificial intelligence models like ChatGPT, Claude and Google Gemini. Artificial intelligence is revolutionizing the way people engage in mathematical problem solving, coding, research and quantitative finance. These tools are extremely impressive but imperfect, so it’s important that they are guided by someone with the appropriate expertise in the underlying subject matter. AI-assisted working is the future, and we can help you get started.

Mathematical Problem Solving with AI

Artificial intelligence has progressed rapidly and is now useful even for advanced mathematical and symbolic reasoning, theorem exploration and numerical analysis. Its ability to rapidly survey many sources, and to combine and reformat the results to answer your query, will create a revolution in research. Gone, too, is the time-consuming task of formatting equations in LaTeX, as artificial intelligence now does this for you in a flash.

We help clients use these tools to dramatically improve efficiency, while ensuring results remain academically rigorous.

AI for Coding, Automation, and Algorithm Design

AI can now help developers write, debug and optimize code across multiple programming languages. At Genius Mathematics Consultants, we guide teams through AI-assisted algorithm design. We help you incorporate intelligent code automation without compromising the mathematical accuracy of your models or integrity of your software.

AI in Quantitative Finance

As a field focused on maths, coding and data processing, quantitative finance is set to be revolutionised by AI. Our consulting services cover deployment of AI for trading, including researching and backtesting strategies, automated risk model validation for regulatory compliance, and rapidly building code to analyse and reformat trading book data.

We also apply machine learning techniques such as trading strategies driven by machine learning, option pricing using neural networks, and portfolio optimization using reinforcement learning. Our consultants can help you learn to use artificial intelligence to develop capabilities like these quickly and accurately.

Implementing AI in Your Organisation

Whether you’re exploring AI for the first time or wanting to delve deeper, we can help you develop a strategy to make AI work for your business. Our consultants can identify high-impact use cases, and take them from design to deployment. Our approach is collaborative and transparent. We don’t just deliver models — we help your organisation understand and control the technology behind them.

Why Work With Mathematics Consultants

Our consultants combine deep expertise in research-level mathematics, coding, and quantitative finance. We bring cross-disciplinary experience to every engagement — leveraging artificial intelligence for everything from financial derivatives to engineering automation to symbolic reasoning for mathematical research. Every project is fully customized to align with your objectives, ensuring measurable results and long-term capability building.

We work with financial institutions, technology firms, and individual researchers who value both mathematical precision and innovative engineering.

Ready to integrate AI into your work?
Simply contact us to arrange a consultation on AI assisted problem solving, coding and quantitative modelling.

Algorithmic Differentiation: Fast Greeks in Monte Carlo Option Pricing

The problem with finite difference methods

If you’ve ever tried to calculate Greeks when pricing options using Monte Carlo, you’ve probably found that it’s both very slow, and also potentially inaccurate unless an enormous number of paths is used. There are two main reasons for this.

Firstly, calculating Greeks using finite differences requires that the pricing function be evaluated multiple times. For example, for a second order derivative like gamma, three evaluations of the pricing function are required. Monte Carlo pricers are already slow even when evaluated once, and having to evaluate them many more times to get the Greeks makes them many times slower again.

Secondly, since the step size needs to be small when calculating derivatives, the prices are likely very close together as well. When calculating the difference between two very similar values, you run the risk that you are just getting Monte Carlo noise. One could try to use a larger step size, so that the calculated difference is larger when compared against the noise, but the calculated value may then diverge from the true derivative.
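To make this concrete, here is a minimal sketch (all parameters illustrative) of a central-difference delta for a European call under geometric Brownian motion, computed once with common random numbers and once with independent draws:

```python
import numpy as np

def mc_call_price(S0, K, r, sigma, T, Z):
    """Monte Carlo price of a European call, driven by the supplied normals Z."""
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
    return np.exp(-r * T) * np.maximum(ST - K, 0.0).mean()

rng = np.random.default_rng(0)
S0, K, r, sigma, T, h = 100.0, 100.0, 0.05, 0.2, 1.0, 0.01

# Common random numbers: the same draws are reused for both bumped
# evaluations, so most of the Monte Carlo noise cancels in the difference.
Z = rng.standard_normal(100_000)
delta_crn = (mc_call_price(S0 + h, K, r, sigma, T, Z)
             - mc_call_price(S0 - h, K, r, sigma, T, Z)) / (2 * h)

# Independent draws: with a small step h, the difference of the two noisy
# prices is dominated by sampling error rather than the true derivative.
delta_indep = (mc_call_price(S0 + h, K, r, sigma, T, rng.standard_normal(100_000))
               - mc_call_price(S0 - h, K, r, sigma, T, rng.standard_normal(100_000))) / (2 * h)
```

With common random numbers the estimate lands close to the true Black-Scholes delta of about 0.64 for these parameters; with independent draws the same budget of paths can produce an estimate that is wrong by an order of magnitude.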

You might also like to check out our article on the Longstaff-Schwartz method for pricing American options.

Algorithmic differentiation

You might have heard about algorithmic differentiation, and “adjoint” algorithmic differentiation – a technique for more efficiently and more accurately calculating Greeks for Monte Carlo pricing models. This technique allows derivatives to be calculated via repeated applications of the chain rule, without needing to evaluate the pricing function multiple times. Instead, one spends time manually (symbolically) differentiating some of the functions involved. However, you might have found that the explanations available on the internet tend to be quite abstract and much more confusing than they need to be. The purpose of this note is to give a clear and readable example of exactly how and why algorithmic differentiation works when applied to Monte Carlo options pricing.

Calculating delta and vega

The essence of a Monte Carlo option pricer is simple. The discounted payoff of the option is evaluated on each path \(i\), and then the present value \(PV\) is simply the average:

\[PV = \frac{1}{N} \sum_i e^{-rT_i}\text{Payoff}(i).\]

Here \(T_i\) is the settlement date for the \(i^{th}\) path. Note that in general, for path dependent derivatives,

\[\text{Payoff}(i) = \text{Payoff}(S_0^i,\dots,S_N^i,T_i).\]

However, in many cases the payoff would only depend on \(S_N\).

Let’s suppose we want to calculate a derivative of \(PV\), say delta. By simply moving the derivative inside the summation we get

\[\frac{dPV}{dS_0} = \frac{1}{N} \sum_i \frac{d}{dS_0} \left( e^{-rT_i}\text{Payoff}(i) \right).\]

Let’s consider now calculating the term inside the summation for the \(i^{th}\) path, dropping the subscript \(i\) for simplicity. We start with the formula used to generate the path. At each time step we calculate the value \(S\) of the underlying at the next step by some function

\[S_{n+1} = f(S_n, Z_n),\]

where \(Z_n\) is a random draw from a normal distribution. Let’s keep things simple by considering the case \(T_i = T\) for all \(i\) so we can pull the discount factor out the front. Then applying the chain rule we need to calculate

\[\frac{d}{dS_0} \left( \text{Payoff} \right) = \sum_k \frac{d\text{Payoff}}{dS_k} \frac{dS_k}{dS_0}.\]

The derivative of the payoff \(\frac{d\text{Payoff}}{dS_k}\) we’ll look at in the next sections. This term needs to be calculated individually for each kind of derivative in the trading book (this is one downside of algorithmic differentiation, as it increases the complexity of the code and the likelihood of errors). Let’s now examine the final term \(\frac{dS_k}{dS_0}\). The usual path generation calculation using geometric brownian motion is

\[S_{n+1} = S_n e^{(r-\frac{1}{2} \sigma^2) \delta t + \sigma \sqrt{\delta t} Z_n}.\]

Since the exponential term has no \(S\) dependence, this means we get simply

\[\frac{dS_k}{dS_0} = \frac{S_k}{S_0}.\]

This value can be calculated at the \(k^{th}\) time step and stored.

If we were calculating vega instead, we would get

\[\frac{d}{d\sigma} \left( \text{Payoff} \right) = \sum_k \frac{d\text{Payoff}}{dS_k} \frac{dS_k}{d\sigma}.\]

Then the final term is

\[ \frac{dS_{n+1}}{d\sigma} = S_{n+1} \left( -\sigma \,\delta t + \sqrt{\delta t}\, Z_n \right) + \frac{S_{n+1}}{S_n} \frac{dS_n}{d\sigma}. \]

We have of course \(\frac{dS_0}{d\sigma} = 0\), and we calculate each successive value using the one before, storing the values to use at the end.
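As a sketch of how this recursion can be implemented (parameters illustrative; a vanilla call is assumed, so the only nonzero payoff derivative is the indicator \(1_{S_N > K}\) derived in the next section):

```python
import numpy as np

# Forward-mode algorithmic differentiation of vega along each path.
rng = np.random.default_rng(1)
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
n_steps, n_paths = 50, 200_000
dt = T / n_steps

S = np.full(n_paths, S0)
dS_dsigma = np.zeros(n_paths)      # dS_0/dsigma = 0
for _ in range(n_steps):
    Z = rng.standard_normal(n_paths)
    growth = np.exp((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z)
    S_next = S * growth
    # dS_{n+1}/dsigma = S_{n+1}(-sigma*dt + sqrt(dt)*Z) + (S_{n+1}/S_n) * dS_n/dsigma
    dS_dsigma = S_next * (-sigma * dt + np.sqrt(dt) * Z) + growth * dS_dsigma
    S = S_next

# Vanilla call: dPayoff/dS_N is the indicator 1_{S_N > K}
vega = np.exp(-r * T) * np.mean((S > K) * dS_dsigma)
```

For these parameters the estimate should land near the Black-Scholes vega of roughly 37.5, obtained in a single Monte Carlo pass with no bumped revaluations.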

Vanilla options

The payoff of vanilla options only depends on the underlying at expiry. This means the derivatives with respect to \(S_k\) are all zero except the final one, which is

\[\frac{\text{dPayoff}_{Van}}{dS_N} = 1_{S_N > K} \]

for a call and

\[\frac{\text{dPayoff}_{Van}}{dS_N} = -1_{S_N< K} \]

for a put.
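Putting the pieces together for the delta of a call: the per-path contribution is \(e^{-rT}\,1_{S_N>K}\,S_N/S_0\), accumulated in a single pass. A minimal sketch with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
n_steps, n_paths = 50, 200_000
dt = T / n_steps

# Generate geometric Brownian motion paths step by step.
S = np.full(n_paths, S0)
for _ in range(n_steps):
    Z = rng.standard_normal(n_paths)
    S *= np.exp((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z)

# Chain rule: dPayoff/dS_0 = 1_{S_N > K} * dS_N/dS_0 = 1_{S_N > K} * S_N/S_0.
delta = np.exp(-r * T) * np.mean((S > K) * S / S0)
```

This should land close to the Black-Scholes delta of roughly 0.64 for these parameters, with no bumped revaluations required.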

Barrier options

For barrier options we run into a problem where the derivative of the payoff with respect to \(S_k\) would seem to be zero, except for the case \(k=N\). This is because each \(S_k\) is some finite distance from the barrier (assuming no breach yet), so when you shift each one infinitesimally, you still get no barrier breach. But in reality, there is an increased probability of breach in between the time steps. To capture this, we need to use a Brownian bridge. We denote by

\[p_m = \text{exp}\left( -\frac{2(B-S_m)(B-S_{m+1})}{\sigma^2 \Delta t_m} \right) \]

the probability that the path breaches the barrier between time steps \(m\) and \(m+1\) (we assume an upper knockout barrier here). Then

\[ \text{Payoff} = \prod_{m=0}^{N-1} \big(1 - p_m(S_m,\, S_{m+1})\big)\, \text{Payoff}_{Van}. \]

We can now easily differentiate this payoff with respect to \(S_m\) using the product and chain rules, noting that we already calculated the derivatives of the vanilla payoff factor in the previous section.
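As a sketch of how the bridge-corrected payoff might look in code (a hypothetical helper, not from any library; note that the bridge formula above is applied in price space exactly as written, so \(\sigma\) is treated here as an absolute rather than percentage volatility):

```python
import numpy as np

def knockout_payoff(path, K, B, sigma, dt):
    """Up-and-out call payoff on one stored path, smoothed with
    Brownian-bridge breach probabilities between consecutive steps."""
    S = np.asarray(path, dtype=float)
    # p_m = probability of breaching B between steps m and m+1;
    # a step that already touches or exceeds B breaches with certainty.
    below = (S[:-1] < B) & (S[1:] < B)
    p = np.where(below,
                 np.exp(-2.0 * (B - S[:-1]) * (B - S[1:]) / (sigma**2 * dt)),
                 1.0)
    survival = np.prod(1.0 - p)              # probability of no breach
    return survival * max(S[-1] - K, 0.0)    # vanilla payoff scaled by survival
```

Because the payoff is now a smooth product of survival factors, each factor can be differentiated with respect to \(S_m\) by the product rule.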

Gamma

A technical difficulty arises when using this method for gamma. If you think about the shape of the payoff of a vanilla option, its second derivative is zero everywhere except at the strike, where it could be thought of as infinite (or undefined). One simple way to handle this issue is to slightly smooth the payoff function near the kink. Another approach is not to differentiate the payoff at the final step \(N\), but instead to differentiate its expectation one step earlier (which is simply the Black-Scholes formula over one time step). This trick is sufficient to smooth out the kink in the payoff.

Adjoint Algorithmic Differentiation

There’s a slightly different approach which is more computationally efficient when you want to calculate a large number of derivatives with respect to many different variables. This is called “adjoint” algorithmic differentiation, and it involves doing both a forward and a backward pass. The runtime of the forward method grows linearly with the number of derivatives required (\(O(m)\) for \(m\) derivatives), while the backward method is essentially \(O(1)\). Let’s look again at our formula for delta. Suppose we have already done a forward pass which has generated the path values \(S_k\) for each \(k\). The backward pass works like this:

We can already calculate \(\frac{d\mathrm{Payoff}}{dS_N}\), since it’s just the dependence of the payoff on the spot value at maturity. Of course, what we really want on the bottom here is \(S_0\). We step backwards one step by employing the chain rule like this:

\[ \frac{d\mathrm{Payoff}}{dS_{N-1}} = \frac{d\mathrm{Payoff}}{dS_{N}} \frac{dS_N}{dS_{N-1}}. \]

We can then do it again:

\[ \frac{d\mathrm{Payoff}}{dS_{N-2}} = \frac{d\mathrm{Payoff}}{dS_{N-1}} \frac{dS_{N-1}}{dS_{N-2}}. \]

Continuing, we eventually arrive at an expression for \(\frac{d\mathrm{Payoff}}{dS_{0}}\).
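A minimal sketch of both passes for the delta of a vanilla call (illustrative parameters; the whole path is stored in the forward pass so the backward pass can replay it):

```python
import numpy as np

rng = np.random.default_rng(3)
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
n_steps, n_paths = 50, 100_000
dt = T / n_steps

# Forward pass: generate and store every time slice of the paths.
S = np.empty((n_steps + 1, n_paths))
S[0] = S0
for n in range(n_steps):
    Z = rng.standard_normal(n_paths)
    S[n + 1] = S[n] * np.exp((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z)

# Backward pass: the adjoint holds dPayoff/dS_n, seeded at maturity.
adjoint = (S[-1] > K).astype(float)       # dPayoff/dS_N for a call
for n in range(n_steps - 1, -1, -1):
    adjoint *= S[n + 1] / S[n]            # multiply by dS_{n+1}/dS_n
delta = np.exp(-r * T) * adjoint.mean()
```

The single backward sweep delivers \(\frac{d\mathrm{Payoff}}{dS_0}\); extending it to accumulate sensitivities to other inputs reuses the same stored path at essentially no extra cost.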

Automated Model Validation vs Manual Model Validation – Finding a Balance

Model validation is a strict regulatory requirement for many financial services businesses, essential for risk management and front office profitability. It’s also critical for many other industries as well. In the medical device, aviation and mining industries, model failure could not only lead to financial loss, but cost lives. In so many industries, models must be thoroughly validated both before deployment, and at regular intervals after deployment. They must also be stress tested on unusual scenarios to ensure they will remain robust.

In order to speed up and broaden the scope of model validation, many organisations are considering building automated model validation tools. In this article, we’ll look at how automated model validation differs from manual validation, and how both are an essential part of a validation ecosystem.

What is model validation?

Model validation is a process to regularly test and monitor mathematical models. This spans the conceptual soundness of the model, the correctness of the coding implementation, and the interactions between systems such as data interfaces. The context could be option pricing or risk models in finance, machine learning models to detect mineral deposits in mining, automated trading systems, or medical monitoring devices. Increasingly, AI and machine learning models need to be validated. And of course, the widespread use of impressive but error-prone AI tools like ChatGPT necessitates a whole new world of model validation. The process typically involves:

  • Reviewing the correctness of the mathematical methodology and documenting it
  • Assessing any model limitations or boundaries of validity
  • Building an independent model, either of the same methodology or a different one, and ensuring the outputs of the two models agree within some acceptable tolerance
  • Monitoring the ongoing appropriateness of the model considering changes in downstream and upstream systems, and changes in the environment the model operates in
  • Stress testing the model to ensure robustness under unusual scenarios (eg financial downturns)
  • Documenting all findings

What is automated model validation?

Automated model validation is the use of software tools and frameworks to autonomously and systematically test and monitor models. Typically these tools:

  • Automatically pull in and format data from source systems (where a human validator might need to examine a GUI one number at a time)
  • Verify huge numbers of model outputs by comparing against independent model implementations
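The core of such a tool is a systematic comparison of two independent implementations across the whole book. An illustrative sketch (the function name, tolerance and toy models are all hypothetical):

```python
def validate_against_benchmark(primary, benchmark, trades, rel_tol=1e-4):
    """Price every trade in both models and report tolerance breaches."""
    breaches = []
    for trade in trades:
        p, b = primary(trade), benchmark(trade)
        # Relative tolerance, floored at 1.0 to handle near-zero prices.
        if abs(p - b) > rel_tol * max(abs(b), 1.0):
            breaches.append((trade, p, b))
    return breaches

# Toy usage: the "independent" model disagrees on trade 3.
primary = lambda t: float(t)
benchmark = lambda t: float(t) * (1.1 if t == 3 else 1.0)
breaches = validate_against_benchmark(primary, benchmark, [1, 2, 3])
```

In a real system, `primary` would query the production system and `benchmark` an independently coded model, and the loop would run over thousands of trades on a schedule.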

Benefits of automated model validation

A downside of automated model validation systems is they may require a significant initial investment to build. So what are the advantages?

  • Once set up, automated model validation is far less time-intensive than manual validation. An automated system can examine thousands of trades in a trading book, where a manual validator has time to check a dozen.
  • Automated systems make it easy to quickly retest models after system updates, or to set up regular periodic validations or continuous monitoring.
  • Scalability: automated systems can operate at a scale beyond the scope of human validators.
  • Automated systems may free up human validators to focus more on conceptual model review.
  • To the extent that less manual work is required, they may reduce costs.

Benefits of manual model validation

  • Manual validation involves reviewing the assumptions and methodology of the model, often in a changing environment. This is something which doesn’t happen at all with automated model validation.
  • How can you be sure the independent implementation used by the automated validation system is correct? To an extent, automated validation actually results in two models that need to be validated. And where both models have similar assumptions or methodologies, it’s entirely possible for the models to agree and both be wrong.
  • Manual model validators also provide another very important function – the preparation of detailed validation reports clarifying the methodology, assumptions and limitations of the model where documentation is usually sorely lacking.
  • Human validators can improve model design, not just test model outputs.
  • Credibility with the regulator: in the case of regulated industries, human expert analysis may be considered essential.

Why a hybrid approach is best practice

The best approach to model validation is a hybrid approach, combining manual model validation with automated model validation. This is where manual model validators carefully check model assumptions, work to improve and maintain the accuracy of the automated model, and produce high quality documentation about the models. At the same time, the automated system allows for much more frequent and large scale testing and monitoring of trades and models. Having found serious errors in models that had already been validated several times, this author can tell you that the need for expert validators will never go away.

We offer model validation consulting services, including both manual model validation and the design of automated model validation systems. Contact us to learn more.

Vendor Trading Systems vs. Building In-House: Pros and Cons

Choosing between a vendor trading system and an in-house solution is one of the most important technology decisions banks, trading firms, and hedge funds make. This article explores the trade-offs and helps decision-makers navigate this complex choice – build or buy?

Should you build your own quantitative trading system in-house, or pay ongoing licensing fees for a vendor trading platform such as Murex or Calypso? This decision has profound implications for the cost, flexibility, and competitiveness of your trading operations, and it’s not a decision that can be easily changed in the future.

Vendor trading systems will likely be faster to market, with faster access to extensive functionality and models that have already undergone many cycles of validation. But any functionality you don’t need simply adds unnecessary complexity. Systems like Murex have a significant learning curve to set up and configure, and require Murex technical support staff to be hired at your firm, in addition to the vendor’s own support staff that you will be billed for. Adding your own models and functionality may be harder, more expensive or not possible at all. With all source code in-house, model development and bug fixing is likely to be much faster, leading to a more agile business. Model documentation can be maintained in-house, instead of being at the mercy of how conscientious the vendor has been with their documentation.

For regulated entities, vendor systems can make model validation and regulatory reporting more complex. Without source code access, determining what models are doing or how they are failing becomes a complex business problem, and demonstrating compliance to auditors and regulators more time consuming, which increases cost and operational risk.

In terms of cost, developing your own system might cost more initially, but vendor fees and hidden charges can easily be underestimated. These will have no expiry and will eventually accumulate to more than the cost of building your own system. Once you have signed on, moving away from the vendor setup would be so expensive that you’re at their mercy in terms of fees.

The interests of the vendor and your own firm are never entirely aligned. Vendors are juggling hundreds of clients. When engaging in model validation of vendor systems, I’ve noticed that vendors are often reluctant to admit their systems have a fault. In a particularly egregious case, I saw a vendor deliberately obstruct model validation staff in order to avoid admitting there was a serious problem with their model. After months of fighting with the vendor, they finally fixed the fault and the whole book was revalued.

Vendor Trading Systems (e.g., Murex, Calypso)

Pros:

  • Speed to market: Because vendor systems are already built, you will likely be operational sooner.
  • Broad functionality: Immediate access to a wide range of models and configuration options, including derivative pricing models and market risk models.
  • Model confidence: The models of very large vendors have been repeatedly validated by their clients, reducing the likelihood of faults.

Cons:

  • Very high licensing and maintenance costs: The cost of signing on is only the beginning, as vendors charge further fees for support staff, unlocking more advanced models or modules, and even fixing faults.
  • Vendor lock-in: Once you have heavily invested in the vendor’s systems, it’s too expensive to change to another option. This gives you little negotiating power with the vendor.
  • Less flexibility: You’re largely limited to the existing functionality of the system and any inherent design limitations.
  • Integration: The vendor’s systems must still be integrated with your own.
  • Unused complexity: The broad functionality needed for the vendor’s many clients makes configuration and use more complicated, and increases the risk of model misconfiguration.
  • Model documentation: Although vendors provide documentation, the quality of this can vary. With no direct access to the source code, it can be challenging to determine what the models are doing.
  • Politics and misaligned interests: The vendor may be more focused on extracting money, juggling many other clients, and avoiding admitting to any faults with their systems, than in helping your business.

Building an In-House Trading System

Pros:

  • Fully customizable: An in-house system is tailored to exactly your requirements.
  • Streamlined: No unused functionality adding complexity.
  • Unique or cutting-edge: Instead of using the system everyone else is using, you can innovate for a competitive advantage.
  • Agility: Rapid implementation of fixes and new features, with fewer layers of bureaucracy and go-betweens.
  • Cheaper: Likely cheaper in the long run, but with significant upfront costs.
  • Model documentation: You have direct access to the source code when questions arise about the behaviour of the models.

Cons:

  • Longer time to market: The vendor solution can likely be operational sooner.
  • Required expertise: You may require more quantitative expertise in-house. However, you could alternatively partner with a suitable consulting firm.

At Genius Mathematics Consultants, we specialize in providing quant support to help firms design and implement in-house trading systems. Our consulting services span quantitative and algorithmic trading, derivative pricing models, and risk model design such as Market VaR calculations. We ensure that your technology not only runs efficiently but also meets regulatory standards. We bring the ability to translate cutting-edge research into reliable production code. By partnering with us, you gain access to flexible quant expertise that supports your team in building systems tailored to your strategies—without the cost and complexity of maintaining in-house quant staff. To learn more about how your business can partner with our consultants, Contact Us.

Ridge regression for Statistical Arbitrage in Crypto (pairs trading)

I previously wrote an article for Dorian Trader which explored using ridge regression to develop a trading strategy. The article showed how ridge regression helped prevent overfitting as compared to regular regression, and was often uncannily good at predicting the peaks and troughs of trading data. See the original article here.

I also created a follow up article for Crypto News testing the same strategy on cryptocurrencies.

In those articles, the strategy was limited to only using a single asset. That is, the signals used as inputs were calculated only from the price history of the same asset that we were trading. In this article, we’ll explore whether the price history of closely related assets can contain valuable predictive information. This could be, for example, stocks from a similar industry, or different cryptocurrencies, both of which tend to move together. This is usually called statistical arbitrage or pairs trading (but note that despite the name, one could do it with any number of related assets, not necessarily two).

The basic mechanism

Pairs trading or statistical arbitrage makes use of the tendency for two assets to move together. It may mean that when one asset moves, the other is more likely to follow suit. It may also mean that the further apart the two assets become, the more likely they are to converge together again.

When we applied ridge regression to the single asset case, we used moving averages and regression lines of various lengths as our main signals. Here, we will do the same, except that we will have an additional set of signals generated from the second asset. We will use three moving averages and three regression lines of different lengths, so the regression has a choice of using whichever is more predictive. You could also use more than three, especially initially, if you are unsure of how much time might elapse between divergence and convergence of the related assets.

The mechanism here is that by including moving averages of different lengths, a linear model like ridge regression can implicitly include their differences as well, so that it can capture how much each asset has diverged (up or down) compared to its earlier values. It also implicitly includes the difference between the divergence of the first asset and the divergence of the second asset, so that it can capture how much one asset has diverged relative to the other.

If we believe that there is a lag between divergence and convergence, we could consider introducing a lag between the signal calculation time and the time at which the ridge regression tries to predict the price. However, by including signals like moving averages calculated over a variety of time windows, the model will be able to automatically calculate a few lagged moving averages anyway (as these are a linear combination of existing moving averages).

An additional issue in the two-asset case is that the two assets may have very different scales, and the ratio between the two could drift over time. For this reason, one might consider taking various ratios of the signals and including them as additional input signals:

  • For each asset, we can generate additional signals by taking the ratios of moving averages of different lengths, and the ratios of regression lines of different lengths.
  • We’ll also add the ratios of the same signal between the two assets. This will allow the ridge regression to track how much the assets have diverged in terms of their ratio.

The reality is, if the data window is not too large, the scales of the assets won’t change too much over the data set, and the relevant ratios can be assumed constant and will be automatically captured by the coefficients chosen by the ridge regression. So one might make a case that it’s not necessary to include the ratios when fitting over a relatively small time window (we will use one week). However, I’ve chosen to include all the mentioned ratios anyway. If nothing else, being able to see the coefficients for these signals provides more insight into how the ridge regression is working.

At the moment the model trains only on the most recent week of data. Although more data might seem better, more recent data is more relevant and allows the model to adapt to recent market conditions.
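To make the signal construction concrete, here is a minimal sketch in Python (synthetic random-walk prices stand in for the real ETH/BTC closes, and the regression-line signals are omitted for brevity; the ridge fit uses the closed-form normal equations rather than any particular library):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic random-walk closes standing in for one week of minute-level
# ETH and BTC prices (purely illustrative).
n = 1000
eth = 3000 + np.cumsum(rng.standard_normal(n))
btc = 60000 + np.cumsum(20 * rng.standard_normal(n))

def moving_average(x, w):
    """Trailing moving average of window w, same length as x."""
    c = np.convolve(x, np.ones(w) / w)[: len(x)]
    c[: w - 1] = np.cumsum(x[: w - 1]) / np.arange(1, w)  # shorter warm-up windows
    return c

windows = [5, 20, 50]
cols = []
for series in (eth, btc):
    mas = [moving_average(series, w) for w in windows]
    cols += mas
    # per-asset ratios of moving averages of different lengths
    cols += [mas[0] / mas[1], mas[0] / mas[2], mas[1] / mas[2]]
# cross-asset ratios of same-length moving averages
cols += [moving_average(eth, w) / moving_average(btc, w) for w in windows]

X = np.column_stack(cols)[50:-1]   # drop warm-up rows; align with next-step target
y = eth[51:]                       # predict the next ETH close

# Standardise so coefficient sizes are comparable, then solve the ridge
# normal equations (X'X + alpha*I) coef = X'y on centred data.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
ys = y - y.mean()
alpha = 10.0
coef = np.linalg.solve(Xs.T @ Xs + alpha * np.eye(Xs.shape[1]), Xs.T @ ys)
pred = Xs @ coef + y.mean()
```

Because the inputs are standardised before fitting, the entries of `coef` can be compared directly, which is exactly how the coefficient table below is read.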

Cryptocurrencies

We use one week of data ending on the 14th of August 2025, with the test set being approximately the final three hours of data.

It’s most likely that movements in the big coins are going to presage movements in the smaller coins, rather than the other way round. For this reason it’s most sensible to use BTC as an input when trying to predict ETH. Below we show the coefficients that the ridge regression has assigned to each of our signals.

RegLine5 = -0.707
RegLine20 = 0.509
RegLine50 = -0.403
MovAvg5 = -0.323
MovAvg20 = 0.185
MovAvg50 = 1.127
MovAvg5/20 = -0.442
MovAvg5/50 = 1.025
MovAvg20/50 = -0.674
RegLine5/20 = 0.055
RegLine5/50 = -0.205
RegLine20/50 = 0.129
Close_BTC-USD = 0.014
RegLine5_BTC-USD = -0.046
RegLine20_BTC-USD = 0.542
RegLine50_BTC-USD = -0.142
MovAvg5_BTC-USD = -0.051
MovAvg20_BTC-USD = -0.192
MovAvg50_BTC-USD = -0.235
MovAvg5/20_BTC-USD = 0.249
MovAvg5/50_BTC-USD = -0.496
MovAvg20/50_BTC-USD = 0.291
RegLine5/20_BTC-USD = 0.096
RegLine5/50_BTC-USD = -0.18
RegLine20/50_BTC-USD = 0.117
MovAvg5/5 = -0.555
MovAvg20/20 = 0.139
MovAvg50/50 = 0.116
Strategy profit = 107.596
Buy Hold profit = 19.161

The first 12 signals are all calculated from ETH price data. They consist of the linear regression lines (across three windows of different lengths), moving averages, ratios of the moving averages and ratios of the linear regression predictions. The following 12 signals are identical except calculated using BTC price data. The final three signals are the ratios of the ETH moving averages to the BTC moving averages.

Since ridge regression tries to make the coefficients small when it doesn’t harm the predictive power too much, and the signals are scaled prior to fitting, one can to an extent judge the significance of each signal by the size of the coefficient.

  • The most significant ETH signals are the 50 minute moving average, and the ratio between the 5 and 50 minute moving averages. The other moving average ratios, and the regression lines, are also significant.
  • The BTC signals are less significant than the ETH signals with smaller coefficients, as you would expect. The most significant are the 20 minute regression line, and the ratio between the 5 and 50 minute moving average.
  • The ratio between the 5 minute BTC and 5 minute ETH moving average also makes a contribution.

Looking at the plot, we see what we often see with ridge regression – it can be very good at placing sell markers at the peaks, and buy markers at the troughs. This gives a visual representation of why the strategy profit is 107.6 while the buy and hold profit is only 19.2.

But how much impact is the second asset, BTC, having here?

Isolating the impact of the second asset

We naturally want to know how much predictive power is coming from the second asset (BTC), and how much is coming from ETH’s own price history. We can see from the size of the coefficients that the ETH price history is more significant than the BTC price history.

To further evaluate this question, we can omit the BTC signals and see how much the prediction / profitability degrades.

Now when we do this exercise, it’s important that we make the test and fit sets the same. When the test set lies outside the data seen during fitting, it is essentially arbitrary and could favour any one of the models simply by chance, which makes comparison of different models very difficult. This might be mitigated by using a very large test set, which would have to be assumed representative of all future possibilities. However, that requires obtaining a much larger amount of data, and the fact that market regimes change over time raises its own challenges. By using overlapping test and fit sets, we can test which of the models can match the data most successfully, albeit without considering potential overfitting. In this case, I’ve used the entire week’s dataset for both testing and fitting (which is why the numbers are different from the previous section). For reference, the buy and hold profit was 945.3 in this case.

  • Fit and backtest the model using only ETH signals – Strategy profit = 1343.0, Rsquared = 0.011
  • Fit and backtest the model using both ETH and BTC signals – Strategy profit = 1556.5, Rsquared = 0.014

These numbers make a possible case that including the BTC signals improves the strategy. However, the case isn’t watertight, because including additional signals also increases the opportunity for overfitting.
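The shape of this comparison can be illustrated with a toy example. The data below is synthetic (12 made-up "own-asset" signals and 12 made-up "second-asset" signals with a weak true contribution), so the numbers carry no market meaning; it only shows the mechanics of fitting on one feature set versus both and comparing in-sample R².

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
n = 2000
# Toy stand-ins: 12 "own-asset" signals and 12 "second-asset" signals
X_own = rng.normal(size=(n, 12))
X_other = rng.normal(size=(n, 12))
# Target driven mostly by the own-asset signals, weakly by the second asset
y = X_own @ rng.normal(0, 1, 12) + 0.2 * (X_other @ rng.normal(0, 1, 12)) \
    + rng.normal(0, 1, n)

def in_sample_r2(X):
    # Fit and evaluate on the same data, as in the comparison above
    model = Ridge(alpha=1.0).fit(X, y)
    return r2_score(y, model.predict(X))

r2_own = in_sample_r2(X_own)
r2_both = in_sample_r2(np.hstack([X_own, X_other]))
```

As in the text, the in-sample improvement from adding the second block of features is real but modest, and on its own can’t distinguish genuine predictive power from extra capacity to overfit.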

More than two assets

Statistical arbitrage, or the principles behind pairs trading, can work with any number of assets. For example, one could use a whole basket of stocks from the same industry. In the case of crypto trading, we can consider using other coins in our prediction. Movements in smaller coins are unlikely to have as big an impact as Bitcoin on the price of ETH. Let’s try the exercise again including the next four largest coins by market capitalization: XRP, BNB, SOL and DOGE.

Using five coins as additional signals: Strategy profit = 2051.7, Rsquared = 0.015

The strategy profit is distinctly higher than when using only ETH, and when using ETH and BTC as inputs. All of the coins had signals with decent sized coefficients, suggesting that the ridge regression found them significant. However, as mentioned already, overfitting is a concern here.

Risks of multi-asset strategies

It’s important to consider that a strategy involving the signals of other assets may be more dangerous than one involving only one asset. This is because although a group of assets may often move similarly, there’s the possibility for their behaviour to significantly diverge for periods of time. For example, two crypto coins may often move similarly as they are both affected by general sentiment about crypto. But if one coin were to experience significant growth or decline based on events pertaining only to that asset, the relationship may cease to hold. Safeguards need to be added to the strategy to enable it to disregard the signals of other assets under certain circumstances.

Option Pricing using Neural Networks

I recently wrote an article for Dorian Trader on using a neural network to replicate option pricing models. The idea behind using neural networks in option pricing is that while the original numerical model might be slow, the neural network (once trained) is lightning fast at pricing options.

In that article, I fitted a neural network to the vanilla Black-Scholes formula as a proof of concept. I used ChatGPT to generate an initial piece of code, before tweaking it myself to improve the accuracy and configure the output as I wanted it. So this exercise is also an interesting demonstration of AI-assisted coding.

You can read the full article here

Below I’ve attached the Python code that I used. The code trains the neural network on your Nvidia GPU if you have one, and otherwise uses the CPU.

There are many parameters you can alter, including the number of options used in fitting, the number and size of the layers in the neural network, and the number of epochs to train for.

An interesting exercise would be to try to modify the code to work with more complex options, such as barrier options or American options.

Python Code

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader

# Black-Scholes formula for European call options
def bs_price(S, K, T, r, sigma):
    from torch.distributions import Normal
    sqrtT = torch.sqrt(T)
    d1 = (torch.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrtT)
    d2 = d1 - sigma * sqrtT
    norm = Normal(0., 1.)
    return S * norm.cdf(d1) - K * torch.exp(-r * T) * norm.cdf(d2)

# Generate dataset
N = 600_000
S = torch.rand(N, 1) * 100 + 1
K = torch.rand(N, 1) * 100 + 1
T = torch.rand(N, 1) * 2 + 0.01
r = torch.rand(N, 1) * 0.1
sigma = torch.rand(N, 1) * 0.5 + 0.01
prices = bs_price(S, K, T, r, sigma)

# Rescale by the spot price S so the network learns price as a fraction of spot
K2 = K / S
sigma2 = sigma * torch.sqrt(T)
DF = torch.exp(-r * T)
prices2 = prices / S

X = torch.cat([K2, T, r, sigma2, DF], dim=1)
y = prices2

train_size = int(0.8 * N)
X_train, X_test = X[:train_size], X[train_size:]
y_train, y_test = y[:train_size], y[train_size:]

batch_size = 512
dataset = TensorDataset(X_train, y_train)
loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

# Network configuration
hidden_size = 256
num_layers = 6

def build_model(input_dim, hidden_size, num_layers):
    layers = [nn.Linear(input_dim, hidden_size), nn.ReLU()]
    for _ in range(num_layers - 1):
        layers += [nn.Linear(hidden_size, hidden_size), nn.ReLU()]
    layers.append(nn.Linear(hidden_size, 1))
    return nn.Sequential(*layers)

class BSNet(nn.Module):
    def __init__(self):
        super(BSNet, self).__init__()
        self.model = build_model(5, hidden_size, num_layers)
    def forward(self, x):
        return self.model(x)

# Initialize
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = BSNet().to(device)
optimizer = optim.Adam(net.parameters(), lr=1e-3, weight_decay=1e-6)
criterion = nn.MSELoss()

# Training loop
epochs = 200
for epoch in range(1, epochs + 1):
    net.train()
    running_loss = 0.0
    for xb, yb in loader:
        xb, yb = xb.to(device), yb.to(device)
        preds = net(xb)
        loss = criterion(preds, yb)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * xb.size(0)
    epoch_loss = running_loss / train_size
    if epoch % 50 == 0:
        print(f"Epoch {epoch}/{epochs}, Training MSE: {epoch_loss:.10f}")

def eval_test():
    net.eval()
    with torch.no_grad():
        X_device = X_test.to(device)
        preds = net(X_device)
        return criterion(preds, y_test.to(device)).item()

test_loss = eval_test()
print(f"\nTest MSE after {epochs} epochs: {test_loss:.10f}")

def price_option(S_val, K_val, T_val, r_val, sigma_val):
    net.eval()
    with torch.no_grad():
        valsbs = torch.tensor([[S_val, K_val, T_val, r_val, sigma_val]], device=device)
        vals = torch.tensor([[K_val/S_val, T_val, r_val, sigma_val*torch.sqrt(torch.tensor(T_val)), torch.exp(torch.tensor(-r_val*T_val))]], device=device)
        nn_p = net(vals).cpu().item()*S_val
        bs_p = bs_price(valsbs[:,0:1], valsbs[:,1:2], valsbs[:,2:3], valsbs[:,3:4], valsbs[:,4:5]).cpu().item()
    print(nn_p, bs_p)

print("\nTesting on some options:")
price_option(55, 60, 1.2, .03, .12)
price_option(340, 330, 1.4, .02, .15)
price_option(131, 131, 0.6, .02, .1)

Changes of Measure in Finance and why Forward Rate Agreements don’t Require a Convexity Adjustment.

Is measure theory really needed in finance?

Measure theory is an area of mathematics that was created to generalise the theory of integration to more exotic kinds of functions. It has applications in pure mathematics, particularly to the theory of differential equations – see, for example, Sobolev spaces. At some point, finance academics started using the language of measure theory when writing papers and textbooks about financial derivative pricing. However, the field of measure theory isn’t really necessary in finance because in almost all cases the functions that arise in derivative pricing in practice are nice, smooth functions that don’t require considerations from measure theory like sigma algebras. Couching derivative pricing in the language of measure theory simply makes it difficult for non-mathematicians to understand things which they would otherwise be able to understand.

The main concept from measure theory which is useful in derivative pricing is that of a “measure”. However, we only need a greatly simplified concept of it. In finance, a measure is basically just a smooth function which is used to weight or assign probabilities to different outcomes.
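This simplified view can be made concrete on a finite set of outcomes, where a measure really is just a set of positive weights summing to one. The numbers below are illustrative assumptions; the point is that an expectation under a different measure is just a reweighted sum, with the ratio of the two weight sets playing the role of the Radon–Nikodym derivative from measure theory.

```python
import numpy as np

# Five possible terminal prices for an asset (a toy, discrete "market")
S_T = np.array([80.0, 90.0, 100.0, 110.0, 120.0])
# Two measures: each is just a set of positive weights summing to one
P = np.array([0.10, 0.20, 0.40, 0.20, 0.10])   # "real-world" weights
Q = np.array([0.15, 0.25, 0.35, 0.17, 0.08])   # "risk-neutral" weights

exp_P = P @ S_T        # expectation under P
exp_Q = Q @ S_T        # expectation under Q

# Changing measure is just reweighting: the ratio Q/P reweights each
# outcome, so the P-expectation of (Q/P) * S_T recovers exp_Q
w = Q / P
exp_Q_via_P = P @ (w * S_T)
```

Nothing beyond weighted averages is needed here, which is the sense in which the full machinery of sigma algebras is overkill for the smooth settings that arise in practice.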

Futures convexity corrections

In this section, we discuss the convexity correction which accounts for the difference between forwards and futures prices. Later, we will identify another convexity correction that is required for late or early payment of interest (such as for swaps in arrears).

In this discussion, everything is with respect to the risk neutral measure, which takes into account investors’ risk preferences so that, by definition, expectations agree with market prices.

Consider modelling the short rate (instantaneous interest rate) \(r\) by a stochastic differential equation with time dependent drift \(\theta(t)\):

\[dr = \theta(t)dt + \sigma dz.\]

The forward rate \(F(t,T,T + \tau)\) between \(T\) and \(T+\tau\), as seen from time \(t\), is some function of the short rate \(r(t)\). You’ll remember from calculus that when you want to find the rate of change of a function of a function, you use the chain rule, which generates an additional term. In stochastic calculus, the chain rule is known as Ito’s Lemma, and it actually has three terms – it has an additional second order (or convexity) term that you don’t get in ordinary calculus. So a function of a stochastic variable obeys its own stochastic differential equation, which we can generate using Ito’s Lemma. Because of this extra term, the new stochastic differential equation can have a drift term even if the original equation does not. In other words, even if \(\theta(t) = 0\), the forward rate equation can have a drift term, so that its expected value at time \(T\) is not the same as its value now at time \(t\). The precise equation for the expected future value is derived in this addendum to Hull.

We’ve seen that the forward rate is not a martingale under the risk neutral measure. In other words, it is not equal to its future expected value, because the equation describing it has a drift term. By contrast, futures prices are martingales and are equal to their future expected values. The reason for this is simple. Because futures are margined daily, their market value is by definition zero. Since they cost nothing to buy, their expected gain, with respect to the risk neutral measure, must be zero as well. This means that the expected futures price tomorrow, or on any future day, must equal today’s price.

This means that the expression derived in Hull for the expected future value of the forward rate also gives the convexity correction between forward and futures prices.
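For the constant-volatility short-rate model above (the Ho-Lee setting treated in Hull), the adjustment takes the well-known closed form \( \tfrac{1}{2}\sigma^2 t_1 t_2 \). A quick illustrative calculation, with made-up inputs:

```python
# Convexity adjustment between futures and forward rates in the
# constant-volatility (Ho-Lee) setting treated in Hull:
#     forward rate = futures rate - 0.5 * sigma^2 * t1 * t2,
# where t1 and t2 are the start and end of the accrual period in years.
# All inputs below are illustrative assumptions.
sigma = 0.012          # short-rate volatility (1.2% per annum)
t1, t2 = 5.0, 5.25     # accrual period runs from 5.0y to 5.25y
adjustment = 0.5 * sigma**2 * t1 * t2      # about 19 basis points

futures_rate = 0.0500                      # quoted futures rate (assumed)
forward_rate = futures_rate - adjustment
```

Note how the correction grows roughly quadratically with maturity, which is why it matters mainly for long-dated futures.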

Change of numéraire

Changing the numéraire means changing the unit of account used to value financial instruments – for example, changing to a different currency, or to the value of the same currency at some future time after it has appreciated due to the time value of money. We will refer to the Wikipedia article on numéraires.

We consider the value of an asset \(S\) in terms of a numéraire \(M\). We assume the existence of a so-called risk neutral measure \(Q\) under which asset prices measured in units of \(M\) are martingales, that is, their expected values at time \(T\) are the same as their current values:

\[\frac{S(t)}{M(t)} = E_Q \left[ \frac{S(T)}{M(T)} \right]\]

We define a new measure \(Q^N\) at time \(T\) by weighting all probabilities by the factor \(\frac{N(T)}{M(T)}\). Then we have the formula

\[E_{Q^N} \left[ \frac{S(T)}{N(T)} \right] = E_Q \left[\frac{N(T)}{M(T)} \frac{S(T)}{N(T)} \right] / E_Q \left[ \frac{N(T)}{M(T)} \right]\]

To understand this, note that when we switch back to calculating the expectation with respect to \(Q\), we must include the weighting (or scaling) factor inside the expectation. In addition, because the LHS is expressed in the numeraire \(N\), we need the division on the RHS so that both sides are in the same numeraire (imagine a currency change). Note also that the expectation in the denominator is simply equal to the same expression evaluated at time \(t\), by the martingale property above.
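The formula can be checked mechanically on a toy, finite state space, where a measure really is just a set of weights. Everything below is an illustrative assumption (random state values, six states); the point is only that reweighting by \(N(T)/M(T)\) and renormalising reproduces the change-of-numeraire formula exactly.

```python
import numpy as np

rng = np.random.default_rng(2)
n_states = 6
# Toy terminal values of an asset S and two numeraires M and N over n states
S = rng.uniform(80.0, 120.0, n_states)
M = rng.uniform(0.90, 1.10, n_states)
N = rng.uniform(0.90, 1.10, n_states)
q = rng.uniform(0.1, 1.0, n_states)
q /= q.sum()                         # measure Q: probability of each state

def E_Q(x):
    return q @ x

# New measure Q^N: weight every state by N(T)/M(T), then renormalise
w = N / M
q_N = q * w / E_Q(w)

lhs = q_N @ (S / N)                  # E_{Q^N}[ S(T)/N(T) ]
rhs = E_Q(w * (S / N)) / E_Q(w)      # the change-of-numeraire formula
```

The two sides agree by construction; the renormalisation by \(E_Q[N(T)/M(T)]\) is exactly what makes \(q_N\) a valid set of probabilities.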

The forward measure

Now we consider a particular example that we will need shortly. The time value of money means that a dollar today is worth more than a dollar in the future, leading to the use of discounting in pricing financial instruments. We let \(P(t,T)\) be the value at time \(t\) of a zero coupon bond which pays a cashflow of \(1\) at time \(T\) (effectively, this is the discount factor between \(t\) and \(T\)). Let’s choose a change of numeraire given by \(N(t) = P(t,T)\) and \(M(t) = P(t,T+\tau)\). Then we have

\[ \frac{N(t)}{M(t)} = \frac{1}{P(t,T,T+\tau)} = E_Q \left[ \frac{N(T)}{M(T)} \right], \]

where \(P(t,T,T+\tau)\) represents the forward discount factor between \(T\) and \(T+\tau\) as seen from time \(t\), and we have used the martingale property described above.

The new measure associated with this numeraire change is called the \(T + \tau\)-forward measure, and expectations with respect to this measure are denoted by \(E^{T + \tau}\).

Vanilla swap pricing and the forward measure

For a vanilla swap, the rate fixes at the start of each period, but the payment is settled at the end of each period.

Pricing a vanilla swap requires no complex modelling, as the valuation just involves computing the interest payments as if the future interest rates were equal to their current forward values. This is possible because the expected future value of the forward rate (with respect to the forward measure) is equal to its present value. In technical language, the forward rate is a “martingale”.

Consider a swap period starting at time \(T\) and ending at time \(T + \tau\), as seen from the current time \(t\). Let \(F(t,T,T + \tau)\) denote the forward interest rate over this period. The martingale property means that

\[E^{T + \tau}\left[F(T,T,T+\tau)\,\middle|\,\mathcal{F}_t\right]=F(t,T,T+\tau).\]

The “filtration” \(\mathcal{F}_t\) simply indicates that the expectation is as seen from time \(t\), and we neglect it in what follows. The notation \(E^{T + \tau}\) means that the expectation is taken with respect to the (\(T + \tau\))-forward measure.

A clear explanation of this fact can be found in Brigo & Mercurio [1]. Rearranging the definition of a (simply compounded) forward rate slightly, we have

\[F(t,T,T + \tau)P(t,T + \tau) = \frac{1}{Y(T, T+\tau)}\left(P(t,T) - P(t,T+\tau)\right),\]

where \(Y\) represents the year fraction.

We now consider taking the expectation of both sides. Because of the discount factor on the LHS, which scales all possible forward values, we are taking the expectation of the forward rate with respect to the (\(T + \tau\))-forward measure. On the other hand, the RHS is just the difference between the current prices of two zero coupon bonds of different maturities. Since the latter are real assets trading in the market, their expected value must be equal to their current value. It follows that the same is true for the left hand side, i.e. the forward rate scaled by the discount factor.

Swap in arrears pricing

Swaps in arrears differ from vanilla swaps in that both fixing and settlement occur at the start of each period. In this case, the expected future interest rate is no longer equal to the current forward value, necessitating a so-called convexity correction. The value of a swap in arrears is given by

\[V(t) = P(t,T) E^T \left[ F(T,T,T + \tau) \right].\]

For swaps in arrears, the expectation is taken at \(T\) instead of \(T + \tau\), so we can’t use the martingale property from the previous section. Using the change of numéraire formulae in a previous section, we can change to the (\(T + \tau\))-forward measure which introduces a discount factor inside the expectation as follows:

\[V(t) = P(t,T) E^{T+\tau} \left[ \frac{F(T,T,T + \tau)}{P(T,T+\tau)} \right] P(t,T,T+\tau)\]

\[= P(t,T+\tau) E^{T+\tau} \left[ \frac{F(T,T,T + \tau)}{P(T,T+\tau)} \right].\]

Because of the factor of \(1/P(T,T+\tau)\) inside the expectation, we can’t use the martingale property of forward rates here.

Quantifying the magnitude of this effect yields the so-called “convexity correction”. It’s an amount that gets added to the forward rates before using them to price the swap, and calculating it requires that you first choose a stochastic interest rate model.

Why Forward Rate Agreements (FRAs) don’t require a convexity adjustment.

A FRA is similar to a single period swap, but not quite the same. A key difference is that the rate is fixed in advance, at the start of the accrual period, just like a swap in arrears. However, unlike a swap in arrears, FRAs do not require a convexity adjustment, and one naturally wonders why. The answer lies in the second key difference between a FRA and a swap – for a FRA, the floating leg is discounted by the forward floating rate between the start and end dates of the accrual period. This discount factor, as seen from time \(t\), is

\[ P(t,T,T+\tau) = P(t,T+\tau)/P(t,T).\]

The value of a FRA is given by

\[V(t) = P(t,T) E^T \left[ F(T,T,T + \tau) P(T,T+\tau) \right],\]

which differs from a swap in arrears by the discount factor inside the expectation.

Note that if we now change to the (\(T + \tau\))-forward measure, as we did above for swaps in arrears, the new discount factor \(P(T,T+\tau)\) would exactly cancel with the change of measure \(1/P(T,T+\tau)\). Thus, FRAs don’t require a convexity correction.

Girsanov’s theorem

We’ve seen above how forward rates are martingales under the forward measure, but have a drift term under the risk neutral measure. So it seems one could remove a drift term by an appropriate change of measure, or conversely, change measure at the expense of adding a drift term. Girsanov’s theorem is the formalization of this idea.

[1] Brigo & Mercurio, Interest Rate Models – Theory and Practice, Springer Finance, 2007

Perpetual Options on Cryptocurrencies

What are perpetual options?

Everyone familiar with crypto trading will have heard of perpetual futures. These products are defined by regular settlements, say once per day at times \(t_1, t_2, t_3, \dots\), where the holder pays the amount \(V(t,S(t)) - P(S(t))\). Here, \(P\) is the payoff of the future, given by \(P(S) = S\). The function \(V\) is the current price at which the perpetual is trading on the market.

Perpetual options come about by simply replacing the payoff \(P\) in the above construct with the payoff of an option, that is, \(\max(S-K, 0)\) for a call option and \(\max(K-S, 0)\) for a put option.

For each daily settlement, you can view \(P\) as the cashflow you receive, and \(V\) as the cashflow you have to pay. The function \(V\), known as the funding fee, is often viewed as a penalty which forces the value of the perpetual to converge towards the payoff as the time approaches each daily expiry. Otherwise, the holders of the perpetuals are penalized by having to pay more than what they receive. As we’ll see, another way to view it in the case of a perpetual option, is as the premium of a portfolio of options.

One of the arguments used to motivate perpetuals is that they prevent the available liquidity from being divided among options of many different expiries. Another is that they avoid the inconvenience and spread fees of having to manually roll contracts as each option expires.

Perpetual options were originally discussed in this short paper.

Perpetual options as a portfolio of options

Let’s consider for a moment just the second part of the settlement. Since at each time \(t_i\) we receive the payoff \(P_i = P(S(t_i))\), this collection of cashflows is exactly that of a portfolio of options with expiries \(t_1, t_2, t_3,\dots\).

It makes sense therefore that the payments \(V_i = V(t_i,S_i)\) can be conceived of as the premium payments corresponding to the option portfolio. Now pay attention to the following fact: since at each time \(t_i\) the holder of the perpetual pays the current value of the entire remaining portfolio, the notionals of all remaining options must double at each payment time (excluding the option expiring at that time, of course). Since the kth option’s notional will be doubled one more time than the (k-1)th option’s, each successive option in the portfolio must have half the notional of the previous one. In other words, the notionals follow a geometric series with ratio ½.

Continuous funding

The notional halving each time you move to the option with an expiry one day longer means that the notionals obey an exponential relationship. Instead of using a base of one half, we can use a base of \(e\) (absorbing a factor of \(\ln 2\) into the time constant) together with a constant out the front. Thus, assuming our perpetual is a call, we could write the price of our perpetual (option portfolio) as

\[V = A\sum_{i=1}^{\infty} e^{-t_i/T} C_{t_i},\]

for some constant \(A\), where \(C_{t_i}\) is the price of a call option with expiry \(t_i\), and \(T\) is the interval between option expiries (assumed to be one day in our discussion above). If we now imagine decreasing the spacing of the options so that the sum becomes an integral (while leaving \(T\) unchanged), we get what’s known as the continuous funding case:

\[V = A\int_{0}^{\infty} e^{-t/T} C_{t}dt\]

Pricing models for perpetual options

Since the product can be conceived of as a (discrete or continuous) collection of options, the problem of pricing reduces to the problem of pricing options of different maturities. Assuming a Black-Scholes framework, this reduces to the problem of fitting an arbitrage free volatility surface to whatever implied volatilities exist in the market for vanilla options. Algorithms exist that can find the closest (least squares) fit subject to no-arbitrage conditions. If there is limited data, a parametric approach could be considered, as these have far fewer degrees of freedom than a surface obtained by interpolating between options which exist in the market. At the more complex end of the scale, local and stochastic volatility models could also be used, but may be overkill.

As usual, volatilities for options whose strikes or expiries fall in between those in the market are generated by interpolating the volatility surface. For example, interpolating between time 0 and the first option can be done by assuming the volatility surface is linear in variance (\(V^2T\) interpolation).
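One possible implementation of the linear-in-variance interpolation mentioned above is sketched below. The pillar times and volatilities are illustrative assumptions; the function simply interpolates total variance \(V^2T\) linearly (anchored at zero variance at \(T=0\)) and converts back to a volatility.

```python
import numpy as np

def interp_vol(t, pillar_times, pillar_vols):
    # Interpolate linearly in total variance V^2 * T, anchored at zero
    # variance at T = 0, then convert back to a volatility
    pillar_times = np.asarray(pillar_times, dtype=float)
    total_var = np.asarray(pillar_vols, dtype=float) ** 2 * pillar_times
    tv = np.interp(t, np.concatenate(([0.0], pillar_times)),
                   np.concatenate(([0.0], total_var)))
    return np.sqrt(tv / t)

# Illustrative pillars at 1 month and 3 months
vol_1m = interp_vol(1/12, [1/12, 3/12], [0.80, 0.70])   # on a pillar
vol_2m = interp_vol(2/12, [1/12, 3/12], [0.80, 0.70])   # between pillars
```

At a pillar the input volatility is recovered exactly, and between pillars the interpolated volatility lies between the neighbouring pillar values whenever total variance is increasing.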

Continuously funded everlasting options would need to be priced as a numerical integral, with options of different expiries being assigned different volatilities from the surface.

Variants where the strike is a moving average of recent prices can also be handled using the corresponding option pricing formulae for these products.

In other words, as long as you have the volatility data and model required to price regular options, perpetual prices follow without too much difficulty.

In terms of evaluating the integral above, the exponential weighting term means that we should only need to consider about 10 funding intervals (each of length \(T\)) before the contribution of additional terms becomes negligible. The integral can then be carried out by approximating it by a finite number of steps as usual.
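The truncation claim can be checked numerically. The sketch below prices the continuous-funding integral with `scipy.integrate.quad` under a flat Black-Scholes volatility; the spot, strike, rate and volatility are illustrative assumptions, and the normalising constant \(A\) is chosen here so the exponential weights integrate to one.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def bs_call(S, K, t, r, sigma):
    # Standard Black-Scholes European call price
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * t) / (sigma * np.sqrt(t))
    d2 = d1 - sigma * np.sqrt(t)
    return S * norm.cdf(d1) - K * np.exp(-r * t) * norm.cdf(d2)

S, K, r, sigma = 100.0, 100.0, 0.0, 0.8   # illustrative parameters
T = 1 / 365                  # one-day funding interval
A = 1 / T                    # normalise the exponential weights

def perp_price(horizon):
    # Truncate the exponentially weighted integral at `horizon`
    val, _ = quad(lambda t: A * np.exp(-t / T) * bs_call(S, K, t, r, sigma),
                  1e-9, horizon)
    return val

v_10 = perp_price(10 * T)    # truncate after 10 funding intervals
v_40 = perp_price(40 * T)    # much longer horizon, for comparison
```

Extending the horizon from 10 to 40 funding intervals changes the price only at roughly the \(e^{-10}\) level, consistent with the claim that about 10 intervals suffice.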

How do the price and greeks of perpetual options compare to vanilla options?

The price and greeks of an everlasting option are going to be a weighted sum of the prices and greeks of options of increasing expiries. It should be noted that the price of an everlasting will depend on how the volatility of an option varies with expiry according to the chosen vol surface. For simplicity, I’ve assumed the volatility surface is flat. If the funding period is as short as a single day, assuming volatility has no term structure may be reasonable.

To see how the price and greeks of a perpetual option relates to that of a single vanilla option with a single expiry, I’ve used python code to produce some graphs.

Focusing on calls, I’ve plotted the prices of an 8 hour, strike 100 call. I’ve set the volatility to 80% to accentuate the difference between the two prices which is otherwise hard to see.

The two prices are surprisingly close. The vanilla is slightly more valuable near the money, and slightly less valuable in the wings. This is seen more clearly when plotting the difference between the two prices:

Since the price graphs are very similar, the Greeks will be very similar as well. We plot the delta difference:

The gamma difference:

And the vega difference:

The patterns are interesting, but their small magnitude wouldn’t have much impact on a standard market risk or margin amount (VaR) calculation, for which approximate values of greeks would be sufficient. Thus the risk profile is very similar to that of a dated option with maturity equal to the length of the funding period.

Hedging perpetual options

Viewing the perpetual as a portfolio of regular options, you can hedge it using a portfolio of whatever instruments you would normally use for the regular options. However, unless you only wish to hedge the nearest few expiries, this might require purchasing many instruments. Conceptually, delta hedging with spot could be accomplished either by combining the spot amounts required for each of the deltas of the individual options, or by purchasing an amount of spot based on the delta of the everlasting option itself.

Alternatively, depending on the hedge, it might be possible to hedge all expiries simultaneously using another perpetual product, if it exists.

Arbitrage with vanilla options

In the standard options market one can look for various kinds of arbitrage, including violations of put call parity and inconsistencies with the volatility surface. This includes calendar arbitrage, where an option with closer maturity is assigned an implied volatility that leads it to be more valuable than an option with further maturity, and butterfly arbitrage, where the option price as a function of strike is not convex.

If the everlasting option is viewed as a finite portfolio of options, and one of those options is mispriced compared to an equivalent dated option with the same expiry, then one simply buys one and sells the other. If someone believes that the perpetual is mispriced at many expiries, they can buy/sell a portfolio of dated options alongside selling or buying the everlasting option.

Python code

Below is the python code used for this article.

import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

T = 1/365/3
K = 100
r = 0
V = 0.8

def Call(V, S, K, T, r):
    if T > 0:
        d1 = (np.log(S/K) + (r + 0.5*V**2)*T)/(V*np.sqrt(T))
        d2 = (np.log(S/K) + (r - 0.5*V**2)*T)/(V*np.sqrt(T))
        call = S*norm.cdf(d1,0.0,1.0) - K*np.exp(-r*T)*norm.cdf(d2,0.0,1.0)
        return call
    else:
        return max(S - K, 0)

def ECall(V, S, K, T, r):
    
    u = np.sqrt(1 + 8/(V*V*T))
    
    intrinsic = max(S-K,0)
    
    if S >= K:
        timevalue = (K/u)*(S/K)**(-(u-1)/2)
    else:
        timevalue = (K/u)*(S/K)**((u+1)/2)
    
    return intrinsic + timevalue

def Delta(V, S, K, T, r, Pricefunction):
    step = 0.0001*S
    return(Pricefunction(V,S+step,K,T,r) - Pricefunction(V,S-step,K,T,r)) / (2*step)

def Gamma(V, S, K, T, r, Pricefunction):
    step = 0.0001*S
    return (Pricefunction(V,S+step,K,T,r) -2*Pricefunction(V,S,K,T,r) + Pricefunction(V,S-step,K,T,r))/(step**2)

def Vega(V, S, K, T, r, Pricefunction):
    step = 0.0001*V
    return (Pricefunction(V+step,S,K,T,r) - Pricefunction(V,S,K,T,r))/step

# Price plot
plt.figure(1)
plt.xlabel('Spot', fontsize=12)
plt.ylabel('Call Price', fontsize=12)
xlist = np.linspace(90,110,300)

ylistvan = [Call(V, S, K, T, r) for S in xlist]
plt.plot(xlist,ylistvan,label='Vanilla', linewidth=1)

ylistever = [ECall(V, S, K, T, r) for S in xlist]
plt.plot(xlist,ylistever,label='Everlasting', linewidth=1)

plt.legend(loc='upper right')

# Price difference plot
plt.figure(2)
plt.xlabel('Spot', fontsize=12)
plt.ylabel('Everlasting - vanilla price', fontsize=12)
ylistdiff = [ylistever[i] - ylistvan[i] for i in range(len(ylistever))]
plt.plot(xlist,ylistdiff,label='Everlasting - vanilla', linewidth=1)

# Delta difference plot
plt.figure(3)
plt.xlabel('Spot', fontsize=12)
plt.ylabel('Everlasting - vanilla delta', fontsize=12)
ylistdeltavan = [Delta(V, S, K, T, r, Call) for S in xlist]
ylistdeltaever = [Delta(V, S, K, T, r, ECall) for S in xlist]

ylistdeltadiff = [ylistdeltaever[i] - ylistdeltavan[i] for i in range(len(ylistever))]
plt.plot(xlist,ylistdeltadiff,label='Everlasting - vanilla', linewidth=1)

# Gamma difference plot
plt.figure(4)
plt.xlabel('Spot', fontsize=12)
plt.ylabel('Everlasting - vanilla gamma', fontsize=12)
ylistgammavan = [Gamma(V, S, K, T, r, Call) for S in xlist]
ylistgammaever = [Gamma(V, S, K, T, r, ECall) for S in xlist]

ylistgammadiff = [ylistgammaever[i] - ylistgammavan[i] for i in range(len(ylistever))]
plt.plot(xlist,ylistgammadiff,label='Everlasting - vanilla', linewidth=1)

# Vega difference plot
plt.figure(5)
plt.xlabel('Spot', fontsize=12)
plt.ylabel('Everlasting - vanilla vega', fontsize=12)
ylistvegavan = [Vega(V, S, K, T, r, Call) for S in xlist]
ylistvegaever = [Vega(V, S, K, T, r, ECall) for S in xlist]

ylistvegadiff = [ylistvegaever[i] - ylistvegavan[i] for i in range(len(xlist))]
plt.plot(xlist,ylistvegadiff,label='Everlasting - vanilla', linewidth=1)

plt.show()

Trading Risk, Margin Modelling and the Standard Initial Margin Model (SIMM)

To mitigate credit risk in trading, one or both parties may be required to post initial and variation margin. While initial margin is posted by both counterparties at the inception of the trade, variation margin is exchanged periodically (daily or even intra-daily) during the life of the trade as the mark-to-market value of the position changes.

The margin protects each party against the other defaulting after the trade moves against them. In the case of centrally cleared derivatives, an intermediary manages the margining requirements of the counterparties. For OTC derivatives which are reasonably standardized, central clearing may still be a regulatory requirement. However, for OTC derivatives that are bespoke and illiquid, determining appropriate margining requirements may be more challenging due to the difficulty of valuing the contracts regularly. Margining requirements have become more stringent since the GFC.

While one does not want to be inadequately protected against non-payment by a counterparty, excessive margining also imposes costs on the derivatives markets. The immediate question is: how does one estimate the amount of margin required from a counterparty to ensure that, at some chosen statistical confidence level, it can meet its obligations? This question is the subject of this article.

While margining is often a regulatory requirement, financial services firms, including proprietary trading firms and brokers/exchanges, may wish to conduct their own margin or collateral modelling to ensure they are protected against default by counterparties due to sudden changes in the valuation of their trades. An example of this in the context of cryptocurrency exchanges is provided later in the article.

Margin modelling methodologies

Approaches to margin modelling have much in common with methodologies used for market risk calculations, in particular value at risk (VAR) and expected shortfall. The VAR method assumes a distribution type with which to model risk factor (market data) shifts (typically a normal distribution), calibrates the distribution to the market (typically by historical simulation, which analyses the shifts in market variables over some specified historical window), and then calculates the 99% quantile worst move (or some other specified confidence level) in the derivative or portfolio valuation over some given time horizon. Expected shortfall is a slight modification of this approach which looks at the mean of all outcomes beyond the 99% quantile.
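The historical-simulation calculation can be sketched as follows. This is a minimal illustration using synthetic P&L scenarios, not any firm's production methodology; the 99% confidence level and the quantile convention are assumptions.

```python
import numpy as np

def var_and_expected_shortfall(pnl, confidence=0.99):
    """Historical-simulation VAR and expected shortfall.

    pnl: array of historical one-period portfolio P&L outcomes
         (negative = loss). Both measures are returned as
         positive loss amounts.
    """
    pnl = np.sort(np.asarray(pnl))            # worst outcomes first
    cutoff = max(int(np.floor(len(pnl) * (1 - confidence))), 1)
    var = -pnl[cutoff]                        # 99% quantile worst loss
    es = -pnl[:cutoff].mean()                 # mean of losses beyond VAR
    return var, es

# Example with simulated normally distributed P&L scenarios
rng = np.random.default_rng(0)
pnl = rng.normal(0.0, 1000.0, size=5000)      # 5000 historical scenarios
var99, es99 = var_and_expected_shortfall(pnl)
print(var99, es99)
```

By construction the expected shortfall is at least as large as the VAR, since it averages only the outcomes beyond the VAR quantile.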

More frequent margin calls will reduce the credit risk (due to the shorter time horizon available for the market to move), at the expense of more operational inconvenience.

These methods require careful choice of a historical window for calibration. If only the most recent data is used, rather than a historically stressed period, margin requirements may be set too low during stable market periods, leaving the parties over-exposed when markets become more tumultuous.

Another question that arises is how to calculate margin requirements for an entire portfolio instead of a single trade. A portfolio may have offsetting trades, and a decision must be made about what correlations to assume and what level of diversification benefit to allow.

The ISDA standard initial margin model (SIMM)

To avoid disputes over margining amounts, it’s important to have a single, transparent and standardized approach that all parties adhere to. The International Swaps and Derivatives Association (ISDA) prescribes an approach known as the standard initial margin model (SIMM). The ISDA documentation for this approach can be found here.

The model includes four product classes: interest rates and FX (RatesFX), credit, equity and commodity.

Six “risk classes” are specified. These are market data categories on which trade valuations depend: interest rate, credit (qualifying), credit (non-qualifying), equity, commodity and FX. Each trade depends on one or more “risk factors” from these classes, such as interest rate curves, equity prices and exchange rates. The model postulates that the moves in any given risk factor are normally distributed, which is also the approach commonly taken for market risk. The changes in trade valuations are also assumed to be normally distributed, something which is not actually true for nonlinear products like options even when the risk factor shifts are normal, and which becomes less true the larger the shifts are. The valuation shifts depend on the changes in the risk factors through trade sensitivities.
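The point about nonlinearity is easy to illustrate numerically. The sketch below compares the true change in a Black-Scholes call value under a spot shift with the linear (delta) approximation; all parameter values are made up for illustration. The error grows with the size of the shift, which is why the normality assumption on valuation changes degrades for large moves.

```python
import numpy as np
from scipy.stats import norm

def bs_call(S, K, T, r, vol):
    """Standard Black-Scholes call price."""
    d1 = (np.log(S / K) + (r + 0.5 * vol**2) * T) / (vol * np.sqrt(T))
    d2 = d1 - vol * np.sqrt(T)
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

# Illustrative parameters
S, K, T, r, vol = 100.0, 100.0, 1.0, 0.02, 0.2

# Delta via a small central finite difference
h = 1e-4 * S
delta = (bs_call(S + h, K, T, r, vol) - bs_call(S - h, K, T, r, vol)) / (2 * h)

for shift in (1.0, 5.0, 20.0):        # spot shifts of increasing size
    actual = bs_call(S + shift, K, T, r, vol) - bs_call(S, K, T, r, vol)
    linear = delta * shift
    print(shift, actual - linear)      # approximation error grows with shift
```

For a long call the error is positive (the payoff is convex in spot), and it grows roughly quadratically with the shift through the gamma term.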

We first discuss how to calculate the initial margin for an individual trade. The approach is to calculate the delta, vega and interest rate sensitivities of the trade. The ISDA has calculated numbers representing how much they believe each risk factor will move at 99% confidence, which then multiply the sensitivities. The process by which the user finds the correct number is fairly intricate, and we won’t attempt to reproduce every detail here. The approach also takes into account the ISDA’s view on correlations between risk factors.

For options, a “curvature” term is also added. It is the vega contribution scaled by a “scaling factor”, which represents the ISDA’s view on how gamma and vega are related for vanilla options.

Having calculated the initial margin for an individual trade, we now discuss how to aggregate trades to calculate the initial margin of the entire portfolio. A fantastic simplifying feature of normal distributions is that the sum of two normally distributed variables is another normally distributed variable with some new standard deviation. The new standard deviation is a function of the old ones, calculated using what I will call the “square root rule”:

\[ \sigma_{agg} = \sqrt{\sum_r \sigma_r^2 + \sum_r \sum_{s \neq r} \psi_{rs} \sigma_r \sigma_s},\]

where \(\psi\) is a matrix of correlations between the distributions. One can aggregate the normally distributed valuation shifts for the different risk factors of any given trade, giving a total normal distribution for each trade. One can then aggregate all these normal distributions across all trades in the same way, yielding a new normal distribution corresponding to the total valuation shift of the whole portfolio.
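A minimal sketch of the square root rule, aggregating per-risk-factor margins under an assumed correlation matrix (the numbers here are illustrative, not the ISDA's calibrated parameters):

```python
import numpy as np

def square_root_rule(margins, corr):
    """Aggregate margins assuming joint normality.

    Computes sqrt(sum_r m_r^2 + sum_{r != s} psi_rs m_r m_s),
    i.e. the quadratic form m' C m under the correlation matrix C.
    """
    m = np.asarray(margins)
    return float(np.sqrt(m @ np.asarray(corr) @ m))

margins = [100.0, 80.0, 50.0]              # illustrative per-factor margins
corr = np.array([[1.0, 0.3, 0.1],
                 [0.3, 1.0, 0.2],
                 [0.1, 0.2, 1.0]])         # illustrative correlations

agg = square_root_rule(margins, corr)
print(agg)
```

With positive but imperfect correlations, the aggregate lands between the zero-correlation result (root of the sum of squares) and the straight sum, which is the diversification benefit at work.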

The ISDA methodology has three differences from the procedure described above. Firstly, it applies the square root rule to the initial margins, rather than to the standard deviations of the distributions as described above. Since an initial margin is just some quantile of a normal distribution, it is proportional to the standard deviation, so the square root rule applies to the initial margins just as it does to the standard deviations.

Secondly, it’s easy to verify that for the square root rule, the order in which we aggregate the normal distributions doesn’t matter. The ISDA approach nevertheless aggregates the normal distributions in a specific order. For each product class, and for each risk class \(X\) within the product class, it first aggregates the delta margin across all trades, and similarly the vega margin, curvature margin and base correlation margin. It then aggregates these four margins as a simple sum:

\[ IM_X = DeltaMargin_X + VegaMargin_X + CurvatureMargin_X + BaseCorrMargin_X \]

This sum is the third difference. It is more conservative than the result of applying the square root rule, and assumes that there is no diversification benefit between these four types of moves. The ISDA may believe that in stressed conditions, these different types of risk may materialise together.

The ISDA then uses the square root rule to aggregate these initial margins across all risk classes within the product class. Just like with the four margin types, the ISDA stipulates that no diversification benefit may be assumed between the product classes, so that the total amount of initial margin required is the sum of those for each of the four product classes:

\[ SIMM_{Product} = SIMM_{RatesFX}+SIMM_{Credit}+SIMM_{Equity}+SIMM_{Commodity}\]
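The overall hierarchy can be sketched as follows. The margin numbers, class names and the 0.27 correlation are placeholders, not ISDA parameters; only the structure (straight sum across the four margin types, square root rule across risk classes, straight sum across product classes) follows the description above.

```python
import numpy as np

def aggregate_risk_class(delta_m, vega_m, curvature_m, basecorr_m):
    # Straight sum across the four margin types: no diversification benefit
    return delta_m + vega_m + curvature_m + basecorr_m

def aggregate_product_class(risk_class_margins, corr):
    # Square root rule across risk classes within a product class
    m = np.asarray(risk_class_margins)
    return float(np.sqrt(m @ np.asarray(corr) @ m))

# Hypothetical numbers for one product class with two risk classes
im_ir = aggregate_risk_class(100.0, 30.0, 10.0, 0.0)   # e.g. interest rate
im_fx = aggregate_risk_class(60.0, 20.0, 5.0, 0.0)     # e.g. FX
corr = np.array([[1.0, 0.27],
                 [0.27, 1.0]])                          # illustrative
simm_ratesfx = aggregate_product_class([im_ir, im_fx], corr)

# Straight sum across product classes: again no diversification benefit
simm_total = simm_ratesfx + 0.0 + 0.0 + 0.0   # credit, equity, commodity empty
print(simm_total)
```

The square root step gives a portfolio-level benefit between the risk classes, while the two straight sums deliberately forgo any such benefit.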

That is, above the product class level, we don’t apply the square root rule but a straight sum. I’m not sure whether this is based in part on some analysis the ISDA has done concerning correlations between product classes, or is simply conservatism.

In any event, we’ve now calculated the total initial margin across our entire portfolio, as stipulated by the ISDA’s standard initial margin model.

FX sensitivities

Consider an FX trade with underlying \(S = CCY1CCY2\) and present value \(PV(S)\) expressed in CCY2. When calculating sensitivities, SIMM considers a 1% shift \(dS=.01S\). The sensitivity with respect to \(CCY1\) is defined to be the corresponding shift

\[sen_{CCY1} = dPV.\]

The sensitivity with respect to \(CCY2\) is defined to be

\[sen_{CCY2} = -Sd(PV/S).\]

In other words, we convert \(PV\) to \(CCY1\), find the shift, and then convert back. Using the chain rule for differentiation, it’s easy to show that

\[sen_{CCY1} + sen_{CCY2} = .01PV.\]

Once one sensitivity is calculated, the other can be easily found using this formula.
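The identity can be checked numerically. The sketch below uses a made-up nonlinear PV function; since the 1% shift is finite rather than infinitesimal, the identity holds to first order in the shift size.

```python
def fx_sensitivities(pv_func, S, rel_shift=0.01):
    """CCY1 and CCY2 sensitivities under a relative spot shift.

    pv_func(S) returns PV in CCY2 as a function of spot S = CCY1CCY2.
    """
    dS = rel_shift * S
    # CCY1 sensitivity: the shift in PV itself
    sen_ccy1 = pv_func(S + dS) - pv_func(S)
    # CCY2 sensitivity: convert PV to CCY1, shift, convert back
    sen_ccy2 = -S * (pv_func(S + dS) / (S + dS) - pv_func(S) / S)
    return sen_ccy1, sen_ccy2

# Hypothetical nonlinear PV in CCY2
pv = lambda S: 5.0 * S**2 - 200.0

sen1, sen2 = fx_sensitivities(pv, S=1.25)
print(sen1 + sen2, 0.01 * pv(1.25))   # equal to first order in the shift
```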

SIMM contribution of option premium settlement

For options, the option premium is typically settled two business days after the trade date. When a SIMM calculation is performed for an option trade before the option premium has settled, the premium cashflow is included in the calculation of the sensitivities. Since the option premium is equal and opposite to the usual PV of the trade, this means that the “total PV” including the premium payment is zero at the inception of the trade. Using the above formula, this means that the two sensitivities are equal and opposite.

The contribution that the premium payment makes to \(sen_{CCY1}\) and \(sen_{CCY2}\) depends on which currency settlement occurs in. Either way, since it is a fixed and known payment, its sensitivity to one of the currencies will be zero, and its sensitivity to the other currency will be \(.01Premium\).
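This is easy to verify from the sensitivity definitions above. The sketch below treats the unsettled premium as a fixed cashflow of a hypothetical amount \(P\), settling first in CCY2 and then in CCY1; the small discrepancy from \(.01P\) in the first case is the finite-shift effect, which vanishes in the infinitesimal limit.

```python
S = 1.10        # hypothetical spot CCY1CCY2
P = 1000.0      # hypothetical premium amount
dS = 0.01 * S   # the 1% SIMM shift

# Premium settles in CCY2: PV in CCY2 is the constant P
sen1 = P - P                                # dPV = 0 exactly
sen2 = -S * (P / (S + dS) - P / S)          # approximately .01 * P
print(sen1, sen2)

# Premium settles in CCY1: PV in CCY2 is P * S
sen1 = P * (S + dS) - P * S                 # = .01 * P * S, i.e. .01 * PV
sen2 = -S * (P * (S + dS) / (S + dS) - P * S / S)   # = 0 exactly
print(sen1, sen2)
```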

Bitmex’s insurance fund

Given the high volatility of cryptocurrencies, market risk and margin modelling are particularly challenging. When major cryptocurrency exchanges combine this volatility with very high leverage, managing their risk becomes harder still.

On a traditional exchange, if the loss on a trade exceeds the posted margin, the exchange will insist that the trader post additional margin to cover the loss, and will chase the debt if necessary.

Bitmex have taken a margining approach different to that used in traditional finance, which they call their insurance fund. In essence, Bitmex will liquidate your position once half your margin is consumed, and appropriate whatever remains after liquidating the position at the best available price. This money is then used to compensate other participants whose trades lost more than their margin amount. This means that traders who lost more than their margin are compensated, at the expense of traders who didn’t lose all of their margin.
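A stylised sketch of this mechanism, with hypothetical numbers (the half-margin liquidation trigger follows the description in this article, not Bitmex's published parameters):

```python
def settle_position(margin, loss, liquidation_fraction=0.5):
    """Stylised insurance-fund mechanics for a losing position.

    Returns (amount paid into the fund, shortfall drawn from the fund).
    The exchange liquidates once the loss consumes the given fraction
    of margin; any margin remaining after liquidation goes to the fund,
    and losses beyond the margin are covered by the fund.
    """
    if loss >= liquidation_fraction * margin:
        remaining = max(margin - loss, 0.0)       # appropriated by the fund
        shortfall = max(loss - margin, 0.0)       # covered by the fund
        return remaining, shortfall
    return 0.0, 0.0

# Trader A: liquidated with margin left over -> contributes to the fund
print(settle_position(margin=100.0, loss=60.0))   # (40.0, 0.0)

# Trader B: lost more than margin -> shortfall covered by the fund
print(settle_position(margin=100.0, loss=130.0))  # (0.0, 30.0)
```

Trader A's surplus margin is what funds trader B's shortfall, which is precisely the transfer described above.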

To new participants, it may seem surprising that the exchange can “steal” half your margin and keep it. However, Bitmex point out that, on the upside, traders are guaranteed never to lose more than their margin amount.