Algorithmic Differentiation: Fast Greeks in Monte Carlo Option Pricing

The problem with finite difference methods

If you’ve ever tried to calculate Greeks when pricing options using Monte Carlo, you’ve probably found that it’s both very slow and potentially inaccurate unless an enormous number of paths is used. There are two main reasons for this.

Firstly, calculating Greeks using finite differences requires that the pricing function be evaluated multiple times. For example, for a second order derivative like gamma, three evaluations of the pricing function are required. Monte Carlo pricers are already slow even when evaluated once, and having to evaluate them many more times to get the Greeks makes them many times slower again.

Secondly, since the step size needs to be small when calculating derivatives, the prices are likely very close together as well. When calculating the difference between two very similar values, you run the risk that you are just getting Monte Carlo noise. One could try to use a larger step size, so that the calculated difference is larger when compared against the noise, but the calculated value may then diverge from the true derivative.

You might also like to check out our article on the Longstaff-Schwartz method for pricing American options.

Algorithmic differentiation

You might have heard about algorithmic differentiation, and “adjoint” algorithmic differentiation – a technique for more efficiently and more accurately calculating Greeks for Monte Carlo pricing models. This technique allows derivatives to be calculated via repeated applications of the chain rule, without needing to evaluate the pricing function multiple times. Instead, one spends time manually (symbolically) differentiating some of the functions involved. However, you might have found that the explanations available on the internet tend to be quite abstract and much more confusing than they need to be. The purpose of this note is to give a clear and readable example of exactly how and why algorithmic differentiation works when applied to Monte Carlo options pricing.

Calculating delta and vega

The essence of a Monte Carlo option pricer is simple. The discounted payoff of the option is evaluated on each path \(i\), and then the present value \(PV\) is simply the average:

\[PV = \frac{1}{N} \sum_i e^{-rT_i}\text{Payoff}(i).\]

Here \(T_i\) is the settlement date for the \(i^{th}\) path. Note that in general, for path dependent derivatives,

\[\text{Payoff}(i) = \text{Payoff}(S_0^i,…,S_N^i,T_i).\]

However, in many cases the payoff would only depend on \(S_N\).
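As a concrete reference point, here is a minimal sketch of such a pricer for a European call under geometric Brownian motion (the function and parameter names are illustrative, not from any particular library):

```python
import numpy as np

def mc_call_price(S0, K, r, sigma, T, n_steps, n_paths, seed=0):
    """Minimal Monte Carlo pricer for a European call under GBM."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    S = np.full(n_paths, S0, dtype=float)
    for _ in range(n_steps):
        Z = rng.standard_normal(n_paths)
        # S_{n+1} = S_n * exp((r - sigma^2/2) dt + sigma sqrt(dt) Z_n)
        S *= np.exp((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z)
    payoff = np.maximum(S - K, 0.0)
    # PV = (1/N) sum_i e^{-rT} Payoff(i)
    return np.exp(-r * T) * payoff.mean()
```

We will reuse this basic loop below when adding the sensitivity calculations.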

Let’s suppose we want to calculate a derivative of \(PV\), say delta. By simply moving the derivative inside the summation we get

\[\frac{dPV}{dS_0} = \frac{1}{N} \sum_i \frac{d}{dS_0} \left( e^{-rT_i}\text{Payoff}(i) \right).\]

Let’s consider now calculating the term inside the summation for the \(i^{th}\) path, dropping the subscript \(i\) for simplicity. We start with the formula to generate the path. Each time step we calculate the value \(S\) of the underlying at the next step by some function

\[S_{n+1} = f(S_n, Z_n),\]

where \(Z_n\) is a random draw from a normal distribution. Let’s keep things simple by considering the case \(T_i = T\) for all \(i\) so we can pull the discount factor out the front. Then applying the chain rule we need to calculate

\[\frac{d}{dS_0} \left( \text{Payoff} \right) = \sum_k \frac{d\text{Payoff}}{dS_k} \frac{dS_k}{dS_0}.\]

We’ll look at the derivative of the payoff \(\frac{d\text{Payoff}}{dS_k}\) in the next sections. This term needs to be calculated individually for each kind of derivative in the trading book (this is one downside of algorithmic differentiation, as it increases the complexity of the code and the likelihood of errors). Let’s now examine the final term \(\frac{dS_k}{dS_0}\). The usual path generation calculation using geometric Brownian motion is

\[S_{n+1} = S_n e^{(r-\frac{1}{2} \sigma^2) \delta t + \sigma \sqrt{\delta t} Z_n}.\]

Since the exponential term has no \(S\) dependence, this means we get simply

\[\frac{dS_k}{dS_0} = \frac{S_k}{S_0}.\]

This value can be calculated at the \(k^{th}\) time step and stored.
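In code, the ratio \(S_k / S_0\) is available for free as the path is generated; a sketch for a single path (names are my own):

```python
import numpy as np

def generate_path_with_delta_factors(S0, r, sigma, T, n_steps, rng):
    """Generate one GBM path, storing dS_k/dS_0 = S_k/S_0 at each step."""
    dt = T / n_steps
    S = np.empty(n_steps + 1)
    S[0] = S0
    for n in range(n_steps):
        Z = rng.standard_normal()
        S[n + 1] = S[n] * np.exp((r - 0.5 * sigma**2) * dt
                                 + sigma * np.sqrt(dt) * Z)
    dS_dS0 = S / S0  # dS_k/dS_0 = S_k / S_0 for every k
    return S, dS_dS0
```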

If we were calculating vega instead, we would get

\[\frac{d}{d\sigma} \left( \text{Payoff} \right) = \sum_k \frac{d\text{Payoff}}{dS_k} \frac{dS_k}{d\sigma}.\]

Then the final term is

\[ \frac{dS_{n+1}}{d\sigma} = S_{n+1} \left( -\sigma \,\delta t + \sqrt{\delta t}\, Z_n \right) + \frac{S_{n+1}}{S_n} \cdot \frac{dS_n}{d\sigma}. \]

We have of course \(\frac{dS_0}{d\sigma} = 0\), and we calculate each successive value using the one before, storing the values to use at the end.
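The recursion can be accumulated alongside the path generation; a sketch for one path (names illustrative):

```python
import numpy as np

def path_with_vega_sensitivities(S0, r, sigma, T, n_steps, rng):
    """Generate a GBM path while accumulating dS_n/dsigma recursively."""
    dt = T / n_steps
    S = np.empty(n_steps + 1)
    dS_dsigma = np.empty(n_steps + 1)
    S[0], dS_dsigma[0] = S0, 0.0  # dS_0/dsigma = 0
    for n in range(n_steps):
        Z = rng.standard_normal()
        growth = np.exp((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z)
        S[n + 1] = S[n] * growth
        # dS_{n+1}/dsigma = S_{n+1}(-sigma dt + sqrt(dt) Z_n)
        #                   + (S_{n+1}/S_n) dS_n/dsigma
        dS_dsigma[n + 1] = (S[n + 1] * (-sigma * dt + np.sqrt(dt) * Z)
                            + growth * dS_dsigma[n])
    return S, dS_dsigma
```

A quick sanity check is to compare the accumulated value against a finite-difference bump on the same random numbers.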

Vanilla options

The payoff of vanilla options only depends on the underlying at expiry. This means the derivatives with respect to \(S_k\) are all zero except the final one, which is

\[\frac{d\text{Payoff}_{Van}}{dS_N} = 1_{S_N > K} \]

for a call and

\[\frac{d\text{Payoff}_{Van}}{dS_N} = -1_{S_N< K} \]

for a put.
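Putting the pieces together, the pathwise delta of a vanilla call is just the average of the indicator times the stored ratio; a sketch (names my own, parameters as before):

```python
import numpy as np

def mc_call_delta_pathwise(S0, K, r, sigma, T, n_steps, n_paths, seed=0):
    """Pathwise delta of a European call:
    delta = e^{-rT} E[ 1_{S_N > K} * S_N / S_0 ]."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    S = np.full(n_paths, S0, dtype=float)
    for _ in range(n_steps):
        Z = rng.standard_normal(n_paths)
        S *= np.exp((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z)
    dPayoff_dSN = (S > K).astype(float)  # derivative of max(S_N - K, 0)
    dSN_dS0 = S / S0                     # from dS_k/dS_0 = S_k/S_0
    return np.exp(-r * T) * np.mean(dPayoff_dSN * dSN_dS0)
```

Note that no second pricing run is needed: the delta comes from the same set of paths as the price itself.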

Barrier options

For barrier options we run into a problem where the derivative of the payoff with respect to \(S_k\) would seem to be zero, except for the case \(k=N\). This is because each \(S_k\) is some finite distance from the barrier (assuming no breach yet), so when you shift each one infinitesimally, you still get no barrier breach. But in reality, there is an increased probability of breach in between the time steps. To capture this, we need to use a Brownian bridge. We denote by

\[p_m = \text{exp}\left( -\frac{2(B-S_m)(B-S_{m+1})}{\sigma^2 \Delta t_m} \right) \]

the probability that the path breaches the barrier in between time steps \(m\) and \(m+1\) (we assume an upper knockout barrier here, with neither \(S_m\) nor \(S_{m+1}\) having breached it). The payoff then gets weighted by the probability of surviving every interval:

\[ \text{Payoff} = \prod_{m=0}^{N-1} \big(1 - p_m(S_m,\, S_{m+1})\big)\, \text{Payoff}_{Van}. \]

We can now easily differentiate this payoff with respect to \(S_m\) using the chain rule, noting that we already calculated the derivatives of the second term in the previous section.
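A sketch of the smoothed barrier payoff for one stored path, using the bridge formula above (names are my own; this assumes all the stored \(S_m\) are below the barrier, so each \(p_m\) is a valid probability):

```python
import numpy as np

def smoothed_barrier_payoff(S, K, B, sigma, dt):
    """Up-and-out call payoff weighted by Brownian-bridge survival factors.
    S: path values S_0..S_N (all assumed below the barrier B)."""
    # p_m: probability of a breach between steps m and m+1
    p = np.exp(-2.0 * (B - S[:-1]) * (B - S[1:]) / (sigma**2 * dt))
    survival = np.prod(1.0 - p)           # probability of surviving all intervals
    vanilla = max(S[-1] - K, 0.0)         # Payoff_Van at expiry
    return survival * vanilla
```

Because the survival factors depend smoothly on every \(S_m\), each term now contributes a nonzero \(\frac{d\text{Payoff}}{dS_m}\) via the chain rule.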

Gamma

A technical difficulty arises when using this method for gamma. If you think about the shape of the payoff of a vanilla option, its second derivative is zero everywhere except at the strike, where it could be thought of as infinite (or undefined). One simple way to handle this issue is to slightly smooth the payoff function near the kink. Another approach is not to differentiate the payoff at the final step \(N\), but instead to differentiate its expectation one step earlier (which is simply the Black-Scholes formula over one time step). This trick is sufficient to smooth out the kink in the payoff.
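As an example of the first approach, here is one illustrative choice of smoothing (not the only one, and the smoothing width \(\epsilon\) is a tuning parameter I have introduced):

```python
import numpy as np

def smoothed_call_payoff(S_N, K, eps=0.5):
    """Smooth replacement for max(S_N - K, 0): twice differentiable,
    converging to the hard payoff as eps -> 0."""
    x = S_N - K
    return 0.5 * (x + np.sqrt(x * x + eps * eps))
```

The second derivative of this function is finite everywhere, so the pathwise gamma estimator no longer blows up at the strike; the price paid is a small bias of order \(\epsilon\) near the kink.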

Adjoint Algorithmic Differentiation

There’s a slightly different approach which is more computationally efficient when you want to calculate a large number of derivatives with respect to many different variables. This is called “adjoint” algorithmic differentiation, and it involves doing both a forward and a backward pass. The runtime of the forward method grows linearly with the number of sensitivities required (\(O(m)\) for \(m\) sensitivities), while the backward method computes all of them for the cost of essentially one extra pass (\(O(1)\)). Let’s look again at our formula for delta. Suppose we have already done a forward pass which has generated the path values \(S_k\) for each \(k\). The backward pass works like this:

We can already calculate \(\frac{d\mathrm{Payoff}}{dS_N}\), since it’s just the dependence of the payoff on the spot value at maturity. Of course, what we really want on the bottom here is \(S_0\). We step backwards one step by employing the chain rule like this:

\[ \frac{d\mathrm{Payoff}}{dS_{N-1}} = \frac{d\mathrm{Payoff}}{dS_{N}} \frac{dS_N}{dS_{N-1}}. \]

We can then do it again:

\[ \frac{d\mathrm{Payoff}}{dS_{N-2}} = \frac{d\mathrm{Payoff}}{dS_{N-1}} \frac{dS_{N-1}}{dS_{N-2}}. \]

Continuing, we eventually arrive at an expression for \(\frac{d\mathrm{Payoff}}{dS_{0}}\).
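A sketch of this backward pass for a single stored path, assuming the GBM step above so that \(\frac{dS_n}{dS_{n-1}} = \frac{S_n}{S_{n-1}}\) (names are my own):

```python
import numpy as np

def adjoint_delta_one_path(S, K, r, T):
    """Backward (adjoint) pass over one stored path S_0..S_N,
    accumulating dPayoff/dS_n from n = N down to n = 0."""
    N = len(S) - 1
    adjoint = 1.0 if S[N] > K else 0.0    # dPayoff/dS_N for a call
    for n in range(N, 0, -1):
        adjoint *= S[n] / S[n - 1]        # chain rule: one step backward
    # adjoint now holds dPayoff/dS_0; discount for this path's delta contribution
    return np.exp(-r * T) * adjoint
```

For this one-factor model the product of step ratios telescopes to \(S_N/S_0\), reproducing the forward result; the backward formulation earns its keep when many sensitivities share the same stored adjoint values.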