# Machine Learning Methods in Finance @ KAUST¶

## From Signatures to Reinforcement Learning¶

Christian Bayer, Christoph Belak, Blanka Horvath, Paul Hager$\ast$

$\ast$ Humboldt University of Berlin

1. May 2022

## Outline of the Lecture¶

This lecture will be a practical guide to:

1. calculating the truncated log-signatures for piecewise linear paths,
1. implementing deep signature stopping rules,
1. approximating the optimal stopping problem of fractional Brownian motion

## 1. Calculation of Log-Signatures¶

Notation: $\{\color{blue}{e_1}, \dots, \color{blue}{e_d}\}$ will denote a basis of $\mathbb{R}^d$.

### 1.1 Recap of Signatures of Smooth Paths¶

Recall that the signature of a smooth path $x: [0,T] \to \mathbb{R}^d$ is given by the sequence of iterated integerals as a formal tensor series

$$\mathbb{x}^{<\infty}_{s,t} = 1 + \int_s^t dx_{t_1} + \int_s^t \int_s^{t_2} dx_{t_1} \otimes dx_{t_2} + \dots \quad \in \mathbb{T}((\mathbb{R}^d)),\quad (0 \le s \le t \le T).$$

In other words, the component of the signature corresponding to the multiindex or word $\color{blue}{i_1} \cdots \color{blue}{i_n} \in \{\color{blue}{1}, ..., \color{blue}{d}\}^n$ is given by

$$\langle \color{blue}{i_1} \cdots \color{blue}{i_n}, \mathbb{x}^{<\infty}_{s,t}\rangle = \idotsint_{s \le t_1 \le \dots \le t_n \le t} dx^{\color{blue}{i_1}}_{t_1} \dots dx^{\color{blue}{i_n}}_{t_n}$$

Furthermore, we have also seen that the signature satisfies the Chen relation

$$\mathbb{x}^{<\infty}_{s,t} = \mathbb{x}^{<\infty}_{s,u} \otimes \mathbb{x}^{<\infty}_{u,t},\qquad (0 \le s \le u \le t \le T)$$

and the shuffle identity

$$\langle w, \mathbb{x}^{<\infty}_{s,t}\rangle \langle v, \mathbb{x}^{<\infty}_{s,t}\rangle = \langle w \sqcup\mathchoice{\mkern-7mu}{\mkern-7mu}{\mkern-3.2mu}{\mkern-3.8mu}\sqcup v, \mathbb{x}_{s,t}\rangle, \qquad w, v \in \mathcal{W}_d.$$

The elements of of the tensor algebra satisfying the shuffle relation form a group - the free Lie group $G(\mathbb{R}^d)$ its tensor-logarithm $\mathfrak{g}(\mathbb{R}^d) = \log(G(\mathbb{R}^d))$ is the

### 1.2 The Free Lie algebra¶

The free Lie algebra can be characterized in another way.

Define the bilinear commutator bracket

$$[\cdot, \cdot]: T((\mathbb{R}^d)) \times T((\mathbb{R}^d)) \to T((\mathbb{R}^d)), \qquad [a, b] = a\otimes b - b \otimes a.$$

and note that it satisfies the following relations:

• (antisymmetry) $\quad[a,b] = -[b,q]$
• (Jacobi identity) $\quad [a,[b,c]] + [b, [c, a]] + [c, [a, b]] = 0$

The free Lie algebra $\mathfrak{g}(\mathbb{R}^d)$ is the smallest sub-algebra of $T((\mathbb{R}^d))$ containing $\mathbb{R}^d$ that is closed under the the commutator bracket $[\cdot, \cdot]$.

In other words, it is the algebra generated by $\{\color{blue}{e_1}, \cdots \color{blue}{e_d}\}$ and $[\cdot, \cdot]$.

What does that mean for the signature?

There exists as sequence of bracketing expressions such that

$$\mathbb{x}^{<\infty}_{0,t} = \exp( a_1 \color{blue}{e_1} + \dots + a_d \color{blue}{e_d} + a_{12}[\color{blue}{e_1}, \color{blue}{e_2}] + \dots a_{(d-1)d}[\color{blue}{e_{d-1}}, \color{blue}{e_d}] \\+ a_{(d-1)d}[\color{blue}{e_1},[\color{blue}{e_1}, \color{blue}{e_2}]] + \dots).$$

This raises two further questions:

• What basis can we choose on $\mathfrak{g}(\mathbb{R}^d)$?
• How to calculated the coefficients of the log-signature in that basis?

### 1.3 A Basis of the Free Lie Algebra¶

For simplicity let us assume that $d=2$ (the one-dimensional case yields no insight) and identify $\color{blue}{e_i} \leftrightarrow \color{blue}{i}$.

Let us look at the first bracketing terms:

1. On the 1st level we have no brackets
• $\color{blue}{1}$, $\color{blue}{2}$.

1. On the 2nd level we obtain

• $[\color{blue}{1}, \color{blue}{1}]$, $[\color{blue}{1}, \color{blue}{2}]$, $[\color{blue}{2}, \color{blue}{1}]$, $[\color{blue}{2}, \color{blue}{2}]$

By definition we have $[\color{blue}{1}, \color{blue}{1}] = [\color{blue}{2}, \color{blue}{2}] = 0$ and $[\color{blue}{1}, \color{blue}{2}] = - [\color{blue}{2}, \color{blue}{1}]$.

Hence we can single out one basis element $[\color{blue}{1}, \color{blue}{2}]$ for this level.

1. On the 3nd level we obtain

• $[\color{blue}{1}, [\color{blue}{1}, \color{blue}{2}]],\quad$  $[\color{blue}{2}, [\color{blue}{1}, \color{blue}{2}]],$

where we have already dismissed the difference between left- and right bracketing. No further simplifications are possible.

1. On the 4th level we obtain

• $[\color{blue}{1}, [\color{blue}{1}, [\color{blue}{1}, \color{blue}{2}]],\quad$ $[\color{blue}{1}, [\color{blue}{2}, [\color{blue}{1}, \color{blue}{2}]]$
• $[\color{blue}{2}, [\color{blue}{1}, [\color{blue}{1}, \color{blue}{2}]],\quad$ $[\color{blue}{2}, [\color{blue}{2}, [\color{blue}{1}, \color{blue}{2}]]$
• $[[\color{blue}{1}, \color{blue}{2}], [\color{blue}{1}, \color{blue}{2}]]$

The last term is clearly zero, however we have by the Jacobi identity we have $$0 = [[\color{blue}{1}, \color{blue}{2}], [\color{blue}{1}, \color{blue}{2}]] = - [\color{blue}{1}, [\color{blue}{2}, [\color{blue}{1}, \color{blue}{2}]]] - [\color{blue}{2}, [[\color{blue}{1}, \color{blue}{2}], \color{blue}{1}]] \\ \qquad\qquad\qquad\hspace{.4em} = - [\color{blue}{1}, [\color{blue}{2}, [\color{blue}{1}, \color{blue}{2}]]] + [\color{blue}{2}, [\color{blue}{1}, [\color{blue}{1}, \color{blue}{2}]]]$$ hence we only need one of the two terms of the above r.h.s.

Hence using the properties of the Lie bracket we arrived at the following basis for $\mathfrak{g}^{4}(\mathbb{R}^2)$

$$\color{blue}{1},\; \color{blue}{2}, \\ [\color{blue}{1}, \color{blue}{2}],\\ [\color{blue}{1}, [\color{blue}{1}, \color{blue}{2}]],\; [\color{blue}{2}, [\color{blue}{1}, \color{blue}{2}]],\\ [\color{blue}{1}, [\color{blue}{1}, [\color{blue}{1}, \color{blue}{2}]],\; [\color{blue}{1}, [\color{blue}{2}, [\color{blue}{1}, \color{blue}{2}]],\; [\color{blue}{2}, [\color{blue}{2}, [\color{blue}{1}, \color{blue}{2}]]$$

yielding $\dim(\mathfrak{g}^{4}(\mathbb{R}^2)) = 8$.

This already hints, why it is much more efficient to save the log-signature instead of the signature itself.

Indeed, the level 4 truncated signature has dimension

$$\dim(T^4(\mathbb{R}^2))-1 = 2 + 2^2 + 2^3 + 2^4 = 30.$$

How can we systemize the above considerations?

One way two construct a basis of the free Lie algebra is through so-called Lyndon words.

Unfortunately a detailed description of Lyndon words would go beyond the scope of this lecture.

However, we can look again at the $d=2$ case more systematically. The following are the first Lyondon words up to the level 4:

$$\color{blue}{1},\; \color{blue}{2},\; \color{blue}{1}\color{blue}{2},\; \color{blue}{1}\color{blue}{1}\color{blue}{2},\; \color{blue}{1}\color{blue}{2}\color{blue}{2},\; \color{blue}{1}\color{blue}{1} \color{blue}{1}\color{blue}{2},\; \color{blue}{1}\color{blue}{1}\color{blue}{2}\color{blue}{2},\; \color{blue}{1}\color{blue}{2}\color{blue}{2}\color{blue}{2}.$$

Why? Because each word is comes before all its proper suffixes in lexicografical order.

For $\color{blue}{1}$ and $\color{blue}{2}$ there are no proper suffixes.

• $\color{blue}{1}\color{blue}{2} < \color{blue}{2}$
• $\color{blue}{1}\color{blue}{1}\color{blue}{2} < \color{blue}{1}\color{blue}{2}, \; \color{blue}{2}$
• $\color{blue}{1}\color{blue}{2}\color{blue}{2} < \color{blue}{2}\color{blue}{2}, \; \color{blue}{2}$
• $\color{blue}{1}\color{blue}{1} \color{blue}{1}\color{blue}{2} < \color{blue}{1} \color{blue}{1}\color{blue}{2},\; \color{blue}{1}\color{blue}{2}, \; \color{blue}{2}$
• etc.

Every Lyndon word $w$ (longer then one) has at least one proper suffix that is also a Lyndon word. Denote by $Ly(w)$ the longest such suffix

• $Ly(\color{blue}{1}\color{blue}{2}) = \color{blue}{2}$
• $Ly(\color{blue}{1}\color{blue}{1}\color{blue}{2}) = \color{blue}{12}$
• $Ly(\color{blue}{1}\color{blue}{2}\color{blue}{2}) = \color{blue}{2}$
• $Ly(\color{blue}{1}\color{blue}{1} \color{blue}{1}\color{blue}{2}) = \color{blue}{1} \color{blue}{1}\color{blue}{2}$
• etc.

Further one can show that for a Lyndon word $w$ with $w = v Ly(w)$, the suffix $v$ is also a Lyndon word.

On can define a map $\sigma$ from Lyndon words to the Lie algebra by

• $\sigma(i) = i, \qquad i\in \{\color{blue}{1}, \dots, \color{blue}{d}\}$

• $\sigma(w) = [\;\sigma(v)\;,\; \sigma(Ly(w))\;], \qquad$ for $w = vLy(w).$

E.g. $\sigma(\color{blue}{1}\color{blue}{1}\color{blue}{2}\color{blue}{2}) = [\color{blue}{1}, \sigma(\color{blue}{1}\color{blue}{2}\color{blue}{2})] = [\color{blue}{1}, [\sigma(\color{blue}{1}\color{blue}{2}), \color{blue}{2}]] = [\color{blue}{1}, [[\color{blue}{1},\color{blue}{2}], \color{blue}{2}]]$.

One can show that the image of $\sigma$ forms a basis of the free Lie algebra.

In order to determine $\dim(\mathfrak{g}^{N}(\mathbb{R}^d))$ one can count the number of Lyndon words up to level $N$. Algebraically this is related to counting necklaces and one obtains

$$\dim(\mathfrak{g}^{N}(\mathbb{R}^d)) := \sum_{n=1}^N \frac{1}{n} \sum_{k | n}\mu(n/k) d^k,$$

where the inner sum is taken over all divisors $k$ of $n$ and $\mu$ is the Möbius function.

Here is a comparison of the $\eta_{d, N} := \dim(\mathfrak{g}^{N}(\mathbb{R}^d))$ with the dimension of the signature $\sigma_{d,N} := \dim(T^{N}(\mathbb{R}^d))$

### 1.4 The Signature of Piecewise Linear Paths¶

Suppose that we are given a discrete time series $\{x_0, x_2, \dots, x_J\}\subset\mathbb{R}^d$. Denote by $x:[0,J]\to \mathbb{R}^d$ the piecewise linear interpolation and also define $\Delta x_j = x_j - x_{j-1}$.

First note that, due to linearity, in each interval interval $[j-1,j]$ we have

$$\idotsint_{n-1 \le t_1 \le \dots \le t_n \le n} dx_{t_1} \otimes \dots \otimes dx_{t_n} = (\Delta x_j)^{\otimes n} \int_{\Delta_n} (dt)^n = \frac{1}{n!} (\Delta x_j)^{\otimes n}.$$

Hence the signature is given by

$$\mathbb{x}^{<\infty}_{j-1, j} = 1 + \sum_{n=1}^\infty \frac{1}{n!} (\Delta x_j)^{\otimes n} = \exp(\Delta x_j).$$

Then using Chen's identity, we have

$$\mathbb{x}^{<\infty}_{0, j} = \exp(\Delta x_1) \otimes \cdots \otimes \exp(\Delta x_j).$$

Hence in order to calculate the log-signature of a piecewise linear path we need to understand how to multiply two non-commutative exponentials.

#### Theorem (Baker-Campbell-Hausdorff + Dynkin formula)¶

Let $\mathbf{a}, \mathbf{b} \in \mathfrak{g}(\mathbb{R}^d)$ then it holds

$$\log(\exp(\mathbf{a})\otimes\exp(\mathbf{b})) = \sum_{n=1}^{\infty} \sum_{k=1}^{n} \sum_{s_i + r_i > 0\\s_1+ \cdots+ r_k = n} \frac{(-1)^{n-1} [\mathbf{a}^{s_1} \mathbf{b}^{r_1} \cdots \mathbf{a}^{s_k} \mathbf{b}^{r_k}] }{k \cdot n \cdot s_1!r_1!\cdots s_k!r_k!}$$

where $[\mathbf{a}^{r}\mathbf{c}] = [\mathbf{a}, [\mathbf{a}, \cdots (r \text{ times}) \cdots [\mathbf{a}, \mathbf{c}]\dots]]$.

• The formula gives an explicit expression in terms of brackets.
• However, it does not express the result in terms of a Lie basis.
• It is possible two algorithmically calculated the BCH-formula directly in the Lyndon basis (see Casas and Murua 2009). The description of this method is, however, beyond the scope of this lecture.

Up to the first three levels the BCH-formula in a Lie basis reads

$$\log(\exp(\mathbf{a})\otimes\exp(\mathbf{b})) = \mathbf{a} + \mathbf{b} +\frac{1}{2}[\mathbf{a}, \mathbf{b}] + \frac{1}{12}([\mathbf{a}, [\mathbf{a}, \mathbf{b}]] + [[\mathbf{b}, \mathbf{a}], \mathbf{b}]) \, + \dots$$

Returning to the log-signature of piecewise linear paths...

Again we consider the case $d=2$ for simplicity and we truncate the log-signature at level 3.

Let us define coefficients in terms of the Lie basis

$$\log\mathbb{x}^{\le 3}_{0,t} =: \ell^{\le3}_t =: \ell^1_t\color{blue}{1} + \ell^2_t(t)\color{blue}{2} + \ell^{12}_t[\color{blue}{1},\color{blue}{2}] + \ell^{112}_t[\color{blue}{1},[\color{blue}{1},\color{blue}{2}]] + \ell^{122}_t[[\color{blue}{1},\color{blue}{2}], \color{blue}{2}]$$

By previous considerations we can calculate the log signature iteratively using the BCH-formula. Indeed

$$\ell^{\le3}_j = \log(\exp(\ell^{\le3}_{j-1})\otimes \exp(\Delta x_j)) \hspace{19em} \\= \ell^{\le3}_{j-1} + \Delta x_j + \frac{1}{2}[\ell^{\le3}_{j-1}, \Delta x_j] + \frac{1}{12}([\ell^{\le3}_{j-1}, [\ell^{\le3}_{j-1}, \Delta x_j]] + [[\Delta x_j, \ell^{\le3}_{j-1}], \Delta x_j]) \, + \dots \\ = \ell^{\le3}_{j-1} + \Delta x_j + \frac{1}{2}[\ell^{\le2}_{j-1}, \Delta x_j] + \frac{1}{12}([\ell^{\le1}_{j-1}, [\ell^{\le1}_{j-1}, \Delta x_j]] + [[\Delta x_j, \ell^{\le1}_{j-1}], \Delta x_j]) \, + \dots$$

Comparing coefficients we obtain

• $\ell^{1}_j = \ell^{1}_{j-1} + \Delta x_j^1$
• $\ell^{2}_j = \ell^{2}_{j-1} + \Delta x_j^2$
• $\ell^{12}_j = \ell^{12}_{j-1} + \frac{1}{2}(\ell^{1}_{j-1}\Delta x_j^2 - \ell^{2}_{j-1}\Delta x_j^1)$
• $\ell^{112}_j = \ell^{112}_{j-1} - \frac{1}{2} \ell^{12}_{j-1}\Delta x_j^1$ $\qquad\quad + \frac{1}{12}(\ell^{1}_{j-1}\ell^{1}_{j-1}\Delta x_j^2 - \ell^{1}_{j-1}\ell^{2}_{j-1}\Delta x_j^1 - \Delta x_j^1\ell^{2}_{j-1}\Delta x_j^1 + \Delta x_j^2\ell^{1}_{j-1} \Delta x_j^1)$ $\qquad = \ell^{112}_{j-1} - \frac{1}{2} \Delta x_j^1 \ell^{12}_{j-1} + \frac{1}{12}(\ell^{1}_{j-1}-\Delta x_j^1)(\ell^{1}_{j-1}\Delta x_j^2 - \ell^{2}_{j-1}\Delta x_j^1)$
• $\ell^{122}_j = \ell^{122}_{j-1} + \frac{1}{2} \ell^{12}_{j-1}\Delta x_j^2$ $\qquad\quad + \frac{1}{12}(-\ell^{2}_{j-1}\ell^{1}_{j-1}\Delta x_j^2 + \ell^{2}_{j-1}\ell^{2}_{j-1}\Delta x_j^1 + \Delta x_j^1\ell^{2}_{j-1}\Delta x_j^2 - \Delta x_j^2\ell^{1}_{j-1} \Delta x_j^2)$ $\qquad = \ell^{112}_{j-1} - \frac{1}{2} \ell^{12}_{j-1}\Delta x_j^2 - \frac{1}{12}(\ell^{2}_{j-1}-\Delta x_j^2)(\ell^{1}_{j-1}\Delta x_j^2 - \ell^{2}_{j-1}\Delta x_j^1)$

Here is an implmentation of this computation:

Let us do a sanity check:

For the general case, i.e., higher dimension and truncation level one can pre-calculate the BCH-formula in Lyndon basis. This is done by the iisignature package. Let us wrap our function from above:

## 2. Implementing Deep Signature Stopping Rules¶

Recall that we have considered randomized stopping rules of the form

$$\tau_\theta = \inf\left\{ t \in [0,T] \;\bigg\vert\; \int_0^t \theta(\mathbb{X}\vert_{[0,s]})^2 ds \;\ge\; Z\right\},$$

where

• $\hat{\mathbb{X}}$ is an underlying (time augmented) rough path generating the filtration,

• $\theta$ a continuous map from the path space to $\mathbb{R}$,

• $Z$ is a positive random variable independent of everything else.

We have studied the case where $\theta(\mathbb{X}\vert_{[0,t]}) = \langle \ell, \mathbb{X}_{0,t} \rangle$ a linear functional of the signature.

In the following we will, however, focus on the case where $\theta$ is a deep neural network.

More specifically we define a deep signature stopping rule by

$$\theta(\mathbb{X}\vert_{[0,t]}) = (A_I \circ \varphi \circ A_{I-1} \circ \varphi \cdots \circ A_0)(\log \mathbb{X}^{\le N}_{0,t}),$$

where

• $A_0: \mathbb{R}^{\eta_{N,d}} \to \mathbb{R}^{q}$,
• $A_1, \dots, A_{I-1}: \mathbb{R}^{q} \to \mathbb{R}^{q}$,
• and $A_I: \mathbb{R}^{q} \to \mathbb{R}$ are affine maps,
• $q\in\mathbb{N}$ is the number of neurons and $I\in\mathbb{N}$ is the depth of the network,
• and $N\in\mathbb{N}$ is the log-signature truncation level.

Denote by $\mathcal{T}_{\log}$ the space of deep signature stopping rules and recall that we had also the following approximation result

#### Theorem (Bayer-H-Riedel-Schoenmakers)¶

Let Y be adapted to the filtration generated by $\mathbb{X}$ with $\mathbb{E}[\Vert Y \Vert_\infty] < \infty$ then it holds

$$\sup_{\tau \in \mathcal{S}}\mathbb{E}[Y_\tau] \;=\; \sup_{\theta \in \mathcal{T}_{\log}}\mathbb{E}[Y_{\tau_\theta}],$$

where $\mathcal{S}$ is the set of all stopping times.

One of the important steps in the proof was that due to randomization the expected stopped value is of the form

$$\mathbb{E}[Y_{\tau_\theta}] = Y_0 + \mathbb{E}\left[ \int_0^T \exp\left(-\int_0^t \theta(\mathbb{X}\vert_{[0,s]})^2 ds \right)dY_t\right],$$

for the case that we have chosen $Z\sim\mathrm{Exp}(1)$.

This expression immediately suggest a numerical approach to optimization

1. Sample $\mathbb{X}$ and $Y$ on a time-grid $\{t_0, \cdots, t_n\}$ say $m$ times.
1. Calculate the log-signature for discretized sample path of $\hat{\mathbb{X}}$ for some truncation level $N$.
1. In the above expression, replace integrals and expectations by discrete sums and optimze over $\theta \in \mathcal{T}_{\log}$ using backpropagation and stochastic gradient decent.
1. Resimulate $\mathbb{X}$, $Y$ and $\log(\hat{\mathbb{X}}^{\le N})$ and use the result of the optimization $\theta^{\ast} \in \mathcal{T}_{\log}$ to calculate a lower bound.

Note that the number of parameters of the neural network depends on $q, I, N$ and $d$.

It does not depend on $n$, the of the discretization grid!

This is in strong contrast to approaches where the lifting the path to a $n$-dimensional Markov process

$$(X_{t_0}, 0, \dots, 0),\; (X_{t_1}, X_{t_0}, 0, \dots, 0),\; (X_{t_2}, X_{t_1}, X_{t_0}, 0, \dots, 0),\; \dots \in \mathbb{R}^{n\times d}.$$

## 3. Optimal Stopping of Fractional Brownian Motion¶

Finally, we turn to a specific example.

Let $X^H$ be a fractional Brownian motion (fBm) with Hurst parameter $H\in(0,1]$.

Note that the time augmented path $\hat{X} = ((t, X^H_t))_{0 \le t \le T}$ is a rough path. (Indeed, no lift is needed since all iterated (cross) integrals are pathwise well-defined as Young integrals.)

We are interested in the optimal stopping of the fBm, hence $Y=X^{H}$, i.e., in approximating

$$\sup_{\tau\in\mathcal{S}}\mathbb{E}[X^H_\tau]$$

First we will need to generate sample paths of fBm.

As explained on the exercise sheet, this can efficiently be done using the circulant embedding method and the fast Fourier transform.