How to Generate Correlated Random Numbers

From Open Risk Manual

Generation of correlated random numbers is of wide applicability in many domains of quantitative analysis and risk modelling. This article is a review of approaches.

Precise Problem Definition

The more precisely defined question is how to generate realisations of a random vector \mathbf X that follows a specified multivariate probability distribution:

F_{X_1,\ldots,X_N}(x_1,\ldots,x_N) = \operatorname{P}(X_1 \leq x_1,\ldots,X_N \leq x_N)

Special Cases

The most commonly encountered special case is that of the multivariate normal distribution, given by the density:


f_{\mathbf X}(x_1,\ldots,x_N) = \frac{\exp\left(-\frac 1 2 ({\mathbf x}-{\boldsymbol\mu})^\mathrm{T}{\boldsymbol\Sigma}^{-1}({\mathbf x}-{\boldsymbol\mu})\right)}{\sqrt{(2\pi)^N|\boldsymbol\Sigma|}}

For the purposes of this article we will ignore non-zero means and non-unit (unscaled) variances, as those aspects can be handled on a univariate basis. We will focus on the Correlation Matrix.

In general the methodologies involve generating realisations of the random vector \mathbf X on the basis of a random vector \mathbf Z of uncorrelated standard normal variables, which in turn is (typically) produced from a random vector \mathbf U of uncorrelated uniform variables.

Cholesky Decomposition

Given a variance-covariance matrix \Sigma that is positive definite, the Cholesky decomposition is

\Sigma = L L^T

Upon simulation of random vectors \mathbf Z the correlated realisations are provided by:

\mathbf X = L \mathbf Z

where L is the lower triangular matrix that is effectively the "square root" of the correlation matrix.
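As a minimal sketch using NumPy (the correlation matrix values below are hypothetical), the two steps are the Cholesky factorisation and the matrix-vector product:

```python
import numpy as np

# Assumed example: a 3x3 correlation matrix (hypothetical values)
rho = np.array([[1.0, 0.6, 0.3],
                [0.6, 1.0, 0.5],
                [0.3, 0.5, 1.0]])

L = np.linalg.cholesky(rho)            # lower triangular factor, rho = L @ L.T

rng = np.random.default_rng(42)
Z = rng.standard_normal((3, 100_000))  # uncorrelated standard normals
X = L @ Z                              # correlated realisations

# The sample correlation of X should be close to rho
print(np.corrcoef(X))
```

Note that each column of Z is one independent draw of the vector \mathbf Z, so the whole simulation is a single matrix multiplication.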

Singular Value Decomposition

When the correlation matrix is estimated empirically it may fail to be positive definite (or even positive semi-definite), in which case the Cholesky decomposition fails. One option is to adjust the correlation matrix. Another option is to pursue a singular value decomposition:

\Sigma = U D V^T , where U, V are orthogonal matrices and D is diagonal (with possibly reduced rank, i.e. some eigenvalues are zero). For a symmetric positive semi-definite \Sigma this coincides with the eigendecomposition, so U = V on the non-degenerate subspace.

Then

\mathbf X = U \sqrt{D} \mathbf Z
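A sketch of the same idea in NumPy, using a hypothetical rank-deficient correlation matrix (one zero eigenvalue) for which Cholesky would typically raise an error:

```python
import numpy as np

# Assumed example: a positive semi-definite but rank-deficient
# correlation matrix (rows 1 and 2 are perfectly correlated),
# for which np.linalg.cholesky would typically fail
rho = np.array([[1.0, 1.0, 0.5],
                [1.0, 1.0, 0.5],
                [0.5, 0.5, 1.0]])

U, d, Vt = np.linalg.svd(rho)        # rho = U @ diag(d) @ Vt
A = U @ np.diag(np.sqrt(d))          # "square root" factor, rho = A @ A.T

rng = np.random.default_rng(0)
Z = rng.standard_normal((3, 200_000))
X = A @ Z                            # correlated realisations

print(np.corrcoef(X))                # close to rho
```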

Prespecified Linear Models

It is not always the case that our input data or modelling framework is based on a covariance matrix. If the dependency between the variables is explicitly specified in terms of a multi-factor model where the factors F_m are uncorrelated, then the problem is reduced to generating independent random numbers and subsequently constructing suitable weighted sums:


   X_{i}  = b_{i1} F_1 + \ldots + b_{im}  F_m + e_i

The simplest example is the one-factor case

X_{i} = \rho Z + \sqrt{1 - \rho^2} \, \epsilon_{i}
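A minimal sketch of the one-factor construction, with an assumed loading \rho = 0.4 and hypothetical dimensions (note that with this parameterisation the pairwise correlation between any two X_i is \rho^2):

```python
import numpy as np

rng = np.random.default_rng(1)
rho = 0.4                  # assumed common-factor loading (hypothetical)
n_vars, n_sims = 5, 100_000

Z = rng.standard_normal(n_sims)                   # common factor
eps = rng.standard_normal((n_vars, n_sims))       # idiosyncratic terms
X = rho * Z + np.sqrt(1 - rho**2) * eps           # pairwise corr = rho**2
```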

Box–Muller transform

The Box–Muller transform is an interesting special case, primarily of interest for educational purposes. In the first step, uncorrelated standard normal numbers are generated from independent uniform variables U_1, U_2 via


\begin{align}
Z_0 &= R \cos(\Theta) = \sqrt{-2 \ln U_1} \cos(2 \pi U_2), \\
Z_1 &= R \sin(\Theta) = \sqrt{-2 \ln U_1} \sin(2 \pi U_2).
\end{align}

These can then be combined as per the formulas of the previous section on linear models.
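A direct transcription of the transform, as a sketch (the uniform draws are shifted to the half-open interval (0, 1] to avoid log(0)):

```python
import numpy as np

def box_muller(u1, u2):
    """Map two independent U(0,1) arrays to two independent N(0,1) arrays."""
    r = np.sqrt(-2.0 * np.log(u1))
    z0 = r * np.cos(2.0 * np.pi * u2)
    z1 = r * np.sin(2.0 * np.pi * u2)
    return z0, z1

rng = np.random.default_rng(7)
u1 = 1.0 - rng.random(100_000)   # in (0, 1], so log(u1) is finite
u2 = rng.random(100_000)
z0, z1 = box_muller(u1, u2)

print(z0.mean(), z0.std())       # approximately 0 and 1
```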

Copulas

In various applications the multivariate dependency cannot be assumed to be Gaussian. In this case we require a specified copula, which provides for more general dependency structures.

The copula of (X_1,X_2,\dots,X_d) is defined as the joint cumulative distribution function of (U_1,U_2,\dots,U_d), where U_i = F_i(X_i) are the probability-integral transforms of the marginals, via

C(u_1,u_2,\dots,u_d)=\mathrm{Pr}[U_1\leq u_1,U_2\leq u_2,\dots,U_d\leq u_d] .
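For the Gaussian copula this definition translates directly into a sampling recipe: generate correlated normals (e.g. via Cholesky), map them to uniforms with the normal CDF, and then apply the inverse CDFs of whatever marginals are desired. A sketch with hypothetical marginals (exponential and lognormal) and an assumed copula correlation of 0.7, using SciPy:

```python
import numpy as np
from scipy import stats

# Assumed 2x2 correlation matrix for the Gaussian copula (hypothetical)
rho = np.array([[1.0, 0.7],
                [0.7, 1.0]])
L = np.linalg.cholesky(rho)

rng = np.random.default_rng(3)
Z = L @ rng.standard_normal((2, 100_000))   # correlated standard normals
U = stats.norm.cdf(Z)                       # uniforms with a Gaussian copula

# Apply arbitrary marginals via their inverse CDFs (hypothetical choices)
X1 = stats.expon.ppf(U[0])                  # exponential marginal
X2 = stats.lognorm.ppf(U[1], s=0.5)         # lognormal marginal
```

The rank correlation of (X1, X2) is inherited from the copula regardless of the marginals chosen.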

The Student-t Copula

This is a special copula closely linked to the Gaussian copula: it is derived from the multivariate Student-t distribution, converges to the Gaussian copula as the degrees of freedom increase, and (unlike the Gaussian) exhibits tail dependence.
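Sampling follows the standard normal variance-mixture construction of the multivariate Student-t: scale correlated normals by an independent chi-square variable, then map to uniforms with the univariate t CDF. A sketch with assumed parameters (correlation 0.6, four degrees of freedom):

```python
import numpy as np
from scipy import stats

# Assumed parameters (hypothetical)
rho = np.array([[1.0, 0.6],
                [0.6, 1.0]])
nu = 4                                       # degrees of freedom
L = np.linalg.cholesky(rho)

rng = np.random.default_rng(5)
n = 100_000
Z = L @ rng.standard_normal((2, n))          # correlated standard normals
W = rng.chisquare(nu, n) / nu                # chi-square mixing variable
T = Z / np.sqrt(W)                           # multivariate Student-t draws
U = stats.t.cdf(T, df=nu)                    # uniforms with a t copula
```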

Archimedean Copulas

  • Clayton
  • Gumbel
  • Frank
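Archimedean copulas admit a generic sampling scheme (the Marshall-Olkin algorithm) when the generator is the Laplace transform of a positive random variable. A sketch for the Clayton case, with an assumed parameter theta = 2:

```python
import numpy as np

def clayton_sample(theta, d, n, rng):
    """Marshall-Olkin sampling of the Clayton copula (theta > 0).

    The Clayton generator (1 + t)**(-1/theta) is the Laplace transform
    of a Gamma(1/theta, 1) variable, which yields the mixing recipe below.
    """
    w = rng.gamma(1.0 / theta, 1.0, size=n)        # gamma mixing variable
    e = rng.exponential(size=(d, n))               # independent Exp(1) draws
    return (1.0 + e / w) ** (-1.0 / theta)         # d x n uniform marginals

rng = np.random.default_rng(11)
U = clayton_sample(theta=2.0, d=2, n=100_000, rng=rng)  # assumed theta
```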

Empirical Copula

Applications


Implementation

A list of open source implementations:

Library / Package | Language | Characteristics
XXX               | Python   | TD
XXX               | R        | TD
XXX               | C++      | TD
XXX               | Julia    | TD