Simulating correlated random variables in Python

Oscar Nieves
6 min read · Jul 30, 2021
Source: https://www.nature.com/articles/nmeth.3587

In my previous Medium story (https://oscarnieves100.medium.com/simulating-normal-random-numbers-in-python-18a2a21a1329) I discussed how to simulate normal random numbers with specific mean and variance properties by using something called the Box-Muller method. The idea was to take a set of independently sampled uniform random numbers, and convert them into normal random numbers by using a transformation involving polar coordinates, giving us two uncorrelated normal variables X and Y.
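As a quick refresher, here is a minimal sketch of the Box-Muller idea using NumPy. The function name `box_muller` and the seed are my own choices for illustration and are not necessarily what the previous story used; the transform itself is the standard one, where a radius is built from one uniform sample and an angle from the other.

```python
import numpy as np

def box_muller(n, seed=None):
    """Return two arrays of n standard normal samples via the Box-Muller transform."""
    rng = np.random.default_rng(seed)
    # rng.random() samples from [0, 1), so 1 - U lies in (0, 1] and log() never sees 0
    u1 = 1.0 - rng.random(n)
    u2 = rng.random(n)
    r = np.sqrt(-2.0 * np.log(u1))   # radius in polar coordinates
    theta = 2.0 * np.pi * u2         # angle in polar coordinates
    return r * np.cos(theta), r * np.sin(theta)

# Illustrative usage: both outputs should have mean near 0 and variance near 1
x, y = box_muller(100_000, seed=0)
print(x.mean(), x.var(), y.mean(), y.var())
```

With a large sample, the printed means should sit close to 0 and the variances close to 1, and X and Y should show almost no correlation with each other.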

This is all good and fun. However, when we sample numbers independently like this in a computer program, we always get “uncorrelated” variables. Correlation is a measure of how well a variable Y is described (linearly) by a variable X, or basically how “closely related” a change in Y is to a change in X. We generally measure correlation through a coefficient ρ which has a value between -1 and +1, with -1 indicating complete anti-correlation and +1 indicating complete correlation. To understand how ρ describes a dataset, we can look at the following diagrams:

Different datasets and their correlation coefficients. Source: https://en.wikipedia.org/wiki/Correlation

Basically, when ρ is close to -1 or 1, it means that X and Y are almost linearly related to one another, and so we could in theory predict changes in X from changes in Y quite easily. When ρ is close to zero, it means that X and Y are weakly correlated (or entirely uncorrelated), and so no meaningful linear relation between the two can be established. There are of course more subtleties to this, and the correlation coefficient doesn’t really tell you the full story. Nevertheless, what we would like to do is use this to generate random variables that are correlated via some predetermined coefficient ρ.
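To make these cases concrete, here is a small sketch (my own illustration, with made-up slopes and intercepts) that computes ρ for an exactly linear relation, an exactly anti-linear relation, and a relation where Y is fully determined by X but not linearly. The last case is one of the subtleties mentioned above: ρ comes out near zero even though Y clearly depends on X.

```python
import numpy as np

rng = np.random.default_rng(1)          # arbitrary seed, chosen for reproducibility
x = rng.standard_normal(100_000)

y_linear = 3.0 * x + 2.0                # exact increasing linear relation
y_anti = -0.5 * x + 1.0                 # exact decreasing linear relation
y_square = x**2                         # depends on x, but not linearly

def rho(a, b):
    """Sample correlation coefficient between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]

print(rho(x, y_linear))                 # essentially +1
print(rho(x, y_anti))                   # essentially -1
print(rho(x, y_square))                 # close to 0 despite the dependence
```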

We will do this by using two standard normal random variables S1 and S2 (each with mean 0 and variance 1). Because we are sampling them independently of one another, they are by default uncorrelated. We will test this out in Python by calculating the correlation coefficient. Remember, we define correlation as follows:

$$\rho = \frac{\mathrm{Cov}[X, Y]}{\sigma_X \sigma_Y}$$

Here, Cov[X, Y] is the covariance between X and Y, σ is the standard deviation (square root of the variance) and…
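As a rough check of the claim that independently sampled normals are uncorrelated, the sketch below draws S1 and S2 with NumPy and computes ρ as the covariance divided by the product of the two standard deviations. The sample size, seed, and the helper name `correlation` are my own assumptions for illustration; NumPy's built-in `np.corrcoef` is printed alongside as a cross-check.

```python
import numpy as np

rng = np.random.default_rng(42)         # arbitrary seed, chosen for reproducibility
n = 100_000                             # illustrative sample size
s1 = rng.standard_normal(n)             # S1 ~ N(0, 1)
s2 = rng.standard_normal(n)             # S2 ~ N(0, 1), drawn independently of S1

def correlation(x, y):
    """Sample correlation: covariance over the product of standard deviations."""
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / (x.std() * y.std())

rho = correlation(s1, s2)
print(f"rho from the definition: {rho:.4f}")
print(f"rho from np.corrcoef:    {np.corrcoef(s1, s2)[0, 1]:.4f}")
```

Both numbers should agree and sit very close to zero. They will not be exactly zero, because a finite sample always carries some noise.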

