Simulating correlated random variables in Python

Oscar Nieves
6 min read · Jul 30, 2021
Source: https://www.nature.com/articles/nmeth.3587

In my previous Medium story (https://oscarnieves100.medium.com/simulating-normal-random-numbers-in-python-18a2a21a1329) I discussed how to simulate normal random numbers with a specific mean and variance by using something called the Box-Muller method. The idea was to take a set of independently sampled uniform random numbers and convert them into normal random numbers through a transformation involving polar coordinates, giving us two uncorrelated normal variables X and Y.
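As a quick refresher, here is a minimal sketch of the Box-Muller idea described above. It is not the exact code from the previous story: the function name box_muller, its parameters and the fixed seed are assumptions chosen for illustration.

```python
import numpy as np

def box_muller(n, mean=0.0, std=1.0, seed=0):
    """Return two arrays of n normal samples built from uniform samples.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    rng = np.random.default_rng(seed)
    # rng.random(n) lies in [0, 1); using 1 - u keeps the argument of log away from 0
    u1 = 1.0 - rng.random(n)
    u2 = rng.random(n)
    # Polar coordinates: a random radius and a random angle
    r = np.sqrt(-2.0 * np.log(u1))
    theta = 2.0 * np.pi * u2
    # Project onto the two axes, then shift and scale to the requested mean and std
    x = mean + std * r * np.cos(theta)
    y = mean + std * r * np.sin(theta)
    return x, y

x, y = box_muller(100_000)
print("mean of X:", x.mean())
print("std of X:", x.std())
print("sample correlation of X and Y:", np.corrcoef(x, y)[0, 1])
```

With this many samples, the printed mean and correlation should both sit close to 0 and the standard deviation close to 1, which is what we expect from two uncorrelated standard normal variables.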

This is all good and fun, but when we sample numbers independently like this in a computer program, the variables we get are always “uncorrelated”. Correlation is a measure of how well a variable Y is described by a linear function of a variable X, or basically how “closely related” a change in Y is to a change in X. We generally measure correlation through a coefficient ρ which has a value between -1 and +1, with -1 indicating complete anti-correlation, +1 indicating complete correlation, and 0 indicating no linear relationship at all. To understand how ρ describes a dataset, we can look at the following diagrams:

Different datasets and their correlation coefficients. Source: https://en.wikipedia.org/wiki/Correlation
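To make ρ a bit more concrete, here is a small sketch that computes the sample (Pearson) correlation coefficient for three cases: two independently sampled variables, an exact increasing linear relation, and an exact decreasing linear relation. The helper name pearson, the seed and the particular slopes are assumptions made for illustration; NumPy's built-in np.corrcoef is used as a cross-check.

```python
import numpy as np

def pearson(a, b):
    """Sample Pearson correlation coefficient between two 1-D arrays."""
    a_c = a - a.mean()  # centre each variable on its mean
    b_c = b - b.mean()
    return np.sum(a_c * b_c) / np.sqrt(np.sum(a_c**2) * np.sum(b_c**2))

rng = np.random.default_rng(42)
n = 100_000

x = rng.standard_normal(n)
y_indep = rng.standard_normal(n)  # sampled independently of x
y_pos = 2.0 * x + 1.0             # exact linear relation, positive slope
y_neg = -0.5 * x + 3.0            # exact linear relation, negative slope

rho_indep = pearson(x, y_indep)
rho_pos = pearson(x, y_pos)
rho_neg = pearson(x, y_neg)

print("independent samples:", rho_indep)
print("positive slope:", rho_pos)
print("negative slope:", rho_neg)
print("np.corrcoef cross-check:", np.corrcoef(x, y_indep)[0, 1])
```

The independent pair should give a value near 0, while the two exact linear relations give +1 and -1 (up to floating-point rounding) regardless of the size of the slope or the offset. In other words, ρ tells us how tightly the points follow a line, not how steep that line is.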

Basically when ρ is close to -1 or 1, it means that X and Y are almost linearly related to one another, and so we could in theory predict changes in X by…
