Sampling random numbers from any probability distribution you want
A guide using simple Python code.
In my previous story (https://oscarnieves100.medium.com/simulating-correlated-random-variables-in-python-c3947f2dbb10) I discussed how to simulate correlated random numbers from a normal probability distribution. We focused on a normal distribution because it is well known, and standard methods of sampling exist.
But what if you wanted to create your own probability distribution? What if I told you that you don’t need to be constrained by any of the well-known probability distributions out there? Where there is a will, there is a way, and what I am about to show you is very simple yet very powerful.
First, let us establish some rules for creating a probability distribution p(x):
- p(x) ≥ 0, that is: p(x) must be positive or zero for all x values, where x is a real number. p(x) must never be negative regardless of the circumstances (negative probabilities make no sense).
- The integral of p(x) over all possible x must be equal to one. If p(x) is discrete, then by definition the sum of all p(x) for all x must be 1. We define this mathematically as Σp(x) = 1. In general, we can ensure that a probability distribution always obeys this property as follows: first, we create an “un-normalised” distribution g(x) according to some rules of our choice (e.g. g(x) = x² or whatever we want, as long as Rule #1 is satisfied). Then, we normalise it by dividing it by the sum of all its values, namely: p(x) = g(x)/Σg(x). By definition, this ensures that Σp(x) = Σg(x)/Σg(x) = 1 always.
Now that we got that out of the way, let’s play around with a simple distribution: g(x) = (x-x0)² where x0 is some value we define which shifts the parabola to the right. We will assume that g(x) exists on an interval [a, b] and is zero outside these bounds.
The general idea is this: first, we compute an array X which extends from a to b, ideally with uniform spacing. Then, we compute p(X) = g(X)/Σg(X). We shall call this array P . Then, we compute the cumulative sum of P, namely C. This means that at each value in array X, C becomes larger because we are adding the previous value in the cumulative sum. By definition, the maximum value of C = 1 because ΣP = 1, so C…