A Layman's Introduction to MCMC
Updated: Jan 26
WHAT is MCMC?
MCMC, short for Markov Chain Monte Carlo, is a class of algorithms used to approximate distributions that we cannot calculate directly.
WHY do we use MCMC?
We use MCMC because we want to sample from a distribution we cannot observe directly. For instance, suppose you want to know what percentage of data scientists use linear regression on a daily basis in their jobs, but you don't know where all the data scientists are, or whether their answers would be truthful.
To put it more precisely: we want to approximate a posterior distribution that we cannot measure directly, so we draw random samples in the probability space (Monte Carlo) and navigate between those samples according to Markov chain principles. In other words, we combine a Monte Carlo method (approximation by random sampling) with a Markov chain (a memoryless random walk over those samples) to get our posterior estimate.
WHEN do we use MCMC?
We use MCMC when the distribution we want to approximate is high-dimensional, unknown, or otherwise intractable.
The process came around the same time as the Manhattan project.
HOW do we use MCMC?
MCMC has two parts that make up the whole: Monte Carlo approximation and Markov chains.
Monte Carlo Approximation
The Monte Carlo in MCMC refers to Monte Carlo approximation. Given a large population of i.i.d. (independent and identically distributed) variables, you can estimate the population mean reasonably accurately by picking random samples from the population and calculating the mean of those samples.
By the law of large numbers, the sample mean converges to the true mean as the number of samples N grows.
Let's assume we want to know the average height of the 100 people in a class. Under Monte Carlo approximation, if we start selecting people at random, the average height of the selected individuals gets closer to the actual class average as we increase the number of people selected.
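To make this concrete, here is a minimal Python sketch of the class-height example; the heights are simulated, so the specific numbers are illustrative:

```python
import random

# Simulated class of 100 people with heights in cm (illustrative data).
random.seed(0)
heights = [random.gauss(170, 10) for _ in range(100)]
true_mean = sum(heights) / len(heights)

# Monte Carlo approximation: average over progressively more random picks.
# The estimate tends toward the true class average as n grows.
for n in [5, 25, 500]:
    samples = [random.choice(heights) for _ in range(n)]
    estimate = sum(samples) / n
    print(f"n={n:4d}  estimate={estimate:6.1f}  true={true_mean:6.1f}")
```

With only 5 picks the estimate can be noticeably off; by 500 picks it sits close to the true average.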
Markov Chains
The Markov property simply means that given the present state (and the action taken from it), you can move to a future state without any knowledge of the prior states. In probabilistic terms, the Markov property can be written as follows:

P(X(n+1) | X(n), X(n-1), ..., X(1)) = P(X(n+1) | X(n))
A Markov chain is a sequence of random variables that obeys the Markov property. In probabilistic terms, this means the current value depends only on the value immediately before it in the chain.
Let's consider a random walk along the x-axis that moves 0.1 in either direction at each step. The next state depends only on where you are on the x-axis right now and whether you step forward or backward by 0.1. All earlier states are irrelevant.
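The random walk above can be sketched in a few lines of Python; the step size of 0.1 and the number of steps are just the values from the example:

```python
import random

random.seed(1)
x = 0.0        # current state on the x-axis
path = [x]

# Markov property in action: each new position depends only on the
# current position and a fresh coin flip, never on earlier history.
for _ in range(1000):
    step = 0.1 if random.random() < 0.5 else -0.1
    x = round(x + step, 1)   # round to keep positions on the 0.1 grid
    path.append(x)
```

Every entry in `path` differs from its neighbour by exactly 0.1, and at no point did we need to remember how we arrived at the current position.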
MCMC (Markov Chain Monte Carlo)
There are many Markov Chain Monte Carlo algorithms, which mostly differ in how they construct the Markov chain used to draw each Monte Carlo sample.
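For instance, the random-walk Metropolis algorithm builds the chain by proposing a random step and accepting or rejecting it based on the target density. A minimal sketch, assuming a standard normal target known only up to a constant and a unit-width proposal (both choices are assumptions for illustration):

```python
import math
import random

random.seed(3)

def unnorm_p(x):
    # Target density, known only up to a normalizing constant
    # (here: an unnormalized standard normal, chosen for the example).
    return math.exp(-0.5 * x * x)

x = 5.0            # deliberately poor starting point
chain = []
for _ in range(10000):
    proposal = x + random.gauss(0, 1)          # symmetric random-walk proposal
    # Accept with probability min(1, p(proposal) / p(current)).
    if random.random() < unnorm_p(proposal) / unnorm_p(x):
        x = proposal
    chain.append(x)

# Drop early samples while the chain wanders in from x = 5.
post = chain[1000:]
```

Note that only the ratio of densities is ever needed, which is why the normalizing constant can be unknown.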
One of the most commonly known MCMC algorithms is Gibbs sampling. A version of pseudocode for a specific case is given below.
A version of Gibbs sampling code implementation is at
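As a concrete illustration, here is a minimal Gibbs sampler for one specific case: a bivariate normal with correlation rho = 0.8 (an illustrative choice). Each coordinate is redrawn in turn from its conditional distribution given the other:

```python
import random

random.seed(2)
rho = 0.8                      # illustrative correlation
sd = (1 - rho ** 2) ** 0.5     # conditional standard deviation

x, y = 0.0, 0.0                # arbitrary starting point
samples = []
for _ in range(5000):
    # Alternately draw each variable from its conditional distribution:
    # x | y ~ N(rho * y, 1 - rho^2), and symmetrically for y | x.
    x = random.gauss(rho * y, sd)
    y = random.gauss(rho * x, sd)
    samples.append((x, y))

# Discard a burn-in of 500 draws before using the chain.
kept = samples[500:]
```

The retained pairs behave like draws from the joint distribution: their sample correlation sits near 0.8, even though we only ever sampled one coordinate at a time.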
Following is a link to an interesting visualization for various MCMC algorithms:
1. MCMC algorithms are sensitive to their starting point and often require a burn-in phase; once burn-in is over, we discard the earlier samples. The goal is to reach what we call a stationary distribution before collecting useful samples.
2. It's not always easy to know whether a chain has converged. The process often requires a large number of iterations and can be very slow.
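One crude, informal way to get a sense of convergence is to compare summary statistics across segments of the chain after burn-in: segments that disagree markedly suggest the chain has not yet settled into its stationary distribution. A sketch, reusing a simple random-walk Metropolis chain with an assumed standard normal target:

```python
import math
import random

random.seed(4)

def unnorm_p(x):
    # Unnormalized standard normal target (assumed for the example).
    return math.exp(-0.5 * x * x)

x, chain = 0.0, []
for _ in range(20000):
    prop = x + random.gauss(0, 1)
    if random.random() < unnorm_p(prop) / unnorm_p(x):
        x = prop
    chain.append(x)

post = chain[2000:]            # drop burn-in samples
half = len(post) // 2
mean_a = sum(post[:half]) / half
mean_b = sum(post[half:]) / half
# Crude stability check: the two half-chain means should roughly agree.
print(f"first half: {mean_a:.2f}  second half: {mean_b:.2f}")
```

This is not a formal diagnostic; in practice, tools compute statistics such as R-hat across multiple independent chains, but the underlying idea of comparing segments is the same.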