
Markov Chain Monte Carlo (MCMC)

Monte Carlo methods provide a general way to approximate integrals and expectations that cannot be evaluated analytically. In particular, the expectation $E[f(X)]$ arising in Bayesian inference can be approximated by drawing samples $\{X_t,\; t=1,\cdots, n\}$ from $p(X)$ and averaging:

\begin{displaymath}E[f(X)]\approx \frac{1}{n}\sum_{t=1}^n f(X_t) \end{displaymath}

This is referred to as Monte Carlo integration.
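As a quick numerical sketch (in Python with NumPy; taking $p(X)$ to be standard normal and $f(x)=x^2$ is an illustrative assumption, for which the exact value is $E[X^2]=1$):

import numpy as np

rng = np.random.default_rng(0)

# Estimate E[f(X)] for X ~ N(0,1) with f(x) = x^2; the exact
# value is Var(X) = 1, so the estimate should be close to 1.
n = 100000
samples = rng.standard_normal(n)      # draws X_t from p(X)
estimate = np.mean(samples**2)        # (1/n) * sum_t f(X_t)
print(estimate)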

A Markov chain is a sequence of random variables $\{X_0,X_1,\cdots\}$ generated by a Markov process. At each time $t \ge 0$, the next state $X_{t+1}$ is sampled from a distribution called the transition kernel, $P(X_{t+1}\vert X_t) \stackrel{\triangle}{=} Prob(X_t \rightarrow X_{t+1})$. By definition, the next state $X_{t+1}$ of a Markov process depends only on the current state $X_t$ and is independent of all previous states $X_0, X_1, \cdots, X_{t-1}$ (limited horizon):

\begin{displaymath}P(X_{t+1}\vert X_0, X_1, \cdots, X_t) = P(X_{t+1}\vert X_t) \end{displaymath}
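As an illustration of sampling from a transition kernel (a Python sketch; the two-state chain and its transition matrix below are hypothetical, with column $j$ holding the distribution of $X_{t+1}$ given $X_t=j$, anticipating the matrix notation introduced next):

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-state kernel: column j is the distribution of
# X_{t+1} given X_t = j (each column sums to 1).
P = np.array([[0.9, 0.5],
              [0.1, 0.5]])

x = 0                                 # initial state X_0
chain = [x]
for t in range(10):
    x = int(rng.choice(2, p=P[:, x])) # sample X_{t+1} ~ P(X_{t+1} | X_t)
    chain.append(x)
print(chain)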

Let $\{s_i,\; i=1,\cdots,N \}$ be the $N$ possible states of the Markov chain. We define the state probabilities $p_i(t) \stackrel{\triangle}{=} Prob(X_t=s_i)$ and the transition probabilities $P(i,j) \stackrel{\triangle}{=} P(j\rightarrow i) \stackrel{\triangle}{=} Prob(X_{t+1}=s_i\vert X_t=s_j)$.

Then the following Chapman-Kolmogorov equation gives the probability of the next state taking a specific value $s_i$:
\begin{eqnarray*}
p_i(t+1) &=& Prob(X_{t+1}=s_i)=\sum_{j=1}^N P(X_{t+1}=s_i\vert X_t=s_j)\,P(X_t=s_j) \\
&=& \sum_{j=1}^N P(j\rightarrow i)\, p_j(t)=\sum_{j=1}^N P(i,j)\, p_j(t)
\end{eqnarray*}

We represent the distributions at times $t$ and $t+1$ as column vectors:

\begin{displaymath}{\bf p}(t)=[p_1(t),\cdots,p_N(t)]^T, \;\;\;\;
{\bf p}(t+1)=[p_1(t+1),\cdots,p_N(t+1)]^T \end{displaymath}

then the Chapman-Kolmogorov equation can be expressed in matrix form, where ${\bf P}$ is the $N\times N$ transition matrix with elements $P(i,j)$:

\begin{displaymath}{\bf p}(t+1)={\bf P} \; {\bf p}(t) \end{displaymath}

This state transition can be extended recursively back to the initial distribution ${\bf p}(0)$:

\begin{displaymath}{\bf p}(t+1)={\bf P} \; {\bf p}(t)={\bf P}{\bf P}\;{\bf p}(t-1)
= \cdots = {\bf P}^{t+1} \; {\bf p}(0) \end{displaymath}
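Numerically, this recursion amounts to repeated matrix-vector multiplication (a sketch reusing the hypothetical two-state chain from above):

import numpy as np

# Entry P[i, j] = P(j -> i), so each column sums to 1 and
# p(t+1) = P p(t) for column vectors p.
P = np.array([[0.9, 0.5],
              [0.1, 0.5]])
p = np.array([1.0, 0.0])              # initial distribution p(0)

for t in range(50):                   # p(t) = P^t p(0)
    p = P @ p
print(p)                              # converges to about [0.8333, 0.1667]

Starting from a different ${\bf p}(0)$ yields the same limit, illustrating the convergence to a stationary distribution discussed next.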

If the Markov chain is irreducible and aperiodic, the distribution ${\bf p}(t)$ converges to a unique stationary distribution ${\bf p}^*$ after a large number of transitions, independent of the initial distribution ${\bf p}(0)$. Any further transition leaves the distribution unchanged:

\begin{displaymath}{\bf p}^*={\bf P}\;{\bf p}^* \end{displaymath}

In other words, the stationary distribution ${\bf p}^*$ is the eigenvector of the transition matrix ${\bf P}$ corresponding to the eigenvalue $\lambda=1$, normalized so that its elements sum to 1.
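This suggests computing ${\bf p}^*$ directly by eigendecomposition rather than by iterating the chain (a sketch for the same hypothetical two-state chain):

import numpy as np

P = np.array([[0.9, 0.5],
              [0.1, 0.5]])

# The stationary distribution is the eigenvector of P for eigenvalue 1,
# rescaled so that its entries sum to one.
w, V = np.linalg.eig(P)
v = V[:, np.argmin(np.abs(w - 1.0))].real
p_star = v / v.sum()
print(p_star)                         # about [0.8333, 0.1667]

This matches the limit obtained above by repeated transitions.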

A sufficient condition for ${\bf p}^*$ to be a stationary distribution is that the detailed balance equation holds for all pairs of states $i$ and $j$:

\begin{displaymath}P(i\rightarrow j) p_i^*=P(j\rightarrow i) p_j^* \end{displaymath}

Under this condition the chain is said to be reversible. Detailed balance implies stationarity: the $i$th element of the vector ${\bf P}\,{\bf p}^*$ is

\begin{displaymath}({\bf P}\,{\bf p}^*)_i=\sum_k P(k\rightarrow i)\, p_k^*
=\sum_k P(i\rightarrow k)\, p_i^*=p_i^*\sum_k P(i\rightarrow k)=p_i^* \end{displaymath}

(the last equality holds because $\sum_k P(i \rightarrow k)=1$), i.e., ${\bf p}^*$ is stationary. If the chain is started from ${\bf p}^*$, it remains stationary from the very beginning. Detailed balance is not a necessary condition, however, as a chain can also converge to its stationary distribution gradually.
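Detailed balance is easy to check numerically (a sketch for the same hypothetical two-state chain; note that with the column convention used here, $P(i\rightarrow j)$ is the entry ${\bf P}[j,i]$):

import numpy as np

P = np.array([[0.9, 0.5],
              [0.1, 0.5]])
p_star = np.array([5/6, 1/6])         # stationary distribution found above

# flow[i, j] = P(i -> j) * p_i^*, built from the transpose of P
# because P[j, i] holds P(i -> j) in the column convention.
flow = P.T * p_star[:, None]
print(np.allclose(flow, flow.T))      # True: detailed balance holds

Every two-state chain with a stationary distribution satisfies detailed balance; for chains with three or more states this check can fail even though a stationary distribution exists.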


Ruye Wang 2006-10-11