
EM Method for Parameter Estimation of Mixture Densities

Assume $N$ random samples $\{X_1,\cdots,X_N\}$ are drawn from a mixture distribution

\begin{displaymath}X_n \sim p(X\vert\Theta)=\sum_{m=1}^M \alpha_m\;p_m(X\vert\theta_m) \end{displaymath}

where $p_m$ is the $m$th distribution component parameterized by $\theta_m$, and $\alpha_m$ is the mixing coefficient, or prior probability, of each mixture component, satisfying

\begin{displaymath}0 \le \alpha_m \le 1,\;\;\;\;\; \sum_{m=1}^M \alpha_m = 1 \end{displaymath}

The parameters are $\Theta=\{\alpha_m, \theta_m,(m=1,\cdots,M) \}$. As $\{X_1,\cdots,X_N\}$ are independent, their joint distribution is

\begin{displaymath}p(X\vert\Theta)=\prod_{n=1}^N p(X_n\vert\Theta) \end{displaymath}

and the log-likelihood of the parameters is
\begin{eqnarray*}
\log L(\Theta\vert X) &=& \log p(X\vert\Theta)
 = \log\left[\prod_{n=1}^N p(X_n\vert\Theta)\right]
 = \sum_{n=1}^N \log p(X_n\vert\Theta) \\
 &=& \sum_{n=1}^N \log\left[\sum_{m=1}^M \alpha_m\, p_m(X_n\vert\theta_m)\right]
\end{eqnarray*}
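As a concrete aside, this log-likelihood is straightforward to evaluate numerically. Below is a minimal sketch, assuming 1-D Gaussian components and hypothetical parameter values; the function name and the use of logsumexp for numerical stability are illustrative choices, not part of the derivation.

\begin{verbatim}
# A minimal sketch (hypothetical names) of evaluating
# log L(Theta|X) = sum_n log sum_m alpha_m p_m(X_n|theta_m)
# for 1-D Gaussian components.
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

def mixture_log_likelihood(X, alphas, mus, sigmas):
    # log[alpha_m p_m(X_n|theta_m)] for every (n, m) pair, shape (N, M)
    log_terms = np.log(alphas) + norm.logpdf(X[:, None], mus, sigmas)
    # log-sum-exp over the components m, then sum over the samples n
    return logsumexp(log_terms, axis=1).sum()

# example: two 1-D Gaussian components with hypothetical parameters
X = np.array([-1.0, 0.2, 3.1, 2.8])
print(mixture_log_likelihood(X, alphas=np.array([0.4, 0.6]),
                             mus=np.array([0.0, 3.0]),
                             sigmas=np.array([1.0, 1.0])))
\end{verbatim}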

Finding a $\Theta^*$ that maximizes this log-likelihood is not easy. To make the problem easier, now assume some hidden or latent random variables $Y=[y_1,\cdots,y_N]^T$, where $y_n=m \in \{1,\cdots,M\}$ if the $n$th sample $X_n$ is generated by the $m$th component $p_m(X\vert\theta_m)$ of the mixture distribution. Now the log-likelihood can be written in terms of both $X$ and $Y$:
\begin{eqnarray*}
\log L(\Theta\vert X,Y) &=& \log p(X,Y\vert\Theta)
 = \log\left[\prod_{n=1}^N p(X_n,y_n\vert\Theta)\right]
 = \sum_{n=1}^N \log\left[p(X_n\vert y_n,\Theta)\,P(y_n)\right] \\
 &=& \sum_{n=1}^N \log\left[\alpha_{y_n}\, p_{y_n}(X_n\vert\theta_{y_n})\right]
\end{eqnarray*}

The last equality follows from the definition of $y_n$: as $X_n$ is known to be generated by the $y_n$th distribution component $p_{y_n}$, all other terms in the sum $p(X_n\vert\Theta)=\sum_m \alpha_m p_m(X_n\vert\theta_m)$ can be dropped, and the prior probability $P(y_n)$ of that component is $\alpha_{y_n}$. The expectation of this log-likelihood with respect to $Y$ is

\begin{displaymath}Q(\Theta,\Theta^{(t)})=E_Y\,\log L(\Theta\vert X,Y)
=\sum_Y \log L(\Theta\vert X,Y)\; P(Y\vert X,\Theta^{(t)}) \end{displaymath}
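For a small concrete illustration of the quantity being averaged here: if $N=3$, $M=2$, and the hidden labels happened to be $y_1=1$, $y_2=2$, $y_3=1$, the complete-data log-likelihood would be

\begin{displaymath}
\log L(\Theta\vert X,Y)=\log[\alpha_1 p_1(X_1\vert\theta_1)]
+\log[\alpha_2 p_2(X_2\vert\theta_2)]+\log[\alpha_1 p_1(X_3\vert\theta_1)]
\end{displaymath}

and $Q(\Theta,\Theta^{(t)})$ averages such expressions over all $M^N$ possible label assignments $Y$, weighted by their posterior probabilities $P(Y\vert X,\Theta^{(t)})$.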

But as each $y_n$ can take only one of the $M$ integer values in $\{1,\cdots,M\}$, the expectation above can be written as
\begin{eqnarray*}
Q(\Theta,\Theta^{(t)}) &=& E_Y\,\log L(\Theta\vert X,Y)
 = \sum_{m=1}^M \sum_{n=1}^N \log\left[\alpha_m\, p_m(X_n\vert\theta_m)\right] P(m\vert X_n,\Theta^{(t)}) \\
 &=& \sum_{m=1}^M \sum_{n=1}^N \log(\alpha_m)\,P(m\vert X_n,\Theta^{(t)})
 + \sum_{m=1}^M \sum_{n=1}^N \log\left[p_m(X_n\vert\theta_m)\right] P(m\vert X_n,\Theta^{(t)})
\end{eqnarray*}
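The weights $P(m\vert X_n,\Theta^{(t)})$ are the E-step posterior probabilities, obtained by Bayes' rule from the current parameter estimate $\Theta^{(t)}$:

\begin{displaymath}
P(m\vert X_n,\Theta^{(t)})=\frac{\alpha_m^{(t)}\,p_m(X_n\vert\theta_m^{(t)})}
{\sum_{k=1}^M \alpha_k^{(t)}\,p_k(X_n\vert\theta_k^{(t)})}
\end{displaymath}

A minimal sketch of this computation, assuming 1-D Gaussian components for concreteness (the function and variable names are hypothetical):

\begin{verbatim}
# E-step posteriors P(m | X_n, Theta^(t)) for 1-D Gaussian components
import numpy as np
from scipy.stats import norm

def responsibilities(X, alphas, mus, sigmas):
    # unnormalized weights alpha_m * p_m(X_n | theta_m), shape (N, M)
    w = alphas * norm.pdf(X[:, None], mus, sigmas)
    # normalize over the components m, so each row sums to one
    return w / w.sum(axis=1, keepdims=True)
\end{verbatim}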

The two terms can be maximized separately, as the first depends only on the $\alpha_m$ and the second only on the $\theta_m$.
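For example, the first term can be maximized over the $\alpha_m$ under the constraint $\sum_m\alpha_m=1$ by introducing a Lagrange multiplier $\lambda$ and setting

\begin{displaymath}
\frac{\partial}{\partial\alpha_m}\left[\sum_{m=1}^M\sum_{n=1}^N \log(\alpha_m)\,P(m\vert X_n,\Theta^{(t)})
+\lambda\left(\sum_{m=1}^M\alpha_m-1\right)\right]
=\frac{1}{\alpha_m}\sum_{n=1}^N P(m\vert X_n,\Theta^{(t)})+\lambda=0
\end{displaymath}

Summing this condition over $m$ and using $\sum_{m=1}^M P(m\vert X_n,\Theta^{(t)})=1$ gives $\lambda=-N$, so that $\alpha_m=\frac{1}{N}\sum_{n=1}^N P(m\vert X_n,\Theta^{(t)})$, which is the first of the iterative equations summarized below.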

The results obtained above can be summarized as below. For Gaussian component densities with $\theta_m=\{\mu_m,\Sigma_m\}$, maximizing the second term over $\mu_m$ and $\Sigma_m$ in the same way yields, together with the result for $\alpha_m$, the following iterative equations:

\begin{displaymath}
\alpha_m^{(t+1)}=\frac{1}{N}\sum_{n=1}^N P(m\vert X_n,\Theta^{(t)})
\end{displaymath}

\begin{displaymath}
\mu_m^{(t+1)}=\frac{\sum_{n=1}^N X_n\,P(m\vert X_n,\Theta^{(t)})}{\sum_{n=1}^N P(m\vert X_n,\Theta^{(t)})}
\end{displaymath}

\begin{displaymath}
\Sigma_m^{(t+1)}=\frac{\sum_{n=1}^N (X_n-\mu_m^{(t+1)})(X_n-\mu_m^{(t+1)})^T\,P(m\vert X_n,\Theta^{(t)})}{\sum_{n=1}^N P(m\vert X_n,\Theta^{(t)})}
\end{displaymath}

where the posterior probabilities $P(m\vert X_n,\Theta^{(t)})$ are computed from the current parameter estimate $\Theta^{(t)}$ as given above.

Although the above derivation seems tedious, the final results all make sense intuitively: the coefficient $\alpha_m$ of the $m$th component distribution is the average of the posterior probabilities $P(m\vert X_n,\Theta^{(t)})$ that the $N$ samples come from that component; the estimated mean vector $\mu_m$ is the average of the $N$ samples $X_n$, and the estimated covariance matrix $\Sigma_m$ is the average of the outer products $(X_n-\mu_m)(X_n-\mu_m)^T$, both weighted by the posterior probability $P(m\vert X_n,\Theta^{(t)})$ that a given $X_n$ comes from the $m$th component distribution $p_m(X)$. Also note that both the E and M steps are carried out by these three iterative equations, starting from some arbitrary initial guesses of the parameters.
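To make the iteration concrete, here is a minimal sketch of these three equations, assuming 1-D Gaussian components; the function name, the initialization, and the fixed iteration count are illustrative choices rather than part of the algorithm above.

\begin{verbatim}
# A self-contained sketch (hypothetical names, 1-D Gaussian components)
# of the iterative EM equations summarized above.
import numpy as np
from scipy.stats import norm

def em_gmm_1d(X, M, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    N = len(X)
    # arbitrary initial guesses of the parameters
    alphas = np.full(M, 1.0 / M)
    mus = rng.choice(X, M, replace=False)
    sigmas = np.full(M, X.std())
    for _ in range(n_iter):
        # E step: posteriors P(m | X_n, Theta^(t)), shape (N, M)
        w = alphas * norm.pdf(X[:, None], mus, sigmas)
        P = w / w.sum(axis=1, keepdims=True)
        Nm = P.sum(axis=0)                       # total posterior weight per component
        # M step: the three iterative update equations
        alphas = Nm / N                          # average posterior probability
        mus = (P * X[:, None]).sum(axis=0) / Nm  # posterior-weighted sample mean
        sigmas = np.sqrt((P * (X[:, None] - mus) ** 2).sum(axis=0) / Nm)
    return alphas, mus, sigmas

# example: two well-separated components generated synthetically
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(4.0, 0.5, 200)])
print(em_gmm_1d(X, M=2))
\end{verbatim}

The E and M steps alternate inside the loop; each pass does not decrease the log-likelihood of the observed data, which is the usual convergence criterion in place of the fixed iteration count used in this sketch.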


Ruye Wang 2006-10-11