TableOfReal: To GaussianMixture...

Creates a GaussianMixture from the selected TableOfReal by an expectation-maximization procedure.

Settings

Number of components: defines the number of Gaussians in the mixture.
Tolerance of minimizer: defines when to stop optimizing. The iteration stops when the relative change in the likelihood between two successive iteration steps is less than the tolerance, i.e. when |(L(i-1)-L(i))/L(i)| < tolerance.
Maximum number of iterations: defines another stopping criterion. The EM iteration stops when either the tolerance or the maximum number of iterations is reached. If zero is chosen, no iteration is performed and the GaussianMixture is initialized with the initial guess.
Stability coefficient lambda: defines the fraction of the total covariance that will be added to each of the mixture covariance matrices during the EM iteration. This may prevent one or more of these matrices from becoming singular.
Covariance matrices are: defines whether the complete covariance matrices in the mixture have to be calculated or only their diagonals.
Criterion based on: defines how the likelihood of the data given the model is calculated.
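
The two stopping criteria described in the settings above can be sketched in a few lines of Python. This is only an illustration and not Praat's own code: the function name `should_stop` and the default values for `tolerance` and `max_iterations` are assumptions made for the example.

```python
def should_stop(previous_likelihood, current_likelihood, iteration,
                tolerance=1e-3, max_iterations=200):
    """Return True when either stopping criterion is met.

    The function name and the default values are illustrative assumptions.
    `iteration` is the number of EM steps performed so far.
    """
    # Criterion 2: the maximum number of iterations has been reached.
    # With max_iterations == 0 this is already true before the first step,
    # so no iteration is performed at all.
    if iteration >= max_iterations:
        return True
    # Criterion 1: |(L(i-1) - L(i)) / L(i)| < tolerance.
    # Assumes current_likelihood is nonzero.
    relative_change = abs((previous_likelihood - current_likelihood) / current_likelihood)
    return relative_change < tolerance
```

For example, `should_stop(-100.0, -99.99, 5)` is true because the relative change is about 0.0001, whereas `should_stop(-100.0, -90.0, 5)` is false.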

Expectation–Maximization Algorithm

The Expectation–Maximization (EM) algorithm is an iterative procedure to maximize the likelihood of the data given a model. For a GaussianMixture, the parameters in the model are the centers and the covariances of all components in the mixture and their mixing probabilities.

The number of parameters depends on the number of components in the mixture and the dimension of the data. For each component with a full covariance matrix we have to find dimension · (dimension + 1) / 2 matrix elements and another dimension vector elements for its center, i.e. dimension · (dimension + 3) / 2 parameters. Together with the mixing probabilities, this makes the total number of parameters that have to be estimated for a mixture with numberOfComponents components equal to numberOfComponents · dimension · (dimension + 3) / 2 + numberOfComponents.

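For diagonal covariance matrices only dimension variances per component have to be estimated instead of dimension · (dimension + 1) / 2 matrix elements, which reduces the number of parameters considerably.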
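
As a small worked example of this count, the following Python sketch evaluates the formula for both covariance types. The function name `number_of_parameters` is hypothetical, and the diagonal case is an assumption derived by the same counting (dimension variances plus dimension center elements per component).

```python
def number_of_parameters(number_of_components, dimension, full_covariance=True):
    """Count the parameters of a Gaussian mixture (illustrative helper)."""
    if full_covariance:
        # dimension*(dimension+1)/2 covariance elements + dimension center elements
        per_component = dimension * (dimension + 3) // 2
    else:
        # Assumed count for the diagonal case: dimension variances + dimension center elements
        per_component = 2 * dimension
    # One mixing probability per component is added, as in the formula in the text.
    return number_of_components * per_component + number_of_components


print(number_of_parameters(2, 10))                         # full covariance matrices
print(number_of_parameters(2, 10, full_covariance=False))  # diagonal covariance matrices
```

For two components in ten dimensions this gives 132 parameters with full covariance matrices against 42 with diagonal ones.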

The EM iteration has to start with a sensible initial guess for all the parameters. For the initial guess, we derive our centers from positions on the 1-σ ellipse in the plane spanned by the first two principal components. We then make all covariance matrices equal to a scaled-down version of the total covariance matrix, where the scaling factor depends on the number of components and the quotient of the between and within variance. Initially all mixing probabilities will be chosen equal.
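
To give an idea of what "positions on the 1-σ ellipse" can look like, here is a minimal Python sketch for two-dimensional data, where the plane of the first two principal components is the data plane itself. It is not Praat's implementation: the function name `initial_centers_2d`, the evenly spaced angles on the ellipse, and the division by n in the covariance are all assumptions made for the example. The scaling of the covariance matrices is left out because the text does not specify it.

```python
import math


def initial_centers_2d(points, number_of_components):
    """Place centers on the 1-sigma ellipse of 2-D data (illustrative sketch)."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Elements of the total covariance matrix [[a, b], [b, c]] (assumed: divide by n).
    a = sum((p[0] - mean_x) ** 2 for p in points) / n
    c = sum((p[1] - mean_y) ** 2 for p in points) / n
    b = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / n
    # Rotation angle of the first principal axis of a symmetric 2x2 matrix.
    phi = 0.5 * math.atan2(2.0 * b, a - c)
    cos_phi, sin_phi = math.cos(phi), math.sin(phi)
    # Variances along the two principal axes (the eigenvalues).
    lambda1 = a * cos_phi ** 2 + 2.0 * b * sin_phi * cos_phi + c * sin_phi ** 2
    lambda2 = a * sin_phi ** 2 - 2.0 * b * sin_phi * cos_phi + c * cos_phi ** 2
    lambda2 = max(lambda2, 0.0)  # guard against rounding for degenerate data
    centers = []
    for i in range(number_of_components):
        # Assumption: spread the centers at equal angles around the ellipse.
        theta = 2.0 * math.pi * i / number_of_components
        u = math.sqrt(lambda1) * math.cos(theta)  # coordinate along axis 1
        v = math.sqrt(lambda2) * math.sin(theta)  # coordinate along axis 2
        centers.append((mean_x + u * cos_phi - v * sin_phi,
                        mean_y + u * sin_phi + v * cos_phi))
    return centers
```

Each returned center lies exactly one standard deviation from the grand mean, measured along the principal axes.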

How to proceed from the initial guess with the EM algorithm to find the optimal values for all the parameters in the Gaussian mixture is explained in great detail by Bishop (2006).
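
For readers who want a feel for the iteration without the full derivation, the sketch below runs the standard textbook EM updates on one-dimensional data. It is a simplified illustration under stated assumptions, not Praat's implementation: the names `gaussian_pdf` and `em_1d`, the default values, the use of the log-likelihood in the stopping test, and the way the stability term is added to each variance are all choices made for this example.

```python
import math


def gaussian_pdf(x, mean, variance):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / variance) / math.sqrt(2.0 * math.pi * variance)


def em_1d(data, means, variances, weights,
          tolerance=1e-6, max_iterations=200, stability=0.001):
    """EM for a 1-D Gaussian mixture, starting from the given initial guess."""
    means, variances, weights = list(means), list(variances), list(weights)
    n = len(data)
    grand_mean = sum(data) / n
    total_variance = sum((x - grand_mean) ** 2 for x in data) / n

    def log_likelihood():
        return sum(math.log(sum(w * gaussian_pdf(x, m, v)
                                for w, m, v in zip(weights, means, variances)))
                   for x in data)

    previous = log_likelihood()
    for _ in range(max_iterations):
        # E-step: responsibility of each component for each data point.
        responsibilities = []
        for x in data:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            responsibilities.append([j / total for j in joint])
        # M-step: re-estimate mixing probabilities, centers and variances.
        for c in range(len(means)):
            n_c = sum(r[c] for r in responsibilities)
            weights[c] = n_c / n
            means[c] = sum(r[c] * x for r, x in zip(responsibilities, data)) / n_c
            variances[c] = sum(r[c] * (x - means[c]) ** 2
                               for r, x in zip(responsibilities, data)) / n_c
            # Assumed form of the stability term: add a fraction of the total variance.
            variances[c] += stability * total_variance
        current = log_likelihood()
        # Stop when the relative change drops below the tolerance.
        if abs((previous - current) / current) < tolerance:
            break
        previous = current
    return means, variances, weights
```

Called with two well-separated clusters and a rough initial guess, for instance `em_1d(data, [0.0, 6.0], [1.0, 1.0], [0.5, 0.5])`, the returned centers move towards the cluster means while the mixing probabilities keep summing to one.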

Links to this page

GaussianMixture & TableOfReal: Improve likelihood...