MMSE (minimum mean-square error)

MMSE (minimum mean-square error) is a commonly used statistical technique for estimating an unknown random variable based on a noisy measurement of that variable. It is widely used in signal processing, communication systems, and other areas of engineering and science where estimation problems arise.

The MMSE estimator is the estimator that minimizes the mean-square error between the estimate and the true value of the variable. In other words, it provides an estimate that is as close as possible to the true value, on average, given the available measurements and the statistics of the noise. In the jointly Gaussian setting considered below, this optimal estimator turns out to be a linear (more precisely, affine) function of the measurements.

To understand the MMSE estimator, it is useful to first consider the problem of estimating a single scalar quantity from one noisy measurement. Suppose we have a random quantity x that we wish to estimate, but we can only measure it with some noise. Let y be our noisy measurement, so that:

y = x + n

where n is the noise. We assume that n is a zero-mean Gaussian random variable with variance σ^2, independent of x, i.e., n ~ N(0, σ^2). Our goal is to find an estimator ẑ of x that minimizes the mean-square error:

E[(x - ẑ)^2]

where the expectation is taken over all possible values of x and n.

The optimal estimator for this problem is the conditional expectation of x given y, i.e., ẑ = E[x|y]. This estimator is optimal in the sense that it minimizes the mean-square error among all estimators, linear or not. When x and n are jointly Gaussian, as we will assume below, it is moreover an affine function of the measurement, ẑ = a*y + b, where a and b are constants.
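As a minimal numerical sketch of this setup, the following Python snippet assumes a Gaussian prior x ~ N(μ, σ_x^2) (the prior that will be introduced formally below) with illustrative values μ = 1, σ_x = 2 and noise standard deviation σ = 1; these numbers are not from the text. It simulates many (x, y) pairs and compares the mean-square error of the conditional-mean estimate with that of the raw measurement:

```python
import numpy as np

# Illustrative parameters (assumed for this sketch, not given in the text):
mu, sigma_x, sigma = 1.0, 2.0, 1.0   # prior mean/std of x, noise std

rng = np.random.default_rng(0)
x = rng.normal(mu, sigma_x, size=100_000)    # draws of the unknown quantity
y = x + rng.normal(0.0, sigma, size=x.size)  # noisy measurements y = x + n

# Conditional-mean (MMSE) estimate for jointly Gaussian x and n:
#   E[x|y] = mu + sigma_x^2 / (sigma_x^2 + sigma^2) * (y - mu)
gain = sigma_x**2 / (sigma_x**2 + sigma**2)
z_hat = mu + gain * (y - mu)

print("MSE of raw measurement:", np.mean((x - y) ** 2))      # ~ sigma^2 = 1.0
print("MSE of MMSE estimate  :", np.mean((x - z_hat) ** 2))  # ~ 0.8
```

With these illustrative numbers the MMSE estimate reduces the average squared error from about 1.0 (the noise variance) to about 0.8.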

To see why this estimator is optimal, consider the mean-square error:

E[(x - ẑ)^2] = E[(x - E[x|y] + E[x|y] - ẑ)^2]

= E[(x - E[x|y])^2] + E[(E[x|y] - ẑ)^2] + 2E[(x - E[x|y])(E[x|y] - ẑ)]

The first term, E[(x - E[x|y])^2], is the irreducible estimation error: it does not depend on ẑ and cannot be reduced by any estimator. The second term is nonnegative and equals zero exactly when ẑ = E[x|y]. The third (cross) term vanishes because the estimation error x - E[x|y] is orthogonal to any function of y: by the law of iterated expectations, E[(x - E[x|y])*g(y)] = 0 for any function g, and E[x|y] - ẑ is such a function of y. Hence the mean-square error is minimized by choosing ẑ = E[x|y].
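This orthogonality property can be checked numerically. The sketch below reuses the illustrative parameters of the previous snippet and verifies that the estimation error x - E[x|y] is, up to Monte Carlo noise, uncorrelated with several arbitrary functions of y:

```python
import numpy as np

# Monte Carlo check of the orthogonality argument, reusing the illustrative
# parameters mu = 1, sigma_x = 2, sigma = 1 from the previous sketch.
mu, sigma_x, sigma = 1.0, 2.0, 1.0
rng = np.random.default_rng(1)
x = rng.normal(mu, sigma_x, size=1_000_000)
y = x + rng.normal(0.0, sigma, size=x.size)

cond_mean = mu + sigma_x**2 / (sigma_x**2 + sigma**2) * (y - mu)  # E[x|y]
error = x - cond_mean                                             # estimation error

# The error is uncorrelated with any function of y; a few examples:
for g in (y, y**2, np.sin(y)):
    print(np.mean(error * g))   # all close to zero
```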

In practice a single measurement is often not enough, and we instead collect several noisy measurements of the same quantity. Let y1, y2, ..., yn be n independent noisy measurements of x, each of the form yi = x + ni with i.i.d. noise ni ~ N(0, σ^2), and let Y = [y1, y2, ..., yn] be the measurement vector. The MMSE estimator of x is again the conditional expectation:

Ẑ = E[x|Y]

and, because x and Y are jointly Gaussian under the assumptions below, this conditional expectation is an affine function of the measurements.

To compute the MMSE estimator, we need to find the conditional distribution of x given Y. By Bayes' rule, we have:

p(x|Y) = p(Y|x)*p(x)/p(Y)

where p(x) is the prior distribution of x, p(Y|x) is the likelihood function of Y given x, and p(Y) is the marginal distribution of Y.

Assuming that the prior distribution of x is Gaussian with mean μ and variance σ_x^2, i.e., x ~ N(μ, σ_x^2), and that, given x, the measurements are independent Gaussians with mean x and variance σ^2, i.e., Y|x ~ N(x*1, σ^2*I), where 1 is the all-ones vector and I is the identity matrix, we can show that the conditional distribution of x given Y is also Gaussian with mean and variance given by:

μ_post = μ + (n*σ_x^2/(n*σ_x^2 + σ^2))*(ȳ - μ)

σ_post^2 = σ_x^2*σ^2/(n*σ_x^2 + σ^2)

where n is the number of measurements and ȳ = (y1 + y2 + ... + yn)/n is their sample mean.
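As a sketch, the posterior mean and variance can be computed directly from these formulas; the parameter values below are again only illustrative:

```python
import numpy as np

# Posterior mean/variance of x from n noisy measurements, assuming an
# illustrative prior N(mu, sigma_x^2) and noise variance sigma^2.
mu, sigma_x, sigma, n = 1.0, 2.0, 1.0, 10
rng = np.random.default_rng(2)

x_true = rng.normal(mu, sigma_x)              # one draw of the unknown x
Y = x_true + rng.normal(0.0, sigma, size=n)   # n noisy measurements of it
y_bar = Y.mean()

gain = n * sigma_x**2 / (n * sigma_x**2 + sigma**2)
mu_post = mu + gain * (y_bar - mu)                               # posterior mean
var_post = sigma_x**2 * sigma**2 / (n * sigma_x**2 + sigma**2)   # posterior variance

print("true x:", x_true)
print("MMSE estimate (posterior mean):", mu_post)
print("posterior variance:", var_post)
```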

To see why these expressions hold, we can compute the conditional distribution directly using Bayes' rule:

p(x|Y) ∝ p(Y|x)*p(x)

= exp[-1/(2σ^2) * Σ_i (yi - x)^2] * exp[-1/(2σ_x^2) * (x - μ)^2]

Collecting only the terms that depend on x, the exponent is a quadratic function of x:

-1/2 * [(n/σ^2 + 1/σ_x^2)*x^2 - 2*(Σ_i yi/σ^2 + μ/σ_x^2)*x] + const

Completing the square shows that p(x|Y) is again a Gaussian density: its precision (inverse variance) is the coefficient of x^2 inside the brackets, and its mean is the point at which the quadratic is maximized:

σ_post^2 = 1/(n/σ^2 + 1/σ_x^2) = σ_x^2*σ^2/(n*σ_x^2 + σ^2)

μ_post = σ_post^2 * (Σ_i yi/σ^2 + μ/σ_x^2) = μ + (n*σ_x^2/(n*σ_x^2 + σ^2))*(ȳ - μ)

which are exactly the expressions given above.
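The completing-the-square step can also be checked symbolically. The sketch below (using sympy) verifies that the x-dependent part of the log-posterior differs from the log-density of N(μ_post, σ_post^2) only by a constant:

```python
import sympy as sp

# Symbolic check of the completing-the-square step (a sketch, not a proof).
x, mu, ybar = sp.symbols('x mu ybar', real=True)
n, sx2, s2 = sp.symbols('n sx2 s2', positive=True)   # n, sigma_x^2, sigma^2

# x-dependent part of the log-posterior: sum_i (yi - x)^2 contributes
# n*x**2 - 2*n*ybar*x (terms without x are absorbed into the constant).
log_post_x = -(n*x**2 - 2*n*ybar*x) / (2*s2) - (x - mu)**2 / (2*sx2)

var_post = sx2 * s2 / (n*sx2 + s2)
mu_post = mu + n*sx2 / (n*sx2 + s2) * (ybar - mu)

# If p(x|Y) is N(mu_post, var_post), adding back the Gaussian quadratic
# term must leave an expression that no longer depends on x:
diff = sp.expand(log_post_x + (x - mu_post)**2 / (2*var_post))
print(sp.simplify(sp.diff(diff, x)))   # prints 0
```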

The MMSE estimate is therefore simply the posterior mean computed above, Ẑ = E[x|Y] = μ_post. The same coefficients can also be obtained without computing the full posterior, by restricting attention to affine estimators of the sample mean,

Ẑ = a*ȳ + b

and choosing a and b to minimize the mean-square error. The mean-square error is given by:

E[(x - Ẑ)^2] = E[(x - a*ȳ - b)^2]

= E[x^2] - 2a*E[x*ȳ] - 2b*E[x] + a^2*E[ȳ^2] + 2ab*E[ȳ] + b^2

We want to minimize this expression with respect to a and b. Taking the partial derivatives with respect to a and b and setting them equal to zero, we get:

b = E[x] - a*E[ȳ] = (1 - a)*μ

a = Cov(x, ȳ)/Var(ȳ) = σ_x^2/(σ_x^2 + σ^2/n) = n*σ_x^2/(n*σ_x^2 + σ^2)

where we used E[ȳ] = E[x] = μ, Cov(x, ȳ) = σ_x^2 and Var(ȳ) = σ_x^2 + σ^2/n.

Therefore, the MMSE estimator of x based on the n noisy measurements is given by:

Ẑ = a*ȳ + b = μ + (n*σ_x^2/(n*σ_x^2 + σ^2))*(ȳ - μ)

which coincides with the posterior mean derived earlier.
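A quick way to see that these are the natural coefficients is to estimate them from sample moments. The sketch below, again with illustrative parameters, computes a = Cov(x, ȳ)/Var(ȳ) and b = E[x] - a*E[ȳ] empirically and compares them with the closed-form expressions:

```python
import numpy as np

# Empirical check of the linear MMSE coefficients (illustrative parameters).
mu, sigma_x, sigma, n = 1.0, 2.0, 1.0, 10
rng = np.random.default_rng(3)

trials = 200_000
x = rng.normal(mu, sigma_x, size=trials)
# Sample mean of n measurements: y_bar = x + noise with variance sigma^2 / n
y_bar = x + rng.normal(0.0, sigma / np.sqrt(n), size=trials)

cov_xy = np.mean((x - x.mean()) * (y_bar - y_bar.mean()))
a_emp = cov_xy / np.var(y_bar)
b_emp = x.mean() - a_emp * y_bar.mean()

# Closed-form coefficients from the derivation above
a = n * sigma_x**2 / (n * sigma_x**2 + sigma**2)
b = (1 - a) * mu

print("a:", a, "empirical:", a_emp)
print("b:", b, "empirical:", b_emp)
```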

Because the MMSE estimator is affine in the measurements, it can also be written directly in terms of the measurement vector Y as an inner product:

Ẑ = w^T*Y + b

where w is a weight vector of length n and b is a scalar, given by:

w = (σ_x^2/(n*σ_x^2 + σ^2))*1

b = (σ^2/(n*σ_x^2 + σ^2))*μ

Here 1 denotes the all-ones vector; with i.i.d. measurements every component of w is identical, and w^T*Y is just a*ȳ written out measurement by measurement.
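In code, this inner-product form is a one-liner; the sketch below (illustrative parameters again) checks that it agrees with the posterior-mean form μ + a*(ȳ - μ):

```python
import numpy as np

# Weight-vector form w^T Y + b of the MMSE estimator (illustrative parameters).
mu, sigma_x, sigma, n = 1.0, 2.0, 1.0, 10
rng = np.random.default_rng(4)

x_true = rng.normal(mu, sigma_x)
Y = x_true + rng.normal(0.0, sigma, size=n)

w = np.full(n, sigma_x**2 / (n * sigma_x**2 + sigma**2))   # identical weights
b = sigma**2 * mu / (n * sigma_x**2 + sigma**2)

z_hat = w @ Y + b

# Same number as the posterior-mean form:
a = n * sigma_x**2 / (n * sigma_x**2 + sigma**2)
print(z_hat, mu + a * (Y.mean() - mu))   # the two values agree
```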

The MMSE estimator has several desirable properties, including optimality in the mean-square sense, linearity (in the Gaussian case), and computational efficiency. However, it requires knowledge of the noise variance σ^2 and of the prior mean μ and variance σ_x^2, which are often unknown in practice and must be estimated from the data. The closed-form linear expressions above also rely on the Gaussian assumptions; for non-Gaussian noise or priors, the estimator derived here is still the best linear estimator (the linear MMSE, or LMMSE, estimator), but the true MMSE estimator, the conditional mean, may be nonlinear.
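When the noise variance is not known, it must itself be estimated. One simple possibility, sketched below under the assumption that several measurements of the same quantity are available, is to use the sample variance of those measurements:

```python
import numpy as np

# Sketch: estimating the noise variance from repeated measurements of one x.
rng = np.random.default_rng(5)
x_true, sigma, n = 3.0, 1.0, 500
Y = x_true + rng.normal(0.0, sigma, size=n)

sigma2_hat = Y.var(ddof=1)   # unbiased sample variance, estimates sigma^2
print(sigma2_hat)            # roughly 1.0
```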

Extensions to the basic MMSE estimator have been developed to address these issues, including robust estimators that are less sensitive to outliers and non-Gaussian noise, and more general Bayesian estimators that use non-Gaussian priors and noise models. These extensions are beyond the scope of this explanation, but the MMSE estimator provides a solid foundation for these more advanced techniques.

In summary, the minimum mean-square error (MMSE) estimator provides the estimate of a random variable, given noisy measurements, that is optimal in the mean-square sense; it is the conditional mean of the variable given the measurements. When the variable and the noise are jointly Gaussian, as assumed here, this estimator is a simple linear (affine) function of the measurements and is therefore computationally efficient. It requires knowledge of the prior and noise statistics, but it remains a widely used approach to signal estimation in many fields, including signal processing, communications, and control theory.