BP (belief propagation)
Belief propagation (BP) is a message-passing algorithm used for probabilistic inference on graphical models, such as Bayesian networks or Markov random fields. The algorithm calculates the marginal probabilities of the variables in the model, given some evidence. The key idea behind BP is to represent the joint probability distribution over all variables in the model as a product of local factors, each depending on only a subset of the variables (conditional probabilities, in the case of a Bayesian network). BP then iteratively passes messages between neighboring nodes in the graph, updating each node's beliefs until a convergence criterion is met.
To understand how BP works, let us consider a simple example of a Bayesian network: a graphical model that represents a set of variables and their conditional dependencies using a directed acyclic graph. Each node in the graph corresponds to a random variable, and the edges represent the probabilistic dependencies between the variables. In our example, variable B depends on variable A, and variable C depends on both A and B.
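As a concrete illustration, here is a minimal sketch of this three-node network in Python. The conditional probability tables below are hypothetical placeholders, not values given in the text:

```python
# Example Bayesian network: A -> B, A -> C, B -> C.
# All probability values below are illustrative placeholders.

p_A = 0.6                      # P(A = True)
p_B_given_A = {True: 0.7,      # P(B = True | A)
               False: 0.2}
p_C_given_AB = {(True, True): 0.9,    # P(C = True | A, B)
                (True, False): 0.5,
                (False, True): 0.4,
                (False, False): 0.1}

def joint(a: bool, b: bool, c: bool) -> float:
    """Joint probability p(a, b, c) = p(a) p(b | a) p(c | a, b)."""
    pa = p_A if a else 1 - p_A
    pb = p_B_given_A[a] if b else 1 - p_B_given_A[a]
    pc = p_C_given_AB[(a, b)] if c else 1 - p_C_given_AB[(a, b)]
    return pa * pb * pc
```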
To perform probabilistic inference using BP, we need to calculate the marginal probabilities of the variables given some evidence. Let us suppose that we are given evidence that variable A takes on the value true. We want to calculate the marginal probabilities of variables B and C given this evidence.
The first step in BP is to initialize the messages sent between the nodes in the graph. A message from node i to node j represents the belief of node i about the state of node j, given the evidence and the messages received from other nodes in the graph. For example, the message from node A to node B represents the belief of node A about the state of node B, given the evidence that A is true and the messages received from other nodes in the graph.
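For instance, with binary variables a message can be stored as a pair of nonnegative numbers, one per state of the variable it concerns: a message m_{A->B} = (0.3, 0.7) would say that, from A's point of view, B is false with weight 0.3 and true with weight 0.7. (The specific numbers here are illustrative.)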
The messages are initialized as follows:
- The message from node A to node B encodes the evidence: since A is observed to be true, A's outgoing messages are computed with A clamped to the value true.
- The message from node C to node B is initialized uniformly (the same value for each state), since C is unobserved.
- The message from node B to node A is initialized uniformly, representing the prior belief that B is equally likely to be true or false.
- The message from node B to node C is initialized uniformly, for the same reason.
Note that the messages are normalized to sum to 1 over the states of the variable they concern, so that they can be interpreted as probability distributions.
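A minimal sketch of this initialization in Python, assuming binary variables stored as numpy arrays indexed [false, true]; the edge list, dictionary layout, and names are my own choices for illustration:

```python
import numpy as np

# Edges of the example network, treated as an undirected message-passing graph.
edges = [("A", "B"), ("A", "C"), ("B", "C")]

# messages[(i, j)] is the message from node i to node j, stored as a
# length-2 array over the states [false, true].
messages = {}
for i, j in edges:
    messages[(i, j)] = np.array([0.5, 0.5])  # uniform initialization
    messages[(j, i)] = np.array([0.5, 0.5])

# The evidence A = true is encoded as a local factor that clamps A;
# it is multiplied in whenever A's outgoing messages are recomputed.
local_evidence = {"A": np.array([0.0, 1.0])}
```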
The next step in BP is to iteratively update the messages until a convergence criterion is met. The update rule for each message is based on the sum and product rules of probability:
The message from node i to node j is proportional to the product of the incoming messages to node i, excluding the message from node j, and the conditional probability of node i given its parents, marginalized over the other variables. In symbols:
m_{i->j}(x_j) \propto \sum_{x_i} p(x_i | pa_i) \prod_{k \in nb_i \setminus j} m_{k->i}(x_i)
where m_{i->j}(x_j) is the message from node i to node j, x_i and x_j are the values of the variables represented by nodes i and j, pa_i is the set of parents of node i, nb_i is the set of neighbors of node i, and nb_i \setminus j denotes the neighbors of node i excluding node j. Note that the right-hand side depends on x_j whenever j is a parent of i, since x_j then appears among pa_i; any remaining parents are summed out together with x_i.
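As an illustration, here is a sketch of one such update in Python for the simplified case of pairwise potentials psi(x_i, x_j); a network with multi-parent nodes like C would first be converted to a pairwise or factor-graph form. The function name and data layout are my own:

```python
import numpy as np

def update_message(i, j, psi, local, messages, neighbors):
    """Compute m_{i->j}(x_j) for binary variables.

    psi       -- pairwise potential between i and j, a 2x2 array psi[x_i, x_j]
    local     -- dict mapping each node to its local factor
                 (evidence or prior, a length-2 array)
    messages  -- dict mapping (k, i) to the message from node k to node i
    neighbors -- dict mapping each node to the list of its neighbors
    """
    # Product of the incoming messages to i, excluding the one from j.
    incoming = local[i].copy()
    for k in neighbors[i]:
        if k != j:
            incoming *= messages[(k, i)]
    # Sum over the states of x_i: m(x_j) = sum_{x_i} incoming(x_i) * psi(x_i, x_j).
    msg = psi.T @ incoming
    return msg / msg.sum()  # normalize so the message sums to 1
```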
The normalization factor for the message from node i to node j is the sum of the unnormalized messages over all possible values of x_j:
Z_{i->j} = \sum_{x_j} m_{i->j}(x_j)
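For example, an unnormalized binary message of (0.18, 0.42) has normalization factor Z_{i->j} = 0.60 and normalizes to (0.3, 0.7); the numbers here are purely illustrative.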
Once the messages have converged, the marginal probability of each variable can be calculated by taking the product of the incoming messages to that variable and normalizing. In symbols:
p(x_i | e) \propto \prod_{j \in nb_i} m_{j->i}(x_i)
where p(x_i | e) is the marginal probability of variable i given the evidence e, and nb_i is the set of neighbors of node i.
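Continuing the sketch above, the belief at a node is the normalized product of its incoming messages; this version also multiplies in the node's local factor, which is where the evidence lives in this representation (again, the names are my own):

```python
def belief(i, local, messages, neighbors):
    """Approximate the marginal p(x_i | e) from the incoming messages."""
    b = local[i].copy()  # local evidence/prior factor at node i
    for j in neighbors[i]:
        b *= messages[(j, i)]
    return b / b.sum()  # normalize to a probability distribution
```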
This equation is based on the fact that the joint probability distribution over all variables can be factorized as a product of conditional probabilities:
p(x_1, ..., x_n) = \prod_{i=1}^n p(x_i | pa_i)
where pa_i is the set of parents of node i.
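For the three-node example above, this factorization reads
p(a, b, c) = p(a) p(b | a) p(c | a, b)
which is exactly the product computed by the joint function sketched earlier.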
The BP algorithm can be summarized as follows:
- Initialize the messages using the evidence and the prior probabilities.
- Repeat until convergence:
  - For each node i, update the outgoing messages to its neighbors j based on the incoming messages from the other neighbors k and the conditional probabilities of node i given its parents.
  - Normalize the messages.
- Calculate the marginal probabilities of each variable using the product of the incoming messages.
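Putting these steps together, here is a minimal end-to-end sketch in Python on a small chain A - B - C (a tree, so BP converges and is exact), with hypothetical CPT values and the evidence A = true; every number and name here is illustrative:

```python
import numpy as np

# Chain A - B - C with binary variables; index 0 = false, 1 = true.
# Pairwise potentials from hypothetical CPTs: psi[("A", "B")][a, b] = p(b | a).
psi = {
    ("A", "B"): np.array([[0.8, 0.2], [0.3, 0.7]]),  # p(b | a); rows sum to 1
    ("B", "C"): np.array([[0.9, 0.1], [0.4, 0.6]]),  # p(c | b)
}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}

# Local factors: the prior on A combined with the evidence A = true; uniform elsewhere.
local = {"A": np.array([0.4, 0.6]) * np.array([0.0, 1.0]),
         "B": np.ones(2),
         "C": np.ones(2)}

def potential(i, j):
    """Pairwise potential indexed [x_i, x_j], whichever way the edge is stored."""
    return psi[(i, j)] if (i, j) in psi else psi[(j, i)].T

# Initialize all messages uniformly.
messages = {(i, j): np.full(2, 0.5) for i in neighbors for j in neighbors[i]}

for _ in range(20):  # on a tree, a few sweeps suffice for convergence
    for i in neighbors:
        for j in neighbors[i]:
            incoming = local[i].copy()
            for k in neighbors[i]:
                if k != j:
                    incoming *= messages[(k, i)]
            msg = potential(i, j).T @ incoming   # sum over the states of x_i
            messages[(i, j)] = msg / msg.sum()   # normalize

for i in ("B", "C"):
    b = local[i].copy()
    for j in neighbors[i]:
        b *= messages[(j, i)]
    print(f"p({i} | A=true) =", b / b.sum())
```

On this chain the printed beliefs match the exact marginals obtained by summing the joint distribution directly.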
The convergence of BP is guaranteed for tree-structured graphs, where there are no loops; on trees the computed marginals are also exact. For graphs with loops, convergence is not guaranteed: the algorithm may oscillate, or converge to beliefs that only approximate the true marginals. However, in practice, BP often works well even on graphs with loops, and there are several techniques to improve its behavior, such as damping and message scheduling.
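Damping is straightforward to add to a loop like the one above: instead of replacing a message outright, the new value is blended with the previous one. The blending weight lam below is a hypothetical but typical choice:

```python
def damped_update(old_msg, new_msg, lam=0.5):
    """Blend the new message with the previous one to stabilize loopy BP."""
    msg = (1.0 - lam) * old_msg + lam * new_msg
    return msg / msg.sum()
```

Message scheduling, by contrast, changes the order in which messages are updated rather than the update itself.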
BP has many applications in machine learning, computer vision, natural language processing, and other fields where probabilistic modeling is used, for example image denoising, stereo matching, speech recognition, and machine translation.
In conclusion, belief propagation is a powerful message-passing algorithm for probabilistic inference on graphical models. It calculates the marginal probabilities of the variables in the model given some evidence by iteratively updating messages between the nodes of the graph. While BP is exact and guaranteed to converge on tree-structured graphs, on graphs with loops it is approximate, and its behavior can be improved with damping and message-scheduling techniques. BP has many practical applications in machine learning and other fields, and it is an important tool in the probabilistic modeling toolkit.