CDF (cumulative distribution function)

Last updated on 13 Mar 2023

The cumulative distribution function (CDF) is a fundamental concept in probability theory that describes the probability of a random variable taking on a value less than or equal to a particular value. In other words, the CDF evaluated at x is the total probability accumulated at or below x.

To understand the CDF of a continuous random variable, we first need to understand the probability density function (PDF). The PDF describes how probability is spread over the possible values of a continuous random variable. Its value at a single point is a density, not a probability. The area under the curve of the PDF over a range of values gives the probability that the random variable falls within that range.

For example, suppose we have a random variable X that is uniformly distributed between 0 and 1. The PDF of X is a horizontal line of height 1 between 0 and 1, and 0 everywhere else. The probability that X falls within the range [a,b] is the area under the PDF curve between a and b. This is given by:

P(a <= X <= b) = integral of PDF between a and b
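As a rough numerical check of this relationship, the sketch below approximates the area under the uniform PDF with a midpoint Riemann sum. The names uniform_pdf and prob_between, the interval [0.25, 0.75], and the step count are illustrative choices made here, not part of any library.

```python
def uniform_pdf(x: float) -> float:
    """PDF of the uniform distribution on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def prob_between(pdf, a: float, b: float, steps: int = 10_000) -> float:
    """Approximate P(a <= X <= b) as the area under pdf using the midpoint rule."""
    width = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * width) for i in range(steps)) * width


# Area of a rectangle of width 0.5 and height 1, so the result is close to 0.5.
p = prob_between(uniform_pdf, 0.25, 0.75)
print(p)
```

For the uniform distribution the integral is just the area of a rectangle, so the numerical result can be checked by hand.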

The CDF is defined as the integral of the PDF from negative infinity up to a given value of x. It gives the probability that the random variable X is less than or equal to x.

Formally, the CDF of a continuous random variable X is defined as:

F(x) = P(X <= x) = integral of PDF from negative infinity to x

The CDF gives the probability accumulated by X up to x. It is a non-decreasing function: it tends to 0 as x goes to negative infinity and to 1 as x goes to positive infinity.
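The two properties just stated (the CDF never decreases, and it runs from 0 up to 1) can be spot-checked numerically. The sketch below uses the standard normal distribution from Python's statistics module purely as an example; the grid from -8 to 8 is an arbitrary choice.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Evaluate the CDF on a grid from -8 to 8 in steps of 0.5.
xs = [i / 2 for i in range(-16, 17)]
values = [std_normal.cdf(x) for x in xs]

# Each value is at least as large as the one before it (non-decreasing).
is_non_decreasing = all(lo <= hi for lo, hi in zip(values, values[1:]))

print(is_non_decreasing)
print(values[0], values[-1])  # close to 0 at the far left, close to 1 at the far right
```

A check on a finite grid is only a sanity test, not a proof, but it makes the shape of the function concrete.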

Let's take the example of the uniform distribution on [0,1]. The PDF of X is given by:

f(x) = 1 for 0 <= x <= 1
f(x) = 0 otherwise

The CDF of X is:

F(x) = P(X <= x) = 0 for x < 0
F(x) = P(X <= x) = integral from 0 to x of f(t) dt = x for 0 <= x <= 1
F(x) = P(X <= x) = 1 for x > 1

On the interval [0,1], the graph of the CDF of the uniform distribution is a diagonal line that goes from (0,0) to (1,1). To the left of 0 it is flat at 0, and to the right of 1 it is flat at 1.
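The piecewise CDF of the uniform distribution on [0,1] translates directly into code. This is a minimal sketch; the function name uniform_cdf is chosen here for illustration.

```python
def uniform_cdf(x: float) -> float:
    """CDF of the uniform distribution on [0, 1]."""
    if x < 0.0:
        return 0.0  # no probability lies below 0
    if x > 1.0:
        return 1.0  # all probability lies at or below 1
    return x        # on [0, 1] the area under the height-1 PDF is x


for x in (-0.5, 0.0, 0.3, 1.0, 2.0):
    print(x, uniform_cdf(x))
```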

One of the main advantages of the CDF is that it allows us to calculate the probability of a random variable taking on a value within a particular range. For example, if we want to find the probability that X is between 0.2 and 0.7, we can simply subtract the CDF value at 0.2 from the CDF value at 0.7:

P(0.2 <= X <= 0.7) = F(0.7) - F(0.2) = 0.7 - 0.2 = 0.5

Once the CDF is known, this gives us a much simpler way to calculate probabilities than working with the PDF directly, which requires integration over the range of interest.
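To illustrate the difference in effort, the sketch below computes the same interval probability in two ways for the standard normal distribution: as a difference of two CDF values, and by numerically integrating the PDF. The interval [-1, 1], the step count, and the variable names are assumptions made for this example.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1
a, b = -1.0, 1.0

# Route 1: difference of two CDF values.
via_cdf = z.cdf(b) - z.cdf(a)

# Route 2: midpoint-rule integration of the PDF over [a, b].
steps = 10_000
width = (b - a) / steps
via_pdf = sum(z.pdf(a + (i + 0.5) * width) for i in range(steps)) * width

print(via_cdf, via_pdf)  # both are roughly 0.6827
```

The first route needs two function evaluations, while the second needs thousands and is still only an approximation.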

Another advantage of the CDF is that it allows us to calculate the percentiles of a random variable. The p-th percentile of a random variable X is the value x_p such that the probability that X is less than or equal to x_p is p/100. In other words, the p-th percentile is the value that cuts off the lowest p% of the distribution.

For example, the 50th percentile of the uniform distribution on [0,1] is simply 0.5, since this is the value that cuts off the lowest 50% of the distribution. The 95th percentile of the standard normal distribution is approximately 1.645, meaning that 95% of the distribution is below this value.

To find percentiles of a random variable X, we can use the inverse of the CDF, also known as the quantile function. The quantile function takes a probability p between 0 and 1 as input and returns the value x_p such that P(X <= x_p) = p. In other words, it gives us the value of X that cuts off the lowest fraction p of the distribution, so the 95th percentile corresponds to p = 0.95.

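Formally, the quantile function of a random variable X is defined as:

Q(p) = inf{x : P(X <= x) >= p}

where inf denotes the infimum (greatest lower bound) of the set of values x that satisfy the inequality. When the CDF is continuous and strictly increasing, Q is simply the inverse function of the CDF.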
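When no closed form for the inverse is available, the infimum in this definition can be approximated by bisection, since the CDF never decreases. The following is a minimal sketch under the assumption that the caller supplies a bracket [lo, hi] with cdf(lo) < p <= cdf(hi); the names quantile and uniform_cdf and the tolerance are illustrative.

```python
def quantile(cdf, p: float, lo: float, hi: float, tol: float = 1e-10) -> float:
    """Approximate inf{x : cdf(x) >= p} by bisection on a non-decreasing cdf.

    Assumes cdf(lo) < p <= cdf(hi), so the answer lies in (lo, hi].
    """
    if not (cdf(lo) < p <= cdf(hi)):
        raise ValueError("bracket [lo, hi] does not contain the quantile")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) >= p:
            hi = mid  # mid satisfies the inequality, so the infimum is at or below mid
        else:
            lo = mid  # mid falls short, so the infimum is above mid
    return hi


def uniform_cdf(x: float) -> float:
    """CDF of the uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)


median = quantile(uniform_cdf, 0.5, lo=-1.0, hi=2.0)
print(median)  # approximately 0.5
```

The loop keeps hi inside the set {x : cdf(x) >= p} and lo outside it, so hi converges to the infimum from above.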

Using the quantile function, we can find the p-th percentile of X by computing Q(p/100). For example, to find the 95th percentile of the standard normal distribution, we would compute Q(0.95) using the inverse of the standard normal CDF or a table of standard normal quantiles.
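In Python, one way to evaluate this quantile without a table is the inv_cdf method of statistics.NormalDist from the standard library. The variable names below are chosen for illustration.

```python
from statistics import NormalDist

z = NormalDist()              # standard normal distribution
x_95 = z.inv_cdf(0.95)        # quantile function evaluated at p = 0.95

print(round(x_95, 3))         # 1.645
print(round(z.cdf(x_95), 6))  # 0.95
```

Applying the CDF to the result recovers the original probability, which is what it means for the quantile function to be the inverse of the CDF.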

In summary, the CDF is a fundamental concept in probability theory that gives the probability that a random variable is less than or equal to a given value. It allows us to calculate the probability of a random variable taking on a value within a particular range and to find the percentiles of the distribution. The CDF is a non-decreasing function that rises from 0 to 1 as the value of the random variable goes from negative infinity to positive infinity. The inverse of the CDF, known as the quantile function, can be used to find the values of the random variable that cut off a specified percentage of the distribution.