/Probability Theory for Physically Based Rendering

## Probability Theory for Physically Based Rendering

Introduction

Rendering frequently involves the evaluation of multidimensional definite integrals: e.g., the visibility of an area light, radiance arriving over the area of a pixel, radiance arriving over a period of time, and the irradiance arriving over the hemisphere of a surface point. Evaluation of these integrals is typically done using Monte-Carlo integration, where the integral is replaced by the expected value of a stochastic experiment.

This document details the basic process of Monte-Carlo integration, as well as several techniques to reduce the variance of the approach. This will be done from a practical point of view: it is assumed that the reader is not intimately familiar with probability theory, but still wants to benefit from it for the development of efficient yet correct rendering algorithms.

Definite Integrals

A definite integral is an integral of the form $int_{a}^{b}f(x)dx$, where $[a,b]$ is an interval (or domain), $x$ is a scalar and $f(x)$ is a function that can be evaluated for each point in the interval. As worded by Wikipedia, a definite integral is defined as the signed area in the $x$-plane that is bounded by the graph of $f$, the $x$-axis and the vertical lines $x=a$ and $x=b$ (Figure 1a).

The concept extends intuitively to higher dimensions: for a definite double integral, signed area becomes signed volume (Figure 1b) and in general, for definite multiple integrals, this becomes the signed hyper-volume.

In some cases, the area can be determined analytically, e.g. for $f(x)=2$: for the domain $[a,b]$, the area is simply $2(b-a)$. In other cases an analytical solution is impossible, for example when we want to know the volume of the part of the iceberg that is above the water (Figure 1c). In such cases, $f(x)$ can often only be determined by sampling.

Numerical Integration

We can estimate the area of complex integrals using numerical integration. One example of this is the Riemann sum. We calculate this sum by dividing the region in regular shapes (e.g. rectangles), that together form a region that is similar to the actual region. The Riemann sum is defined as:

$$tag{1}S=sum_{i=1}^{n}f(x_i)Delta x_i$$

Here, $n$ is the number of subintervals, and $Delta x_i=frac{b-a}{n}$ is the width of one subinterval. For each interval $i$, we sample $f$ at a fixed location $x_i$ in the subinterval (in Figure 2: at the start of the subinterval).

Note that as we increase $n$ the Riemann sum will converge to the actual value of the integral:

$$tag{2}int_{a}^{b}f(x)dx=lim_{||Delta x||to 0}sum_{i=1}^{n}f(x_i)Delta x_i$$

Riemann sums also work in higher dimensions (Figure 3). However, we run into a problem: for a function with two parameters, the number of subintervals must be much larger if we want to have a resolution that is comparable to what we used in the 2D case. This effect is known as the curse of dimensionality, and is amplified in higher dimensions.

We will now evaluate the accuracy of the Riemann sum for the following (deliberately convoluted) function:

$$tag{3}f(x)=left |sin left (frac{1}{2}x+frac{pi}{2} right )tan frac{x}{27}+sin left (frac{3}{5}x^2 right )+frac{4}{x+pi+1}-1right |$$

A plot of the function over the domain $[-2.5,2.5]$ is shown below. For reference, we calculate the definite integral $int_{-2.5}^{2.5}f(x)$ via Wolfram Alpha, which reports that the area is $3.12970$. The plot on the right shows the accuracy of numerical integration using the Riemann sum for increasing $n$.

To put some numbers on the accuracy: for $n=50$, the error is $~2times10^{-3}$. At $n=100$, the error is $~3times10^{-4}$. Another order of magnitude is obtained for $n=200$.

sums, consider these resources:

Monte
Carlo (1)

For rendering, few integrals (none?) are univariate. This means that we quickly meet the curse of dimensionality. On top of that, sampling a function at regular intervals is prone to undersampling and aliasing: we may miss important values in the function, or end up with unintended interference between the sampled function and the sample pattern (Figure 5).

We solve these problems using a technique known as Monte Carlo integration. Similar to the Riemann sum this involves sampling the function at a number of points, but unlike the deterministic pattern in the Riemann sum, we turn to a fundamentally non-deterministic ingredient: random numbers.

Monte Carlo integration is based on the observation that an integral can be replaced by the expected value of a stochastic experiment:

$$tag{4}int_{a}^{b}f(x)dx=(b-a)E left [f(X) right ]approx frac{b-a}{n}sum_{i=1}^{n}f(X)$$

In other words: we sample the function $n$ times at random locations within the domain (denoted by capital $X$), average the samples, and multiply by the width of the domain (for a univariate function). As with the Riemann sum, as $n$ approaches infinity, the average of the samples converges to the expected value, and thus the true value of the integral.

A
Bit of Probability Theory

It is important to grasp all the individual concepts here. Let’s start with the expected value: this is the value we expect for a single sample. Note that this is not necessarily a possible value, which may be counter-intuitive. For example, when we roll a die, the expected value is $3.5$: the average of all possible outcomes, $(1+2+3+4+5+6)/6=21/6=3.5$.

The second concept is random numbers. It may sound obvious, but what we need for Monte Carlo integration are uniformly distributed random numbers, i.e. every value must have an equal probability of being generated. More on this later.

A third concept is deviation, and, related to that, variance. Even when we take a small number of samples, the expected average value as well as the expected value of each individual sample is the same. However, evaluating Equation 4 rarely will actually produce this value. Deviation is the difference between the expected value and the outcome of the experiment: $X-E(X)$.

In practice, this deviation has an interesting distribution:

This is a plot of the normal distribution or bell curve: it shows that not all deviations are equally probable. In fact, ~68.2% of our samples are within the range $-1sigma..1sigma$, where $sigma$  (sigma) is the standard deviation. Two useful ways to describe ‘standard deviation’ are:

• Standard deviation is a measure of the dispersion of the data.
• Standard deviation is the expected absolute deviation: $sigma=Eleft [left |X-E left (Xright )right |right ]$.

To determine the standard deviation we have two methods:

1. Standard deviation $sigma=sqrt{frac{1}{n}sum_{i=1}^{n}left (X_i-Eleft [Xright ]right )^2}$: this works if we have a discrete probability distribution and the expected value $E[X]$ is known. This is true for dice, where $X={1,2,3,4,5,6}$ and $E[X]=3.5$. Plugging in the numbers, we get $sigma=1.71$.
2. Alternatively, we can calculate the sample standard deviation using $sigma=sqrt{frac{1}{n-1}sum_{i=1}^{n}left (X_i-Xright )^2}$. More information on Wikipedia.

Sanity check: does this make sense? If $sigma=1.71$, we claim that 68.2% of the samples are within 1.71 from 3.5. We know that ${2,3,4,5}$ satisfy this criterion, $1$ and $6$ do not. Four out of six is 66.7%. Only if our dice would have been able to produce any value in the range $[1..6]$ we would have hit the 68.2% mark exactly.

Instead of standard deviation we frequently use the related term variance, which is defined simply as $Varleft [Xright ]=sigma^2$. Being a square, variance is always positive, which helps in our calculations.

Monte Carlo (2)

In an earlier section we evaluated Equation 3 using a Riemann sum. We will now repeat this experiment using Monte Carlo integration. Recall that Monte Carlo integration is defined as

$$tag{5}int_{a}^{b}f(x)dx=(b-a)E left [f(X) right ]approx frac{b-a}{n}sum_{i=1}^{n}f(X)$$

A straight translation to C code:

double sum = 0;
for( int i = 0; i < n; i++ ) sum += eval( Rand( 5 ) - 2.5 );
sum = (sum * 5.0) / (double)n;

The result for $n=2$ to $n=200$ is shown in the graph below. This suggests that Monte Carlo integration performs much worse than the Riemann sum. A closer inspection of the error shows that for $n=200$ the average error of the Riemann sum is $0.0002$, while the error for Monte Carlo is $0.13$.

In higher dimensions, this difference is reduced, but not eliminated. The following equation is an expanded version of the one we used before, taking two parameters:

$$f(x,y)=left |sinleft( frac{1}{2}x+frac{pi}{2}right )tan frac{x}{27}+sin left ( frac{1}{6}x^2right )+frac{4}{x+pi+1}-1right |left |sinleft ( 1.1yright )cosleft (2.3xright )right |$$

$(6)$

Over the domain $x∈[-2.5,2.5],y∈[-2.5,2.5]$, the volume bounded by this function and the $xy$-plane is $6.8685$. At $n=400$ (20×20 samples), the error of the Riemann sum is $0.043$. At the same sample count, the average error using Monte Carlo integration is $0.33$. This is better than the previous result, but the difference is still significant. To understand this problem we investigate a well-known variance reduction technique for Monte Carlo integration: stratification.

Stratification improves the uniformity of random numbers. In Figure 8a, eight random numbers are used to sample the function. Since each number is picked at random, they often are not evenly distributed over the domain. Figure 8b shows the effect of stratification: the domain is subdivided in eight strata, and in each stratum a random position is chosen, improving uniformity.

The effect on variance is quite pronounced. Figure 9a shows a plot of the results with and without stratification. Figure 9b shows the error in the estimate. For $n=10$, the average error for 8 strata is $0.05$; for 20 strata $0.07$ and for 200 strata it drops to $0.002$. Based on these results it is tempting to use a large number of strata. Stratification has drawbacks however, which amplify with an increasing stratum count. First of all, the number of samples must always be a multiple of the number of strata and secondly, like the Riemann sum, stratification suffers from the curse of dimensionality.

Importance Sampling

In the previous sections, we have sampled equations uniformly. An extension to the Monte Carlo integrator allows us to change that:

$$tag{7}int_{a}^{b}f(x)dx=(b-a)E left [f(X) right ]approx frac{b-a}{n}sum_{i=1}^{n}frac{f(X)}{p(X)}$$

Here, $p(X)$ is a probability density function (pdf): it specifies the relative probability that a random variable takes on a particular value.

For a uniform random variable in the range $0..1$, the pdf is simply 1 (Figure 10a), which means that each value has an equal probability of being chosen. If we integrate this function over the domain $[0,0.5]$ we get $0.5$: the probability that $X. For $X>frac{1}{2}$, we obviously get the same probability.

Figure 10b shows a different pdf. This time, the probability of generating a number smaller than $frac{1}{2}$ is 70%. This is achieved with the following code snippet:

float SamplePdf()
{
if (Rand() < 0.7f) return Rand( 0.5f );
else return Rand( 0.5f ) + 0.5f;
} 

This pdf is defined as:

$$tag{8}p(x)=left{begin{matrix}1.4,if x

The numbers $1.4$ and $0.6$ reflect the requirement that the probability that $x is 70%. Integrating the pdf over $[0..frac{1}{2}$ yields $1.4timesfrac{1}{2}$, and $0.6timesfrac{1}{2}$ equals $0.3$. This illustrates an important requirement for pdfs in general: the pdf must integrate to 1. Another requirement is that $p(x)$ cannot be zero if $f(x)$ is not zero: this would mean that parts of $f$ have a zero probability of being sampled, which obviously affects the estimate.

of the pdf:

• A single value of the pdf does not represent a probability: the pdf can therefore locally be
greater than 1 (e.g., in the pdf we just discussed).
• The integral over (part of) the domain of the pdf however is a probability, and therefore the pdf
integrates to 1.

A single value can be interpreted as the relative likelihood that a particular value occurs.

Note that the normal distribution is a probability density function: it provides us with a probability that some random variable falls within a certain range. In the case of the normal distribution, this random variable is deviation from the mean. Like a well-mannered pdf, the normal distribution integrates to 1.

Equation 7 thus allows for non-uniform sampling. It compensates for this by dividing each sample by the relative chance it is picked. Why this matters is illustrated in Figure 11a. The plotted function features a significant interval where its value is $0$. Sampling this region is useless: nothing is added to the sum, we just divide by a larger number. Recall the iceberg in Figure 1c: there is no point in sampling the height in a large area around the iceberg.

A pdf that exploits this knowledge about the function is shown in Figure 11b. Notice that this pdf actually is zero for a range of values. This does not make it an invalid pdf: the function is zero at the same locations. We can extend this idea beyond zero values. Samples are best spent where the function has significant values. In fact, the ideal pdf is proportional to the function we are sampling. A very good pdf for our function is shown in Figure 12a. An even better pdf is shown in Figure 12b. In both cases, we must not forget to normalize it, so it integrates to 1.

The pdfs in Figure 12 pose two challenges:

1. how do we create such pdfs;
2. how do we sample such pdfs?

The answer to both questions is: we don’t. In many cases the function we
wish to integrate is unknown, and the only way to determine where it is
significant is by sampling it – which is precisely what we need the pdf for; a
classic chicken and egg situation.

In other cases however, we have a coarse
idea of where we may expect the function to yield higher values, or zero
values. In those cases, a crude pdf is often better than no pdf.

We may also be able to build the pdf on-the-fly. A couple of samples estimate the shape of the function, after which we aim subsequent samples at locations where we expect high values, which we use to improve the pdf, and so on.

In the next article we apply these concepts to rendering. A significant challenge is to construct pdfs. We will explore several cases where pdfs can help sampling.