Notes for 3D Gaussian Splatting renderer

This blogpost is a web version of my notes for my “WebGPU 3D Gaussian Splatting” renderer. I’ve focused only on the forward path, especially what to do after we already have the covariance matrix. There are 2 main approaches here:

  • Use the larger eigenvalue to find a square around the Gaussian’s mean. Most often used with a tiled renderer.
  • Use eigenvectors to find the axes of the ellipse. Place vertices on these axes.

I’ve described the math for both approaches.

These are my notes, so the notation is going to be sketchy. Don’t worry, my tiny handwriting would have been much worse to read! I don’t work in this sector. It’s been a while since I had to use LaTeX on this blog. If you get something out of this page, then that’s added value for you. If not, well, it’s not my problem.

Our main sources will be:

  • “3D Gaussian Splatting for Real-Time Radiance Field Rendering” by Kerbl et al.
  • “EWA Volume Splatting” by Zwicker et al.
  • The reference CUDA implementation: https://github.com/graphdeco-inria/diff-gaussian-rasterization

Goal

Each Gaussian’s transform is described using position, rotation, and scale. When projected onto a screen, it will have the shape of an ellipse. You might remember from principal component analysis that the eigenvectors of the covariance matrix form the axes of this ellipse. Our goal is to get the image space covariance matrix.


Principal component analysis of a multivariate Gaussian distribution. The vectors shown are the eigenvectors of the covariance matrix.

Covariance 3D

Each Gaussian has a scale (vector [s_x, s_y, s_z]) and rotation (quaternion with real part q_r and imaginary parts q_i, q_j, q_k). We need transformation matrices for both.

Scale:

S = \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & s_z \end{bmatrix}

Rotation (assuming unit quaternion):

R = \begin{bmatrix} 1 - 2 (q_j^2 + q_k^2) & 2 (q_i q_j - q_k q_r) & 2 (q_i q_k + q_j q_r) \\ 2 (q_i q_j + q_k q_r) & 1 - 2 (q_i^2 + q_k^2) & 2 (q_j q_k - q_i q_r) \\ 2 (q_i q_k - q_j q_r) & 2 (q_j q_k + q_i q_r) & 1 - 2 (q_i^2 + q_j^2) \end{bmatrix}

From the definition of the covariance matrix you might notice that XX^T is proportional to the sample covariance matrix of the dataset X (if the mean is 0, each entry of the sum becomes a dot product, just like in matrix multiplication). How does the covariance change if we have 2 transformation matrices? Let O = RS be the combined transformation. Then the covariance of this 3D Gaussian is \Sigma = OO^T. From the properties of transposition we know that (AB)^T = B^T A^T, so \Sigma = RSS^TR^T. The scale matrix is diagonal, so S^T = S. The transposition of the rotation matrix is equal to its inverse. This might come in handy during debugging. Be prepared for left/right-handedness issues.
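
If it helps, here is a minimal TypeScript sketch of this step. The matrix helpers and names are mine (not from any of the sources), and I assume the quaternion is already normalized:

type Mat3 = number[][];

function matMul3(a: Mat3, b: Mat3): Mat3 {
  const out: Mat3 = [[0, 0, 0], [0, 0, 0], [0, 0, 0]];
  for (let r = 0; r < 3; r++)
    for (let c = 0; c < 3; c++)
      for (let k = 0; k < 3; k++) out[r][c] += a[r][k] * b[k][c];
  return out;
}
const transpose3 = (m: Mat3): Mat3 => m[0].map((_, c) => m.map((row) => row[c]));

// Sigma = R S (R S)^T = R S S^T R^T. Symmetric and positive semi-definite by construction.
function covariance3d(s: [number, number, number], q: [number, number, number, number]): Mat3 {
  const [sx, sy, sz] = s;
  const [qr, qi, qj, qk] = q; // assumed to be a unit quaternion
  const S: Mat3 = [[sx, 0, 0], [0, sy, 0], [0, 0, sz]];
  const R: Mat3 = [
    [1 - 2 * (qj * qj + qk * qk), 2 * (qi * qj - qk * qr), 2 * (qi * qk + qj * qr)],
    [2 * (qi * qj + qk * qr), 1 - 2 * (qi * qi + qk * qk), 2 * (qj * qk - qi * qr)],
    [2 * (qi * qk - qj * qr), 2 * (qj * qk + qi * qr), 1 - 2 * (qi * qi + qj * qj)],
  ];
  const O = matMul3(R, S); // O = RS
  return matMul3(O, transpose3(O));
}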

Covariance is the parameter of the Gaussian. It’s better if you don’t think of it as “calculating the covariance matrix given scale and rotation”. Instead, the Gaussian is parameterized by the covariance. It just so happens that RSS^TR^T gives a valid covariance matrix. Not sure if this makes sense, but I found this framing clearer. Additionally, backpropagation is easier with rotation and scale than with the raw matrix. Covariance matrices have to be symmetric and positive semi-definite.


Model rendered with the inverted rotation matrix.

Covariance 2D

In 3D graphics, to project a vertex into clip space we multiply its position by the view and projection matrices. Let W denote our view matrix. For Gaussians we will use the following Jacobian instead of a projection matrix:

t = pW
J = \begin{bmatrix} focal_x / t.z & 0 & -(focal_x \cdot t.x) / (t.z \cdot t.z) & 0 \\ 0 & focal_y / t.z & -(focal_y \cdot t.y) / (t.z \cdot t.z) & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}
  • p - position of the Gaussian in world space.
  • t - position of the Gaussian in view space.
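
A small TypeScript sketch of the non-zero block of J (the helper name and the 2x3 shape are my choice; computing t = pW is left to whatever matrix library you use):

// The non-zero 2x3 block of the Jacobian J for a Gaussian at view-space position t (t = pW).
function jacobian2x3(t: [number, number, number], focalX: number, focalY: number): number[][] {
  const [tx, ty, tz] = t;
  return [
    [focalX / tz, 0, -(focalX * tx) / (tz * tz)],
    [0, focalY / tz, -(focalY * ty) / (tz * tz)],
  ];
}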

From what I understand, the mapping to perspective space introduces perspective distortion: depending on where on screen the Gaussian is rendered, it would have a different size. The paper mentions that this mapping is not affine, i.e. it does not preserve parallelism, so ray space is used instead. Check the explanation for the Jacobian in “EWA Volume Splatting”, section 4.4 “The Projective Transformation”, for more details.


Rendered model with projection matrix used instead of Jacobian.

The values for the focal distance depend on the camera settings. They can be derived from the perspective matrix values and the geometric definition of the tangent:

const focalX = projectionMat[0] * viewport.width * 0.5;
const focalY = projectionMat[5] * viewport.height * 0.5;

The perspective matrix calculation uses only half of the provided field of view, hence the multiplication by 0.5. Depending on your coordinate system, you might have to transpose the matrix first.
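
For reference, assuming a standard symmetric perspective matrix (its [1][1] entry, i.e. projectionMat[5] in a flat array, equals 1 / \tan(fov_y / 2)), this works out to:

focal_y = projectionMat[5] \cdot \frac{viewport.height}{2} = \frac{viewport.height}{2 \cdot \tan(fov_y / 2)}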

With this, we can calculate \Sigma':

\Sigma' = JW \Sigma W^T J^T

This uses the same trick as we have used for the \Sigma calculation.

We are only interested in the 2D projection of the Gaussian (we discard depth). Get the 2D variance matrix from \Sigma' by removing the third and fourth rows and columns:

\Sigma' = \begin{bmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \\ m & n & o & p \end{bmatrix} \Leftrightarrow cov2d = \begin{bmatrix} a & b \\ e & f \end{bmatrix}

If the determinant of cov2d is ≤ 0, the shape would be a parabola or a hyperbola instead of an ellipse.

At this point, the reference implementation adds 0.3 \cdot I to cov2d. It’s done to ensure that the matrix is invertible. You can find the detailed explanation in AI葵’s “Gaussian Splatting Notes (WIP)”.
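
Here is a minimal TypeScript sketch of this whole step, under the assumption that W is the rotational 3x3 part of the view matrix and J is the 2x3 block from before (generic helpers, not the reference implementation):

function matMul(a: number[][], b: number[][]): number[][] {
  return a.map((row) => b[0].map((_, c) => row.reduce((acc, v, k) => acc + v * b[k][c], 0)));
}
const transpose = (m: number[][]) => m[0].map((_, c) => m.map((row) => row[c]));

// cov2d = (JW) Sigma (JW)^T, followed by the 0.3 * I addition from the reference implementation.
function covariance2d(sigma3d: number[][], viewRot3x3: number[][], J2x3: number[][]): number[][] {
  const T = matMul(J2x3, viewRot3x3); // T = JW, a 2x3 matrix
  const cov = matMul(matMul(T, sigma3d), transpose(T)); // 2x2
  cov[0][0] += 0.3; // keep the matrix invertible
  cov[1][1] += 0.3;
  return cov; // [[a, b], [b, c]]
}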

Calculating eigenvalues

I recommend reading the “Eigenvalues and eigenvectors” article on Wikipedia. The “Matrix examples” section has nice examples.

Let’s calculate the eigenvalues. To make notation simpler, I’ve assigned a letter to each element of the 2D matrix cov2d.

cov2d = \begin{bmatrix} a & b \\ b & c \end{bmatrix}
det(cov2d - \lambda I) = det \begin{bmatrix} a-\lambda & b \\ b & c-\lambda \end{bmatrix} = \lambda^2 - (a+c)\lambda + ac - b^2

det(cov2d - \lambda I) = 0 when (calculating the roots of the quadratic equation):

\lambda_1 = 0.5 \cdot (a + c + \sqrt{a^2 - 2ac + 4b^2 + c^2})
\lambda_2 = 0.5 \cdot (a + c - \sqrt{a^2 - 2ac + 4b^2 + c^2})

Remember that a^2 - 2ac + 4b^2 + c^2 = (a-c)^2 + 4b^2. Might make the implementation easier.

The \lambda_1 and \lambda_2 (the eigenvalues) are the variances of the projected Gaussian along the ellipse’s major and minor axes. Since the square root term is never negative, \lambda_1 \geq \lambda_2, as you would expect for the major and minor axes.
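
A tiny TypeScript sketch of the eigenvalue computation (the function name is mine):

// Eigenvalues of the symmetric 2x2 matrix cov2d = [[a, b], [b, c]].
function eigenvalues2d(a: number, b: number, c: number): [number, number] {
  const mid = 0.5 * (a + c);
  // disc = 0.5 * sqrt((a - c)^2 + 4b^2), clamped for numerical safety
  const disc = Math.sqrt(Math.max(0, mid * mid - (a * c - b * b)));
  return [mid + disc, mid - disc]; // [lambda1, lambda2], lambda1 >= lambda2
}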

Right now we have 2 options. We could use the eigenvectors to derive vertex positions, but this is not what the “3D Gaussian Splatting” paper did. Instead, they collected all the Gaussians affecting each tile of their tiled renderer using a conservative approximation. We will derive both solutions, starting with the one used in the paper.

Method 1: Project Gaussian to a square

Calculate tiles

The Gaussian distribution has a non-zero value even at infinity. This means that every Gaussian affects every pixel, most of them negligibly. We want to know which nearby tiles it affects the most. Look at the 1D Gaussian distribution and find a radius “r” so that [-r, r] contains most of the area. For the normal distribution with \mu=0, \sigma=1:

Error = 1 - \int_{-r}^{r} \frac{e^{-x^2/2}}{\sqrt{2\pi}} \, dx

Normal distribution with \mu=0, \sigma=1. Percentage numbers for the relative area between each standard deviation.

For example, if you pick r=3, the integral has a value of ~0.9973, so the approximation has an error of ~0.27%. As you might suspect, the variance affects the calculations. We end up with r = 3 \sqrt{\sigma}, where \sigma is the variance of the Gaussian. For us, it’s \lambda_1, \lambda_2 (where \lambda_1 is the bigger value).

let radius = ceil(3.0 * sqrt(max(lambda1, lambda2)));

The radius is in pixels. We also have the Gaussian’s position projected into image space. Every tile within the radius around the projected position is affected by the Gaussian. To map it to tiles, we calculate the affected square: projectedPosition.xy ± radius.xy.

var projPos = mvpMatrix * gaussianSplatPositions[idx];
// (we can also cull the Gaussian here if we want)
projPos /= projPos.w; // perspective divide
let projPosPx = ndc2px(projPos.xy, viewportSize);
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/59f5f77e3ddbac3ed9db93ec2cfe99ed6c5d121d/cuda_rasterizer/auxiliary.h#L46
let BLOCK_X = 16; let BLOCK_Y = 16; // tile sizes
let blockSize = vec2<i32>(BLOCK_X, BLOCK_Y);
// block count for x,y axes
let grid = (viewportSize.xy + blockSize.xy - 1) / blockSize.xy;
let rectMin = vec2<i32>(
  min(grid.x, max(0, i32((projPosPx.x - radius) / BLOCK_X))),
  min(grid.y, max(0, i32((projPosPx.y - radius) / BLOCK_Y)))
);
let rectMax = vec2<i32>(
  min(grid.x, max(0, i32((projPosPx.x + radius + BLOCK_X - 1) / BLOCK_X))),
  min(grid.y, max(0, i32((projPosPx.y + radius + BLOCK_Y - 1) / BLOCK_Y)))
);

With this, we know which tiles are affected by each Gaussian.

Depth sorting

For each tile, we will need its Gaussians sorted by the distance to the camera (closest first). For sorting on the GPU, radix sort is popular, but bitonic sort is much easier to implement. On the CPU, you can use your language’s built-in sort for terrible performance, or counting sort for something 6+ times better (according to my app’s profiler).
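
For the curious, here is a counting sort sketch in TypeScript, roughly what I mean by sorting by quantized depth on the CPU. The bucket count is an arbitrary assumption:

// Counting sort by quantized view-space depth; `depths[i]` is splat i's distance to the camera.
function sortSplatsByDepth(depths: Float32Array, bucketCount = 65536): Uint32Array {
  const n = depths.length;
  let minD = Infinity, maxD = -Infinity;
  for (let i = 0; i < n; i++) {
    minD = Math.min(minD, depths[i]);
    maxD = Math.max(maxD, depths[i]);
  }
  const scale = (bucketCount - 1) / Math.max(maxD - minD, 1e-6);
  const bucketOf = (i: number) => ((depths[i] - minD) * scale) | 0;

  const counts = new Uint32Array(bucketCount);
  for (let i = 0; i < n; i++) counts[bucketOf(i)]++;
  for (let b = 1; b < bucketCount; b++) counts[b] += counts[b - 1]; // exclusive end offsets

  const sortedIndices = new Uint32Array(n);
  for (let i = n - 1; i >= 0; i--) sortedIndices[--counts[bucketOf(i)]] = i; // stable, closest first
  return sortedIndices;
}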

Gaussian’s value at pixel

You might know the general form of the Gaussian distribution:

f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{1}{2} \left( \frac{x-\mu}{\sigma} \right)^2 \right)

“EWA Volume Splatting” provides a function (eq. 9) for the density of an elliptical Gaussian G_V(x - p) centered at a point p with a variance matrix V:

G_V(x-p) = \frac{1}{2\pi |V|^{\frac{1}{2}}} \exp\left( -\frac{1}{2} (x-p)^T V^{-1} (x-p) \right)

The above formula depends on the dimensionality. Only the (2\pi)^{-d/2} factor changes. Here, “d” denotes dimensionality (2 in our case).

In our case, V = the cov2d matrix. We need its determinant (|V|) and inverse V^{-1}:

V^{-1} = adj(V) / |V|
adj\left( \begin{bmatrix} a & b \\ b & c \end{bmatrix} \right) = \begin{bmatrix} c & -b \\ -b & a \end{bmatrix}

In “EWA Volume Splatting”, the inverse of the variance matrix is called the conic matrix. You might have seen this word used as a variable name.
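
A short TypeScript sketch of this step; the packing into 3 floats mirrors the con_o.xyz layout you will see in the reference code below, but the function itself is mine:

// Determinant and inverse ("conic") of cov2d = [[a, b], [b, c]].
function conicOf(a: number, b: number, c: number) {
  const det = a * c - b * b;
  if (det <= 0) return undefined; // degenerate Gaussian, skip it
  const invDet = 1.0 / det;
  // V^{-1} = adj(V) / |V|; packed as 3 floats since the matrix is symmetric
  return { conic: [c * invDet, -b * invDet, a * invDet], det };
}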


Projected 3D Gaussian. \mu (“p” in “EWA Volume Splatting” notation) is the projected mean of the Gaussian. The “x” is the center of an example pixel.

We will now look at -\frac{1}{2} (x-p)^T V^{-1} (x-p). Let (x-p) = \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} and:

V^{-1} = \begin{bmatrix} a & b \\ b & c \end{bmatrix}

After matrix multiplication and a dot product:

-\frac{1}{2} (x-p)^T V^{-1} (x-p) \Leftrightarrow -\frac{1}{2} (ax_0^2 + 2bx_0y_0 + cy_0^2)

We can then calculate exp(-\frac{1}{2} (ax_0^2 + 2bx_0y_0 + cy_0^2)) as an exponential falloff from the Gaussian’s projected mean at the pixel. Multiply the result by the Gaussian’s innate opacity (learned during the training). The \frac{1}{2\pi |V|^{\frac{1}{2}}} was removed (?). Your trained model file will not display correctly if you add it back in.

Blending

Transparent objects are rendered in depth order; here we go front to back, starting from the closest ones. Blend N Gaussians for a pixel using the following formula (eq. 3 from the “3D Gaussian Splatting” paper):

C = \sum_{i \in N} c_i \alpha_i \prod_{j=1}^{i-1}(1-\alpha_j)
  • c_i - RGB color of the i-th Gaussian.
  • \alpha_i - the Gaussian’s falloff at the current pixel (w.r.t. its projected position) combined with the opacity learned during training.

This equation can be rephrased as:

  1. The closest Gaussian writes its color c_0. Let’s say that \alpha_0 = 0.7, then we write result_0 = rgba(0.7 \cdot c_0, 0.7).
  2. The next closest Gaussian (let’s say \alpha_1 = 0.9) blends its color: result_1 = rgba(result_0.rgb + 0.9 \cdot c_1 \cdot (1 - result_0.a), result_0.a + 0.9 \cdot (1 - result_0.a)).
  3. The third closest Gaussian (let’s say \alpha_2 = 0.5) blends its color: result_2 = rgba(result_1.rgb + 0.5 \cdot c_2 \cdot (1 - result_1.a), result_1.a + 0.5 \cdot (1 - result_1.a)).
result_{n+1}.rgb = result_n.rgb + c_{n+1} \cdot \alpha_{n+1} \cdot (1 - result_n.a)
result_{n+1}.a = result_n.a + \alpha_{n+1} \cdot (1 - result_n.a)

Once the first Gaussian has written color with \alpha_0 = 0.7, the subsequent Gaussians can only affect the remaining 0.3 of the color. The 2nd Gaussian had \alpha_1 = 0.9. It writes its color with effective opacity (1 - 0.7) \cdot 0.9 = 0.27. The 3rd Gaussian can only affect (1 - 0.7) \cdot (1 - 0.9) = 0.3 \cdot 0.1 = 0.03 of the output color. This repeats till we have gone through all Gaussians that touch this pixel. Writing to result_i.a is a convenient way to carry over the alpha calculations to the next Gaussians.
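
Here is the same blending written as a small TypeScript sketch (assuming the splats are already sorted closest first and alpha already includes the falloff at this pixel times the learned opacity):

// Front-to-back blending of one pixel, Eq. (3).
function blendPixel(splats: { color: [number, number, number]; alpha: number }[]): [number, number, number] {
  const out: [number, number, number] = [0, 0, 0];
  let T = 1.0; // transmittance: how much of the pixel is still "unclaimed"
  for (const { color, alpha } of splats) {
    out[0] += color[0] * alpha * T;
    out[1] += color[1] * alpha * T;
    out[2] += color[2] * alpha * T;
    T *= 1.0 - alpha;
    if (T < 1e-4) break; // saturated, the rest contributes next to nothing
  }
  return out;
}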

At some point, the impact of subsequent Gaussians will be negligible. From the paper:

“When we reach a target saturation of 𝛼 in a pixel, the corresponding thread stops. At regular intervals, threads in a tile are queried and the processing of the entire tile terminates when all pixels have saturated (i.e., 𝛼 goes to 1). (…) During rasterization, the saturation of 𝛼 is the only stopping criterion. In contrast to previous work, we do not limit the number of blended primitives that receive gradient updates. We enforce this property to allow our approach to handle scenes with an arbitrary, varying depth complexity and accurately learn them, without having to resort to scene-specific hyperparameter tuning”.

Code

Below is the corresponding code from the original repo. I’ve changed the formatting and substituted the comments to correlate the code with the equations above.

float2 xy = collected_xy[j];
float2 d = { xy.x - pixf.x, xy.y - pixf.y }; // (x-p), in my equations it's [x_0, y_0]
// Values for conic. Assigning the values that I have used:
// a = con_o.x
// b = con_o.y
// c = con_o.z
float4 con_o = collected_conic_opacity[j];
float power = -0.5f * (con_o.x * d.x * d.x + con_o.z * d.y * d.y) - con_o.y * d.x * d.y;
if (power > 0.0f)
  continue;
// con_o.w is the Gaussian's alpha learned during the training.
// alpha is the opacity if there was no blending
float alpha = min(0.99f, con_o.w * exp(power));
if (alpha < 1.0f / 255.0f) // skip if alpha too small
  continue;
// Blending accumulation. T is from the previous iteration
float test_T = T * (1 - alpha);
if (test_T < 0.0001f) {
  done = true;
  continue;
}
// Eq. (3) from 3D Gaussian splatting paper.
for (int ch = 0; ch < CHANNELS; ch++) {
  // Gaussian's learned color * opacity
  C[ch] += features[collected_id[j] * CHANNELS + ch] * alpha * T;
}
T = test_T; // blending accumulation to reuse during the next loop

We have now seen how to render the Gaussians by projecting them to squares. This technique is often used in tiled (compute shader-based) renderers. Alternatively, we can use the eigenvectors to calculate vertex positions in the render pipeline.

Method 2: Calculate eigenvectors

Sorting and the index buffer

We will be blending the Gaussians into the framebuffer. First, sort them by depth (closest to the camera first). A popular way is to use an index buffer. Render each Gaussian as a quad (2 triangles). With triangle list topology, each triangle has 3 indices, for a total of 6 indices (vertex shader invocations) per Gaussian. In the vertex shader, we need to know both the splat index and which vertex of the splat we are working with. Steps:

  1. Sort the Gaussians and write the Gaussian’s index into a u32[] buffer of size BYTES_U32 * splatCount.
  2. Create a buffer for “unrolled” indices of size BYTES_U32 * splatCount * 6.
  3. Fill the “unrolled” buffer based on the sorted Gaussians (see code below).
  4. Read splatIdx and vertexInSplatIdx inside vertex shader:
    1. @builtin(vertex_index) in_vertex_index: u32 or gl_VertexID depending on the shading language.
    2. let splatIdx = in_vertex_index / 6u;
    3. let vertexInSplatIdx = in_vertex_index % 6u;

Code for “unrolling” the sorted splatIds to encode both splatIdx and vertexInSplatIdx:

for (let i = 0; i < splatCount; i++) {
  const splatIdx = sortedSplatIndices[i];
  // write 6 consecutive values
  unrolledSplatIndices[i * 6 + 0] = splatIdx * 6 + 0;
  unrolledSplatIndices[i * 6 + 1] = splatIdx * 6 + 1;
  unrolledSplatIndices[i * 6 + 2] = splatIdx * 6 + 2;
  unrolledSplatIndices[i * 6 + 3] = splatIdx * 6 + 3;
  unrolledSplatIndices[i * 6 + 4] = splatIdx * 6 + 4;
  unrolledSplatIndices[i * 6 + 5] = splatIdx * 6 + 5;
}

Afterward, bind the index buffer and draw splatCount * 6 vertices.
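
On the WebGPU side this is just one indexed draw. A minimal sketch, assuming the render pass, pipeline, and buffers were created earlier:

// `pass` is a GPURenderPassEncoder with the splat pipeline and bind groups already set.
pass.setIndexBuffer(unrolledIndexBuffer, "uint32");
pass.drawIndexed(splatCount * 6); // 6 indices (2 triangles) per Gaussian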

Vertex shader - eigenvectors

We will continue from the point where we have calculated \lambda_1 and \lambda_2. Remember this image from the beginning of the article?


Principal component analysis of a multivariate Gaussian distribution. The vectors shown are the eigenvectors of the covariance matrix.

Continuing with the eigenvalues and eigenvectors article, calculate the eigenvector for \lambda_1:

\begin{bmatrix} a-\lambda_1 & b \\ b & c-\lambda_1 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\begin{cases} v_1(a-\lambda_1) + v_2 b = 0 \\ v_1 b + v_2(c-\lambda_1) = 0 \end{cases}

Expressing v_2 in terms of v_1, so we have [v_1, v_2]^T that we can normalize:

v_1(a-\lambda_1) + v_2 b = v_1 b + v_2(c-\lambda_1)
v_2 b - v_2 c + v_2\lambda_1 = v_1 b - v_1 a + v_1\lambda_1
v_2(b - c + \lambda_1) = v_1(b - a + \lambda_1)
v_2 = v_1 (-a + b + \lambda_1) / (b - c + \lambda_1)

The eigenvector:

\begin{bmatrix} v_1 \\ v_1 (-a+b+\lambda_1) / (b-c+\lambda_1) \end{bmatrix}

We have the values for a, b, c, \lambda_1. We will normalize this vector - it gives the direction of the ellipse’s major axis. In code, we can substitute any value for v_1, as we will normalize it anyway. You can also solve:

1 = \sqrt{ v_1^2 + ( v_1 (-a+b+\lambda_1) / (b-c+\lambda_1) )^2 }
let diagonalVector = normalize(vec2(1, (-a+b+lambda1) / (b-c+lambda1)));
let diagonalVectorOther = vec2(diagonalVector.y, -diagonalVector.x);
let majorAxis = 3.0 * sqrt(lambda1) * diagonalVector;
let minorAxis = 3.0 * sqrt(lambda2) * diagonalVectorOther;

cov2d is a symmetric matrix and \lambda_1 \neq \lambda_2, so the eigenvectors are orthogonal. We scale the respective axis vectors by the radii using (just as in the “tiled” renderer) r = 3 \sqrt{\sigma}. This works, but may lead to problems at oblique angles. Most implementations clamp the radius to 1024 pixels.

I’ve seen some implementations multiply \sigma by \sqrt{2} instead. Either I’ve missed something, or they (if I had to guess) forgot to divide focal_x and focal_y by 2.

The fixes result in the following implementation:

let majorAxis = min(3.0 * sqrt(lambda1), 1024) * diagonalVector;
let minorAxis = min(3.0 * sqrt(lambda2), 1024) * diagonalVectorOther;
// depends on which quad vertex you calculate
let quadOffset = vec2(-1., -1.);// or (-1., 1.), or (1., 1.), or (1., -1.)
projPos.xy += quadOffset.x * majorAxis / viewportSize.xy
            + quadOffset.y * minorAxis / viewportSize.xy;
result.position = vec4<f32>(projPos.xy, 0.0, 1.0);
result.quadOffset = quadOffset;

Fragment shader

Full fragment shader at the end of this section. Whole 6 LOC.

Let’s imagine that our vertices produced a square. Then imagine our fragment shader draws a circle inside of this square (something like length(uv.xy - 0.5) < 0.5). This is exactly what we will do, with 2 modifications:

  • Smudge the circle so it does not have hard edges. Because, you know, it’s a Gaussian.
  • Use the Gaussian’s color and opacity for blending.

Circle gradient when vertices form a square and rhombus.

To solve the first issue, we go back to “EWA Volume Splatting”, section 6 “Implementation”:


Derivation of e^{-\frac{1}{2}r}. From “EWA Volume Splatting”.

The equation (9) is the one we have seen during the tiled renderer discussion:

G_V(x-p) = \frac{1}{2\pi |V|^{\frac{1}{2}}} \exp\left( -\frac{1}{2} (x-p)^T V^{-1} (x-p) \right)

I’m not sure about the derivation of e^{-\frac{1}{2}r}. It’s the Gaussian’s opacity based on the pixel’s distance from the center. The Gaussian’s color also includes transparency, which means color blending. That’s the reason why we sorted the Gaussians before the rendering.

Blending settings

In a graphics API we do not have access to colors written by previous invocations of the fragment shader. Instead, use the following blend functions:

color: {
  srcFactor: "one-minus-dst-alpha",
  dstFactor: "one",
  operation: "add",
},
alpha: {
  srcFactor: "one-minus-dst-alpha",
  dstFactor: "one",
  operation: "add",
},

They are equivalent to what we have discussed for the tiled renderer. Make sure that the clearColor for the framebuffer sets the alpha component to 0.
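
As a sanity check, here is a TypeScript sketch of what the fixed-function blender computes with these factors, assuming the fragment shader outputs premultiplied color (alpha * rgb, alpha) like the shader below:

// src = fragment shader output (premultiplied: [a * rgb, a]), dst = current framebuffer value.
function blend(src: number[], dst: number[]): number[] {
  const srcFactor = 1 - dst[3]; // "one-minus-dst-alpha"
  const dstFactor = 1; // "one"
  return src.map((s, i) => s * srcFactor + dst[i] * dstFactor); // "add"
}

Plugging the premultiplied output into this reproduces the result_{n+1} recurrences from the tiled renderer section.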

Final fragment shader

The whole fragment shader in WGSL:

@fragment
fn fs_main(fragIn: VertexOutput) -> @location(0) vec4<f32> {
  let r = dot(fragIn.quadOffset, fragIn.quadOffset);
  if (r > 1.0){ discard; } // circle
  let splatOpacity = fragIn.splatColor.w;
  let Gv = exp(-0.5 * r);
  let a = Gv * splatOpacity;
  return vec4(a * fragIn.splatColor.rgb, a);
}

You might notice that both rendering methods calculated either exp(-\frac{1}{2} (ax_0^2 + 2bx_0y_0 + cy_0^2)) or exp(-\frac{1}{2} r). In truth, both are interchangeable if you know how to derive them and adjust the vertex positions accordingly.

References

Thanks for reading!