AR Pixel Art Frustum

TL;DR: Skip to the bottom for the pseudo-code.

It's pretty common when rendering pixel art to render at half (or even lower) scale and then scale it up. This mimics the look of lower resolution displays and can give your game that retro feel.

While this works well for most games, it can be a problem when you're working with AR. Objects in the distance can be just too small to see, blipping in and out of view. In a typical game, this can be managed by setting a particular field of view, or by ensuring the assets are the right size and at the right distance to be displayed in a nice way. But most commonly, it's solved by using an orthographic camera, making everything the same size regardless of distance to the camera.

But, unfortunately, none of those are really options for AR.

The camera has a fixed field of view. A developer can't really change the physical hardware, now, can they? And the assets can't really be set to a specific size. The user can always just move closer to or further away from them.

So what can be done?

What if we render the model to its own reduced resolution texture? Then we can render that texture to the screen at the correct size. This way, the model can be rendered at a lower resolution to get our pixel art look, but we can also keep a minimum number of pixels at a distance, ensuring the model is always visible.

NOTE: This is also a technique that can be used in other situations. Take distant mountains in a normal game, for example. Why continuously render a high resolution model every frame when the mountain's appearance may barely change because it's so far away? Why not just render it once, cache it, and then render that cached texture at the same location? This is a technique that can be used in many situations!

So let's get started! First step is to render the model to a texture.

But we don't want to oversize the texture and waste memory. So we will size the texture based on the bounding box of the model.
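As a rough sketch of what that sizing might look like (everything here is a placeholder for your own setup: the Vec2 type, the pixelScale divisor, minPixels, and the projected corner coordinates, which we'll work out how to compute below):

type Vec2 = { x: number; y: number };

// Size the render texture from the model's projected bounds:
// downscaled for the pixel art look, but never below a minimum,
// so distant models stay visible.
function textureSize(
    ndcBottomLeft: Vec2, ndcTopRight: Vec2,
    screenWidth: number, screenHeight: number,
): { width: number; height: number } {
    const pixelScale = 2; // render at half resolution
    const minPixels = 16; // minimum visible size, in texels

    // NDC spans 2 units, so (NDC difference / 2) is the fraction
    // of the screen the model covers.
    const w = ((ndcTopRight.x - ndcBottomLeft.x) / 2) * screenWidth;
    const h = ((ndcTopRight.y - ndcBottomLeft.y) / 2) * screenHeight;

    return {
        width: Math.max(minPixels, Math.round(w / pixelScale)),
        height: Math.max(minPixels, Math.round(h / pixelScale)),
    };
}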

Which leads to another problem. Now that the texture is not the same size as the screen, we can't use the same camera. We need a new one that can render the model to the center of the texture, while still looking like it was rendered from the perspective of the original camera.

So how do we get this new camera matrix?

Let's start with a quick refresher of a regular projection matrix.

$$M_{\text{projection}} = \begin{bmatrix} \frac{1}{\text{aspect} \cdot \tan\left(\frac{\text{fov}_y}{2}\right)} & 0 & 0 & 0 \\ 0 & \frac{1}{\tan\left(\frac{\text{fov}_y}{2}\right)} & 0 & 0 \\ 0 & 0 & \frac{-(\text{far} + \text{near})}{\text{far} - \text{near}} & \frac{-2 \cdot \text{far} \cdot \text{near}}{\text{far} - \text{near}} \\ 0 & 0 & -1 & 0 \end{bmatrix}$$

[Interactive demo: sliders for aspect, fovy, near, and far, with a draggable view of the resulting camera frustum. The matrix readout updates live, for example:]

$$M_{\text{perspective}} = \begin{bmatrix} 1.00 & 0 & 0 & 0 \\ 0 & 1.00 & 0 & 0 \\ 0 & 0 & -2.20 & -9.60 \\ 0 & 0 & -1 & 0 \end{bmatrix}$$

In the below, I will denote $M_{rc}$, where $r$ and $c$ are the row and column of the matrix, with $M_{00}$ being the first row and first column, i.e. the top-left value.
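One thing to watch out for: that notation is row-then-column, but graphics code often stores matrices column-major, indexed as [column][row] (the pseudo-code at the end of this post does exactly that). A quick sketch of how the two line up, assuming a plain number[][] type:

// Column-major storage: m[column][row].
const m = [
    [1.00, 0, 0, 0],   // column 0
    [0, 1.00, 0, 0],   // column 1
    [0, 0, -2.20, -1], // column 2
    [0, 0, -9.60, 0],  // column 3
];

const m00 = m[0][0]; // row 0, column 0: on the diagonal, so order doesn't matter
const m02 = m[2][0]; // row 0, column 2: note the [column][row] order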

If any of the below doesn't make sense, try playing around with the sliders, and see how the camera frustum changes. That might make it easier to visualize how these numbers affect what is rendered.

Let's start with the field of view, $fov_y$. We use the subscript to denote that this is specifically the angle between the top clipping plane and the bottom clipping plane. This affects $M_{00}$ and $M_{11}$.

The aspect ratio, $aspect$, is the ratio of the width of the view to the height of the view. This makes the view wider or taller. In the above, it affects only $M_{00}$.

Hopefully this gives you a good idea of what $M_{00}$ and $M_{11}$ control: the angles between the left/right and bottom/top clipping planes respectively.

$near$ and $far$ are the distances to the near and far clipping planes respectively. These are much easier to visualize than the field of view, so hopefully no further explanation is needed.

Those 4 values are all that is needed to create our perspective matrix.
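In code, building that matrix might look something like this (a sketch using the same column-major number[][] layout as the pseudo-code at the end; fovy is in radians):

// A standard OpenGL-style perspective matrix, stored column-major.
function perspective(fovy: number, aspect: number, near: number, far: number): number[][] {
    const f = 1.0 / Math.tan(fovy / 2);
    return [
        [f / aspect, 0, 0, 0],                     // column 0
        [0, f, 0, 0],                              // column 1
        [0, 0, -(far + near) / (far - near), -1],  // column 2
        [0, 0, -2 * far * near / (far - near), 0], // column 3
    ];
}

// For example, fovy = Math.PI / 2 (90 degrees), aspect = 1, near = 3,
// far = 8 gives the 1.00, 1.00, -2.20, -9.60 values seen in the readout above.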

Now that we know how our original camera works, how do we create our new camera?

First let's try to figure out how the field of view needs to change.

I think it's easiest to start by looking at what we want to achieve, and then work backwards to figure out how to get there.

[Interactive demo: sliders for $width_{target}$ and $height_{target}$, with a draggable view of the frustum. The matrix readout updates live, for example:]

$$M_{\text{target}} = \begin{bmatrix} 5.00 & 0 & 0 & 0 \\ 0 & 5.00 & 0 & 0 \\ 0 & 0 & -2.20 & -9.60 \\ 0 & 0 & -1 & 0 \end{bmatrix}$$

If you played around with the sliders and managed to get the projection of the "model" to match the size of the near and far planes, then you should see that the field of view (and thus the matrix) is the same. This may or may not be obvious: if the texture is the same size as the screen, then the matrix will be the same. But this gives us an important starting point. If the normalized device coordinates (NDC) of the model match the NDC of the screen, then the field of view will be the same.

Remember, the screen's NDCs will always be from -1 to 1, by definition.

So let's calculate the NDC of the model. We can do this by multiplying the bottom-left and top-right corners of the model's bounding box by the camera's projection matrix (and dividing by $w$).

$$NDC_{\text{bottom left}} = M_{\text{camera}} \cdot P_{\text{bottom left}} \qquad NDC_{\text{top right}} = M_{\text{camera}} \cdot P_{\text{top right}}$$
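In code, that projection step is just a matrix multiply followed by the perspective divide. A self-contained sketch (Vec4 and mulMat4Vec4 are hypothetical helpers, not from any particular library):

type Vec4 = { x: number; y: number; z: number; w: number };

// Multiply a column-major 4x4 matrix (m[column][row]) by a column vector.
function mulMat4Vec4(m: number[][], v: Vec4): Vec4 {
    return {
        x: m[0][0] * v.x + m[1][0] * v.y + m[2][0] * v.z + m[3][0] * v.w,
        y: m[0][1] * v.x + m[1][1] * v.y + m[2][1] * v.z + m[3][1] * v.w,
        z: m[0][2] * v.x + m[1][2] * v.y + m[2][2] * v.z + m[3][2] * v.w,
        w: m[0][3] * v.x + m[1][3] * v.y + m[2][3] * v.z + m[3][3] * v.w,
    };
}

// Project a view-space point to NDC: multiply into clip space,
// then divide by w (the perspective divide).
function toNdc(projection: number[][], point: Vec4): Vec4 {
    const clip = mulMat4Vec4(projection, point);
    return { x: clip.x / clip.w, y: clip.y / clip.w, z: clip.z / clip.w, w: 1 };
}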

Then we can find the difference between the two to get a factor to scale the field of view by.

NOTE: because the NDCs are in the range of -1 to 1, the difference will have a range of 2, so we need to take this into account too.

$$M_{00}' = M_{00} \cdot \frac{2}{NDC_{\text{right}} - NDC_{\text{left}}} \qquad M_{11}' = M_{11} \cdot \frac{2}{NDC_{\text{top}} - NDC_{\text{bottom}}}$$

For example, if the model spans 0.4 NDC units horizontally, $M_{00}$ gets scaled by $2 / 0.4 = 5$, which matches the 5.00 values in the example readout above.

Now that we have the new field of view, we need to figure out how to offset the object because it's not always going to conveniently be perfectly in the center of the screen for us.

The trick to this one can be found in the specification of OpenGL's glFrustum function. That function takes left/right and bottom/top values instead of a field of view and aspect ratio, and it uses them to set mysterious A and B values in the matrix. Other than those two values, the resulting matrix looks identical to our existing perspective matrix.

$$M_{\text{glFrustum}} = \begin{bmatrix} \frac{2 \cdot \text{near}}{\text{right} - \text{left}} & 0 & A & 0 \\ 0 & \frac{2 \cdot \text{near}}{\text{top} - \text{bottom}} & B & 0 \\ 0 & 0 & C & D \\ 0 & 0 & -1 & 0 \end{bmatrix}$$

$$A = \frac{\text{right} + \text{left}}{\text{right} - \text{left}} \qquad B = \frac{\text{top} + \text{bottom}}{\text{top} - \text{bottom}} \qquad C = \frac{-(\text{far} + \text{near})}{\text{far} - \text{near}} \qquad D = \frac{-2 \cdot \text{far} \cdot \text{near}}{\text{far} - \text{near}}$$
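If it helps to see those formulas as code, here's a sketch of a glFrustum-style builder in the same column-major layout as before:

// A glFrustum-style projection matrix, stored column-major.
function frustum(
    left: number, right: number,
    bottom: number, top: number,
    near: number, far: number,
): number[][] {
    const A = (right + left) / (right - left);
    const B = (top + bottom) / (top - bottom);
    const C = -(far + near) / (far - near);
    const D = -2 * far * near / (far - near);
    return [
        [2 * near / (right - left), 0, 0, 0], // column 0
        [0, 2 * near / (top - bottom), 0, 0], // column 1
        [A, B, C, -1],                        // column 2
        [0, 0, D, 0],                         // column 3
    ];
}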

So what are these new A and B values doing? The mirrored positions across the diagonal from them hold zeros, so they aren't rotations.

These are actually skew and shear values. They basically say, for every unit along the Z-axis that you move, move along the X-axis or Y-axis by this corresponding amount.
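To convince yourself of that, push a view-space point $(x, y, z, 1)$ through the matrix and look at just the x coordinate (remembering that points in front of the camera have negative $z$):

$$x_{\text{clip}} = M_{00} \cdot x + A \cdot z \qquad w_{\text{clip}} = -z$$
$$x_{\text{ndc}} = \frac{x_{\text{clip}}}{w_{\text{clip}}} = \frac{M_{00} \cdot x}{-z} - A$$

So after the perspective divide, A becomes a constant shift of $-A$ in NDC space, independent of depth. That is exactly the knob we need to slide the model from wherever it sits on screen into the center of our texture.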

But how can we calculate these values? When creating an AR application, we don't know the left/right/top/bottom. They are hidden inside the camera's perspective matrix. There are ways to extract them, but it's not easy. Is there a nifty trick that we can use similar to how we modified the field of view? We didn't need the left/right/top/bottom for that one.

In fact, there is!

A and B are actually in NDC space. So we can use the NDC of the model to calculate them. These values were 0 in our previous examples, because the center of the model was at 0, 0 in NDC space!

So all we have to do is calculate the NDC of the model's center.

NOTE: one gotcha is that these need to be in the NDC space of our modified frustum matrix, not the original. That means we can't simply reuse the previously calculated top-right and bottom-left coordinates. There are ways to adjust them, but for our purposes, we can just recalculate the center using our modified matrix.

$$NDC_{\text{center}} = M_{\text{modified fov}} \cdot P_{\text{center}}$$

NOTE: Don't forget to normalize the NDC (the N part of NDC!). This is a mistake I made. Just make sure to divide the X and Y components by the W.

Then add these values to A and B in the matrix, and you have your new frustum!

[Interactive demo: sliders for $x_{target}$, $y_{target}$, and $z_{target}$, with a draggable view of the frustum. The matrix readout updates live, for example:]

$$M_{\text{target}} = \begin{bmatrix} 5.00 & 0 & 0.00 & 0 \\ 0 & 5.00 & 0.00 & 0 \\ 0 & 0 & -2.20 & -9.60 \\ 0 & 0 & -1 & 0 \end{bmatrix}$$

Putting it all together into one little demo to play with, we get this:

[Interactive demo combining everything: sliders for aspect, fovy, near, far, $width_{target}$, $height_{target}$, $x_{target}$, $y_{target}$, and $z_{target}$, with a draggable view of the frustum. The matrix readout updates live, for example:]

$$M_{\text{target}} = \begin{bmatrix} 5.00 & 0 & 0.00 & 0 \\ 0 & 5.00 & 0.00 & 0 \\ 0 & 0 & -2.20 & -9.60 \\ 0 & 0 & -1 & 0 \end{bmatrix}$$

And there you have it! A new camera matrix that renders the model to a texture at the correct size, ready for that texture to be drawn back to the screen in the correct place.

Here is some useful pseudo-code, in case you want to implement this in your own application. Or just for myself, so that I don't have to figure it all out again next time 😅


// NOTE: Mat4 is in column-major order. So the indices are [column][row].
function calculateModelMatrix(arCameraMatrix: Mat4) {
    // (In a real implementation, copy this rather than mutating the AR matrix.)
    let modelMatrix = arCameraMatrix;

    // Calculate the NDC of the model's texture.
    // These are ideally found using the bounds of the model's bounding box.
    // Also worth noting is that these are just the NDC of the billboarded
    // quad that the texture will be rendered to.

    let ndcBottomLeft = modelMatrix * modelBottomLeft;
    ndcBottomLeft /= ndcBottomLeft.w;
    let ndcTopRight = modelMatrix * modelTopRight;
    ndcTopRight /= ndcTopRight.w;

    // Scale the field of view so the model's bounds span the full
    // -1..1 NDC range (a difference of 2) of the texture.
    modelMatrix[0][0] *= 2.0 / (ndcTopRight.x - ndcBottomLeft.x);
    modelMatrix[1][1] *= 2.0 / (ndcTopRight.y - ndcBottomLeft.y);

    // Calculate the new offset values. The center has to be projected
    // with the matrix we just modified, not the original one, so this
    // must happen after the field of view scaling above.
    let ndcCenter = modelMatrix * modelCenter;
    ndcCenter /= ndcCenter.w;

    // Shifting A and B by the center's NDC moves everything by -ndcCenter,
    // centering the model in the texture.
    modelMatrix[2][0] += ndcCenter.x; // A
    modelMatrix[2][1] += ndcCenter.y; // B

    return modelMatrix;
}
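For context, here's roughly how it could be used each frame. Everything here (renderToTexture, drawBillboardQuad, model, modelBounds, lowResTexture) is a hypothetical stand-in for whatever your engine provides:

// Hypothetical per-frame usage (all helpers are stand-ins):
let textureCamera = calculateModelMatrix(arCameraMatrix);

// Render the model into a small texture, sized from its projected
// bounds and clamped to a minimum (see the sizing sketch earlier).
renderToTexture(model, textureCamera, lowResTexture);

// Then draw that texture as a billboarded quad over the model's
// bounding box, using the original AR camera.
drawBillboardQuad(lowResTexture, modelBounds, arCameraMatrix);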

That doesn't look too scary, now, does it? 😄