Sunday, November 17, 2019

Rendering in Low Level Emulation mode. Part III


In the previous article I described a new approach to processing N64 low-level polygons. This approach helped me solve several serious issues with LLE rendering in general. However, the main practical benefit was expected in Golden Eye, Perfect Dark and Killer Instinct Gold - the only games that can't work properly without LLE support. Does the new method help to fix LLE-related issues in these games? Unfortunately, no. For example, KI with the old method looks like this:
It looks nearly the same with the new method.

So, the problem is somewhere else. Its source was discovered quite quickly. The sky on the screenshot above is rendered by one low-level triangle command. As I explained before, low-level triangle commands render not triangles but trapezoids. In this case one triangle command renders the sky rectangle. I noticed that the lower vertices of the rectangle have a negative W coordinate. Normally, W cannot be negative. A polygon with a negative W vertex coordinate must be clipped. The microcode running on the RSP performs the clipping. However, sky polygons in KI and sky and water polygons in GE and PD are exceptions. Crafty Rare programmers sent raw low-level polygon data directly to the RDP, bypassing RSP processing. That is why these games need LLE support even in HLE mode. Probably the code that generates this low-level data is buggy and sometimes produces an incorrect result. You may run KI and see that sometimes the sky is correct and a few seconds later it is wrong again.

The AL RDP plugin has no problems with such polygons. After long debugging I found that it implements an interesting feature: texture coordinate clamping. It is not the standard tile clamp explained in the N64 manuals. It is rather a sanity test: if a texture coordinate can't be calculated correctly, it is forcibly clamped to some special value. Negative W is one of the cases that triggers this clamping. I dumped the texture coordinates calculated by AL RDP for the KI case. Look at this diagram:
It shows how the S coordinate changes from the top to the bottom of the rectangle. It wraps several times, but becomes constant at the bottom. That is where the W coordinate turns negative. The sky polygon alone looks like this:
As you may see, the very bottom part of the polygon is filled with some constant color. This part is usually covered by other geometry, but I hacked the AL RDP sources to get that picture.

The AL RDP software plugin emulates the work of the RDP and renders that polygon line by line. When W becomes negative at some line, the RDP clamps the texture coordinate to some constant. That constant coordinate points to some texel inside the texture, and this texel is used for all pixels in the line.

A hardware renderer can't work this way. Color, depth and texture coordinates are provided per vertex and interpolated for each pixel inside the polygon. Interpolation is a smooth function. In this case the texture coordinates do not behave smoothly, so interpolation does not work as it should.

I found a solution. All coordinates work properly while W is positive. If W becomes negative for some vertex (or vertices), the algorithm searches for the Y coordinate where W changes its sign. Then the part of the polygon from the top to that Y is rendered. The part below that Y is rendered too, but all vertices of that part have the same texture coordinate, so it is filled with a constant color fetched from the texture. The result:
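The search for the sign change can be illustrated with a small sketch. It is only an illustration, not the GLideN64 code: it assumes that W varies linearly with Y between the top and the bottom of the polygon, and the function names and the "textured"/"constant" labels are made up for this example.

```python
def w_sign_change_y(y_top, w_top, y_bottom, w_bottom):
    """Return the Y where linearly interpolated W crosses zero, or None.

    Assumption: W changes linearly with Y between the two given points.
    """
    if w_top <= 0 or w_bottom >= 0:
        return None  # no positive-to-negative transition on this span
    t = w_top / (w_top - w_bottom)  # fraction of the span where W reaches 0
    return y_top + t * (y_bottom - y_top)


def split_span(y_top, w_top, y_bottom, w_bottom):
    """Split a vertical span into a normally textured part and a
    constant-texel part below the Y where W turns negative."""
    y_split = w_sign_change_y(y_top, w_top, y_bottom, w_bottom)
    if y_split is None:
        return [(y_top, y_bottom, "textured")]
    return [(y_top, y_split, "textured"), (y_split, y_bottom, "constant")]


parts = split_span(0.0, 3.0, 100.0, -1.0)
```

Here the upper part keeps its interpolated texture coordinates, while every vertex of the lower part would get the same texture coordinate.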

That fix also solved the water texture issue in the Golden Eye Frigate level. However, I met another issue there: the colors of water and sky were somehow wrong, not as dark as they should be:
The color combiners for water and sky mix texture color with shading color. Since texturing looks correct, the problem should be in the shading color. I compared the vertex colors calculated by GLideN64 with the colors calculated by AL RDP at the same points. The results were very close. I decided to hack the color combiner for water: remove texturing and draw only the shading color:
This result puzzled me at first. The input data is nearly the same but the output is dramatically different. The color of the top vertices is dark and rather blue, so the result should look like the right screenshot from AL RDP. Then I noticed that the value of W is very high for the top vertices but very low at the bottom:

This explains the problem. N64 hardware is not powerful enough to perform perspective correction for colors. It uses plain Gouraud shading, that is, simple interpolation of vertex color. GLideN64 is powered by OpenGL. Modern OpenGL applies perspective correction to all outputs of the vertex shader by default, including the shading color, of course. Perspective correction makes the shading color almost constant in this case, because the differences in vertex color intensity are compensated by the differences in vertex W. Luckily, OpenGL allows disabling perspective correction for any parameter. I disabled perspective correction for the shading color and finally got the correct result:
Thus, the LLE-specific issues in KI, GE and PD have been fixed. GLideN64 LLE rendering still has the unsolved issues mentioned in the previous article. This work has WIP status. Alpha builds are available to the project's patrons on Patreon.com.
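The effect is easy to reproduce with numbers. The sketch below compares plain linear interpolation with perspective-correct interpolation between two vertices. The color and W values are invented for illustration, not taken from the game. In GLSL the usual way to request the linear behaviour for a shader output is the `noperspective` interpolation qualifier; whether GLideN64 uses exactly that qualifier is an assumption here.

```python
def lerp(a, b, t):
    """Plain linear (Gouraud-style) interpolation."""
    return a + (b - a) * t


def perspective_lerp(a, w_a, b, w_b, t):
    """Perspective-correct interpolation: interpolate a/w and 1/w
    linearly in screen space, then divide."""
    num = lerp(a / w_a, b / w_b, t)
    den = lerp(1.0 / w_a, 1.0 / w_b, t)
    return num / den


# Hypothetical values: dark top vertex with a very high W,
# bright bottom vertex with a very low W.
top_color, top_w = 0.1, 1000.0
bottom_color, bottom_w = 0.9, 1.0

linear_mid = lerp(top_color, bottom_color, 0.5)
corrected_mid = perspective_lerp(top_color, top_w, bottom_color, bottom_w, 0.5)
```

Halfway down the polygon the linear result is the average of the two colors, while the perspective-correct result stays very close to the color of the low-W vertex. That is why the shading looked almost constant and too bright.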

Sunday, November 3, 2019

Rendering in Low Level Emulation mode. Part II


In the previous article I described how N64 hardware rasterizes polygons and how graphics plugins convert low-level rasterization data into vertices for PC hardware. I also mentioned the problems of the current approach. So, the task is to solve these problems.

When I finally understood how the current code for low-level triangles works (yes, I had borrowed it years ago without understanding how it works), I had no good idea why its results are so poor. Everything looked quite reasonable.

When I have problems understanding how the RDP works, I consult the sources of angrylion's software graphics plugin, aka AL RDP. This plugin is a very precise emulator of the RDP, a digital model of the actual chip. The sources can give you information about how the RDP works that you will not find in the documentation. Extracting information this way is not easy, but often it is the shortest way. I found how edge walking is implemented there and compared it with the GLideN64 code. No surprises here - the AL RDP code is more complex and precise, and it does many more checks and manipulations with input data and output results. So I decided to adapt that code for my needs in the hope that it would bring better results. I spent a week or two on experiments but got results as bad as before, or even worse. Most likely I made mistakes somewhere. Anyway, my disappointment in the results turned into disappointment in the approach itself.

At some moment I got an idea: why should we walk the same way as the actual hardware does? We have three edges, and we need to find where they intersect. This is simpler to do by just solving line equations! Actually, this idea came to olivieryuyu first. He started to study the matter earlier and provided me with articles and documents on how things work and how low-level data can be used to extract vertices. I did not pay attention to it at first, trying to improve the old method, which already worked somehow.

So, let's see how we can get vertices without edge walking. We need the line equation for each edge. The equation of a line looks like this:
y = mx + y0
where m is the slope and y0 is the y coordinate of its intersection with the y-axis.
Open our diagram for low-level triangles again:
We have the inverse slope for each edge, so the equation of a line in our case is
x = ky + x0
where k is the edge's inverse slope and x0 is the x coordinate of its intersection with the x-axis.
We have the intersection point for the major edge H, which is XH. We also have the intersection point for the first minor edge M, which is XM. Both intersection points have the same y coordinate, which is the coordinate of the previous scan-line. Let's set the origin of the y-axis to that y coordinate for convenience and denote it as Y0.
Thus, we have two equations:
x = DxHDy * y + XH
x = DxMDy * y + XM
where y = (Y - Y0).
We can easily solve the system of these two equations and get the exact coordinates of the intersection point of the edges. Good. Let's denote the coordinates of that point as (X1,Y1).
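Solving the system by hand gives y = (XM - XH) / (DxHDy - DxMDy). A minimal sketch of that step follows. The function and argument names are chosen for this example only, and the numbers in the usage line are arbitrary.

```python
def intersect_edges(k_a, x_a, k_b, x_b, y0):
    """Intersect two edges given as x = k * (Y - y0) + x_at_y0.

    k_a, k_b are inverse slopes (e.g. DxHDy, DxMDy); x_a, x_b are the
    x coordinates of the edges at Y = y0 (e.g. XH, XM).
    Returns (X, Y) of the intersection, or None for parallel edges.
    """
    if k_a == k_b:
        return None  # parallel edges never intersect
    y = (x_b - x_a) / (k_a - k_b)  # offset from y0
    x = k_a * y + x_a
    return (x, y0 + y)


# Arbitrary illustrative values: H edge and M edge starting at Y0 = 20.
x1, y1 = intersect_edges(0.5, 10.0, -1.0, 13.0, 20.0)
```

The same helper works for any pair of edges once both are expressed relative to the same Y0.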
How do we get the parameter values for that point? It is simple: with DpDe. The value of a parameter P at (X1,Y1) is
P1 = P + DpDe * (Y1 - Y0)

The next vertex is the intersection point of the minor edges M and L. As you can see on the diagram, the y coordinate of the mid vertex is in the range [YM,YM+1]. XL also does not coincide with the vertex x coordinate in the general case. The L edge passes through XL, but where? "XL is calculated where the L edge intersects the next subpixel at or below the mid vertex." The intersection can be at YM or YM+1. Can we calculate it exactly? It looks like we can. Let's calculate XM' for edge M at YM:
XM' = DxMDy * (YM - Y0) + XM
If XM' is the same as XL, then the mid vertex coordinates are (XL,YM).
Otherwise, edge L passes through the point (XL,YM+1), so we can build the line equation for L using this point, and then find the intersection point of the M and L edges. Let's denote the mid vertex coordinates as (X2,Y2).
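The two cases can be sketched as follows. This is a rough illustration under several assumptions: all Y values are in the same units, the "+1" step is passed as `step`, the comparison of XM' with XL uses a small tolerance `eps`, and DxLDy is the inverse slope of edge L. The names and sample numbers are hypothetical.

```python
def mid_vertex(xm, dxmdy, xl, dxldy, ym, y0, step=1.0, eps=1e-6):
    """Find the mid vertex (X2, Y2) as the intersection of edges M and L.

    Edge M: x = dxmdy * (Y - y0) + xm.
    Edge L is assumed to pass through (xl, ym) or (xl, ym + step).
    Returns None if M and L are parallel.
    """
    xm_at_ym = dxmdy * (ym - y0) + xm  # XM' from the text
    if abs(xm_at_ym - xl) <= eps:
        return (xl, ym)  # L already meets M at YM
    if dxmdy == dxldy:
        return None
    # Edge L: x = dxldy * (Y - (ym + step)) + xl; solve M(Y) == L(Y).
    y2 = (xl - xm + dxmdy * y0 - dxldy * (ym + step)) / (dxmdy - dxldy)
    x2 = dxmdy * (y2 - y0) + xm
    return (x2, y2)


# Case 1: XM' equals XL, so the vertex sits exactly at YM.
case_on_line = mid_vertex(10.0, 1.0, 14.0, -2.0, 4.0, 0.0)
# Case 2: the real vertex lies between YM and YM+1.
case_between = mid_vertex(10.0, 1.0, 13.5, -2.0, 4.0, 0.0)
```

In the second case the vertex comes out at Y = 4.5, which is inside the range [YM,YM+1] as expected.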

Now we know the coordinates of the second vertex, but the vertex is not on the major edge. How do we get the values of all parameters for that vertex? It is not hard either:

  • Find the point on the major edge at the mid vertex Y, that is Y2. Let's call the x coordinate of that point XH', so the point's coordinates are (XH',Y2).
  • Calculate the parameters for the point (XH',Y2) using DpDe.
  • We know X2 and XH', so we can calculate the distance X2-XH', and thus we can calculate the value of every parameter with DpDx:
    P2 = P + DpDe * (Y2 - Y0) + DpDx * (X2 - XH')
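The three steps above fit into one small function. It is a sketch with made-up argument names and sample numbers, where `p` is the value of a parameter on the major edge at Y0.

```python
def param_at(p, dpde, dpdx, x, y, xh, dxhdy, y0):
    """Value of a parameter at point (x, y) of a low-level triangle.

    p      - parameter value on the major edge H at Y = y0
    dpde   - change of the parameter along the major edge per unit of Y
    dpdx   - change of the parameter per unit of X
    xh     - x coordinate of the major edge at Y = y0
    dxhdy  - inverse slope of the major edge
    """
    xh_at_y = dxhdy * (y - y0) + xh       # XH': major edge x at this Y
    on_edge = p + dpde * (y - y0)         # walk down the major edge
    return on_edge + dpdx * (x - xh_at_y)  # then step horizontally


# Arbitrary illustrative values.
p2 = param_at(100.0, 2.0, -3.0, 15.0, 24.0, 10.0, 0.5, 20.0)
```

For a vertex that lies on the major edge, x equals XH', the horizontal term vanishes, and the result reduces to the DpDe-only formula used for the first vertex.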

Since we have line equations for all three edges, we can find the intersection point of the major edge H and the minor edge L, which is the third vertex of our triangle. This vertex lies on edge H, so its parameters are calculated the same way as for the first vertex. Thus, we've got three vertices per low-level triangle. The task is complete.

The algorithm looks simple, doesn't it? I made a first-shot implementation pretty quickly, but then I spent weeks trying to make it work at least no worse than the older method, and some more time to make it better. The devil is in the details.

Which problems of the old method are solved with the new one:

  • Performance. The new method produces 3 vertices and thus 1 hardware triangle per low-level triangle. When a low-level triangle represents a trapezoid, it still requires 4 vertices and thus two hardware triangles, but that is a rare case. Thus, the new method produces significantly less data for PC hardware than the old one. I would be happy to say that it led to a significant boost in performance, but that is not true. The difference in performance is almost negligible. The bottleneck of LLE rendering is not the amount of data or the number of polygons, it is the number of API calls. Each LLE triangle is rendered with a separate GL draw call, while in HLE mode the plugin caches triangles when possible. I had no idea how to implement triangle caching for LLE mode and, truth be told, did not care much about it. This time I looked at the problem more carefully and found a simple but efficient way to cache LLE triangles. The speed-up after that optimization can be called tremendous. Now GLideN64 is even faster than Z64, which is also well optimized.
  • Lines support. As I explained before, a line is a special case of a low-level triangle, and it requires special support in the code. It was not done for the old method, so lines were rendered incorrectly. I've implemented support for special cases for the new method:

    It is still not perfect compared with HLE mode lines, and I hope to improve it.
  • Sticking out polygons. I see them no more:
  • Depth compare issues. Some poke-through issues have been fixed:
  • Issues with shading disappeared:
  • Depth buffer emulation issues. I adapted my software depth render to work with the new LLE triangles. Depth buffer based effects such as coronas now work correctly, as you may see on the previous screenshot.
Unsolved problems:
  • Poke-through issue with decal surfaces. Decal textures worked poorly with the old method and still work the same with the new one:

    I still don't know where the problem hides.
  • Gaps between polygons. No positive changes here. I just hope that the new method brings no regressions compared with the old one.
And the most interesting question from the practical point of view: does the new method help to fix the issues with Golden Eye and Killer Instinct Gold? No, it does not.

To be continued.