Saturday, November 23, 2013

Frame buffer emulation. Part I.


Early N64 emulators had no frame buffer emulation (FBE). I no longer remember who made the first step in that direction. However, I remember how it started in the Glide64 project. Early versions of Glide64 also had no FBE. I started to work on this problem after the version 0.42 release. It was the summer of 2003. My computer was powered by a 3dfx Voodoo3 card. The only way to implement FBE on that card was to read the rendered frame from video card memory, convert that data into N64 format, and put it in the RDRAM area, so it could be used as a texture image later. That approach has three major disadvantages:
  • The hi-res image from the video card is scaled down to the low N64 resolution. It looks ugly when it is rendered back as a texture.
  • Emulation of auxiliary frame buffers is generally incorrect with this approach. These buffers must be rendered off-screen, but since we render only into the main video buffer, they are all visible.
  • It is slow in general.
Reading a large amount of data from AGP/PCI cards was a very slow process. While reading one or two frames did not create performance problems, reading every frame from the card made games unplayable. Thus, I had to find a way to detect when a frame buffer read is actually necessary. I invented two-pass frame buffer emulation. On the first pass the plugin briefly scanned the incoming display list and analyzed how frame buffers would be used in the current frame. The second pass did normal rendering, but if frame buffer usage was detected on the first pass, the buffer read was done at the proper time. That approach allowed me to effectively emulate many of the “Main frame buffer as texture” effects and even some of the “Auxiliary frame buffers” effects described in the previous article. Despite the fact that the frame buffer usage detection algorithm was very imperfect and full of heuristics, version 0.5 had the best FBE of its time. Many effects, including motion blur, were emulated for the first time in that version. Since frame buffers taken from the video card were scaled down to N64 resolution, the result did not look very cool, but it was much, much better than nothing.
The next big step was made when Orkin promoted the idea of hardware frame buffer emulation (HWFBE). The HWFBE idea is to use hardware capabilities to create auxiliary render buffers and then use them instead of the frame buffers in RDRAM. That approach kills all the major drawbacks of the traditional “read from video card – put to RDRAM”. Orkin implemented that idea in his OpenGL plugin glN64. Only a few effects were emulated correctly, including the pause screen in the Zeldas. But the working ones looked really impressive.

I wanted to implement something similar in Glide64, but my Voodoo3 did not allow me to do anything like that. Suddenly I got help from Hiroshi 'KoolSmoky' Morii, a famous person in the 3dfx world. Hiroshi presented me with a Voodoo5 card and pointed me to the Glide3x API extensions, which allowed me to create auxiliary frame buffers in the texture memory of that card. In April 2004 Glide64 "Miracle Edition" with HWFBE support was released. The Glide API frame buffer extensions allowed me to emulate any manipulation the N64 does with its frame buffers. But, as usual with 3dfx, there was a serious problem. The Voodoo5 had only about 24 megabytes of free texture memory which I could use for my needs. 24 megabytes for all N64 textures (and later for all high-res textures) and for all frame buffers. 3dfx cards did not support non-power-of-two texture sizes. Thus, I had to use 1024x1024 RGB textures for resolutions up to 1024x768, and 2048x2048 textures for higher resolutions. In the latter case one full-screen texture buffer takes 8 megabytes (for a 16bit RGB texture). Motion blur emulation requires two full-screen texture buffers. Two such buffers cut off 2/3 of the available texture memory. Obviously, I could not allocate more buffers. Thus, I had to be very careful with texture buffer allocation. My frame buffer usage detection algorithm came in very handy here. With its help I allocated texture frame buffers at the proper moments and did the proper manipulations with them. The HWFBE mechanism in Glide64 was constantly improved, and now it can emulate a lot of very complex effects. However, some games, including Mario Tennis, remained unemulated despite all my efforts.
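The memory arithmetic behind those numbers is easy to check. Here is a small sketch in Python (illustrative only; the plugin itself is C/C++ working with the Glide3x API, and the helper names are mine):

```python
def next_pow2(n):
    """Smallest power of two >= n; 3dfx cards require power-of-two texture sizes."""
    p = 1
    while p < n:
        p *= 2
    return p

def buffer_bytes(screen_w, screen_h, bytes_per_texel=2):
    """Bytes taken by one square full-screen texture buffer in 16bit RGB."""
    side = max(next_pow2(screen_w), next_pow2(screen_h))
    return side * side * bytes_per_texel

# 1024x768 fits into a 1024x1024 texture: 2 megabytes.
print(buffer_bytes(1024, 768))   # 2097152
# Anything larger needs a 2048x2048 texture: 8 megabytes per buffer,
# so two motion-blur buffers eat 16 of the ~24 available megabytes.
print(buffer_bytes(1600, 1200))  # 8388608
```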


Goals and targets

Now, ten years after my first FBE implementation, the situation is very different. I have a modern video card with practically unlimited power, and the OpenGL API has also evolved greatly. Thus, the goal is to implement a HWFBE mechanism that is as free of hacks and heuristics as possible. It is impossible to get rid of all hacks with HWFBE because HWFBE is a hack by itself: the plugin violates the normal emulation process by using texture frame buffers instead of texture data in RDRAM.
As I described in the previous chapter, HWFBE emulation in Glide64 is limited by the capabilities of the target hardware, namely my good old Voodoo5. I just did not have enough video memory to play with. Now I can allocate as many texture buffers as needed. Thus, the main idea of the new HWFBE mechanism is very simple:
  • Whenever the N64 application allocates a color or depth buffer, the plugin allocates a corresponding buffer in texture memory and uses it for rendering.
  • When the N64 application uses a texture which lies in the address space of an allocated texture buffer, the plugin uses the texture buffer instead of the original texture.
  • When the emulator calls the Video Interface UpdateScreen command, the texture buffer is copied into the main frame buffer and displayed on screen.
As you can see, a HWFBE implementation requires:
  • hardware support for texture frame buffers
  • a mechanism for frame buffer management: create, find by address, remove
  • a mechanism for mapping original texture parameters into frame buffer texture coordinates.


The first step

OpenGL has several ways to create frame buffers in texture memory. The modern way is to use Frame Buffer Objects (FBO). The FBO is a very powerful tool, and it fits HWFBE emulation perfectly. I had to implement frame buffer management and mapping. GLideN64 is based on the sources of Orkin's OpenGL graphics plugin glN64. The sources already contained Orkin's implementation of HWFBE. Since Orkin used a prehistoric OpenGL mechanism for texture buffers, all the code for texture buffer allocation was moved to /dev/null. However, the functions for managing the frame buffer list and for mapping original textures to frame buffer textures were implemented quite well and were adopted with little-to-moderate modifications. Thus, I can't say that I started that work from scratch.
Since the mechanism looks simple, I made the first-shot implementation quite quickly. However, I spent a lot of time before the plugin started to show anything. Finally I made the plugin work with FBE enabled as well as without FBE. That is:
  1. The current frame is successfully rendered into the corresponding FBO.
  2. The FBO is successfully copied into the video card's frame buffer and shown on screen.

Moving forward

The next step was to make actual frame buffer effects work. The first target was the pause screen in Zelda OOT. It is quite a complex scenario, which uses the main color buffer as a background image and an auxiliary color buffer for Link's portrait. Again, many fixes had to be done before it started to work. Also, emulators added an unexpected problem. I spent a lot of time trying to make the background visible. It was blank except for a narrow strip at the top:
After hours of debugging I found that all but the first two texrect commands for the background were simply missing from the incoming display list. This is the result of the “subscreen delay fix” cheat code, enabled by default. The cheat removes the long pause between pressing the Start button and the appearance of the pause screen. Now I know the price of that optimization. Glide64 works fine with that cheat because it uses a hack: it renders the whole background texture when it meets the first texrect command with a frame buffer texture and ignores the rest of the texrects. I removed the cheat and got a working pause screen. So, the cheat is incompatible with honest frame buffer emulation.

Another problem I met working on this game (and later with many other games) was the old, well-known problem of HWFBE: a frame buffer texture is used instead of a normal one:

The plugin substitutes the texture of a frame buffer for every texture which lies inside the address space of that frame buffer. The game may discard the frame buffer and use its space for other textures, but that is done outside of the plugin; thus, the plugin does not know that the previously allocated frame buffer is not valid anymore. I was engaged in a ding-dong battle with that problem back in Glide64. The battle continues in this plugin. I used many approaches to check the validity of texture frame buffers, but for every approach there is a game which does not work with it.
When I finished with Zelda OOT, I started to check which effects were working. Pause screen in the Banjo games - check:

 Motion blur in Rally Challenge 2000 – check:

 CamSpy in Perfect Dark – check: 

Cars in Lego Racers – check:

Dynamic shadows – nope. N64 games usually use 8bit monochrome auxiliary frame buffers to render dynamic shadows. I had to adapt my code for that case. Done. Dynamic shadows – check:

Next. Pause screen in Zelda MM – a few fixes, check:

Motion blur in Zelda MM – another bunch of fixes, check:

None of the fixes I did to make a particular game work were hacks like “if game is Zelda, do this, else do that”. On the contrary, each fix made the algorithm more general and suitable for more cases. Also, I fixed a lot of bugs in the plugin itself. The original code was 10 years old, and many things were missing or incorrect.

The assault

Finally I decided that the current implementation of HWFBE was good enough to storm the Peak: Mario Tennis. The game uses an insane number of frame buffer effects in each frame, switching between frame buffers literally hundreds of times per frame. Of course, it again refused to work. The bloody long assault started. The game offered resistance at every possible position. Weird sizes of auxiliary buffers; the same memory used for a depth buffer, then for a color buffer, then for a depth buffer again within one frame; bugs in legacy code. Everything was against me. The frame buffer validity check became paranoid, but still was not good enough. But nothing can stand against modern OpenGL and persistent debugging. Finally, two months after the start, the work on HWFBE emulation of Mario Tennis reached the ‘playable’ level. Glitches sometimes appear here and there, but in general it is playable and enjoyable. Is it victory? Well, the goal is achieved. But the work is not finished yet. There are several frame buffer effects which still do not work. Pokemon, Paper Mario, and Animal Forest are still waiting for their fixes. The new HWFBE has proved its effectiveness, so this is just a matter of time.


Thursday, October 24, 2013

Port to Mupen64Plus v2.0

One of the project's goals is "make an Android port". Since "Mupen64Plus, Android Edition" is based on "Mupen64Plus v2.0", I had to port my plugin to Mupen64Plus v2.0 first. Today my very incomplete and probably incorrect port started to work. This port took much more of my time than the previous port to Mupen64: undefined references, missing exports, etc. Nevertheless, it works now. Another step toward the goal is done.

Sunday, October 20, 2013

Frame buffer emulation. Intro

Frame buffer emulation is a set of techniques which the developer of a hardware-rendering graphics plugin uses to emulate manipulations with the color and depth buffer areas on the real console. This is one of the most important and hardest to implement parts of the plugin. It is important because many games manipulate the buffers and will not work correctly without it. It is hard because the plugin must somehow substitute the content of the frame buffer on the PC video card for the buffers in console memory. This is the introduction topic, where I will show you the variety of manipulations with color buffers that N64 programmers used to implement special effects.

Main frame buffer as texture

The main frame buffer is the one we see on the TV screen. It has RGB format and screen size. The most obvious use of the main color buffer content is as a background texture. It is often used for pause screens:

Consecutive main frames can be blended with each other to get a motion blur effect:

The main buffer texture can be deformed or manipulated to get special effects:

The background image can also be post-processed by the CPU to get special effects, like blur:

Some games use only part of the main buffer for special effects like TV monitors:

Or lens effect:

Primitive emulation of the main frame buffer is quite simple: render the frame, load the frame buffer content from the video card, scale it down to the original resolution, and copy it to the proper place in RDRAM. Since textures are loaded from RDRAM, this works.
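The conversion step of that read-back path can be sketched as follows, assuming the buffer uses the common N64 16bit RGBA format (5 bits per color channel plus a 1-bit alpha). This is an illustration of the format packing, not the plugin's actual code:

```python
def rgba8888_to_rgba5551(r, g, b, a):
    """Pack one 8bit-per-channel PC pixel into the N64 16bit 5/5/5/1 format:
    red in bits 15-11, green in 10-6, blue in 5-1, alpha in bit 0."""
    return ((r >> 3) << 11) | ((g >> 3) << 6) | ((b >> 3) << 1) | (1 if a >= 128 else 0)

# White stays white, fully transparent black becomes zero.
print(hex(rgba8888_to_rgba5551(255, 255, 255, 255)))  # 0xffff
print(hex(rgba8888_to_rgba5551(0, 0, 0, 0)))          # 0x0
```

Every pixel of the scaled-down frame goes through a conversion like this before being written into the RDRAM color buffer area.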

CPU direct frame buffer writes

Graphics is usually processed by the RCP co-processor. However, since the CPU and the co-processor share the same main memory, the CPU can directly write an image to memory and pass it to the Video Interface, thus bypassing the RCP. Usually such images are static company or game logos at the start of the game:

But they can be dynamic as well and even video can be shown this way:

It is not hard to read an image from RDRAM and render it on screen. The problem is that the graphics plugin implements the RCP, and when the CPU bypasses the RCP, it bypasses the plugin as well. That is, the plugin does not know when it must stop waiting for a display list and switch to RDRAM reads.

Auxiliary frame buffers

Besides the main "normal" frame buffers, an arbitrary number of auxiliary ones can be allocated. An auxiliary buffer is not intended to be shown on the TV screen; it is never passed to the Video Interface and is used only as a texture in the main frame buffer. It can be of any size and any texture format. Auxiliary buffers are used in many games for a variety of effects. Some games render 3D models into an auxiliary buffer and then use them as textures, because manipulating a 2D object is much easier: it has only 4 vertices. Examples:
Link in pause screen, Zelda OOT

Cars in LEGO Racers

8bit auxiliary buffers are often used for dynamic shadows:

A complex combination: the main frame buffer is used for an infrared vision effect, and a large number of auxiliary buffers is used to create a camouflage cloak which melds with the environment:

Pokemon portraits are rendered into auxiliary frame buffers:

and many many more.

Very special auxiliary frame buffers

There are a few cases where an auxiliary frame buffer is not used as a texture after rendering but as a base for creating another texture. The most creative example is the "2D lighting" in Paper Mario. The game uses 2D sprites for characters, so normal dynamic shading is not applicable. However, in some areas parts of a sprite become highlighted when the character comes closer to a source of light:

To do that trick, the main color sprite is blended with another special sprite, which defines the color intensity of the main sprite. That special sprite is a color-indexed texture. The color of that texture is taken from its palette. The palette is loaded into texture memory along with the texture, and the texture's texels are indices into the palette. Usually palettes are part of the game resources and loaded from the ROM, but the palette for the "2D lighting" texture is generated dynamically using an auxiliary frame buffer. A special texture of palette size is blended with two constant colors and rendered into the auxiliary frame buffer, and then this memory area is used as the palette for the intensity texture. This effect was not emulated for a very long time, because developers could not understand that non-trivial behavior of the program. Usual frame buffer emulation techniques can't help to emulate it.
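The lookup itself is ordinary color-indexed texturing; the unusual part is only where the palette bytes come from. A sketch of the lookup (illustrative function names, not the plugin's code):

```python
def sample_ci_texture(texels, palette):
    """Resolve a color-indexed texture: each texel is an index into the palette.
    For Paper Mario's "2D lighting" the palette entries are read from an
    auxiliary frame buffer in RDRAM instead of the ROM, but the lookup is
    exactly the same."""
    return [palette[i] for i in texels]

# A tiny 3-entry palette (16bit colors) and a 4-texel CI texture.
palette = [0x0000, 0x7FFF, 0xFFFF]
print(sample_ci_texture([2, 0, 1, 1], palette))  # [65535, 0, 32767, 32767]
```

So a plugin that only tracks frame buffers used as textures misses this case entirely: here the frame buffer content must be fed back in as palette data.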

Yoshi's Story uses 8bit auxiliary buffers to dynamically build color-indexed background textures, which are then used for rendering the main scenes:

The score board in Mario Tennis is a dynamically updated texture. When the text in the scoreboard needs to be updated, the game creates an auxiliary frame buffer at the address of the scoreboard texture and renders the new text right into it. Then the updated texture is applied to its object in the main scene.
Apropos of Mario Tennis, this game is the absolute champion in frame buffer usage. Let's take a typical frame:

Dynamic shadows under each character and under the ball: auxiliary frame buffers.
Linesman sprites are first rendered into auxiliary buffers, then used in the main scene.
The ball's hot trail with a hot air effect. This is a complex effect: several auxiliary buffers with the ball trail are blended with the main frame used as a texture, and the result is rendered back into the main frame. As a result, the objects on screen are deformed under the ball, which looks like the refraction of light in hot air.
Score board and speed gauge: dynamically updated textures.
And this is not all. There are also many interesting effects in replays, including motion blur. Now you see why this game is so hard to emulate.

Special effects based on depth buffer

The depth buffer format differs from any texture format, and its content is not used for rendering. Thus, graphics plugins usually just use the depth buffer of the video card and do not care about the depth buffer area in RDRAM. However, in some cases filling this area with correct values is essential.

Coronas. A few games, including both Zeldas, draw coronas over sources of light, like torches. The game either draws a corona as a whole over all objects or does not draw it at all. The decision "draw the corona or not" is made at the CPU level and is based on values in the depth buffer. If the depth buffer is filled with zeros, coronas will never be drawn. If the depth buffer is filled with the default fill value, the coronas will always be seen. Thus, correct emulation of that effect requires the depth buffer area to be filled with actual data. Glide64 uses software depth buffer rendering for that:

While the depth buffer format is not a texture format, the buffer can be treated as a texture if necessary. Perfect Dark uses the depth buffer as a color buffer to render into it a special "texture" containing information necessary for drawing the coronas in this game:

Depth buffer copy. A few games use scenes consisting of 3D models moving over a 2D background. Some objects on the background can be visually "closer" to the user than a 3D model; that is, part of the 3D model is "behind" that object, and that part must not be drawn. For a fully 3D scene, the "object behind other object" problem is usually solved by the depth buffer. A 2D background has no depth, so the depth buffer by itself can't help. Zelda OOT solves that problem by rendering an auxiliary 3D scene with flat-shaded polygonal objects corresponding to the objects on the background. Thus, the scene gets a correct depth buffer. Then the background covers this scene, and the 3D models rendered over the background are cut by the depth buffer when the models are behind the original polygonal objects.
In Resident Evil 2 all screens are 3D models over 2D backgrounds. But the game does not render auxiliary 3D geometry to make the depth buffer. Instead, the game ROM contains pre-rendered depth buffer data for each background. That depth buffer data is copied into RDRAM, and each frame it is rendered as a 16bit texture into a color buffer which is then used as the depth buffer. To emulate this on PC hardware, the depth buffer data must be converted into the format of the PC depth buffer and copied into the PC card's depth buffer.

Depth buffer as fog texture. As I said, "the depth buffer format differs from any texture format and its content is not used for rendering." There is one exception. Depth buffer data consists of 16bit integers. Thus, in theory, the depth buffer content can be treated as a 16bit RGBA or a 16bit Intensity-Alpha texture. The depth buffer as RGBA is color garbage. However, if we use it as a 16bit IA texture and take only the intensity component, it can be used as a fog texture. To emulate that feature we either need the depth buffer area in RDRAM filled with correct values (as in the case with coronas) or we must make the PC card use the N64 depth buffer format. The first way produces a low-resolution fog texture which looks bad on a PC monitor; the second way is technically hard (impossible?):
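The IA16 reinterpretation is easy to sketch: in the N64's 16bit IA format the upper byte is intensity and the lower byte is alpha, so taking only the intensity of a 16bit depth value looks roughly like this (an illustration of the format trick, not the plugin's code):

```python
def depth16_as_ia_intensity(depth_value):
    """Reinterpret a 16bit depth value as an IA16 texel and keep only the
    intensity component (the high byte). Larger depth values mean more
    distant pixels, so distant pixels get a higher 'fog' intensity."""
    return (depth_value >> 8) & 0xFF

print(depth16_as_ia_intensity(0xABCD))  # 171 (0xAB)
```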

Video Interface emulation

The Video Interface (VI) transfers the color frame buffer to the video DAC (digital-to-analog converter) and specifies the filters used during the transfer. The PC video card has its own video interface, and emulation of the N64 VI is usually reduced to swapping frame buffers at the right time. However, there is a nuance. The VI specifies the start and end scan lines on the TV screen; that is, it defines the height of the displayed image. The VI also sets the address of the displayed frame buffer. A few N64 games create special effects just by playing with these parameters. For example, the slide film effect in Beetle Adventure Racing:

The two slides here are two already rendered frame buffers, placed one after another in main memory. The Video Interface just moves the origin of the displayed buffer from one slide to the other. No additional rendering, just smart work with the address in memory:
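The address arithmetic behind the trick is trivial. A sketch (hypothetical helper, purely for illustration):

```python
def vi_origin_for_slide(base_address, slide_index, width, height, bytes_per_pixel=2):
    """The slides are consecutive frame buffers in RDRAM; the VI simply
    points its origin register at the start of one of them."""
    return base_address + slide_index * width * height * bytes_per_pixel

# Two 320x240 16bit buffers: the second slide starts 153600 bytes later.
print(hex(vi_origin_for_slide(0x200000, 1, 320, 240)))
```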

Emulation of VI effects is directly related to frame buffer emulation.

Friday, October 11, 2013

Linux port

Today the Linux port started to work. The base plugin was ported to Linux by blight years ago, so most of the work was already done. Some actualization was required, since the code did not compile and then did not link. CMake helped me attach the necessary modules to the project, and finally I managed to build it. The first successful start showed that my hardware lighting is incompatible with NVIDIA cards/drivers; I just turned it off. Then I found that the way I rendered frame buffer objects to the screen does not work on Linux. Fixed. I also found and fixed a memory corruption problem which I could not catch on Windows. Now the port works, but some stuff still does not work properly.

Tuesday, September 24, 2013

Hardware LOD

MIP mapping is a basic technique of 3D computer graphics. It is supported by both PC and N64 hardware and is based on the level of detail (LOD). To cite: “LOD is used to determine or change the levels of detail for objects that reside far away or near. Objects located far away can be rendered with much less detail by setting the level, which greatly improves the drawing rate.” MIP mapping is used by many N64 games and often is essential for correct emulation. However, differences in the N64 implementation of that technique make using PC hardware MIP mapping very hard (impossible?):
  • On PC, all of a texture's mipmaps are loaded as one texture. Then the texture minification parameter is set to a mipmapped filter, and from then on the texture can be used as any other texture. N64 mipmaps are separate tiles placed in texture memory. The particular tile index is a function of the level of detail of the primitive. The combiner must be set to blend two mipmap tiles using the fraction of the calculated LOD. Thus, in order to use PC MIP mapping, the graphics plugin must somehow guess that the loaded tiles will be used as mipmaps, build one PC texture from them, and then ignore the combiner settings, replacing them with plain texture usage. It is very indirect and hackish.
  • The mipmap levels of a PC texture must be consistent. To be consistent, every mipmap level of a texture must be half the dimensions (until reaching a dimension of one and excluding border texels) of the previous mipmap LOD. N64 tiles for different mipmap levels are allowed to be the same, and this is used in Super Mario 64, for example. ‘Detail’ and ‘Sharpen’ modes make the situation even worse.
  • N64 mipmap levels are created for a 320x240 screen resolution. On PC, the most detailed tile will always be selected due to the higher level of detail at higher PC resolutions. That is not always the desired result.
In my previous article I described the problem with the hardware implementation of N64 lighting (shading). That problem is unimportant because the software implementation of N64 lighting works perfectly. The software implementation is enough because lighting is calculated per vertex, and the calculated vertices are passed to the hardware. LOD is calculated per pixel; thus, it must be done at the hardware level. 3dfx hardware did not support custom pixel processing, so I could not implement hardware MIP mapping emulation in Glide64. I made a very rough (per-polygon) approximation of the LOD fraction calculation, which worked fine in a few games and badly in others. The result in general was unsatisfactory, and I waited for a chance to do it right.

At first glance, GLSL is a perfect tool for a hardware implementation of the N64 LOD calculation. Citing the N64 manual: “LOD is computed as a function of the difference between perspective corrected texture coordinates of adjacent pixels to indicate the magnification/demagnification of the texture in screen space (texel/pixel ratio). The LOD module also calculates an LOD fraction for third axis interpolation between MIP maps.” So, the task is to take the texture coordinates of the current and previous fragments, put them into the LOD calculation formula, and take the fraction of the resulting LOD. The formula uses ‘pow()’ and ‘log()’ functions supported by GLSL; fraction is also a standard operation. The way looked clear, and I started the work expecting fast results. Experienced OpenGL programmers already see the trap along that way. I’m a newbie, and my fall into that trap was very confusing: there is no such thing as a “previous fragment” in GLSL. A fragment knows nothing about the state of adjacent pixels. This is logical. The RDP renders one pixel per cycle (or per two cycles in two-cycle mode), and it knows everything about the pixels drawn before the current one. A PC video card has several (many) pixel processors which work simultaneously.

Thus, there is no direct way for a fragment shader to get the texture coordinates of adjacent fragments. That was bad news for me. I started to look for an indirect way to fulfill the task and, after digging through manuals and forums, found a solution. A fragment shader can't get information about other fragments, but it can read any texel from the available textures. A pixel's texture coordinates are two numbers, which can be saved as color components of a texture. So, the solution is to draw mip-mapped textures in two passes:
  • The first pass uses an auxiliary texture frame buffer and a special shader which writes the fragment's texture coordinates into that auxiliary LOD texture.
  • The second pass uses the LOD calculation shader, which takes the texture coordinates of adjacent fragments from the LOD texture; then the standard shader renders the fragment using the calculated LOD fraction.
I’m not sure that the solution I found is the best possible one; as a newbie, I was happy to find any working one. I spent two weeks making it work somehow. Many technical problems made this task the hardest among those done so far. Nevertheless, hardware emulation of the N64 LOD calculation is proven to be possible. Also, I learned Frame Buffer Objects (FBO), which will be my best friend in the next long and hard quest: frame buffer emulation.
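The per-pixel math itself is small. Here is a rough sketch of it on the CPU (Python for brevity; the inputs are the per-pixel texture coordinate differences that, in a shader, would come from adjacent fragments or from dFdx/dFdy, and the clamping details of the real RDP are omitted):

```python
import math

def lod_fraction(ds_dx, dt_dx, ds_dy, dt_dy):
    """Estimate the texel/pixel ratio from texture coordinate differences
    between adjacent pixels, then split log2 of it into the mipmap tile
    index and the fraction used to blend two adjacent tiles."""
    ratio = max(math.hypot(ds_dx, dt_dx), math.hypot(ds_dy, dt_dy))
    lod = math.log2(max(ratio, 1.0))  # clamp: ratios below 1 stay on tile 0
    tile = int(lod)
    return tile, lod - tile

# 4 texels per pixel: tile 2, fraction 0 (exactly on a mipmap level).
print(lod_fraction(4.0, 0.0, 0.0, 4.0))
```

On the N64 the combiner then blends tile `tile` and tile `tile + 1` using that fraction as the interpolation factor.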
Peach-Bowser portrait transition. MIP mapping with two mipmap tiles of equal size.

Perfect Dark. Wall and floor textures use MIP mapping.

Update: neobrain suggested that I use the dFdx and dFdy functions to get the "difference between perspective corrected texture coordinates of adjacent pixels" needed to calculate the LOD. The functions really do the job, and now my LOD calculation shader function is simple, straightforward, and close to the original N64 one. It also works much better than my previous texture-based two-pass implementation. neobrain, thank you very much for that hint!

Wednesday, September 11, 2013

Hardware lighting

Being impressed by the power of GLSL, I decided to implement another new feature: hardware lighting. The N64 does all the necessary calculations for vertex transformation and lighting in the Reality Signal Processor (RSP), which is a part of the multimedia Reality Co-Processor (RCP). Thus, it is safe to say that the N64 uses hardware transform and lighting (HW T&L).

3dfx hardware did not support HW T&L, so I could not implement hardware lighting when I worked on Glide64. On the other hand, other graphics plugins, which are dedicated to more advanced hardware, also use software lighting calculation. I can't say that no plugin uses hardware lighting, because I have not read the sources of all plugins, but I have seen only software implementations of the lighting. I see two reasons for not using hardware capabilities for the emulation of N64 lighting:

  • It is not necessary. No matter whether you calculate vertex colors in your plugin or your card does it, the visual result is the same. The only difference is in performance. N64 games are low-polygon, and it is no problem for a PC CPU to do all the necessary calculations. For example, UltraHLE worked at full speed on a Pentium II processor.
  • Lighting is another core feature of 3D graphics which the N64 implemented in its own way. In the simplest case, the lighting calculation needs the color of the surface (provided via vertices), the normals to the surface at each of its vertices, the color of the light, and the light direction. OpenGL expects all these components to be provided in order to calculate lighting. The vertex color is blended with the light color in proportions defined by the angle between the light direction and the vertex normal. Pixels get their color values through interpolation of vertex colors. This is known as Gouraud shading. The N64 uses basically the same mechanism, but with an essential optimization: the N64 vertex structure uses the same fields for color and normal. Thus, a vertex has either color information, when lighting is not used, or a normal, but not both. But the surface color is still necessary for the lighting calculation. The N64 does a trick: it provides the light color already blended with the color of the surface it is applied to, and the RSP calculates the intensity of that color for a particular vertex. This mechanism is not very suitable for implementation via standard OpenGL/DirectX functionality.
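The scheme described in the second point can be sketched like this (a simplified illustration in Python, with made-up names; the real RSP microcode works on fixed-point values):

```python
def n64_vertex_color(packed_normal, lights, ambient):
    """Sketch of RSP-style lighting: the vertex carries a normal where a
    color would normally be, each light color arrives already pre-blended
    with the surface color, and the RSP only scales it by the angle between
    the normal and the light direction.
    Vectors are (x, y, z) tuples; colors are (r, g, b) in 0..1."""
    r, g, b = ambient
    for light_color, direction in lights:
        # Intensity is the clamped dot product of normal and light direction.
        intensity = max(0.0, sum(n * d for n, d in zip(packed_normal, direction)))
        r += light_color[0] * intensity
        g += light_color[1] * intensity
        b += light_color[2] * intensity
    return (min(r, 1.0), min(g, 1.0), min(b, 1.0))

# A normal facing straight at the light gets the full pre-blended color.
print(n64_vertex_color((0.0, 0.0, 1.0),
                       [((0.5, 0.5, 0.5), (0.0, 0.0, 1.0))],
                       (0.25, 0.25, 0.25)))
```

Note there is no separate surface color anywhere in the function: it is already baked into `light_color` and `ambient`, which is exactly what makes this awkward to map onto the standard OpenGL lighting model.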

It is not hard to rewrite the software lighting calculations in GLSL and get hardware lighting. I could write a vertex shader to calculate the vertex color and obtain the same result as with the software implementation. It may be interesting as an exercise, but it would give us nothing new. So, I decided to implement per-pixel lighting. This is known as the Phong shading model. Phong shading provides more realistic lighting in comparison with traditional Gouraud shading, but it requires much more computational resources and usually is not used in games. This picture illustrates the difference between the two shading models:

First, I disabled the software lighting calculation and put the light data into OpenGL light structures. Then I wrote a fragment shader which calculates the fragment color using the lights and the interpolated normal. It works.

What can I say about the result? N64 3D models are low-polygon. On the one hand, that makes Phong shading an easy task for modern cards. On the other hand, the models are not smooth enough, so the difference between Gouraud and Phong shading is not drastic. The resulting lighting is slightly more accurate, more distinct:

The old games began to look a bit nicer; my color combiner module became more complex.

There is one side effect of hardware lighting: implementation of cel (or toon) shading becomes possible, as it is based on the lighting intensity calculation. The main idea of cel shading is to set several levels of lighting intensity and keep the shading constant between the levels. I made a rough implementation of cel shading:
The result does not look very cool. Maybe it can be tuned. Also, it should look better with specially made “cel-shading” textures, e.g. with Djipi's Cel Zelda texture pack.
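The quantization at the heart of cel shading fits in a few lines. A sketch (illustrative Python; in the plugin this would live in the fragment shader):

```python
def cel_shade(intensity, levels=4):
    """Quantize a continuous lighting intensity (0..1) into a few flat
    bands, so shading stays constant between the chosen levels."""
    if levels < 1:
        raise ValueError("need at least one band")
    band = min(int(intensity * levels), levels - 1)
    return band / (levels - 1) if levels > 1 else 1.0

# Smooth 0..1 intensities collapse onto 4 discrete values.
print([cel_shade(i / 10) for i in range(11)])
```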

Saturday, August 31, 2013

Chapter 1. Combiner

Problems with the emulation of the N64 graphics sub-system using PC hardware rendering are induced by the differences between N64 and PC graphics hardware capabilities. Many things in the N64 work very differently from the PC approach. Twelve years ago the situation was much worse: some N64 features were impossible to reproduce on PC hardware; others required complex code, which also had to take into account the capabilities of the user's video card, and these capabilities varied greatly. Now every modern card supports shader programs, and using shaders we can program the N64 features that are most difficult to emulate.


One of the complex tasks was emulation of the N64 color combiner. The color combiner equation itself is quite simple, and the formula is the same for all combine modes. Nevertheless, the problem was hard. The combine equation could not be completely reproduced with the standard Glide3x API, nor with the standard OpenGL of that time (1998-2001). Thus, many N64 combine modes could not be reproduced correctly on PC; only an approximation was possible, with loss of color information.


OpenGL extensions introduced with GeForce cards helped to reduce the problem. The Glide API also got new extensions after the Voodoo4/5 release. In particular, the Glide3x combiner extension was more flexible than the N64 combiner, and with it I was able to reproduce N64 combiners 100% correctly … but not always. The main problem is the variety of color inputs which the N64 combiner can substitute into the equation. The N64 can use two standard constant color inputs in the same equation, plus some additional parameters as another constant color input, plus noise, plus the fractional part of the per-pixel calculated LOD, plus vertex color, plus texture colors. Besides that, the N64 often uses two-cycle rendering, where the combiner consists of two equations and the second one takes the output color of the first as its input. The Glide API has only one constant color (and another one can be used in the texture combiner with an API extension). Two-cycle mode can be reproduced if the first cycle combines textures and the second cycle blends the output of the first cycle with non-texture color inputs. But crafty N64 programmers seldom used easy-to-emulate combine modes. Most of the time they used several constant color inputs and blended them with texture and vertex colors in both cycles. That made automatic translation of N64 combiners into Glide3x ones impossible. I had to use various tricks, like pre-calculation of constant expressions. For each combine mode I had to think how it could be reproduced. Thus, each combine mode in Glide64 is implemented manually. Around ten thousand modes, several thousand corresponding static functions, 16665 lines of code. Titanic work. Sisyphean toil, as some modes I could not reproduce 100% correctly despite all my contrivances.


So, the first goal I wanted to achieve in the new plugin was an automatic translator of combine modes. That is where my acquaintance with the OpenGL Shading Language, or GLSL, began. With GLSL the task proved to be not just simpler, it became easy. I don't need vertex transformation, so the vertex shader is trivial. A fragment shader can have as many external constant parameters as necessary, so there was no problem passing all the necessary constant color inputs to it. The combiner equation is always the same, so I just parse its inputs and substitute them into the body of the fragment shader. The initial implementation was circa 300 lines of useful code. 300 lines for a universal translator which automatically produces 100% correct combiners. Compare that with 16665 lines of code in Glide64.
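The translator idea can be illustrated by building the fragment shader source as a string: the equation is fixed, and only the names of the selected inputs are substituted into it. A simplified sketch for a single cycle (uniform and varying names here are hypothetical, not the plugin's actual identifiers):

```cpp
#include <string>

// Build a fragment shader for one combiner cycle by substituting the
// GLSL names of the selected inputs into the fixed (A - B) * C + D form.
std::string makeCombinerShader(const std::string& a, const std::string& b,
                               const std::string& c, const std::string& d) {
    return
        "uniform vec4 uPrimColor;\n"
        "uniform vec4 uEnvColor;\n"
        "uniform sampler2D uTex0;\n"
        "varying vec4 vShadeColor;\n"
        "varying vec2 vTexCoord0;\n"
        "void main() {\n"
        "  vec4 texel0 = texture2D(uTex0, vTexCoord0);\n"
        "  gl_FragColor = (" + a + " - " + b + ") * " + c + " + " + d + ";\n"
        "}\n";
}
```

A two-cycle mode would emit the equation twice, feeding the first result into the second; the point is that every combine mode falls out of the same few lines of generator code instead of needing a hand-written function.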


The initial implementation of combiner emulation put the project on the same level as Glide64: all combine modes are supported, but not all color inputs are emulated. In particular, LOD fraction and noise emulation were missing. LOD calculation is a complex task by itself; Glide64 has only a rough software approximation of that parameter. Noise by itself is not complex, but the Glide API does not support a random color, and I did not find a good way to implement it. To my surprise, GLSL also does not have a function that generates random output, at least at the moment. On the other hand, the language is powerful enough to implement such a function. A quick dig on the Internet led me to the webgl-noise project. This is an open-source project with a variety of GLSL implementations of noise functions. I added one to my project. Now I have a tool to implement the noise input as well as randomized color and alpha dithering. My combiner implementation became more complex: 400 lines of code.
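The kind of trick such GLSL noise functions rely on is hashing the fragment coordinate into a value that looks random across the screen. Here is a C++ port of the widely used fract(sin(dot(p, vec2(12.9898, 78.233))) * 43758.5453) one-liner, just to show the principle (webgl-noise itself provides proper simplex/Perlin implementations, not this hack):

```cpp
#include <cmath>

// Pseudo-random value in [0, 1) derived from a 2D coordinate.
// Deterministic per input, but visually noisy across the screen,
// which is enough for noise color input and randomized dithering.
float screenNoise(float x, float y) {
    const float s = std::sin(x * 12.9898f + y * 78.233f) * 43758.5453f;
    return s - std::floor(s);  // GLSL fract()
}
```

In the fragment shader the fragment position (and, for animated noise, a per-frame seed) plays the role of the coordinate, so each pixel gets its own pseudo-random value.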

The screen shots below illustrate how it works:
Super Smash Bros. Character selection screen: noise is used for unknown characters icons.
Zelda Majora's Mask: noise in icons

Mission Impossible: randomized color dithering in menu screen


In 2002 I started to work on Glide64, a graphics plugin for Nintendo 64 emulators. For 10 years I spent most of my free time on that hobby. The goal was ambitious: to make the best graphics plugin with hardware rendering. Today several top N64 emulators use Glide64-based builds as their main graphics plugin, so the goal seems to have been achieved. Over those 10 years I got fed up with emulation development. In 2012 I stopped working on the project and stopped dealing with emulators. Only work, family and native applications.


After one year of such a simple life I found that something was missing. I had been working on the same projects using the same tools. No need to learn new stuff. When I participated in the emu project, I had to learn a lot of things which did not intersect with my main work at the time. Later the situation changed, and some of that "redundant" knowledge became indispensable. Permanent learning is essential. But learning "in advance" is boring, and I'm lazy. So, I had to find an interesting topic to learn and motivation to work on it.


On the other hand, while I was happy to forget about debugging all those strange graphics glitches and digging through old manuals, I felt regret that some of my ideas were left unimplemented. Some were impossible to implement with the Glide API; for others I just did not find the time.


Thus, some day in 2013 I decided to return to N64 emulation and start a new graphics plugin project. Due to a severe lack of spare time, the project's goals are not ambitious:

• implement my old ideas
• support features and games which I failed to implement with Glide64
• make a Linux port
• make an Android port

So, the main focus is on new stuff. Problems which are already solved in Glide64 (e.g. hi-res textures support) have low priority. All end-user features and demands, like GUI, stability, speed and compatibility, also have low priority. Such a minimalist and selfish approach makes development much easier.


The next topics describe milestone features implemented in the project.