Saturday, November 23, 2013

Frame buffer emulation. Part I.


Early N64 emulators had no frame buffer emulation (FBE). I no longer remember who took the first step in that direction, but I do remember how it started in the Glide64 project. Early versions of Glide64 also had no FBE. I started to work on this problem after the release of version 0.42, in the summer of 2003. My computer was powered by a 3dfx Voodoo3 card. The only way to implement FBE on that card was to read the rendered frame from video card memory, convert the data into N64 format and put it into the RDRAM area, so that it could be used as a texture image later. That approach has three major disadvantages:
  • The hi-res image from the video card is scaled down to the low N64 resolution. It looks ugly when it is rendered back as a texture.
  • Emulation of auxiliary frame buffers is generally incorrect with this approach. These buffers must be rendered off-screen, but since we render only into the main video buffer, they are all visible.
  • It is slow in general.
Reading a large amount of data from AGP/PCI cards was a very slow process. While reading one or two frames did not create performance problems, reading every frame from the card made games unplayable. Thus, I had to find a way to detect when a frame buffer read is actually necessary. I invented two-pass frame buffer emulation. On the first pass the plugin briefly scanned the incoming display list and analyzed how frame buffers would be used in the current frame. The second pass did normal rendering, but if frame buffer usage was detected on the first pass, the buffer read was done at the proper time. That approach allowed me to effectively emulate many of the “Main frame buffer as texture” effects and even some of the “Auxiliary frame buffers” effects described in the previous article. Although the frame buffer usage detection algorithm was very imperfect and full of heuristics, version 0.5 had the best FBE of its time. Many effects, including motion blur, were emulated for the first time in that version. Since frame buffers taken from the video card were scaled down to N64 resolution, the result did not look very cool, but it was much, much better than nothing.
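The two-pass idea can be illustrated with a minimal sketch. It is not the Glide64 code: the command classes, the function names and the fixed buffer size are all hypothetical simplifications, and the real detection used many more heuristics. The first pass only records where a texture is loaded from memory that was rendered to earlier; the second pass performs the slow read-back only at those points.

```python
from dataclasses import dataclass

# Hypothetical, heavily simplified display-list commands (not real N64 opcodes).
@dataclass
class SetColorImage:
    address: int   # RDRAM address the game is about to render into

@dataclass
class SetTextureImage:
    address: int   # RDRAM address the game is about to load a texture from

@dataclass
class Draw:
    pass

def scan_display_list(commands, buffer_size, previous_buffers=()):
    """First pass: map command index -> start address of the buffer to read back."""
    rendered = list(previous_buffers)  # buffers that already hold rendered pixels
    current = None                     # buffer currently selected for rendering
    reads = {}
    for i, cmd in enumerate(commands):
        if isinstance(cmd, SetColorImage):
            current = cmd.address
        elif isinstance(cmd, Draw):
            if current is not None and current not in rendered:
                rendered.append(current)
        elif isinstance(cmd, SetTextureImage):
            # The texture lies inside a rendered buffer: a read-back is needed here.
            for start in rendered:
                if start <= cmd.address < start + buffer_size:
                    reads[i] = start
                    break
    return reads

def render(commands, reads):
    """Second pass: normal processing, with read-backs only where pass one asked."""
    log = []
    for i, cmd in enumerate(commands):
        if i in reads:
            log.append(f"read back buffer 0x{reads[i]:X}")
        log.append(type(cmd).__name__)
    return log

frame = [SetColorImage(0x100000), Draw(), SetTextureImage(0x100400), Draw()]
reads = scan_display_list(frame, buffer_size=320 * 240 * 2)
print(render(frame, reads))
```

A frame that never loads a texture from a rendered buffer produces an empty result on the first pass, so no read from the video card happens at all.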
The next big step was made when Orkin promoted the idea of hardware frame buffer emulation (HWFBE). The idea of HWFBE is to use hardware capabilities to create auxiliary render buffers and then use them instead of frame buffers in RDRAM. That approach eliminates all the major drawbacks of the traditional “read from video card – put to RDRAM” method. Orkin implemented the idea in his OpenGL plugin glN64. Only a few effects were emulated correctly, including the pause screen in the Zelda games. But the working ones looked really impressive.

I wanted to implement something similar in Glide64, but my Voodoo3 did not allow me to do anything like that. Unexpectedly, I got help from Hiroshi 'KoolSmoky' Morii, a famous person in the 3dfx world. Hiroshi gave me a Voodoo5 card and pointed me to the Glide3x API extensions, which allowed me to create auxiliary frame buffers in the texture memory of that card. In April 2004 Glide64 "Miracle Edition" with HWFBE support was released. The Glide API frame buffer extensions allowed me to emulate any manipulation that the N64 does with its frame buffers. But, as usual with 3dfx, there was a serious problem. The Voodoo5 had only about 24 megabytes of free texture memory which I could use for my needs: 24 megabytes for all N64 textures (and later for all high-res textures) and for all frame buffers. 3dfx cards did not support non-power-of-two texture sizes. Thus, I had to use 1024x1024 RGB textures for resolutions up to 1024x768, and 2048x2048 textures for higher resolutions. In the latter case one full-screen texture buffer takes 8 megabytes (for a 16-bit RGB texture). Motion blur emulation requires two full-screen texture buffers, and two such buffers consume 2/3 of the available texture memory. Obviously, I could not allocate more buffers, so I had to be very careful with texture buffer allocation. My frame buffer usage detection algorithm came in very handy. With its help I allocated texture frame buffers at the proper moments and did the proper manipulations with them. The HWFBE mechanism in Glide64 has been constantly improved, and now it can emulate a lot of very complex effects. However, some games, including Mario Tennis, remained unemulated despite all my efforts.
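The memory figures above can be checked with a few lines of arithmetic. The sketch assumes square power-of-two textures, which is what the 1024x1024 and 2048x2048 sizes imply; the function names are made up for this illustration.

```python
MB = 1024 * 1024

def next_power_of_two(n):
    """Smallest power of two that is >= n."""
    p = 1
    while p < n:
        p *= 2
    return p

def texture_buffer_bytes(width, height, bytes_per_pixel=2):
    """Size of a square power-of-two texture large enough for width x height."""
    side = max(next_power_of_two(width), next_power_of_two(height))
    return side * side * bytes_per_pixel

print(texture_buffer_bytes(1024, 768) // MB)    # 1024x1024, 16-bit: 2 MB
print(texture_buffer_bytes(1280, 1024) // MB)   # 2048x2048, 16-bit: 8 MB

# Two full-screen buffers for motion blur against about 24 MB of texture memory.
print(2 * texture_buffer_bytes(1280, 1024) / (24 * MB))
```

For any resolution above 1024x768 the two motion blur buffers alone take 16 of the 24 megabytes, which is the 2/3 mentioned above.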


Goals and targets

Now, ten years after my first FBE implementation, the situation is very different. I have a modern video card with practically unlimited power, and the OpenGL API has also evolved greatly. Thus, the goal is to implement an HWFBE mechanism which is as free of hacks and heuristics as possible. It is impossible to get rid of all hacks with HWFBE because HWFBE is a hack in itself: the plugin violates the normal emulation process by using texture frame buffers instead of texture data in RDRAM.
As I described in the previous chapter, HWFBE in Glide64 is limited by the capabilities of the target hardware, namely my good old Voodoo5. I just didn't have enough video memory to play with. Now I can allocate as many texture buffers as needed. Thus, the main idea of the new HWFBE mechanism is very simple:
  • Whenever the N64 application allocates a color or depth buffer, the plugin allocates a corresponding buffer in texture memory and uses it for rendering.
  • When the N64 application uses a texture which lies in the address space of an allocated texture buffer, the plugin uses the texture buffer instead of the original texture.
  • When the emulator calls the Video Interface UpdateScreen command, the texture buffer is copied into the main frame buffer and displayed on screen.
As you can see, HWFBE implementation requires:
  • hardware support for texture frame buffers
  • a mechanism for frame buffer management: create, find by address, remove
  • a mechanism for mapping original texture parameters into frame buffer texture coordinates.
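The management part of these requirements can be sketched as a small list keyed by RDRAM address. This is only an illustration under assumptions: the class and field names are hypothetical, `texture_id` stands in for whatever handle the graphics API returns, and dropping overlapping buffers on creation is one possible policy, not necessarily the one used in the plugin.

```python
from dataclasses import dataclass

@dataclass
class FrameBuffer:
    start: int            # RDRAM address of the first pixel
    width: int
    height: int
    bytes_per_pixel: int
    texture_id: int       # hypothetical handle of the texture buffer on the video card

    @property
    def end(self):
        return self.start + self.width * self.height * self.bytes_per_pixel

class FrameBufferList:
    def __init__(self):
        self._buffers = []
        self._next_texture_id = 1

    def create(self, start, width, height, bytes_per_pixel):
        """Return the buffer for this RDRAM area, allocating it if needed."""
        for fb in self._buffers:
            if (fb.start, fb.width, fb.height, fb.bytes_per_pixel) == (
                    start, width, height, bytes_per_pixel):
                return fb   # the game re-selected an existing buffer
        # Assumed policy: a new buffer invalidates every buffer it overlaps.
        new_end = start + width * height * bytes_per_pixel
        self._buffers = [fb for fb in self._buffers
                         if fb.end <= start or fb.start >= new_end]
        fb = FrameBuffer(start, width, height, bytes_per_pixel, self._next_texture_id)
        self._next_texture_id += 1
        self._buffers.append(fb)
        return fb

    def find(self, address):
        """Return the buffer whose address space contains address, or None."""
        for fb in self._buffers:
            if fb.start <= address < fb.end:
                return fb
        return None

    def remove(self, fb):
        self._buffers.remove(fb)

buffers = FrameBufferList()
main = buffers.create(0x100000, 320, 240, 2)
print(buffers.find(0x100400) is main)   # a texture address inside the buffer
print(buffers.find(0x200000))           # an ordinary texture: no substitution
```

When a texture is loaded, `find` decides whether the plugin should use a texture buffer or the original texture data from RDRAM.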


The first step

OpenGL has several ways to create frame buffers in texture memory. The modern way is to use Frame Buffer Objects (FBO). The FBO is a very powerful tool, and it fits HWFBE perfectly. I had to implement frame buffer management and mapping. GLideN64 is based on the sources of Orkin's OpenGL graphics plugin glN64. The sources already contained Orkin's implementation of HWFBE. Since Orkin used a prehistoric OpenGL mechanism for texture buffers, all the code for texture buffer allocation was moved to /dev/null. However, the functions for managing the frame buffer list and for mapping original textures to frame buffer textures were implemented quite well and were adopted with little-to-moderate modifications. Thus, I can't say that I started that work from scratch.
Since the mechanism looks simple, I made a first-shot implementation quite quickly. However, I spent a lot of time before the plugin started to show anything. Finally I made the plugin work with FBE enabled as well as without FBE. That is:
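The mapping step can be illustrated with a short sketch. It is not Orkin's code or the GLideN64 code: the function name and parameters are hypothetical, and it assumes a linear buffer layout in RDRAM and a uniform scale between the N64 resolution and the texture buffer resolution.

```python
def map_texture_to_buffer(tex_address, buf_start, buf_width, bytes_per_pixel,
                          scale_x, scale_y):
    """Convert an RDRAM texture address into pixel coordinates in the texture buffer."""
    offset = tex_address - buf_start
    if offset < 0:
        raise ValueError("texture address lies before the frame buffer")
    stride = buf_width * bytes_per_pixel   # bytes per N64 scanline
    y, remainder = divmod(offset, stride)  # scanline and byte offset within it
    x = remainder // bytes_per_pixel
    # The texture buffer is rendered at a higher resolution than the N64 buffer.
    return x * scale_x, y * scale_y

# A texture that starts at pixel (16, 10) of a 320-pixel-wide 16-bit buffer,
# with the texture buffer rendered at twice the N64 resolution.
address = 0x100000 + (10 * 320 + 16) * 2
print(map_texture_to_buffer(address, 0x100000, 320, 2, 2.0, 2.0))
```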
  1. The current frame is successfully rendered into the corresponding FBO.
  2. The FBO is successfully copied into the video card's frame buffer and shown on screen.

Moving forward

The next step was to make actual frame buffer effects work. The first target was the pause screen in Zelda OOT. It is quite a complex scenario, which uses the main color buffer as the background image and an auxiliary color buffer for Link's portrait. Again, many fixes had to be made before it started to work. Also, emulators added an unexpected problem. I spent a lot of time trying to make the background visible. It was blank except for a narrow strip at the top:
After hours of debugging I found that all but the first two texrect commands for the background were simply missing from the incoming display list. This is the result of the “subscreen delay fix” cheat code, which is enabled by default. The cheat removes the long pause between pressing the “Start” button and the appearance of the pause screen. Now I know the price of that optimization. Glide64 works fine with that cheat because it uses a hack: it renders the whole background texture when it meets the first texrect command with a frame buffer texture and ignores the rest of the texrects. I removed the cheat and got a working pause screen. So, the cheat is incompatible with honest frame buffer emulation.

Another problem I met while working on this game (and later on many other games) was the old, well-known problem of HWFBE: a frame buffer texture is used instead of the normal one:

The plugin substitutes the frame buffer's texture for every texture that lies inside the address space of any frame buffer. The game may discard that frame buffer and use its space for other textures, but this is done outside of the plugin; thus the plugin does not know that the previously allocated frame buffer is no longer valid. I was already engaged in a ding-dong battle with that problem in Glide64. The battle continues in this plugin. I have tried many approaches to checking the validity of texture frame buffers, but for every approach there is a game which does not work with it.
When I finished with Zelda OOT, I started to check which effects were working. Pause screen in the Banjo games – check:
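To make the problem concrete, here is a sketch of one conceivable validity check. It is only an assumption for illustration; the text above does not say which approaches were actually tried, and the class and method names are hypothetical. The idea is to take a checksum of the buffer's RDRAM area when the buffer is finished, and to treat the buffer as discarded if that memory has changed by the time a texture is loaded from it.

```python
import zlib

class TrackedBuffer:
    """A frame buffer area in RDRAM with a checksum-based validity test."""

    def __init__(self, start, size):
        self.start = start
        self.size = size
        self.checksum = None   # no snapshot taken yet

    def _crc(self, rdram):
        return zlib.crc32(rdram[self.start:self.start + self.size])

    def snapshot(self, rdram):
        """Remember what the RDRAM area looks like right after rendering."""
        self.checksum = self._crc(rdram)

    def is_valid(self, rdram):
        """False if no snapshot exists or the area has been overwritten since."""
        return self.checksum is not None and self._crc(rdram) == self.checksum

rdram = bytearray(1024)          # stand-in for the N64 memory
buf = TrackedBuffer(start=256, size=128)
buf.snapshot(rdram)
print(buf.is_valid(rdram))       # nothing has touched the area yet
rdram[300] = 0xFF                # the game loads other data into the same space
print(buf.is_valid(rdram))       # the texture buffer must not be used any more
```

A check like this also shows why no single approach is enough: a game may legitimately modify its frame buffer in RDRAM, or reuse the space for data that happens to match, and then the heuristic gives the wrong answer.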

 Motion blur in Rally Challenge 2000 – check:

 CamSpy in Perfect Dark – check: 

Cars in Lego Racers – check:

Dynamic shadows – nope. N64 games usually use 8-bit monochrome auxiliary frame buffers to render dynamic shadows. I had to adapt my code for that case. Done. Dynamic shadows – check:

Moving on. Pause screen in Zelda MM – a few fixes, check:

Motion blur in Zelda MM – another bunch of fixes, check:

None of the fixes I made to get a game working were hacks like “if the game is Zelda, do this, else do that”. On the contrary, each fix made the algorithm more general and suitable for more cases. Also, I fixed a lot of bugs in the plugin itself. The original code was 10 years old, and many things were missing or incorrect.

The assault

Finally I decided that the current implementation of HWFBE was good enough to storm the Peak: Mario Tennis. The game uses an insane number of frame buffer effects in each frame, switching between frame buffers literally hundreds of times per frame. Of course, it again refused to work. The bloody long assault started. The game offered resistance at every possible position. Weird sizes of auxiliary buffers; using the same space for a depth buffer, then for a color buffer, then for a depth buffer again within one frame; bugs in legacy code: everything was against me. The frame buffer validity check became paranoid, but still was not good enough. But nothing can stand against modern OpenGL and persistent debugging. Finally, two months after the start of the work on HWFBE, emulation of Mario Tennis reached a ‘playable’ level. Glitches sometimes appear here and there, but in general it is playable and enjoyable. Is it Victory? Well, the goal is achieved. But the work is not finished yet. There are several frame buffer effects which still do not work. The Pokemon games, Paper Mario and Animal Forest are still waiting for fixes. The new HWFBE has proved its effectiveness, so this is just a matter of time.