CHIP-8 emulator in Rust. Part 1

Sunday, May 9, 2021

I have been interested in emulation for many years, in particular game console emulation. I started playing games under emulators when I was a teenager, and I was fascinated by how a piece of software would let me play a game on a completely different architecture than the original one.

If you’ve ever been interested in learning about emulation, maybe you’ve heard of the CHIP-8: a specification of a simple virtual machine designed in the mid-1970s to play 8-bit 2D video games. The connection between the CHIP-8 and emulation development is that the specification of the CHIP-8 is very simple, which makes it the perfect target for writing your first emulator. In fact, that’s what I did in 2014: I wrote my first emulator, targeting the CHIP-8, in C++. Please, don’t take that C++ code too seriously! I’ve never been experienced in C++, and I wasn’t very experienced in software development back then.

A year ago a friend of mine was interested in practicing both bit and byte manipulation and programming in Rust. I believe I suggested that he write a CHIP-8 emulator, and I decided to join the fun as well and write my second CHIP-8 emulator, this time in Rust, as a weekend project.

In this article I will explain some basic concepts of the CHIP-8 and how to design an emulator for it in Rust, following patterns common to simple emulators. The emulator will run on a desktop OS (I use Linux on x86_64) using the SDL libraries. In a future article I will explain how to port the emulator to a bare-metal ARM microcontroller.

You can find the source code of the emulator on GitHub.

CHIP-8

The CHIP-8 is a specification for a simple virtual machine for 2D games. It’s virtual because there was no hardware implementation, and it was designed so that implementations of the virtual machine (emulators) could be written for different machines, allowing the same games to be played in different environments. Since it was designed in the 1970s, the specs of the VM are very modest by current standards.

Here are some features of the CHIP-8:

Monochrome 64x32 pixels display
4KB of Memory
- 512B reserved
- 3.5KB of RAM shared with the game program
16 x 8 bit registers
16 key keypad input
delay timer
sound timer
36 instructions

If you’re interested in building an emulator for the CHIP-8, I recommend reading Cowgod’s Chip-8 Technical Reference v1.0.

Emulator architecture

The emulator is split into two parts: frontend and backend. The backend is developed as a Rust library, and the frontend imports the backend library to build an executable. The reasoning behind this division is to keep the backend completely isolated from input and output so that it has maximum portability, and to leave the implementation details of input and output to the frontend. This way, if we want to port the emulator to a new system, we only need to write a new frontend and we can reuse the backend.

To implement this division I made use of Rust workspaces. The root of the repository contains a Cargo.toml with the following contents:

[workspace]

members = [
    "chip8",
    "sdl",
]

Where chip8 is the Rust project corresponding to the backend, and sdl is the Rust project corresponding to the desktop frontend.

Backend

The backend is implemented in the chip8 folder and contains the core implementation of the CHIP-8 Virtual Machine.

Structure of the emulator

The CHIP-8 emulator is implemented via a struct:

pub struct Chip8<R: RngCore> {
    mem: [u8; MEM_SIZE],
    v: Regs, // Register Set
    i: u16,  // Index Register
    pc: u16, // Program Counter
    stack: [u16; 0x10],
    sp: u8,                                     // Stack Pointer
    dt: u8,                                     // Delay Timer
    st: u8,                                     // Sound Timer
    keypad: u16,                                // Keypad
    fb: [u8; SCREEN_WIDTH * SCREEN_HEIGTH / 8], // Framebuffer
    tone: bool,                                 // Tone output enable
    time: isize,                                // Overtime in microseconds
    rng: R,                                     // Instance of a random number generator
}

which contains an abstraction of all the hardware components of the CHIP-8 as well as some auxiliary fields. Apart from the CPU and memory, the CHIP-8 contains very simple hardware, so I implemented all the required methods on the Chip8 struct. For more complex hardware (like programmable timers, or a more complex display), I would have created abstractions via new structs (which would appear as fields in the Chip8 struct) with their own methods.

From this point, there are basically three important parts: Instruction functions, Step function and Frame function, described below. There are other useful functions that expose the internal state of the CHIP-8 (tone and fb), and a constructor (new) and a ROM loader (load_rom).
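
To make the memory layout from the feature list concrete, here is a minimal sketch of what a ROM loader could look like. It is not the code from the repository: the free function, the `ROM_ADDR` constant and the `RomTooBig` error variant are names I made up for illustration. The only facts it relies on are that memory is 4 KB and that CHIP-8 programs start at address 0x200, right after the 512 reserved bytes.

```rust
/// Total addressable memory of the CHIP-8: 4 KB.
const MEM_SIZE: usize = 0x1000;
/// Programs are loaded after the first 512 bytes, which are reserved.
/// (Hypothetical constant name.)
const ROM_ADDR: usize = 0x200;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum Error {
    RomTooBig(usize),
}

/// Copy a ROM image into memory starting at `ROM_ADDR`.
/// Fails if the ROM does not fit in the 3.5 KB available to programs.
fn load_rom(mem: &mut [u8; MEM_SIZE], rom: &[u8]) -> Result<(), Error> {
    if rom.len() > MEM_SIZE - ROM_ADDR {
        return Err(Error::RomTooBig(rom.len()));
    }
    mem[ROM_ADDR..ROM_ADDR + rom.len()].copy_from_slice(rom);
    Ok(())
}
```

Checking the size up front means an oversized ROM is reported as an error instead of causing a panic in the slice copy.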

If you want to understand in detail the code in the Instruction functions and the Step function, you’ll need to take a look at the reference of the CHIP-8.

Instruction functions

The implementation of all the CPU instructions.

All these functions return an integer which corresponds to the time taken to run the instruction.

NOTE: Usually an emulator cares about cycles taken by instructions, but since the CHIP-8 is a Virtual Machine there’s no such specification, so I’m using the average time taken by one of the original implementations for each instruction.

The instructions take different combinations of inputs made of registers and immediate values.

Here’s an example of one instruction: ADD

    /// Op: Set Vx = Vx + b.
    fn op_add(&mut self, x: Reg, b: u8) -> usize {
        let (res, overflow) = self.v[x].overflowing_add(b);
        self.v[x] = res;
        self.v[Reg(0xf)] = if overflow { 1 } else { 0 };
        self.pc += 2;
        45
    }

This instruction takes a register and an immediate value, and stores the result of adding the register value with the immediate value. Also, an overflow flag is stored in register Vf. You can also see that the program counter (pc) is incremented by 2 because this instruction is sequential and each CHIP-8 instruction is 2 bytes.
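
The `self.v[x]` syntax with a `Reg` value works because the register set can be indexed by a register name. The article doesn't show the definitions of `Reg` and `Regs`, so the following is only a sketch of one way to get that behavior with the standard `Index` and `IndexMut` traits; the `add` helper is a hypothetical stand-in that repeats the arithmetic of `op_add` without the program counter and timing.

```rust
use std::ops::{Index, IndexMut};

/// Register name: an index from 0x0 to 0xF. (Assumed definition.)
#[derive(Clone, Copy)]
struct Reg(u8);

/// The sixteen 8-bit registers V0..VF. (Assumed definition.)
struct Regs([u8; 16]);

impl Index<Reg> for Regs {
    type Output = u8;
    fn index(&self, r: Reg) -> &u8 {
        // Mask to four bits so the index can never go out of bounds.
        &self.0[(r.0 & 0x0f) as usize]
    }
}

impl IndexMut<Reg> for Regs {
    fn index_mut(&mut self, r: Reg) -> &mut u8 {
        &mut self.0[(r.0 & 0x0f) as usize]
    }
}

/// Same arithmetic as `op_add`: wrapping 8-bit addition, carry stored in VF.
fn add(v: &mut Regs, x: Reg, b: u8) {
    let (res, overflow) = v[x].overflowing_add(b);
    v[x] = res;
    v[Reg(0xf)] = if overflow { 1 } else { 0 };
}
```

As a worked example: with V3 = 200, adding 100 gives 300, which doesn't fit in 8 bits, so V3 ends up as 300 - 256 = 44 and VF is set to 1.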

Step function

A function that decodes and executes one instruction, and returns the elapsed time.

The program of the CHIP-8 is encoded in bytes and we need to decode pairs of bytes to make instructions. This is done with a match following the specification. After the instruction is decoded, the corresponding instruction function is called.

Snippet of the beginning of this function:

    /// Execute the instruction defined by (w0, w1).  Returns the number of microseconds elapsed.
    fn exec(&mut self, w0: u8, w1: u8) -> Result<usize, Error> {
        Ok(match w0 & 0xf0 {
            0x00 => match w1 {
                0xe0 => self.op_cls(),
                0xee => self.op_ret(),
                _ => self.op_call_rca_1802(nnn!(w0, w1)),
            },
            0x10 => self.op_jp(nnn!(w0, w1)),
            0x20 => self.op_call(nnn!(w0, w1)),
            0x30 => self.op_se(self.v[Reg(lo_nib(w0))], w1),
            0x40 => self.op_sne(self.v[Reg(lo_nib(w0))], w1),
            0x50 => self.op_se(self.v[Reg(lo_nib(w0))], self.v[Reg(hi_nib(w1))]),
            0x60 => self.op_ld(Reg(w0 & 0x0f), w1),
            // [...]
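
The snippet relies on a few small helpers (`lo_nib`, `hi_nib` and the `nnn!` macro) whose definitions aren't shown here. The following is a sketch of what they could look like, based on how CHIP-8 instructions are laid out: the high nibble of the first byte selects the instruction family, and the remaining three nibbles hold register numbers, an 8-bit immediate or a 12-bit address. I wrote `nnn` as a plain function instead of a macro to keep the example simple.

```rust
/// Upper four bits of a byte.
fn hi_nib(b: u8) -> u8 {
    (b & 0xf0) >> 4
}

/// Lower four bits of a byte.
fn lo_nib(b: u8) -> u8 {
    b & 0x0f
}

/// 12-bit address formed by the low nibble of `w0` followed by all of `w1`.
/// (Function stand-in for the `nnn!` macro used in the snippet.)
fn nnn(w0: u8, w1: u8) -> u16 {
    (((w0 & 0x0f) as u16) << 8) | w1 as u16
}
```

For example, the two bytes 0x2a 0x4c match the 0x20 arm (a call) with the address 0xa4c, and in the bytes 0x51 0x20 the two registers to compare are V1 (`lo_nib(0x51)`) and V2 (`hi_nib(0x20)`).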

Frame function

A function that executes instructions and simulates hardware for the duration of a frame.

The simulation of hardware is really simple: it involves decrementing the counters of the timers and updating the tone enable flag.

To execute the instructions, we add the time corresponding to a frame to the overtime of the last frame, so that a frame that ran long is compensated by a shorter one. Then, in a loop, we fetch two consecutive bytes at the program counter, execute them, and subtract the elapsed time from the remaining frame time, until we’ve used all the frame time.

Implementation code:

    /// Emulates the execution of instructions continuously until the emulated instructions total
    /// elapsed time reaches the equivalent of a frame.
    pub fn frame(&mut self, keypad: u16) -> Result<(), Error> {
        self.keypad = keypad;
        if self.dt != 0 {
            self.dt -= 1;
        }
        self.tone = if self.st != 0 {
            self.st -= 1;
            true
        } else {
            false
        };
        self.time += FRAME_TIME;

        while self.time > 0 {
            // The instruction spans pc and pc + 1, so both must be inside memory.
            if self.pc as usize + 1 >= MEM_SIZE {
                return Err(Error::PcOutOfBounds(self.pc));
            }
            let w0 = self.mem[self.pc as usize];
            let w1 = self.mem[self.pc as usize + 1];
            let adv = self.exec(w0, w1)?;
            self.time -= adv as isize;
        }
        Ok(())
    }
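
To see how the overtime carries over between frames, here is a small stand-alone sketch of just the time accounting. It assumes a frame lasts 16666 microseconds (1/60 of a second); the actual value of `FRAME_TIME` isn't shown in this article. The `run_frame` function is hypothetical and pretends every instruction takes the same time (`cost`, which must be positive).

```rust
/// Duration of one frame at 60 Hz, in microseconds (assumed value).
const FRAME_TIME: isize = 16_666;

/// Spend one frame's worth of time on instructions that each take `cost`
/// microseconds. `time` carries the overtime from the previous frame.
/// Returns how many instructions were executed in this frame.
fn run_frame(time: &mut isize, cost: isize) -> usize {
    *time += FRAME_TIME;
    let mut executed = 0;
    while *time > 0 {
        // The last instruction may overshoot the frame, leaving `time` negative.
        *time -= cost;
        executed += 1;
    }
    executed
}
```

With an exaggerated cost of 5000 microseconds per instruction, the first frame runs 4 instructions and ends 3334 microseconds over budget, so the second frame starts with only 13332 microseconds and runs 3 instructions. Over many frames the average speed stays correct.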

SDL Frontend

Now that we have the emulator backend implemented as a library, we can create an instance of the Chip8 struct and interact with it, handling all inputs and outputs with an operating-system-dependent library: SDL. SDL stands for Simple DirectMedia Layer, and it’s a library with support for many platforms that allows handling input events (keyboard, joystick and mouse) and output (display and sound) in a way that is abstracted from the platform’s native APIs.

The SDL frontend is implemented in the sdl folder.

The program is started from the command line, so I’m using an argument parsing library called clap to handle two arguments: the path to the ROM file and the display scale (since the original display is only 64x32 pixels, rendering it without scaling would make it too small on a desktop display):

    let app = App::new("Chip8-rs")
        .version("0.0.1")
        .author("Dhole")
        .arg(
            Arg::with_name("scale")
                .short("s")
                .long("scale")
                .value_name("N")
                .help("Sets the scaling factor")
                .takes_value(true)
                .default_value("8")
                .validator(|scale| match scale.parse::<u32>() {
                    Ok(_) => Ok(()),
                    Err(e) => Err(format!("{}", e)),
                }),
        )
        .arg(
            Arg::with_name("path")
                .help("Path to the rom file")
                .index(1)
                .required(true),
        )
        .get_matches();

    let scale = app
        .value_of("scale")
        .map(|s| s.parse::<u32>().expect("scale flag can be parsed as u32"))
        .expect("scale argument is defined");
    let path = app.value_of("path").expect("path argument is defined");

Next I read the ROM contents from disk at the specified path, create an instance of the Chip8 struct and load the ROM. Notice that the Chip8 constructor takes a random value as input to be used as a seed for the random number generation. I intentionally left the seed generation to the frontend because it’s platform dependent:

    let mut rom = Vec::new();
    fs::OpenOptions::new()
        .read(true)
        .open(path)?
        .read_to_end(&mut rom)?;

    let mut chip8 = Chip8::new(rand::random());
    chip8.load_rom(&rom)?;
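
To illustrate why the seed matters, here is a sketch of the CHIP-8 random instruction (RND: a random byte ANDed with an immediate value) on top of a tiny seeded generator. The `XorShift32` type is a made-up stand-in so that the example only needs the standard library; the real backend is generic over an `RngCore` implementation from the rand crate, and its RND code may look different.

```rust
/// Tiny xorshift generator, used here only as a stand-in for an `RngCore`
/// implementation from the `rand` crate. (Hypothetical type.)
struct XorShift32(u32);

impl XorShift32 {
    /// Build the generator from a seed chosen by the frontend.
    fn new(seed: u32) -> Self {
        // xorshift can never leave the all-zero state, so avoid it.
        XorShift32(if seed == 0 { 1 } else { seed })
    }

    /// Advance the state and return the next 32-bit value.
    fn next_u32(&mut self) -> u32 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.0 = x;
        x
    }
}

/// CHIP-8 RND instruction: a random byte ANDed with the immediate `kk`.
fn rnd(rng: &mut XorShift32, kk: u8) -> u8 {
    (rng.next_u32() >> 24) as u8 & kk
}
```

Two generators built from the same seed produce exactly the same sequence, so a fixed seed would make every game session play out identically. That's why the frontend is expected to supply an unpredictable value.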

Next I set up the SDL environment for audio and video, and then implement the main loop which does the following:

Poll key events to detect keydown and keyup events to update the keypad bit-vector state.

        for event in event_pump.poll_iter() {
            match event {
                // [...]
                Event::KeyDown {
                    keycode: Some(keycode),
                    ..
                } => {
                    keypad |= match keycode {
                        Keycode::Num1 => 1 << 0x1,
                        Keycode::Num2 => 1 << 0x2,
                        Keycode::Num3 => 1 << 0x3,
                        Keycode::Num4 => 1 << 0xC,
                        Keycode::Q => 1 << 0x4,
                        Keycode::W => 1 << 0x5,
                        // [...]
                        _ => 0,
                    };
                }
                // [...]

Simulate a Chip8 frame

        chip8.frame(keypad)?;

Play a tone if the tone enable flag is set

        if chip8.tone() {
            device.resume();
        } else {
            device.pause();
        }

Update the display texture

        tex_display.with_lock(None, |buffer: &mut [u8], pitch: usize| {
            for y in 0..chip8::SCREEN_HEIGTH {
                for x in 0..chip8::SCREEN_WIDTH / 8 {
                    let byte = chip8.fb()[y * chip8::SCREEN_WIDTH / 8 + x];
                    for i in 0..8 {
                        let offset = y * pitch + (x * 8 + i) * 3;
                        let v = if byte & 1 << (7 - i) != 0 { 255 } else { 0 };
                        buffer[offset] = v;
                        buffer[offset + 1] = v;
                        buffer[offset + 2] = v;
                    }
                }
            }
        })?;

Copy the display texture to the window canvas, to be stretched to fill the canvas

        canvas.clear();
        canvas.copy(&tex_display, None, None)?;

Present the window canvas

        canvas.present();

Sleep the remaining frame time, so that the loop is run 60 times per second

        let now = Instant::now();
        let sleep_dur = frame_duration
            .checked_sub(now.saturating_duration_since(timestamp))
            .unwrap_or(Duration::new(0, 0));
        ::std::thread::sleep(sleep_dur);
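        // Take the timestamp after sleeping, so that the sleep itself is not
        // counted as elapsed work time in the next iteration.
        timestamp = Instant::now();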
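
Going back to the first step of the loop: the keypad state is a 16-bit vector where bit N tells whether CHIP-8 key N is pressed. The key-up branch is elided in the snippet above, so the following is only a sketch of how such a bit-vector is typically maintained; the three helper functions are hypothetical and don't appear in the repository.

```rust
/// Mark CHIP-8 key `key` (0x0..=0xF) as pressed.
fn key_down(keypad: u16, key: u8) -> u16 {
    keypad | (1u16 << (key & 0x0f))
}

/// Mark CHIP-8 key `key` as released.
fn key_up(keypad: u16, key: u8) -> u16 {
    keypad & !(1u16 << (key & 0x0f))
}

/// Check whether CHIP-8 key `key` is currently pressed.
fn is_pressed(keypad: u16, key: u8) -> bool {
    keypad & (1u16 << (key & 0x0f)) != 0
}
```

Packing the whole keypad into a single `u16` keeps the backend API small: the frontend passes one integer per frame, and the key-related instructions only need a bit test.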

Here’s a capture of the end result, running the INVADERS game:

chip8 SDL frontend running the INVADERS game

About SDL video nomenclature

If you’re not familiar with SDL, you may find it hard to follow the part related to displaying the framebuffer, so I’ll explain some SDL concepts here.

Window: a window is what you’d expect: a frame managed by the window manager of your operating system that you can draw inside. It is also associated with input events (keyboard and mouse).
Canvas: an area that can be drawn on and shown on screen. In SDL, the window contains a canvas.
Texture: a 2D rectangular image loaded into memory (if we are using accelerated SDL, like in our case, the texture is loaded into GPU memory).

With this in mind, here’s a description of the process of updating the display:

First a texture is created
In the main loop:
- Lock the texture to access its buffer. Locking is necessary because the texture is managed by SDL, and it can’t be read or written without coordination.
- Update the texture content. Since the texture format is RGB24, each pixel takes 3 bytes (1 byte per color channel, 24 bits in total), so I need to write 3 bytes in the buffer for every pixel.
- Then copy the texture to the canvas. Notice that the canvas size can be bigger than the Chip8 screen, whereas the texture is the exact same size as the Chip8 screen. This copying procedure allows the texture to be scaled by hardware into the canvas.
- Present the window canvas.
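
The texture update step can be tried without SDL. The sketch below expands a framebuffer with 1 bit per pixel into an RGB24 buffer, using the same layout as the frontend code: eight pixels per byte, with the most significant bit being the leftmost pixel. The `expand_rgb24` function and the locally redefined constants are my own names for this example; `pitch` is the number of bytes per row in the destination buffer, which SDL reports when locking the texture.

```rust
const SCREEN_WIDTH: usize = 64;
const SCREEN_HEIGHT: usize = 32;

/// Expand a 1-bit-per-pixel framebuffer into an RGB24 buffer.
/// `pitch` is the number of bytes per row of the destination buffer.
fn expand_rgb24(fb: &[u8], buffer: &mut [u8], pitch: usize) {
    for y in 0..SCREEN_HEIGHT {
        for x in 0..SCREEN_WIDTH {
            // Eight pixels per byte; the most significant bit is the leftmost.
            let byte = fb[(y * SCREEN_WIDTH + x) / 8];
            let lit = byte & (0x80u8 >> (x % 8)) != 0;
            let v = if lit { 255 } else { 0 };
            // Three bytes per pixel: red, green and blue get the same value.
            let offset = y * pitch + x * 3;
            buffer[offset..offset + 3].copy_from_slice(&[v, v, v]);
        }
    }
}
```

Using `pitch` instead of `SCREEN_WIDTH * 3` for the row stride matters because a locked texture row may be padded, so consecutive rows aren't guaranteed to be contiguous.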

Conclusion

In this first article I briefly explained what the CHIP-8 is and why I think it’s a very good candidate for your first emulator. Then I showed how I organized the emulator implementation into a backend and a frontend, which is a pattern that can be applied to any emulator implementation and comes with the benefit of portability.

The backend implementation contains the core parts of a simple emulator, making it easy to follow.

The frontend shown in this article is implemented in SDL, and the description gives all the details of how to use SDL for an emulator (except for the sound part which I didn’t explain because the CHIP-8 only plays a single tone, which is not very interesting).

I hope you enjoyed this article, and I encourage you to write a CHIP-8 emulator in your favorite language!