Demo of Super Mario Bros. 3 gameplay.
The main components of a NES emulator are the CPU, PPU (picture processing unit), APU (audio processing unit, which I skipped), memory mappers, a cartridge decoder, and joypad/video input/output.
The NES CPU is based on the 6502 microprocessor and runs at around 1.79MHz (on NTSC consoles). It's an 8-bit, little-endian processor with a 16-bit address bus. It has instructions for manipulating a fixed-size 256-byte stack, supports interrupt handling, and offers several addressing modes.
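To make the stack and byte-order behaviour concrete, here is a minimal Python sketch (an illustration, not the emulator's actual code). The class name `Stack6502`, the flat 64KiB `memory` array, and the initial stack pointer of 0xFD are assumptions made for the example; the 6502 itself fixes the stack to page 1 ($0100-$01FF).

```python
class Stack6502:
    """Hypothetical helper modelling the 6502 stack in page 1 ($0100-$01FF)."""

    def __init__(self):
        self.memory = bytearray(0x10000)  # flat 64KiB, for illustration only
        self.sp = 0xFD  # assumed starting value for the example

    def push(self, value):
        self.memory[0x0100 | self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF  # the pointer wraps within the 256-byte page

    def pop(self):
        self.sp = (self.sp + 1) & 0xFF
        return self.memory[0x0100 | self.sp]

    def push_word(self, value):
        # High byte first, so the low byte ends up at the lower address
        # (little-endian layout in memory).
        self.push(value >> 8)
        self.push(value)

    def pop_word(self):
        lo = self.pop()
        hi = self.pop()
        return (hi << 8) | lo
```

Because the stack pointer is a single byte, pushing past the bottom of the page silently wraps around rather than growing into other memory.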
The 6502 and derivatives were used in many computers, including the BBC Micro, Commodore 64, Apple IIe, and Tamagotchis.
The NES variant of the 6502 has about 70 instructions (the exact count depends on whether you include undocumented ones), and 8 memory addressing modes. The processor has a PC (program counter), an SP (stack pointer), and a few registers: A (accumulator), X/Y (index registers), and P (processor flags).
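The register set is small enough to fit in one structure. The following Python sketch is an illustration under assumptions: the `Registers` class, its method names, and the initial values are hypothetical, while the flag bit positions are the standard 6502 ones. It shows how a load into A updates the zero and negative flags in P.

```python
from dataclasses import dataclass

# Bit positions of the flags in the P register.
FLAG_C = 1 << 0  # carry
FLAG_Z = 1 << 1  # zero
FLAG_I = 1 << 2  # interrupt disable
FLAG_D = 1 << 3  # decimal (can be set, but has no effect on the NES CPU)
FLAG_B = 1 << 4  # break
FLAG_V = 1 << 6  # overflow
FLAG_N = 1 << 7  # negative


@dataclass
class Registers:
    pc: int = 0
    sp: int = 0xFD          # assumed starting value for the example
    a: int = 0
    x: int = 0
    y: int = 0
    p: int = FLAG_I | 0x20  # assumed: interrupts disabled, unused bit 5 set

    def set_flag(self, flag, on):
        self.p = (self.p | flag) if on else (self.p & ~flag & 0xFF)

    def lda_immediate(self, value):
        """LDA #value: load A, then update the Z and N flags."""
        self.a = value & 0xFF
        self.set_flag(FLAG_Z, self.a == 0)
        self.set_flag(FLAG_N, (self.a & 0x80) != 0)
```

Most instructions follow the same pattern: compute a result, store it, then update a handful of flags.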
The CPU's 16-bit memory map addresses 2KiB of RAM, several IO registers, and the game cartridge ROM. Each NES game cartridge contains one or more ROM chips, and the majority of the 16-bit address space maps directly to the cartridge ROM. To run a game, the CPU reads the 16-bit reset vector stored at $FFFC-$FFFD and starts executing at the address it contains.
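On the 6502, $FFFC-$FFFD hold the reset vector: the little-endian address of the first instruction to run. A rough Python sketch of the memory map and of that start-up read follows. The `Bus` class and `reset_pc` function are hypothetical names, the cartridge is assumed to be a plain 32KiB ROM at $8000-$FFFF with no mapper, and IO registers are left out.

```python
class Bus:
    """Hypothetical CPU bus: 2KiB of RAM plus an unbanked 32KiB cartridge ROM."""

    def __init__(self, prg_rom):
        if len(prg_rom) != 0x8000:
            raise ValueError("this sketch assumes exactly 32KiB of PRG ROM")
        self.ram = bytearray(0x800)
        self.prg_rom = prg_rom

    def read(self, addr):
        addr &= 0xFFFF
        if addr < 0x2000:
            # The 2KiB of RAM is mirrored four times across $0000-$1FFF.
            return self.ram[addr & 0x07FF]
        if addr >= 0x8000:
            return self.prg_rom[addr - 0x8000]
        return 0  # IO registers and other regions are omitted here

    def read_word(self, addr):
        # Little-endian: low byte first, then high byte.
        return self.read(addr) | (self.read(addr + 1) << 8)


def reset_pc(bus):
    """Return the initial program counter taken from the reset vector."""
    return bus.read_word(0xFFFC)
```

For example, a ROM whose last four bytes begin with `00 80` makes the CPU start executing at $8000.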
Many games are larger than the 32KiB of memory space typically used by the cartridge ROM. Games can work around this limitation by remapping parts of the cartridge address space to different memory banks. This is known as memory bank switching, or 'mapping', and requires additional circuitry in the cartridge. For example, one particular mapper can remap the memory range $8000-$BFFF (16KiB) to different memory banks, controlled by writing the bank number to a register. I implemented the four most popular mappers.
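The $8000-$BFFF example can be sketched in a few lines of Python. This is a simplified illustration, not one of the mappers from the emulator: the class name `SwitchableMapper` is hypothetical, and it assumes that any write to $8000-$FFFF selects the bank and that $C000-$FFFF stays fixed to the last bank.

```python
BANK_SIZE = 0x4000  # 16KiB


class SwitchableMapper:
    """Hypothetical mapper with one switchable and one fixed 16KiB window."""

    def __init__(self, prg_rom):
        if len(prg_rom) == 0 or len(prg_rom) % BANK_SIZE != 0:
            raise ValueError("PRG ROM must be a non-empty multiple of 16KiB")
        self.prg_rom = prg_rom
        self.bank_count = len(prg_rom) // BANK_SIZE
        self.bank = 0

    def write(self, addr, value):
        # Assumption: a write anywhere in the ROM window sets the bank register.
        if 0x8000 <= addr <= 0xFFFF:
            self.bank = value % self.bank_count

    def read(self, addr):
        if 0x8000 <= addr <= 0xBFFF:
            return self.prg_rom[self.bank * BANK_SIZE + (addr - 0x8000)]
        if 0xC000 <= addr <= 0xFFFF:
            last = self.bank_count - 1  # assumed fixed to the final bank
            return self.prg_rom[last * BANK_SIZE + (addr - 0xC000)]
        raise ValueError("address outside the cartridge ROM window")
```

Keeping the upper window fixed means the reset vector and interrupt handlers are always visible, whichever bank is currently selected.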
The Picture Processing Unit (PPU) generates the output video, and is relatively complicated to implement correctly. Each frame consists of foreground sprites (e.g. Pac-Man ghosts), and a background image. The background consists of 32x30 tiles of 8x8 pixels each, giving a total screen size of 256x240 pixels. The PPU uses a fixed colour palette. It's colourful so I'll show it here:
(Palette table: 64 colour swatches, indexed by rows 0x00, 0x10, 0x20, 0x30 and columns 0 to F.)
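The background layout mentioned above (32x30 tiles of 8x8 pixels) reduces to simple arithmetic when rendering. As a small Python sketch, with `locate_pixel` being a hypothetical helper name, a screen pixel can be mapped to a tile index and an offset inside that tile:

```python
TILE_SIZE = 8
TILES_WIDE, TILES_HIGH = 32, 30
SCREEN_WIDTH = TILES_WIDE * TILE_SIZE    # 256
SCREEN_HEIGHT = TILES_HIGH * TILE_SIZE   # 240


def locate_pixel(x, y):
    """Return (tile_index, fine_x, fine_y) for a screen pixel."""
    if not (0 <= x < SCREEN_WIDTH and 0 <= y < SCREEN_HEIGHT):
        raise ValueError("pixel is outside the 256x240 screen")
    col, fine_x = divmod(x, TILE_SIZE)
    row, fine_y = divmod(y, TILE_SIZE)
    # Tiles are numbered row by row, 32 per row.
    return row * TILES_WIDE + col, fine_x, fine_y
```

The last pixel on screen, (255, 239), lands in tile 959 at offset (7, 7), which matches the 960 tiles of a full background.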
Flags can be set to enable colour/greyscale mode, emphasise the red/green/blue colour channels, and show/hide the sprites/background.
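These flags live in a single byte, so decoding them is a matter of bit tests. The Python sketch below is an illustration; `decode_mask` and the dictionary keys are hypothetical names, and the bit positions assume the NTSC layout of the PPU mask register (the two "show in leftmost 8 pixels" bits are left out for brevity).

```python
GREYSCALE = 0x01
SHOW_BACKGROUND = 0x08
SHOW_SPRITES = 0x10
EMPHASISE_RED = 0x20    # assumed NTSC bit assignment
EMPHASISE_GREEN = 0x40  # assumed NTSC bit assignment
EMPHASISE_BLUE = 0x80


def decode_mask(value):
    """Split a mask byte into named booleans."""
    return {
        "greyscale": bool(value & GREYSCALE),
        "show_background": bool(value & SHOW_BACKGROUND),
        "show_sprites": bool(value & SHOW_SPRITES),
        "emphasise_red": bool(value & EMPHASISE_RED),
        "emphasise_green": bool(value & EMPHASISE_GREEN),
        "emphasise_blue": bool(value & EMPHASISE_BLUE),
    }
```

A game that writes 0x18, for instance, turns on both the background and the sprites in colour mode with no emphasis.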
The CPU interacts with the PPU via 9 registers. These provide access to the sprite memory area, the PPU's internal address space, the control flags, and scroll position.
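From the CPU's side, those registers are just addresses. As a hedged Python sketch (the function name `ppu_register_name` is hypothetical; the register names follow the usual nesdev wiki naming), eight registers sit at $2000-$2007 and repeat every 8 bytes up to $3FFF, and the ninth, used for sprite DMA, sits at $4014:

```python
PPU_REGISTERS = (
    "PPUCTRL",    # $2000: control flags
    "PPUMASK",    # $2001: colour and show/hide flags
    "PPUSTATUS",  # $2002: status
    "OAMADDR",    # $2003: sprite memory address
    "OAMDATA",    # $2004: sprite memory data
    "PPUSCROLL",  # $2005: scroll position
    "PPUADDR",    # $2006: address in the PPU's own address space
    "PPUDATA",    # $2007: data at that address
)


def ppu_register_name(addr):
    """Return the PPU register a CPU address refers to, or None."""
    if 0x2000 <= addr <= 0x3FFF:
        return PPU_REGISTERS[addr & 7]  # mirrored every 8 bytes
    if addr == 0x4014:
        return "OAMDMA"  # bulk copy into sprite memory
    return None
```

An emulator's bus read/write routine can use a lookup like this to forward CPU accesses to the PPU.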
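The PPU implements a limited form of collision detection: it sets a flag in its status register when a non-transparent pixel of sprite #0 overlaps a non-transparent background pixel. The PPU does not raise an interrupt for this, so games poll the flag.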
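On real hardware the sprite #0 hit is reported through bit 6 of the PPU status register ($2002), which the game polls. A simplified Python sketch of that behaviour follows. The `PpuStatus` class and its method names are hypothetical, pixel value 0 is assumed to mean transparent, and the extra hardware conditions (clipping, rendering disabled, the rightmost column) are ignored.

```python
SPRITE_ZERO_HIT = 0x40  # bit 6 of the status register


class PpuStatus:
    """Hypothetical model of the sprite #0 hit flag only."""

    def __init__(self):
        self.value = 0

    def render_pixel(self, background_pixel, sprite_zero_pixel):
        # A hit needs both layers to be non-transparent at the same pixel.
        if background_pixel != 0 and sprite_zero_pixel != 0:
            self.value |= SPRITE_ZERO_HIT

    def start_frame(self):
        # The flag is cleared before the next frame is drawn.
        self.value &= ~SPRITE_ZERO_HIT & 0xFF

    def sprite_zero_hit(self):
        return bool(self.value & SPRITE_ZERO_HIT)
```

Games commonly use this to time mid-frame effects, such as keeping a status bar fixed while the rest of the screen scrolls.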
Here are some more screenshots:
There are lots of good NES emulators and references. The ones I used are: nesdev wiki, nestopia, this 6502 reference, the nestest test suite (linked from there), and more tests from here.