The main components of a NES emulator are the CPU, the PPU (picture processing unit), the APU (audio processing unit, which I skipped), memory mappers, a cartridge decoder, and joypad input and video output.
The NES CPU (a Ricoh 2A03) is based on the 6502 microprocessor and runs at about 1.79MHz on NTSC consoles (1.66MHz on PAL). It's an 8-bit processor with a 16-bit address bus. It has instructions for pushing to and pulling from a fixed-size 256-byte stack, handles interrupts, supports several addressing modes, and is little-endian.
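Here is a minimal Python sketch of the stack and byte-order behaviour described above. It assumes a flat 64 KiB `bytearray` as memory (a real NES routes accesses through its bus), and the class name `Cpu6502Stack` is hypothetical. The stack lives in page 1 ($0100-$01FF), the 8-bit stack pointer wraps within that page, and 16-bit values are stored low byte first.

```python
class Cpu6502Stack:
    """Sketch of the 6502 stack and 16-bit little-endian reads.

    Assumes a flat 64 KiB memory array; a real NES would route every
    access through RAM, PPU registers and the cartridge instead.
    """

    STACK_BASE = 0x0100  # the stack occupies page 1: $0100-$01FF

    def __init__(self) -> None:
        self.mem = bytearray(0x10000)
        self.sp = 0xFD  # value commonly seen after power-on reset

    def push(self, value: int) -> None:
        # Write first, then decrement; the 8-bit pointer wraps in page 1.
        self.mem[self.STACK_BASE + self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def pull(self) -> int:
        # Increment first, then read (the reverse of push).
        self.sp = (self.sp + 1) & 0xFF
        return self.mem[self.STACK_BASE + self.sp]

    def push16(self, value: int) -> None:
        # High byte first, so the low byte ends up at the lower address.
        self.push(value >> 8)
        self.push(value & 0xFF)

    def pull16(self) -> int:
        lo = self.pull()
        hi = self.pull()
        return (hi << 8) | lo

    def read16(self, addr: int) -> int:
        # Little-endian: low byte at addr, high byte at addr + 1.
        return self.mem[addr] | (self.mem[(addr + 1) & 0xFFFF] << 8)
```

Pushing the high byte first matches the order JSR and interrupts use for the program counter, so `pull16` gets the value back intact.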
The 6502 and its derivatives were used in many machines, including the BBC Micro, Commodore 64, Apple IIe, and Tamagotchis.
The CPU's 16-bit memory map covers 2KiB of internal RAM, several I/O registers, and the game cartridge. Each NES game cartridge contains one or more ROM chips, and most of the address space ($4020-$FFFF) belongs to the cartridge, with its program ROM usually mapped at $8000-$FFFF. To start a game, the CPU reads the 16-bit reset vector stored at $FFFC-$FFFD and begins executing at the address it contains.
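A sketch of how an emulator might decode CPU addresses, assuming the simplest cartridge type (NROM, mapper 0), which has no bank switching. `NesBus` and its methods are hypothetical names, and PPU and APU accesses are stubbed out. Note that the reset logic reads the little-endian vector at $FFFC-$FFFD rather than executing code there.

```python
class NesBus:
    """Hypothetical CPU bus for an NROM (mapper 0) cartridge.

    PPU, APU and joypad accesses are stubbed out; a real emulator would
    forward them to those components.
    """

    def __init__(self, prg_rom: bytes) -> None:
        if len(prg_rom) not in (0x4000, 0x8000):
            raise ValueError("NROM carts have 16 KiB or 32 KiB of PRG ROM")
        self.ram = bytearray(0x0800)  # 2 KiB internal RAM
        self.prg_rom = prg_rom

    def read(self, addr: int) -> int:
        addr &= 0xFFFF
        if addr < 0x2000:
            # $0000-$1FFF: 2 KiB of RAM mirrored four times.
            return self.ram[addr & 0x07FF]
        if addr < 0x4000:
            # $2000-$3FFF: 8 PPU registers mirrored every 8 bytes.
            return self.read_ppu_register(0x2000 + (addr & 0x0007))
        if addr < 0x8000:
            # $4000-$7FFF: APU/IO registers, optional cartridge RAM (stubbed).
            return 0
        # $8000-$FFFF: PRG ROM; a 16 KiB ROM appears twice.
        return self.prg_rom[(addr - 0x8000) % len(self.prg_rom)]

    def write(self, addr: int, value: int) -> None:
        addr &= 0xFFFF
        if addr < 0x2000:
            self.ram[addr & 0x07FF] = value & 0xFF
        # Other regions are ignored in this sketch (ROM is read-only).

    def read_ppu_register(self, addr: int) -> int:
        return 0  # placeholder for the PPU

    def reset_vector(self) -> int:
        # The reset vector is a little-endian address at $FFFC-$FFFD.
        return self.read(0xFFFC) | (self.read(0xFFFD) << 8)
```

After a reset, the emulator would set the program counter with `pc = bus.reset_vector()`. The modulo in `read` handles a 16 KiB ROM showing up at both $8000 and $C000; cartridges with mappers would replace it with bank-switching logic.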
The Picture Processing Unit (PPU) generates the output video, and is relatively complicated to implement correctly. Each frame consists of foreground sprites (e.g. Pac-Man ghosts) and a background image. The background consists of 32x30 tiles of 8x8 pixels each, giving a total screen size of 256x240 pixels. The PPU draws from a fixed system palette of 64 colours. It's colourful so I'll show it here:
[Palette table: 64 colour entries indexed 0x00 to 0x3F, arranged as four rows (0x00, 0x10, 0x20, 0x30) by sixteen columns (0 to F).]
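The 8x8 background tiles come from pattern-table data in the cartridge. Below is a sketch of decoding one tile, assuming the standard NES 2-bit planar format (16 bytes per tile) and a 960-byte nametable of tile indices (32x30). The function names are hypothetical, and attribute-table handling (which picks the palette for each area) is left out.

```python
def decode_tile(pattern: bytes, tile_index: int) -> list[list[int]]:
    """Decode one 8x8 tile from pattern-table bytes into 2-bit pixel values.

    Each tile is 16 bytes: bytes 0-7 hold bit 0 of each pixel (one byte
    per row) and bytes 8-15 hold bit 1. The leftmost pixel is the most
    significant bit. The result (0-3) selects a colour within a palette,
    with 0 meaning transparent/backdrop.
    """
    base = tile_index * 16
    rows = []
    for y in range(8):
        lo = pattern[base + y]
        hi = pattern[base + y + 8]
        rows.append([((lo >> (7 - x)) & 1) | (((hi >> (7 - x)) & 1) << 1)
                     for x in range(8)])
    return rows


def background_tile_at(nametable: bytes, px: int, py: int) -> int:
    """Return the tile index covering screen pixel (px, py).

    The nametable is 32 tiles wide and 30 tall, stored row by row.
    """
    return nametable[(py // 8) * 32 + (px // 8)]
```

Splitting each tile into two bit planes lets 8 pixels of 4 possible values fit in just 2 bytes per row.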
Flags can be set to switch between colour and greyscale output, emphasise the red, green or blue colour channels, and show or hide the sprites and background.
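These flags live in the PPUMASK register ($2001). A sketch of decoding it, using the NTSC bit layout (PAL consoles swap the red and green emphasis bits); the `PpuMask` class is hypothetical. In greyscale mode the palette index is ANDed with $30, which lands on the greys in column 0 of the palette.

```python
from dataclasses import dataclass


@dataclass
class PpuMask:
    """Flags decoded from a write to PPUMASK ($2001), NTSC bit layout.

    Field order matches bit order, from bit 0 to bit 7.
    """
    greyscale: bool
    show_bg_left: bool       # background in the leftmost 8 pixels
    show_sprites_left: bool  # sprites in the leftmost 8 pixels
    show_bg: bool
    show_sprites: bool
    emphasise_red: bool
    emphasise_green: bool
    emphasise_blue: bool

    @classmethod
    def from_byte(cls, value: int) -> "PpuMask":
        bits = [bool((value >> i) & 1) for i in range(8)]
        return cls(*bits)

    def apply_greyscale(self, palette_index: int) -> int:
        # Greyscale keeps only the brightness bits of the 6-bit index.
        palette_index &= 0x3F
        return palette_index & 0x30 if self.greyscale else palette_index
```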
The CPU interacts with the PPU via 9 registers: eight at $2000-$2007 (mirrored every 8 bytes up to $3FFF) and the OAM DMA register at $4014. These provide access to sprite memory (OAM), the PPU's internal address space, the control flags, and the scroll position.
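A sketch of the two most-used data registers, PPUADDR ($2006) and PPUDATA ($2007), plus the side effects of reading PPUSTATUS ($2002). It assumes a flat 16 KiB VRAM with no nametable mirroring and omits the buffered PPUDATA read, OAM, scrolling and DMA; `PpuRegisters` is a hypothetical name.

```python
class PpuRegisters:
    """Sketch of CPU-facing PPU register handling ($2000-$2007).

    Only PPUCTRL, PPUSTATUS, PPUADDR and PPUDATA are modelled, and VRAM
    is a flat 16 KiB array without the mirroring a real PPU applies.
    """

    def __init__(self) -> None:
        self.vram = bytearray(0x4000)
        self.ctrl = 0
        self.status = 0
        self.addr = 0
        self.write_toggle = False  # latch shared by PPUSCROLL/PPUADDR

    def write(self, reg: int, value: int) -> None:
        reg = 0x2000 + (reg & 7)  # registers are mirrored every 8 bytes
        if reg == 0x2000:  # PPUCTRL
            self.ctrl = value
        elif reg == 0x2006:  # PPUADDR: high byte first, then low byte
            if not self.write_toggle:
                self.addr = ((value & 0x3F) << 8) | (self.addr & 0x00FF)
            else:
                self.addr = (self.addr & 0xFF00) | value
            self.write_toggle = not self.write_toggle
        elif reg == 0x2007:  # PPUDATA
            self.vram[self.addr] = value
            # PPUCTRL bit 2 selects +1 (across) or +32 (down a tile row).
            step = 32 if self.ctrl & 0x04 else 1
            self.addr = (self.addr + step) & 0x3FFF

    def read(self, reg: int) -> int:
        reg = 0x2000 + (reg & 7)
        if reg == 0x2002:  # PPUSTATUS
            value = self.status
            self.status &= 0x7F        # reading clears the vblank flag
            self.write_toggle = False  # and resets the address latch
            return value
        return 0  # other registers omitted in this sketch
```

Because the address latch is shared and only reset by reading PPUSTATUS, games typically read $2002 before writing a new address to $2006.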
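The PPU's only built-in collision detection is the "sprite 0 hit": when an opaque pixel of sprite #0 overlaps an opaque background pixel, the PPU sets bit 6 of its status register. It doesn't raise an interrupt for this, so games poll the flag (often to time a split-screen scroll); the PPU's one interrupt is the NMI it raises at the start of vertical blank.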
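A per-pixel sketch of the sprite 0 hit check, assuming the renderer already has the 2-bit background and sprite 0 pattern values for the current pixel. It covers the opacity test and the rule that no hit occurs at x = 255, but ignores the left-edge clipping and rendering-enabled conditions; the function name is hypothetical. The flag stays set until the PPU clears it at the pre-render scanline.

```python
SPRITE_0_HIT = 0x40  # bit 6 of PPUSTATUS ($2002)


def update_sprite0_hit(status: int, x: int, bg_pixel: int, sprite0_pixel: int) -> int:
    """Return PPUSTATUS with the sprite 0 hit flag set if this pixel hits.

    bg_pixel and sprite0_pixel are 2-bit pattern values (0 = transparent).
    Left-edge clipping and the rendering-enabled checks are not modelled.
    """
    if bg_pixel != 0 and sprite0_pixel != 0 and x != 255:
        status |= SPRITE_0_HIT
    return status
```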
Here are some more screenshots:
There are lots of good NES emulators and references. The ones I used are: nesdev wiki, nestopia, this 6502 reference, the nestest test suite (linked from there), and more tests from here.