'Hello World' on the GameBoy

12/05/2022

GameBoy "Hello World"

On a recent trip to my parents' home I had a dig through their attic where they keep the dusty stuff that we as a family have gathered over the years. I hadn't realized how much of my own things were there and left their house with a lot of junk that I don't have space for. Among that junk was a (still working!) GameBoy that my dad bought for us around 2005 from a storage room sale in Texas. Already old by that point, we only ever had two games for it: Pokemon Red and Mario Bros. Nevertheless, I played it quite a bit and have an anachronistic nostalgia for it.

I also have a technical interest in the GB. It has a little (and kinda strange) CPU, a little bit of RAM, and a 60 fps, weird-sized LCD screen. Like other cartridge-based consoles, the games aren't just games, they're expansion boards that can have their own RAM and computing power. It's a very cool little system that was underpowered for its time - which arguably helped with its sales - but could be massaged to deliver a wide range of experiences: compare Tetris to Mario Bros to Zelda to Pokemon to the Japanese-only release Pocket Love, which turns the GB into a mobile dating sim machine.

Getting ahold of my old GameBoy made me want to get my feet wet in the world of GB development, so I read the wonderful Pan Docs, downloaded the bgb emulator, set up the Rednex GB development system (RGBDS), and sat down to write some assembly. After playing around a bit, I decided to set some goals for a more structured exercise: do a GB "Hello World" from scratch. For me "scratch" meant making my own bitmap font, converting it to a format the GB could use (the famous 2bpp format), and displaying it on the screen without looking at any code examples. A simple challenge that hits some basic functionality but doesn't get too bogged down in things like timing. It turned out to be instructive and has made me all the more enthusiastic to actually make a playable game, so I wanted to share the process in a tutorial format. You can find the code and font files mentioned in the post here. Note that the flavor of assembly I use is Intel, i.e. it follows the ld destination, source pattern. The mnemonics for the assembly instructions and macros are pretty hum-drum, but you can reference the Rednex assembler manual here if anything is confusing.

Breaking "Hello World" down

Hello World type examples or tutorials are usually done to give an intro or brief overview of a programming language. It's materially useful for a new user to know how to output a string to the screen or a log or something, and the average person can usually make the leap from "Hello World" to outputting some other pieces of state of the system, which helps with debugging. The flip side of Hello World is that it generally glosses over the difficult or nuanced parts of what is happening: how is the string represented in memory, how are individual characters represented, how is that data moved from program memory to wherever else it has to be, how does the system decide on how my salutation will look (big vs small, color, placement), etc. It makes sense to abstract out all of this detail if the goal of the exercise is to just let a person output a string or give them a taste for a system's syntax, but digging into those details is vital if a) you want to get a deeper insight into how things are done on a low level or b) the system you're working with doesn't provide the necessary abstractions to ignore them. Working with the GB, I fit into both a and b, so my Hello World is going to get into the nitty gritty.

Having a clear goal in mind (print "Hello World" on the screen of a GB) and not having a clear path in mind to get there, let's start from the end and work backward.

Our canvas and paints

Ultimately, we want to darken LCD pixels in the shape of "Hello World". Before we determine how to do this, let's get to know the medium upon which we will be drawing. The GB LCD is 160x144 pixels and measures 47x43mm. It can display 4 colors: black, dark grey, light grey, and white. The GB provides a few graphical abstractions which we are going to need to use to greet the universe. First, the graphical primitive is a tile instead of a pixel. Tiles are 8x8 pixel chunks which are defined by 16 bytes of data. Each of the 64 pixels in a tile is described by 2 bits, which determine which one of 4 colors the pixel will be shaded. Tile data is organized such that each byte pair defines one of the 8 lines of pixels, starting from the top of the tile. The MSB (bit 7) of the first byte is combined with the MSB of the second byte to describe the leftmost pixel in the line, with the first byte supplying the low bit of the color and the second byte the high bit; each subsequent pair of corresponding bits describes the next pixel to the right. This format has been fittingly dubbed 2 bits-per-pixel, and, while it may seem confusing at first, once you get used to it it's pretty easy to reason about (here is a good place to learn more with a nice demo).

Tile data is placed in video RAM starting at location 0x8000 and can go up to 0x97FF. We don't tell the GB to directly display the tile data on the screen; instead we store an index to the tile data we want to display in a different part of VRAM, called the tile map. Each 1 byte index will cause the corresponding tile to be drawn, starting from the top left and proceeding right. Each line of tiles is 32 tiles and there is enough space for 32 lines. You might notice this is a lot more tiles than the screen can show; only the first 20 tiles in a line and the first 18 lines will be shown, but the extra that exist offscreen are still useful since they can be easily scrolled using a set of registers.
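To make the 2bpp layout a bit more concrete, here is a single made-up row of pixels written out by hand. The label ExampleRow and the pixel values are just for illustration; they aren't part of my font. The pixels, left to right, use colors 0, 1, 2, 3, 0, 1, 2, 3. The first byte collects the low bit of each color and the second byte collects the high bit, with bit 7 being the leftmost pixel:

```asm
; Hypothetical example row, not taken from the font.
; Pixel colors, left to right:  0  1  2  3  0  1  2  3
; As 2-bit values:             00 01 10 11 00 01 10 11
ExampleRow:
    DB %01010101 ; low bit of each pixel, bit 7 = leftmost pixel
    DB %00110011 ; high bit of each pixel, bit 7 = leftmost pixel
```

Eight of these byte pairs, one per row from top to bottom, make up one 16-byte tile.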

Knowing all of this, we can sketch out the steps for putting a static image on the screen:

break the image up into 8x8 pixel tiles encoded in the 2bpp format
store the tile data on our ROM somewhere
copy the tile data from our ROM to video RAM starting at location 0x8000
map the index of the tile data to where we want each tile to appear on the screen
put the indexes into video RAM starting at location 0x9800

It doesn't sound too bad, but of course there is a bit more we'll have to do to get this to work in reality. For example, video RAM is only accessible to the CPU when the pixel processing unit isn't using it (for example during VBlank or when the LCD is off), so we have to make sure we aren't trying to copy something to VRAM when we shouldn't be.
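As a rough sketch of what "making sure" can look like while the LCD is on, one common approach is to poll the STAT register at 0xFF41 before each write. Its lowest two bits hold the current PPU mode, and bit 1 is only set in the two modes where the PPU is scanning OAM or drawing pixels. The routine name WriteVRAMByte and its register convention (byte in B, destination in HL) are my own inventions for this illustration, and it assumes interrupts are disabled so nothing can sneak in between the check and the write.

```asm
; WriteVRAMByte (hypothetical helper): store B at [HL] once VRAM is free.
; Assumes interrupts are disabled and the LCD is on.
WriteVRAMByte:
    push AF
.wait:
    ld A, [$FF41]   ; STAT register, bits 0-1 hold the PPU mode
    and %00000010   ; bit 1 is set in mode 2 (OAM scan) and mode 3 (drawing)
    jr nz, .wait    ; keep polling until we are in HBlank or VBlank
    ld [HL], B      ; safe to touch VRAM now
    pop AF
    ret
```

The program in this post takes the simpler route of turning the LCD off entirely while it copies data, so it never needs a check like this.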

What are we putting on the screen?

We know the general outline of how to get something on the screen, but what are we actually going to put there? Usually, in a Hello World situation, we would be satisfied with calling "print" or "std::cout" or "console.log" and getting the words to appear in the terminal in whatever font our particular setup is using. Typography - and everything that comes with it - is incredibly complex and contains within it heaps of hard problems. For the most part when we type and display text on screens, we are dealing with sophisticated encodings, font systems, anti-aliasing and spacing rules, etc. We are not going to be dealing with all of that - both because we're just trying to display "Hello World" and because we're doing so on a GB - but we do need to put letters on a screen so we'll have to come up with a way of doing that. We could settle with making a pixel drawing of the words, manually cutting it into tiles and encoding it into 2bpp, but the spirit of a Hello World is to be able to move quickly to outputting other strings. In order to do that, we're going to need more than just a static image of "Hello World"; we're going to need an alphabet.

For the purpose of this exercise, I've made my own font with upper and lower case letters all fitting into 8x8 pixels per letter. You can download the png font file here. I would encourage everyone to try making an 8x8 pixel font; I have no artistic or typographic experience but I had a blast making mine in MS Paint in 45 minutes or so. It kicked off an interest in typography for me and I'm definitely going to have another try at it.

In any case, now that we have a tiled alphabet, we need to put it into the 2bpp format. I would recommend trying at least one or two letters by hand to get a more intuitive sense of the format, but you can drop it into something like the GameBoy Tile Designer to quickly get the tile data in the right format. The upper case 'A' in my font looks like this when formatted: 0x18 0x18 0x24 0x24 0x42 0x42 0x42 0x42 0x7E 0x7E 0x42 0x42 0x42 0x42 0x42 0x42. You can put this data directly into the assembly file with a label or you can put it into a file to be included to keep your code a bit cleaner. The Rednex assembler has some useful macros for defining constant data like tiles, but I just use DB followed by 16 bytes so that each line is a tile.
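For reference, here is how that 'A' could be written using the assembler's backtick graphics literals, where each digit from 0 to 3 is the color of one pixel and each dw emits the two bytes for one row. This is just an alternative spelling of the 16 bytes above (the label TileA is made up for the example). Since every pixel in my font is either color 0 or color 3, both bytes of each row come out the same.

```asm
TileA: ; hypothetical label, same data as the 16 bytes listed above
    dw `00033000 ; $18, $18
    dw `00300300 ; $24, $24
    dw `03000030 ; $42, $42
    dw `03000030 ; $42, $42
    dw `03333330 ; $7E, $7E
    dw `03000030 ; $42, $42
    dw `03000030 ; $42, $42
    dw `03000030 ; $42, $42
```

Written this way you can more or less see the letter in the source, which makes small touch-ups easier than editing hex by hand.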

Getting the font in memory

Transferring the font from where it starts on the ROM to VRAM is pretty simple. We need a source pointer, a destination pointer, and the size of the data we are transferring. We transfer one byte at a time from source to destination, looping while there is more data to go. This is how it looks in assembly:

LoadFont:
    push AF ; Save all the registers that we clobber during the subroutine
    push HL
    push DE
    push BC
    ld BC, 848 ; counter, equals the number of bytes of the font tile data
    ld DE, $8000 ; destination for font data
    ld HL, UPPER_FONT ; source of font data
.loop:
    ldi A, [HL] ; get the current byte and increment source pointer
    ld [DE], A ; store the current byte in destination
    inc DE ; increment the destination pointer
    dec BC ; decrement the 16-bit counter (this sets no flags)
    ld A, 0 ; clear A so we can test the counter
    or C ; A is non-zero if the LSB of the counter is non-zero
    or B ; A is non-zero if either byte of the counter is non-zero
    jr nz, .loop ; if not, loop
.finish:
    pop BC ; Restore all the registers
    pop DE
    pop HL
    pop AF
    ret

We know the source, destination, and size so we hard-code them above, but it isn't hard to parametrize them. Everything between the .loop and .finish labels is basically a do-while loop, where at the end we decrement the counter and, if it isn't zero, loop. The GameBoy's CPU provides 16-bit increment/decrement instructions, but they don't set any flags and there aren't any 16-bit compare instructions, so we have to manually test each byte of the counter to see if it is zero. We do that above by zero-ing the A register then or-ing both bytes of the counter. If either byte is non-zero, register A will have something in it, flag z will not be set, and we'll jump back up to the .loop label.
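As a sketch of the parametrized version, the routine below takes the source in HL, the destination in DE, and the byte count in BC instead of hard-coding them. The name CopyBytes and this register convention are my own choices for the example, and like LoadFont it assumes the count is at least 1. It tests the counter with ld A, B followed by or C, which is a slightly shorter way of doing the same zero check.

```asm
; CopyBytes (hypothetical helper): copy BC bytes from [HL] to [DE].
; Assumes BC >= 1. Clobbers A, BC, DE and HL.
CopyBytes:
.loop:
    ldi A, [HL]  ; read a byte from the source, then increment HL
    ld [DE], A   ; write it to the destination
    inc DE       ; advance the destination pointer
    dec BC       ; 16-bit decrement, sets no flags
    ld A, B
    or C         ; A = B | C, so Z is set only when BC == 0
    jr nz, .loop
    ret

; Example call, equivalent to what LoadFont does:
;     ld HL, UPPER_FONT
;     ld DE, $8000
;     ld BC, 848
;     call CopyBytes
```

Leaving the register saving to the caller keeps the routine short, at the cost of the caller having to remember which registers get clobbered.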

Mapping the tiles to the screen

We have the tile data in memory, but we need to map it to the screen to actually display our message. There are two places in the GameBoy memory map for putting tile maps, 0x9800 - 0x9BFF and 0x9C00 - 0x9FFF. They're exactly the same; we just have to pick one and then set it as the active tile map. We're going to use the lower range for our map, since it is active by default. But what do we put in the tile map? Each byte of the tile map corresponds to an 8x8 area of the screen (plus some space off the side and bottom), on which the GameBoy will draw the tile whose index is stored at that position in the tile map. So if we want to put our letter 'H' in the very top left of the screen, we need to put 8 in 0x9800 in memory, since 'H' is 8 tiles from the beginning of the tile data section of memory. Essentially, we need to map the value of each character in the string we are trying to print to the offset of the corresponding tile in the tile data. One way of doing this would be declaring a character map in our assembly file; character maps tell the assembler to emit a different character code than that of the source file encoding when writing a raw character to memory. If we had CHARMAP "H", 8 in our assembly file, we could ld A, "H" and then ld [$9800], A to store the tile index (the GB CPU has no instruction for storing an immediate value straight to a fixed address). On the other hand, it's pretty simple to map from UTF-8 to the offset. UTF-8 latin upper-case characters A-Z have codes 65-90 and lower-case a-z have codes 97-122, the same as in ASCII. To map those codes onto the offsets (taking into account that the tile at offset 0 is blank) we do the following: if the character is upper-case, subtract 64 from the character code to get the offset; if it is lower-case, subtract 70. This method results in a few more cycles per character, but I think it's cooler to do the mapping in code (and the cycles aren't that big of a deal in our meagre program).
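For comparison, here is roughly what the character map route could look like. Only the characters needed for our message are shown (a full map would need one line per letter), and the tile indexes follow the font layout described above: blank at 0, 'A' at 1, 'a' at 27. Mapping the space character to the blank tile is my own addition for the example.

```asm
; Partial character map: one entry per character we want to print
CHARMAP " ", 0  ; blank tile (an assumption for this example)
CHARMAP "H", 8
CHARMAP "W", 23
CHARMAP "d", 30
CHARMAP "e", 31
CHARMAP "l", 38
CHARMAP "o", 41
CHARMAP "r", 44

message:
    db 11
    db "Hello World" ; emitted as 8, 31, 38, 38, 41, 0, 23, 41, 44, 38, 30
```

With the string already stored as tile indexes, a print loop would only have to copy each byte straight into the tile map, without any compare or subtract step.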

Now we know what we're putting into the tile map, but we also have to think about the string that we're trying to print. We need to know two things about it: where it starts and where it ends. Where it starts is easy: we can label the string and use the label, letting the assembler and linker deal with the details. There are a couple ways we can tell when the string ends: prefix the string data with the length of the data or append some sentinel value to the string. In the first case we can keep a counter as we process the string in order to know when to stop and in the latter case we can test each character code for the sentinel value to know when to stop. C-like languages have historically used "null-terminated" strings, while Pascal and its descendants used length-prefixed strings. They each have their own advantages and drawbacks (although it has been said that the widespread adoption of null-terminated strings was "the most expensive one-byte mistake"), but they're roughly equivalent in our case. I chose to go with a length-prefixed string and I'll leave it as an exercise for the reader to try converting it to a null-terminated one.

message: ; The string we'll print
    db 11
    db "Hello World"

With our string and character mapping figured out, we can put it all together and look at the function that prints our string. The function assumes that HL contains the pointer to the string when the function is called.

PrintText:
    push AF ; Don't clobber the registers!
    push DE
    push BC
    ldi A, [HL] ; We've passed the pointer to the string in HL
    ld C, A ; Register C is going to hold our counter
    ld DE, $9800 ; Base address for tile map in VRAM

.loop:
    ldi A, [HL] ; Get char code and increment string pointer
    cp 97 ; Check if upper- or lower-case based on char code
    jr c, .upper ; If upper-case, subtract only 64
    sub 6 ; If lower-case, subtract 70
.upper:
    sub 64
    ld [DE], A ; Save the tile index in tile map
    inc DE ; Increment tile map pointer
    dec C ; Decrement the counter
    jr nz, .loop ; Loop if not zero

.finish:
    pop BC ; Restore the registers
    pop DE
    pop AF
    ret

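This is pretty similar to the LoadFont function: we have a source, destination, and counter and we loop until the counter is zero. The cp 97, jr c, .upper and the two following subtractions take care of the mapping from UTF-8 to our font offset. One thing to be aware of: the space in our message (code 32) also takes the upper-case path, so 32 - 64 wraps around to tile index 224, which our font never loads. It only shows up blank as long as that part of VRAM is zeroed.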
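For anyone curious what the sentinel approach mentioned earlier might look like, here is one possible sketch of a null-terminated variant. The names PrintTextZ and message_z are made up for this example. Instead of counting down, the loop stops as soon as it reads a 0 byte, so register C (and the push/pop of BC) is no longer needed.

```asm
; PrintTextZ (hypothetical variant): HL points at a 0-terminated string.
PrintTextZ:
    push AF
    push DE
    ld DE, $9800    ; Base address for tile map in VRAM
.loop:
    ldi A, [HL]     ; Get char code and increment string pointer
    or A            ; Sets the Z flag if the char code is 0
    jr z, .finish   ; Sentinel found, we are done
    cp 97           ; Same upper/lower-case mapping as PrintText
    jr c, .upper
    sub 6
.upper:
    sub 64
    ld [DE], A      ; Save the tile index in tile map
    inc DE          ; Increment tile map pointer
    jr .loop
.finish:
    pop DE
    pop AF
    ret

message_z:
    db "Hello World", 0
```

The trade-off is the usual one: there is no length to keep in sync with the text, but the string can no longer contain a 0 byte, and the loop has to inspect every character to find the end.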

Setting up

The two functions above provide the core functionality for printing a message to the screen, but, like most "Hello World" examples, there is some boilerplate that needs to be in place to make stuff run. Some of what follows may not be necessary for certain emulators, but I include everything that is necessary to get the example running on an actual GameBoy. The first section of our program is the jump vectors, which sit from 0x0000 to 0x0060 in memory. These are used for the rst instructions and for interrupts; even though we don't make use of them here, it's nice to give them a safe place to jump to if they somehow get triggered. Here, that safe place is our program entry point:

SECTION "rst0", ROM0[$0000]
jp begin
SECTION "rst1", ROM0[$0008]
jp begin
SECTION "rst2", ROM0[$0010]
jp begin
SECTION "rst3", ROM0[$0018]
jp begin
SECTION "rst4", ROM0[$0020]
jp begin
SECTION "rst5", ROM0[$0028]
jp begin
SECTION "rst6", ROM0[$0030]
jp begin
SECTION "rst7", ROM0[$0038]
jp begin
SECTION "intVBlank", ROM0[$0040]
jp begin
SECTION "intLCDStat", ROM0[$0048]
jp begin
SECTION "intTimer", ROM0[$0050]
jp begin
SECTION "intSerial", ROM0[$0058]
jp begin
SECTION "intJoypad", ROM0[$0060]
jp begin

The next section is called the cartridge header and goes from 0x0100 to 0x014F. The header contains a bunch of info about the cartridge, but the most important areas are 0x0100 to 0x0103 (the place where the boot ROM jumps when it is finished), 0x0104 to 0x0133 (the logo shown during boot), and 0x014D (a checksum of the header). Why do these matter (besides the entry point)? When a GB starts, it executes a boot ROM, which displays the logo in the header and plays the classic ba-ding sound effect. This boot ROM also, infamously, does a little bit of snooping on the cartridge. It checks to make sure that the logo in the header matches the actual Nintendo logo and calculates the header checksum and ensures that it matches the one stored in the header; if either of these checks doesn't pass, the boot ROM locks up the console. This was Nintendo's attempt to prevent non-licensed cartridges from showing up, since any such homebrew would have to contain a copy of the Nintendo logo bitmap, which Nintendo figured gave them the right to sue for trademark violation. These sorts of systems were kinda common at the time, but the courts didn't find them as effective as console makers would have hoped. In any case, we include the logo at 0x0104; the Rednex toolchain has a utility, rgbfix, which calculates and inserts the correct checksum value. The header section is as follows:

SECTION "header",ROM0[$0100]
    nop
    jp begin

; Nintendo logo
DB $CE,$ED,$66,$66,$CC,$0D,$00,$0B,$03,$73,$00,$83,$00,$0C,$00,$0D
DB $00,$08,$11,$1F,$88,$89,$00,$0E,$DC,$CC,$6E,$E6,$DD,$DD,$D9,$99
DB $BB,$BB,$67,$63,$6E,$0E,$EC,$CC,$DD,$DC,$99,$9F,$BB,$B9,$33,$3E

DB "HELLO WORLD",0,0,0,0,0   ; Cart name - padded to 16 bytes
DB 0, 0                      ; New licensee code
DB 0                         ; SGB flag
DB 0                         ; Cart type
DB 0                         ; ROM Size
DB 0                         ; RAM Size
DB 1                         ; Destination code
DB $33                       ; Old licensee code
DB 0                         ; Mask ROM version
DB 0                         ; Header checksum, set later by rgbfix
DW 0                         ; Global checksum

We also need to put our static data (the font and what we want to print) somewhere in our ROM. We do this just by labelling the data; we can use it in our code and the linker will work out the details:

INCLUDE "font.inc"

message:
    db 11
    db "Hello World"

Finally, we can get into some actual code:

begin:
    di ; Disable interrupts
    ld sp, $fffe ; Set the start of stack to the top of RAM (it "grows" down)

    ld A, %11100100 ; Set palette
    ld [$FF47], A ; Load palette into register
    ld A, 0
    ld [$FF42], A ; Scroll Y to 0
    ld [$FF43], A ; Scroll X to 0

    call OffScreen ; Turn the screen off before writing some data to VRAM

    call LoadFont ; Put the font data into VRAM
    call ClearScreen ; Clear any data in the tile map area of VRAM

    ld HL, message
    call PrintText

    ; Turn the screen back on (easier to do than turning it off)
    ld A, [$FF40] ; Get the current LCD Control register
    set 7, A ; Set the LCD/PPU control bit
    ld [$FF40], A

wait:
    nop
    jr wait

You can think of this as the "main" of our program. We do a couple of things to set the state of the GB in general (stop interrupts and set the stack pointer), prep the graphical system (set palette, set the scroll, and turn the screen off), load our font into VRAM tile data, make sure the tile map area is clear, print our message, turn the screen back on, then loop forever.
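The OffScreen and ClearScreen routines called above aren't listed in this post. What follows is a minimal sketch of how they might look, not necessarily identical to the versions in the linked code. OffScreen waits for VBlank by polling the LY register at 0xFF44 (on real hardware the LCD should only be switched off during VBlank), then clears bit 7 of the LCD Control register. ClearScreen writes tile index 0, our blank tile, to all 1024 entries of the tile map at 0x9800, and assumes the LCD is already off.

```asm
; Sketch only: turn the LCD off, waiting for VBlank first.
OffScreen:
    push AF
    ld A, [$FF40]     ; LCD Control register
    bit 7, A          ; Is the LCD on at all?
    jr z, .done       ; Already off, nothing to wait for
.waitVBlank:
    ld A, [$FF44]     ; LY, the scanline currently being drawn
    cp 144            ; Lines 144 to 153 are VBlank
    jr c, .waitVBlank ; LY < 144, still drawing the visible lines
    ld A, [$FF40]
    res 7, A          ; Clear the LCD/PPU control bit
    ld [$FF40], A
.done:
    pop AF
    ret

; Sketch only: fill the tile map at $9800 with the blank tile (index 0).
ClearScreen:
    push AF
    push HL
    push BC
    ld HL, $9800
    ld BC, 1024       ; 32 x 32 tile map entries
.loop:
    xor A             ; A = 0, the blank tile
    ldi [HL], A       ; Store it and increment HL
    dec BC            ; 16-bit decrement, sets no flags
    ld A, B
    or C              ; Z is set only when BC == 0
    jr nz, .loop
    pop BC
    pop HL
    pop AF
    ret
```

The check for bit 7 at the top of OffScreen is there because LY stays at 0 while the LCD is off, so waiting for line 144 with the screen already disabled would loop forever.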

Displaying our text

A word about wrapping

To change the text displayed all we have to do is change the message (both the text and the length). If you have a message longer than 20 characters, it will run into the undisplayed columns of the tile map off to the right side of the screen. There are a bunch of ways to fix this, which I'll probably post more about later (word wrapping is an interesting problem!). The simplest way to fix it without changing the code is to pad the string with spaces so it wraps. We can achieve the same thing conceptually, but without having to manually pad the string, with a simple algorithm. Let x be the current address into the tile map (somewhere in the range 0x9800 - 0x9BFF in our program); taking x mod 32 gives the position in the current line. Positions 0 through 19 are visible, so if x mod 32 is equal to 20, we know that we have just stepped off of the visible part of the line and we need to add 12 to x so that it points to the beginning of the next line.

PrintText:
    push AF ; Don't clobber the registers!
    push DE
    push BC
    ldi A, [HL] ; We've passed the pointer to the string in HL
    ld C, A ; Register C is going to hold our counter
    ld DE, $9800 ; Base address for tile map in VRAM

.loop:
    ldi A, [HL] ; Get char code and increment string pointer
    cp 97 ; Check if upper- or lower-case based on char code
    jr c, .upper ; If upper-case, subtract only 64
    sub 6 ; If lower-case, subtract 70
.upper:
    sub 64
    ld [DE], A ; Save the tile index in tile map
    inc DE ; Increment tile map pointer
    ; Check if needs to be wrapped around
    push DE ; saving DE, which is used to pass the dividend and
            ; by the function to return the result
    push BC ; saving BC, which is used to pass the divisor
    ld BC, 32 ; 32 is the divisor
    call fastMod
    pop BC ; restore BC
    ld A, E ; load the remainder into A
    cp 20 ; check if the remainder is equal to 20
    pop DE ; restore DE
    jr nz, .skipNewLine ; if the remainder was not 20, jump and don't add a new line
    ld A, E ; load LSB of tile map index into A
    add 12 ; add 12 to LSB, which puts it at the start of the next line
    ld E, A ; save LSB
    ld A, D ; load MSB of tile map index into A
    adc 0 ; carry over from LSB, if needed
    ld D, A ; save MSB

.skipNewLine:
    dec C ; decrement counter
    jr nz, .loop ; loop if counter is not 0

.finish:
    pop BC ; Restore the registers
    pop DE
    pop AF
    ret

The fastMod function is a common bit-twiddling version of the modulus operation that works when the divisor is a power of two. It only requires two arithmetic operations - remainder = dividend & (divisor - 1) - although it takes a couple more machine operations here since we're using 16-bit integers.

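fastMod:
    push AF
    dec BC ; divisor - 1 is the bit mask (only valid for a power of two)
    ld A, D
    and B ; mask the MSB of the dividend
    ld D, A
    ld A, E
    and C ; mask the LSB of the dividend
    ld E, A ; DE now holds the remainder
    pop AF
    ret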
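Since the divisor here is always 32 and the tile map starts at 0x9800, which is a multiple of 32, the column is simply the low five bits of E. As a possible simplification (a sketch, not what the listing above does), the wrap check inside the PrintText loop could mask E directly and skip the fastMod call along with its pushes and pops. The fragment below is meant to sit between inc DE and the .skipNewLine label:

```asm
    ; Alternative wrap check for PrintText (sketch)
    ld A, E              ; low byte of the tile map pointer
    and %00011111        ; keep the low 5 bits, i.e. the column (0 to 31)
    cp 20                ; column 20 is the first one off screen
    jr nz, .skipNewLine  ; still on screen, no new line needed
    ld A, E
    add 12               ; 20 + 12 = 32, the start of the next line
    ld E, A
    ld A, D
    adc 0                ; propagate the carry into the high byte
    ld D, A
```

The general fastMod routine is still the better fit if the divisor isn't known ahead of time; this version just takes advantage of the fact that ours never changes.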

Outro

And that's how you print "Hello World" on the GameBoy. This turned out a bit longer than I thought it would, although it goes to show the kind of detail that can usually be abstracted away with "Hello World" demos on higher level systems. I think this serves as a really good intro to putting things on the screen of a GameBoy in general and provides a few nice avenues for expansion and experimentation. It's relatively simple to add to this code and start scrolling the words or displaying some different types of graphics on screen. If you have any questions, comments, or find bugs in the code, feel free to reach out at cummings dot t287 at gmail dot com.