Wednesday, 4 March 2026

Reverse engineering Microbee Basic

TLDR version: I am reverse engineering Microbee Basic. My disassembled code is on my Google Drive. It's getting pretty complete, to the point where I can move code around, reassemble, and things still work.

Over the years, I've occasionally delved into the inner workings of the various Microbee Basics, generally for a specific purpose, like figuring out how the cassette data is structured for restoring tapes, or more recently figuring out how the bee reads its keyboard. With the creation of the original SuperPAK coreboard, I had a brief look at how the PAK command worked, to see whether code launched by the PAK command could tell which PAK ROM it had been started from (spoiler: it's in HL).

During a recent holiday, I decided to upgrade BASIC for the Freebee 4MB Pak Cart. I had a few goals, in increasing order of complexity:

  • Extend PAK beyond 256 * 8K = 2MB - Pak Cart has 4MB of Flash available, so I wanted to be able to type "PAK 320" and have that work.
  • Add commands to erase and copy PAKs. "DELETE D" should erase the whole 512K chip in the PAK D slot. "DELETE 63" should erase PAK 63. "COPYPAK 275 TO 63" should copy the contents of PAK 275 to PAK 63. "COPYPAK D TO A" should copy the whole 512K PAK Cart from D to A.
  • Create a CP/M-like directory structure on PAKs, with one PAK block on each PAK Cart holding that cart's directory. "DIR A" would list it, "RUN BLAH.M" would run BLAH.M (potentially spanning multiple PAKs) if it appears in the directory of the currently selected PAK Cart, and of course "LOAD BLAH.B" would check whether BLAH.B exists on the currently selected PAK before trying to load from tape.
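The PAK numbering behind these goals is mostly arithmetic: 4MB of 8K blocks is 512 PAKs, which no longer fits in a byte, and each 512K chip holds 64 of them. Here's a minimal Python sketch of that bookkeeping. It assumes, hypothetically, that PAK numbers run linearly across eight 512K chips with chip 0 in slot A, chip 1 in slot B and so on; the real cart may be wired differently, and pak_location and paks_in_slot are made-up names.

```python
# Hypothetical PAK-number arithmetic for a 4MB Pak Cart.
# ASSUMPTION: PAK numbers map linearly onto eight 512K flash chips,
# chip 0 in slot "A", chip 1 in slot "B", etc. Real wiring may differ.
PAK_SIZE = 8 * 1024             # one PAK block is 8K
CHIP_SIZE = 512 * 1024          # one flash chip is 512K
CART_SIZE = 4 * 1024 * 1024     # total flash on the Pak Cart
SLOTS = "ABCDEFGH"              # assumed slot letters, one per chip

PAKS_PER_CHIP = CHIP_SIZE // PAK_SIZE   # 64
TOTAL_PAKS = CART_SIZE // PAK_SIZE      # 512: needs more than 8 bits

def pak_location(pak):
    """Return (slot letter, block within chip, byte offset in flash)."""
    if not 0 <= pak < TOTAL_PAKS:
        raise ValueError(f"PAK {pak} is outside 0..{TOTAL_PAKS - 1}")
    chip, block = divmod(pak, PAKS_PER_CHIP)
    return SLOTS[chip], block, pak * PAK_SIZE

def paks_in_slot(slot):
    """PAK numbers held by one chip, i.e. what a whole-chip DELETE erases."""
    chip = SLOTS.index(slot)
    return range(chip * PAKS_PER_CHIP, (chip + 1) * PAKS_PER_CHIP)
```

Under that assumption, "COPYPAK 275 TO 63" would read block 19 of the chip in slot E and write the last block of the chip in slot A.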

So as you can see it's getting increasingly ambitious. To get things working we need to both find space in the BASIC ROMs to store our code and figure out enough of how BASIC works that we can add our routines.

Finding the space is a doddle. Basic 5.29e, the premium BASIC, uses a bank select scheme to add a whole 8K ROM to the usual 16K BASIC. Examining this extended ROM shows that only 2K is used for the premium graphics routines, leaving us 6K to play with. In fact, because of the way the ROM is implemented on the Freebee Pak Cart coreboard, I can put an arbitrary dividing line between code that stays put (ROM B) and code that banks in and out (ROM A and C), as I just use a 32K EPROM with a lot of duplication. So I could squeeze all the staying-put code into a 2K chunk, for example, and have 2 * 14K banks, for a total of 30K (2 * 14K + 2K) of available code space.
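The arithmetic behind that 30K is worth spelling out, because every K taken out of the fixed chunk comes back twice. A minimal Python sketch, assuming two banks, each stored in the 32K EPROM as a complete 16K image of the 8000h-BFFFh window with the fixed chunk duplicated in both (code_space_k is just an illustrative name):

```python
WINDOW_K = 16  # the BASIC ROM window, 8000h-BFFFh
EPROM_K = 32   # physical EPROM on the coreboard

def code_space_k(fixed_k, banks=2):
    """Distinct code space (in K) when fixed_k of the window never banks out."""
    if not 0 <= fixed_k <= WINDOW_K:
        raise ValueError("fixed chunk must fit in the 16K window")
    # ASSUMPTION: each bank is a complete 16K image, so the fixed chunk is
    # duplicated in every bank and only the banked part adds new code.
    if banks * WINDOW_K > EPROM_K:
        raise ValueError("not enough EPROM for that many full-window images")
    return fixed_k + banks * (WINDOW_K - fixed_k)
```

Plugging in 8K reproduces the stock 5.29e arrangement (ROM B fixed, ROM A and C banked, 24K in total), while 2K gives the 30K quoted above.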

So now onto figuring out how it works so that I can make modifications. This is done through disassembly, using every possible hint to first figure out what's code and what's data, then which routines do what, and then working our way through all the routines to nut them out. Along the way we correct the disassembly stuff-ups where data has been interpreted as code, give meaningful names to routines and meaningful labels to jump targets, and add comments. We also make sure our increasingly commented disassembly is still correct by assembling it and performing a binary diff against the original ROM.
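Any binary diff tool will do for that check (cmp -l basic.bin rebuilt.bin works on any Unix-like system), but it's handy to see differences as Z80 addresses rather than file offsets. A small Python sketch; the script name romdiff.py and the file rebuilt.bin are hypothetical, and it assumes the image is numbered from 8000h just as the disassembly is:

```python
import sys

def diff_roms(original, rebuilt, base=0x8000, limit=10):
    """List up to `limit` differences as (address, original byte, rebuilt byte).

    Addresses are base + file offset, i.e. numbered the same way as a
    disassembly made with --origin=0x8000. None marks a missing byte when
    the two images differ in length.
    """
    diffs = []
    for offset in range(max(len(original), len(rebuilt))):
        a = original[offset] if offset < len(original) else None
        b = rebuilt[offset] if offset < len(rebuilt) else None
        if a != b:
            diffs.append((base + offset, a, b))
            if len(diffs) >= limit:
                break
    return diffs

def show(value):
    """Format a byte as two hex digits, or -- if it's missing."""
    return "--" if value is None else f"{value:02X}"

if __name__ == "__main__" and len(sys.argv) == 3:
    # usage (hypothetical script name): python romdiff.py basic.bin rebuilt.bin
    with open(sys.argv[1], "rb") as f1, open(sys.argv[2], "rb") as f2:
        found = diff_roms(f1.read(), f2.read())
    for address, a, b in found:
        print(f"{address:04X}h: {show(a)} -> {show(b)}")
    print("identical" if not found else f"{len(found)} difference(s) shown")
```

An empty result means the reassembled code is byte-for-byte identical to the original ROM.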

So let's start with the binary code for Basic 5.29e. It's a 24K file, consisting of BASIC A, in memory from 8000h to 9FFFh when LV5 is 0 (i.e. at boot), followed by BASIC B, always in memory from A000h to BFFFh, followed by BASIC C, in memory from 8000h to 9FFFh when LV5 is 1. So with our disassembler (I use Z80DASM, which is part of Z80ASM), we type:

z80dasm --origin=0x8000 --labels --output=basic.asm basic.bin

This gives us a very long file that looks like this (very first section):

; z80dasm 1.2.0
; command line: z80dasm --origin=0x08000 --labels --output=basic.asm basic.bin

	org 08000h

l8000h:
	jp l84c6h
	jp l84c6h
	jp la3e3h
	jp la3cbh
	jp la626h
	jp lacafh
l8012h:
	jp lab6dh
sub_8015h:
	jp laae6h
sub_8018h:
	jp lab26h
	jp lab17h
sub_801eh:
	jp l83d7h
l8021h:
	jp l8517h
sub_8024h:
	jp lad98h
l8027h:
	jp laf9eh
	jp la801h
	jp la7ceh
	jp lb035h
	jp lb040h
	jp lb04ch
	jp lb057h
	jp lb0a8h
	jp l80ebh
	jp l80bch
	jp l809bh
	jp l845fh
	jp l8433h
	jp l83c1h

The first section above is a "jump table". This is common in code of the era. You want to be able to advertise functions within your code, but you know that when you edit and reassemble the code things will move, so you start with a list of jumps to the important functions. Callers call the jump, which in turn goes to the function, which then returns to their code. While the function itself might move around, the jump never does.
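Because every entry is a three-byte jp nn (C3h followed by the target address, low byte first), the table is completely regular: entry n lives at 8000h + 3n, which is why CASSBYTEIN, entry 6, sits at 8012h. A Python sketch that pulls the table out of the ROM image; read_jump_table is an illustrative name, and stopping at the first non-JP byte is a heuristic rather than something the ROM guarantees:

```python
JP = 0xC3  # Z80 opcode for "jp nn"; the 16-bit target follows, little-endian

def read_jump_table(image, base=0x8000):
    """Decode consecutive 'jp nn' entries at the start of an image.

    Returns (entry address, target address) pairs, stopping at the first
    byte that isn't a JP opcode.
    """
    table = []
    offset = 0
    while offset + 2 < len(image) and image[offset] == JP:
        target = image[offset + 1] | (image[offset + 2] << 8)
        table.append((base + offset, target))
        offset += 3  # each entry is exactly three bytes
    return table
```

Fed the first nine bytes of the ROM (C3 C6 84 C3 C6 84 C3 E3 A3), it returns the first three entries from the listing: two jumps to 84C6h and one to A3E3h.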

So now we become detectives. There are some memory maps available that give us a bit of a hint about what's where in the basic. There's a reasonably good one in "Wildcards" by Ash, Burt, and Nallawalla. Let's use their insights to name the stuff in the jump table and add comments to everything:


;#############################################################################################
; Start of BASIC ROM A
;#############################################################################################

		org 08000h

		jp RESETTOHERE		; Start of BASIC
		jp RESETTOHERE		; BASIC warm start
		jp WAITMBEEKEY		; DGOS Wait for keyboard input - A register
		jp GETKEYIFANY		; DGOS Scan keyboard
		jp MBEEVDUFROMB		; DGOS Display character in B register
		jp GIVEPIOARM		; DGOS Give PIO an arm
l8012h:		jp CASSBYTEIN		; DGOS Get byte from cassette in A
sub_8015h:	jp CASSBLOCKIN		; DGOS Get block from cassette
sub_8018h:	jp CASSBYTEOUT		; DGOS Cassette byte out A
		jp CASSBLOCKOUT		; DGOS Cassette block out
sub_801eh:	jp RUNPROG		; Auto-execute address for saving BASIC program
l8021h:		jp BASWARMSTART		; Warm start for restoring Reset jump
sub_8024h:	jp HIRESINIT		; HIRES initialisation
l8027h:		jp LORESINIT		; LORES initialisation
		jp SETINVERSE		; INVERSE initialisation
		jp SETUNDERLIN		; UNDERLINE initialisation
		jp SETDOT		; SET dot: X = HL, Y = DE
		jp RESETDOT		; RESET dot returns Z if OK
		jp INVERTDOT		; INVERT dot
		jp TESTDOT		; Test for dot - NZ if set/error
		jp PLOTLINE		; PLOT a line
		jp GETCHAR		; Redirected input A
		jp PUTCHAR		; Redirected output A
		jp LPUTCHAR		; Redirected print output A
		jp WRMSTRCLRVAR		; Jump to BASIC with CLEAR
		jp READYMODE		; Jump to BASIC command level
		jp JMPBASFRPAK		; Jump to BASIC after NET or PAK

Every time we name a label, we do a global find and replace on it, so that the routine gets labelled, as does every single call to it in the code. We use a naming convention that makes labels obvious. I like to use all caps (shouty much), and I'm not afraid to use labels up to 16 characters. Yes, it means I have to hit TAB a lot, but so be it. Readability is _everything_.
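The one trap with global find and replace is partial matches: once you have your own names, renaming SETDOT with a plain text replace would also clobber RESETDOT. Most editors have a "whole word" option; the same idea in Python uses a regular expression with word boundaries (rename_label is just an illustrative helper, and it deliberately renames inside comments too):

```python
import re

def rename_label(source, old, new):
    """Rename a label everywhere in an assembly listing, whole words only.

    \\b stops 'SETDOT' from matching inside 'RESETDOT'; re.escape guards
    against characters that are special in regular expressions.
    """
    pattern = re.compile(r"\b" + re.escape(old) + r"\b")
    return pattern.sub(new, source)
```
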

Talking about readability, let's put lots of super-obvious breaks in the code. I like the ;############ sequence going all the way across the line. It makes a very obvious divider.

So the obvious next place to look is the first jump, where BASIC goes on power-up:


RESETTOHERE:
	di
l84c7h:
	ld sp,00080h
	call sub_a3c9h
	ld hl,lba6ah
l84d0h:
	ld a,(hl)
	or a
	jr z,l84dch
	ld b,a
	inc hl
	ld c,(hl)
	inc hl
	otir
	jr l84d0h

Cool, our first label is already in place thanks to the find and replace. The first thing the routine does is disable interrupts, then initialise the stack pointer to a very low address in memory. This kinda makes sense: we don't know how much memory the machine has on power up, and we know (from the memory map) that memory from 0000h to 0080h is a scratch pad, so we don't mind trashing that with our stack. Let's follow the first call to see what that does:


sub_a3c9h:
	reti

It's just a return from interrupt. Z80 family peripherals such as the PIO watch the bus for the RETI opcode, so executing one clears any interrupt a device had under service before the reset, putting it in a known state. As before, every time we learn something, we comment and label:


;#############################################################################################
; RESETTOHERE
;   input:	None.
;   output:	Performs a complete initialisation of the system.
;   affects:	Everything.
;#############################################################################################

RESETTOHERE:	di			; Disable interrupts
		ld sp,00080h		; We don't know how much RAM we have yet, so
					; Initialise stack pointer to low memory
		call RETURNINT		; Perform a reti
		ld hl,lba6ah
l84d0h:		ld a,(hl)
		or a
		jr z,l84dch
		ld b,a
		inc hl
		ld c,(hl)
		inc hl
		otir
		jr l84d0h

The next bit is a loop, which exits via the jr z,l84dch. We get a byte from a table at lba6ah; if that's not zero, we use it as a counter in B, then we get the next byte from the table into C. OTIR sends the byte at (HL) to the port addressed by C, increments HL and decrements B, repeating until B reaches zero, so this routine is clearly used for initialising devices from a table. These insights go into the code. First the table at lba6ah:


lba6ah:
	dec b
	ld bc,00f80h
	rla
	add a,e
	add a,e
	dec b
	inc bc
	adc a,d
	rst 38h
	sbc a,c
	or a
	ld a,a
	nop
	nop

As a disassembly, this looks really nonsensical, because it's not instructions, it's data. So let's go back to the binary ROM and open this bit up in a hex editor to see what it is. The hex editor sees our code as starting at 0000h, not 8000h, so we subtract 8000h from the address to find the section:


3A60: 08 FF 7F 00 00 00 00 00 00 C9 05 01 80 0F 17 83 
3A70: 83 05 03 8A FF 99 B7 7F 00 00 08 B6 C8 A3 C8 A3
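For ROM A and ROM B that address math is just subtracting 8000h, but ROM C is a trap: it sits at file offset 4000h yet runs at 8000h, and because z80dasm was given a single origin, its labels for ROM C come out in the C000h-DFFFh range. A pair of Python helpers capturing the layout described for Basic 5.29e (the function names are made up for illustration):

```python
# Map between offsets in the 24K basic.bin and the addresses the Z80 sees.
#   ROM A: file 0000h-1FFFh -> 8000h-9FFFh when LV5 = 0
#   ROM B: file 2000h-3FFFh -> A000h-BFFFh always
#   ROM C: file 4000h-5FFFh -> 8000h-9FFFh when LV5 = 1
ROM_SIZE = 0x2000  # each ROM is 8K

def file_to_cpu(offset):
    """Return (rom, cpu address, LV5 value or None if always mapped)."""
    if not 0 <= offset < 3 * ROM_SIZE:
        raise ValueError("offset outside the 24K image")
    rom, within = divmod(offset, ROM_SIZE)
    if rom == 0:
        return "A", 0x8000 + within, 0
    if rom == 1:
        return "B", 0xA000 + within, None
    return "C", 0x8000 + within, 1

def cpu_to_file(address, lv5=0):
    """File offset for a CPU address in 8000h-BFFFh, given the LV5 bank bit."""
    if 0xA000 <= address <= 0xBFFF:
        return address - 0xA000 + ROM_SIZE        # ROM B ignores LV5
    if 0x8000 <= address <= 0x9FFF:
        return address - 0x8000 + (2 * ROM_SIZE if lv5 else 0)
    raise ValueError("address outside the BASIC ROM window")
```

For example, cpu_to_file(0xBA6A) gives 3A6Ah, the row in the hex dump where the table starts.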

Our first byte, which is used as a counter, is at BA6Ah. This is just 5. Next byte is a port address, Port 01. So we're sending five bytes (80 0F 17 83 83) to port 01. Going to the bee port map, Port 01 is the control register for PIO port A. So this code looks like it's initialising PIO port A. If we download the instruction manual for the PIO, we can follow it through. The next bit does much the same for Port 03, which is PIO port B. So let's remove the chunk of meaningless code at BA6Ah and substitute our data:


;#############################################################################################
; PORTINITDATA - data used to initialise PIO
;#############################################################################################

PORTINITDATA:	db 5, PIOACONTROL	; Initialise PIO A with five bytes
		db 080h			; Set interrupt vector 080h
		db 00Fh			; Set port to output mode (mode 0)
		db 017h, 083h, 083h	; Interrupt control (mask follows), mask,
					; then enable interrupts

		db 5, PIOBCONTROL	; Initialise PIO B with five bytes
		db 08Ah			; Set interrupt vector 08Ah
		db 0FFh			; Set port to bit control mode (mode 3)
		db 10011001b		; Set port b bit 0 (TAPEIN) to input
					; Set port b bit 1 (TAPEOUT) to output
					; Set port b bit 2 (RS232 CLK) to output
					; Set port b bit 3 (RS232 CTS) to input
					; Set port b bit 4 (RS232 RXD) to input
					; Set port b bit 5 (RS232 TXD) to output
					; Set port b bit 6 (SPEAKER) to output
					; Set port b bit 7 (VSYNC) to input
		db 0B7h, 01111111b	; Set interrupt mask & enable interrupts
					; for bit transitions on bit 7 (VSYNC)
		db 0, 0			; Signify end of PORTINITDATA
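Working through the PIO manual by hand is fine for ten bytes, but the Z80 PIO control-word formats are regular enough to decode mechanically: an interrupt vector has bit 0 clear, a mode word ends in 1111, an interrupt control word ends in 0111, and an interrupt enable/disable word ends in 0011. The byte after a mode 3 word is the I/O direction register (1 = input), and the byte after an interrupt control word with bit 4 set is the mask (0 = bit is monitored). A Python sketch of that decoding; decode_pio is our own invention, and it assumes each sequence starts with the PIO expecting a fresh control word:

```python
MODES = ("output", "input", "bidirectional", "bit control")

def decode_pio(words):
    """Describe a sequence of Z80 PIO control-port writes in plain English."""
    notes = []
    expect = None  # "io" or "mask" when the next byte is a follow-on word
    for w in words:
        if expect == "io":
            inputs = [bit for bit in range(8) if (w >> bit) & 1]
            notes.append(f"direction {w:02X}h: inputs on bits {inputs}")
            expect = None
        elif expect == "mask":
            watched = [bit for bit in range(8) if not (w >> bit) & 1]
            notes.append(f"mask {w:02X}h: monitor bits {watched}")
            expect = None
        elif (w & 0x01) == 0:
            notes.append(f"interrupt vector {w:02X}h")
        elif (w & 0x0F) == 0x0F:
            mode = w >> 6
            notes.append(f"mode {mode} ({MODES[mode]})")
            if mode == 3:
                expect = "io"
        elif (w & 0x0F) == 0x07:
            text = (f"interrupt control {w:02X}h: "
                    f"{'enabled' if w & 0x80 else 'disabled'}, "
                    f"{'AND' if w & 0x40 else 'OR'}, "
                    f"active {'high' if w & 0x20 else 'low'}")
            if w & 0x10:
                text += ", mask follows"
                expect = "mask"
            notes.append(text)
        elif (w & 0x0F) == 0x03:
            notes.append(f"interrupts {'enabled' if w & 0x80 else 'disabled'}")
        else:
            notes.append(f"unrecognised {w:02X}h")
    return notes
```

Run over the PIO B bytes (8A FF 99 B7 7F), it agrees with the comments above: bits 0, 3, 4 and 7 are inputs, and only bit 7 (VSYNC) is watched for interrupts. The PIO A bytes (80 0F 17 83 83) decode as an interrupt control word with the mask following, the mask itself, and then a separate interrupt enable word.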

Note I've started labelling the ports as something meaningful - PIOACONTROL rather than 001h. So somewhere up the top of our code we need to equate our label for PIOACONTROL to 001h. Let's put everything we know about the bee hardware ports into a file HARDWARE.asm, and include that:


;#############################################################################################
; Hardware Constants - memory organisation
;#############################################################################################

SCRATCH:	equ 00000h		; Basic Scratch Area
BASIC:		equ 08000h		; Start of Basic ROM

;#############################################################################################
; Hardware Constants - ports
;#############################################################################################

PIOADATA:	equ 000h		; PIO Port A data
PIOACONTROL:	equ 001h		; PIO Port A control
PIOBDATA:	equ 002h		; PIO Port B data
PIOBCONTROL:	equ 003h		; PIO Port B control

We'll add to this file as we learn more. Now that we've nutted out our table, we comment the code that uses it:


;#############################################################################################
; RESETTOHERE
;   input:	None.
;   output:	Performs a complete initialisation of the system.
;   affects:	Everything.
;#############################################################################################

RESETTOHERE:	di			; Disable interrupts
		ld sp,CTCV1		; CTCV1 = 00080h. We don't know how much RAM
					; we have yet, so initialise stack pointer to low memory
		call RETURNINT		; Perform a reti
		ld hl,PORTINITDATA	; Point to port initialisation data
PORTINITLOOP:	ld a,(hl)		; Get counter for otir
		or a			; Set flags
		jr z,PORTINITFIN	; Zero - finished initialising ports
		ld b,a			; Load byte counter
		inc hl
		ld c,(hl)		; Get address of IO port from table
		inc hl			; point to data that gets sent to IO port
		otir			; send it
		jr PORTINITLOOP		; go get data for next device
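For anyone who reads Python more easily than Z80, here's the same loop transliterated, with a list of (port, value) pairs standing in for the real OUT instructions. It's a sketch for checking our reading of the table, not a model of the hardware; run_port_init is an illustrative name, and the sample data is the 16 bytes found at BA6Ah in Basic 5.29e.

```python
def run_port_init(table):
    """Replay the RESETTOHERE port-init loop over a byte table.

    Each record is: count, port, then `count` data bytes; a zero count ends
    the table. Returns the (port, value) pairs the Z80 would send with OTIR.
    """
    writes = []
    hl = 0
    while True:
        a = table[hl]                 # ld a,(hl)
        if a == 0:                    # or a / jr z,PORTINITFIN
            return writes
        b = a                         # ld b,a
        c = table[hl + 1]             # inc hl / ld c,(hl)
        hl += 2                       # inc hl: point at the data
        while True:                   # otir: out (c),(hl); inc hl; dec b
            writes.append((c, table[hl]))
            hl += 1
            b = (b - 1) & 0xFF        # B is 8-bit; the zero check above means
            if b == 0:                # we never start OTIR with B = 0 (256 bytes)
                break
        # then back round the outer loop: jr PORTINITLOOP

PORTINITDATA = bytes.fromhex("05 01 80 0F 17 83 83 05 03 8A FF 99 B7 7F 00 00")
```

run_port_init(PORTINITDATA) produces ten writes: five to port 01h and five to port 03h, exactly the two PIO setups.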

Yay! We've worked our first bit out. This is essentially the process we follow for the whole ROM. Following calls and jumps, figuring out what's data and what's code, and then commenting and labelling so that it makes sense to us, not just to the CPU.

I've been doing this pretty intensively for maybe two months now. I've used a lot of great resources to figure stuff out: Wildcards, the Microbee Technical manual, and the Microbee Basic Software Hacker's Handbook by Nigel Cottrill. I have commented maybe 70% of the code, and had some amazing insights along the way. For example, I've learned that most of the code, the core routines for BASIC, is common to the Super-80, another machine that was developed at around the same time. That's because they both use "BASIC ETC" as a base, and they both just added their own IO routines to BASIC ETC and went from there.

It's also pretty clear that later writers of code did not have source code from earlier versions, as they became increasingly afraid to move code. The messiest bit was when they went from version 5.00 to 5.11, adding code to do colour. This code is an absolute shambles. The author didn't understand at all how the basic was structured. Rather than adding a keyword for colour alongside all the other keywords, they instead mangled the routines that tokenise the input line, searching for "COLO" in the line and substituting POKE commands inline. So then they also needed to mangle the code that detokenises for LIST so it does the opposite. Again, they never moved code, instead adding sudden jumps or calls to patches.

So as I mentioned at the top, I have a really good disassembly going, and I'm sharing it. You can assemble this code and it will give an exact byte-for-byte duplicate of Basic 5.29e. I've also got a highly modified version that I'm adding lots of stuff to for my Pak Cart Freebee. 9600 baud serial, 2400 baud cassette, more PAK, cleaned up colour commands, and much of the spaghetti untangled to free up masses of space. Here it is.
