Episode 2: Diving Into Assembly - Printing Numbers

Joshua Foster

01 Sep 2016 — 7 min read

By Konstantin Lanzet - CPU collection Konstantin Lanzet, CC BY-SA 3.0

Let's look at our "Hello, PCjr!" code from Episode 1:

[cpu 8086]
[org 100h]

mov dx, str_msg
mov ah, 9h
int 21h

mov ax, 4c00h
int 21h

str_msg: db 'Hello, PCjr!', 0ah, 0dh, '$'

Going line-by-line:

1. Tell NASM we're targeting an 8086 CPU. The IBM PCjr is equipped with an 8088, which runs the same instruction set as the 8086 but has an 8-bit external data bus instead of a 16-bit one. With this directive, NASM will complain if we use an instruction the 8088 doesn't support.
2. Tell NASM that our code will be loaded at offset 256. When DOS loads a .COM program, it places it 256 bytes into its memory segment (the first 256 bytes hold the Program Segment Prefix) and starts executing whatever it finds there, so NASM needs this to calculate addresses such as str_msg correctly. 256 is 100 in hex notation, and in NASM we can write 0x100 or 100h.
3. Load the address of the variable "str_msg" (an array of characters reading "Hello, PCjr!", then a line feed, a carriage return, and a "$") into register DX.
4. Load the value 9h (9 hex, or 9 in decimal) into register AH (which is the high byte of register AX).
5. Invoke software interrupt 21h. DOS provides access to the screen, keyboard, file storage, and other services through these INT 21h calls. When we call this, DOS will perform a function for us based on what value we have put into AH. In this case, 9 is "write string to stdout". It will print characters to the screen one at a time, starting at the address pointed to by DX and stopping when it encounters a '$'. This page from scu.edu is a pretty good reference for INT 21h services.
6. Load the hex value 0x4c00 into register AX.
7. Call INT 21h again. What service will we invoke? Whatever is in AH. Remember AH is the high byte of AX. Since AX contains 0x4c00, AL is 0x00 and AH is 0x4c. Service 0x4c is "terminate the program". So we're done -- we get booted back out to the command prompt.
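To make the AH/AL split concrete, here is a minimal sketch of the same exit call written with two 8-bit loads instead of one 16-bit load. One detail that is not in the walkthrough above: for service 0x4c, DOS treats the value in AL as the program's exit code.

```nasm
[cpu 8086]
[org 100h]

mov ah, 4ch           ; AH = service number: terminate the program
mov al, 0             ; AL = exit code handed back to DOS
int 21h               ; Same effect as: mov ax, 4c00h / int 21h
```

Loading AX in one go is simply shorter, which is why the original listing does it that way.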

The rest of this post (and series) will assume you've read and digested the resources at the bottom. From them you will learn how a CPU works, how assembly instructions are formatted, what instructions are available for our 8088, and what they do.

So we can print a string of characters, great. What if we need to print a number? There is no INT 21h service to do that -- you can only print individual characters and strings. We'll have to convert the number to a string ourselves, then use INT 21h service 9 to print it.

To convert a single digit to a character, we need its ASCII code. The character '0' is decimal 48 (hex 0x30) and the other digits follow it in order, so if we have a value 0 through 9 we can just add 48 to it to produce the corresponding ASCII character. So now the question is: how do we separate a multi-digit number into individual digits? Well, if we divide the number by 10, the remainder is its last digit and the quotient is the number with that digit removed. If we keep dividing the quotient until it reaches 0, we will have peeled off all the digits of the number in reverse order. If we push each digit onto the stack as we go, we can then pop them off one at a time, most significant digit first, into a string, then print that string to the screen!
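The single-digit case can be tried on its own. The sketch below is not from the original post: it uses INT 21h service 2 (write the character in DL to stdout), and the value 7 is just an example.

```nasm
[cpu 8086]
[org 100h]

mov dl, 7             ; A single-digit value, 0 through 9
add dl, '0'           ; 7 + 48 = 55, the ASCII code for '7'
mov ah, 2             ; INT 21h service 2: write the character in DL
int 21h               ; Prints: 7

mov ax, 4c00h         ; Exit to DOS
int 21h
```

This only works for values 0 through 9, which is why a multi-digit number has to be split into digits first.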

So here's the basic algorithm, assuming the number we want to format is in AX and DI contains a pointer to an empty buffer:

1. Divide AX by 10, yielding the quotient in AX and the remainder in DX.
2. Push our single-digit remainder DX onto the stack.
3. Increment a counter CL (to keep track of how many digits we have).
4. If the quotient was non-zero, go to step 1 and repeat, else continue.
5. Pop into AX one of the digits we pushed earlier.
6. Add 48 to it (you can also use hex 0x30, or the character '0', all the same value) to convert it to an ASCII character.
7. Write the character (which is in AL, the lower byte of AX) to the memory location pointed at by DI, then increment DI (the STOSB instruction, short for "store string byte", does all this for you in one step).
8. Decrement the counter CL. If it is not zero, we have more digits to process; repeat from step 5, else continue.
9. We're done... put a '$' at the end of the buffer so the printing routine knows where to stop.
10. We can now print the string by loading the buffer address into DX and a 9 into AH and calling INT 21h.

Here's what I came up with. It should handle any unsigned integer that fits into AX (which is 16 bits long, so 0-65535).

[cpu 8086]
[org 100h]

mov ax, 65432         ; Load AX with the number we want to print
mov di, buf16         ; Load DI with the address of an empty buffer

mov bx, 10            ; Load divisor and clear counter
mov cx, 0
peelOffDigits:
  div bx              ; This will give us the remainder in DX
  push dx             ; Push it onto the stack
  inc cl              ; Increment our counter
  test ax, ax         ; If no result, we're done
  jnz peelOffDigits
buildString:
  pop ax              ; Pop out the next digit
  add al, '0'         ; Add 48 to get the ascii value
  stosb               ; Store the char in AL into the location at
                      ; [DI], then increment DI
  loop buildString    ; Continue if we have more digits

mov byte [di], '$'    ; Terminate the string

mov dx, buf16         ; Load DX with the address of the buffer
mov ah, 9             ; Load AH with 9
int 21h               ; Call INT21h fn 9 to print the buffer

mov ax, 4c00h         ; Call INT21h fn 0x4c to exit the program
int 21h

buf16: times 16 db 0  ; An empty 16-byte buffer

I won't explain this program line-by-line; instead I'll again refer you to the list of links at the bottom of the post. I'll just point out some important points:

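The last line tells NASM that we want a piece of memory 16 bytes long, initially filled with zeroes, and that we want to refer to it as buf16. In a .COM program, data goes after the program code: DOS starts executing at the very first byte of the file, so anything placed ahead of the code would be run as instructions unless we jumped over it.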
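DI is the "destination index" register. Normally if you want to write to an area of memory byte-by-byte, you point DI at it, then use STOSB, which copies AL into the address pointed at by DI, then increments DI to point it to the next byte in memory. Strictly speaking the address is ES:DI, but in a .COM program all the segment registers start out pointing at the same segment. STOSB also counts downward instead if the direction flag is set; this program relies on it being clear.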
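PUSH pushes the value of a register onto the 'stack', a sort-of temporary storage area. POP pulls the value from the top of the stack into a register. On the 8086, both always move a full 16-bit word, so you can PUSH AX or DX but not a byte register like DL. It is very important that POPs mirror PUSHes: values come back in the reverse of the order they went in, so if you PUSH AX then PUSH DX, the first POP returns the old DX, not AX. Every PUSH also needs a matching POP, or the stack will not be where later code expects it.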
The TEST instruction performs a logical AND on its two operands, throws the result away, and sets some flags in the flags register based on what that result would have been. One of the flags is ZF, basically "was the result of the last operation zero". JNZ looks at this flag to determine whether it should jump or not. So "TEST XX, XX... JNZ label" translates to "if XX is non-zero, jump to label 'label', else continue on".
The LOOP instruction is sort of a test and jump in one. It decrements the counter register CX, and if it is not zero, jumps to the specified label. So "MOV CX, n... label:... LOOP label" will perform the code in between 'label:' and LOOP label n times. Note that LOOP always uses the full 16-bit CX, which is why the program clears all of CX before counting digits in CL.
When we append the terminator '$' to the string we have to specify that we're only putting a byte (not a word) into the location at DI. If we were moving a register into [DI], NASM would know how many bytes to move based on the size of the register, but when we store a constant (an 'immediate' operand), we must specify the size ourselves.
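Two of the points above, stack order and operand size, are easy to check in a scratch program. This is a minimal sketch: the label scratch and the values 1111h and 2222h are made up for illustration, and the program prints nothing, so its effects are only visible in a debugger.

```nasm
[cpu 8086]
[org 100h]

mov ax, 1111h
mov bx, 2222h
push ax               ; Stack, top first: 1111h
push bx               ; Stack, top first: 2222h, 1111h
pop ax                ; AX = 2222h (last in, first out)
pop bx                ; BX = 1111h, so AX and BX have swapped

mov di, scratch
mov byte [di], '$'    ; Writes 1 byte:  24h
mov word [di], '$'    ; Writes 2 bytes: 24h, 00h (low byte first)
mov [di], ax          ; No size keyword needed: AX implies a word

mov ax, 4c00h         ; Exit to DOS
int 21h

scratch: times 2 db 0 ; Two bytes of scratch space
```

Leaving the size keyword off the two immediate stores makes NASM reject them, since nothing else in the instruction says how many bytes to write.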

So let's run it and see what we get:

A hot bowl of nothing. To boot, it doesn't even terminate -- it just sits there with a blinking cursor. What happened?

Well, I don't yet have a good grasp on debugging machine code, but I sure can do some Caveman Debugging. Knowing that my INT 21h 0x4c service call works great to terminate the program, and that the program was not getting terminated, there was only one thing to do: move the terminate-program call earlier and earlier in the program until it actually terminates properly. And I finally found it: the DIV BX instruction. Reading a little more about the instruction, I learned that with a 16-bit divisor it actually takes DX:AX as one long 32-bit number and divides that by the given register (in this case BX), so if DX is non-zero the calculation will not be what you expect. DIV is an unsigned divide, and there are two ways this can bite us. First, if the quotient is too big to fit in AX, the CPU raises a divide-error interrupt (INT 0) instead of storing a result. Second, DIV leaves the remainder in DX, so even if DX starts out as zero, each pass through the loop feeds the previous digit back in as the high word of the next dividend; the digits come out wrong and the loop may not stop when expected. Exactly which of these caused the hang I don't know. I really need to figure out how to look at register values as the program runs!
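To see the DX:AX behavior in isolation, here is a small sketch with two divisions by 10 that differ only in what DX holds beforehand. The values are chosen for illustration, and the program prints nothing, so the results in the comments are what the registers should contain after each DIV.

```nasm
[cpu 8086]
[org 100h]

mov bx, 10            ; Divisor

mov dx, 0             ; High word of the dividend
mov ax, 65432         ; Low word of the dividend
div bx                ; 65432 / 10: AX = 6543, DX = 2

mov dx, 1             ; Same instruction, but with DX = 1 ...
mov ax, 0             ; ... the dividend DX:AX is 1*65536 + 0 = 65536
div bx                ; 65536 / 10: AX = 6553, DX = 6

mov ax, 4c00h         ; Exit to DOS
int 21h
```

With a divisor of 10, any DX of 10 or more makes the quotient too large for AX, which is the divide-error case.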

The solution is to clear DX before every divide, which means inside the loop, just ahead of DIV BX:

xor dx, dx
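In context, the fix might look like the sketch below. It is the earlier listing with the one new instruction, placed at the top of the loop on the assumption that DX has to be cleared before every DIV, since each DIV leaves its remainder there. The comments trace 65432 through both loops.

```nasm
[cpu 8086]
[org 100h]

mov ax, 65432         ; Load AX with the number we want to print
mov di, buf16         ; Load DI with the address of an empty buffer

mov bx, 10            ; Load divisor and clear counter
mov cx, 0
peelOffDigits:
  xor dx, dx          ; NEW: clear the high word so DX:AX is just AX
  div bx              ; AX = quotient, DX = remainder
  push dx             ; 65432 -> pushes 2, then 3, 4, 5, 6
  inc cl              ; One more digit on the stack
  test ax, ax         ; Quotient goes 6543, 654, 65, 6, 0
  jnz peelOffDigits
buildString:
  pop ax              ; Pops 6 first, then 5, 4, 3, 2
  add al, '0'         ; Add 48 to get the ascii value
  stosb               ; Store AL at [DI], then increment DI
  loop buildString    ; CX = 5, so this runs five times

mov byte [di], '$'    ; buf16 now holds "65432$"

mov dx, buf16         ; Load DX with the address of the buffer
mov ah, 9             ; Load AH with 9
int 21h               ; Call INT21h fn 9 to print the buffer

mov ax, 4c00h         ; Call INT21h fn 0x4c to exit the program
int 21h

buf16: times 16 db 0  ; An empty 16-byte buffer
```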

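XORing a register with itself sets it to 0. We could have also done MOV DX, 0, but from what I have read XORing a register with itself is faster: it doesn't need to carry an immediate value, so it assembles to 2 bytes instead of 3, and on the 8088, with its 8-bit bus, fewer instruction bytes to fetch generally means faster code. (Unlike MOV, XOR also changes the flags, which doesn't matter here.) Now let's try again: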

Success!

Our assembly will soon get too long to follow if we don't take steps to structure it! In the next episode we'll look at procedures (like functions/methods in higher-level languages) and NASM macros to make the code shorter and more organized.

Full source for this episode can be downloaded at:
https://github.com/josh2112/pcjr-asm-game/tree/episode-2
