Assembly vs C - how many bytes does it take to change a single byte in a file?



How many bytes does it take to change a single byte in a file? Well, a simple patch program just needs to make a few system calls to the OS (to make the change and to print some output for the user). Such an executable should never exceed a kilobyte, right? Right?

Well, it very much depends on your choice of programming language…

A couple of weeks ago I was gifted an original boxed copy of Need For Speed Special Edition, and I decided to play the DOS version from the disc on my retro computer. The game started if DOS was loaded into lower memory, but the in-game video playback was completely broken. Fortunately, there is a solution described in this brilliant post by Michal Necasek, "Need for Speed SE video glitch". The solution is extremely simple - you need to change the value of one byte in the game executable.

As an exercise I decided to write two implementations of the patch program - one in assembly and another in C (using the standard C library). I was curious how big the compiled programs would be.

Here is my assembly implementation. patch.asm:

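	org 100h	; COM programs are loaded at offset 100h
	mov dx, FILE_NAME	; DS:DX -> zero-terminated file name
	mov al, 010b	; access mode 2: read/write
	mov ah, 3Dh	; DOS: open existing file
	int 21h
	jc file_open_error	; carry flag set means the call failed
	mov bx, ax	; keep the file handle in BX
	mov ax, 4200h	; DOS: seek (AH=42h) from start of file (AL=0)
	mov cx, 0006h	; CX:DX = 0006h:07E6h, i.e. offset 607E6h
	mov dx, 07E6h
	int 21h
	jc patch_error
	mov ah, 40h	; DOS: write to file
	mov cx, 1	; number of bytes to write
	mov dx, NEW_BYTE_VALUE	; DS:DX -> byte to write
	int 21h
	jc patch_error
	mov ah, 3Eh	; DOS: close file
	int 21h
	mov dx, MSG_SUCCESS
	mov ah, 9	; DOS: print "$"-terminated string
	int 21h
	mov al, 0	; exit code 0
	mov ah, 4Ch	; DOS: terminate program
	int 21h
patch_error:
	mov dx, MSG_ERROR_P
	mov ah, 9
	int 21h
	mov ah, 3Eh	; close the file (handle is still in BX)
	int 21h
	jmp exit_with_err
file_open_error:
	mov dx, MSG_CANT_OPEN
	mov ah, 9
	int 21h
exit_with_err:
	mov al, -1	; exit code 255 signals failure
	mov ah, 4Ch
	int 21h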

NEW_BYTE_VALUE: db 08h
MSG_SUCCESS:	db "The file has been successfully patched!", 0Dh, 0Ah, "$"
MSG_ERROR_P:	db "Error patching file", 0Dh, 0Ah, "$"
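; MSG_CANT_OPEN has no "$" of its own: printing continues through FILE_NAME
; (and its zero terminator) up to the "$" in MSG_LINE_END.
MSG_CANT_OPEN:	db "Can not open: "
FILE_NAME:	db "NFS.EXE", 0h
MSG_LINE_END:	db 0Dh,0Ah, "$"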

As expected, it is just a series of system calls and a few jumps for error handling. Here is a makefile to assemble it with NASM:

patch.com : patch.asm
	nasm -o patch.com patch.asm

The result is a COM executable of 171 bytes. Nice!

Now let’s see what C will get us. Here is my C equivalent. patch.c:

#include <stdio.h>

#define FILE_NAME "NFS.EXE"
#define BYTE_POS 0x607E6
#define BYTE_VAL 0x08

int main() {
   FILE* fp;
   fp = fopen(FILE_NAME, "rb+");
   if (fp == NULL) {
       fputs("Can not open: ", stdout); // fputs does not append a newline, unlike puts
       puts(FILE_NAME);                 // puts has a smaller footprint than printf
       return -1;
   }
   if ((fseek(fp, BYTE_POS, SEEK_SET) != 0) || (fputc(BYTE_VAL, fp) == EOF)) {
       puts("Error patching file");
       fclose(fp);
       return -1;
   }
   fclose(fp);
   puts("The file has been successfully patched!");
   return 0;
}

Here is the makefile to compile and link it using Open Watcom:

patch.obj : patch.c
	wcc -0 -d0 -ms patch.c
patch.com : patch.obj
	wlink system com file patch.obj

Before analysing the results, I compiled an empty C program for DOS as a COM executable using Open Watcom. The resulting COM file, without any useful payload, slightly exceeded a kilobyte, and most of it was occupied by the C runtime library.

The role of the C standard library is even bigger - the program with the actual code above compiles to a COM file of 7490 bytes. Different functions from the standard library had a different impact; for example, using the simpler puts instead of printf helped to decrease the size by several kilobytes.

The difference in compiled size between the programs written in assembly and vanilla C is almost 44x. Most of the difference is, of course, code from the standard library, most of which is never actually executed in the program. This illustrates the need for projects such as Nolibc.

Conclusion

As a next experiment I should probably try building it with some minimal libc replacement to see how much smaller I can get it with C, but honestly, my impression is that I don’t actually want to use it for such a simple task. For a job as trivial as patching, I can instruct the computer directly, get a predictable size for the compiled binary and enjoy the minimalism of pure assembly. Yes, it’s not cross-platform, but this patch would not be used on other platforms anyway, and seeing a humble 171 bytes next to the name of an executable is something very satisfying, something nice, long forgotten and lost in the past decades. I’m declaring Assembly the winner :-)