Assembly vs C - how many bytes does it take to change a single byte in a file?



How many bytes does it take to change a single byte in a file? Well, a simple patch program only needs to make a few system calls to the OS (to make the change and to print some output for the user). Such an executable should never exceed a kilobyte, right? Right?

Well, it very much depends on your choice of the programming language…

A couple of weeks ago I was gifted an old box with Need For Speed Special Edition, and I decided to play the DOS version from the disc on my retro computer. The game started if DOS was loaded into the lower memory, but the in-game video playback was completely broken. Fortunately, there is a solution described in this brilliant post by Michal Necasek: "Need for Speed SE video glitch". The solution is extremely simple: you need to change one byte in the game executable. I did it using a hex editor and it helped.

As an exercise, I decided to write a program that would patch the game so that the value doesn’t have to be changed by hand in a hex editor. As an experiment, I made two implementations, one in Assembly and another in C (using the standard C library), because I was curious how big the compiled programs would be.

The example code below is not specific to any game; to stay generic, it uses arbitrary values for the byte position (0x12345), the new byte value (0x42) and the file name (ABC.EXE). (If you are interested in fixing NFS SE, see the post linked above for the specific values to use.)

Here is my assembly implementation - patch.asm:

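	org 100h                 ; COM programs are loaded at offset 100h
	mov dx, FILE_NAME        ; DS:DX -> zero-terminated file name
	mov al, 010b             ; access mode: read/write
	mov ah, 3Dh              ; DOS: open existing file
	int 21h
	jc file_open_error       ; carry flag is set on failure
	mov bx, ax               ; keep the file handle in BX
	mov ax, 4200h            ; DOS: seek, offset counted from the start of the file
	mov cx, 0001h            ; Position of the byte (high word)
	mov dx, 2345h            ; in the file (low word)
	int 21h
	jc patch_error
	mov ah, 40h              ; DOS: write to file (handle is still in BX)
	mov cx, 1                ; number of bytes to write
	mov dx, NEW_BYTE_VALUE   ; DS:DX -> buffer holding the new byte
	int 21h
	jc patch_error
	mov ah, 3Eh              ; DOS: close file
	int 21h
	mov dx, MSG_SUCCESS
	mov ah, 9                ; DOS: print "$"-terminated string
	int 21h
	mov al, 0                ; exit code 0
	mov ah, 4Ch              ; DOS: terminate program
	int 21h
patch_error:
	mov dx, MSG_ERROR_P
	mov ah, 9
	int 21h
	mov ah, 3Eh              ; close the file before exiting
	int 21h
	jmp exit_with_err
file_open_error:
	mov dx, MSG_CANT_OPEN    ; printing runs on through FILE_NAME and MSG_LINE_END
	mov ah, 9
	int 21h
exit_with_err:
	mov al, -1               ; exit code 255
	mov ah, 4Ch
	int 21h

NEW_BYTE_VALUE: db 42h
MSG_SUCCESS:	db "The file has been successfully patched!", 0Dh, 0Ah, "$"
MSG_ERROR_P:	db "Error patching file", 0Dh, 0Ah, "$"
; The next three items are contiguous on purpose: function 9 keeps printing
; until it reaches "$", so the file name is appended to the message.
MSG_CANT_OPEN:	db "Can not open: "
FILE_NAME:	db "ABC.EXE", 0h
MSG_LINE_END:	db 0Dh,0Ah, "$"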

As expected, it’s just a series of system calls and several jumps for error handling. Here is a Makefile to build it with NASM:

patch.com : patch.asm
	nasm -o patch.com patch.asm

The result is a COM executable of 171 bytes. Nice!

Now let’s see what C will get us. Here is my first attempt, patch.c:

#include <stdio.h>

#define FILE_NAME "ABC.EXE"
#define BYTE_POS 0x12345
#define BYTE_VAL 0x42

int main() {
   FILE* fp;
   fp = fopen(FILE_NAME, "rb+");
   if (fp == NULL) {
       fputs("Can not open: ", stdout); // fputs does not append a newline, unlike puts
       puts(FILE_NAME);                 // puts has a smaller footprint than printf
       return -1;
   }
   if ((fseek(fp, BYTE_POS, SEEK_SET) != 0) || (fputc(BYTE_VAL, fp) == EOF)) {
       puts("Error patching file");
       fclose(fp);
       return -1;
   }
   fclose(fp);
   puts("The file has been successfully patched!");
   return 0;
}

Here is the Makefile to compile and link it using Open Watcom:

patch.com : patch.obj
	wlink system com file patch.obj
patch.obj : patch.c
	wcc -0 -d0 -ms patch.c

Before analysing the results, I compiled an empty C program for DOS as a COM executable using Open Watcom. The resulting COM file, without any useful payload, slightly exceeded a kilobyte, and most of it was occupied by the C run-time library added by the toolchain.

The contribution of the standard library turned out to be even bigger: the program above compiled to a COM file of 7490 bytes. Different functions from the standard library had different impact; for example, using the simpler puts instead of printf decreased the size by several kilobytes. I went further and replaced all stdio.h functions with the leaner POSIX-style calls from unistd.h (which are conveniently supported by the DOS compiler):

#include <fcntl.h>
#include <unistd.h>

#define FILE_NAME "ABC.EXE"
#define BYTE_POS 0x12345

unsigned char BYTE_VAL[1] = {0x42};

int main() {
   int fd = open(FILE_NAME, O_RDWR);
   if (fd == -1) {
       write(1, "Can not open: ", 14);              // 1 is the stdout file descriptor
       write(1, FILE_NAME, sizeof(FILE_NAME) - 1);  // -1 leaves out the terminating NUL
       return -1;
   }
   if ((lseek(fd, BYTE_POS, SEEK_SET) == -1) || (write(fd, BYTE_VAL, 1) == -1)) {
       write(1, "Error patching file", 19);
       close(fd);
       return -1;
   }
   close(fd);
   write(1, "The file has been successfully patched!", 39);
   return 0;
}

That got me a COM executable of 6594 bytes, better but still large.

The difference in compiled size between the assembly version and vanilla C is about 44x with functions from stdio.h and about 38.6x with unistd.h.

Conclusion

Most of the difference is, of course, code from the standard library, most of which is never executed by the program; this illustrates the need for projects such as Nolibc. As the next experiment, I should probably try the functions from dos.h (I think that would make it a true equivalent), or try building it with some minimal libc replacement to see how small it can get in C.

Honestly though, my impression is that I don’t want to use C with workarounds, or to pick libraries and functions very carefully, for such a trivial task and still get a suboptimal result. With pure Assembly I can instruct the computer directly, get a predictable size for the compiled binary and enjoy the minimalism. Yes, it’s not cross-platform, but this patch would not be used on other platforms anyway, and seeing a humble 171 bytes next to the name of an executable is something very satisfying, something long forgotten and lost in the past decades. One important point is that I did not need to experiment with the ASM implementation or change it to get the optimal result: low-level coding gives you precisely what you ask for.