Assembly vs C - how many bytes does it take to change a single byte in a file?
How many bytes does it take to change a single byte in a file? Well, a simple patch program just needs to make a few system calls to the OS (to make the change and to print some output for the user). Such an executable should never exceed a kilobyte, right? Right?
Well, it very much depends on your choice of the programming language…
A couple of weeks ago I was gifted an old box with Need For Speed Special Edition, and I decided to play the DOS version from the disc on my retro computer. The game started if DOS was loaded into the lower memory, but the in-game video playback was completely broken. Fortunately, there is a solution described in this brilliant post by Michal Necasek: "Need for Speed SE video glitch". The solution is extremely simple - you need to change one byte in the game executable. I did it using a hex editor and it helped.
As an exercise I decided to write a program that would patch the game without needing to change the value manually in a hex editor. As an experiment I decided to make two implementations - one in assembly and another in C (using the standard C library) - because I was curious how big the compiled programs would be.
The example code below is not specific to any game. To stay generic, it uses
arbitrary values for the byte position (0x12345), the new byte value (0x42)
and the file name (ABC.EXE). (If you are interested in fixing NFS SE, see the
link above to the original post for the specific values to use.)
Here is my assembly implementation - patch.asm:
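        org 100h                ; COM programs are loaded at offset 100h

        ; Open the file for reading and writing (INT 21h, AH=3Dh)
        mov dx, FILE_NAME       ; DS:DX -> zero-terminated file name
        mov al, 010b            ; access mode: read/write
        mov ah, 3Dh
        int 21h
        jc file_open_error      ; carry flag set means error
        mov bx, ax              ; keep the file handle in BX

        ; Seek from the start of the file (INT 21h, AX=4200h)
        mov ax, 4200h
        mov cx, 0001h           ; Position of the byte (high word)
        mov dx, 2345h           ; in the file (low word)
        int 21h
        jc patch_error

        ; Write one byte (INT 21h, AH=40h)
        mov ah, 40h
        mov cx, 1               ; number of bytes to write
        mov dx, NEW_BYTE_VALUE  ; DS:DX -> data to write
        int 21h
        jc patch_error

        ; Close the file (INT 21h, AH=3Eh)
        mov ah, 3Eh
        int 21h

        ; Print the success message (INT 21h, AH=09h) and exit with code 0
        mov dx, MSG_SUCCESS
        mov ah, 9
        int 21h
        mov al, 0
        mov ah, 4Ch
        int 21h

patch_error:
        mov dx, MSG_ERROR_P
        mov ah, 9
        int 21h
        mov ah, 3Eh             ; close the file, BX still holds the handle
        int 21h
        jmp exit_with_err

file_open_error:
        mov dx, MSG_CANT_OPEN   ; printing runs on through FILE_NAME up to "$"
        mov ah, 9
        int 21h

exit_with_err:
        mov al, -1              ; exit code 255
        mov ah, 4Ch
        int 21h

NEW_BYTE_VALUE: db 42h
MSG_SUCCESS: db "The file has been successfully patched!", 0Dh, 0Ah, "$"
MSG_ERROR_P: db "Error patching file", 0Dh, 0Ah, "$"
; The next three items are contiguous: MSG_CANT_OPEN has no "$" of its own,
; so the printed text ends at the "$" in MSG_LINE_END.
MSG_CANT_OPEN: db "Can not open: "
FILE_NAME: db "ABC.EXE", 0h
MSG_LINE_END: db 0Dh,0Ah, "$"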
As expected it’s just a series of system calls and several jumps for error handling. A Makefile to compile it with NASM:
patch.com : patch.asm
	nasm -o patch.com patch.asm
The result is a COM executable of 171 bytes. Nice!
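In the assembly listing, the seek call (INT 21h, AX=4200h) takes the 32-bit file position split across two 16-bit registers: CX holds the high word and DX the low word. That is why 0x12345 shows up as 0001h and 2345h. Here is a minimal C sketch of that split; the names seek_words and split_offset are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Hypothetical helper type: the two 16-bit halves that the DOS seek
   call expects in CX (high word) and DX (low word). */
struct seek_words {
    uint16_t cx; /* high 16 bits of the file offset */
    uint16_t dx; /* low 16 bits of the file offset */
};

/* Split a 32-bit file offset into the CX:DX pair. */
struct seek_words split_offset(uint32_t offset) {
    struct seek_words w;
    w.cx = (uint16_t)(offset >> 16);
    w.dx = (uint16_t)(offset & 0xFFFFu);
    return w;
}
```

For the example position, split_offset(0x12345) gives cx = 0x0001 and dx = 0x2345, matching the two mov instructions before the seek.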
Now let’s see what C will get us.
Here is my first attempt, patch.c:
#include <stdio.h>

#define FILE_NAME "ABC.EXE"
#define BYTE_POS 0x12345
#define BYTE_VAL 0x42

int main(void) {
    FILE* fp;

    fp = fopen(FILE_NAME, "rb+");
    if (fp == NULL) {
        fputs("Can not open: ", stdout); // fputs does not add a line break, unlike puts
        puts(FILE_NAME); // puts has a smaller footprint than printf
        return -1;
    }
    if ((fseek(fp, BYTE_POS, SEEK_SET) != 0) || (fputc(BYTE_VAL, fp) == EOF)) {
        puts("Error patching file");
        fclose(fp);
        return -1;
    }
    fclose(fp);
    puts("The file has been successfully patched!");
    return 0;
}
Here is the Makefile to compile and link it using Open Watcom:
patch.obj : patch.c
	wcc -0 -d0 -ms patch.c

patch.com : patch.obj
	wlink system com file patch.obj
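One caveat about the stdio version: fputc normally only stores the byte in the stream's buffer, so a failed disk write may not be reported until fclose flushes it. Below is a minimal sketch of a variant that also checks the result of fclose and can read the byte back for verification. The helper names patch_byte and read_byte are hypothetical (they are not part of the program above), and this variant was not measured, so it says nothing about the resulting COM size.

```c
#include <stdio.h>

/* Hypothetical helper: overwrite one byte at position `pos` in `path`.
   Returns 0 on success and -1 on any failure. */
int patch_byte(const char *path, long pos, unsigned char val) {
    FILE *fp = fopen(path, "rb+");
    if (fp == NULL) {
        return -1;
    }
    if (fseek(fp, pos, SEEK_SET) != 0 || fputc(val, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    /* The buffered byte reaches the file when the stream is flushed,
       so a write error can surface here rather than in fputc. */
    return (fclose(fp) == 0) ? 0 : -1;
}

/* Hypothetical helper: read back the byte at position `pos`.
   Returns the byte value (0..255), or -1 on failure. */
int read_byte(const char *path, long pos) {
    FILE *fp = fopen(path, "rb");
    int c;
    if (fp == NULL) {
        return -1;
    }
    if (fseek(fp, pos, SEEK_SET) != 0) {
        fclose(fp);
        return -1;
    }
    c = fgetc(fp);
    fclose(fp);
    return (c == EOF) ? -1 : c;
}
```

With the generic values from this post, the call would be patch_byte("ABC.EXE", 0x12345L, 0x42), followed by read_byte("ABC.EXE", 0x12345L) to confirm that the byte is now 0x42.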
Before analysing the results, I compiled an empty C program as a DOS COM executable using Open Watcom. The resulting COM file, without any useful payload, slightly exceeded a kilobyte, and most of it was occupied by the C run-time library added by the compiler.
The role of the standard library turned out to be even bigger - the program
above compiled to a COM file of 7490 bytes.
Different functions from the standard library had a different impact: for
example, using the simpler puts instead of printf helped decrease the size by
several kilobytes. I went further and replaced all stdio.h functions with
leaner POSIX-style calls from unistd.h (which are conveniently supported by
the DOS compiler):
#include <fcntl.h>
#include <unistd.h>

#define FILE_NAME "ABC.EXE"
#define BYTE_POS 0x12345

unsigned char BYTE_VAL[1] = {0x42};

int main(void) {
    int fd = open(FILE_NAME, O_RDWR);
    if (fd == -1) {
        write(1, "Can not open: ", 14);
        write(1, FILE_NAME, sizeof(FILE_NAME) - 1); // exclude the terminating NUL
        return -1;
    }
    if ((lseek(fd, BYTE_POS, SEEK_SET) == -1) || (write(fd, BYTE_VAL, 1) == -1)) {
        write(1, "Error patching file", 19);
        close(fd);
        return -1;
    }
    close(fd);
    write(1, "The file has been successfully patched!", 39);
    return 0;
}
That got me a COM executable of 6594 bytes - better, but still large.
The difference in compiled size between the assembly program and the
vanilla C ones is about 44x with functions from stdio.h and about 38.6x
with unistd.h.
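The ratios follow directly from the three file sizes reported in this post. A small sketch of the arithmetic, using only those numbers (the macro and function names are hypothetical):

```c
#include <stdio.h>

/* File sizes in bytes, as reported in the text. */
#define SIZE_ASM    171UL
#define SIZE_STDIO  7490UL
#define SIZE_UNISTD 6594UL

/* How many times larger `bigger` is than `smaller`. */
double size_ratio(unsigned long bigger, unsigned long smaller) {
    return (double)bigger / (double)smaller;
}

/* Print both ratios with two decimal places. */
void print_ratios(FILE *out) {
    fprintf(out, "stdio.h  vs asm: %.2fx\n", size_ratio(SIZE_STDIO, SIZE_ASM));
    fprintf(out, "unistd.h vs asm: %.2fx\n", size_ratio(SIZE_UNISTD, SIZE_ASM));
}
```

Calling print_ratios(stdout) prints 43.80x for the stdio.h build and 38.56x for the unistd.h build.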
Conclusion
Most of the difference is, of course, code from the standard library, most of
which is never actually executed in the program; this illustrates the need for
projects such as Nolibc. As the next experiment, I should probably try using
functions from dos.h (I think that should make it a pure equivalent), or try
building it with some minimal libc replacement to see how small it can get
with C. Honestly though, my impression is that I don’t actually want to use C
with workarounds, or with a very careful choice of libraries and functions,
for such a trivial task and still get a suboptimal result.
I can instruct the computer directly, get a predictable size for the compiled
binary and enjoy the minimalism of pure assembly.
Yes, it’s not cross-platform, but this patch would not be used on other
platforms anyway, and seeing a humble 171 bytes next to the name of an
executable is something very satisfying, something long forgotten and lost in
the past decades. One important point is that I got the optimal result by
default, without needing to experiment with or change the assembly
implementation - low-level coding gives you precisely what you are asking for.