Friday, April 1, 2011

Universal binaries - running the same executable on mac and linux

I've updated the sbf_flash download so that the same executable will run on either Mac or Linux. This isn't nearly as easy as you'd think, given that the two platforms use completely incompatible executable formats: Mac uses mach-o while Linux uses ELF.

Keep reading for an explanation of how I pulled it off.
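The first 4 bytes of the file identify the format. Here are the first 64 bytes of a mach-o executable, followed by the first 64 bytes of an ELF executable: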

00000000  ce fa ed fe 07 00 00 00  03 00 00 00 02 00 00 00  |................|
00000010  02 00 00 00 88 00 00 00  01 00 00 00 01 00 00 00  |................|
00000020  38 00 00 00 5f 5f 54 45  58 54 00 00 00 00 00 00  |8...__TEXT......|
00000030  00 00 00 00 00 20 03 00  e4 5a 00 00 00 00 00 00  |..... ...Z......|

00000000  7f 45 4c 46 01 01 01 03  00 00 00 00 00 00 00 00  |.ELF............|
00000010  02 00 03 00 01 00 00 00  58 66 c0 00 34 00 00 00  |........Xf..4...|
00000020  00 00 00 00 00 00 00 00  34 00 20 00 02 00 28 00  |........4. ...(.|
00000030  00 00 00 00 01 00 00 00  00 00 00 00 00 10 c0 00  |................|
In the first example, the "ce fa ed fe" identifies the file as a mach-o executable. It's a 32-bit word, and because it's stored on a little-endian system, the least significant byte comes first, which means the number is actually 0xfeedface; programmers like spelling things out in hex.

The second example is an ELF file, easily identified by the fact that, after a leading 0x7f byte, it actually spells out "ELF" in ASCII.
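The check described above can be sketched in a few lines of shell. This is only an illustration: the function name identify_format is made up here, and it recognizes only the two magic numbers shown in the dumps above (a 32-bit little-endian mach-o and an ELF file), nothing else.

```shell
#!/bin/sh
# Print the executable format suggested by the first four bytes of a file.
# identify_format is a hypothetical helper, not part of sbf_flash.
identify_format() {
    # od -An: no offset column, -tx1: one hex byte at a time, -N4: first 4 bytes only
    magic=$(od -An -tx1 -N4 "$1" | tr -d '[:space:]')
    case "$magic" in
        cefaedfe) echo "mach-o" ;;   # 0xfeedface as stored on a little-endian system
        7f454c46) echo "elf" ;;      # 0x7f followed by "ELF" in ASCII
        *)        echo "unknown" ;;  # anything else, including shell scripts
    esac
}
```

Calling identify_format on a file whose first bytes match the first dump should print "mach-o", and on one matching the second dump "elf".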

The mach-o executables are interesting because you can pack support for multiple architectures into a single file; many of the executables shipped with OS X contain support for both PPC and x86. Unfortunately, Linux doesn't support loading mach-o files(*), nor can you load an ELF file on a Mac. Even if you could load an ELF file on a Mac, it's not enough to claim that both Linux and Mac are x86 and are therefore compatible. They may share the same instruction set, but the APIs that define how a program interacts with the OS are completely different, which means that you would then need some sort of per-platform abstraction layer.

An alternative and more straightforward approach is to find some sort of format which will execute on both platforms. Arguably, I could rewrite and recompile the program into some sort of portable executable format like Java, or even Mono, but there's no guarantee that the user will have the proper interpreter installed. The one interpreter that is pretty much guaranteed to be installed is "/bin/sh", the shell that you can find on nearly every Unix system. If you've ever looked at a shell script before, you're probably familiar with the line:
#!/bin/sh
This magic incantation means that the file is to be interpreted by the program /bin/sh. More specifically, it means that if the above were called example.sh, then the command "./example.sh" is really "/bin/sh ./example.sh" and likewise "./example.sh arg1 arg2" becomes "/bin/sh ./example.sh arg1 arg2". It works exactly the same for any other "#!" line like "#!/usr/bin/perl"; the only real downside is that you have to know the exact path to the interpreter. (One trick to get around that is to use "#!/usr/bin/env perl", but I digress.)

Now, I don't intend to rewrite sbf_flash as a shell script; that'd be somewhat infeasible. What I can do is use a shell script to determine which platform I'm running on and then load the correct executable. The next question is how do I pack a shell script, along with Linux and Mac versions of sbf_flash, into a single file? There is a tool for that; the basic concept is to append a tar.gz file to the end of a shell script, and keep track of exactly how long the original shell script was so that you can locate the data appended to it. It's a neat idea but I have a better one.
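As a rough sketch of the platform check, a script can look at the output of "uname -s", which prints "Linux" on Linux and "Darwin" on a Mac. The function name pick_binary and the file names sbf_flash.linux and sbf_flash.darwin are invented for this example; the actual names used inside the sbf_flash download may well differ.

```shell
#!/bin/sh
# Map a kernel name (as printed by `uname -s`) to the matching binary.
# pick_binary and both file names are hypothetical placeholders.
pick_binary() {
    case "$1" in
        Linux)  echo "sbf_flash.linux" ;;
        Darwin) echo "sbf_flash.darwin" ;;
        *)      return 1 ;;              # unknown platform: report failure
    esac
}

# Possible use inside a wrapper script (left as a comment so that the
# sketch itself has no side effects):
#   bin=$(pick_binary "$(uname -s)") || { echo "unsupported platform" >&2; exit 1; }
#   exec "./$bin" "$@"
```

Using exec means the shell is replaced by the real program, so arguments and the exit status pass straight through.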

Instead of appending something to the end of a shell script, let's make the shell script itself part of the archive. Conceptually something like this:
[header]
[script]
[linux executable]
[darwin executable]
Obviously it can't be a compressed archive format; I want the script to still appear legible to the system. Think a tar.gz minus the gzip part; it's just a set of concatenated files stored exactly as-is, with the exception of a bit of extra data between them to identify the location, filename and permissions of each file. This means that even with the script as the first file in the archive, there will be a small amount of garbage before the contents of the script to hold the metadata I just described.

The trick is hiding that garbage from the shell so that the script executes correctly. That's simple enough: as long as there are no newlines in it, I can turn it into a comment by prefixing it with a "#". In other words:
#[archive]
I just need to remember the "#!/bin/sh" at the start of the file and to make sure that the first newline in the archive falls at the start of the script, so that the comment ends exactly where the script begins.
#!/bin/sh
#[start of archive]
[script]
[remainder of archive]
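The layout above can be tried out with a toy file. This sketch uses plain-text placeholders instead of real cpio data, and it assumes the embedded script ends with an explicit "exit" (or an "exec") so that the shell stops before it reaches the rest of the archive. The function names make_demo and prefix_length are made up for the example.

```shell
#!/bin/sh
# Write a toy file laid out like the diagram: "#!/bin/sh", newline, "#",
# a stand-in for the archive header, the script, then trailing data.
# FAKE-ARCHIVE-HEADER and the last line are placeholders, not real cpio data.
make_demo() {
    {
        printf '#!/bin/sh\n#'              # the prefix added in front of the archive
        printf 'FAKE-ARCHIVE-HEADER\n'     # hidden from the shell by the "#" above
        printf 'echo "script ran"\n'       # the embedded script
        printf 'exit 0\n'                  # assumed: stop before the payload is parsed
        printf 'this ( is not valid shell\n'
    } > "$1"
}

# Count the bytes in the prefix: "#!/bin/sh", a newline and a "#".
prefix_length() {
    # $(( ... )) strips the padding that some wc implementations print
    echo $(( $(printf '#!/bin/sh\n#' | wc -c) ))
}
```

Running the file produced by make_demo with sh should print "script ran" without complaining about the invalid line at the end, because the shell exits before it parses that far.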
That works out to be 11 bytes for the "#!/bin/sh", newline and "#"; the remainder of the file is the archive itself. I chose cpio for the archive format for two reasons. First, it's small, meaning that the amount of garbage before the script is minimal. Second, most versions of cpio don't particularly care about the 11 bytes I added to the start. In other words, you can extract it all with:
cpio -i < sbf_flash
I think I've just created a new type of executable.