iOS/About iOS/Mach-O.md
2018-06-11 17:42:03 +10:00

10 KiB
Executable File

The Mach-O Format

The Mach-O file format is the executable format of choice for XNU and dyld. It serves a purpose analog to what ELF or PE do; to put it simply, it describes a portion of an address space. Support for multi-architecture executables is also provided thanks to the fat format, which allows you to join multiple different architecture Mach-OS letting the kernel pick which to load. Note that it is possible to influence this with posix_spawn APIs.

Working with Mach-Os

  1. otool is provided out-of-the-box with the xcode cli utils.
  2. jtool, by Jonathan Levin, author of MOXiI and MOXiI II, is an analog but more flexible and advanced tool. On the other hand, it lacks support for disassembling architectures other than arm64.
  3. lipo is a tool for working with fat files
  4. Apple headers, specifically <mach-o/loader.h>/<mach-o/fat.h>.

General structure

This image from Apple documentation illustrates pretty well how a Mach-O file is composed.

Mach-O files structure

The header is the first thing you will encounter while parsing a Mach-O file. It is always located at the very beginning.

Immediately following the header there are the load commands, these commands describe what the content of the file is. They give us an idea of how the file is structured. Particularly, each command describes its own segment a piece of data (located in the data part of the Mach-O file), specifying where it is located (offset) in the file, and its size. Each specific command may also contain other values, specific to its segment only.

Finally, we have the data. This part of the file contains actual data. It also contains, as you are probably wondering, the executable code of the Mach-O. The structure of this part is described exactly by the load commands discussed above, the layout is really the same. Each segment may be divided into various sections which simply exist to sub-categorize the data in each segment. For example, the __TEXT segment contains various sections, among those the __text section, that contains the actual executable machine bytes. Another section is the __cstring, which only contains hardcoded C strings used in the program.

Universal Binaries (fat Mach-Os) structure

Apple introduced universal binaries when the transition from PPC architecture to x86 architecture was made. Universal binaries consists of more Mach-O files chained together, preceded by a so-called fat header, which contains the fat magic number and the number of Mach-O objects included in the file.

Here's the structure of a fat header (struct fat_header):

struct fat_header {
	uint32_t	magic;		/* FAT_MAGIC */
	uint32_t	nfat_arch;	/* number of structs that follow */
};

Immediately following the header, we find an nfat_arch number of struct fat_arch, each of those defines more in detail every Mach-O contained in the file. Here's the structure definition:

struct fat_arch {
	cpu_type_t	cputype;	/* cpu specifier (int) */
	cpu_subtype_t	cpusubtype;	/* machine specifier (int) */
	uint32_t	offset;		/* file offset to this object file */
	uint32_t	size;		/* size of this object file */
	uint32_t	align;		/* alignment as a power of 2 */
};

The first two fields describe the architecture for the Mach-O this structure is referring to, so that dyld knows which Mach-O to load for your specific CPU.
The offset field indicates, from the beginning of the file, where the Mach-O this structure is referring to is located in the whole fat binary. The size field indicates the whole size of that same Mach-O. Finally, the align field indicates the address boundary where the Mach-O should be aligned (generally, this is a page boundary, 4096).

The fat_header and the various fat_archs are stored at the beginning of the file, in the so-called fat section. The fat section is 4096 bytes in size, and thus it can hypothetically contain up to 204 fat_archs. The rest of the fat section, which doesn't contain either the header nor the archs, is filled with zeroes.

On a final note, remember that every single value found in the fat section (fat_header + fat_archs) is big-endian encoded. This is because at the time universal binaries were designed PPC was still the most widely used architecture, and it was big-endian. So, when reading values from parsed structures in the fat section, flip them.

Summarizing an universal binary structure with a visual representation:

+---------------------+		/* header (fat_header: 0x8) */
|                     |
|     FAT HEADER      |
|                     |
+---------------------+		/* archs (fat_arch: 0x14) */
|     FAT ARCH #1     |
+---------------------+
|     FAT ARCH #2     |
+---------------------+
|         ...         |		/* other eventual fat archs... */
+---------------------+		/* Mach-Os section (variable size) */
|                     |
|                     |
|                     |
|                     |
|      MACH-O #1      |
|                     |
|                     |
|                     |
|                     |
|                     |
+---------------------+
|                     |
|                     |
|                     |
|                     |
|                     |
|      MACH-O #2      |
|                     |
|                     |
|                     |
|                     |
+---------------------+
|         ...         |		/* other eventual Mach-Os... */
+---------------------+

mach_header(_64)

There are two structs (for both x86 and x64 architectures) for representing a Mach-O header:

struct mach_header; // 32-bit Mach-O format
struct mach_header_64; // 64-bit Mach-O format

The differences are minimal, just a reserved field added to the 64-bit header.

As per loader.h, the header file defining Mach-O constants and structures:

struct mach_header(_64) {
	uint32_t	magic;			/* mach magic number identifier */
	cpu_type_t	cputype;		/* cpu specifier */
	cpu_subtype_t	cpusubtype;	/* machine specifier */
	uint32_t	filetype;		/* type of file */
	uint32_t	ncmds;			/* number of load commands */
	uint32_t	sizeofcmds;		/* the size of all the load commands */
	uint32_t	flags;			/* flags */
	uint32_t	reserved;		/* reserved; 64 bit only */
};
#define	MH_MAGIC	0xfeedface	/* the mach magic number */
#define	MH_MAGIC_64 0xfeedfacf 		/* the 64-bit mach magic number */

Types of Mach-Os

// These shuld be representing userland Mach-Os
#define	MH_EXECUTE	0x2		/* demand paged executable file (executable) */
#define	MH_FVMLIB	0x3		/* fixed VM shared library file (???) */
#define	MH_DYLIB	0x6		/* dynamically bound shared library */
#define	MH_DYLINKER	0x7		/* dynamic link editor */
#define	MH_BUNDLE	0x8		/* dynamically bound bundle file */

// Compile-time Mach-Os, Generic address-space Mach-Os, ???
#define	MH_OBJECT	0x1		/* relocatable object file (.o) */
#define	MH_CORE		0x4		/* core file */
#define	MH_PRELOAD	0x5		/* preloaded executable file */
#define	MH_DYLIB_STUB	0x9		/* shared library stub for static */
					/*  linking only, no section contents */
#define	MH_DSYM		0xa		/* companion file with only debug */
					/*  sections */

// Kernel Mach-Os
#define	MH_KEXT_BUNDLE	0xb		/* x86_64 kexts */

Load Commands

Load commands are located immediately after the Mach header. They describe the content of the file, and are necessary to correctly parse the Mach-O.

They can be parsed using the struct load_command:

struct load_command {
	uint32_t cmd;		/* type of load command */
	uint32_t cmdsize;	/* total size of command in bytes */
};

For a comprehensive list and descriptions of load commands, look in mach-o/loader.h.

The most important ones are the LC_SEGMENT(_64)s, which describe each segment in the file. For example, the segment's name, the virtual address, the virtual size, the file offset (i.e. where that segment data is located in the file), the file size, initial and maximum memory protections and the number of sections for that segment.

Segments

Segments are portions of the Mach-O file that get mapped in the address space at runtime. They are composed of sections, which hold actual data.
Whatever data they contain, it is needed by the process at runtime and so it gets mapped in the address space.

Sections

As we have stated before, segments are made of sections, and those hold actual data. They can contain virtually anything, like executable code bytes, data, pointers, strings or even nothing at all (see __PAGEZERO segment).

nlist / nlist_64

Symbols

Check facebook/fishhook For Explanations

Lazy Linking

Check facebook/fishhook For Explanations

Position Independent Executables

a.k.a. ASLR-Enabled Executables.

PIE Flag is in the thin mach_header with value 0x200000

Minus that from the original flag value will disable ASLR, and vice verse

Special Segments

__PAGEZERO is a special segment with *ZERO* disk space and takes a page in VM.
When code access NULL, it will land there
Mach-O containing __RESTRICT/__restrict will be flagged by dyld and thus DYLD_INSERT_LIBRARIES EnvVar will be ignored.(This is added after iOS6)

Other common load commands

Malformed Mach-O

By altering Mach-O headers. Different issues would happen within dyld.

Sometimes it can protect the binary from being analyzed

Sometimes it's a 0Day and can help bypass Code-Signature-Verification

Examples:

MachOProtector

Won't Work After iOS 8.1 or OS X 10.11

yalu

Maybe. Didn't Check SRC.

Pangu8

Two Level Namespace

The Mach-O Address Space in xnu/dyld based OSes

Dyld Shared Cache

Download the latest dyld source from here.

open ./launch-cache/dsc_extractor.cpp and change #if 0 to #if 1.
Then just clang++ -o dsc_extractor dsc_extractor.cpp dsc_iterator.cpp and run dsc_extractor on the cache to extract.

Known Issues:

Seems like segment VM Address is wrong. So the binary is not class-dump able (Just use runtime dump instead) not nm-able (solution: simply replace LC_SEGMENT_SPLIT_INFO to LC_SOURCE_VERSION)

References