macho/macho-go/doc/index.html.pm
2023-05-31 16:17:03 +07:00


#lang pollen
◊(require txexpr)
◊(require pollen/decode)
◊(define-meta template "template.html")
◊(define (sidenote label . xs)
`(splice-me
(label ((for ,label) (class "margin-toggle sidenote-number")))
(input ((id ,label) (class "margin-toggle")(type "checkbox")))
(span ((class "sidenote")) ,@xs)))
◊(define (stupid-sidenote label . xs)
`(stupid-sidenote
(label ((for ,label) (class "margin-toggle sidenote-number")))
(input ((id ,label) (class "margin-toggle")(type "checkbox")))
(span ((class "sidenote")) ,@xs)))
◊(define (splice xs)
(apply append (for/list ([x (in-list xs)])
(if (and (txexpr? x) (member (get-tag x) '(splice-me)))
(get-elements x)
(list x)))))
◊(define (root . xs)
(decode `(decoded-root ,@xs)
#:txexpr-elements-proc (compose1 detect-paragraphs splice)
#:exclude-tags '(pre)
))
◊h1{Mach-O Binary Bindings}
◊h2{Mach-O Binary Format}
◊section{
◊p{
Mach-O is the executable file format used by macOS, iOS and tvOS. A Mach-O binary is split into 3 main parts: the header, the load commands, and the data. We only need to focus on the header and the load commands, because the data part is referenced by the load commands.
}
}
◊h3{Header}
◊section{
◊p{
The header summarizes the binary. It starts with the file magic, followed by the file's architecture, the file type, the number and total size of the load commands, and a bitmask of flags.
}
◊p{
Mach-O uses four magic numbers: one for 32-bit and one for 64-bit binaries, each with a byte-swapped variant indicating that the file's byte order (little or big endian) differs from the reader's.
}
◊pre{
#define MH_MAGIC 0xfeedface /* the mach magic number */
#define MH_CIGAM 0xcefaedfe /* NXSwapInt(MH_MAGIC) */
#define MH_MAGIC_64 0xfeedfacf /* the 64-bit mach magic number */
#define MH_CIGAM_64 0xcffaedfe /* NXSwapInt(MH_MAGIC_64) */
}
◊p{
The file's architecture is given by two numbers: ◊code{cputype} for the main architecture (Intel 386, ARM, MIPS, ...) and ◊code{cpusubtype} for the sub-architecture (armv7, arm64e, ...). ◊sidenote["machine-arch-enum"]{◊code{cctools/include/mach/machine.h}}
}
◊p{
The Mach-O binary uses load commands to structure its information. The header stores the number of load commands (◊code{ncmds}) and their total size in bytes (◊code{sizeofcmds}), which a parser uses to walk and validate them.
}
◊p{
The header also carries a bitmask of flags for other properties, e.g. ◊code{MH_PIE} for position-independent executables.
}
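To make the header layout concrete, here is a minimal Go sketch that decodes ◊code{mach_header} / ◊code{mach_header_64} from raw bytes. The names (◊code{Header}, ◊code{parseHeader}) are our own and not the macho-go API; it assumes a thin (non-fat) file already read into memory, and the values built in ◊code{main} are illustrative.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Magic numbers from <mach-o/loader.h>, as they read in little-endian order.
const (
	MH_MAGIC    = 0xfeedface
	MH_CIGAM    = 0xcefaedfe
	MH_MAGIC_64 = 0xfeedfacf
	MH_CIGAM_64 = 0xcffaedfe
	MH_PIE      = 0x200000 // flag: position-independent executable
)

// Header mirrors the fields of mach_header / mach_header_64.
type Header struct {
	Is64       bool
	Order      binary.ByteOrder
	CPUType    int32
	CPUSubtype int32
	FileType   uint32
	NCmds      uint32
	SizeOfCmds uint32
	Flags      uint32
}

// Size is the header length; the load commands start right after it.
func (h Header) Size() int {
	if h.Is64 {
		return 32 // mach_header_64 has an extra reserved field
	}
	return 28
}

func (h Header) IsPIE() bool { return h.Flags&MH_PIE != 0 }

func parseHeader(b []byte) (Header, error) {
	var h Header
	if len(b) < 28 {
		return h, errors.New("file too short for a Mach-O header")
	}
	// Read the magic as little-endian; a "CIGAM" value means the
	// remaining fields are stored big-endian.
	switch binary.LittleEndian.Uint32(b) {
	case MH_MAGIC:
		h.Order = binary.LittleEndian
	case MH_MAGIC_64:
		h.Is64, h.Order = true, binary.LittleEndian
	case MH_CIGAM:
		h.Order = binary.BigEndian
	case MH_CIGAM_64:
		h.Is64, h.Order = true, binary.BigEndian
	default:
		return h, errors.New("not a thin Mach-O file")
	}
	if len(b) < h.Size() {
		return h, errors.New("truncated 64-bit header")
	}
	h.CPUType = int32(h.Order.Uint32(b[4:]))
	h.CPUSubtype = int32(h.Order.Uint32(b[8:]))
	h.FileType = h.Order.Uint32(b[12:])
	h.NCmds = h.Order.Uint32(b[16:])
	h.SizeOfCmds = h.Order.Uint32(b[20:])
	h.Flags = h.Order.Uint32(b[24:])
	return h, nil
}

func main() {
	// A hand-made arm64 MH_EXECUTE header with MH_PIE set; values are illustrative.
	hdr := make([]byte, 32)
	binary.LittleEndian.PutUint32(hdr[0:], MH_MAGIC_64)
	binary.LittleEndian.PutUint32(hdr[4:], 0x0100000c) // CPU_TYPE_ARM64
	binary.LittleEndian.PutUint32(hdr[12:], 2)         // MH_EXECUTE
	binary.LittleEndian.PutUint32(hdr[16:], 3)         // ncmds
	binary.LittleEndian.PutUint32(hdr[20:], 0x100)     // sizeofcmds
	binary.LittleEndian.PutUint32(hdr[24:], MH_PIE)    // flags
	if h, err := parseHeader(hdr); err != nil {
		fmt.Println("error:", err)
	} else {
		fmt.Printf("64-bit=%v cputype=%#x ncmds=%d sizeofcmds=%d pie=%v\n",
			h.Is64, h.CPUType, h.NCmds, h.SizeOfCmds, h.IsPIE())
	}
}
```

Reading the magic once as little-endian and switching on all four constants lets one function handle both byte orders; every later field is then read with the detected order.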
}
◊h3{Load Commands}
◊section{
◊p{
Load commands are the core of a Mach-O binary. The dynamic linker ◊code{dyld} reads them to load the binary into memory. Each load command starts with a small header giving its type (◊code{cmd}) and size (◊code{cmdsize}); the body that follows differs from type to type.
}
◊p{
At the time of writing, there are more than 50 load command types, but we only cover the few needed to understand Mach-O binary binding.
}
}
◊h4{LC_SEGMENT and sections}
◊section{
◊p{
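◊code{LC_SEGMENT} (◊code{LC_SEGMENT_64} in 64-bit binaries) defines a region of the file that is mapped into virtual memory at runtime: it gives the virtual address and size for ◊code{dyld} to map. A segment contains a list of sections. Sections are not separate load commands; their headers (◊code{struct section} or ◊code{section_64}) are placed right after the segment command, which stores their count in ◊code{nsects}. Each section header records the offset and size of its data in the file, telling ◊code{dyld} which bytes to load into memory.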
}
◊p{
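Each segment is mapped into memory with the initial protection stored in its ◊code{initprot} field, a combination of read, write and execute bits (e.g. read and execute for ◊code{__TEXT}). We can change the protection of a memory region at runtime using ◊code{vm_protect}. However, the new protection cannot exceed the segment's ◊code{maxprot}.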
}
◊p{
A section may contain special data; its kind is given by the section type, stored in the low 8 bits of the section's ◊code{flags} field. For example, a section containing C string literals is marked ◊code{S_CSTRING_LITERALS}.
}
}
◊h4{LC_LOAD_DYLIB and LC_RPATH}
◊section{
◊p{
The Mach-O binary declares each dynamically linked library with an ◊code{LC_LOAD_DYLIB} command. When ◊code{dyld} reads this load command, it loads the library into memory. ◊code{LC_LOAD_DYLIB} contains the path to the linked library, e.g. ◊code{/usr/lib/libSystem.B.dylib}. The path can be absolute (◊code{/absolute/lib}), relative (◊code{../relative/lib}) or prefixed. Prefixed paths start with ◊code{@executable_path}, ◊code{@loader_path} or ◊code{@rpath}. ◊code{@executable_path} expands to the directory containing the main executable. ◊code{@loader_path} expands to the directory containing the binary (executable or library) whose load command is being resolved. ◊code{@rpath} is expanded using the paths defined by ◊code{LC_RPATH}.
}
◊p{
◊code{LC_RPATH} adds a path to the list that ◊code{dyld} searches when resolving a library path starting with ◊code{@rpath}. We often see ◊code{LC_RPATH = @executable_path/Frameworks}. iOS applications commonly use it because third-party libraries are placed inside the ◊code{Frameworks} folder, and ◊code{@rpath} paths are shorter to write (and understand). ◊sidenote["rpath sucks"]{Not yet researched: the same library name placed in different @rpath directories}
}
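The path rules above can be sketched in Go as a function listing the candidate paths for one library path. This is a simplification under stated assumptions: it ignores ◊code{DYLD_*} environment variables, fallback paths and rpaths inherited from other images, and it expands ◊code{@loader_path} inside an ◊code{LC_RPATH} relative to the same image. All names are hypothetical.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// expandPrefixes replaces a leading @executable_path or @loader_path
// with the directory it stands for; other paths are returned unchanged.
func expandPrefixes(p, execDir, loaderDir string) string {
	prefixes := []struct{ prefix, dir string }{
		{"@executable_path", execDir},
		{"@loader_path", loaderDir},
	}
	for _, e := range prefixes {
		if p == e.prefix || strings.HasPrefix(p, e.prefix+"/") {
			return path.Join(e.dir, strings.TrimPrefix(p, e.prefix))
		}
	}
	return p
}

// candidatePaths lists, in order, the paths to try for one LC_LOAD_DYLIB path.
// execDir: directory of the main executable; loaderDir: directory of the image
// holding the load command; rpaths: that image's LC_RPATH entries.
func candidatePaths(installName, execDir, loaderDir string, rpaths []string) []string {
	if strings.HasPrefix(installName, "@rpath/") {
		rest := strings.TrimPrefix(installName, "@rpath/")
		out := make([]string, 0, len(rpaths))
		for _, rp := range rpaths {
			out = append(out, path.Join(expandPrefixes(rp, execDir, loaderDir), rest))
		}
		return out
	}
	return []string{expandPrefixes(installName, execDir, loaderDir)}
}

func main() {
	app := "/var/containers/Bundle/Application/ID/APPNAME.app" // ID and APPNAME are placeholders
	rpaths := []string{"@executable_path/Frameworks"}
	// Foo.framework is a hypothetical third-party framework.
	for _, p := range candidatePaths("@rpath/Foo.framework/Foo", app, app, rpaths) {
		fmt.Println(p)
	}
	fmt.Println(candidatePaths("/usr/lib/libSystem.B.dylib", app, app, rpaths)[0])
}
```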
}
◊h4{LC_ID_DYLIB}
◊section{
◊p{
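This command appears in dynamic libraries and records the library's install name: the path where the library is expected to be found. When a binary is linked against the library, the linker copies this path into the binary's ◊code{LC_LOAD_DYLIB}, so if the library is later moved or renamed, ◊code{dyld} cannot find it at runtime. ◊sidenote["id dylib"]{Not 100% sure}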
}
}
◊h4{LC_DYLD_INFO or LC_DYLD_INFO_ONLY}
◊section{
◊p{
This command appears at most once in a Mach-O binary and contains offsets to the encoded lists of external (imported) symbols and of exported symbols. External symbols live in other libraries and must be bound by ◊code{dyld}. There are 3 types of external symbols: normal bind, weak bind, and lazy bind.
}
◊p{
External symbols are encoded as byte-streams. Reading and decoding a byte-stream reveals each symbol's name and the library hosting it.
For normal bind symbols, ◊code{dyld} reads and decodes the byte-stream when loading the binary into memory and writes each symbol's address into a section marked ◊code{S_NON_LAZY_SYMBOL_POINTERS}.
For lazy bind symbols, decoding is deferred until the first call, through a PLT-like stub mechanism ◊sidenote["PLT"]{Procedure Linkage Table}. Each lazy bind symbol has a helper stub that calls ◊code{dyld_stub_binder} with a number. The number is an offset into the lazy bind byte-stream. Decoding the stream at that offset gives the symbol's name and the library hosting it. With these two pieces of information, ◊code{dyld} looks up the symbol's name in the export table of the hosting library and returns the address of the function. On success, ◊code{dyld} also overwrites the symbol's lazy pointer with that address, so the next call to the same function jumps to it directly.
◊pre{
__LAZY:
...
dword _stub_write
...
foo:
push argument_of_write
jmp [__LAZY + offset_to_stub_write]
_stub_write:
push OFFSET_WRITE_IN_BYTE_STREAM
jmp _dyld_stub_binder
_dyld_stub_binder:
call fastBindLazySymbol
call eax
# where fastBindLazySymbol resolves the address of the symbol and
# replaces the _stub_write pointer in __LAZY with that address.
}
}
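To show what the lazy bind byte-stream looks like, here is a minimal Go decoder for a single entry, given the offset a stub passes to ◊code{dyld_stub_binder}. It handles only the opcodes that typically make up lazy bind entries (constants from ◊code{<mach-o/loader.h>}) and rejects everything else, e.g. ◊code{BIND_OPCODE_SET_DYLIB_SPECIAL_IMM}; it is a sketch, not ◊code{dyld}'s implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// Subset of the bind opcodes from <mach-o/loader.h>.
const (
	BIND_OPCODE_MASK                          = 0xF0
	BIND_IMMEDIATE_MASK                       = 0x0F
	BIND_OPCODE_DONE                          = 0x00
	BIND_OPCODE_SET_DYLIB_ORDINAL_IMM         = 0x10
	BIND_OPCODE_SET_DYLIB_ORDINAL_ULEB        = 0x20
	BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM = 0x40
	BIND_OPCODE_SET_TYPE_IMM                  = 0x50
	BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB   = 0x70
	BIND_OPCODE_DO_BIND                       = 0x90
)

type LazyBind struct {
	Segment int    // index of the segment holding the lazy pointer
	Offset  uint64 // offset of the lazy pointer inside that segment
	Ordinal uint64 // 1-based index into the binary's LC_LOAD_DYLIB list
	Name    string // symbol name, e.g. "_write"
}

// readULEB decodes an unsigned LEB128 value starting at *i and advances *i.
func readULEB(b []byte, i *int) (uint64, error) {
	var v uint64
	for shift := uint(0); *i < len(b); shift += 7 {
		c := b[*i]
		*i++
		if shift >= 64 {
			return 0, errors.New("uleb128 overflow")
		}
		v |= uint64(c&0x7f) << shift
		if c&0x80 == 0 {
			return v, nil
		}
	}
	return 0, errors.New("truncated uleb128")
}

// decodeLazyBind decodes the entry that starts at off.
func decodeLazyBind(stream []byte, off int) (LazyBind, error) {
	var e LazyBind
	i := off
	for i < len(stream) {
		op := stream[i] & BIND_OPCODE_MASK
		imm := stream[i] & BIND_IMMEDIATE_MASK
		i++
		var err error
		switch op {
		case BIND_OPCODE_DONE:
			return e, errors.New("reached BIND_OPCODE_DONE before DO_BIND")
		case BIND_OPCODE_SET_DYLIB_ORDINAL_IMM:
			e.Ordinal = uint64(imm)
		case BIND_OPCODE_SET_DYLIB_ORDINAL_ULEB:
			e.Ordinal, err = readULEB(stream, &i)
		case BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM:
			start := i // the NUL-terminated symbol name follows the opcode
			for i < len(stream) && stream[i] != 0 {
				i++
			}
			if i == len(stream) {
				return e, errors.New("unterminated symbol name")
			}
			e.Name = string(stream[start:i])
			i++ // skip the NUL
		case BIND_OPCODE_SET_TYPE_IMM:
			// bind type is in imm; ignored in this sketch
		case BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB:
			e.Segment = int(imm)
			e.Offset, err = readULEB(stream, &i)
		case BIND_OPCODE_DO_BIND:
			return e, nil
		default:
			return e, fmt.Errorf("unsupported opcode %#x at %d", op, i-1)
		}
		if err != nil {
			return e, err
		}
	}
	return e, errors.New("stream ended before DO_BIND")
}

func main() {
	// Two hand-made entries; each ends with DO_BIND (0x90) then DONE (0x00).
	// 0x72 = segment 2 + ULEB offset, 0x11 = dylib ordinal 1, 0x40 = symbol name.
	stream := []byte{
		0x72, 0x10, 0x11, 0x40, '_', 'w', 'r', 'i', 't', 'e', 0x00, 0x90, 0x00,
		0x72, 0x18, 0x11, 0x40, '_', 'r', 'e', 'a', 'd', 0x00, 0x90, 0x00,
	}
	for _, off := range []int{0, 13} {
		e, err := decodeLazyBind(stream, off)
		fmt.Printf("offset %d: %+v err=%v\n", off, e, err)
	}
}
```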
}
◊h4{LC_CODE_SIGNATURE}
◊section{
◊p{
Apple's *OS platforms require running binaries to be signed. A signed Mach-O binary has an ◊code{LC_CODE_SIGNATURE} load command pointing to a signature blob in ◊code{__LINKEDIT} that contains the vendor information, the signature made with the signing key, and the certificate. ◊sidenote["codesign"]{Yet to read how to parse this information}
}
}
◊h4{LC_ENCRYPTION_INFO}
◊section{
◊p{
To stop people from copying binaries, Apple uses FairPlay, its DRM (digital rights management) mechanism. With FairPlay, an app downloaded from the App Store is encrypted with a key tied to the device hardware, and the content is decrypted at runtime using that hardware key. On another device, the content can't be decrypted and the app fails to launch.
}
◊p{
◊code{LC_ENCRYPTION_INFO} (◊code{LC_ENCRYPTION_INFO_64} in 64-bit binaries) gives the file range of the encrypted data (◊code{cryptoff}, ◊code{cryptsize}) and, through ◊code{cryptid}, whether the content is currently encrypted. Binaries should only be encrypted when distributed through the App Store. In a later section, we show how to decrypt the content with a jailbroken iOS device.
}
}
◊h4{LC_FUNCTION_STARTS and LC_DATA_IN_CODE}
◊section{
◊p{
}
}
◊h4{LC_SYMTAB and LC_DYSYMTAB}
◊section{
◊p{
}
}
◊h2{Fat Binary Format}
◊section{
◊p{
Apple has a binary format, called a fat binary, for combining builds of an application for different architectures into one file. The format is simple: it defines a list of Mach-O binaries. Each Mach-O binary inside starts at an offset aligned to a power of 2, and the stored alignment is the exponent, chosen by CPU type. In fact:
}
◊pre{
func GetAlignment(h *macho.Header) uint32 {
switch h.Cputype() {
case CPU_TYPE_ARM, CPU_TYPE_ARM64:
return 0xe // log2(0x4000)
case CPU_TYPE_POWERPC,
CPU_TYPE_POWERPC64,
CPU_TYPE_I386,
CPU_TYPE_X86_64:
return 0xc // log2(0x1000)
default:
return 0xd // log2(0x2000)
}
}
}
}
◊h2{ipa file}
◊section{
◊p{
iOS applications are distributed as .ipa files. An .ipa file is a .zip archive with a fixed structure. A valid ipa file must contain a ◊code{*.app} folder, and that folder must contain two files: ◊code{Info.plist} and a binary. The binary's name must match the ◊code{CFBundleExecutable} field of ◊code{Info.plist}. ◊code{Info.plist} is either (UTF-8) XML or binary encoded; both formats can be read with Python's ◊a['((href "https://docs.python.org/3/library/plistlib.html"))]{plistlib}.
}
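Locating the bundle inside an .ipa only needs the archive's entry names. The Go sketch below assumes the usual layout, where the ◊code{*.app} folder sits under a top-level ◊code{Payload/} directory, and uses Go's ◊code{archive/zip}. It stops short of reading ◊code{CFBundleExecutable}, because Go's standard library has no plist parser; the helper names are hypothetical.

```go
package main

import (
	"archive/zip"
	"errors"
	"fmt"
	"os"
	"strings"
)

// findAppBundle returns the "Payload/<Name>.app" folder that holds an Info.plist.
func findAppBundle(names []string) (string, error) {
	for _, n := range names {
		parts := strings.Split(n, "/")
		if len(parts) == 3 && parts[0] == "Payload" &&
			strings.HasSuffix(parts[1], ".app") && parts[2] == "Info.plist" {
			return "Payload/" + parts[1], nil
		}
	}
	return "", errors.New("no Payload/*.app/Info.plist in archive")
}

// listIPA returns the entry names of an .ipa (a plain zip archive).
func listIPA(path string) ([]string, error) {
	r, err := zip.OpenReader(path)
	if err != nil {
		return nil, err
	}
	defer r.Close()
	names := make([]string, 0, len(r.File))
	for _, f := range r.File {
		names = append(names, f.Name)
	}
	return names, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: ipainfo APP.ipa")
	} else if names, err := listIPA(os.Args[1]); err != nil {
		fmt.Println("error:", err)
	} else if app, err := findAppBundle(names); err != nil {
		fmt.Println("error:", err)
	} else {
		fmt.Println("app bundle:", app)
	}
}
```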
◊h3{Info.plist}
◊p{
}
}
◊h2{Jailbreak and Mach-O hacks}
◊h3{Jailbreaking iOS}
◊section{
◊p{
A device can be jailbroken with tools like ◊a['((href "https://checkra.in/"))]{◊code{checkra1n}}. At the time of writing, checkra1n easily jailbreaks devices from the iPhone 5s to the iPhone X running iOS 12 to 14. This jailbreak is semi-tethered, meaning that a reboot breaks the jailbreak.
}
◊p{
A jailbreak disables some security features by patching the iOS kernel: it enables USB access, re-mounts ◊code{/} as read-write, removes the sandbox, etc. Jailbreaking tools often install other applications too, e.g. openssh and Cydia, so a device with Cydia installed is jailbroken. A custom jailbreaking tool can skip installing Cydia. However, most jailbroken devices are jailbroken with public tooling (like checkra1n), which installs Cydia by default.
}
◊p{
A semi-tethered jailbreak loses the kernel patch after a reboot, but applications installed while the jailbreak was active remain. In this case, the jailbreak must be run again to regain access to jailbreak features.
}
◊p{
Jailbreak tools install ◊code{openssh} to provide access over the USB connection. ◊code{checkra1n} installs and runs openssh on port 44. After connecting the device with a USB cable, we proxy the USB connection to a port on ◊code{localhost}; ◊code{ssh} to the device is then possible. To proxy the USB connection, we use ◊a['((href "https://libimobiledevice.org/"))]{◊code{iproxy}} ◊sidenote["iproxy"]{◊code{brew install libimobiledevice}}.
}
◊pre{
iproxy 4444 44
ssh root@localhost -p 4444
# default root password: alpine
}
◊p{
Older books and articles give an outdated path for installed applications. The current path is ◊code{/var/containers/Bundle/Application/ID/}, where ID is a long unique identifier for each application.◊sidenote["app path"]{old path: ◊code{/var/mobile/Applications/ID/}}
}
}
◊h3{Decrypt the binary}
◊section{
◊p{
As mentioned in ◊code{LC_ENCRYPTION_INFO}, binaries downloaded from the App Store are encrypted with a hardware key. The key reportedly cannot be retrieved, so the only way to decrypt the binary is to let it run and dump it from memory after decryption. Here we demonstrate the use of ◊a['((href "https://github.com/stefanesser/dumpdecrypted"))]{dumpdecrypted}.
}
◊p{
Compile the library, adding ◊code{-miphoneos-version-min=<iOS>} with the iOS version of the device. Sign the output library and copy it to the device with scp. Then ssh into the device and run the application with the library injected through ◊code{DYLD_INSERT_LIBRARIES}.
}
◊pre{
DYLD_INSERT_LIBRARIES=dumpdecrypted.dylib /var/containers/Bundle/Application/ID/APPNAME.app/BINARY
}
◊p{
A file BINARY.decrypted is written out with the decrypted contents. This is the simplest way to decrypt an application, but the result still cannot be installed on another device (even after re-signing). It also decrypts only the main executable: the third-party libraries are encrypted as well and stay encrypted. We can overcome this with ◊code{frida-ios-dump}, a tool built on Frida, which is discussed in a later section.
}
◊p{
◊code{dyld} registers the encrypted range with the kernel using ◊code{mremap_encrypted} ◊sidenote["mremap_encrypted"]{◊code{dyld/dyld3/Loading.cpp}}. ◊a['((href "https://github.com/JohnCoates/flexdecrypt"))]{flexdecrypt} is another tool that decrypts binaries using this method.
}
◊pre{
#if (__arm__ || __arm64__) && !TARGET_OS_SIMULATOR
// tell kernel about fairplay encrypted regions
uint32_t fpTextOffset;
uint32_t fpSize;
if ( image->isFairPlayEncrypted(fpTextOffset, fpSize) ) {
const mach_header* mh = (mach_header*)loadAddress;
int result = ::mremap_encrypted(((uint8_t*)mh) + fpTextOffset, fpSize, 1, mh->cputype, mh->cpusubtype);
if ( result != 0 ) {
diag.error("could not register fairplay decryption, mremap_encrypted() => %d", result);
::vm_deallocate(mach_task_self(), loadAddress, (vm_size_t)totalVMSize);
return;
}
}
#endif
}
}
◊h3{Frida hooking}
◊section{
◊p{
Frida is a hooking framework. Using Frida, one can easily intercept a running application. Frida supports both jailbroken and non-jailbroken devices. Frida has two components: the server and the client. The server runs on the device and attaches to the application; the client runs on our machine.
}
◊pre{
python3 -m pip install frida-tools
}
◊p{
The Frida ◊a['((href "https://frida.re/docs/ios/"))]{documentation} gives instructions for using Frida with jailbroken and non-jailbroken devices. In our experience, we set up Frida manually for both kinds of devices, as described below.
}
◊h4{Setup for jailbroken devices}
◊p{
On jailbroken devices with Cydia, Frida can easily be installed from Cydia after adding Frida's repository. The installed Frida build must match the device's ABI. Alternatively, download ◊code{frida-server}, sign it, copy it onto the device (scp) and run it there.
}
◊h4{Setup for non-jailbroken devices}
◊p{
Non-jailbroken devices cannot run ◊code{frida-server}, which on jailbroken devices injects Frida into the application. For non-jailbroken devices, we must embed Frida Gadget in the application ourselves. We can do this with ◊a['((href "https://github.com/Tyilo/insert_dylib"))]{insert_dylib}: build insert_dylib and download FridaGadget.dylib.
}
Given the application's .app folder (extracted from its ipa file), run:
◊pre{
insert_dylib --strip-codesig --inplace '@executable_path/Frameworks/FridaGadget.dylib' BINARY
}
Copy FridaGadget.dylib into the Frameworks folder, re-sign BINARY and sign FridaGadget.dylib, then install the application with Xcode. Once the application has started, Frida can connect to it and intercept it.
}
◊h4{Using Frida}
◊p{
Frida intercepts the application and provides a scripting engine, which we can use to inspect memory and change register values. The iOS ecosystem relies heavily on Objective-C (Swift also interoperates with the Objective-C runtime at a low level), and Frida supports converting Objective-C data types to raw ones, e.g. NSString to a raw string.
}
◊h3{Other tools}
◊section{
◊p{
There are many other tools to pentest, reverse engineer or intercept iOS applications. We haven't tested them, but they are worth mentioning.
}
◊ul{
◊li{◊a['((href "http://www.cydiasubstrate.com/"))]{CydiaSubstrate} - Hook framework}
◊li{◊a['((href "https://github.com/akemin-dayo/AppSync"))]{AppSync} - Utilities for jailbroken devices}
◊li{◊a['((href "https://github.com/jmpews/Dobby"))]{Dobby} - Hook framework (stale)}
◊li{◊a['((href "https://github.com/KJCracks/Clutch"))]{Clutch} - Dump decrypted binaries}
◊li{◊a['((href "https://github.com/nygard/class-dump"))]{class-dump} - Dump Objective-C class metadata}
◊li{◊a['((href "https://github.com/asLody/whale"))]{whale} - Hook framework}
◊li{◊a['((href "https://github.com/alexzielenski/optool"))]{optool} - Mach-O binary utilities}
◊li{◊a['((href "https://github.com/steakknife/unsign"))]{unsign} - Remove the code signature from Mach-O binaries}
}
}