basic website complete

- ported posts from Efiens
- add CV
- add MachO obfuscation whitepaper
2023-10-25 01:12:36 +07:00
parent 7a8ce17237
commit 7968743f3c
16 changed files with 1785 additions and 45 deletions

48
content/osx/fairplay.md Normal file

@ -0,0 +1,48 @@
---
# Documentation: https://wowchemy.com/docs/managing-content/
title: "Apple Fairplay protection in Mach-O"
subtitle: ""
summary: ""
authors: [luibo]
tags: [osx, iOS, macOS, dyld]
categories: [osx]
published: "2021-09-06"
lastmod: 2021-09-06T11:15:04+07:00
featured: false
draft: false
# Featured image
# To use, add an image named `featured.jpg/png` to your page's folder.
# Focal points: Smart, Center, TopLeft, Top, TopRight, Left, Right, BottomLeft, Bottom, BottomRight.
image:
caption: ""
focal_point: ""
preview_only: false
# Projects (optional).
# Associate this post with one or more of your projects.
# Simply enter your project's folder or file name without extension.
# E.g. `projects = ["internal-project"]` references `content/project/deep-learning/index.md`.
# Otherwise, set `projects = []`.
projects: ["osx", "binary-format"]
---
Fairplay is an encryption scheme created by Apple to protect digital ownership rights. It is implemented with a custom chipset that encrypts and decrypts using a hardcoded key. It is still unknown how to extract the key from the hardware, but decryption is feasible given root access to the device.
When an application is loaded, the encrypted Fairplay section must be decrypted. If the decryption succeeds, the app starts running as normal. For as long as the app is running, the section stays decrypted in memory.
If the memory can be dumped while the app is running, we can retrieve the file in its unencrypted form. Using Apple APIs, we can locate the binary mapped in memory, collect the decrypted region, and write it back to the file.
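The "write it back" step can be sketched in C as follows. This is a minimal sketch under stated assumptions, not the code of any of the tools listed here: `struct encryption_info` mirrors the first five fields of `encryption_info_command` from `<mach-o/loader.h>`, `patch_decrypted` and its parameters are hypothetical names, and the encrypted range is assumed to sit at the same offset in the file and in the mapped image (a thin, single-architecture binary).

```c
#include <stdint.h>
#include <string.h>

/* Mirrors the leading fields of encryption_info_command in <mach-o/loader.h>. */
struct encryption_info {
    uint32_t cmd;       /* LC_ENCRYPTION_INFO (0x21) or LC_ENCRYPTION_INFO_64 (0x2C) */
    uint32_t cmdsize;
    uint32_t cryptoff;  /* file offset of the encrypted range */
    uint32_t cryptsize; /* size of the encrypted range */
    uint32_t cryptid;   /* 0 means "not encrypted" */
};

/*
 * file     : the encrypted Mach-O as read from disk
 * file_len : its length in bytes
 * image    : the same Mach-O as mapped (and decrypted) in memory
 * info_off : offset of the encryption load command inside `file`
 * Returns 0 on success, -1 if a range does not fit inside the file.
 */
int patch_decrypted(uint8_t *file, size_t file_len,
                    const uint8_t *image, size_t info_off) {
    struct encryption_info info;
    if (info_off + sizeof info > file_len) return -1;
    memcpy(&info, file + info_off, sizeof info);
    if ((uint64_t)info.cryptoff + info.cryptsize > file_len) return -1;

    /* Overwrite the encrypted bytes with the decrypted ones from memory. */
    memcpy(file + info.cryptoff, image + info.cryptoff, info.cryptsize);

    /* Mark the file as no longer encrypted. */
    info.cryptid = 0;
    memcpy(file + info_off, &info, sizeof info);
    return 0;
}
```

Clearing `cryptid` matters as much as copying the bytes: the loader would otherwise try to decrypt the already decrypted range.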
The method is clear. However, we need to run code in the same address space as the application. The details on how to do this can be found in [[Injections]]. Right now, there are several existing solutions:
- https://github.com/stefanesser/dumpdecrypted
- https://github.com/AloneMonkey/frida-ios-dump
- https://github.com/BishopFox/bfdecrypt
- https://github.com/KJCracks/Clutch
There are also improvements to this decryption technique. The first is calling Fairplay's `mremap_encrypted` to load only the encrypted section: https://github.com/JohnCoates/flexdecrypt
The second is using an exploit to read another process's memory space: https://github.com/DerekSelander/yacd. This method applies only to iOS 13 and above, but the good thing is that no jailbreak is needed.
Given the current state of Apple's platforms, Fairplay decryption is nowhere near mitigated. Fairplay decryption is crucial for most analysis, as the app can't be inspected while encrypted. As of now, we can decrypt apps using the above methods, at least until Apple hardens the process. But even then, we can still use devices on lower versions to decrypt.

249
content/osx/injection.md Normal file

@ -0,0 +1,249 @@
---
# Documentation: https://wowchemy.com/docs/managing-content/
title: "Injecting code into Mach-O"
subtitle: ""
summary: ""
authors: [luibo]
tags: [osx, iOS, macOS, dyld]
categories: [osx]
published: "2021-09-06"
lastmod: 2021-09-06T11:15:05+07:00
featured: false
draft: false
# Featured image
# To use, add an image named `featured.jpg/png` to your page's folder.
# Focal points: Smart, Center, TopLeft, Top, TopRight, Left, Right, BottomLeft, Bottom, BottomRight.
image:
caption: ""
focal_point: ""
preview_only: false
# Projects (optional).
# Associate this post with one or more of your projects.
# Simply enter your project's folder or file name without extension.
# E.g. `projects = ["internal-project"]` references `content/project/deep-learning/index.md`.
# Otherwise, set `projects = []`.
projects: ["osx", "binary-format"]
---
This article introduces the reader to some easy injection techniques that can be used to hijack the runtime of a Mach-O binary. Some techniques are easy to perform, some are possible thanks to third-party tooling, and some are purely theoretical.
## Before we start
Apple's loader loads and runs all initializer functions from dynamically linked libraries. Because of this, we can create functions that run before the main binary starts. With a little crafting, we can also make our function the very first one to run.
Start by making a dynamic library (`*.dylib`) containing the functions we wish to run at load time:
```c
struct ProgramVars {
    void* mh; // mach_header or mach_header64
    int* NXArgcPtr;
    const char*** NXArgvPtr;
    const char*** environPtr;
    const char** __prognamePtr;
};

__attribute__((constructor))
void foo(int argc, const char** argv, const char** envp, const char** apple, struct ProgramVars* pvars) {
    // code goes here
}
```
`__attribute__((constructor))` tells the compiler to place the function's address into `__mod_init_func`, a section whose entries are called by the loader when the binary is loaded.
With this library compiled, we can run `foo` before the main binary runs by using one of the methods below.
## DYLD_INSERT_LIBRARIES
Similar to `LD_PRELOAD` on Linux, the `DYLD_INSERT_LIBRARIES` environment variable is read by the loader, which loads the additional libraries listed in its value. This method is easy to use on macOS, but impossible on systems where we don't have access to a terminal, such as legacy iOS, tvOS, and watchOS.
## Adding load command
If we have a Mach-O binary, we can add another load command so that the loader finds and loads our library, and then calls the library's initializer functions.
In most cases, a Mach-O binary has spare space between the list of load commands and the contents. We add a load command in this empty space and fix up the header with the new `ncmds` and `sizeofcmds`.
Once everything is set, we re-sign the binary (on iOS/tvOS/watchOS) and install it. At launch, the loader loads our library and runs our functions before the main binary.
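The header fix-up can be sketched as below. This is a minimal sketch, assuming a thin 64-bit Mach-O in host byte order that is already loaded into a writable buffer; the structs mirror `mach_header_64` and `dylib_command` from `<mach-o/loader.h>`, while `add_dylib_command` and its `content_start` parameter (the file offset where the first section data begins) are hypothetical names chosen for illustration.

```c
#include <stdint.h>
#include <string.h>

#define LC_LOAD_DYLIB 0xc /* value from <mach-o/loader.h> */

struct mach_header_64 {
    uint32_t magic;
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
    uint32_t reserved;
};

struct dylib_command {
    uint32_t cmd;
    uint32_t cmdsize;
    uint32_t name_offset; /* offset of the path from the start of this command */
    uint32_t timestamp;
    uint32_t current_version;
    uint32_t compatibility_version;
};

/* Appends an LC_LOAD_DYLIB for `path` into the padding after the existing
 * load commands. Returns 0 on success, -1 if the padding is too small. */
int add_dylib_command(uint8_t *file, uint32_t content_start, const char *path) {
    struct mach_header_64 hdr;
    memcpy(&hdr, file, sizeof hdr);

    /* Command size: fixed part + path + NUL, rounded up to 8 bytes. */
    uint32_t raw = (uint32_t)(sizeof(struct dylib_command) + strlen(path) + 1);
    uint32_t cmdsize = (raw + 7u) & ~7u;
    uint32_t end_of_cmds = (uint32_t)sizeof hdr + hdr.sizeofcmds;
    if (end_of_cmds + cmdsize > content_start) return -1;

    struct dylib_command dc = {
        LC_LOAD_DYLIB, cmdsize, (uint32_t)sizeof(struct dylib_command),
        2, 0x10000, 0x10000 /* arbitrary timestamp and 1.0.0 versions */
    };
    memset(file + end_of_cmds, 0, cmdsize);
    memcpy(file + end_of_cmds, &dc, sizeof dc);
    memcpy(file + end_of_cmds + sizeof dc, path, strlen(path));

    /* Fix the header so the loader sees the new command. */
    hdr.ncmds += 1;
    hdr.sizeofcmds += cmdsize;
    memcpy(file, &hdr, sizeof hdr);
    return 0;
}
```

The sketch refuses to write when the padding is too small instead of shifting the contents, since moving section data would invalidate every file offset stored in the other load commands.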
This method can be extended to make our function run first, but that requires very careful crafting.
The loader loads libraries in the order they are declared in the main Mach-O binary. If we can move our library to the front, our functions become the very first to run.
Reordering the load commands is easy, but reordering alone won't work, because the opcodes for dynamic symbols encode the library ordinal. E.g. a symbol `printf` referencing the library with index `1` must now reference `2`, because we've pushed our library on top. It gets worse: `__stub_helper` indexes into the opcode bytestream, so if editing the bytestream shifts any offsets, binding fails.
### Fixing opcodes?
This section gives an in-depth analysis of this scenario. In the opcode bytestream, two opcodes encode the library index (we don't count the special indexes, which are set by a different opcode): `BIND_OPCODE_SET_DYLIB_ORDINAL_IMM` and `BIND_OPCODE_SET_DYLIB_ORDINAL_ULEB`. To keep names short, we call them `imm dylib` and `uleb dylib`, respectively.
An `imm dylib` opcode is 1 byte and applies to libraries whose index does not exceed 15 (0xf is the max).
A `uleb dylib` opcode is 2 bytes or more: the first byte is the opcode, and the remaining bytes are the index encoded in uleb128.
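The size difference between the two encodings can be sketched as a small encoder. The opcode values are the ones defined in `<mach-o/loader.h>`; the function name `encode_dylib_ordinal` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define BIND_OPCODE_SET_DYLIB_ORDINAL_IMM  0x10
#define BIND_OPCODE_SET_DYLIB_ORDINAL_ULEB 0x20

/* Writes the opcode that sets `ordinal` into `out` and returns its length. */
size_t encode_dylib_ordinal(uint32_t ordinal, uint8_t *out) {
    if (ordinal <= 0x0f) {
        /* imm dylib: opcode and ordinal share a single byte */
        out[0] = BIND_OPCODE_SET_DYLIB_ORDINAL_IMM | (uint8_t)ordinal;
        return 1;
    }
    /* uleb dylib: opcode byte followed by the ordinal in uleb128 */
    size_t n = 0;
    out[n++] = BIND_OPCODE_SET_DYLIB_ORDINAL_ULEB;
    do {
        uint8_t byte = ordinal & 0x7f;
        ordinal >>= 7;
        if (ordinal) byte |= 0x80; /* more bytes follow */
        out[n++] = byte;
    } while (ordinal);
    return n;
}
```

Ordinal 15 encodes to the single byte `0x1f`, while ordinal 16 needs the two bytes `0x20 0x10`, so every opcode after it moves by one byte.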
A problem occurs when an `imm dylib` with index 15 is incremented: the index becomes 16 and has to be encoded with `uleb dylib`, which is one byte longer and so breaks the offsets stored in `__stub_helper` for the other symbols. When this happens, there are several ways to resolve it. I haven't tested these solutions, but in theory they should work.
1. Fixing `__stub_helper`
Straightforward solution: we write the new offsets of the symbols in the opcode bytestream into `__stub_helper`. We know which stub points to which symbol before the edit; after editing, we just loop through each stub and update its offset.
2. Fixing it at runtime
This is harder and prone to crashing. Because we can inject our function before the main code runs, we can add our own resolver for these symbols. There are plenty of ways to do this, depending on how creative you are and how brave you are to tackle these solutions.
Remarks: `__stub_helper` can't be edited at runtime, but `__la_symbol_ptr`, which holds the address for each function (defaulting to its stub), can. We abuse this.
`__la_symbol_ptr` doesn't tell us which symbol is being called; however, we can match its entries with each stub's old index to identify the symbol.
```
__la_symbol_ptr:
stub_1
stub_2
stub_1:
load index_1
call bind
stub_2:
load index_2
call bind
;; index 1 is foo of libA
;; index 2 is bar of libB
;;
;; __la_symbol_ptr = [foo of libA, bar of libB]
```
- Simulate the loader
On load, we update the whole `__la_symbol_ptr` section with the symbols' addresses. We use Apple's API to find all loaded libraries and traverse each export trie to find the function addresses.
> similar to overwriting `__got`/`__plt` in pwn techniques
```c
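#include <mach-o/dyld.h> // _dyld_image_count, _dyld_get_image_header, _dyld_get_image_name
#include <stdint.h>
#include <string.h>

// Untested sketch. Symbol, export_trie and the three helpers declared below
// are hypothetical; only the _dyld_* calls are real Apple APIs.
struct Symbol {
    const char* name;
    const char* lib;
    void* address; // 0 until resolved
};
struct export_trie;
void* find_symbol_address(struct export_trie* exported, const char* name);
struct export_trie* get_export_trie(const void* header); // mach_header or mach_header64
struct Symbol* read_la_symbol_ptr(size_t* count);

void update_symbols_in_lib(struct Symbol* symbols, size_t count, const char* lib,
                           struct export_trie* exported) {
    for (size_t i = 0; i < count; i++) {
        if (strcmp(symbols[i].lib, lib) == 0) {
            symbols[i].address = find_symbol_address(exported, symbols[i].name);
        }
    }
}

// not tested, quick way to get la_symbol_ptr section pointer
// static volatile void* la_symbol_ptr __attribute__((section ("__DATA,__la_symbol_ptr"))) = { 0 };
void resolve(void) {
    size_t count = 0;
    struct Symbol* to_bind = read_la_symbol_ptr(&count);
    for (uint32_t i = 0; i < _dyld_image_count(); i++) {
        const void* header = _dyld_get_image_header(i);
        const char* lib = _dyld_get_image_name(i);
        struct export_trie* exported = get_export_trie(header);
        update_symbols_in_lib(to_bind, count, lib, exported);
    }
}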
```
- Hijack `dyld_stub_binder`
`dyld_stub_binder` holds the address of the loader's bind method. Conveniently, this symbol resides in `__got`/`__nl_symbol_ptr`, which is resolved when the binary is loaded.
When our function runs, we can overwrite this value with our custom function, which will then be called by the other stubs. Since we know the original index passed by the stubs, we just need to translate the old index to the new one and pass it to the original bind method. This seems easier to implement.
```c
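#include <stddef.h>

// Hypothetical helper, left unimplemented in this sketch:
// maps an index from before the edit to the index after the edit.
int get_new_index(int old_index);

void* find_original_bind(void) {
    // read __nl_symbol_ptr or __got
    // to find original dyld_stub_binder
    // should be the first one (iirc)
    return NULL; // placeholder
}

// original bind function receives two parameters,
// first is index
// second is cache of libraries (iirc)
void custom_bind(int old_index, void* param) {
    // resolved on first use: C does not allow a non-constant static initializer
    static void (*original_bind)(int, void*) = NULL;
    if (original_bind == NULL) {
        original_bind = (void (*)(int, void*))find_original_bind();
    }
    int new_index = get_new_index(old_index);
    original_bind(new_index, param);
}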
```
## Cycript
Probably the first injection framework on iOS, but development stopped in 2016. It was created by one of the most renowned iOS jailbreak developers, the creator of Cydia, Jay Freeman, commonly known as *saurik*.
The latest version of Cycript supports up to iOS 11. More information is publicly available on its [website](http://www.cycript.org/).
## Frida
Frida is well known for its injection ecosystem that works seamlessly across Android, Apple OSes, Windows, and Linux. Setting up an Apple device with Frida is easy, and instrumenting or hijacking code can be done just by writing a piece of JavaScript.
The following guide is provided only for iOS devices.
### Setup
The setup of Frida is different between non-jailbroken and jailbroken devices.
For jailbroken devices, a server must be installed and running. Frida (on PC/Mac) can then connect over the USB cable and ask the server to perform tasks such as querying system files, listing apps, starting an app, hooking a running app...
For non-jailbroken devices, if Frida < 12.7.12 is used, we must manually add the Frida dynamic library (FridaGadget) to the binary. The Frida documentation says that for Frida >= 12.7.12, FridaGadget is automatically injected, but I haven't tested this, and doubt that it works on iOS (due to code signing and environment restrictions).
### Inject then Hijack
Here is a simple script for reference. There are plenty more on the Internet.
```js
// normal attach to inject onEnter and onExit
// demo CCCrypt module
Interceptor.attach(
  Process.getModuleByName('libcommonCrypto.dylib').getExportByName('CCCrypt'),
  {
    onEnter(args) {
      // CCCrypt(op, alg, options, key, keyLength, iv, dataIn, ...)
      let algorithm = (function(algo) {
        if (algo === 0) return "AES128";
        if (algo === 1) return "DES";
        if (algo === 2) return "3DES";
        if (algo === 3) return "CAST";
        if (algo === 4) return "RC4";
        if (algo === 5) return "RC2";
        return "algo_" + algo;
      })(args[1].toInt32())
      console.log("CCCrypt using " + algorithm)
      console.log("CCCrypt key:")
      console.log(args[3].readByteArray(args[4].toInt32()))
      if (!args[5].isNull()) { // the iv is optional
        console.log("CCCrypt iv:")
        console.log(args[5].readByteArray(16))
      }
      // dataIn is printed as text; binary input will not decode cleanly
      console.log("CCCrypt => " + args[6].readUtf8String())
    }
  }
)

// inject on an address of a lib or main binary
let module_name = 'TargetModule' // placeholder: name of the lib or main binary
let module = Process.getModuleByName(module_name)
let offset = 0x1234 // reverse engineer
Interceptor.attach(module.base.add(offset), {
  onEnter() {
    // accessing registers
    // console.log("Calling x9 raw: " + this.context.x9)
  }
})
```
## bfinject
> Easy dylib injection for jailbroken 64-bit iOS 11.0 - 11.1.2. Compatible with Electra and LiberiOS jailbreaks
Update soon(tm)

93
content/osx/linker.md Normal file

@ -0,0 +1,93 @@
---
# Documentation: https://wowchemy.com/docs/managing-content/
title: "Mach-O linker information"
subtitle: ""
summary: ""
authors: [luibo]
tags: [osx, iOS, macOS, dyld]
categories: [osx]
published: "2021-09-06"
lastmod: 2021-09-06T11:15:02+07:00
featured: false
draft: false
# Featured image
# To use, add an image named `featured.jpg/png` to your page's folder.
# Focal points: Smart, Center, TopLeft, Top, TopRight, Left, Right, BottomLeft, Bottom, BottomRight.
image:
caption: ""
focal_point: ""
preview_only: false
# Projects (optional).
# Associate this post with one or more of your projects.
# Simply enter your project's folder or file name without extension.
# E.g. `projects = ["internal-project"]` references `content/project/deep-learning/index.md`.
# Otherwise, set `projects = []`.
projects: ["osx", "binary-format"]
---
Dynamic symbols in a Mach-O binary are stored as bytecode, and exported symbols are encoded as a prefix-`trie`. For dynamic symbols, Mach-O also has stub binding to resolve symbols, which plays the same role as the `__got` and `__plt` sections in ELF binaries.
## Dynamic symbols
The linker reads the symbol tables and performs binding when necessary. We start by explaining the bytecode and finish with the binding process.
There are 4 different bytecode arrays: `rebase`, `bind symbol`, `weak bind symbol`, and `lazy bind symbol`. All 4 arrays share the same byte-level encoding and are laid out contiguously in the binary (`rebase` has its own `REBASE_OPCODE_*` set, while the three bind arrays share the `BIND_OPCODE_*` set); they differ mainly in use case.
Each byte encodes an opcode and its parameter, `uint8_t v = opcode | imm`. Some opcodes require an additional `uleb128` number, which is read from the following bytes (at most 10 bytes for a 64-bit value, since each uleb128 byte carries 7 bits). Some opcodes require a string, which follows the opcode and is terminated by `0x00`.
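The two decoding primitives described above can be sketched as follows. The masks are the values of `BIND_OPCODE_MASK` and `BIND_IMMEDIATE_MASK` from `<mach-o/loader.h>`; the function names `read_uleb128` and `split_opcode` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define BIND_OPCODE_MASK    0xF0
#define BIND_IMMEDIATE_MASK 0x0F

/* Reads one uleb128 number starting at *pos and advances *pos past it. */
uint64_t read_uleb128(const uint8_t *buf, size_t len, size_t *pos) {
    uint64_t value = 0;
    unsigned shift = 0;
    while (*pos < len) {
        uint8_t byte = buf[(*pos)++];
        if (shift < 64) value |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
        if ((byte & 0x80) == 0) break; /* high bit clear: last byte */
    }
    return value;
}

/* Splits one bytecode byte into its opcode (high nibble) and immediate. */
void split_opcode(uint8_t v, uint8_t *opcode, uint8_t *imm) {
    *opcode = v & BIND_OPCODE_MASK;
    *imm = v & BIND_IMMEDIATE_MASK;
}
```

For instance, the byte `0x72` splits into opcode `0x70` with immediate `2`, and the bytes `0xE5 0x8E 0x26` decode to the number 624485.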
The opcodes read into a state, and the state mutates after every opcode read. In effect, the bytestream is a compressed table that we read row by row: every new row starts as a copy of the previous row and then has some columns updated.
Often the binary is loaded in memory at a random slide, due to ASLR (PIE). A number of address constants created at compile time are then no longer correct. When this happens, the loader reads the `rebase` opcodes and adds the slide to each of those address constants.
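Once the `rebase` opcodes have been decoded into a list of locations, the fix-up itself is a plain addition. A minimal sketch, assuming 64-bit pointers and a hypothetical `apply_slide` helper that receives the already decoded locations as offsets into the image:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Adds `slide` to the 64-bit address constant stored at each location. */
void apply_slide(uint8_t *image, const uint64_t *locations, size_t count,
                 uint64_t slide) {
    for (size_t i = 0; i < count; i++) {
        uint64_t value;
        memcpy(&value, image + locations[i], sizeof value); /* unaligned-safe read */
        value += slide;
        memcpy(image + locations[i], &value, sizeof value);
    }
}
```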
`bind symbol`, `weak bind symbol`, `lazy bind symbol` are decoded into a list of dynamic symbols. Each symbol has a `dylib ordinal`, `segment index`, `name`, and `address`. When a row is complete, the bind is performed: the loader looks up the `name` symbol in the library declared by the load command at index `dylib ordinal` (counting from 1) and writes the function's address at `address`. `dylib ordinal` has the special values 0, -1, -2 to indicate special dynamic libraries (the binary itself, the main executable, and flat lookup, respectively).
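The row-by-row decoding can be sketched with a small interpreter. This is a deliberately partial sketch: it handles only the six opcodes listed in the enum (values from `<mach-o/loader.h>`), treats `BIND_OPCODE_DONE` as a row separator the way the lazy bind stream uses it, and assumes 8-byte pointers. `struct bind_row` and `decode_binds` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

enum { /* values from <mach-o/loader.h> */
    BIND_OPCODE_DONE                          = 0x00,
    BIND_OPCODE_SET_DYLIB_ORDINAL_IMM         = 0x10,
    BIND_OPCODE_SET_DYLIB_ORDINAL_ULEB        = 0x20,
    BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM = 0x40,
    BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB   = 0x70,
    BIND_OPCODE_DO_BIND                       = 0x90
};

struct bind_row {
    int ordinal;      /* dylib ordinal */
    int segment;      /* segment index */
    uint64_t offset;  /* offset of the address holder inside the segment */
    const char *name; /* points into the opcode stream */
};

static uint64_t read_uleb128(const uint8_t *p, size_t len, size_t *i) {
    uint64_t value = 0;
    unsigned shift = 0;
    while (*i < len) {
        uint8_t byte = p[(*i)++];
        if (shift < 64) value |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
        if (!(byte & 0x80)) break;
    }
    return value;
}

/* Decodes up to max_rows rows and returns how many were emitted. */
size_t decode_binds(const uint8_t *p, size_t len,
                    struct bind_row *rows, size_t max_rows) {
    struct bind_row state = {0, 0, 0, ""};
    size_t i = 0, n = 0;
    while (i < len) {
        uint8_t opcode = p[i] & 0xF0, imm = p[i] & 0x0F;
        i++;
        switch (opcode) {
        case BIND_OPCODE_DONE:
            break; /* row separator in the lazy bind stream */
        case BIND_OPCODE_SET_DYLIB_ORDINAL_IMM:
            state.ordinal = imm;
            break;
        case BIND_OPCODE_SET_DYLIB_ORDINAL_ULEB:
            state.ordinal = (int)read_uleb128(p, len, &i);
            break;
        case BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM:
            state.name = (const char *)&p[i];
            while (i < len && p[i]) i++; /* skip the string */
            i++;                         /* and its terminator */
            break;
        case BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB:
            state.segment = imm;
            state.offset = read_uleb128(p, len, &i);
            break;
        case BIND_OPCODE_DO_BIND:
            if (n < max_rows) rows[n++] = state; /* emit a copy of the state */
            state.offset += 8;                   /* advance by one pointer */
            break;
        default:
            return n; /* opcode outside this sketch */
        }
    }
    return n;
}
```

Emitting a copy of `state` on every `DO_BIND` is what makes the stream behave like a compressed table: columns that are not touched between two binds carry over to the next row.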
### Binding process
The binding process is when a symbol's address is written into memory so that the original code can call it. This process exists because the symbols are undefined at compile time and only visible at runtime, and their addresses change on every run.
To resolve this issue, a Mach-O binary uses an indirect jump to the symbols. When the original code calls an imported symbol, `foo`, it actually calls a small function that redirects to the resolved address.
```asm
__text:
call foo_ ;; call foo, but with a holder
foo_:
load foo_addr_holder
call
foo_addr_holder:
0x000000
```
With the above scheme, the compiler can simply create a holder for the address and let the loader rewrite the address at runtime. One drawback of this scheme is that the loader must resolve all imported symbols' address holders up front, resulting in a longer startup time. Mach-O can also perform lazy binding, using the scheme below.
```asm
__text:
call foo_ ;; call foo lazy
foo_:
load foo_addr_holder
call
foo_addr_holder:
foo_addr_resolver ;; rewritten by loader after resolving
foo_addr_resolver:
load foo_opcode_start_index ;; just a number
call loader_symbol_resolver
```
For lazy bind symbols, the Mach-O has a resolver for each lazy symbol, which is called the first time the symbol is called. This function loads a number and calls the loader's resolver. When the loader's resolver finishes, the address holder of the lazy symbol is rewritten to contain the address of the symbol.
The number passed to the loader's resolver is the index into the lazy bind opcode stream at which the row of the corresponding symbol starts.
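The lazy scheme can be imitated in plain C with a function pointer standing in for the address holder. This is only an analogy of the mechanism, not what the loader actually executes; all names (`real_foo`, `foo_resolver`, `foo_addr_holder`, `call_foo`, `resolver_calls`) are made up for the illustration.

```c
#include <stddef.h>

typedef int (*fn_t)(int);

int resolver_calls = 0; /* counts how often the slow path runs */

/* Stands in for the imported symbol living in another library. */
int real_foo(int x) { return x + 1; }

int foo_resolver(int x);

/* Plays the role of the __la_symbol_ptr entry: it initially points
 * at the resolver, just like the holder defaults to its stub helper. */
fn_t foo_addr_holder = foo_resolver;

/* Plays the role of the __stub_helper entry plus the loader's resolver. */
int foo_resolver(int x) {
    resolver_calls++;
    foo_addr_holder = real_foo; /* the "loader" rewrites the holder */
    return real_foo(x);         /* then continues into the real symbol */
}

/* Plays the role of the stub: an indirect call through the holder. */
int call_foo(int x) { return foo_addr_holder(x); }
```

The first `call_foo` goes through `foo_resolver`; every later call jumps straight to `real_foo`, which is why the cost of resolving is paid only once per symbol.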
In Mach-O, the address holder sections are `__nl_symbol_ptr` and `__la_symbol_ptr` for non-lazy (first scheme) and lazy (second scheme) symbols, respectively. The resolver section is called `__stub_helper`. In Go-generated binaries, the non-lazy symbols section is named `__got`.
## Exported symbols
`exported symbols` are encoded as a prefix-`trie`, where each terminal node holds an exported symbol. A symbol can be Regular, Weak, Reexport, or Stub. A regular symbol has an address field, which is its offset from the start of the Mach-O. Parsing the trie is quite simple, but requires a little recursion. Apple also documents the encoding in the Mach-O headers.
For regular symbols, the offset stored is relative to the start of the Mach-O, so when searching for a function the loader can easily compute its address in memory. The trie also speeds up searching, since only the branch that matches the symbol has to be followed.
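A lookup that follows only the matching branch can be sketched as below (written as a loop rather than recursion). The node layout used here is the one described in `<mach-o/loader.h>`: a uleb128 terminal size, the terminal payload (flags, then the address for regular symbols), a one-byte child count, and for each child a NUL-terminated edge string followed by the uleb128 offset of the child node. The sketch assumes regular symbols only, and `trie_lookup` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint64_t read_uleb128(const uint8_t *p, size_t len, size_t *i) {
    uint64_t value = 0;
    unsigned shift = 0;
    while (*i < len) {
        uint8_t byte = p[(*i)++];
        if (shift < 64) value |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
        if (!(byte & 0x80)) break;
    }
    return value;
}

/* Returns 1 and stores the symbol's offset in *address when found, else 0. */
int trie_lookup(const uint8_t *trie, size_t len, const char *symbol,
                uint64_t *address) {
    size_t node = 0;
    while (node < len) {
        size_t i = node;
        uint64_t terminal_size = read_uleb128(trie, len, &i);
        if (*symbol == '\0') {
            /* Whole name consumed: this node must carry export info. */
            if (terminal_size == 0) return 0;
            (void)read_uleb128(trie, len, &i); /* flags */
            *address = read_uleb128(trie, len, &i);
            return 1;
        }
        i += terminal_size; /* skip the payload, we need the children */
        if (i >= len) return 0;
        uint8_t children = trie[i++];
        int matched = 0;
        for (uint8_t c = 0; c < children && i < len; c++) {
            const char *edge = (const char *)&trie[i];
            size_t edge_len = 0;
            while (i + edge_len < len && edge[edge_len]) edge_len++;
            i += edge_len + 1;
            uint64_t child = read_uleb128(trie, len, &i);
            if (edge_len > 0 && strncmp(symbol, edge, edge_len) == 0) {
                symbol += edge_len; /* consume the matched prefix */
                node = (size_t)child;
                matched = 1;
                break;
            }
        }
        if (!matched) return 0;
    }
    return 0;
}
```

Each step either consumes part of the symbol name or stops, so the cost of a lookup depends on the length of the name rather than on the number of exported symbols.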

89
content/osx/macho.md Normal file

@ -0,0 +1,89 @@
---
# Documentation: https://wowchemy.com/docs/managing-content/
title: "Overview of Mach-O binary"
subtitle: ""
summary: ""
authors: [luibo]
tags: [osx, iOS, macOS, dyld]
categories: [osx]
published: "2021-09-06"
lastmod: 2021-09-06T11:15:01+07:00
featured: false
draft: false
# Featured image
# To use, add an image named `featured.jpg/png` to your page's folder.
# Focal points: Smart, Center, TopLeft, Top, TopRight, Left, Right, BottomLeft, Bottom, BottomRight.
image:
caption: ""
focal_point: ""
preview_only: false
# Projects (optional).
# Associate this post with one or more of your projects.
# Simply enter your project's folder or file name without extension.
# E.g. `projects = ["internal-project"]` references `content/project/deep-learning/index.md`.
# Otherwise, set `projects = []`.
projects: ["osx", "binary-format"]
---
Mach-O is the binary format used by Apple for its systems. It contains assembled code, data, and other information, structured as a list of load commands, where each load command holds the necessary pointers to the contents.
## Header
At offset 0 lies a header structure, `struct mach_header`, containing general information about the binary.
```
struct mach_header {
    uint32_t magic;
    cpu_type_t cputype;
    cpu_subtype_t cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
};
```
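Valid `magic` values are `0xfeedface` for the 32-bit format and `0xfeedfacf` for the 64-bit format. These are the values as read in the file's own byte order; when the file has the opposite endianness of the reader, the bytes appear swapped (`0xcefaedfe` and `0xcffaedfe`).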
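A check of the four possible values can be sketched as follows. The constants are the `MH_MAGIC`, `MH_CIGAM`, `MH_MAGIC_64` and `MH_CIGAM_64` values from `<mach-o/loader.h>`; the enum and the function name `classify_magic` are hypothetical.

```c
#include <stdint.h>

enum macho_kind {
    MACHO_INVALID,
    MACHO_32,         /* 32-bit, same byte order as the reader */
    MACHO_64,         /* 64-bit, same byte order as the reader */
    MACHO_32_SWAPPED, /* 32-bit, opposite byte order */
    MACHO_64_SWAPPED  /* 64-bit, opposite byte order */
};

/* Classifies the first 4 bytes of a file, read as a host-order uint32_t. */
enum macho_kind classify_magic(uint32_t magic) {
    switch (magic) {
    case 0xfeedfaceu: return MACHO_32;
    case 0xfeedfacfu: return MACHO_64;
    case 0xcefaedfeu: return MACHO_32_SWAPPED;
    case 0xcffaedfeu: return MACHO_64_SWAPPED;
    default:          return MACHO_INVALID;
    }
}
```

A parser that sees one of the swapped kinds has to byte-swap every multi-byte field of the header and load commands before using it.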
`cputype` and `cpusubtype` declare which platform the binary can be loaded on (or which instruction set the file contains). The ones we will see most are x86, x86_64, arm64, and arm64e; while 32-bit ARM aka armv7 (armv7s, armv7a) exists, Apple dropped support for these platforms since the release of iPhone 6.
`filetype` denotes the type of binary, e.g. *executable*, *dynamic library*, *object file*.
`ncmds` and `sizeofcmds` declare the number of load commands and their total size in bytes. The size is required because load command types vary in size. `sizeofcmds` is also checked when the binary is loaded, and an error is thrown if it's incorrect.
`flags` is a bit mask of extra information, e.g. PIE.
## Load command
Each load command is a structure consisting of the command type `cmd`, the command size `cmdsize`, and the information specific to that command.
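Because every command starts with the same two fields, a parser can walk the list without understanding each command type. A minimal sketch, assuming a thin file in host byte order; `struct load_command` mirrors the one in `<mach-o/loader.h>`, while `count_commands` and its parameters are hypothetical (`header_size` is 28 for `mach_header` and 32 for `mach_header_64`).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct load_command {
    uint32_t cmd;     /* type of the command */
    uint32_t cmdsize; /* total size of the command, including these fields */
};

/* Counts the load commands whose type equals `wanted`.
 * Returns -1 if a command does not fit inside the file. */
int count_commands(const uint8_t *file, size_t file_len, uint32_t header_size,
                   uint32_t ncmds, uint32_t wanted) {
    size_t off = header_size;
    int count = 0;
    for (uint32_t i = 0; i < ncmds; i++) {
        struct load_command lc;
        if (off + sizeof lc > file_len) return -1;
        memcpy(&lc, file + off, sizeof lc);
        if (lc.cmdsize < sizeof lc || off + lc.cmdsize > file_len) return -1;
        if (lc.cmd == wanted) count++;
        off += lc.cmdsize; /* cmdsize is what lets us skip unknown commands */
    }
    return count;
}
```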
There are many types of load commands; however, we only focus on the `segment`, `dynamic library`, `symbols`, `fairplay`, and `codesignature` command types.
Segments are common in executable/library binaries. These point to the data where `.text` or `.data` reside. In Mach-O binaries, a segment load command is followed by a series of sections, each marking the start/end of its data. The common sections are: `__text`, `__cstring`, `__const`, `__got`, `__la_symbol_ptr`, `__mod_init_func`, `__data`, `__bss`. These sections can be named without any restrictions; however, compilers often name them by a rule of thumb. The attributes of each section are marked with the bit mask `flags`.
A unique segment with no sections is named `__LINKEDIT`. This segment points to the last part of the binary, which contains various information, including tables of symbols, tables of symbol names, the list of exported symbols, and the binary's signature.
Each dynamic library is registered through a load command containing the path to the library. The path can be either absolute or relative. Resolving an absolute path is straightforward. A relative path can take either of two forms: relative to the current directory, or **rpath**. Paths relative to the current directory are easy to understand: `./`, `../` and similar paths are valid in this case.
**rpath** is a little different: in short, the path starts with one of these variables: `@executable_path`, `@loader_path`, `@rpath`. `@executable_path` is replaced with the folder containing the executable, and `@loader_path` is replaced with the folder containing the binary that issues the load command (the one loading the library). `@rpath` is resolved using the `rpath` load commands.
A Mach-O binary can have many load commands denoting an `rpath`; each item must be an absolute path, a relative path, or a path using `@executable_path`, `@loader_path` or `@rpath`. It is unclear whether `rpath` can be stacked, but as a rule of thumb, we should not use `@rpath` in an rpath load command. A common rpath often used by Apple is `@executable_path/Frameworks`, which can be seen in iPhone/iPad application binaries compiled using Xcode.
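The substitution of the two simple variables can be sketched as below. This is a minimal sketch that leaves `@rpath` untouched (resolving it means trying every `rpath` entry in turn); `expand_path` and its `exe_dir`/`loader_dir` inputs are hypothetical names, and the directories in the usage note are made-up examples.

```c
#include <stdio.h>
#include <string.h>

/* Expands a leading @executable_path or @loader_path in `path`.
 * Returns 0 on success, -1 if the result does not fit in `out`. */
int expand_path(const char *path, const char *exe_dir, const char *loader_dir,
                char *out, size_t out_len) {
    static const char exe_var[] = "@executable_path";
    static const char ldr_var[] = "@loader_path";
    const char *base = NULL;
    size_t skip = 0;

    if (strncmp(path, exe_var, sizeof exe_var - 1) == 0) {
        base = exe_dir;
        skip = sizeof exe_var - 1;
    } else if (strncmp(path, ldr_var, sizeof ldr_var - 1) == 0) {
        base = loader_dir;
        skip = sizeof ldr_var - 1;
    }

    /* Paths without a variable are copied through unchanged. */
    int n = base ? snprintf(out, out_len, "%s%s", base, path + skip)
                 : snprintf(out, out_len, "%s", path);
    return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
}
```

With `exe_dir` set to `/Applications/Demo.app`, the path `@executable_path/Frameworks/libA.dylib` expands to `/Applications/Demo.app/Frameworks/libA.dylib`.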
Fairplay encryption is a mechanism designed by Apple to encrypt the app content with the device private key, such that you cannot run the app on another machine. The Mach-O binary always has a load command pointing to the start and end of the encrypted section, along with the encryption status.
Due to Apple's design of Fairplay, we can't recover the decryption key. However, we can actively dump the binary from memory, as it must be decrypted before running. Another method involves using Apple's mmap for Fairplay-encrypted regions. These are discussed in [[Fairplay]].
Codesignature is present on signed binaries, produced by `codesign` with a `distribution` or `development` key. The section tells us much information about the signer and the hashes. The signature is encoded as PKCS#7/CMS SignedData in ASN.1 BER (X.690). It also contains the list of certificates in X.509 format, and the signature digest. Currently Apple is using RSA to sign its binaries.
The binary must be signed with a certificate chain rooted at the Apple CA, otherwise Apple devices reject installation. Apps distributed through the App Store are also signed by the App Store and device distribution certificates. For self-signed binaries, the Apple CA is still the root certificate, while the child is a `developer` certificate.
Symbols are encoded as a series of bytecode, and a load command marks the regions of the symbols. This command registers the placement of `non lazy`, `lazy`, and `exported` symbols. `non lazy` symbols are searched and written into the `got` table when the binary is loaded, `lazy` symbols are resolved through `plt`-like stubs, and `export` symbols are indexes/addresses of function starts.
`non lazy` and `lazy` symbols are encoded as **bind** opcodes; `export` symbols are encoded as a prefix-`trie`. More details about these are in [[Linker Info]].
The above paragraphs describe the current state of Mach-O symbol encoding. However, this was not always the case. A few years back (I don't know exactly when), binaries had a list of symbols and dynamic symbols in separate commands. Thus the newer version of Mach-O has a command ID, `LC_DYLD_INFO_ONLY`, which shows that it should not be used with the legacy lists anymore. The loader crashes if this command is used with a non-empty list of (dynamic) symbols.
The Mach-O related structures can be found in Apple's `cctools` at `include/mach-o/loader.h`.