Compare commits

3 Commits

Author SHA1 Message Date
5e8ad0cc8b add new post 2024-09-04 23:52:30 +07:00
3fabc86b32 update elm-pages 2024-09-04 23:52:03 +07:00
5d91188cf6 change username on server 2024-09-04 23:51:45 +07:00
4 changed files with 2496 additions and 916 deletions

@@ -0,0 +1,38 @@
---
title: "Rebuilding binaries (easy) with ARMv8"
subtitle: ""
summary: ""
tags: []
categories: []
published: "2024-09-04"
featured: false
draft: false
---
The ARMv8 architecture and its instruction set are simple to understand and very compact. Every instruction in the 64-bit (A64) instruction set is 4 bytes long, and some special instructions can be used to calculate addresses efficiently. These properties allow us to define a complex model for rebuilding application binaries.
## What is binary rebuilding and why?
Binary rebuilding is **my own term** for modifying a binary by breaking it down into its components and then reassembling them. Readers might be more familiar with techniques such as decompilation or lifting, and may wonder why I opt for a different term. So let's discuss what decompilation and lifting are, and why binary rebuilding is different.
Decompilation is a complex process that ultimately tries to convert the compiled binary back into a source code representation. Because the process can be seen as the reverse of compilation, people call it de-compilation. However, because of the many optimization passes applied during compilation, the original source usually cannot be recovered faithfully, which makes decompilation a less effective method for modifying a binary.
Lifting describes the process of converting the binary's assembly into a higher-level representation, but not all the way to source code. This representation expresses the same program, allows for modifications, and can be compiled back into a binary. Lifting is often the more interesting choice because the lifted representation is, most of the time, more expressive than assembly.
So how is binary rebuilding different? The two methods described above do not work on assembly directly, mainly because assembly has a low density of information: a given piece of assembly code does not carry enough context about the function or module it belongs to. However, assembly combined with the binary format gives more information than one would think. Binary rebuilding aims to use the assembly together with the binary format as a binary modification technique: break the binary into smaller components, then join them again.
## Binary executable format
The operating system can load CPU instructions and execute them, but the instructions must come in a well-structured file that tells the operating system how to load them into memory and which dynamic libraries they need. This well-structured file format is often called an executable binary format. Any operating system can devise its own format; to date, the commonly used formats are PE (on Windows), ELF (on Linux), and Mach-O (on Apple operating systems).
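
As a small illustration of how recognizable these formats are, the sketch below guesses the format of a file from its first bytes. The function name `detect_format` is made up for this post, and the check is deliberately shallow: it only looks at the leading magic bytes (`\x7fELF` for ELF, the `MZ` DOS header that PE files start with, and the thin Mach-O magics in both byte orders). Fat/universal Mach-O files are left out.

```python
# Leading bytes of thin Mach-O files: 32-bit and 64-bit, both byte orders.
MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce", b"\xce\xfa\xed\xfe",  # 32-bit
    b"\xfe\xed\xfa\xcf", b"\xcf\xfa\xed\xfe",  # 64-bit
}


def detect_format(header: bytes) -> str:
    """Guess the executable format from the first bytes of a file."""
    if header[:4] == b"\x7fELF":
        return "ELF"
    if header[:2] == b"MZ":  # DOS header found at the start of PE files
        return "PE"
    if header[:4] in MACHO_MAGICS:
        return "Mach-O"
    return "unknown"
```

A real tool would go on to parse the headers behind the magic bytes; this only shows that the outermost layer of each format is easy to tell apart.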
Executable binary formats are defined to store the code and data used by the program, along with the information the operating system needs to prepare the binary for execution (this is the loader's job, but I don't want to make this post longer). In assembly, a reference to code or data must go through either its direct address or its relative address. For instance, a call to a function `foo` must be either `call addr-of-foo` or `call offset-below-x-bytes`. The same idea applies to data access, although sometimes with a relocation (another common aspect of executable binary formats, not discussed in detail here).
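
The difference between the two kinds of reference can be sketched with plain arithmetic. The addresses and helper names below are invented for illustration, and the offset is assumed to be measured from the address of the referencing instruction itself (this is how A64 branches behave; other architectures may measure from the next instruction).

```python
def relative_offset(site: int, target: int) -> int:
    """Offset stored in a relative reference located at `site`."""
    return target - site


def resolve(site: int, offset: int) -> int:
    """Address that a relative reference at `site` actually points to."""
    return site + offset


call_site = 0x1000  # hypothetical address of the call instruction
foo = 0x1400        # hypothetical address of foo

offset = relative_offset(call_site, foo)

# Suppose a modification pushes foo down by 0x20 bytes.
moved_foo = foo + 0x20

# The old offset still resolves to the old location of foo ...
stale_target = resolve(call_site, offset)

# ... so the reference has to be recomputed for the new layout.
fixed_offset = relative_offset(call_site, moved_foo)
```

A direct address has the same problem in a different shape: `call addr-of-foo` would have to be rewritten to the new address of `foo`. Either way, every reference that crosses a modified region needs attention.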
## Rebuilding Binary
At a high level, rebuilding a binary should be simple. Instructions, data, and imported modules are destructured from the binary format. Modifications are made on this destructured form, which is then structured back into the binary format. It might sound easy, but working with this destructured form is very complex, as references must be handled very carefully. But why do we need to do this? What are the advantages compared to decompilation and lifting?
Rebuilding the binary in this form allows one to do anything, in theory. Code and data can be moved up and down, code and (read-only) data can be merged together, imported functions can be changed, data can be changed, and code can be extended, trimmed, or modified.
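
To make the idea more concrete, here is a minimal sketch of what a destructured form could look like. It is a toy model and not an actual implementation: the `Item` class, the labels, and the sizes are all hypothetical. The point is only that references are kept symbolic (by label) while the binary is taken apart, and turned back into offsets once the final layout is known.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Item:
    """One destructured component: an instruction or a chunk of data."""
    size: int                     # bytes it occupies once laid out
    label: Optional[str] = None   # name that other items can refer to
    target: Optional[str] = None  # label this item refers to, if any


def layout(items: List[Item], base: int) -> Tuple[List[Tuple[int, Item]], Dict[str, int]]:
    """Place items back to back from `base`; return placements and label addresses."""
    placed, addresses, cursor = [], {}, base
    for item in items:
        placed.append((cursor, item))
        if item.label is not None:
            addresses[item.label] = cursor
        cursor += item.size
    return placed, addresses


def resolve(items: List[Item], base: int) -> List[Tuple[int, str, int]]:
    """Return (address, target label, relative offset) for every reference."""
    placed, addresses = layout(items, base)
    return [
        (addr, item.target, addresses[item.target] - addr)
        for addr, item in placed
        if item.target is not None
    ]


program = [
    Item(size=4, label="main", target="foo"),  # a call to foo
    Item(size=4),
    Item(size=4, label="foo"),
    Item(size=16, label="message"),            # read-only data
]
before = resolve(program, base=0x1000)

# Extend the code: two extra 4-byte instructions in front of foo.
patched = program[:2] + [Item(size=4), Item(size=4)] + program[2:]
after = resolve(patched, base=0x1000)
```

In this toy layout the call in `main` reaches `foo` with an offset of 8 bytes before the change and 16 bytes after it, without anyone editing the call by hand. A real binary is much harder, because the references first have to be discovered before they can be made symbolic.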
## Why does ARMv8 make this easier?
ARMv8 instructions are 4 bytes each, which makes the assembly easy to patch, extend, or trim: when a modification is introduced, we can calculate ahead of time how many bytes are affected and adjust the space accordingly. ARM also has some interesting instructions, such as `adr` and `adrp`, that help with addressing.
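
The sketch below shows why these instructions are convenient to work with. It decodes the address produced by an `adr` or `adrp` instruction and encodes a `bl` (branch with link) to a given target, following the A64 encodings as I understand them: `adr` adds a signed 21-bit immediate to the instruction's own address, `adrp` adds the immediate shifted left by 12 to the 4 KiB page of that address, and `bl` stores a signed 26-bit offset counted in 4-byte instructions. The function names and the addresses in the usage lines are made up for this post.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def decode_adr_target(word: int, pc: int) -> int:
    """Address computed by the adr/adrp instruction `word` located at `pc`."""
    if (word & 0x1F000000) != 0x10000000:
        raise ValueError("not an adr/adrp instruction")
    immlo = (word >> 29) & 0x3
    immhi = (word >> 5) & 0x7FFFF
    imm = sign_extend((immhi << 2) | immlo, 21)
    if word >> 31:  # top bit set: adrp, which works on 4 KiB pages
        return (pc & ~0xFFF) + (imm << 12)
    return pc + imm  # adr, which works on bytes


def encode_bl(site: int, target: int) -> int:
    """Encode `bl target` for an instruction located at `site`."""
    offset = target - site
    if offset % 4 != 0:
        raise ValueError("branch target must be 4-byte aligned")
    if not -(1 << 27) <= offset < (1 << 27):
        raise ValueError("target out of range for bl")
    return 0x94000000 | ((offset >> 2) & 0x3FFFFFF)


# Hypothetical usage: an adrp one page ahead, and a bl two instructions ahead.
page = decode_adr_target(0xB0000000, pc=0x400120)
call = encode_bl(site=0x1000, target=0x1008)
```

Because every instruction has the same size, inserting `n` instructions shifts everything after them by exactly `4 * n` bytes, so the new offset for each `bl`, `adr`, or `adrp` can be calculated before anything is written back.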

@@ -11,11 +11,11 @@
"avh4/elm-color": "1.0.0", "avh4/elm-color": "1.0.0",
"danfishgold/base64-bytes": "1.1.0", "danfishgold/base64-bytes": "1.1.0",
"danyx23/elm-mimetype": "4.0.1", "danyx23/elm-mimetype": "4.0.1",
"dillonkearns/elm-bcp47-language-tag": "1.0.1", "dillonkearns/elm-bcp47-language-tag": "2.0.0",
"dillonkearns/elm-form": "3.0.0", "dillonkearns/elm-form": "3.0.1",
"dillonkearns/elm-markdown": "7.0.1", "dillonkearns/elm-markdown": "7.0.1",
"dillonkearns/elm-oembed": "1.0.0", "dillonkearns/elm-oembed": "1.0.0",
"dillonkearns/elm-pages": "10.0.1", "dillonkearns/elm-pages": "10.1.0",
"elm/browser": "1.0.2", "elm/browser": "1.0.2",
"elm/bytes": "1.0.8", "elm/bytes": "1.0.8",
"elm/core": "1.0.5", "elm/core": "1.0.5",
@@ -32,16 +32,16 @@
"elm-community/list-extra": "8.7.0", "elm-community/list-extra": "8.7.0",
"elm-community/result-extra": "2.4.0", "elm-community/result-extra": "2.4.0",
"jluckyiv/elm-utc-date-strings": "1.0.0", "jluckyiv/elm-utc-date-strings": "1.0.0",
"justinmimbs/date": "4.0.1", "justinmimbs/date": "4.1.0",
"matheus23/elm-default-tailwind-modules": "4.0.1", "matheus23/elm-default-tailwind-modules": "4.0.1",
"mdgriffith/elm-codegen": "3.0.0", "mdgriffith/elm-codegen": "4.2.2",
"miniBill/elm-codec": "2.0.0", "miniBill/elm-codec": "2.1.0",
"noahzgordon/elm-color-extra": "1.0.2", "noahzgordon/elm-color-extra": "1.0.2",
"pablohirafuji/elm-syntax-highlight": "3.5.0", "pablohirafuji/elm-syntax-highlight": "3.5.0",
"robinheghan/fnv1a": "1.0.0", "robinheghan/fnv1a": "1.0.0",
"rtfeldman/elm-css": "18.0.0", "rtfeldman/elm-css": "18.0.0",
"the-sett/elm-syntax-dsl": "6.0.2", "the-sett/elm-syntax-dsl": "6.0.3",
"turboMaCk/non-empty-list-alias": "1.3.1", "turboMaCk/non-empty-list-alias": "1.4.0",
"vito/elm-ansi": "10.0.1" "vito/elm-ansi": "10.0.1"
}, },
"indirect": { "indirect": {
@@ -54,13 +54,13 @@
"elm-community/maybe-extra": "5.3.0", "elm-community/maybe-extra": "5.3.0",
"fredcy/elm-parseint": "2.0.1", "fredcy/elm-parseint": "2.0.1",
"matheus23/elm-tailwind-modules-base": "1.0.0", "matheus23/elm-tailwind-modules-base": "1.0.0",
"miniBill/elm-unicode": "1.0.3", "miniBill/elm-unicode": "1.1.1",
"robinheghan/murmur3": "1.0.0", "robinheghan/murmur3": "1.0.0",
"rtfeldman/elm-hex": "1.0.0", "rtfeldman/elm-hex": "1.0.0",
"rtfeldman/elm-iso8601-date-strings": "1.1.4", "rtfeldman/elm-iso8601-date-strings": "1.1.4",
"stil4m/elm-syntax": "7.2.9", "stil4m/elm-syntax": "7.3.5",
"stil4m/structured-writer": "1.0.3", "stil4m/structured-writer": "1.0.3",
"the-sett/elm-pretty-printer": "3.0.0" "the-sett/elm-pretty-printer": "3.1.0"
} }
}, },
"test-dependencies": { "test-dependencies": {

3350
package-lock.json generated

File diff suppressed because it is too large

@@ -5,7 +5,7 @@
"postinstall": "elm-tooling install", "postinstall": "elm-tooling install",
"start": "elm-pages dev --port 8080", "start": "elm-pages dev --port 8080",
"build": "elm-pages build", "build": "elm-pages build",
"publish": "npm run build && scp -r dist/* luibo@nganhkhoa.com:~/website/" "publish": "npm run build && scp -r dist/* nganhkhoa@nganhkhoa.com:~/website/"
}, },
"devDependencies": { "devDependencies": {
"elm-codegen": "^0.3.0", "elm-codegen": "^0.3.0",