2017-10-11 17:22:24 +07:00
# 5.3 angr 二进制自动化分析
2017-09-12 18:57:05 +07:00
2017-12-02 22:38:19 +07:00
- [安装 ](#安装 )
- [使用 angr ](#使用-angr )
2017-12-09 09:35:26 +07:00
- [入门 ](#入门 )
- [加载二进制文件 ](#加载二进制文件 )
2017-12-02 22:38:19 +07:00
- [angr 在 CTF 中的运用 ](#angr-在-ctf-中的运用 )
- [参考资料 ](#参考资料 )
2017-09-12 18:57:05 +07:00
2017-12-02 22:38:19 +07:00
[angr ](https://github.com/angr/angr ) 是一个多架构的二进制分析平台,具备对二进制文件的动态符号执行能力和多种静态分析能力。在近几年的 CTF 中也大有用途。
2017-09-12 18:57:05 +07:00
2017-12-02 22:38:19 +07:00
## 安装
在 Ubuntu 上,首先我们应该安装所有的编译所需要的依赖环境:
2017-09-12 18:57:05 +07:00
```shell
2017-12-02 22:38:19 +07:00
$ sudo apt install python-dev libffi-dev build-essential virtualenvwrapper
2017-09-12 18:57:05 +07:00
```
2017-12-02 22:38:19 +07:00
强烈建议在虚拟环境中安装 angr, 因为有几个 angr 的依赖( 比如z3) 是从他们的原始库中 fork 而来,如果你已经安装了 z3,那么你肯定不希望 angr 的依赖覆盖掉官方的共享库。
2017-09-12 18:57:05 +07:00
2017-12-02 22:38:19 +07:00
对于大多数 *nix系统, 只需要 `mkvirtualenv angr && pip install angr` 安装就好了。
2017-09-12 18:57:05 +07:00
2017-12-08 11:06:55 +07:00
如果这样安装失败的话,那么你可以按照下面的顺序从 angr 的官方仓库安装:
2017-09-12 18:57:05 +07:00
```text
1. claripy
2. archinfo
3. pyvex
4. cle
5. angr
```
2017-12-08 11:06:55 +07:00
如:
2017-09-12 18:57:05 +07:00
```shell
2017-12-02 22:38:19 +07:00
$ git clone https://github.com/angr/claripy
$ cd claripy
$ sudo pip install -r requirements.txt
$ sudo python setup.py build
$ sudo python setup.py install
2017-09-12 18:57:05 +07:00
```
2017-12-08 11:06:55 +07:00
其他几个库也是一样的。
安装过程中可能会有一些奇怪的错误,可以到官方文档中查看。
2017-09-12 18:57:05 +07:00
2017-12-08 11:06:55 +07:00
## 使用 angr
#### 基础功能
使用 angr 的第一步是新建一个工程,几乎所有的操作都是围绕这个工程展开的:
2017-09-12 18:57:05 +07:00
```python
>>> import angr
2017-12-08 11:06:55 +07:00
>>> proj = angr.Project('/bin/true')
WARNING | 2017-12-08 10:46:58,836 | cle.loader | The main binary is a position-independent executable. It is being loaded with a base address of 0x400000.
```
这样就得到了二进制文件的各种信息,如:
```python
>>> proj.filename
'/bin/true'
>>> proj.arch
< Arch AMD64 ( LE ) >
>>> hex(proj.entry)
'0x4013b0'
2017-09-12 18:57:05 +07:00
```
2017-12-08 11:06:55 +07:00
程序加载时会将二进制文件和共享库映射到虚拟地址中, CLE 模块就是用来处理这些东西的。
```python
>>> proj.loader
< Loaded true , maps [ 0x400000:0x5008000 ] >
2017-09-12 18:57:05 +07:00
```
2017-12-08 11:06:55 +07:00
所有对象文件如下,其中二进制文件是 main object:
```
>>> proj.loader.all_objects
[< ELF Object true , maps [ 0x400000:0x60721f ] > , < ELF Object libc-2 . 26 . so , maps [ 0x1000000:0x13b78cf ] > , < ELF Object ld-2 . 26 . so , maps [ 0x2000000:0x22260f7 ] > , < ELFTLSObject Object cle # # tls , maps [ 0x3000000:0x300d010 ] > , < ExternObject Object cle # # externs , maps [ 0x4000000:0x4008000 ] > , < KernelObject Object cle # # kernel , maps [ 0x5000000:0x5008000 ] > ]
>>> proj.loader.main_object
< ELF Object true , maps [ 0x400000:0x60721f ] >
>>> proj.loader.main_object.pic
True
```
通常我们在创建工程时选择关闭 `auto_load_libs` 以避免 angr 加载共享库:
```
>>> p = angr.Project('/bin/true', auto_load_libs=False)
WARNING | 2017-12-08 11:09:28,629 | cle.loader | The main binary is a position-independent executable. It is being loaded with a base address of 0x400000.
>>> p.loader.all_objects
[< ELF Object true , maps [ 0x400000:0x60721f ] > , < ExternObject Object cle # # externs , maps [ 0x1000000:0x1008000 ] > , < KernelObject Object cle # # kernel , maps [ 0x2000000:0x2008000 ] > , < ELFTLSObject Object cle # # tls , maps [ 0x3000000:0x300d010 ] > ]
2017-09-12 18:57:05 +07:00
```
2017-12-08 11:06:55 +07:00
`project.factory` 提供了很多类对二进制文件进行分析,它提供了几个方便的构造函数。
2017-09-12 18:57:05 +07:00
2017-12-08 11:06:55 +07:00
`project.factory.block()` 用于从给定地址解析一个 basic block:
```python
>>> block = proj.factory.block(proj.entry) # 从程序头开始解析一个 basic block
>>> block
< Block for 0x4013b0 , 42 bytes >
>>> block.pp() # pretty-print, 即打印出反汇编代码
0x4013b0: xor ebp, ebp
0x4013b2: mov r9, rdx
0x4013b5: pop rsi
0x4013b6: mov rdx, rsp
0x4013b9: and rsp, 0xfffffffffffffff0
0x4013bd: push rax
0x4013be: push rsp
0x4013bf: lea r8, qword ptr [rip + 0x32ca]
0x4013c6: lea rcx, qword ptr [rip + 0x3253]
0x4013cd: lea rdi, qword ptr [rip - 0xe4]
0x4013d4: call qword ptr [rip + 0x205b26]
>>> block.instructions # 指令数量
11
>>> block.instruction_addrs # 指令地址
[4199344L, 4199346L, 4199349L, 4199350L, 4199353L, 4199357L, 4199358L, 4199359L, 4199366L, 4199373L, 4199380L]
```
另外,还可以将 block 对象转换成其他形式:
```python
>>> block.capstone
< CapstoneBlock for 0x4013b0 >
>>> block.capstone.pp()
>>>
>>> block.vex
< pyvex.block.IRSB object at 0x7fe526b98670 >
>>> block.vex.pp()
```
程序的执行需要初始化一个 `SimState` 对象:
```python
>>> state = proj.factory.entry_state()
>>> state
< SimState @ 0x4013b0 >
```
该对象包含了程序的内存、寄存器、文件系统数据等:
```python
>>> state.regs.rip
< BV64 0x4013b0 >
>>> state.regs.rsp
< BV64 0x7fffffffffeff98 >
>>> state.regs.rdi
< BV64 reg_48_0_64 { UNINITIALIZED } > # 符号变量,它是符号执行的基础
>>> state.mem[proj.entry].int.resolved
< BV32 0x8949ed31 >
```
这里的 BV, 即 bitvectors, 用于表示 angr 里的 CPU 数据。下面是 python int 和 bitvectors 之间的转换:
```python
>>> bv = state.solver.BVV(0x1234, 32)
>>> bv
< BV32 0x1234 >
>>> hex(state.solver.eval(bv))
'0x1234'
>>> bv = state.solver.BVV(0x1234, 64)
>>> bv
< BV64 0x1234 >
>>> hex(state.solver.eval(bv))
'0x1234L'
```
使用 bitvectors 来设置寄存器和内存的值,当直接传入 python int 时, angr 会自动将其转换成 bitvectors:
```python
>>> state.regs.rsi = state.solver.BVV(3, 64)
>>> state.regs.rsi
< BV64 0x3 >
>>> state.mem[0x1000].long = 4
>>> state.mem[0x1000].long.resolved # .resolved 获取 bitvectors
< BV64 0x4 >
>>> state.mem[0x1000].long.concrete # .concrete 获得 python int
4L
```
初始化的 state 可以经过模拟执行得到一系列的 states, simulation 管理器的作用就是对这些 states 进行管理:
```python
>>> simgr = proj.factory.simulation_manager(state)
>>> simgr
< SimulationManager with 1 active >
>>> simgr.active
[< SimState @ 0x4013b0 > ]
>>> simgr.step() # 模拟一个 basic block 的执行
< SimulationManager with 1 active >
>>> simgr.active # 模拟状态被更新
[< SimState @ 0x1020e80 > ]
>>> simgr.active[0].regs.rip # active[0] 是当前 state
< BV64 0x404620 >
>>> state.regs.rip # 但原始的 state 没有变
< BV64 0x4013b0 >
```
`project.analyses` 提供了大量函数用于程序分析。
```python
>>> cfg = p.analyses.CFGFast() # 得到 control-flow graph
>>> cfg
< CFGFast Analysis Result at 0x7f4626f15090 >
>>> cfg.graph
< networkx.classes.digraph.DiGraph object at 0x7f462316ef90 > # 详细内容请查看 networkx
>>> len(cfg.graph.nodes())
937
>>> entry_node = cfg.get_any_node(proj.entry) # 得到给定地址的节点
>>> entry_node
< CFGNode 0x4013b0 [ 42 ] >
>>> len(list(cfg.graph.successors(entry_node)))
2
```
如果要想画出图来,还需要安装 matplotlib, Tkinter 等。
```python
>>> import networkx as nx
>>> import matplotlib.pyplot as plt
>>> nx.draw(cfg.graph) # 画图
>>> plt.show() # 显示
>>> plt.savefig('temp.png') # 保存
```
2017-09-12 18:57:05 +07:00
2017-12-09 09:35:26 +07:00
#### 加载二进制文件
angr 的二进制加载模块称为 CLE。主类为 `cle.loader.Loader` ,它导入所有的对象文件并导出一个进程内存的抽象。类 `cle.backends` 是加载器的后端,根据二进制文件类型区分为 `cle.backends.elf` 、`cle.backends.pe`、`cle.backends.macho` 等。
加载对象文件和细分类型如下:
```python
>>> proj.loader.all_objects # 所有对象文件
[< ELF Object true , maps [ 0x400000:0x60721f ] > , < ELF Object libc-2 . 26 . so , maps [ 0x1000000:0x13b78cf ] > , < ELF Object ld-2 . 26 . so , maps [ 0x2000000:0x22260f7 ] > , < ELFTLSObject Object cle # # tls , maps [ 0x3000000:0x300d010 ] > , < ExternObject Object cle # # externs , maps [ 0x4000000:0x4008000 ] > , < KernelObject Object cle # # kernel , maps [ 0x5000000:0x5008000 ] > ]
```
- `proj.loader.main_object` :主对象文件
- `proj.loader.shared_objects` :共享对象文件
- `proj.loader.extern_object` :外部对象文件
- `proj.loader.all_elf_object` :所有 elf 对象文件
- `proj.loader.kernel_object` :内核对象文件
通过对这些对象文件进行操作,可以解析出相关信息:
```python
>>> obj = proj.loader.main_object
>>> hex(obj.entry) # 入口地址
'0x4013b0'
>>> hex(obj.min_addr), hex(obj.max_addr) # 起始地址和结束地址
('0x400000', '0x60721f')
>>> obj.segments # segments
< Regions: [ < ELFSegment offset = 0x0, flags = 0x5, filesize = 0x6094, vaddr = 0x400000, memsize = 0x6094 > , < ELFSegment offset = 0x6c10, flags = 0x6, filesize = 0x470, vaddr = 0x606c10, memsize = 0x610 > ]>
>>> obj.sections # sections
< Regions: [ < Unnamed | offset 0x0 , vaddr 0x400000 , size 0x0 > , < .interp | offset 0x238 , vaddr 0x400238 , size 0x1c > , < .note.ABI-tag | offset 0x254 , vaddr 0x400254 , size 0x20 > ,...etc
```
根据需要解析我们需要的信息:
```python
>>> obj.find_segment_containing(obj.entry) # 包含给定地址的 segments
< ELFSegment offset = 0x0, flags = 0x5, filesize = 0x6094, vaddr = 0x400000, memsize = 0x6094 >
>>> obj.find_section_containing(obj.entry) # 包含给定地址的 sections
< .text | offset 0x12f0 , vaddr 0x4012f0 , size 0x33c9 >
```
2017-09-12 18:57:05 +07:00
2017-12-02 22:38:19 +07:00
## angr 在 CTF 中的运用
#### re DefcampCTF2015 entry_language
这是一题标准的密码验证题,输入一个字符串,程序验证对误。
```
$ file entry_language
defcamp_r100: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.24, BuildID[sha1]=0f464824cc8ee321ef9a80a799c70b1b6aec8168, stripped
```
```
$ ./entry_language
Enter the password: ABCD
Incorrect password!
```
为了与 angr 的自动化做对比,我们先使用传统的方法,逆向算法求解,`main` 函数和验证函数 `fcn.004006fd` 如下:
```
[0x00400610]> pdf @ main
/ (fcn) main 153
| main ();
| ; var int local_110h @ rbp-0x110
| ; var int local_8h @ rbp-0x8
| ; DATA XREF from 0x0040062d (entry0)
| 0x004007e8 55 push rbp
| 0x004007e9 4889e5 mov rbp, rsp
| 0x004007ec 4881ec100100. sub rsp, 0x110
| 0x004007f3 64488b042528. mov rax, qword fs:[0x28] ; [0x28:8]=-1 ; '(' ; 40
| 0x004007fc 488945f8 mov qword [local_8h], rax
| 0x00400800 31c0 xor eax, eax
| 0x00400802 bf37094000 mov edi, str.Enter_the_password: ; 0x400937 ; "Enter the password: "
| 0x00400807 b800000000 mov eax, 0
| 0x0040080c e8affdffff call sym.imp.printf ; int printf(const char *format)
| 0x00400811 488b15500820. mov rdx, qword [obj.stdin] ; [0x601068:8]=0
| 0x00400818 488d85f0feff. lea rax, [local_110h]
| 0x0040081f beff000000 mov esi, 0xff ; 255
| 0x00400824 4889c7 mov rdi, rax
| 0x00400827 e8b4fdffff call sym.imp.fgets ; char *fgets(char *s, int size, FILE *stream)
| 0x0040082c 4885c0 test rax, rax
| ,=< 0x0040082f 7435 je 0x400866
| | 0x00400831 488d85f0feff. lea rax, [local_110h]
| | 0x00400838 4889c7 mov rdi, rax
| | 0x0040083b e8bdfeffff call fcn.004006fd ; 调用验证函数
| | 0x00400840 85c0 test eax, eax
| ,==< 0x00400842 7511 jne 0x400855
| || 0x00400844 bf4c094000 mov edi, str.Nice_ ; 0x40094c ; "Nice!"
| || 0x00400849 e852fdffff call sym.imp.puts ; int puts(const char *s)
| || 0x0040084e b800000000 mov eax, 0
| ,===< 0x00400853 eb16 jmp 0x40086b
| ||| ; JMP XREF from 0x00400842 (main)
| |`--> 0x00400855 bf52094000 mov edi, str.Incorrect_password_ ; 0x400952 ; "Incorrect password!"
| | | 0x0040085a e841fdffff call sym.imp.puts ; int puts(const char *s)
| | | 0x0040085f b801000000 mov eax, 1
| |,==< 0x00400864 eb05 jmp 0x40086b
| ||| ; JMP XREF from 0x0040082f (main)
| ||`-> 0x00400866 b800000000 mov eax, 0
| || ; JMP XREF from 0x00400864 (main)
| || ; JMP XREF from 0x00400853 (main)
| ``--> 0x0040086b 488b4df8 mov rcx, qword [local_8h]
| 0x0040086f 6448330c2528. xor rcx, qword fs:[0x28]
| ,=< 0x00400878 7405 je 0x40087f
| | 0x0040087a e831fdffff call sym.imp.__stack_chk_fail ; void __stack_chk_fail(void)
| | ; JMP XREF from 0x00400878 (main)
| `-> 0x0040087f c9 leave
\ 0x00400880 c3 ret
[0x00400610]> pdf @ fcn.004006fd
/ (fcn) fcn.004006fd 171
| fcn.004006fd (int arg_bh);
| ; var int local_38h @ rbp-0x38
| ; var int local_24h @ rbp-0x24
| ; var int local_20h @ rbp-0x20
| ; var int local_18h @ rbp-0x18
| ; var int local_10h @ rbp-0x10
| ; arg int arg_bh @ rbp+0xb
| ; CALL XREF from 0x0040083b (main)
| 0x004006fd 55 push rbp
| 0x004006fe 4889e5 mov rbp, rsp
| 0x00400701 48897dc8 mov qword [local_38h], rdi
| 0x00400705 c745dc000000. mov dword [local_24h], 0
| 0x0040070c 48c745e01409. mov qword [local_20h], str.Dufhbmf ; 0x400914 ; "Dufhbmf"
| 0x00400714 48c745e81c09. mov qword [local_18h], str.pG_imos ; 0x40091c ; "pG`imos"
| 0x0040071c 48c745f02409. mov qword [local_10h], str.ewUglpt ; 0x400924 ; "ewUglpt"
| 0x00400724 c745dc000000. mov dword [local_24h], 0
| ,=< 0x0040072b eb6e jmp 0x40079b
| | ; JMP XREF from 0x0040079f (fcn.004006fd)
| .--> 0x0040072d 8b4ddc mov ecx, dword [local_24h]
| :| 0x00400730 ba56555555 mov edx, 0x55555556
| :| 0x00400735 89c8 mov eax, ecx
| :| 0x00400737 f7ea imul edx
| :| 0x00400739 89c8 mov eax, ecx
| :| 0x0040073b c1f81f sar eax, 0x1f
| :| 0x0040073e 29c2 sub edx, eax
| :| 0x00400740 89d0 mov eax, edx
| :| 0x00400742 01c0 add eax, eax
| :| 0x00400744 01d0 add eax, edx
| :| 0x00400746 29c1 sub ecx, eax
| :| 0x00400748 89ca mov edx, ecx
| :| 0x0040074a 4863c2 movsxd rax, edx
| :| 0x0040074d 488b74c5e0 mov rsi, qword [rbp + rax*8 - 0x20]
| :| 0x00400752 8b4ddc mov ecx, dword [local_24h]
| :| 0x00400755 ba56555555 mov edx, 0x55555556
| :| 0x0040075a 89c8 mov eax, ecx
| :| 0x0040075c f7ea imul edx
| :| 0x0040075e 89c8 mov eax, ecx
| :| 0x00400760 c1f81f sar eax, 0x1f
| :| 0x00400763 29c2 sub edx, eax
| :| 0x00400765 89d0 mov eax, edx
| :| 0x00400767 01c0 add eax, eax
| :| 0x00400769 4898 cdqe
| :| 0x0040076b 4801f0 add rax, rsi ; '+'
| :| 0x0040076e 0fb600 movzx eax, byte [rax]
| :| 0x00400771 0fbed0 movsx edx, al
| :| 0x00400774 8b45dc mov eax, dword [local_24h]
| :| 0x00400777 4863c8 movsxd rcx, eax
| :| 0x0040077a 488b45c8 mov rax, qword [local_38h]
| :| 0x0040077e 4801c8 add rax, rcx ; '& '
| :| 0x00400781 0fb600 movzx eax, byte [rax]
| :| 0x00400784 0fbec0 movsx eax, al
| :| 0x00400787 29c2 sub edx, eax
| :| 0x00400789 89d0 mov eax, edx
| :| 0x0040078b 83f801 cmp eax, 1 ; 1
| ,===< 0x0040078e 7407 je 0x400797 ; = 1 时跳转 , 验证成功
| |:| 0x00400790 b801000000 mov eax, 1 ; 返回 1, 验证失败
| ,====< 0x00400795 eb0f jmp 0x4007a6
| ||:| ; JMP XREF from 0x0040078e (fcn.004006fd)
| |`---> 0x00400797 8345dc01 add dword [local_24h], 1 ; i = i + 1
| | :| ; JMP XREF from 0x0040072b (fcn.004006fd)
| | :`-> 0x0040079b 837ddc0b cmp dword [local_24h], 0xb ; [0xb:4]=-1 ; 11
| | `==< 0x0040079f 7e8c jle 0x40072d ; i < = 11 时跳转
| | 0x004007a1 b800000000 mov eax, 0 ; 返回 0
| | ; JMP XREF from 0x00400795 (fcn.004006fd)
| `----> 0x004007a6 5d pop rbp
\ 0x004007a7 c3 ret
```
2017-09-12 18:57:05 +07:00
2017-12-02 22:38:19 +07:00
整理后可以得到下面的伪代码:
```C
int fcn_004006fd(int *passwd) {
char *str_1 = "Dufhbmf";
char *str_2 = "pG`imos";
char *str_3 = "ewUglpt";
for (int i = 0; i < = 11; i++) {
if((& str_3)[i % 3][2 * (1 / 3)] - * (i + passwd) != 1) {
return 1;
}
}
return 0;
}
```
然后写出逆向脚本:
```python
str_list = ["Dufhbmf", "pG`imos", "ewUglpt"]
passwd = []
for i in range(12):
passwd.append(chr(ord(str_list[i % 3][2 * (i / 3)]) - 1))
print ''.join(passwd)
```
2017-09-12 18:57:05 +07:00
2017-12-02 22:38:19 +07:00
逆向算法似乎也很简单,但如果连算法都不用逆的话,下面就是见证 angr 魔力的时刻,我们只需要指定让程序运行到 `0x400844` ,即验证通过时的位置,而不用管验证的逻辑是怎么样的。完整的 exp 如下,其他文件在 [github ](../src/Others/5.3_angr ) 相应文件夹中。
2017-09-12 18:57:05 +07:00
```python
import angr
2017-12-02 22:38:19 +07:00
project = angr.Project("entry_language", auto_load_libs=False)
2017-09-12 18:57:05 +07:00
@project .hook(0x400844)
def print_flag(state):
print "FLAG SHOULD BE:", state.posix.dump_fd(0)
project.terminate_execution()
project.execute()
```
2017-12-02 22:38:19 +07:00
Bingo!!!
```
$ python2 exp_angr.py
FLAG SHOULD BE: Code_Talkers
2017-12-03 21:11:11 +07:00
$ ./entry_language
Enter the password: Code_Talkers
Nice!
2017-12-02 22:38:19 +07:00
```
## 参考资料
2017-12-04 15:13:39 +07:00
- [angr.io ](http://angr.io/ )
2017-12-02 22:38:19 +07:00
- [docs.angr.io ](https://docs.angr.io/ )
- [angr API documentation ](http://angr.io/api-doc/ )
- [The Art of War:Offensive Techniques in Binary Analysis ](https://www.cs.ucsb.edu/~vigna/publications/2016_SP_angrSoK.pdf )