CTF-All-In-One/doc/8.10_aeg.md

# 8.10 AEG: Automatic Exploit Generation

[paper](http://security.ece.cmu.edu/aeg/aeg-current.pdf)

## 简介

- 这篇论文向我们展示了如何将控制流劫持攻击的漏洞利用生成模型化为一个形式化验证问题。
- 提出了预处理符号执行，这是一种新的符号执行技术。
- 提出了一种通用的漏洞利用生成方法。
- 构建了 AEG，这是第一个能够自动发现漏洞并生成利用获得 shell 的端到端系统。

### 挑战及解决方案

- Source code analysis alone is inadequate and insufficient.
  - We combine source-code level analysis to improve scalability in finding bugs and binary and runtime information to exploit programs.
- Finding the exploitable paths among an infinite number of possible paths
  - We have developed preconditioned symbolic execution, a novel technique which targets paths that are more likely to be exploitable.
  - We have developed a priority queue path prioritization technique that uses heuristics to choose likely more exploitable paths first.
- An end-to-end system.
  - Our AEG implementation is a single command line that analyzes source code programs, generates symbolic execution formulas, solves them, performs binary analysis, generates binary-level runtime constraints, and formats the output as an actual exploit string that can be fed directly into the vulnerable program.

## AEG 概述

The example application is the setuid root `iwconfig` utility from the Wireless Tools package, a program consisting of about 3400 lines of C source code.

![img](../pic/8.10_iwconfig.png)

`iwconfig` has a classic strcpy buffer overflow vulnerability in the get info function (line 15). And our system goes through the following analysis steps:

1. AEG searches for bugs at the source code level by exploring execution paths. Specifically, AEG executes `iwconfig` using symbolic arguments (`argv`) as the input sources.
2. After following the path `main → print_info → get_info`, AEG reaches line 15, where it detects an out-of-bounds memory error on variable `ifr.ifr_name`. AEG solves the current path constraints and generates a concrete input that will trigger the detected bug.
3. AEG performs dynamic analysis on the `iwconfig` binary using the concrete input generated in step 2. It extracts runtime information about the memory layout, such as the address of the overflowed buffer (`ifr.ifr_name`) and the address of the return address of the vulnerable function (`get_info`).
4. AEG generates the constraints describing the exploit using the runtime information generated from the previous step: 1) the vulnerable buffer (`ifr.ifr_name`) must contain our shellcode, and 2) the overwritten return address must contain the address of the shellcode. Next, AEG appends the generated constraints to the path constraints and queries a constraint solver for a satisfying answer.
5. The satisfying answer gives us the exploit string. Finally, AEG runs the program with the generated exploit and verifies that it works. If the constraints were not solvable, AEG would resume searching the program for the next potential vulnerability.

## 形式化建模

At its core, the automatic exploit generation (AEG) challenge is a problem of finding program inputs that result in a desired exploited execution state.

![img](../pic/8.10_input.png)

In AEG, we are only concerned with the subset of unsafe states that are exploitable, represented by the circle labeled Π<sub>bug</sub>∧Π<sub>exploit</sub>. The intuition is that preconditioned symbolic execution limits the space searched to a smaller box.

Logically, we would be guaranteed to find all possible exploits when Π<sub>prec</sub> is less restrictive than the exploitability condition: Π<sub>bug</sub>(x)∧Π<sub>exploit</sub>(x) ⇒ Π<sub>prec</sub>(x).

## 实现方法

Our approach to the AEG challenge consists of six components: Pre-Process, Src-Analysis, Bug-Find, DBA, Exploit-Gen, and Verify.

![img](../pic/8.10_design.png)

- Pre-Process: src → (B<sub>gcc</sub>, B<sub>llvm</sub>).
  - The source program (src) is compiled down to 1) a binary B<sub>gcc</sub>, for which AEG will try to generate a working exploit and 2) a LLVM bytecode file B<sub>llvm</sub>, which will be used by our bug finding infrastructure.
- Src-Analysis: B<sub>llvm</sub> → max.
  - AEG analyzes the source code to generate the maximum size of symbolic data `max` that should be provided to the program. AEG determines `max` by searching for the largest statically allocated buffers of the target program.
- Bug-Find: (B<sub>llvm</sub>, φ, max) → (Π<sub>bug</sub>, V).
  - Bug-Find takes in LLVM bytecode B<sub>llvm</sub> and a safety property φ, and outputs a tuple (Π<sub>bug</sub>, V) for each detected vulnerability. Π<sub>bug</sub> contains the path predicate. V contains source-level information about the detected vulnerability.
- DBA: (B<sub>gcc</sub>, (Π<sub>bug</sub>, V)) → R.
  - DBA performs dynamic binary analysis on the target binary B<sub>gcc</sub> with a concrete buggy input and extracts runtime information R.
- Exploit-Gen: (Π<sub>bug</sub>, R) → Π<sub>bug</sub>∧Π<sub>exploit</sub>.
  - Exploit-Gen receives a tuple with the path predicate of the bug (Π<sub>bug</sub>) and runtime information (R), and constructs a formula for a control flow hijack exploit. The output formula includes constraints ensuring that: 1) a possible program counter points to a user-determined location, and 2) the location contains shellcode (Π<sub>exploit</sub>). The resulting exploit formula is the conjunction of the two predicates.
- Verify: (B<sub>gcc</sub>, Π<sub>bug</sub>∧Π<sub>exploit</sub>) → {ε, ⊥}.
  - Verify takes in the target binary executable B<sub>gcc</sub> and an exploit formula Π<sub>bug</sub>∧Π<sub>exploit</sub>, and returns an exploit ε only if there is a satisfying answer. Otherwise, it returns ⊥.

The high-level algorithm for solving the AEG challenge:

![img](../pic/8.10_algorithm.png)

## Bug-Find: 以漏洞利用生成为目的的程序分析

Bug-Find finds bugs with symbolic program execution, which explores the program state space one path at a time. However, there are an infinite number of paths to potentially explore. AEG addresses this problem with two novel algorithms:

- First, we present a novel technique called preconditioned symbolic execution that constrains the paths considered to those that would most likely include exploitable bugs.
- Second, we propose novel path prioritization heuristics for choosing which paths to explore first with preconditioned symbolic execution.

### Preconditioned Symbolic Execution

Preconditioned symbolic execution is a novel method to target symbolic execution towards a certain subset of the input state space. The state space subset is determined by the precondition predicate (Π<sub>prec</sub>); inputs that do not satisfy Π<sub>prec</sub> will not be explored.

In AEG, we have developed and implemented 4 different preconditions for efficient exploit generation:

- None. There is no precondition and the state space is explored as normal.
- Known Length. The precondition is that inputs are of known maximum length. We use static analysis to automatically determine this precondition.
- Known Prefix. The precondition is that the symbolic inputs have a known prefix.
- Concolic Execution. Concolic execution can be viewed as a specific form of preconditioned symbolic execution where the precondition is specified by a single program path as realized by an example input.

![img](../pic/8.10_bof.png)

Consider the example program above. Suppose that the `input` buffer contains 42 symbolic bytes. Lines 3-4 represent a tight symbolic loop that will eventually spawn 42 different interpreters with traditional symbolic execution, each one having a different path predicate. Each path predicate will describe a different condition about the string length of the symbolic `input` buffer.

Preconditioned symbolic execution avoids examining the loop iterations that will not lead to a buffer overflow by imposing a length precondition. Thus, we only need a single interpreter to explore the entire loop.

### Path Prioritization: Search Heuristics

All pending paths are inserted into a priority queue based on their ranking, and the next path to explore is always drawn out of the priority queue. We present two new path prioritization heuristics we have developed: buggy-path-first and loop exhaustion.

- Buggy-Path-First. Exploitable bugs are often preceded by small but unexploitable mistakes. The observation that one bug on a path means subsequent statements are also likely to be buggy (and hopefully exploitable) led us to the buggy-path-first heuristic.
- Loop Exhaustion. The loop-exhaustion strategy gives higher priority to an interpreter exploring the maximum number of loop iterations, hoping that computations involving more iterations are more promising to produce bugs like buffer overflows.

### Environment Modelling: Vulnerability Detection in the Real World

AEG models most of the system environments that an attacker can possibly use as an input source. Therefore, AEG can detect most security relevant bugs in real programs. Our support for environment modeling includes file systems, network sockets, standard input, program arguments, and environment variables. Additionally, AEG handles most common system and library function calls.

## DBA, Exploit-Gen and Verify: 漏洞利用生成

### DBA: Dynamic Binary Analysis

DBA takes in three inputs: 1) the target executable (B<sub>gcc</sub>) that we want to exploit; 2) the path constraints that lead up to the bug (Π<sub>bug</sub>); and 3) the names of vulnerable functions and buffers.

It then outputs a set of runtime information: 1) the address of the return address of the vulnerable function (&retaddr); 2) the address of the vulnerable buffer where the overwrite starts (bufaddr); and 3) the stack memory contents between them (µ).

### Exploit-Gen

Exploit-Gen takes in two inputs to produce an exploit: the unsafe program state containing the path constraints (Π<sub>bug</sub>) and low-level runtime information R.

It generates exploit formulas (Π<sub>bug</sub>∧Π<sub>exploit</sub>) for four types of exploits: 1) stack-overflow return-to-stack, 2) stack-overflow returnto-libc, 3) format-string return-to-stack, 4) format-string return-to-libc.

![img](../pic/8.10_algorithm2.png)

### Verify

VERIFY takes in two inputs: 1) the exploit constraints Π<sub>bug</sub>∧Π<sub>exploit</sub>, and 2) the target binary. It outputs either a concrete working exploit, i.e., an exploit that spawns a shell, or ⊥, if AEG fails to generate the exploit.
-												make reviews clear again

											
										
										
											2018-04-26 20:27:37 +07:00
+								# 8.10 AEG: Automatic Exploit Generation
-												fix

											
										
										
											2018-07-13 23:31:51 +07:00
+								[paper](http://security.ece.cmu.edu/aeg/aeg-current.pdf)
-												fix

											
										
										
											2018-06-17 16:46:53 +07:00
-												make reviews clear again

											
										
										
											2018-04-26 20:27:37 +07:00
+								## 简介
-												use markdownlint

											
										
										
											2018-08-05 16:43:10 +07:00
-												finish 8.10

											
										
										
											2018-06-12 19:15:50 +07:00
+								- 这篇论文向我们展示了如何将控制流劫持攻击的漏洞利用生成模型化为一个形式化验证问题。
 								- 提出了预处理符号执行，这是一种新的符号执行技术。
 								- 提出了一种通用的漏洞利用生成方法。
 								- 构建了 AEG，这是第一个能够自动发现漏洞并生成利用获得 shell 的端到端系统。
-												make reviews clear again

											
										
										
											2018-04-26 20:27:37 +07:00
-												use markdownlint

											
										
										
											2018-08-05 16:43:10 +07:00
+								### 挑战及解决方案
-												finish 8.10

											
										
										
											2018-06-12 19:15:50 +07:00
+								- Source code analysis alone is inadequate and insufficient.
 								  - We combine source-code level analysis to improve scalability in finding bugs and binary and runtime information to exploit programs.
 								- Finding the exploitable paths among an infinite number of possible paths
-												use markdownlint

											
										
										
											2018-08-05 16:43:10 +07:00
+								  - We have developed preconditioned symbolic execution, a novel technique which targets paths that are more likely to be exploitable.
-												finish 8.10

											
										
										
											2018-06-12 19:15:50 +07:00
+								  - We have developed a priority queue path prioritization technique that uses heuristics to choose likely more exploitable paths first.
 								- An end-to-end system.
 								  - Our AEG implementation is a single command line that analyzes source code programs, generates symbolic execution formulas, solves them, performs binary analysis, generates binary-level runtime constraints, and formats the output as an actual exploit string that can be fed directly into the vulnerable program.
-												make reviews clear again

											
										
										
											2018-04-26 20:27:37 +07:00
-												finish 8.10

											
										
										
											2018-06-12 19:15:50 +07:00
+								## AEG 概述
-												use markdownlint

											
										
										
											2018-08-05 16:43:10 +07:00
-												finish 8.10

											
										
										
											2018-06-12 19:15:50 +07:00
+								The example application is the setuid root `iwconfig` utility from the Wireless Tools package, a program consisting of about 3400 lines of C source code.
-												make reviews clear again

											
										
										
											2018-04-26 20:27:37 +07:00
-												use markdownlint

											
										
										
											2018-08-05 16:43:10 +07:00
+								![img](../pic/8.10_iwconfig.png)
-												make reviews clear again

											
										
										
											2018-04-26 20:27:37 +07:00
-												finish 8.10

											
										
										
											2018-06-12 19:15:50 +07:00
+								`iwconfig` has a classic strcpy buffer overflow vulnerability in the get info function (line 15). And our system goes through the following analysis steps:
-												use markdownlint

											
										
										
											2018-08-05 16:43:10 +07:00
-												finish 8.10

											
										
										
											2018-06-12 19:15:50 +07:00
+. AEG searches for bugs at the source code level by exploring execution paths. Specifically, AEG executes `iwconfig` using symbolic arguments (`argv`) as the input sources.
 . After following the path `main → print_info → get_info`, AEG reaches line 15, where it detects an out-of-bounds memory error on variable `ifr.ifr_name`. AEG solves the current path constraints and generates a concrete input that will trigger the detected bug.
 . AEG performs dynamic analysis on the `iwconfig` binary using the concrete input generated in step 2. It extracts runtime information about the memory layout, such as the address of the overflowed buffer (`ifr.ifr_name`) and the address of the return address of the vulnerable function (`get_info`).
 . AEG generates the constraints describing the exploit using the runtime information generated from the previous step: 1) the vulnerable buffer (`ifr.ifr_name`) must contain our shellcode, and 2) the overwritten return address must contain the address of the shellcode. Next, AEG appends the generated constraints to the path constraints and queries a constraint solver for a satisfying answer.
 . The satisfying answer gives us the exploit string. Finally, AEG runs the program with the generated exploit and verifies that it works. If the constraints were not solvable, AEG would resume searching the program for the next potential vulnerability.
 								## 形式化建模
-												use markdownlint

											
										
										
											2018-08-05 16:43:10 +07:00
-												finish 8.10

											
										
										
											2018-06-12 19:15:50 +07:00
+								At its core, the automatic exploit generation (AEG) challenge is a problem of finding program inputs that result in a desired exploited execution state.
-												use markdownlint

											
										
										
											2018-08-05 16:43:10 +07:00
+								![img](../pic/8.10_input.png)
-												finish 8.10

											
										
										
											2018-06-12 19:15:50 +07:00
 								In AEG, we are only concerned with the subset of unsafe states that are exploitable, represented by the circle labeled Π<sub>bug</sub>∧Π<sub>exploit</sub>. The intuition is that preconditioned symbolic execution limits the space searched to a smaller box.
 								Logically, we would be guaranteed to find all possible exploits when Π<sub>prec</sub> is less restrictive than the exploitability condition: Π<sub>bug</sub>(x)∧Π<sub>exploit</sub>(x) ⇒ Π<sub>prec</sub>(x).
 								## 实现方法
-												use markdownlint

											
										
										
											2018-08-05 16:43:10 +07:00
-												finish 8.10

											
										
										
											2018-06-12 19:15:50 +07:00
+								Our approach to the AEG challenge consists of six components: Pre-Process, Src-Analysis, Bug-Find, DBA, Exploit-Gen, and Verify.
-												use markdownlint

											
										
										
											2018-08-05 16:43:10 +07:00
+								![img](../pic/8.10_design.png)
-												finish 8.10

											
										
										
											2018-06-12 19:15:50 +07:00
 								- Pre-Process: src → (B<sub>gcc</sub>, B<sub>llvm</sub>).
 								  - The source program (src) is compiled down to 1) a binary B<sub>gcc</sub>, for which AEG will try to generate a working exploit and 2) a LLVM bytecode file B<sub>llvm</sub>, which will be used by our bug finding infrastructure.
 								- Src-Analysis: B<sub>llvm</sub> → max.
 								  - AEG analyzes the source code to generate the maximum size of symbolic data `max` that should be provided to the program. AEG determines `max` by searching for the largest statically allocated buffers of the target program.
 								- Bug-Find: (B<sub>llvm</sub>, φ, max) → (Π<sub>bug</sub>, V).
 								  - Bug-Find takes in LLVM bytecode B<sub>llvm</sub> and a safety property φ, and outputs a tuple (Π<sub>bug</sub>, V) for each detected vulnerability. Π<sub>bug</sub> contains the path predicate. V contains source-level information about the detected vulnerability.
 								- DBA: (B<sub>gcc</sub>, (Π<sub>bug</sub>, V)) → R.
 								  - DBA performs dynamic binary analysis on the target binary B<sub>gcc</sub> with a concrete buggy input and extracts runtime information R.
 								- Exploit-Gen: (Π<sub>bug</sub>, R) → Π<sub>bug</sub>∧Π<sub>exploit</sub>.
 								  - Exploit-Gen receives a tuple with the path predicate of the bug (Π<sub>bug</sub>) and runtime information (R), and constructs a formula for a control flow hijack exploit. The output formula includes constraints ensuring that: 1) a possible program counter points to a user-determined location, and 2) the location contains shellcode (Π<sub>exploit</sub>). The resulting exploit formula is the conjunction of the two predicates.
 								- Verify: (B<sub>gcc</sub>, Π<sub>bug</sub>∧Π<sub>exploit</sub>) → {ε, ⊥}.
 								  - Verify takes in the target binary executable B<sub>gcc</sub> and an exploit formula Π<sub>bug</sub>∧Π<sub>exploit</sub>, and returns an exploit ε only if there is a satisfying answer. Otherwise, it returns ⊥.
 								The high-level algorithm for solving the AEG challenge:
-												use markdownlint

											
										
										
											2018-08-05 16:43:10 +07:00
+								![img](../pic/8.10_algorithm.png)
-												finish 8.10

											
										
										
											2018-06-12 19:15:50 +07:00
 								## Bug-Find: 以漏洞利用生成为目的的程序分析
-												use markdownlint

											
										
										
											2018-08-05 16:43:10 +07:00
-												finish 8.10

											
										
										
											2018-06-12 19:15:50 +07:00
+								Bug-Find finds bugs with symbolic program execution, which explores the program state space one path at a time. However, there are an infinite number of paths to potentially explore. AEG addresses this problem with two novel algorithms:
-												use markdownlint

											
										
										
											2018-08-05 16:43:10 +07:00
-												finish 8.10

											
										
										
											2018-06-12 19:15:50 +07:00
+								- First, we present a novel technique called preconditioned symbolic execution that constrains the paths considered to those that would most likely include exploitable bugs.
 								- Second, we propose novel path prioritization heuristics for choosing which paths to explore first with preconditioned symbolic execution.
-												use markdownlint

											
										
										
											2018-08-05 16:43:10 +07:00
+								### Preconditioned Symbolic Execution
-												finish 8.10

											
										
										
											2018-06-12 19:15:50 +07:00
+								Preconditioned symbolic execution is a novel method to target symbolic execution towards a certain subset of the input state space. The state space subset is determined by the precondition predicate (Π<sub>prec</sub>); inputs that do not satisfy Π<sub>prec</sub> will not be explored.
 								In AEG, we have developed and implemented 4 different preconditions for efficient exploit generation:
-												use markdownlint

											
										
										
											2018-08-05 16:43:10 +07:00
-												finish 8.10

											
										
										
											2018-06-12 19:15:50 +07:00
+								- None. There is no precondition and the state space is explored as normal.
 								- Known Length. The precondition is that inputs are of known maximum length. We use static analysis to automatically determine this precondition.
 								- Known Prefix. The precondition is that the symbolic inputs have a known prefix.
 								- Concolic Execution. Concolic execution can be viewed as a specific form of preconditioned symbolic execution where the precondition is specified by a single program path as realized by an example input.
-												use markdownlint

											
										
										
											2018-08-05 16:43:10 +07:00
+								![img](../pic/8.10_bof.png)
-												finish 8.10

											
										
										
											2018-06-12 19:15:50 +07:00
 								Consider the example program above. Suppose that the `input` buffer contains 42 symbolic bytes. Lines 3-4 represent a tight symbolic loop that will eventually spawn 42 different interpreters with traditional symbolic execution, each one having a different path predicate. Each path predicate will describe a different condition about the string length of the symbolic `input` buffer.
 								Preconditioned symbolic execution avoids examining the loop iterations that will not lead to a buffer overflow by imposing a length precondition. Thus, we only need a single interpreter to explore the entire loop.
-												use markdownlint

											
										
										
											2018-08-05 16:43:10 +07:00
+								### Path Prioritization: Search Heuristics
-												finish 8.10

											
										
										
											2018-06-12 19:15:50 +07:00
+								All pending paths are inserted into a priority queue based on their ranking, and the next path to explore is always drawn out of the priority queue. We present two new path prioritization heuristics we have developed: buggy-path-first and loop exhaustion.
-												use markdownlint

											
										
										
											2018-08-05 16:43:10 +07:00
-												finish 8.10

											
										
										
											2018-06-12 19:15:50 +07:00
+								- Buggy-Path-First. Exploitable bugs are often preceded by small but unexploitable mistakes. The observation that one bug on a path means subsequent statements are also likely to be buggy (and hopefully exploitable) led us to the buggy-path-first heuristic.
 								- Loop Exhaustion. The loop-exhaustion strategy gives higher priority to an interpreter exploring the maximum number of loop iterations, hoping that computations involving more iterations are more promising to produce bugs like buffer overflows.
-												use markdownlint

											
										
										
											2018-08-05 16:43:10 +07:00
+								### Environment Modelling: Vulnerability Detection in the Real World
-												finish 8.10

											
										
										
											2018-06-12 19:15:50 +07:00
 								AEG models most of the system environments that an attacker can possibly use as an input source. Therefore, AEG can detect most security relevant bugs in real programs. Our support for environment modeling includes file systems, network sockets, standard input, program arguments, and environment variables. Additionally, AEG handles most common system and library function calls.
 								## DBA, Exploit-Gen and Verify: 漏洞利用生成
-												use markdownlint

											
										
										
											2018-08-05 16:43:10 +07:00
 								### DBA: Dynamic Binary Analysis
-												finish 8.10

											
										
										
											2018-06-12 19:15:50 +07:00
+								DBA takes in three inputs: 1) the target executable (B<sub>gcc</sub>) that we want to exploit; 2) the path constraints that lead up to the bug (Π<sub>bug</sub>); and 3) the names of vulnerable functions and buffers.
 								It then outputs a set of runtime information: 1) the address of the return address of the vulnerable function (&retaddr); 2) the address of the vulnerable buffer where the overwrite starts (bufaddr); and 3) the stack memory contents between them (µ).
-												use markdownlint

											
										
										
											2018-08-05 16:43:10 +07:00
+								### Exploit-Gen
-												finish 8.10

											
										
										
											2018-06-12 19:15:50 +07:00
+								Exploit-Gen takes in two inputs to produce an exploit: the unsafe program state containing the path constraints (Π<sub>bug</sub>) and low-level runtime information R.
 								It generates exploit formulas (Π<sub>bug</sub>∧Π<sub>exploit</sub>) for four types of exploits: 1) stack-overflow return-to-stack, 2) stack-overflow returnto-libc, 3) format-string return-to-stack, 4) format-string return-to-libc.
-												use markdownlint

											
										
										
											2018-08-05 16:43:10 +07:00
+								![img](../pic/8.10_algorithm2.png)
 								### Verify
-												finish 8.10

											
										
										
											2018-06-12 19:15:50 +07:00
 								VERIFY takes in two inputs: 1) the exploit constraints Π<sub>bug</sub>∧Π<sub>exploit</sub>, and 2) the target binary. It outputs either a concrete working exploit, i.e., an exploit that spawns a shell, or ⊥, if AEG fails to generate the exploit.