# 1.5.9 The Linux Kernel

- [Build and install](#build-and-install)
- [System calls](#system-calls)
- [References](#references)

## Build and install

My build environment is shown below. First, install the required packages:

```text
$ uname -a
Linux firmy-pc 4.14.34-1-MANJARO #1 SMP PREEMPT Thu Apr 12 17:26:43 UTC 2018 x86_64 GNU/Linux
$ yaourt -S base-devel
```

To make the kernel easier to study, pick a stable release, for example 4.16.3, the latest at the time of writing.

```text
$ mkdir ~/kernelbuild && cd ~/kernelbuild
$ wget -c https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.16.3.tar.xz
$ tar -xvJf linux-4.16.3.tar.xz
$ cd linux-4.16.3/
$ make clean && make mrproper
```

The kernel's configuration options live in the `.config` file, and there are two ways to set them. The first is to start from the configuration of the running kernel:

```text
$ zcat /proc/config.gz > .config
$ make oldconfig
```

The second is to generate a configuration yourself:

```text
$ make localmodconfig   # based on the current config and the modules currently loaded
# OR
$ make defconfig        # the default config for the current architecture
```

To be able to debug the kernel, set the following options:

```text
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_INFO_REDUCED=n
CONFIG_GDB_SCRIPTS=y
```

If you need kgdb, also set the following options:

```text
CONFIG_STRICT_KERNEL_RWX=n
CONFIG_FRAME_POINTER=y
CONFIG_KGDB=y
CONFIG_KGDB_SERIAL_CONSOLE=y
```

`CONFIG_STRICT_KERNEL_RWX` marks certain regions of kernel memory as read-only, which prevents you from using software breakpoints, so it is best to turn it off.

If you want to use kdb, add the following on top of the options above:

```text
CONFIG_KGDB_KDB=y
CONFIG_KDB_KEYBOARD=y
```

In addition, if you do not want KASLR to get in the way while debugging, you can disable it at build time:

```text
CONFIG_RANDOMIZE_BASE=n
CONFIG_RANDOMIZE_MEMORY=n
```

Write the options above to a file named `.config-fragment`, then merge it into `.config`:

```text
$ ./scripts/kconfig/merge_config.sh .config .config-fragment
```
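
Conceptually, merging means that every option named in the fragment overrides the value in the base configuration. The following Python sketch illustrates that idea only. It is not how `merge_config.sh` is implemented (the real script also re-runs Kconfig so that dependencies get resolved), the function names are hypothetical, and treating `=n` the same as `is not set` is an assumption of the sketch.

```python
def parse_options(text):
    """Parse 'CONFIG_X=value' and '# CONFIG_X is not set' lines into a dict."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            # Strip the leading "# " and the trailing " is not set".
            options[line[2:-len(" is not set")]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def merge(base_text, fragment_text):
    """Return the base options with every fragment option applied on top."""
    merged = parse_options(base_text)
    merged.update(parse_options(fragment_text))
    return merged


def render(options):
    """Write options back in .config syntax; 'n' becomes 'is not set'."""
    lines = []
    for name in sorted(options):
        value = options[name]
        lines.append(f"# {name} is not set" if value == "n" else f"{name}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up base config and a fragment using options from this section.
    base = "CONFIG_RANDOMIZE_BASE=y\n# CONFIG_KGDB is not set\nCONFIG_FRAME_POINTER=y\n"
    fragment = "CONFIG_KGDB=y\nCONFIG_RANDOMIZE_BASE=n\n"
    print(render(merge(base, fragment)), end="")
```

After a real merge it is still worth checking the resulting `.config`, because Kconfig may drop an option whose dependencies are not satisfied.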

Finally, the kernel is compiled with `-O2` by default. To make stepping through the code easier, you can try changing this to `-O0` in the Makefile. Be aware that the mainline kernel is not written to build without optimization, so an `-O0` build may fail:

```text
KBUILD_CFLAGS   += -O0
```

Build the kernel:

```text
$ make
```
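
Once the build finishes, the next step would normally be installation. Here, however, we do not actually want to replace the host's kernel, so the rest of the process is handed over to QEMU (see Section 4.1).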

## System calls

In Linux, system calls are functions that run in kernel space, and they are the interface through which user-space programs access the kernel. These functions are architecture-specific: x86-64 provides 322 system calls and x86 provides 358 (the exact counts depend on the kernel version; see Appendix 9.4).
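
The counts can be checked against the kernel source unpacked earlier. On x86 the system call tables are plain text files, `arch/x86/entry/syscalls/syscall_64.tbl` and `syscall_32.tbl`, with one entry per line: number, ABI, name, entry point. The Python sketch below counts entries per ABI. The helper names are hypothetical and `SAMPLE` is a short illustrative excerpt, not the full table.

```python
from collections import Counter

# Abbreviated excerpt in the syscall_64.tbl layout: number, ABI, name, entry point.
SAMPLE = """\
# 64-bit system call numbers and entry vectors
0    common  read          sys_read
1    common  write         sys_write
13   64      rt_sigaction  sys_rt_sigaction
60   common  exit          sys_exit
512  x32     rt_sigaction  compat_sys_rt_sigaction
"""


def parse_syscall_table(text):
    """Yield (number, abi, name) for each entry of a syscall_*.tbl file."""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        yield int(fields[0]), fields[1], fields[2]


def count_by_abi(text):
    """Count table entries per ABI column (for example 'common', '64', 'x32')."""
    return Counter(abi for _, abi, _ in parse_syscall_table(text))


if __name__ == "__main__":
    for abi, count in sorted(count_by_abi(SAMPLE).items()):
        print(abi, count)
```

To count a real table, pass the contents of the `.tbl` file to `count_by_abi` instead of `SAMPLE`.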

Below is an example written in 32-bit assembly ([source](../src/others/1.5.9_linux_kernel)):

```text
.data

msg:
    .ascii "hello 32-bit!\n"
    len = . - msg

.text
    .global _start

_start:
    movl $len, %edx     # arg 3: number of bytes
    movl $msg, %ecx     # arg 2: buffer address
    movl $1, %ebx       # arg 1: fd 1 (stdout)
    movl $4, %eax       # syscall number 4: write
    int $0x80

    movl $0, %ebx       # arg 1: exit status 0
    movl $1, %eax       # syscall number 1: exit
    int $0x80
```

Assemble, link and run it (it can also be built as a 64-bit program):

```text
$ gcc -m32 -c hello32.S
$ ld -m elf_i386 -o hello32 hello32.o
$ strace ./hello32
execve("./hello32", ["./hello32"], 0x7ffff990f830 /* 68 vars */) = 0
strace: [ Process PID=19355 runs in 32 bit mode. ]
write(1, "hello 32-bit!\n", 14hello 32-bit!
) = 14
exit(0) = ?
+++ exited with 0 +++
```

As you can see, the program stores the call number in `eax` and invokes the system call with `int $0x80`.

The `int 0x80` software interrupt is the classic mechanism, and kernels up to the early 2.6 series used it for all system calls. Because it performs poorly, later kernels replaced it with the fast system call instructions: 32-bit systems use `sysenter` (paired with `sysexit`), and 64-bit systems use `syscall` (paired with `sysret`).
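
Whether an x86 CPU advertises these instructions can be read from the `flags` line of `/proc/cpuinfo`, where `sep` stands for `sysenter`/`sysexit` and `syscall` for `syscall`/`sysret`. A minimal Python sketch follows, with hypothetical function names and assuming an x86 Linux host:

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def fast_syscall_support(cpuinfo_text):
    """Report which fast system call instructions the CPU advertises."""
    flags = cpu_flags(cpuinfo_text)
    return {"sysenter": "sep" in flags, "syscall": "syscall" in flags}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(fast_syscall_support(f.read()))
    except OSError as exc:
        # Not on Linux, or /proc is not mounted.
        print("cannot read /proc/cpuinfo:", exc)
```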

An example that uses `sysenter`:

```text
.data

msg:
    .ascii "Hello sysenter!\n"
    len = . - msg

.text
    .globl _start

_start:
    movl $len, %edx
    movl $msg, %ecx
    movl $1, %ebx
    movl $4, %eax
    # Set up the stack for sysenter
    pushl $sysenter_ret
    pushl %ecx
    pushl %edx
    pushl %ebp
    movl %esp, %ebp
    sysenter

sysenter_ret:
    movl $0, %ebx
    movl $1, %eax
    # Set up the stack for sysenter
    pushl $sysenter_ret
    pushl %ecx
    pushl %edx
    pushl %ebp
    movl %esp, %ebp
    sysenter
```

Assemble, link and run it:

```text
$ gcc -m32 -c sysenter.S
$ ld -m elf_i386 -o sysenter sysenter.o
$ strace ./sysenter
execve("./sysenter", ["./sysenter"], 0x7fff73993fd0 /* 69 vars */) = 0
strace: [ Process PID=7663 runs in 32 bit mode. ]
write(1, "Hello sysenter!\n", 16Hello sysenter!
) = 16
exit(0) = ?
+++ exited with 0 +++
```

As you can see, using the `sysenter` instruction requires setting up the stack by hand. This is because, when `sysenter` returns, execution continues in the second half of `__kernel_vsyscall` (starting at 0xf7fd5059), which pops `ebp`, `edx` and `ecx` and then executes `ret`. The stack therefore has to hold those three values on top of a return address:

```text
gdb-peda$ vmmap vdso
Start      End        Perm      Name
0xf7fd4000 0xf7fd6000 r-xp      [vdso]
gdb-peda$ disassemble __kernel_vsyscall
Dump of assembler code for function __kernel_vsyscall:
   0xf7fd5050 <+0>:     push   ecx
   0xf7fd5051 <+1>:     push   edx
   0xf7fd5052 <+2>:     push   ebp
   0xf7fd5053 <+3>:     mov    ebp,esp
   0xf7fd5055 <+5>:     sysenter
   0xf7fd5057 <+7>:     int    0x80
   0xf7fd5059 <+9>:     pop    ebp
   0xf7fd505a <+10>:    pop    edx
   0xf7fd505b <+11>:    pop    ecx
   0xf7fd505c <+12>:    ret
End of assembler dump.
```

`__kernel_vsyscall` wraps the calling convention for `sysenter` and is part of the vDSO, which lets a program execute kernel-provided code in user space. The vDSO is covered in detail in a later chapter.
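
The vDSO mapping can also be located without a debugger, because every Linux process lists its mappings in `/proc/<pid>/maps`. The Python sketch below does roughly what `vmmap vdso` does in gdb-peda. The function names are hypothetical, and the addresses it prints will differ from run to run when ASLR is enabled.

```python
def parse_maps_line(line):
    """Split one /proc/<pid>/maps line into (start, end, perms, name)."""
    fields = line.split(None, 5)
    start, end = (int(part, 16) for part in fields[0].split("-"))
    # Anonymous mappings have no sixth field.
    name = fields[5].strip() if len(fields) > 5 else ""
    return start, end, fields[1], name


def find_mapping(maps_text, name):
    """Return the first mapping whose name matches, or None."""
    for line in maps_text.splitlines():
        if not line.strip():
            continue
        entry = parse_maps_line(line)
        if entry[3] == name:
            return entry
    return None


if __name__ == "__main__":
    try:
        with open("/proc/self/maps") as f:
            vdso = find_mapping(f.read(), "[vdso]")
    except OSError:
        vdso = None  # not on Linux, or /proc is unavailable
    if vdso is None:
        print("no [vdso] mapping found")
    else:
        print(f"{vdso[0]:#x} {vdso[1]:#x} {vdso[2]} {vdso[3]}")
```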

Below is a 64-bit example that uses `syscall`:

```text
.data

msg:
    .ascii "Hello 64-bit!\n"
    len = . - msg

.text
    .global _start

_start:
    movq $1, %rdi       # arg 1: fd 1 (stdout)
    movq $msg, %rsi     # arg 2: buffer address
    movq $len, %rdx     # arg 3: number of bytes
    movq $1, %rax       # syscall number 1: write
    syscall

    xorq %rdi, %rdi     # arg 1: exit status 0
    movq $60, %rax      # syscall number 60: exit
    syscall
```

Assemble, link and run it (this one cannot be built as a 32-bit program):

```text
$ gcc -c hello64.S
$ ld -o hello64 hello64.o
$ strace ./hello64
execve("./hello64", ["./hello64"], 0x7ffe11485290 /* 68 vars */) = 0
write(1, "Hello 64-bit!\n", 14Hello 64-bit!
) = 14
exit(0) = ?
+++ exited with 0 +++
```
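
The register conventions used by the assembly examples above can be summarized as a small lookup table. The Python sketch below is only a study aid: the call numbers are the ones that appear in the examples, while the argument registers beyond the first three (`esi`, `edi`, `ebp` and `r10`, `r8`, `r9`) come from the Linux system call ABI rather than from this text. The names `CONVENTIONS`, `NUMBERS` and `register_setup` are hypothetical.

```python
# Which register carries the call number and which carry the arguments.
CONVENTIONS = {
    "i386 int 0x80": {
        "number": "eax",
        "args": ["ebx", "ecx", "edx", "esi", "edi", "ebp"],
    },
    "x86-64 syscall": {
        "number": "rax",
        "args": ["rdi", "rsi", "rdx", "r10", "r8", "r9"],
    },
}

# Only the call numbers that appear in the examples above.
NUMBERS = {
    "i386 int 0x80": {"write": 4, "exit": 1},
    "x86-64 syscall": {"write": 1, "exit": 60},
}


def register_setup(abi, name, *args):
    """Return {register: value} as it must look just before the call."""
    conv = CONVENTIONS[abi]
    if len(args) > len(conv["args"]):
        raise ValueError("too many arguments for a system call")
    regs = {conv["number"]: NUMBERS[abi][name]}
    regs.update(zip(conv["args"], args))
    return regs


if __name__ == "__main__":
    for abi in CONVENTIONS:
        print(abi, register_setup(abi, "write", 1, "msg", "len"))
        print(abi, register_setup(abi, "exit", 0))
```

Note that the same operation has a different number in each ABI: `write` is 4 under `int 0x80` but 1 under `syscall`.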
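
The traces above show three system calls: `execve`, which strace issues to start the program, and `write` and `exit`, which our code invokes directly. In general, though, applications are written against an application programming interface (API) implemented in user space rather than against system calls directly. For example, a call to `printf()` proceeds like this:

```text
call to printf() ==> printf() in the C library ==> write() in the C library ==> write() system call
```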
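
Each layer of that chain can be exercised separately. The Python sketch below uses `ctypes` to call the C library's `printf()`, then its `write()` wrapper, then the raw system call through the C library's generic `syscall()` function. It assumes a native 64-bit Python on Linux with glibc and only knows the `write` number for x86-64 and aarch64. The helper names are hypothetical.

```python
import ctypes
import platform

# Load the C library already linked into the Python process.
libc = ctypes.CDLL(None, use_errno=True)
libc.write.restype = ctypes.c_ssize_t
libc.syscall.restype = ctypes.c_long


def sys_write_number():
    """Return the write syscall number for the architectures this sketch knows."""
    machine = platform.machine()
    known = {"x86_64": 1, "aarch64": 64}
    if ctypes.sizeof(ctypes.c_void_p) == 8 and machine in known:
        return known[machine]
    raise NotImplementedError(f"write number not listed for {machine}")


def libc_write(fd, data):
    """The C library's write() wrapper."""
    return libc.write(ctypes.c_int(fd), ctypes.c_char_p(data),
                      ctypes.c_size_t(len(data)))


def raw_write(fd, data):
    """The write system call, issued through libc's generic syscall()."""
    return libc.syscall(ctypes.c_long(sys_write_number()), ctypes.c_long(fd),
                        ctypes.c_char_p(data), ctypes.c_size_t(len(data)))


if __name__ == "__main__":
    libc.printf(b"via printf()\n")   # buffered stdio inside the C library
    libc.fflush(None)                # flush stdio so the lines stay in order
    libc_write(1, b"via write()\n")
    raw_write(1, b"via syscall()\n")
```

Running the script under `strace -e trace=write` should show all three layers ending in the same `write` system call.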

## References

- [The Linux Kernel documentation](https://www.kernel.org/doc/html/latest/)
- [linux-insides](https://legacy.gitbook.com/book/0xax/linux-insides/details)