本文的灵感来源是, 某次 C++ 课程的 OJ 题出了一道题是写一个 Brainfuck 解释器.
群里某位同学开始吐槽 OJ 样例太弱, 代码不需要优化就能过.
又恰好前几天在 GitHub 上看到 BF JIT 的一个教程实现, 于是决定自己也用 C++ 来实践一下.
我们的原题只是想用 C++ 写个很朴素的 BF 解释器, 现在纯属是杀鸡用屠龙宝刀了.
Brainfuck 语言简介 (From wikipedia)
The language consists of eight commands, listed below. A brainfuck program is a sequence of these commands, possibly interspersed with other characters (which are ignored). The commands are executed sequentially, with some exceptions: an instruction pointer begins at the first command, and each command it points to is executed, after which it normally moves forward to the next command. The program terminates when the instruction pointer moves past the last command.
The brainfuck language uses a simple machine model consisting of the program and instruction pointer, as well as a one-dimensional array of at least 30,000 byte cells initialized to zero; a movable data pointer (initialized to point to the leftmost byte of the array); and two streams of bytes for input and output (most often connected to a keyboard and a monitor respectively, and using the ASCII character encoding).
The eight language commands each consist of a single character:
Character | Meaning |
---|---|
> | Increment the data pointer (to point to the next cell to the right). |
< | Decrement the data pointer (to point to the next cell to the left). |
+ | Increment (increase by one) the byte at the data pointer. |
- | Decrement (decrease by one) the byte at the data pointer. |
. | Output the byte at the data pointer. |
, | Accept one byte of input, storing its value in the byte at the data pointer. |
[ | If the byte at the data pointer is zero, then instead of moving the instruction pointer forward to the next command, jump it forward to the command after the matching ] command. |
] | If the byte at the data pointer is nonzero, then instead of moving the instruction pointer forward to the next command, jump it back to the command after the matching [ command. |
As the name suggests, Brainfuck programs tend to be difficult to comprehend. This is partly because any mildly complex task requires a long sequence of commands and partly because the program’s text gives no direct indications of the program’s state. These, as well as Brainfuck’s inefficiency and its limited input/output capabilities, are some of the reasons it is not used for serious programming. Nonetheless, like any Turing complete language, Brainfuck is theoretically capable of computing any computable function or simulating any other computational model, if given access to an unlimited amount of memory. A variety of Brainfuck programs have been written. Although Brainfuck programs, especially complicated ones, are difficult to write, it is quite trivial to write an interpreter for Brainfuck in a more typical language such as C due to its simplicity. There even exist Brainfuck interpreters written in the Brainfuck language itself.
Brainfuck is an example of a so-called Turing tarpit: It can be used to write any program, but it is not practical to do so, because Brainfuck provides so little abstraction that the programs get very long or complicated.
翻译指令序列
Just-in-time compiler, 顾名思义(?), 就是要在运行时将代码翻译成机器码序列执行.
JIT 编译对于我们常用的抽象程度较高的高级语言都十分复杂(如 Hotspot, V8 等), 但对于 Brainfuck 这种 esolang 则十分简单.
Brainfuck 中的每个指令都能被简单地翻译成对应的 C 语言代码, 也能被直接翻译成机器码.
比如 >
可以翻译为 data_pointer++;
, +
可以被翻译为 data[data_pointer]++;
, …
我们现在可以将输入的程序字符串直接替换成机器码, 就完成了程序的翻译.
但你先别急.
比如连续的十个 >
会被直接翻译为十句 data_pointer++;
, 而不是更优化的 data_pointer += 10;
.
我们可以先对 BF 程序做一些简单的优化再将其进行翻译.
具体的实现主要就是将连续的 +-><
指令进行了合并: https://github.com/lyc8503/BrainfuckJIT/blob/master/main.cpp#L64
得到了我们自定义的指令序列之后就可以开始将它们翻译成汇编代码了.
我们使用 rdx
寄存器作为上面提到的数据指针, 那么指令就可以进行如下的对应翻译.
>
和<
1
2add rdx, 1
sub rdx, 1+
和-
1
2add byte [rdx], 1
sub byte [rdx], 1.
和,
使用了 Linux 的系统调用
read
和write
1
2
3
4
5
6
7mov rax, 0 ; `read` syscall no 0
mov rdi, 0 ; 1st arg: int fd, value 0 (stdin)
mov rsi, rdx ; 2nd arg: void* buf, addr rdx
push rdx ; save rdx to stack
mov rdx, 1 ; 3rd arg: size_t count, 1 character
syscall ; do the syscall
pop rdx ; restore rdx1
2
3
4
5
6
7mov rax, 1 ; `write` syscall no 1
mov rdi, 1 ; 1st arg: int fd, value 1 (stdout)
mov rsi, rdx ; 2nd arg: void* buf, addr rdx
push rdx ; save rdx to stack
mov rdx, 1 ; 3rd arg: size_t count, 1 character
syscall ; do the syscall
pop rdx ; restore rdx[
和]
这两个是条件跳转指令, 具体的汇编可以表示如下
1
2
3
4
5cmp byte [rdx], 0
je 0x1
cmp byte [rdx], 0
jne 0x1但在刚读取到前中括号时我们时不知道后括号的位置的, 没法填充后跳转的位置.
不过这个经典问题显然可以使用栈来解决.
在遇到前括号时将当前的代码位置压栈, 遇到对应的后括号的就可以计算出对应的跳转位置.
具体实现: https://github.com/lyc8503/BrainfuckJIT/blob/master/main.cpp#L108
执行机器码
在开始执行整段代码之前我们首先要设置 rdx
,让其指向我们预先准备好的一段内存区域.
1 | // initialize the rdx as the data pointer |
接下来理论上我们只需要使用内联汇编将我们将我们当前程序的 rip
寄存器指向代码的开头, 就可以正确执行代码了.
但由于现在操作系统的安全保护措施, 生成的机器码是不能直接执行的. (否则会直接喜提一个内存违例 SIGSEGV.)
我们需要先向操作系统特别申请一块可以执行的内存空间, 在 Linux 下可以使用 mmap
来实现. 具体参数含义请参考手册.
1 | // allocate rwx memory |
之后我们就将代码复制到这块内存区域, 直接使用函数指针的方式修改 rip
, 当然 C++ 编译器会为我们的函数调用创建一个栈帧, 所以不要忘了在机器码最后加上一个 ret
来正确地返回.
1 | code.push_back(0xc3); // machine code `ret` to exit the func |
具体实现: https://github.com/lyc8503/BrainfuckJIT/blob/master/main.cpp#L21
正确性测试
完成之后能够通过原本 OJ 的所有测试用例.
也可以尝试跑个自解释器~
1 | >>>+[[-]>>[-]++>+>+++++++[<++++>>++<-]++>>+>+>+++++[>++>++++++<<-]+>>>,<++[[>[->>]<[>>]<<-]<[<]<+>>[>]>[<+>-[[<+>-]>]<[[[-]<]++<-[<+++++++++>[<->-]>>]>>]]<<]<]<[[<]>[[>]>>[>>]+[<<]<[<]<+>>-]>[>]+[->>]<<<<[[<<]<[<]+<<[+>+<<-[>-->+<<-[>+<[>>+<<-]]]>[<+>-]<]++>>-->[>]>>[>>]]<<[>>+<[[<]<]>[[<<]<[<]+[-<+>>-[<<+>++>-[<->[<<+>>-]]]<[>+<-]>]>[>]>]>[>>]>>]<<[>>+>>+>>]<<[->>>>>>>>]<<[>.>>>>>>>]<<[>->>>>>]<<[>,>>>]<<[>+>]<<[+<<]<] |
速度测试
尝试了双层自解释器运行 Hello World 程序, 原本的解释器花了 2m18s, 使用 JIT 后花了 2.5s.
1 | >>>+[[-]>>[-]++>+>+++++++[<++++>>++<-]++>>+>+>+++++[>++>++++++<<-]+>>>,<++[[>[->>]<[>>]<<-]<[<]<+>>[>]>[<+>-[[<+>-]>]<[[[-]<]++<-[<+++++++++>[<->-]>>]>>]]<<]<]<[[<]>[[>]>>[>>]+[<<]<[<]<+>>-]>[>]+[->>]<<<<[[<<]<[<]+<<[+>+<<-[>-->+<<-[>+<[>>+<<-]]]>[<+>-]<]++>>-->[>]>>[>>]]<<[>>+<[[<]<]>[[<<]<[<]+[-<+>>-[<<+>++>-[<->[<<+>>-]]]<[>+<-]>]>[>]>]>[>>]>>]<<[>>+>>+>>]<<[->>>>>>>>]<<[>.>>>>>>>]<<[>->>>>>]<<[>,>>>]<<[>+>]<<[+<<]<] |
1 | # 原本 |
在一个 绘制曼德博分型 的程序上我原来的解释器花了 3m2s, JIT 优化后只花了 1s.
1 | # 原本 |
优化空间
目前的输入输出指令是直接使用 Linux 系统调用实现, 而且每次只读取一个字符, 系统调用的开销较大, 对于输入输出较多的程序可以事先读取并实现缓存.
本文采用 CC BY-NC-SA 4.0 许可协议发布.
作者: lyc8503, 文章链接: https://blog.lyc8503.net/post/bf-jit/
如果本文给你带来了帮助或让你觉得有趣, 可以考虑赞助我¬_¬