This one is going to be interesting as I'm back at Brainfuck again (oh boy). What can I say? It's just too fun to play around with. Given how simple the language is, made of just 8 tokens, it's a great way of learning about expression parsing and compilers in general. Sure, you can't be an efficient programmer in it, but that doesn't stop you from understanding the fundamentals.
Because of this, I took my Brainfuck project a step further and made a compiler out of it that translates brainfuck code to C. What sets my project apart from the plethora of other projects with similar goals is that it generates more effective C code that is more readable and convincing enough that it could have been written by someone applying brainfuck logic in C.
I have written this compiler in Rust; you can check out the entire code base yourself. Here I will be discussing the problem-solving aspects of this project.
The 8-token constraint means brainfuck has no numeric literals and instead relies on loops in many cases, but more often than not you'll see typical brainfuck code with long runs of repeating characters. This is easily solved by counting consecutive identical lexemes and collapsing them into a single lexeme.
My implementation is in Rust, which has a great enumeration system: it allows me to create enum variants that carry data. This way I can store how many times a certain character has been encountered.
#[derive(Debug, PartialEq)]
enum Lex {
    PtrLeft(u16),
    PtrRight(u16),
    Inc(u16),
    Dec(u16),
    Write(u16),
    Read(u16),
    LoopStart,
    LoopEnd,
}
With that I started examining every token in the code and merging similar tokens. This effectively produced smaller code to be converted to C. Unlike brainfuck, C can use constant numbers, which is why it is important to compress the code and avoid repeating statements. As a result the conversion turns out to be:
">>>" -> [gen_lex] -> "PtrRight(3)"
This was achieved with run-length encoding, a very primitive data-compression algorithm. Using arrays this would be easy, but I also wanted to write idiomatic Rust code where iterators are favoured, which led to this code:
fn get_count(ch: char, itr: &mut Peekable<Chars>) -> u16 {
    let mut count = 0;
    // Keep consuming characters from the iterator as long as they match `ch`.
    while itr.next_if(|&cx| ch == cx).is_some() {
        count += 1;
    }
    count
}
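To show how this might slot into the lexing stage, here is a self-contained sketch that drives `get_count` from a main loop (a hedged reconstruction for illustration; `gen_lex` is named after the stage mentioned above, and the actual code base may structure this differently):

```rust
use std::iter::Peekable;
use std::str::Chars;

#[derive(Debug, PartialEq)]
enum Lex {
    PtrLeft(u16),
    PtrRight(u16),
    Inc(u16),
    Dec(u16),
    Write(u16),
    Read(u16),
    LoopStart,
    LoopEnd,
}

fn get_count(ch: char, itr: &mut Peekable<Chars>) -> u16 {
    let mut count = 0;
    while itr.next_if(|&cx| ch == cx).is_some() {
        count += 1;
    }
    count
}

fn gen_lex(src: &str) -> Vec<Lex> {
    let mut itr = src.chars().peekable();
    let mut lexes = Vec::new();
    while let Some(ch) = itr.next() {
        // `+ 1` accounts for the character already consumed by `next()`.
        let n = get_count(ch, &mut itr) + 1;
        match ch {
            '<' => lexes.push(Lex::PtrLeft(n)),
            '>' => lexes.push(Lex::PtrRight(n)),
            '+' => lexes.push(Lex::Inc(n)),
            '-' => lexes.push(Lex::Dec(n)),
            '.' => lexes.push(Lex::Write(n)),
            ',' => lexes.push(Lex::Read(n)),
            // Brackets keep their individual identity for loop handling.
            '[' => (0..n).for_each(|_| lexes.push(Lex::LoopStart)),
            ']' => (0..n).for_each(|_| lexes.push(Lex::LoopEnd)),
            // Any other character is a comment in brainfuck.
            _ => {}
        }
    }
    lexes
}
```

Running `gen_lex(">>>")` yields a single `PtrRight(3)` lexeme, matching the conversion shown above.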
Now, on to the fun part: generating C code. This is where compressing lexemes comes in handy. Using the stored count, we can write equivalent C code: the same constants let us manipulate the pointer and its corresponding value directly, so each lexeme corresponds to one statement in C. That leads to something like this:
"PtrRight(3)" -> [gen_lex] -> "ptr += 3;"
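For the direct cases, the code generation can be sketched as a match over lexemes. This is a hedged sketch for illustration (the function name `emit_simple` and variable names are mine, not necessarily the project's):

```rust
#[derive(Debug)]
enum Lex {
    PtrLeft(u16),
    PtrRight(u16),
    Inc(u16),
    Dec(u16),
    Write(u16),
    Read(u16),
    LoopStart,
    LoopEnd,
}

// Sketch: emit one C statement per compressed lexeme.
// `res` collects the generated lines; `spaces` is the current indentation width.
fn emit_simple(lex: &Lex, res: &mut Vec<String>, spaces: usize) {
    match lex {
        Lex::PtrRight(n) => res.push(format!("{:width$}ptr += {};", " ", n, width = spaces)),
        Lex::PtrLeft(n)  => res.push(format!("{:width$}ptr -= {};", " ", n, width = spaces)),
        Lex::Inc(n)      => res.push(format!("{:width$}*ptr += {};", " ", n, width = spaces)),
        Lex::Dec(n)      => res.push(format!("{:width$}*ptr -= {};", " ", n, width = spaces)),
        // Write, Read, and loops need more than a single statement.
        _ => {}
    }
}
```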
But sometimes it is not as direct: to avoid repetition, a loop is occasionally needed. Take the Write lexeme as an example:
Lex::Write(count) => {
    if count == &1 {
        res.push(format!("{:width$}putchar(*ptr);", " ", width = spaces));
    } else {
        res.push(format!("\n{:width$}for (int x = 0; x < {}; x++)", " ", count, width = spaces));
        res.push(format!("{:width$}putchar(*ptr);\n", " ", width = spaces + 4));
    }
},
Here's another part that looks tricky on the surface but is not tricky at all: providing indentation to signify the depth of code blocks. Before that, though, the brackets are verified to be paired; only once that is done can lexical analysis and code generation proceed. I'm not going to explain the bracket-matching process, as all it needed was a counter to balance the number of opening and closing brackets.
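For completeness, the counter-based bracket check mentioned above can be sketched like this (a hedged sketch; the actual project may implement it differently):

```rust
// Returns true if every '[' has a matching ']' and vice versa.
fn brackets_balanced(src: &str) -> bool {
    let mut depth: i32 = 0;
    for ch in src.chars() {
        match ch {
            '[' => depth += 1,
            ']' => {
                depth -= 1;
                // A closing bracket appeared with no matching opener.
                if depth < 0 {
                    return false;
                }
            }
            _ => {}
        }
    }
    // Every opener must also have been closed.
    depth == 0
}
```

The early return on a negative counter matters: `"]["` has equal numbers of brackets but is still invalid.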
If you look at the last statement, you'll find the formatter {:width$}; it specifies the number of characters to fill the string with, here a space (" "). The width gets incremented/decremented by 4, controlled by the brackets.
Lex::LoopStart => {
    res.push(format!("\n{:width$}while (*ptr) {{", " ", width = spaces));
    spaces += 4;
},
Lex::LoopEnd => {
    spaces -= 4;
    res.push(format!("{:width$}}}\n", " ", width = spaces));
},
This wasn't an elaborate article, as the program was also small; I made it in a weekend while re-writing my brainfuck interpreter in Rust. I've started to implement some programs in it since I'm now a C++ programmer and this is going to be an important language for my career. I prefer Go for overall productivity, but for systems programming Rust is a better option due to its superior package management, zero-overhead abstractions, and assembly support just like C, making Rust its spiritual successor.