Intro
I’ve recently started reading “Writing an Interpreter in Go” as part of my journey to sharpen my programming skills. The main goal is actually to redo the book but in different languages like zig and rust.
At the time of writing, I was working on the lexer component and decided to explore performance optimizations Since A lexer is a state machine which essentially splits string into tokens. Based on a few simple branches like take any character, number and underscore for an identifier.
In this post i will explore a few ways to speed up branches i also had the idea to use some SIMD instructions but this was rather hard.
Token splitting
So the main performance optimization is to find consecutive characters which are part of a token. In the examples we are looking for any alphanumeric word.
branches
The most strait forward way of doing this is to have a branch like:
|
|
This however becomes slower the more tokens you are searching for since for a ‘B’ all the checks will need to be done.
Lookup Tables
A lookup table is a very simple way to speed up the code we can just precompute the result for all ascii characters.
|
|
bitvectors
Bit vectors compress the state representation by mapping each ASCII character to a bit. This reduces memory usage while still allowing efficient lookups.
|
|
benchmark
To evaluate the performance of these methods, I benchmarked them on an AMD Ryzen 5 3600 6-Core Processor:
|
|
The code is available here https://github.com/borissmidt/smidt-blog-examples/0004.
Failed Attempts
’naive’ simd
X86 processors have 128 bit MMX registers and 256bit MMY registers, the latest processors even have 512 bit registers. There is a VTEST instruction which test if a mask matches the bit flag. But sadly x86 is missing an instruction to shift the register beyond 64 bits. Also x86 can be quite hard you can not set a value directly to an MMX register, you first have to load the constant in an GP64 register to then load it to an MMX register. (asm source)[https://github.com/borissmidt/smidt-blog-examples/0004/asm/simd_bitset_asm.go]
Since go doesn’t inline assembly i decided to include the search loop into the assembly itself and to return the index of the non matching character. Sadly i didn’t get it to work fully because i’m missing tools to actually debug the assembly itself.
zig
Here’s my working example in Zig. Since zig has arbitrary sized integer types it was trivial to implement the bitvector. But sadly it didn’t use any SIMD registers. To see the compiler output you can copy the code below and open it in godbolt.
|
|
conclusion
Following a book like “Writing an Interpreter in Go” is an enjoyable way to dive into low-level programming. The bit vector approach offers a compelling middle ground, and for bigger sets it is for sure a useful tool.
It was an interesting experience write my own assembly and use it a program. Its nice to see that with go its pretty easy to include assembly in the code.
While the idea of leveraging SIMD turned out to be overly ambitious, it was a fascinating challenge that highlighted just how niche this domain is. Although I didn’t achieve my initial goals, I took my baby steps in assembly and the nuances of performance optimization.
references
Later on i discovered this post, which goes deeper into simd processing and how it is best to normalize characters first with a ‘perfect hash’:
More resources:
- https://github.com/awesome-simd/awesome-simd awesome github lists are always a good start for these kind of things.
- https://github.com/mmcloughlin/avo i used this to write assembly for go.
- https://godbolt.org/ useful to investigate what the compiler generates, i used this to check the 128 bit integers of zig.
- https://www.felixcloutier.com/x86/ to see the x86 assembly
- https://branchfree.org/ : The creator of simd json. Because the writers audience are other simd developers he expects you to know the subject quite deeply. So whenever he states an x86 simd instruction but the ‘alphabet soup’ named PCLMULQDQ doesn’t really make it clear what its used for.
https://branchfree.org/2024/06/09/a-draft-taxonomy-of-simd-usage/
git repo: