Bit Vectors and my first steps into assembly

Intro

I’ve recently started reading “Writing an Interpreter in Go” as part of my journey to sharpen my programming skills. The main goal is actually to redo the book but in different languages like zig and rust.

At the time of writing, I was working on the lexer component and decided to explore performance optimizations Since A lexer is a state machine which essentially splits string into tokens. Based on a few simple branches like take any character, number and underscore for an identifier.

In this post i will explore a few ways to speed up branches i also had the idea to use some SIMD instructions but this was rather hard.

Token splitting

So the main performance optimization is to find consecutive characters which are part of a token. In the examples we are looking for any alphanumeric word.

branches

The most strait forward way of doing this is to have a branch like:

1
2
3
if '0' <= x && x <= '9' || 'a' <= x && x <= 'z' || 'A' <= x && x <= 'Z' {
	...
}

This however becomes slower the more tokens you are searching for since for a ‘B’ all the checks will need to be done.

Lookup Tables

A lookup table is a very simple way to speed up the code we can just precompute the result for all ascii characters.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
// using an 256 sized array to remove bound checks.
var identifier [256]bool
init {
	// populate the identifier
	for x := range 256 {
		if '0' <= x && x <= '9' || 'a' <= x && x <= 'z' || 'A' <= x && x <= 'Z' {
			identifier[x] = true
		}
	}
}

func isIdentifier (x uint8) bool{
  return identifier[x]
}

bitvectors

Bit vectors compress the state representation by mapping each ASCII character to a bit. This reduces memory usage while still allowing efficient lookups.

bit vector

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
type AsciiVector [4]uint64

const mask = 64 - 1

// Set ith bit
func (bv *AsciiVector) Set(i uint8) {
	idx := i >> 6
	bv[idx] |= 1 << (i & mask)
}

func (bv AsciiVector) Get(i uint8) bool {
	idx := i >> 6
	bit := i & (mask)
	return (bv[idx] & (1 << bit)) != 0
}

benchmark

To evaluate the performance of these methods, I benchmarked them on an AMD Ryzen 5 3600 6-Core Processor:

1
2
3
4
5
6
cpu: AMD Ryzen 5 3600 6-Core Processor              
BenchmarkArrayIndex-12                     85222             12510 ns/op
BenchmarkBranch-12                        325195              3767 ns/op
BenchmarkAsciiVector-12                   422432              2910 ns/op
BenchmarkWillfLibBitvector-12             449012              2597 ns/op
BenchmarkLookup-12                        479762              2370 ns/op

The code is available here https://github.com/borissmidt/smidt-blog-examples/0004.

Failed Attempts

’naive’ simd

X86 processors have 128 bit MMX registers and 256bit MMY registers, the latest processors even have 512 bit registers. There is a VTEST instruction which test if a mask matches the bit flag. But sadly x86 is missing an instruction to shift the register beyond 64 bits. Also x86 can be quite hard you can not set a value directly to an MMX register, you first have to load the constant in an GP64 register to then load it to an MMX register. (asm source)[https://github.com/borissmidt/smidt-blog-examples/0004/asm/simd_bitset_asm.go]

Since go doesn’t inline assembly i decided to include the search loop into the assembly itself and to return the index of the non matching character. Sadly i didn’t get it to work fully because i’m missing tools to actually debug the assembly itself.

zig

Here’s my working example in Zig. Since zig has arbitrary sized integer types it was trivial to implement the bitvector. But sadly it didn’t use any SIMD registers. To see the compiler output you can copy the code below and open it in godbolt.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
const std = @import("std");

pub fn main() !void {
    // Prints to stderr (it's a shortcut based on `std.io.getStdErr()`)
    var c : u7 = 'A';
    var mask: u128 = 1;
    for (0..26) |_| {
        var register: u128 = 1;
        register = register << c;
        mask |= register;
        c+=1;
    }


    for (0..127) |v | {
        c = @intCast( v);
        var register: u128 = 1;
        register = register << c;
        register &= mask;

        std.debug.print("is this true? {} {}\n", .{c, register != 0});
    }
}

conclusion

Following a book like “Writing an Interpreter in Go” is an enjoyable way to dive into low-level programming. The bit vector approach offers a compelling middle ground, and for bigger sets it is for sure a useful tool.

It was an interesting experience write my own assembly and use it a program. Its nice to see that with go its pretty easy to include assembly in the code.

While the idea of leveraging SIMD turned out to be overly ambitious, it was a fascinating challenge that highlighted just how niche this domain is. Although I didn’t achieve my initial goals, I took my baby steps in assembly and the nuances of performance optimization.

references

Later on i discovered this post, which goes deeper into simd processing and how it is best to normalize characters first with a ‘perfect hash’:

https://mcyoung.xyz/2023/11/27/simd-base64/

More resources:

https://github.com/awesome-simd/awesome-simd awesome github lists are always a good start for these kind of things.
https://github.com/mmcloughlin/avo i used this to write assembly for go.
https://godbolt.org/ useful to investigate what the compiler generates, i used this to check the 128 bit integers of zig.
https://www.felixcloutier.com/x86/ to see the x86 assembly
https://branchfree.org/ : The creator of simd json. Because the writers audience are other simd developers he expects you to know the subject quite deeply. So whenever he states an x86 simd instruction but the ‘alphabet soup’ named PCLMULQDQ doesn’t really make it clear what its used for.

https://branchfree.org/2024/06/09/a-draft-taxonomy-of-simd-usage/

git repo:

https://github.com/borissmidt/smidt-blog-examples/bitvectors