Go Type Tagging

2023-11-04

Intro

Mixing up the values passed to functions and structs is a common mistake in many programming languages. It happens even more often in languages with a dynamic type system.

Here is a trivial example which illustrates this problem:

type User struct {
	UserName string
	Email    string
	Password string
}

func NewUser(
	UserName string,
	Email string,
	Password string,
) User {
	return User{
		UserName,
		// The Password is passed as an Email
		Password,
		Email,
	}
}

In this post we leverage the type system to catch these kinds of bugs and see which other advantages this approach offers. Don’t worry, we will not go fully functional or include any Haskell in this post.

Custom Types

In dynamic languages, we rely on unit tests to catch type-related errors. This often leads to a substantial amount of repetitive test code, including mocks, to ensure type safety.

However, statically typed languages offer a more elegant solution by introducing a distinct type for each value with a unique meaning. This effectively shifts the responsibility of the type checking from unit tests to the compiler.

type (
    UserName string
    Email string
    Password string
)


type User struct {
    Name UserName
    Email Email
    Password Password
}

By defining custom types in the first statement, it becomes impossible to mix up UserName, Email and Password values, unless you explicitly convert them (untyped constants such as string literals are still accepted for any of the three). This approach improves readability by documenting the meaning of each value, and it makes it easy to trace all occurrences of a type throughout the code. Additionally, it promotes consistent naming of values. In some contexts you might name a UserName field Name, as in the struct above, but the type makes it clear that we are talking about a UserName.
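
As a minimal sketch of what the compiler now checks, assume a NewUser constructor whose parameters use the custom types. The constructor signature and the sample values are made up for illustration. Swapping two typed arguments no longer compiles:

```go
package main

import "fmt"

type (
	UserName string
	Email    string
	Password string
)

type User struct {
	Name     UserName
	Email    Email
	Password Password
}

// NewUser takes typed parameters, so each argument must have the matching type.
func NewUser(name UserName, email Email, password Password) User {
	return User{Name: name, Email: email, Password: password}
}

func main() {
	name := UserName("john")
	email := Email("john@example.com")
	password := Password("hunter2")

	user := NewUser(name, email, password)
	fmt.Println(user.Name, user.Email)

	// Swapping the arguments is rejected by the compiler,
	// because a Password value is not an Email value:
	// NewUser(name, password, email)

	// An explicit conversion is still allowed, but it is visible in the code.
	_ = Email(password)
}
```

Using named fields in the struct literal, rather than positional ones, removes the second place where the values could be swapped.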

Helper Methods

In Go we can define helper methods on the custom types, which keeps behaviour that is only relevant to a type next to that type. Since they are methods, they are easier to discover in the IDE and the code becomes more organized as a result.

In this example we can avoid leaking the password in the logging. We also define a helper method on Email.

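import (
	"fmt"
	"strings"
)

type (
	Password   string
	Email      string
	MailServer string
)

// String implements fmt.Stringer so the value is masked when printed.
func (Password) String() string {
	return "***"
}

// MailServer returns the part of the address after the "@".
// It returns an empty MailServer if the address contains no "@".
func (e Email) MailServer() MailServer {
	_, server, _ := strings.Cut(string(e), "@")
	return MailServer(server)
}

func main() {
	// prints ***
	fmt.Printf("%s", Password("hello-world"))
}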

By adding a String() method to Password, the password is printed as ***, obfuscating the value. The MailServer() helper method extracts the mail server from the Email address.
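
The following sketch shows the masking when a Password is nested inside a struct, together with a variant of the MailServer helper that reports malformed input as an error. The error-returning signature and the sample values are assumptions for illustration, not part of the code above:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

type (
	UserName   string
	Email      string
	Password   string
	MailServer string
)

type User struct {
	Name     UserName
	Email    Email
	Password Password
}

// String implements fmt.Stringer, which fmt uses for the %s and %v verbs.
func (Password) String() string { return "***" }

// MailServer returns the part after the last "@", or an error if it is missing.
func (e Email) MailServer() (MailServer, error) {
	i := strings.LastIndex(string(e), "@")
	if i < 0 || i == len(e)-1 {
		return "", errors.New("email has no mail server part")
	}
	return MailServer(e[i+1:]), nil
}

func main() {
	u := User{Name: "john", Email: "john@example.com", Password: "hunter2"}
	// The Password field is printed as ***, the other fields as usual.
	fmt.Printf("%v\n", u)

	server, err := u.Email.MailServer()
	fmt.Println(server, err)
}
```

Keep in mind that String() only affects formatting through fmt. Encoders such as encoding/json do not call it, so they still write the raw password unless the type also implements their marshaling interface.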

Validation of Custom Types

Instead of constructing custom types directly, we can use constructors, similar to structs. These constructors can enforce validation of the type and return an error when invalid data is provided. This catches errors early on, preventing hard-to-debug errors later.



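import (
	"errors"
	"regexp"
)

// Simplified pattern, compiled once; it is not a full RFC 5322 validator.
var emailPattern = regexp.MustCompile(`^[\w.-]+@([\w-]+\.)+[\w-]{2,4}$`)

func NewEmail(email string) (Email, error) {
	if !emailPattern.MatchString(email) {
		return "", errors.New("Invalid email")
	}
	return Email(email), nil
}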
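
A minimal sketch of how calling code might use such a constructor. The sentinel error ErrInvalidEmail is a hypothetical addition that lets callers test for the failure with errors.Is; the pattern is the simplified one from the post, not a complete email validator:

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

type Email string

// emailPattern is deliberately simple and rejects some valid addresses.
var emailPattern = regexp.MustCompile(`^[\w.-]+@([\w-]+\.)+[\w-]{2,4}$`)

// ErrInvalidEmail is a hypothetical sentinel error for this sketch.
var ErrInvalidEmail = errors.New("invalid email")

// NewEmail is the only intended way to create an Email from raw input.
func NewEmail(raw string) (Email, error) {
	if !emailPattern.MatchString(raw) {
		return "", fmt.Errorf("%w: %q", ErrInvalidEmail, raw)
	}
	return Email(raw), nil
}

func main() {
	for _, raw := range []string{"john@example.com", "not an email"} {
		email, err := NewEmail(raw)
		if errors.Is(err, ErrInvalidEmail) {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", email)
	}
}
```

Go cannot stop other code from writing Email("anything") directly, so the constructor is a convention rather than a guarantee.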

Enforcing Data Validation when Deserializing

With data validation in place, we can further enhance the data integrity by catching invalid data during deserialization. This ensures no invalid data will be processed by the rest of the system.

In Go, we can define a custom JSON deserializer for our custom types. By using our constructor inside this deserializer we can handle the error during deserialization.

import (
	"encoding/json"
	"fmt"
	"log/slog"
)

type User struct {
	Name     UserName
	Email    Email
	Password Password
}

// UnmarshalJSON has to use a pointer receiver
// because we have to assign to it.
func (e *Email) UnmarshalJSON(data []byte) error {
	var raw string
	if err := json.Unmarshal(data, &raw); err != nil {
		return err
	}
	email, err := NewEmail(raw)
	if err != nil {
		return err
	}
	// assign the validated value through the Email pointer.
	*e = email
	return nil
}

func main() {
	var user User
	err := json.Unmarshal([]byte(`{"email": "not an email", "name": "john", "password": "123"}`), &user)
	if err != nil {
		// we will log the "Invalid email" error.
		slog.Error("Invalid User Data", "error", err)
	} else {
		fmt.Println("Valid User Data:", user)
	}
}

In this example, the UnmarshalJSON(data []byte) error method is implemented on *Email; it performs the deserialization of the Email type.

The main advantage is that we catch the error at the start and don’t have to trace back where the invalid value was created, which makes the application more robust.
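
A self-contained sketch of this flow, assuming a stand-in NewEmail that only checks for an "@" and a hypothetical ParseUser helper. It shows that the validation error surfaces from json.Unmarshal itself:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

type Email string

// NewEmail is a stand-in validator for this sketch: it only checks for an "@".
func NewEmail(raw string) (Email, error) {
	if !strings.Contains(raw, "@") {
		return "", errors.New("invalid email")
	}
	return Email(raw), nil
}

// UnmarshalJSON validates the value while it is being decoded.
func (e *Email) UnmarshalJSON(data []byte) error {
	var raw string
	if err := json.Unmarshal(data, &raw); err != nil {
		return err
	}
	email, err := NewEmail(raw)
	if err != nil {
		return err
	}
	*e = email
	return nil
}

type User struct {
	Name  string `json:"name"`
	Email Email  `json:"email"`
}

// ParseUser is a hypothetical helper that decodes and validates in one step.
func ParseUser(data string) (User, error) {
	var u User
	err := json.Unmarshal([]byte(data), &u)
	return u, err
}

func main() {
	_, err := ParseUser(`{"name": "john", "email": "not an email"}`)
	fmt.Println("invalid input:", err)

	u, err := ParseUser(`{"name": "john", "email": "john@example.com"}`)
	fmt.Println("valid input:", u, err)
}
```

Note that the validation only runs for fields that are present in the JSON. A missing "email" key leaves the field at its zero value, so required fields need a separate check.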

Type tagging

A more advanced problem is shown in the following example:

const (
	prodRequirementsFile RequirementsFile = "prod.txt"
	testRequirementsFile RequirementsFile = "test.txt"

	prodLockFile LockFile = "prod-lock.txt"
	testLockFile LockFile = "test-lock.txt"
)

// Name and Version are assumed to be custom string types.
type (
	Name    string
	Version string
)

type Requirement struct {
	name Name
}

type Requirements struct {
	Requirements []Requirement
}

type Dependency struct {
	name Name
	// an exact version, for example 0.2.0
	version Version
}

type Dependencies struct {
	Dependencies []Dependency
}

// a file containing the resolved dependencies
type LockFile string

// Loads the dependencies from the lock file (stub).
func (f LockFile) Load() Dependencies { return Dependencies{} }

// Saves the dependencies to the lock file (stub).
func (f LockFile) Store(d Dependencies) {}

// marks a file which contains the requirements
type RequirementsFile string

// Reads and parses the requirements file (stub).
func (f RequirementsFile) Load() Requirements { return Requirements{} }

// Resolves the requirements into specific dependencies (stub).
func ResolveDependencies(req Requirements) Dependencies { return Dependencies{} }

// Reads the requirements file and writes the resolved dependencies to the lock file.
func UpdateLockFile(req RequirementsFile, lockFile LockFile) error {
	lockFile.Store(ResolveDependencies(req.Load()))
	return nil
}

func main() {
	// Oops mixed up the files!
	UpdateLockFile(prodRequirementsFile, testLockFile)
	UpdateLockFile(testRequirementsFile, prodLockFile)
}

In the above example we are reusing the same type for different things. This makes sense because both the Prod and the Test environment have requirements and lock files. We could follow our earlier advice, make a custom type for each instance and just copy-paste the functionality of that type. But this makes the code hard to maintain.

type Requirements struct {
	Requirements []Requirement
}

type TestRequirements struct {
	// embed the type
	Requirements
}

type ProdRequirements struct {
	// embed the type
	Requirements
}

// Every function then has to be copied per type, with each copy forwarding to the embedded Requirements.

We could just handle it at runtime by introducing an ’enum’ value which we then add as a ’tag’ to each type which has a specific context:

type Environment int

const (
	Test Environment = 1
	Prod Environment = 2
)

type Requirements struct {
	Requirements []Requirement
	env          Environment
}

type Dependencies struct {
	Dependencies []Dependency
	env          Environment
}

But now we have to check this enum at runtime to make sure we have a dependency/requirement pair with the same enum value.
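
To make that cost concrete, here is a sketch of the kind of guard the enum approach requires. The CheckSameEnvironment function and the exported field names are hypothetical and only serve the illustration:

```go
package main

import (
	"errors"
	"fmt"
)

type Environment int

const (
	Test Environment = 1
	Prod Environment = 2
)

type Requirement struct{ Name string }
type Dependency struct{ Name, Version string }

type Requirements struct {
	Items []Requirement
	Env   Environment
}

type Dependencies struct {
	Items []Dependency
	Env   Environment
}

// CheckSameEnvironment is a guard that every caller has to remember to run.
func CheckSameEnvironment(req Requirements, deps Dependencies) error {
	if req.Env != deps.Env {
		return errors.New("requirements and dependencies belong to different environments")
	}
	return nil
}

func main() {
	req := Requirements{Env: Prod}
	deps := Dependencies{Env: Test}
	// The mix-up compiles fine and is only detected when this line runs.
	fmt.Println(CheckSameEnvironment(req, deps))
}
```

Nothing forces a caller to invoke the guard, and a forgotten check fails silently.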

If only we could use a similar trick, but fully at compile time, so that a faulty program does not compile at all. Lucky for us, this is possible with generics.

Context dependent types

First we want to define the enum above in a way that the compiler understands it. This is possible by creating an Environment as a context type and Test/Prod as its instance types.

type (
	//Note that we use the empty `interface{}` since we never intend to instantiate it.
	Environment interface{}
	Test        Environment
	Prod        Environment
)

Then, just like in the enum example, we tag each context-dependent type. We do this by introducing a generic type parameter constrained by Environment.

type Requirements[T Environment] struct {
	Requirements []Requirement
}

// A set of dependencies is also context dependent
type Dependencies[T Environment] struct {
	Dependencies []Dependency
}

Now we can add a type tag to LockFile and RequirementsFile, so that the tag is attached to the value as soon as we read in the file.

const (
	// we tag each file to make sure we don't mix up the contexts
	prodRequirementsFile RequirementsFile[Prod] = "prod.txt"
	testRequirementsFile RequirementsFile[Test] = "test.txt"

	prodLockFile LockFile[Prod] = "prod-lock.txt"
	testLockFile LockFile[Test] = "test-lock.txt"
)

// The requirements file now has a context.
type RequirementsFile[T Environment] string

// When loading a requirements file we pass the context on to the Requirements type.
// Value receivers allow the method to be called on the constants above as well.
func (f RequirementsFile[T]) Load() Requirements[T] { return Requirements[T]{} }

// a file containing the resolved dependencies
type LockFile[T Environment] string

// When you load a lock file it is only relevant to a single context.
func (f LockFile[T]) Load() Dependencies[T] { return Dependencies[T]{} }

// When you store the resolved Dependencies you should save them to the correct file.
func (f LockFile[T]) Store(d Dependencies[T]) error { return nil }

To make sure we keep the context when resolving the dependencies, we add a type tag to the ResolveDependencies function, which passes the tag on to the Dependencies type:

// Pass the context from the Requirements to the Dependencies
func ResolveDependencies[T Environment](req Requirements[T]) Dependencies[T] {
	return Dependencies[T]{}
}

Now the last piece of the puzzle to solve the mistake we had before is to add a type tag to the UpdateLockFile function. UpdateLockFile[Test] then only accepts a RequirementsFile[Test] and a LockFile[Test], making sure we don’t mix up the types:

// Each lock file update is context dependent. We can no longer pass a mismatched requirements/lock file combination.
func UpdateLockFile[T Environment](req RequirementsFile[T], lockFile LockFile[T]) error {
	//  ...
	//  reads the requirements file, resolves the dependencies and stores them in the lock file.
	return nil
}

func main() {
	// matching tags: this compiles
	UpdateLockFile(prodRequirementsFile, prodLockFile)
	// mixed tags: this gives a compilation error
	UpdateLockFile(testRequirementsFile, prodLockFile)
}

The advantage of this solution is that the compiler will keep track of the tag for us. Unlike the enum solution we will catch bugs at compile time and remove the overhead of runtime checks. Compared to the custom type solution we can now reuse our types directly.
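
Putting the pieces together, the following is a runnable sketch of the tagged pipeline. The in-memory store map, the stubbed Load method and the sample package name and version are assumptions standing in for real file handling and dependency resolution:

```go
package main

import "fmt"

type (
	Environment interface{}
	Test        Environment
	Prod        Environment
)

type Requirement struct{ Name string }
type Dependency struct{ Name, Version string }

type Requirements[T Environment] struct{ Requirements []Requirement }
type Dependencies[T Environment] struct{ Dependencies []Dependency }

type RequirementsFile[T Environment] string
type LockFile[T Environment] string

// Load is a stub: a real version would read and parse the file.
func (f RequirementsFile[T]) Load() Requirements[T] {
	return Requirements[T]{Requirements: []Requirement{{Name: "example-package"}}}
}

// store stands in for the file system in this sketch.
var store = map[string][]Dependency{}

// Store only accepts dependencies that carry the same tag as the lock file.
func (f LockFile[T]) Store(d Dependencies[T]) { store[string(f)] = d.Dependencies }

// ResolveDependencies pins every requirement to a made-up version.
func ResolveDependencies[T Environment](req Requirements[T]) Dependencies[T] {
	var deps Dependencies[T]
	for _, r := range req.Requirements {
		deps.Dependencies = append(deps.Dependencies, Dependency{Name: r.Name, Version: "0.2.0"})
	}
	return deps
}

func UpdateLockFile[T Environment](req RequirementsFile[T], lock LockFile[T]) {
	lock.Store(ResolveDependencies(req.Load()))
}

const (
	prodRequirementsFile RequirementsFile[Prod] = "prod.txt"
	prodLockFile         LockFile[Prod]         = "prod-lock.txt"
	testLockFile         LockFile[Test]         = "test-lock.txt"
)

func main() {
	UpdateLockFile(prodRequirementsFile, prodLockFile)
	fmt.Println(store["prod-lock.txt"])

	// Rejected by the compiler, because the tags differ:
	// UpdateLockFile(prodRequirementsFile, testLockFile)
}
```

The tag never appears in a struct field or a function body. It only exists in the type arguments, which the compiler infers from the arguments at each call.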

One disadvantage is that any function which deals with a tagged type has to be context aware and thus have a generic parameter.

Context specific functions

Now we can take it a step further, just like the helper methods for a custom type: we can add specialized functions which are only valid in a specific context.


// Let's introduce a new type:
type Container[T Environment] struct{}

// TestResults is a placeholder result type for this example.
type TestResults struct{ failures int }

func (r TestResults) Failed() bool { return r.failures > 0 }

// Creates a container with the dependencies installed.
// The Environment tag is carried over from the dependencies.
func CreateContainer[T Environment](dependencies Dependencies[T]) Container[T] {
	return Container[T]{}
}

// This only accepts a container with the Test dependencies installed.
func RunTests(c Container[Test]) TestResults { return TestResults{} }

// This only accepts a container with the Prod dependencies installed.
func Publish(c Container[Prod]) error { return nil }

func main() {
	testLock, prodLock := testLockFile, prodLockFile

	testDeps := testLock.Load()
	container := CreateContainer(testDeps)

	prodDependencies := prodLock.Load()
	prodContainer := CreateContainer(prodDependencies)

	// can't pass the wrong container in this case
	testResults := RunTests(container)
	if testResults.Failed() {
		log.Fatal(testResults)
	}

	// can't pass the wrong container in this case
	if err := Publish(prodContainer); err != nil {
		log.Fatal(err)
	}
}

The Container[T Environment] type shows how easy it is to define a new context-dependent type and extend the existing system.

The RunTests function only takes a Container in the Test environment. This makes it impossible to pass a container without the test dependencies to the test function.

The Publish function can only publish a container with the Prod context, preventing us from accidentally publishing a ‘Test container’.

So now we extended both contexts with a ‘context dependent function’ which cannot be used by the other context.

Here are some other examples of how we can share one function between two contexts but not three.

// only allows the function in a single context
func OneContext(container Container[Test]) {}

// intended for the Test and Prod contexts only.
// NOTE: Test and Prod are declared as empty interfaces, so this union admits
// every type. The restriction only takes effect with non-interface tag types.
func TwoContexts[T Test | Prod](container Container[T]) {}

// can be used in any context of type Environment
func AllEnvironmentContexts[T Environment](container Container[T]) {}

// can be used with any tag type
func AnyContext[T any](container Container[T]) {}

// the context is accepted but completely ignored
func IgnoresContext[T Environment](_ Container[T]) {}
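
One caveat with union constraints: when the tags are empty interfaces, a union such as Test | Prod has the type set of all types, so it does not actually exclude a third context. The sketch below assumes a variation of the approach in which the tags are declared as empty structs and Environment lists them explicitly; the function names are made up for illustration:

```go
package main

import "fmt"

// Tags declared as empty structs instead of empty interfaces (an assumption
// of this sketch), so that each tag's type set contains only itself.
type (
	Test  struct{}
	Prod  struct{}
	Debug struct{}
)

// Environment lists every known tag.
type Environment interface{ Test | Prod | Debug }

type Container[T Environment] struct{ Name string }

// TestOnly accepts a single context.
func TestOnly(c Container[Test]) string { return "test only: " + c.Name }

// TestOrProd accepts two of the three contexts.
func TestOrProd[T Test | Prod](c Container[T]) string { return "test or prod: " + c.Name }

// AnyEnvironment accepts every context listed in Environment.
func AnyEnvironment[T Environment](c Container[T]) string { return "any: " + c.Name }

func main() {
	test := Container[Test]{Name: "t"}
	prod := Container[Prod]{Name: "p"}
	debug := Container[Debug]{Name: "d"}

	fmt.Println(TestOnly(test))
	fmt.Println(TestOrProd(prod))
	fmt.Println(AnyEnvironment(debug))

	// Both of these are rejected by the compiler:
	// TestOnly(prod)
	// TestOrProd(debug)
}
```

The trade-off is that Environment becomes a closed list: adding a new context means editing the Environment constraint, whereas the empty-interface version accepts new tags without changes.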

Extensibility

It is rather easy to introduce another context by doing:

type Debug Environment

Now we can reuse all existing functions but have a new context with its own context specific functions.
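
A short sketch of what this looks like in practice. The AttachDebugger function, the Installed field and the sample dependency are hypothetical and only illustrate a function that exists in the new context alone:

```go
package main

import "fmt"

type (
	Environment interface{}
	Test        Environment
	Prod        Environment
	Debug       Environment // the new context
)

type Dependency struct{ Name, Version string }

type Dependencies[T Environment] struct{ Dependencies []Dependency }
type Container[T Environment] struct{ Installed int }

// CreateContainer is unchanged and works for the new tag as well.
func CreateContainer[T Environment](d Dependencies[T]) Container[T] {
	return Container[T]{Installed: len(d.Dependencies)}
}

// AttachDebugger is a hypothetical function that only exists in the Debug context.
func AttachDebugger(c Container[Debug]) string {
	return fmt.Sprintf("debugger attached, %d dependencies installed", c.Installed)
}

func main() {
	deps := Dependencies[Debug]{
		Dependencies: []Dependency{{Name: "example-debugger", Version: "1.0.0"}},
	}
	fmt.Println(AttachDebugger(CreateContainer(deps)))

	// Rejected by the compiler: a Test container is not a Debug container.
	// AttachDebugger(CreateContainer(Dependencies[Test]{}))
}
```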

Conclusion

Custom types and type tagging are powerful tools to make sure the compiler and its type system catch our mistakes at compile time. The added overhead of maintaining these types is more than made up for by the time not spent on writing unit tests, thus speeding up our development cycle.

Custom Types:

Allow you to define each separate value with its own type, preventing you from passing the wrong value to the wrong parameter.
Helper methods allow you to map from one type to another custom type.
Helper methods give you type specific operations which are only relevant to that type.
Constructors allow you to set constraints on the values of that type, preventing invalid values.
The value can also be validated during deserialization catching invalid values as soon as they enter the program.

Type Tagging

Type tagging avoids mixing up the same value but in a different context.
Allows code reuse for common types, instead of creating a type for each value + context combination.
You can also limit certain functions to certain contexts.
Type tagging is ‘free’ at runtime since we never instantiate a context value.

Custom types are easy to roll out and can catch most type errors by making sure each value has its own specific type. They also give a lot of added benefits which are easy to implement in Go.

Type tagging is more advanced, and you have to be sure that it is the right solution for the problem. Its advantage is that you can reuse more code and don’t have to introduce as many types as with custom types alone.