Creating a Programming Language

Thu Apr 18, 2024

TL;DR: Feel free to try it out yourself. No need to install anything, it runs entirely in your browser!

Why create a new programming language?

The short answer is that I really wanted a language that is easy to write, is unopinionated, and produces a small and fast WebAssembly binary.

We can go one layer deeper, and we can ask, "well, why WebAssembly?"

That one's a little easier to answer. WebAssembly is a huge step forward for the web.

While the amount of performance that can still be eked out over JavaScript is remarkably thin, one thing that wasm can beat JavaScript at is startup time. By the time it reaches the browser, wasm has already been parsed AND optimized by the compiler that produced it. All of the types are already known!

JavaScript engines typically have multiple tiers of execution (note that I am most familiar with the V8 engine, and this description is still fairly high-level, so forgive me if I miss anything that you deem important). When an engine first sees new code, it has to parse it, convert that raw text to an internal bytecode, and then start running that bytecode in an interpreter. This can be quite slow. It is useful, though, since compiling to native machine code isn't free. You can think of this like a race between the interpreter, trying to run the code as fast as it can, and the fast unoptimized compiler, trying to compile the code to native code AND then run it before the interpreter finishes. So there's a tradeoff. If the code is only run a small number of times (or maybe even only once), then it will finish running in the interpreter before it could finish being compiled to native code. Whereas if the race is long enough, taking some time to get a much faster version can be wildly worth it.
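
To put rough numbers on that race, here is a minimal sketch of the break-even point. It is a deliberately simplified model, not how any real engine decides: it assumes compilation blocks execution (real engines often compile on a background thread), and the function name breakEvenCalls and every cost below are made up for illustration.

```javascript
// Simplified cost model (all numbers hypothetical). Compiling pays off once
//     calls * interpretCost > compileCost + calls * nativeCost
// which rearranges to
//     calls > compileCost / (interpretCost - nativeCost)
function breakEvenCalls(interpretCost, compileCost, nativeCost) {
    if (nativeCost >= interpretCost) {
        // Native code that is no faster than the interpreter never pays off.
        return Infinity;
    }
    return compileCost / (interpretCost - nativeCost);
}

// Made-up costs in microseconds: 10 per interpreted call, 900 to compile,
// and 1 per call once compiled.
const calls = breakEvenCalls(10, 900, 1);
console.log(calls); // 100
```

With these made-up numbers, code that runs fewer than 100 times is better off staying in the interpreter, and anything that runs more than that wins by being compiled.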

And as that code gets run more, it will eventually get sent to one or more additional layers of optimizing compilers — ones that have learned more about what the code is trying to do from how the previous layers ran it, and can combine that with more intimate knowledge of the host's CPU, to produce a far more optimized version of the native code that the engine runs.

This leads to a well-known trick: you can nudge the JS engine into optimizing code that you know will run a lot by running that code in a loop at startup, convincing the compiler that the code is hot so that it creates an optimized version of it.
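
A minimal sketch of that warm-up trick might look like the following. The names sumOfSquares and warmUp, and the iteration count, are hypothetical, and nothing here forces optimization: the engine's heuristics still decide on their own what counts as hot.

```javascript
// Hypothetical hot function: sums the squares of an array of numbers.
function sumOfSquares(values) {
    let total = 0;
    for (let i = 0; i < values.length; ++i) {
        total += values[i] * values[i];
    }
    return total;
}

// Call `fn` many times with representative arguments so that the engine is
// more likely to treat it as hot. `makeArgs` returns the argument list to
// use for each call.
function warmUp(fn, makeArgs, iterations = 10000) {
    let sink = 0;
    for (let i = 0; i < iterations; ++i) {
        // Accumulate the results so that the calls have an observable effect.
        sink += fn(...makeArgs());
    }
    return sink;
}

// Warm up with the same kind of arguments that the real calls will use
// (an array of floating point numbers).
warmUp(sumOfSquares, () => [[0.5, 1.5, 2.5]]);

// Later, the "real" call uses the same argument types as the warm-up.
const result = sumOfSquares([1.5, 2.5]);
console.log(result); // 8.5
```

The important part is that the warm-up arguments have the same types as the real ones, since the optimized version is built around whatever the engine saw during those calls.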

Unfortunately, it's a double-edged sword, in that it's just as easy to accidentally break all of the assumptions that the compiler made when creating the optimized version, and force it to throw away all of that work, falling back to the interpreter and starting all over again. The typical example of this is the add(x, y) function.

// A simple function that we will use to add 2 numbers together.
function add(x, y) {
    return x + y;
}

for (let i = 0; i < 10000; ++i) {
    // Tell the JS engine that we are using the function to add 2 numbers
    // together. This will cause the JS engine to optimize for the case where
    // the arguments are floating point numbers.
    add(Math.random(), Math.random());
}

// Now throw those optimizations away, since it assumed numbers and we are now
// using it to concatenate strings.
add("string", " concatenation");

"All that is great and all, but why do I care?" you may be wondering. Honestly, you're right. For most typical uses, you don't need to care. It IS fast enough, and that is all that matters.

Maybe your typical web page isn't where you'll find your biggest bang for your buck though.

Now you may say, "there are plenty of other programming languages that can already do that!" That's very true! But I wasn't convinced that any of those options met all of those goals at the same time.

Like Rust. Rust is great. Its claim to fame is that it is safe, and that "if the program compiles, then it probably works." While this is almost always true (see this GitHub repo for some examples where it isn't, such as improper lifetime expansion, none of which need the unsafe keyword), that isn't my issue with it.

Instead, in my head at least, I picture my issue with Rust like this:

I often find myself in the green region. I know that what I want to program is safe, but I don't know how to tell the compiler that it is — to get it into that blue region. This is often referred to as "fighting the borrow checker". It's a common enough problem that there are plenty of memes about it.

Well what about other languages?

C++ is fast, and while not easy to write, it is actually one of my favourite languages. In this case, my problem is not the language itself, but the tooling around it. Searching the web for how to compile it to WebAssembly, you'll find you're really pushed into using Emscripten. While not strictly needed (you can compile C++ to wasm with LLVM/clang directly), you are highly incentivized to use Emscripten, since otherwise you lose a lot, like the standard library itself!

Unfortunately, as far as I can tell, Emscripten tries to create a full app for you. Specifically, it requires that the C/C++ program have a main function for the app to start from. Then it also tries to emulate or replicate everything standard C++ can do, like some OS functionality (file IO, environment variables, etc.) with no easy way that I could find to opt out of it. And because it included all of these features that I didn't plan to — and didn't want to — use, it caused the wasm binaries to grow by several MB.

I wanted an easier way to decide whether I wanted certain features. And more importantly, for me at least, I didn't want to create a whole app using Emscripten. Instead I wanted to augment an existing app with WebAssembly, call out to it only when I needed to, and not the other way around.

Don't get me wrong, Emscripten is great if you want to port existing C/C++ apps to the web, and don't want to have to rewrite tens or hundreds of thousands, or even millions, of lines of existing code. Or if you are Adobe porting Photoshop.

Coming at it from the other direction, creating something new and small, with a pre-existing JavaScript codebase but no pre-existing C++ codebase, these tradeoffs felt like a negative.
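
To make the "call out to wasm only when I need to" idea concrete, here is a rough sketch from the JavaScript side. The module is a tiny hand-assembled binary exporting a single add function; the bytes, the export name add, and the variable names are illustrative assumptions, not output from the language this post is about. It uses the synchronous WebAssembly.Module and WebAssembly.Instance constructors, which is fine for a module this small; a real page would more likely load a .wasm file with WebAssembly.instantiateStreaming.

```javascript
// Hand-assembled wasm module (illustrative): exports add(i32, i32) -> i32.
const wasmBytes = new Uint8Array([
    0x00, 0x61, 0x73, 0x6d,                               // magic "\0asm"
    0x01, 0x00, 0x00, 0x00,                               // binary format version 1
    0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
    0x03, 0x02, 0x01, 0x00,                               // function section: one function of type 0
    0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" is function 0
    0x0a, 0x09, 0x01, 0x07, 0x00,                         // code section: one body, no locals
    0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                   // local.get 0, local.get 1, i32.add, end
]);

// The existing JavaScript app stays in charge: it compiles and instantiates
// the module, then calls the export like any other function.
const wasmModule = new WebAssembly.Module(wasmBytes);
const wasmInstance = new WebAssembly.Instance(wasmModule);
const wasmAdd = wasmInstance.exports.add;

const sum = wasmAdd(2, 3);
console.log(sum); // 5
```

Unlike the JavaScript add(x, y) from earlier, the signature here is fixed in the binary's type section, so the engine never has to guess at, or later throw away, assumptions about the argument types.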

So if not C++, what about Go? I love Go; it was the main language I used while working at Google. It's fast. It's simple. It's easy to write. It's easy to read. Errors are annoying to pass around, but they are so much easier to debug than C++ exceptions jumping through who knows how many layers of function calls that it's a net win.

But Go is garbage collected. I don't think this is an issue on its own. It's the fact that it needs to bring its own garbage collector with it on every download. The runtime with its garbage collector can be massive for simple programs.

To solve that, why not use TinyGo?

Well, I mean, I probably could. But that still ignores what is probably the biggest reason I wanted to create a new language:

I thought it would be fun!