COMPILER

The Day Hello World First Compiled

A walkthrough of the exact pipeline that has to execute correctly for a Vexel source file to become a running binary, and what it looked like when it worked for the first time.

After about three weeks of building the Vexel compiler in the evenings, I had a lexer, a parser, a partially working type checker, and a half-finished LLVM IR emitter. None of it was connected end-to-end. None of it had ever produced a working binary.

I decided to try to make Hello World compile. Just that. The simplest possible program. If I could make that work, the whole architecture was probably sound.

Step 1: Lexing

The first step is tokenizing the source text. The lexer reads characters one by one and groups them into meaningful units called tokens. For a Hello World program:

fn main():
    print("Hello, World!")

The lexer produces a token stream that looks like this:

KW_FN       'fn'
IDENT       'main'
LPAREN      '('
RPAREN      ')'
COLON       ':'
NEWLINE
INDENT
IDENT       'print'
LPAREN      '('
STRING      '"Hello, World!"'
RPAREN      ')'
NEWLINE
DEDENT
EOF

The INDENT and DEDENT tokens are how Vexel handles Python-style indentation. The lexer tracks indentation levels and synthesizes these tokens so the parser can treat indentation as structure without needing to count spaces itself.

Step 2: Parsing

The parser reads the token stream and builds an Abstract Syntax Tree (AST). Each node in the tree represents a language construct: a function definition, a call expression, a string literal. The tree for Hello World looks like:

Program
  FunctionDef name='main' params=[] return_type=None
    CallExpr callee='print'
      StringLiteral value="Hello, World!"

The parser is a hand-written recursive descent parser. Each grammar rule in the language is implemented as a method. parse_statement() calls parse_expression(), which calls parse_call(), and so on.

Step 3: Type Checking

The semantic analyzer walks the AST and checks that every operation is type-correct. For Hello World, the checks are minimal: verify that print is a known builtin, verify that it accepts a str argument, and verify that a fn main() with no parameters and no return type is valid.

The analyzer also annotates each AST node with its resolved type. Those type annotations are used by the code generator in the next step.

Step 4: LLVM IR Generation

The code generator walks the annotated AST and emits LLVM Intermediate Representation. LLVM IR is a typed, low-level language that looks a bit like assembly but is platform-independent. Here is a simplified version of the IR the Vexel compiler emits for Hello World:

; String constant
@str0 = private constant [14 x i8] c"Hello, World!\00"

; External puts declaration
declare i32 @puts(i8* nocapture)

; main function
define i32 @main() {
entry:
  %ptr = getelementptr [14 x i8], [14 x i8]* @str0, i32 0, i32 0
  call i32 @puts(i8* %ptr)
  ret i32 0
}

The string literal becomes a global constant. The print builtin is lowered to a call to the C standard library's puts. The main function gets a pointer to the string and calls it.

Step 5: Native Code

LLVM takes the IR, runs its optimizer passes, and emits a native object file. The linker (MSVC link.exe on Windows, gcc/ld on Linux) combines the object file with the C runtime library and produces the final executable.

The entire pipeline from source to binary happens in under a second for most programs. For JIT mode (vexel run), the linker step is skipped and LLVM executes the IR directly in memory.

The Moment It Worked

The bug that was blocking me was a wrong calling convention. I was emitting fastcc (LLVM's fast calling convention) for the main function, but the Windows C runtime expects standard C calling convention. The result was a segfault on startup, before any user code even ran.

I fixed it by making the compiler always emit ccc (the standard C calling convention) for the top-level main function and for any function that calls external C code.

I ran the compiler. The IR was emitted without errors. The linker ran without errors. The executable appeared in the directory. I typed its name.

$ ./hello
Hello, World!

That was it. Hello World ran. A Vexel source file had become a native binary and executed correctly. I sat and looked at the terminal for a while. Three weeks of evenings and weekends had led to two words on a screen. It was exactly enough.

I spent the next few months building on that foundation: structs, enums, arrays, interfaces, the garbage collector, SDL2 support. Every new feature started from that same pattern. Get the IR right. Get the types right. Make it run.

← Why I Built Vexel All posts