How it works
Overview
kemlang-py is a tree-walking interpreter. This section explains exactly how it turns a .jsk source file into running output - from character scanning all the way to executing statements.
What is a programming language, really?
A programming language is a convention. The source file you write is just text - a sequence of Unicode characters sitting on disk. Nothing in the hardware understands bhai bol. The interpreter is the program that reads that text and figures out what to do with it.
Every interpreter or compiler does the same fundamental job: transform source text into behavior. The strategies differ enormously in complexity and performance, but the goal is always the same.
The spectrum of language implementations
Different languages take different approaches to turning source into execution.
Source text
│
▼
┌───────────────────────────────────────────────────────────────────┐
│ COMPILED (C, Rust, Go) │
│ │
│ Source ──▶ Compiler ──▶ Machine code (.exe) ──▶ CPU runs │
│ │
│ + Fastest possible execution (direct CPU instructions) │
│ - Compilation is a separate step before running │
└───────────────────────────────────────────────────────────────────┘
│
▼
┌───────────────────────────────────────────────────────────────────┐
│ BYTECODE VM (Python, Java, Lua) │
│ │
│ Source ──▶ Compiler ──▶ Bytecode ──▶ VM interprets │
│ │
│ + Faster than tree-walking; portable across platforms │
│ - VM adds complexity; bytecode is an intermediate layer │
└───────────────────────────────────────────────────────────────────┘
│
▼
┌───────────────────────────────────────────────────────────────────┐
│ TREE-WALKING (kemlang-py, early Ruby, many scripting languages) │
│ │
│ Source ──▶ Lexer ──▶ Parser ──▶ AST ──▶ walk & execute │
│ │
│ + Simplest implementation; easy to debug and extend │
│ - Slowest; each node is re-evaluated on every visit │
└───────────────────────────────────────────────────────────────────┘
The pipeline
Every time you run kem run hello.jsk, the source file travels through three sequential stages. Each stage receives the output of the previous one.
What the CLI actually does
source = Path(file).read_text(encoding="utf-8")
tokens = Lexer(source).tokenize()            # str -> List[Token]
ast = Parser(tokens).parse()                 # tokens -> Program
exit_code = Interpreter().interpret(ast)     # AST -> stdout + int
raise typer.Exit(exit_code)
Stage 1: Lexer
The lexer reads source text one character at a time and groups characters into tokens - the smallest meaningful units of the language. kemlang-py's lexer handles multi-word Gujarati keywords like bhai bol by checking multi-word sequences before single-word keywords.
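The "longest phrase first" idea can be sketched in a few lines. This is an illustration only, not kemlang-py's actual lexer: the `KEYWORDS` table, the token type names (`PRINT`, `STRING`, `NUMBER`, `IDENT`), and the `tokenize` function are all assumptions. Only the keyword `bhai bol` comes from the text above.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (token type, lexeme)

# Hypothetical keyword table. Only "bhai bol" comes from the text above;
# the token type name "PRINT" is an assumption.
KEYWORDS = {"bhai bol": "PRINT"}

# Try keywords with more words first, so a multi-word phrase is never
# split into shorter keywords or identifiers.
_BY_LENGTH = sorted(KEYWORDS, key=lambda k: len(k.split()), reverse=True)


def _match_keyword(source: str, pos: int) -> Optional[Tuple[str, int]]:
    """Return (phrase, end offset) if a keyword starts at pos, else None."""
    for phrase in _BY_LENGTH:
        i = pos
        for n, word in enumerate(phrase.split()):
            if n > 0:
                # Require at least one space or tab between the words.
                j = i
                while j < len(source) and source[j] in " \t":
                    j += 1
                if j == i:
                    break
                i = j
            if not source.startswith(word, i):
                break
            i += len(word)
        else:
            # Must end on a word boundary: "bhai bolo" is not "bhai bol".
            if i == len(source) or not (source[i].isalnum() or source[i] == "_"):
                return phrase, i
    return None


def tokenize(source: str) -> List[Token]:
    tokens: List[Token] = []
    pos = 0
    while pos < len(source):
        ch = source[pos]
        if ch.isspace():
            pos += 1
        elif ch == '"':
            end = source.find('"', pos + 1)
            if end == -1:
                raise SyntaxError(f"unterminated string at offset {pos}")
            tokens.append(("STRING", source[pos + 1:end]))
            pos = end + 1
        elif ch.isdigit():
            end = pos
            while end < len(source) and source[end].isdigit():
                end += 1
            tokens.append(("NUMBER", source[pos:end]))
            pos = end
        elif ch.isalpha() or ch == "_":
            hit = _match_keyword(source, pos)
            if hit:
                phrase, pos = hit
                tokens.append((KEYWORDS[phrase], phrase))
            else:
                end = pos
                while end < len(source) and (source[end].isalnum() or source[end] == "_"):
                    end += 1
                tokens.append(("IDENT", source[pos:end]))
                pos = end
        else:
            raise SyntaxError(f"unexpected character {ch!r} at offset {pos}")
    return tokens
```

With this sketch, `tokenize('bhai bol "kem cho"')` yields one `PRINT` token followed by one `STRING` token, while a lone `bhai` falls through to the identifier branch because the second word of the phrase is missing.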
Stage 2: Parser
The parser takes the flat token stream and builds an Abstract Syntax Tree using recursive descent. Each grammar rule maps to a method. Operator precedence is encoded by stratifying the grammar: each precedence level gets its own rule, and lower-precedence rules call higher-precedence ones.
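To make "one method per rule" concrete, here is a minimal sketch of recursive descent over a toy arithmetic grammar. The grammar, the `MiniParser` class, and the `Num` / `Binary` node types are hypothetical and much smaller than kemlang-py's real grammar; the point is only to show how precedence falls out of which method calls which.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Num:
    value: int


@dataclass
class Binary:
    left: "Expr"
    op: str
    right: "Expr"


Expr = Union[Num, Binary]


class MiniParser:
    """Toy parser over a list of string tokens (hypothetical, for illustration)."""

    def __init__(self, tokens: List[str]):
        self.tokens = tokens
        self.pos = 0

    def _peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def _advance(self) -> str:
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def parse(self) -> Expr:
        expr = self.term()
        if self._peek() is not None:
            raise SyntaxError(f"unexpected token {self._peek()!r}")
        return expr

    # term -> factor (("+" | "-") factor)*       lowest precedence
    def term(self) -> Expr:
        left = self.factor()
        while self._peek() in ("+", "-"):
            op = self._advance()
            left = Binary(left, op, self.factor())
        return left

    # factor -> primary (("*" | "/") primary)*   binds tighter than term
    def factor(self) -> Expr:
        left = self.primary()
        while self._peek() in ("*", "/"):
            op = self._advance()
            left = Binary(left, op, self.primary())
        return left

    # primary -> NUMBER | "(" term ")"           highest precedence
    def primary(self) -> Expr:
        tok = self._peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            self._advance()
            inner = self.term()
            if self._peek() != ")":
                raise SyntaxError("expected ')'")
            self._advance()
            return inner
        if tok.isdigit():
            return Num(int(self._advance()))
        raise SyntaxError(f"unexpected token {tok!r}")
```

Because `term` only ever sees whole `factor` results, parsing `1 + 2 * 3` nests the multiplication under the addition without any explicit precedence table. The `while` loops make both levels left-associative, so `1 - 2 - 3` groups as `(1 - 2) - 3`.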
Deep dive: The Parser
Stage 3: Interpreter
The interpreter walks the AST recursively. Statement nodes produce side effects; expression nodes return a KemValue. Variable scope is managed through a chain of Environment objects.
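A minimal sketch of that split between statements and expressions, with a chained scope. Everything here is an assumption made for illustration: the node classes (`Literal`, `Var`, `Add`, `Let`, `Print`, `Block`), the `evaluate` / `execute` functions, and the `Environment` method names are hypothetical. Plain Python values stand in for `KemValue`, and output is collected in a list rather than written to stdout so the result is easy to inspect.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


class Environment:
    """One scope; lookups fall back to the enclosing scope."""

    def __init__(self, parent: Optional["Environment"] = None):
        self.values: Dict[str, Any] = {}
        self.parent = parent

    def define(self, name: str, value: Any) -> None:
        self.values[name] = value

    def get(self, name: str) -> Any:
        env: Optional[Environment] = self
        while env is not None:
            if name in env.values:
                return env.values[name]
            env = env.parent
        raise NameError(f"undefined variable {name!r}")


# Expression nodes: evaluating them returns a value.
@dataclass
class Literal:
    value: Any


@dataclass
class Var:
    name: str


@dataclass
class Add:
    left: Any
    right: Any


# Statement nodes: executing them causes a side effect.
@dataclass
class Let:
    name: str
    expr: Any


@dataclass
class Print:
    expr: Any


@dataclass
class Block:
    body: List[Any]


def evaluate(node: Any, env: Environment) -> Any:
    if isinstance(node, Literal):
        return node.value
    if isinstance(node, Var):
        return env.get(node.name)
    if isinstance(node, Add):
        return evaluate(node.left, env) + evaluate(node.right, env)
    raise TypeError(f"not an expression: {node!r}")


def execute(node: Any, env: Environment, out: List[str]) -> None:
    if isinstance(node, Let):
        env.define(node.name, evaluate(node.expr, env))
    elif isinstance(node, Print):
        out.append(str(evaluate(node.expr, env)))
    elif isinstance(node, Block):
        inner = Environment(parent=env)  # a block opens a new scope
        for stmt in node.body:
            execute(stmt, inner, out)
    else:
        raise TypeError(f"not a statement: {node!r}")


def run_demo() -> List[str]:
    out: List[str] = []
    globals_env = Environment()
    program = [
        Let("x", Literal(1)),
        Block([Let("x", Literal(10)), Print(Add(Var("x"), Literal(1)))]),
        Print(Var("x")),
    ]
    for stmt in program:
        execute(stmt, globals_env, out)
    return out
```

In `run_demo`, the inner `x` shadows the outer one only while the block runs, so the collected output is `["11", "1"]`. Once the block finishes, its `Environment` is simply dropped and lookups resolve against the outer scope again.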
Explore each stage
The Lexer
How characters become tokens. Multi-word keywords, the scanning loop, what gets rejected and why.
The Parser
How tokens become an AST. Context-free grammars, recursive descent, operator precedence, the full BNF grammar.
The Interpreter
How the AST gets executed. Tree-walking, environment scopes, control flow via exceptions, and I/O.
Runtime and Types
The five runtime types, truthiness, type coercion, the full execution lifecycle, and error propagation.