HALMAT
The Space Shuttle's compiler language, reconstructed
HALMAT is the intermediate representation used internally by the HAL/S-FC compiler. HAL/S is the programming language of the Space Shuttle. Every line of flight software that ever ran on an orbiter passed through HALMAT on its way to object code for the AP-101 flight computers. It was the language between the language and the machine, and for fifty years nobody outside of IBM Federal Systems Division and the Johnson Space Center had any idea what it actually looked like.
It was never documented. Not publicly, not in any IBM manual, not in any NASA technical report that anyone has been able to find. The compiler source code survived because Ron Burkey and the Virtual AGC project spent years recovering it, scanning it, OCR'ing it, and getting it to compile again. But the intermediate language itself remained a mystery, described only by an approximately 850-line sketch that Ron wrote covering two of the nine instruction classes, with the remaining seven marked as "TODO".
This project fills in the rest. All 180 opcodes across all nine classes, reconstructed from the compiler's own source code by parsing every file, tracing every reference, and verifying against actual compiled binary output. The specification runs to about 4,700 lines, roughly five and a half times the length of the document it replaces, and it covers four and a half times as many instruction classes, or more, depending on how you count the one Ron had only partially finished.
How This Happened
The HAL/S compiler is written in XPL/I, an extended dialect of XPL, which is itself a simplified dialect derived from PL/I, a language IBM designed in 1964. If you are keeping track, that is a compiler for a 1970s real-time language, written in a 1960s systems language, targeting hardware that flew on something with wings and rockets. The source code runs to 630 files spread across seven compiler passes. The last time anyone touched it professionally was probably the 1990s.
Parsers for XPL exist inside compilers, most notably Ron's XCOM-I, which is how HAL/S-FC compiles again today. But there was no standalone, grammar-driven parser built from the XPL BNF that you could point at a pile of source files and ask "what's in there?" No npm package. No crate. No library. If you wanted to understand what those 630 files contained, you had to read them yourself, all of them, and hold the whole thing in your head while you did it.
So I wrote a parser. I studied vintage compiler-construction material from the CBT Tape archive, the community-maintained collection of IBM mainframe software that has been passed around on magnetic tape since the 1970s, back when Arnold Casinghino would physically mail reels to anyone who needed them. The XPL BNF grammar, the XCOM bootstrap compiler, the PL/360 table-driven parser: all CBT Tape. I used those lessons to build an OCaml parser, with Menhir generating the LR(1) parser from the grammar and ocamllex handling tokenisation, and I made it handle every XPL/I extension the HAL/S compiler uses: BASED records, DYNAMIC arrays, LITERALLY macros, spaced comparison operators, the lot.
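LITERALLY declarations deserve a closer look because they are textual macros, not variables. After `DECLARE TRUE LITERALLY '1';`, every later occurrence of the identifier TRUE is read as the text 1. The C sketch that follows shows that substitution on a single line of source. It is an illustration only, not the project's OCaml lexer. The identifier character set and the `struct literally` table are assumptions made for the example.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* One LITERALLY macro, as declared by: DECLARE name LITERALLY 'text'; */
struct literally {
    const char *name;
    const char *text;
};

/* Assumption: identifiers are runs of letters, digits, _ @ # and $. */
static int is_ident_char(int c) {
    return isalnum((unsigned char)c) || c == '_' || c == '@' || c == '#' || c == '$';
}

/* Return the text of the macro named exactly start[0..len), or NULL. */
static const char *lookup_literally(const struct literally *m, size_t nm,
                                    const char *start, size_t len) {
    for (size_t i = 0; i < nm; i++)
        if (strlen(m[i].name) == len && strncmp(m[i].name, start, len) == 0)
            return m[i].text;
    return NULL;
}

/* Copy src into dst (capacity cap), replacing every whole identifier that
   names a macro with the macro's text. Deliberately simplified: a single
   pass with no rescanning of substituted text, and no special handling of
   string literals or comments. A real XPL lexer needs all three. */
static void expand_literally(const char *src, const struct literally *m, size_t nm,
                             char *dst, size_t cap) {
    size_t out = 0;
    if (cap == 0) return;
    while (*src != '\0' && out + 1 < cap) {
        if (is_ident_char(*src)) {
            const char *start = src;
            while (is_ident_char(*src)) src++;
            size_t len = (size_t)(src - start);
            /* Numbers also match is_ident_char; never treat them as names. */
            const char *text = isdigit((unsigned char)*start)
                                   ? NULL : lookup_literally(m, nm, start, len);
            const char *from = text != NULL ? text : start;
            size_t n = text != NULL ? strlen(text) : len;
            for (size_t k = 0; k < n && out + 1 < cap; k++) dst[out++] = from[k];
        } else {
            dst[out++] = *src++;
        }
    }
    dst[out] = '\0';
}
```

With TRUE mapped to 1 and FALSE to 0, the line `IF FLAG = TRUE THEN X = FALSE;` expands to `IF FLAG = 1 THEN X = 0;`. An identifier such as TRUEX is left alone because the sketch only matches whole identifiers.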
It parses all 630 files. Every single one. 100%.
What Came Out
Once the parser worked, I turned it loose on the compiler source to extract every HALMAT opcode definition, cross-reference every opcode against every compiler pass, and generate a complete specification. The opcodes live as BIT(16) INITIAL constants with X-prefixed names in a file called ##DRIVER.xpl, which is exactly the kind of name you give something when you work at IBM in 1976 and job security comes from nobody else being able to find your code.
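The extraction step can be approximated in C with two small helpers. The first recognises an opcode constant declaration. The second checks whether a file's text mentions a given opcode name as a whole identifier. This is a line-oriented approximation, not the grammar-driven extraction the project actually performs. The exact declaration layout it expects, and the name XEXAMPLE used in the usage note, are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

struct opcode_def {
    char name[32];
    unsigned value;
};

/* Parse one declaration of the ASSUMED form   XNAME BIT(16) INITIAL("hhhh")
   Leading blanks and anything after the closing parenthesis (such as the
   comma between items of a DECLARE list) are tolerated. A DECLARE keyword
   at the start of the line is not. Returns 1 on success, 0 otherwise. */
static int parse_opcode_decl(const char *line, struct opcode_def *def) {
    char name[32];
    unsigned value;
    if (sscanf(line, " %31s BIT(16) INITIAL(\"%x\")", name, &value) != 2)
        return 0;
    if (name[0] != 'X')
        return 0;                      /* opcode constants are X-prefixed */
    strcpy(def->name, name);
    def->value = value;
    return 1;
}

/* Assumption: identifiers are runs of letters, digits, _ @ # and $. */
static int is_name_char(int c) {
    return isalnum((unsigned char)c) || c == '_' || c == '@' || c == '#' || c == '$';
}

/* Does text mention name as a whole identifier, not as part of a longer one? */
static int mentions(const char *text, const char *name) {
    size_t n = strlen(name);
    for (const char *p = strstr(text, name); p != NULL; p = strstr(p + 1, name)) {
        int starts_word = (p == text) || !is_name_char(p[-1]);
        int ends_word = !is_name_char(p[n]);
        if (starts_word && ends_word)
            return 1;
    }
    return 0;
}
```

Running `parse_opcode_decl` over each line of the opcode file and then calling `mentions` for every recovered name against every pass's source gives a rough opcode-by-pass cross-reference. For example, `XEXAMPLE BIT(16) INITIAL("0123"),` yields the name XEXAMPLE with value 0x0123.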
I also wrote a binary disassembler. HALMAT output is a stream of 32-bit big-endian words, packed into 1800-word blocks, with operator words and operand words distinguished by a single bit. The operands point into a symbol table, a literal table, or back into the HALMAT stream itself via "virtual accumulators" that reference the operator which produced a given value. The literal table stores numbers in IBM System/360 hexadecimal floating-point format, which predates IEEE 754 by two decades, because naturally the Space Shuttle's compiler stores its constants in a format designed for punch-card batch processing on machines whose core memory was measured in kilobytes.
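Two parts of that format are mechanical enough to show directly: reading big-endian words out of the block structure, and converting System/360 hexadecimal floating point to IEEE 754. The C sketch assumes the 1800-word blocks sit back to back in the file with no framing between them, which may not match the real output files. The float conversion follows the standard S/360 long format: a sign bit, a 7-bit excess-64 exponent that is a power of 16, and a 56-bit fraction.

```c
#include <stddef.h>
#include <stdint.h>

#define HALMAT_BLOCK_WORDS 1800

/* Assemble one 32-bit big-endian word from four bytes. */
static uint32_t be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Word i of block b, ASSUMING blocks are stored contiguously with no
   headers or padding between them. */
static uint32_t halmat_word(const unsigned char *file, size_t block, size_t i) {
    return be32(file + (block * HALMAT_BLOCK_WORDS + i) * 4);
}

/* Convert a System/360 long (64-bit) hexadecimal float, given as its two
   big-endian words, to an IEEE 754 double. Value = 0.fraction * 16^(e-64).
   A double keeps only 53 significant bits, so the lowest fraction bits
   may be rounded away. A short (32-bit) value is the first word alone,
   so pass 0 as lo. */
static double ibm_long_to_double(uint32_t hi, uint32_t lo) {
    int negative = (int)(hi >> 31);
    int exponent = (int)((hi >> 24) & 0x7Fu) - 64;
    uint64_t fraction = ((uint64_t)(hi & 0x00FFFFFFu) << 32) | lo;
    double value = (double)fraction * 0x1p-56;   /* 0.fraction, in [0, 1) */
    /* Scaling by 16 is exact in binary floating point, so loop instead of
       calling pow() or ldexp(). */
    for (; exponent > 0; exponent--) value *= 16.0;
    for (; exponent < 0; exponent++) value /= 16.0;
    return negative ? -value : value;
}
```

As a check, the IBM word 0x41100000 (exponent 16^1, fraction 1/16) converts to 1.0, and 0xC2640000 (negative, exponent 16^2, fraction 0x64/256) converts to -100.0.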
The disassembler resolves all of it. Symbol table entries from COMMON memory dumps, literal values from the binary literal table, cross-references back into the HALMAT stream. The output reads like annotated pseudocode. You can trace every HALMAT instruction back to the HAL/S source line that produced it.
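The shape of operand resolution can be sketched as a classifier. It turns each word into either "operator" or one of the three operand kinds, and the printer then looks the index up in the matching table. Every bit position and qualifier code in this C sketch is hypothetical, chosen only to make the example concrete. The real layout is defined in the HALMAT specification and in Ron's documentation.

```c
#include <stdint.h>
#include <stdio.h>

/* HYPOTHETICAL bit layout, for illustration only:
     lowest-order bit : 0 = operator word, 1 = operand word
     bits 4..7        : operand qualifier (which table the index refers to)
     bits 16..31      : 16-bit index */
enum { Q_SYT = 1, Q_LIT = 2, Q_VAC = 3 };   /* hypothetical qualifier codes */

typedef enum { W_OPERATOR, W_SYMBOL, W_LITERAL, W_VAC, W_UNKNOWN } word_kind;

static word_kind classify(uint32_t w, unsigned *index) {
    *index = (unsigned)((w >> 16) & 0xFFFFu);
    if ((w & 1u) == 0) return W_OPERATOR;
    switch ((w >> 4) & 0xFu) {
    case Q_SYT: return W_SYMBOL;
    case Q_LIT: return W_LITERAL;
    case Q_VAC: return W_VAC;
    default:    return W_UNKNOWN;
    }
}

/* Print one word, resolving operands against the recovered tables. A VAC
   operand is a back-reference to the HALMAT word whose operator produced
   the value, so it prints as a position rather than a name. */
static void print_operand(uint32_t w, const char *const *symbols, const double *literals) {
    unsigned i;
    switch (classify(w, &i)) {
    case W_SYMBOL:   printf("  SYT %s\n", symbols[i]); break;
    case W_LITERAL:  printf("  LIT %g\n", literals[i]); break;
    case W_VAC:      printf("  VAC -> word %u\n", i); break;
    case W_OPERATOR: printf("  (operator word)\n"); break;
    default:         printf("  ?? 0x%08lX\n", (unsigned long)w); break;
    }
}
```

The design point is that all three operand kinds share one index field. Only the qualifier decides whether that index means "symbol table entry", "literal table entry", or "earlier position in this stream".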
The Nine Classes
HALMAT organises its 180 opcodes into nine classes. Before this project, two of them were documented. Classes 1 through 7 were not.
- Class 0: Control — Programme structure, flow control, I/O, subscripts, scope. This is the class Ron had mostly documented.
- Class 1: Bit — Bit string operations. Assign, AND, OR, NOT, CAT, type conversions.
- Class 2: Character — String assign, concatenation, type conversions.
- Class 3: Matrix — Matrix arithmetic, transpose, inverse, determinant, identity construction.
- Class 4: Vector — Vector arithmetic, dot product, cross product.
- Class 5: Scalar — Scalar arithmetic, exponentiation, type conversions.
- Class 6: Integer — Integer arithmetic and type conversions.
- Class 7: Conditional — Comparisons across all types, boolean connectives.
- Class 8: Initialisation — Structure and array initialisation, typed constants. Ron had this one partially covered.
The control flow patterns are the real prize. HALMAT encodes FOR loops, IF/ELSE branches, CASE statements, WHILE and UNTIL loops, I/O groups, and function calls as structured sequences of opcodes with TAG fields that link the opening and closing markers. Every pattern documented in the spec has been verified by compiling a targeted HAL/S test programme, disassembling the binary output, running it through a purpose-built pattern analyser, and confirming that the TAG values, operand orders, flow label allocation, and sentinel markers all match.
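The TAG linkage lends itself to a mechanical check similar to bracket matching. Push the TAG of each opening marker, and require each closing marker to carry the TAG on top of the stack. This C sketch is a hypothetical simplification for illustration, not the project's pattern analyser. Which opcodes count as openers and closers, and whether plain TAG equality is the real linking rule, are assumptions.

```c
#include <stddef.h>

/* Simplified, pre-decoded view of a HALMAT stream: each entry is an
   opening marker, a closing marker, or anything else, plus its TAG. */
typedef enum { M_OPEN, M_CLOSE, M_OTHER } marker;
struct op {
    marker kind;
    unsigned tag;
};

/* Returns -1 if every close matches the innermost open with an equal TAG
   and nothing is left open. Otherwise returns the index of the first
   offending entry, or n if some opens were never closed. */
static long check_tag_nesting(const struct op *ops, size_t n) {
    unsigned stack[64];
    size_t depth = 0;
    for (size_t i = 0; i < n; i++) {
        if (ops[i].kind == M_OPEN) {
            if (depth == sizeof stack / sizeof stack[0])
                return (long)i;                    /* nested too deeply */
            stack[depth++] = ops[i].tag;
        } else if (ops[i].kind == M_CLOSE) {
            if (depth == 0 || stack[depth - 1] != ops[i].tag)
                return (long)i;                    /* unmatched or crossed */
            depth--;
        }
    }
    return depth == 0 ? -1 : (long)n;
}
```

A stack is the natural structure here because HAL/S constructs nest. An IF inside a FOR must close before the FOR does, so a mismatched TAG pinpoints the first place where the emitted pattern departs from the expected shape.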
yaHALMAT
The repository includes yaHALMAT, an emulator that executes HALMAT directly. No AP-101 CPU emulation, no object code generation. It loads the compiler's binary output, decodes the 32-bit word stream, and interprets it. Integer, scalar, vector, and matrix arithmetic all work. Control flow works. Function calls with argument passing and return values work. Character and bit string operations work. Nine test programmes execute and produce correct output, which is nine more HALMAT programmes than had ever been run directly before, because until now the number of HALMAT emulators was zero.
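The heart of interpreting HALMAT directly is the virtual accumulator. An operator's result is stored under that operator's position, and later operands refer back to it by position. The toy C interpreter that follows shows only that mechanism. It is not yaHALMAT's code. The opcodes `OP_IADD` and `OP_IMUL`, the pre-decoded instruction struct, and the use of instruction index in place of a word position in the stream are all assumptions made for the sketch.

```c
#include <stddef.h>

/* Toy, already-decoded instruction form. Real HALMAT operator words carry
   an opcode and an operand count and are followed by that many operand
   words; the opcode names here are hypothetical. */
typedef enum { OP_IADD, OP_IMUL } toy_opcode;
typedef enum { SRC_LIT, SRC_VAC } toy_source;

struct toy_operand {
    toy_source src;
    size_t index;
};
struct toy_instr {
    toy_opcode op;
    struct toy_operand a, b;
};

/* Fetch an operand: a constant from the literal table, or a result
   already computed by an earlier instruction (a virtual accumulator). */
static long fetch(struct toy_operand o, const long *lits, const long *vac) {
    return o.src == SRC_LIT ? lits[o.index] : vac[o.index];
}

/* Execute n instructions in order. Instruction i leaves its result in
   vac[i], which is how later instructions refer back to it. */
static void run(const struct toy_instr *prog, size_t n, const long *lits, long *vac) {
    for (size_t i = 0; i < n; i++) {
        long x = fetch(prog[i].a, lits, vac);
        long y = fetch(prog[i].b, lits, vac);
        vac[i] = prog[i].op == OP_IADD ? x + y : x * y;
    }
}
```

For (2 + 3) * 4, the literal table holds 2, 3 and 4. Instruction 0 adds literals 0 and 1 into vac[0], and instruction 1 multiplies vac[0] by literal 2, leaving 20 in vac[1]. No named temporaries are needed, which is what makes HALMAT straightforward to interpret without register allocation.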
It is written in C99 and has no dependencies beyond libc and libm, because if you are going to emulate the intermediate representation of a compiler from 1976, you should at least have the decency to write the emulator in a language from the same decade.
Ron Burkey and Virtual AGC
None of this would exist without Ron Burkey. Ron has spent years on the Virtual AGC project, preserving the source code of the Apollo Guidance Computer and, later, the HAL/S compiler and its associated infrastructure. He recovered the compiler source, got it compiling, wrote the first HALMAT documentation, and made the whole thing publicly available. His work on the Apollo source code alone is one of the great acts of software preservation, and the HAL/S work extends that same care to the Shuttle era.
Ron's existing HALMAT documentation covers Class 0 in detail and Class 8 in part, with notes and observations that informed the reconstruction of the remaining classes. His Python decoder (HALMAT.py) handles basic disassembly and was a useful reference for understanding the binary format. This project builds on his foundation. The ~850 lines he wrote were the reason I knew there was something worth finding in the other seven classes.
The Virtual AGC project is hosted on ibiblio and contains the recovered Apollo and Shuttle-era source code, emulators, scanned documentation, and the tools Ron built to make all of it work again. If you have any interest in the software that flew to the Moon or carried astronauts to orbit, it is one of the most important repositories on the internet.
The CBT Tape
The parser that made this possible was informed by compiler construction techniques from the CBT Tape, the community-maintained collection of IBM mainframe software that Arnold Casinghino started distributing on physical magnetic tapes in 1975. Before GitHub, before SourceForge, before the phrase "open source" was coined, mainframe programmers were sharing assembler macros and compiler tools through the postal service.
The XPL BNF grammar, the XCOM bootstrap compiler, and the PL/360 table-driven parser all came from CBT Tape contributions. Understanding how those vintage compilers worked was essential to building a modern parser that could handle the same language. Fifty years of communal knowledge, passed hand to hand on magnetic tape, and it turns out the techniques still work perfectly well when you rewrite them in OCaml.
Source Material
All compiler source material is from the Virtual AGC project: 630 XPL files across seven compiler passes, Ron's existing documentation, and the regression test suite. The project repository contains the full specification, the emulator, the disassembler, the pattern analyser, and annotated disassembly of compiled HAL/S programmes.
The specification is freely available. The emulator is C99 with no dependencies beyond libc and libm. The source is on GitHub.