jc Compiler Development Log #0: Intro/January 2020

Published by Jon on

I have a penchant for ambitious projects. The ambition of the project that concerns this blog series will come as no surprise to anyone familiar with my older projects. Nevertheless I thought it was about time I started posting about it.

The jc Programming Language and Compiler

The project itself is technically composed of two parts: the language and the compiler.

The jc programming language is a language designed specifically for games development, targeting the same use cases traditionally handled by high-performance languages such as C and C++. It is intended to be used to implement games and game engines.

The jc compiler is the compiler for the language. Its name is a recursive acronym, standing for jc compiler. The compiler will be the main focus of these blog posts, although language design will be touched upon. You can read the first draft version of the language spec here.

Project Origins

At the time of writing I am a bit over halfway through my final year of undergraduate study at university. Over the previous year we were told to start seriously considering what we wanted to do for a dissertation project (referred to as an honours project). Originally I had intended to do something related to AI, procedural generation or graphics programming, but the more I thought about it the less I wanted to do any of those things. This stemmed from a couple of factors. One (which could be considered a character flaw) was that I didn’t particularly want to specialise in anything and get pigeonholed in one specific area. The other was a feeling of burnout after finishing up my third year projects.

Eventually I ended up stumbling across Jonathan Blow’s fantastic lectures on ideas about a new programming language designed with games development in mind (lectures 1 and 2 can be viewed here and here respectively) and they blew my damn mind. That planted the seeds for the idea to make a new language and compiler for games programmers.

Once I started fourth year I had made up my mind that this was what I wanted to do for my honours project. There was but one obstacle: for any honours project you have to submit a proposal, and if the proposal is rejected you can’t do the project for honours. I did the necessary research and submitted the proposal, and the project was approved. Most of the first semester was spent getting to grips with a core piece of technology required to drive the project: LLVM.

LLVM

For the uninitiated, LLVM is a compiler infrastructure project. It is designed in a modular fashion such that it can have different compiler front-ends for different languages and output machine code for different architectures through the same infrastructure. It achieves this using LLVM IR (LLVM Intermediate Representation), a somewhat assembly-like intermediate language that any LLVM back-end can take and translate into machine code. This IR is produced by a front-end, which takes in the source language and translates it into the IR. The LLVM project offers several different ways of doing that translation. For my project this meant creating a front-end for my language.
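To make the front-end's job a little more concrete, here is a minimal sketch of lowering a tiny expression tree to textual LLVM IR. It is not how jc does it: the post's front-end goes through LLVM's C++ API, whereas this sketch builds the IR text by hand so that it needs nothing beyond the standard library. The `Expr` type and the `lit`, `bin`, `lower` and `emitModule` helpers are all hypothetical names invented for illustration.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical expression node: an integer literal or a binary '+' / '*'.
struct Expr {
    char op;                          // 0 for a literal, otherwise '+' or '*'
    int value;                        // literal value, used only when op == 0
    std::shared_ptr<Expr> lhs, rhs;   // operands, used only when op != 0
};

// Convenience constructors for building trees by hand.
std::shared_ptr<Expr> lit(int v) {
    return std::make_shared<Expr>(Expr{0, v, nullptr, nullptr});
}
std::shared_ptr<Expr> bin(char op, std::shared_ptr<Expr> l, std::shared_ptr<Expr> r) {
    return std::make_shared<Expr>(Expr{op, 0, std::move(l), std::move(r)});
}

// Appends the instructions for `e` to `body` and returns the operand
// (an immediate or a temporary name) that holds its value.
std::string lower(const Expr& e, std::ostringstream& body, int& nextTemp) {
    if (e.op == 0) {
        return std::to_string(e.value);   // literals are immediate operands
    }
    std::string l = lower(*e.lhs, body, nextTemp);
    std::string r = lower(*e.rhs, body, nextTemp);
    std::string name = "%t" + std::to_string(nextTemp++);
    body << "  " << name << " = " << (e.op == '+' ? "add" : "mul")
         << " i32 " << l << ", " << r << "\n";
    return name;
}

// Wraps the lowered expression in a function that returns its value.
std::string emitModule(const Expr& e) {
    std::ostringstream body;
    int nextTemp = 0;
    std::string result = lower(e, body, nextTemp);
    return "define i32 @main() {\nentry:\n" + body.str() +
           "  ret i32 " + result + "\n}\n";
}
```

Calling `emitModule(*bin('+', lit(2), bin('*', lit(3), lit(4))))` yields a `mul` instruction followed by an `add` that consumes its result, which gives a feel for the "somewhat assembly-like" shape of the IR.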

LLVM’s documentation and tutorials are decent but can sometimes be rather obtuse. My job for that first semester was to get to grips with their C++ API for generating IR. I followed the Kaleidoscope tutorials (which can be viewed here), the documentation and a few posts scattered across the internet.

Once it became apparent that the middle-to-back-end of the project was completely doable and I was somewhat comfortable with the API, I considered how to implement the front-end. Originally I wanted to write the lexer and parser myself but was strongly advised against this. Even at this early stage of development I must concede that using a lexer-generator and parser-generator was definitely the right choice. I settled on Flex and Bison.

Flex

Flex is a lexer-generator originally designed to replace another lexer-generator called lex. It can be used to break up an input into tokens based on criteria defined in a file, with the file’s format largely compatible with the aforementioned lex. The lexer output by the tool can be in either C or C++.
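For anyone unsure what "breaking an input into tokens" looks like in practice, here is a small hand-written sketch. It is not Flex output and not the jc lexer; with Flex you would describe the token patterns in its input file and let the tool generate this kind of code for you. The `TokenKind`, `Token` and `tokenize` names are hypothetical and chosen purely for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds for illustration; not taken from the jc spec.
enum class TokenKind { Identifier, Integer, Symbol };

struct Token {
    TokenKind kind;
    std::string text;
};

// Splits the input into tokens, skipping whitespace between them.
std::vector<Token> tokenize(const std::string& input) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        unsigned char c = static_cast<unsigned char>(input[i]);
        if (std::isspace(c)) {
            ++i;
            continue;
        }
        std::size_t start = i;
        if (std::isalpha(c) || c == '_') {
            // Identifier: a letter or underscore, then letters, digits, underscores.
            while (i < input.size() &&
                   (std::isalnum(static_cast<unsigned char>(input[i])) || input[i] == '_')) {
                ++i;
            }
            tokens.push_back({TokenKind::Identifier, input.substr(start, i - start)});
        } else if (std::isdigit(c)) {
            // Integer literal: one or more digits.
            while (i < input.size() && std::isdigit(static_cast<unsigned char>(input[i]))) {
                ++i;
            }
            tokens.push_back({TokenKind::Integer, input.substr(start, i - start)});
        } else {
            // Anything else is treated as a single-character symbol.
            ++i;
            tokens.push_back({TokenKind::Symbol, input.substr(start, 1)});
        }
    }
    return tokens;
}
```

Feeding it `x = 42 + y1;` produces six tokens: two identifiers, one integer and three symbols. A generated lexer does essentially the same classification, driven by regular expressions rather than hand-written loops.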

Bison

Bison is a parser-generator developed by the Free Software Foundation. It generates a parser implementation from a grammar described in an input file. Bison is at the very least partially compatible with another parser generator named Yacc. In the jc compiler Bison is used to create the parser that builds the Abstract Syntax Tree for the program. The output parser of Bison can be in a few different languages, including C and C++.
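To illustrate what "building an Abstract Syntax Tree" means, here is a rough sketch of a parser for a toy grammar with `+` and `*`. It deliberately swaps in a different technique: Bison generates its parser from grammar rules, whereas this is a hand-written recursive-descent parser, used here only because it fits in a few dozen lines. The `Node`, `Parser` and `eval` names are hypothetical, and the sketch assumes well-formed input (there is no error reporting).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical AST node: either an integer literal or a binary operation.
struct Node {
    char op = 0;                      // 0 for a literal, otherwise '+' or '*'
    int value = 0;                    // used only when op == 0
    std::unique_ptr<Node> lhs, rhs;   // used only when op != 0
};

class Parser {
public:
    explicit Parser(std::string src) : src_(std::move(src)) {}

    // expr := term ('+' term)*
    std::unique_ptr<Node> parseExpr() {
        auto left = parseTerm();
        while (peek() == '+') {
            ++pos_;
            auto right = parseTerm();
            left = makeBinary('+', std::move(left), std::move(right));
        }
        return left;
    }

private:
    // term := number ('*' number)*
    std::unique_ptr<Node> parseTerm() {
        auto left = parseNumber();
        while (peek() == '*') {
            ++pos_;
            auto right = parseNumber();
            left = makeBinary('*', std::move(left), std::move(right));
        }
        return left;
    }

    // number := digit+  (a missing number silently becomes 0 in this sketch)
    std::unique_ptr<Node> parseNumber() {
        skipSpaces();
        std::unique_ptr<Node> node(new Node());
        while (pos_ < src_.size() && std::isdigit(static_cast<unsigned char>(src_[pos_]))) {
            node->value = node->value * 10 + (src_[pos_] - '0');
            ++pos_;
        }
        return node;
    }

    static std::unique_ptr<Node> makeBinary(char op, std::unique_ptr<Node> l,
                                            std::unique_ptr<Node> r) {
        std::unique_ptr<Node> node(new Node());
        node->op = op;
        node->lhs = std::move(l);
        node->rhs = std::move(r);
        return node;
    }

    void skipSpaces() {
        while (pos_ < src_.size() && std::isspace(static_cast<unsigned char>(src_[pos_]))) {
            ++pos_;
        }
    }

    // Returns the next non-space character, or '\0' at end of input.
    char peek() {
        skipSpaces();
        return pos_ < src_.size() ? src_[pos_] : '\0';
    }

    std::string src_;
    std::size_t pos_ = 0;
};

// Walks the tree to compute its value; handy for checking the tree's shape.
int eval(const Node& n) {
    if (n.op == 0) return n.value;
    int l = eval(*n.lhs);
    int r = eval(*n.rhs);
    return n.op == '+' ? l + r : l * r;
}
```

Parsing `2 + 3 * 4` gives a `+` node whose right child is a `*` node, so operator precedence is encoded in the tree's shape. In a Bison grammar the same structure would come from the rule actions attached to each production.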

Final Thoughts

This post is more of an introduction to the project than an actual development log. I am hoping to publish these posts once a month for as long as I keep developing the compiler, so I can track the project’s progress. I would also like to mention a tutorial that helped me fuse together these different tools, which can be viewed here, although it is a little bit out of date now.

Categories: jc, LLVM