name mode size
ast 040000
cfg 040000
doc 040000
lalr 040000
lang 040000
lexer 040000
phase1 040000
setup 040000
.gitignore 100644 63B
CMMC.cpp 100644 1.64kB
LICENSING.txt 100644 35.15kB
Makefile 100644 2.06kB
README.txt 100644 3.79kB
gdb-cmm.ini 100644 33B 100755 168B
CMM Compiler

0. Intro

The cmm compiler is a project written for the 2015 course "Machines en Berekenbaarheid" at the University of Antwerp. Cmm is a subset of C. This project aims to explore semantic analysis and code generation on parse trees generated by a syntactic analysis stage (specifically, by an LALR(1) parser).

1. Install

This project requires the LLVM libraries and tools. Here is a quick list of the packages you will need. The project was tested with LLVM 3.6 and 3.7; we cannot guarantee that it will work with any earlier or later versions.

- Ubuntu 14.04: llvm-3.6

    sudo apt-get install llvm-3.6

  Ubuntu 14.04 requires a special Makefile and a special file; you can find both in setup/U1404.

    cp setup/U1404/* .

- Ubuntu 15.10: llvm

    sudo apt-get install llvm

  Ubuntu 15.10 requires a special Makefile; you can find it in setup/U1510/Makefile. Replace the Makefile in the top-level directory with this one.

    cp setup/U1510/Makefile .

- ArchLinux: llvm

    sudo pacman -S llvm

2. Usage

Compile the project by running make. You should end up with an executable called main. You can use this executable to compile files containing cmm code. You should find some examples in doc/examples.

    ./main FILENAME

If the code contains no errors, a file with LLVM IR called FILENAME.ll will be created. This can then be compiled to assembly by the LLVM back-end.

    llc FILENAME.ll
    llc-3.6 FILENAME.ll    (on Ubuntu 14.04)

Now you have an assembly file called FILENAME.s, which you can link into an executable by running gcc or clang on it.

Phew, a lot of work, isn't it? Not to worry, we have created a script that will do all of this for you. But wait, there is more: it will also launch gdb so you can run the code and watch it execute. Run it with:

    ./ FILENAME

Run single instructions by typing "s", or let the program run until the end by typing "c". You can close gdb by typing "quit". If the gdb window is empty, something went wrong. Close gdb and look at the shell output.
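The manual steps above (./main, then llc, then gcc or clang) can be chained in a small shell function. This is only a minimal sketch and not the project's own script: the function name cmm_build, the CMM, LLC and CC variables, and the .bin output name are assumptions made for illustration. It also assumes that the intermediate files are named by appending .ll and .s to FILENAME, as described above, and it does not launch gdb.

```shell
#!/bin/sh
# Hypothetical helper, NOT the script shipped with the project.
# Tool names can be overridden from the environment, e.g. LLC=llc-3.6
# on Ubuntu 14.04.
CMM="${CMM:-./main}"   # the cmm compiler built by make
LLC="${LLC:-llc}"      # LLVM back-end: LLVM IR -> assembly
CC="${CC:-gcc}"        # assembles and links the .s file (clang also works)

# cmm_build FILENAME
# Runs the three steps in order and stops at the first one that fails.
cmm_build() {
    src="$1"
    "$CMM" "$src" &&                      # assumed to write FILENAME.ll
    "$LLC" "$src.ll" -o "$src.s" &&       # writes FILENAME.s
    "$CC" "$src.s" -o "$src.bin"          # FILENAME.bin is an assumed name
}
```

After sourcing this file in a shell, cmm_build FILENAME performs the whole chain and returns a non-zero status as soon as one step fails. On Ubuntu 14.04, set LLC=llc-3.6 before calling it.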
You can clean out the object files and executables with:

    make clean

And you can remove all the files that were created by the cmm compiler by running:

    make mrproper

3. Overview of the implementation

Input is tokenized by the lexer. The lexer is implemented as a partly automatically generated DFA. The token stream is then passed on to the LALR(1) parser (which is fully automatically generated), which checks syntactic correctness and builds a parse tree. The parse tree is then converted to an AST (a decorated but more concise version of the parse tree). The AST then performs the required semantic analysis and generates an LLVM IR representation of itself. The LLVM IR is then compiled to assembly language by the LLVM back-end (this is done through an LLVM tool called llc). Sadly we do not know how this bit is implemented, but it's probably MAGIC. The generated assembly can then be assembled and linked into an executable, for example with gcc or clang.

4. Contributors

Sebastiaan De Peuter <>
- LALR(1) parser generator (phase 1)
- AST
- CFG
- llvm code generation

Christophe Verdonck <>
- LALR parse-tree -> AST
- parse-tree (phase 1)
- AST

Stein De Groof <>
- Lexer
- llvm code generation
- AST
- CYK (phase 1)

Jeroen Verstraelen <>
- LALR parse-tree -> AST
- CNF (phase 1)