PA2 is due 1/24 at 11:50pm.
Programming assignments 2 through 6 will direct you to design and build an optimizing compiler for Cool. Each assignment will cover one component of the compiler: lexical analysis, parsing, semantic analysis, code generation, and optimization. Each assignment will ultimately result in a working compiler phase which can interface with the other phases.
You may work in a team of up to four people for this assignment. You may work in a team for any or all subsequent programming assignments. You do not need to keep the same teammates. The course staff are not responsible for finding you a willing team.
For this assignment you will write a lexical analyzer, also called a scanner, using a lexical analyzer generator. You will describe the set of tokens for Cool in an appropriate input format and the analyzer generator will generate actual code. You will then write additional code to serialize the tokens for use by later compiler stages.
You must create three artifacts:
You must use ply or ruby-lex or ocamllex or jison (or a similar tool or library). Parts of your lexer must be tool-generated from regular expressions you provide.
Students have asked previously about creating a lexer from scratch (i.e., without lex). While you are welcome to try, it is not recommended. There are numerous odd corner cases that make the task a significant undertaking without corresponding pedagogical value.
The first line in a file is line 1. Each successive '\n' newline character increments the line count. Your lexer is responsible for keeping track of the current line number.
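The line-counting rule above can be sketched as follows. This is a minimal illustration using only Python's standard `re` module rather than a lexer generator, so it is not a valid submission on its own. The token pattern, the group names, and the function name `tokens_with_lines` are all assumptions made for the example, and the `ValueError` is a placeholder, not the assignment's error format.

```python
import re

# Hypothetical token pattern, for illustration only: newlines, other
# whitespace, words, and integers. This is not the Cool token set.
TOKEN_RE = re.compile(
    r"(?P<newline>\n)"
    r"|(?P<space>[ \t\r]+)"
    r"|(?P<word>[A-Za-z_][A-Za-z0-9_]*)"
    r"|(?P<integer>[0-9]+)"
)


def tokens_with_lines(source):
    """Yield (line_number, kind, lexeme) triples; the first line is line 1."""
    line = 1
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            # Placeholder error handling, not the assignment's report format.
            raise ValueError(f"line {line}: unexpected character {source[pos]!r}")
        kind = match.lastgroup
        if kind == "newline":
            line += 1  # each '\n' increments the line count
        elif kind != "space":
            yield (line, kind, match.group())
        pos = match.end()
```

In a generated lexer the same bookkeeping usually lives in the action attached to the newline rule. Tokens that can span several lines, such as comments, would also need to count the newlines they contain.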
To report an error, write the string
to standard output and terminate the program. You may write whatever you want in the message, but it should be fairly indicative. Example erroneous input:
Example error report output:
If there are no errors in file.cl your program should create file.cl-lex and serialize the tokens to it. Each token is represented by a pair (or triplet) of lines. The first line holds the line number. The second line gives the name of the token. The optional third line holds additional information (i.e., the lexeme) for identifiers, integers, strings and types. For example, for an integer token the third line should contain the decimal integer value.
Corresponding .cl-lex output:
The official list of token names is:
In general the intended token is evident. For the more exotic names:
The .cl-lex file format is exactly the same as the one generated by the reference compiler when you specify --lex. In addition, the reference compiler (and your upcoming PA3 parser!) will read .cl-lex files instead of .cl files.
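The pair-or-triplet layout described above can be sketched as follows, assuming the lexer has already produced a list of `(line, name, lexeme)` triples with `lexeme` set to `None` for tokens that carry no additional information. The function names and the token names in the usage example are placeholders; the official list of token names and the reference compiler's `--lex` output remain the authority.

```python
def serialize_tokens(tokens):
    """Return .cl-lex text for an iterable of (line, name, lexeme) triples.

    Each token becomes two lines (line number, token name), plus a third
    line holding the lexeme when lexeme is not None.
    """
    parts = []
    for line, name, lexeme in tokens:
        parts.append(str(line))
        parts.append(name)
        if lexeme is not None:
            parts.append(str(lexeme))
    return "".join(part + "\n" for part in parts)


def write_lex_file(source_path, tokens):
    """Write the serialized tokens to source_path + '-lex' (file.cl -> file.cl-lex)."""
    with open(source_path + "-lex", "w", newline="\n") as handle:
        handle.write(serialize_tokens(tokens))
```

For example, `serialize_tokens([(1, "identifier", "x"), (2, "integer", "42")])` returns six lines: `1`, `identifier`, `x`, `2`, `integer`, `42`. Collecting all tokens before writing also makes it easy to avoid creating the output file when an error is found partway through the input.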
You must use a lexical analyzer generator or similar library for this assignment.
All of these lexical analyzer generators are derived from lex (or flex), the original lexical analyzer generator for C. Thus you may find it handy to refer to the Lex paper or the Flex manual. When you're reading, mentally translate the C code references into the language of your choice.
You can do basic testing with something like the following:
For example, if you used OCaml:
You may find the reference compiler's --unlex option useful for debugging your .cl-lex files.
Need more testcases? Any Cool file you have (including the one you wrote for PA1) works fine. The contents of cool-examples.zip should be a good start. There's also one among the PA1 hints. You'll want to make more complicated test cases — in particular, you'll want to make negative testcases (e.g., testcases with malformed string constants).
NOTE: Some of these video guides are from a previous offering of a similar course at the University of Virginia. The assignment for this semester has changed slightly. While they are still relevant, you are responsible for completing the assignment according to this course's grading rubric.
A number of Video Guides are provided to help you get started on this assignment on your own. The Video Guides are walkthroughs in which the instructor manually completes and narrates, in real time, the first part of this assignment — including a submission to the grading server. They include coding, testing and debugging elements.
If you are still stuck, you can post on the forum, approach the TAs, or approach the professor. The use of online instructional content outside of class weakly approximates a flipped classroom model. Click on a video guide to begin, at which point you can watch it fullscreen or via YouTube if desired.
You must turn in a zip file containing these files:
Your zip file may also contain:
You may complete this project in a team of one, two, three, or four members. Teamwork imposes burdens of communication and coordination, but has the benefits of more thoughtful designs and cleaner programs. Team programming is also the norm in the professional world.
Students on a team are expected to participate equally in the effort and to be thoroughly familiar with all aspects of the joint work. All members bear full responsibility for the completion of assignments. One member turns in one solution for each programming assignment; each member receives the same grade for the assignment. Teams may not be dissolved in the middle of an assignment.
If you are working in a team, exactly one team member should submit a PA2 zipfile. That submission should include the file team.txt, a flat ASCII text file that contains the Uniqid of all teammates on separate lines. Don't include the @umich.edu bit. Example: If ph4u and wrw6y are working together, ph4u would submit ph4u-pa2.zip with a team.txt file that contains two lines:
Then ph4u and wrw6y will both receive the same grade for that submission. Alternatively, if kjleach, yhhy, and weimerw are on a team, kjleach can submit a team.txt file containing
kjleach
yhhy
weimerw
and nothing more. If you are not on a team, you do not need to submit a team.txt file.
We have had plenty of students fail to properly format this simple text file. Because of the way autograding is done, it is a significant inconvenience and burden on the staff if you do not properly specify teammates.
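One way to catch formatting mistakes before submitting is a small local check of team.txt, sketched below. The function name `check_team_file` and the specific checks are assumptions based only on the description above (ASCII text, one ID per line, no @umich.edu suffix, at most four members); the autograder's actual parsing rules are not stated here and may differ.

```python
def check_team_file(text, max_members=4):
    """Return a list of problems found in the contents of team.txt.

    An empty list means none of these (assumed) checks failed.
    """
    problems = []
    if not text.isascii():
        problems.append("file contains non-ASCII characters")
    entries = text.splitlines()
    if not entries:
        problems.append("file is empty")
    if len(entries) > max_members:
        problems.append(
            f"{len(entries)} lines, but a team has at most {max_members} members"
        )
    for number, entry in enumerate(entries, start=1):
        if not entry.strip():
            problems.append(f"line {number}: blank line")
        elif "@" in entry:
            problems.append(f"line {number}: omit the @umich.edu part")
        elif len(entry.split()) != 1 or entry != entry.strip():
            problems.append(
                f"line {number}: expected exactly one ID with no extra whitespace"
            )
    return problems
```

Calling `check_team_file(open("team.txt").read())` and printing the result before zipping gives a quick sanity check. A file with all IDs on a single line, for instance, is reported as a problem.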
PA2 Grading (out of 50 points):