Programming assignments 2 through 5 will direct you to design and build an interpreter for Cool. Each assignment will cover one component of the interpreter: lexical analysis, parsing, semantic analysis, and operational semantics. Each assignment will ultimately result in a working interpreter phase which can interface with the other phases.

For this assignment, you will be programming in Java and performing lexical analysis.

You may work in a team of two people for this assignment. You may work in a team for any or all subsequent programming assignments. You do not need to keep the same teammate. The course staff are not responsible for finding you a willing teammate.

Goal

For this assignment you will write a lexical analyzer, also called a scanner, using a lexical analyzer generator. You will describe the set of tokens for Cool in an appropriate input format and the analyzer generator will generate actual code. You will then write additional code to serialize the tokens for use by later interpreter stages.

Specification

You must create four artifacts:

  1. A Java program that takes a single command-line argument (e.g., file.cl). That argument will be an ASCII text Cool source file. Your program must either indicate that there is an error in the input (e.g., a malformed string) or emit file.cl-lex, a serialized list of Cool tokens. Your program's main lexer component must be constructed by a lexical analyzer generator. The "glue code" for processing command-line arguments and serializing tokens should be written by hand. If your main function is in the Main class, invoking java Main file.cl should yield the same output as cool --lex file.cl. Your program will consist of a number of Java files.
  2. A plain ASCII text file called readme.txt describing your design decisions and choice of test cases. See the grading rubric. A few paragraphs should suffice.
  3. A plain ASCII text file called references.txt providing a citation for each resource you used (excluding class notes and assigned readings) to complete the assignment. For example, if you found a Stack Overflow answer helpful, provide a link to it. Additionally, provide a brief description of how each resource helped you.
  4. Testcases good.cl and bad.cl. The first should lex correctly and yield a sequence of tokens. The second should contain a lexical error that your lexer reports.

You must use JFlex (or a similar tool or library). Do not write your entire lexer by hand. Parts of it must be tool-generated from regular expressions you provide.
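
The hand-written glue code mentioned in item 1 can be quite small. The following is a minimal sketch, not a required design: the class name LexGlue and both method names are hypothetical, and it assumes the tokens have already been serialized into a single string by the rest of your program. It shows only how the output file name follows from the command-line argument (file.cl becomes file.cl-lex).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical glue-code helper; the names here are illustrative only. */
final class LexGlue {
    /** The output file is the input path with "-lex" appended. */
    static String outputPathFor(String inputPath) {
        return inputPath + "-lex";
    }

    /** Write already-serialized tokens next to the input file. */
    static void writeOutput(String inputPath, String serializedTokens) throws IOException {
        Files.write(Paths.get(outputPathFor(inputPath)),
                    serializedTokens.getBytes(StandardCharsets.US_ASCII));
    }
}
```

Your Main class would call something like LexGlue.writeOutput(args[0], text) only after the generated lexer has consumed the whole input without an error, so that no .cl-lex file is produced for erroneous input.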

Line Numbers

The first line in a file is line 1. Each successive '\n' newline character increments the line count. Your lexer is responsible for keeping track of the current line number.
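
One way to follow this rule by hand is sketched below. The class LineCounter is hypothetical, and the sketch assumes you call consume on every piece of text the lexer matches, including whitespace, comments, and string contents.

```java
/** Hypothetical helper that counts lines the way this section describes. */
final class LineCounter {
    private int line = 1; // the first line in a file is line 1

    /** Advance past matched text, adding one to the count per '\n'. */
    void consume(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') {
                line++;
            }
        }
    }

    /** The line number of the next character to be read. */
    int current() {
        return line;
    }
}
```

If you rely on JFlex's %line directive instead, keep in mind that its yyline field starts at 0, so you would add 1 before printing. JFlex also recognizes line terminators other than '\n', so check that its count agrees with the rule above on your test cases.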

Error Reporting

To report an error, write the string

ERROR: line_number: Lexer: message

to standard output and terminate the program. You may write whatever you want in the message, but it should give a reasonable indication of what went wrong.

Example erroneous input:
Backslash not allowed \
Example error report output:
ERROR: 1: Lexer: invalid character: \
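
The report format can be captured in one small helper. This is a sketch under assumptions: the class name LexError and its methods are hypothetical, and the exit status of 1 is a guess, since this section only says to terminate the program.

```java
/** Hypothetical helper for the error format described above. */
final class LexError {
    /** Build a report of the form "ERROR: line_number: Lexer: message". */
    static String format(int lineNumber, String message) {
        return "ERROR: " + lineNumber + ": Lexer: " + message;
    }

    /** Print the report to standard output and stop the program. */
    static void report(int lineNumber, String message) {
        System.out.println(format(lineNumber, message));
        System.exit(1); // assumption: the exit status is not specified here
    }
}
```

For the example input above, LexError.report(1, "invalid character: \\") would print the example error report.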

The .cl-lex File Format

If there are no errors in file.cl your program should create file.cl-lex and serialize the tokens to it. Each token is represented by a pair (or triplet) of lines. The first line holds the line number. The second line gives the name of the token. The optional third line holds additional information (i.e., the lexeme) for identifiers, integers, strings and types. For example, for an integer token the third line should contain the decimal integer value.

Example input:
Backslash not
   allowed
Corresponding .cl-lex output:
1
type
Backslash
1
not
2
identifier
allowed
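
The pair-or-triplet layout could be serialized along the following lines. The Token and LexSerializer classes are hypothetical names for this sketch, which assumes a null lexeme marks a token that has no third line.

```java
import java.util.List;

/** Hypothetical token holder: line number, token name, optional lexeme. */
final class Token {
    final int line;
    final String name;
    final String lexeme; // null when the token has no third line

    Token(int line, String name, String lexeme) {
        this.line = line;
        this.name = name;
        this.lexeme = lexeme;
    }
}

/** Hypothetical serializer producing the .cl-lex text for a token list. */
final class LexSerializer {
    static String serialize(List<Token> tokens) {
        StringBuilder out = new StringBuilder();
        for (Token t : tokens) {
            out.append(t.line).append('\n');       // first line: line number
            out.append(t.name).append('\n');       // second line: token name
            if (t.lexeme != null) {
                out.append(t.lexeme).append('\n'); // optional third line: lexeme
            }
        }
        return out.toString();
    }
}
```

Serializing the three tokens from the example above, (1, type, Backslash), (1, not) and (2, identifier, allowed), yields the eight lines shown.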

The official list of token names is:

In general the intended token is evident. For the more exotic names:

The .cl-lex file format is exactly the same as the one generated by the reference compiler when you specify --lex. In addition, the reference compiler (and your upcoming PA3 parser!) will read .cl-lex files instead of .cl files.

Lexical Analyzer Generators

You must use a lexical analyzer generator or similar library for this assignment. In class, we discuss JFlex, a lexical analyzer generator for Java. The JFlex documentation is particularly helpful. Because you are producing a Java program for this assignment, you are strongly encouraged to use JFlex.

For your reference, there exist similar tools for other programming languages:

All of these lexical analyzer generators are derived from lex (or flex), the original lexical analyzer generator for C. Thus you may find it handy to refer to the Lex paper or the Flex manual. As you read, mentally translate the C code references into the language you are using.

Commentary

You can do basic testing with something like the following:

Example testing
$ cool --out reference --lex file.cl
$ jflex lexer.flex
$ javac Main.java
$ java Main file.cl
$ diff -b -B -E -w file.cl-lex reference.cl-lex

You may find the reference compiler's --unlex option useful for debugging your .cl-lex files.

Need more testcases? Any Cool file you have (including the one you wrote for PA1) works fine. The contents of cool-examples.zip should be a good start. There's also one among the PA1 hints. You'll want to make more complicated testcases; in particular, you'll want to make negative testcases (e.g., testcases with malformed string constants).

If you are still stuck, you can post on the forum, approach the TAs, or approach the professor.

Video Guides

Wes Weimer has developed a number of Video Guides that you might find helpful. The Video Guides are walkthroughs in which Wes manually completes and narrates, in real time, the first part of a similar assignment — including a submission to his grading server. They include coding, testing and debugging elements.

These videos are considered an outside resource for completing this assignment. Be sure to note these videos in your references.txt if you use them.

Note: Wes's videos use a different submission site from this class.

Reminder: You can watch YouTube videos at 1.5x speed with full audio.

What to Submit for PA2

You must turn in a zip file containing these files:

Your zip file may also contain:

Working In Pairs

You may complete this assignment in a team of two. Teamwork imposes burdens of communication and coordination, but has the benefits of more thoughtful designs and cleaner programs. Team programming is also the norm in the professional world.

Students on a team are expected to participate equally in the effort and to be thoroughly familiar with all aspects of the joint work. Both members bear full responsibility for the completion of assignments. Partners turn in one solution for each programming assignment; each member receives the same grade for the assignment. If a partnership is not going well, the teaching assistants will help to negotiate new partnerships. Teams may not be dissolved in the middle of an assignment.

If you are working in a team, exactly one team member should submit a PA2 zip file. That submission should include the file team.txt, a one-line, one-word flat ASCII text file that contains the email ID of your teammate. Don't include the @virginia.edu bit. Example: If ph4u and kaa2nx are working together, ph4u would submit ph4u-pa2.zip with a team.txt file that contains the word kaa2nx. Then ph4u and kaa2nx will both receive the same grade for that submission.

This seems minor, but in the past we've had students fail to correctly format this one-word file. Thus you now get a point on this assignment for either formatting this file correctly (i.e., including only a single word that is equal to your partner's UVA email ID) or not including it (and thus not working in a pair).
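
If you want to sanity-check team.txt before submitting, a quick test along these lines may help. It is only a sketch: the class TeamFileCheck is hypothetical, it is not the grader's actual check, and it assumes a single trailing newline is acceptable.

```java
/** Hypothetical pre-submission check for the contents of team.txt. */
final class TeamFileCheck {
    /** True if the contents look like one ASCII word with no '@' part. */
    static boolean looksValid(String contents) {
        String word = contents.trim(); // assumption: tolerate a trailing newline
        if (word.isEmpty()) {
            return false;
        }
        for (int i = 0; i < word.length(); i++) {
            char ch = word.charAt(i);
            if (ch > 127 || Character.isWhitespace(ch) || ch == '@') {
                return false; // non-ASCII, a second word, or an email domain
            }
        }
        return true;
    }
}
```

With this sketch, "kaa2nx" passes, while "kaa2nx@virginia.edu" and "ph4u kaa2nx" do not.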

Legacy Grading

The legacy grading server does not have Java installed. You will need to submit this assignment in C. To receive full credit, be sure to include the Flex or Lex definition file you used for this assignment.

Grading Rubric

PA2 Grading (out of 50 points):