CS4610 PA4 - Kevin Angstadt :: Teaching

Programming assignments 2 through 5 will direct you to design and build an interpreter for Cool. Each assignment will cover one component of the interpreter: lexical analysis, parsing, semantic analysis, and operational semantics. Each assignment will ultimately result in a working interpreter phase which can interface with the other phases.

You will complete this assignment using JavaScript (NodeJS) and implement the semantic analysis component of an interpreter.

You may work in a team of two people for this assignment. You may work in a team for any or all subsequent programming assignments. You do not need to keep the same teammate. The course staff are not responsible for finding you a willing teammate.

Goal

For this assignment you will write a semantic analyzer. Among other things, this involves traversing the abstract syntax tree and the class hierarchy. You will reject all Cool programs that do not comply with the Cool type system.

This assignment is broken into three parts (tests, checkpoint, final submission). First, you will construct a test suite for your semantic analyzer. Next, you will perform static checks on Cool ASTs to rule out invalid programs. Because this is a rather large task, there is a checkpoint before the final submission. You should work on test suite construction and implementation in parallel.

Specification

You must create four artifacts:

A program that takes a single command-line argument (e.g., file.cl-ast). That argument will be an ASCII text Cool abstract syntax tree file (as described in PA3). Your program must either indicate that there is an error in the input (e.g., a type error) or emit file.cl-type, which contains a serialized Cool abstract syntax tree, class map, implementation map, and parent map. If your program is called checker.js, invoking nodejs checker.js file.cl-ast should yield the same output as cool --type file.cl. Your program will consist of a number of JavaScript files.
A plain ASCII text file called readme.txt describing your design decisions and choice of test cases. See the grading rubric. A few paragraphs should suffice.
A plain ASCII text file called references.txt providing a citation for each resource you used (excluding class notes and assigned readings) to complete the assignment. For example, if you found a Stack Overflow answer helpful, provide a link to it. Additionally, provide a brief description of how the resource helped you.
Testcases good.cl, bad1.cl, bad2.cl, and bad3.cl. The first should pass the semantic analysis stage. The remaining three should yield semantic analysis errors.

PA4t: Creating PA4 Tests

PA4t is a preliminary testing exercise that introduces a form of test-driven development or mutation testing into our software development process and requires you to construct a high-quality test suite.

The goal of PA4t is to leave you with a high-quality test suite of Cool programs that you can use to evaluate your own PA4 type checker. Writing a type checker requires you to consider many corner cases when reading the formal and informal typing rules in the Cool Reference Manual. While you can check for correct "positive" behavior by comparing your typechecker's output to the reference compiler's output on existing "good" Cool programs, it is comparatively harder to check for "negative" behavior (i.e., correctly reporting ill-typed Cool programs).

If you fail to construct a rich test suite of syntactically-valid but semantically-invalid programs, you will face a frustrating series of "you fail held-out negative test x" reports for PA4 proper, which can turn into unproductive guessing games. Because students often report that this is frustrating (even though it is, shall we say, infinitely more realistic than making all of the post-deployment tests visible in advance), the PA4t preliminary testing exercise provides a structured means to help you get started with the construction of a rich test suite.

The course staff have produced 20 variants of the reference compiler, each with a secret intentionally-introduced defect related to type-checking. A high-quality test suite is one that reveals each introduced defect by showing a difference between the behavior of the true reference compiler and the corresponding buggy version. You desire a high-quality test suite to help you gain confidence in your own PA4 submission.

For PA4t, you must produce syntactically valid Cool programs (test cases). There are 20 separate held-out seeded type-checker bugs waiting on the grading server. For each bug, if one of your tests causes the reference and the buggy version to produce different output (that is, either a different .cl-type file or a different error report), you win: that test has revealed that bug. For full credit your tests must reveal at least 15 of the 20 unknown defects.

The secret defects that we have injected into the reference compiler correspond to common defects made by students in PA4. Thus, if you make a rich test suite for PA4t that reveals many defects, you can use it on your own PA4 submission to reveal and fix your own bugs!

PA4c: Checkpoint

PA4c is a checkpoint for PA4. The typechecker is a large project (and a large part of your grade), so it behooves you to start it early.

For PA4c you should turn in (electronically) an early version of PA4 that does the following:

Reads in the .cl-ast file given as a command-line argument.
- You do not need to use a parser generator to read in the .cl-ast file — its format was specifically chosen to make it easy to read with just some mutually-recursive procedures. It should take you (much) less than 150 lines to read in the .cl-ast file.
Does every bit of typechecking and semantic analysis possible without typechecking expressions.
- Thus you should not annotate types in initializer expressions in the class map.
Prints out error messages as normal.
Outputs only the class map to .cl-type if there are no errors.
- You can use the --class-map command-line argument to get the reference compiler to spit out the class map after typechecking (for comparison).
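As a rough illustration of the mutually-recursive approach described above, here is a minimal JavaScript sketch. It assumes the PA3 .cl-ast layout (one token per line; an identifier is a line number followed by a name; an expression is a line number, a kind, and then its subparts), and the names readAst, readId, readExp, and so on are hypothetical. Only a handful of expression kinds are handled; the remaining PA3 expression kinds would be added as further cases in the same switch.

```javascript
// Recursive-descent reader for a .cl-ast file (sketch).
// ASSUMPTION: the PA3 layout, one token per line: an identifier is a line
// number then a name; an expression is a line number, a kind, then subparts.
// Only a few expression kinds are handled; the others follow the same pattern.
function readAst(text) {
  const lines = text.split(/\r?\n/);
  let pos = 0;
  function next() { return lines[pos++]; }

  // Read a count n, then n items with the given reader.
  function readList(readItem) {
    const n = parseInt(next(), 10);
    const items = [];
    for (let i = 0; i < n; i++) items.push(readItem());
    return items;
  }

  function readId() {
    const line = parseInt(next(), 10);
    return { line: line, name: next() };
  }

  function readExp() {
    // Object literal properties are evaluated left to right: line, then kind.
    const e = { line: parseInt(next(), 10), kind: next() };
    switch (e.kind) {
      case "integer": e.value = parseInt(next(), 10); break;
      case "string": e.value = next(); break;
      case "identifier": e.id = readId(); break;
      case "true": case "false": break;
      case "new": e.typeName = readId(); break;
      case "plus": case "minus": case "times": case "divide":
        e.left = readExp(); e.right = readExp(); break;
      case "block": e.body = readList(readExp); break;
      case "self_dispatch":
        e.method = readId(); e.args = readList(readExp); break;
      default:
        throw new Error("kind not handled in this sketch: " + e.kind);
    }
    return e;
  }

  function readFormal() {
    return { name: readId(), type: readId() };
  }

  function readFeature() {
    const kind = next();
    if (kind === "method") {
      const name = readId();
      const formals = readList(readFormal);
      return { kind: kind, name: name, formals: formals, type: readId(), body: readExp() };
    }
    // attribute_no_init or attribute_init
    const attr = { kind: kind, name: readId(), type: readId() };
    if (kind === "attribute_init") attr.init = readExp();
    return attr;
  }

  function readClass() {
    const name = readId();
    const inherits = next() === "inherits" ? readId() : null;
    return { name: name, inherits: inherits, features: readList(readFeature) };
  }

  return readList(readClass);
}
```

To use it, require Node's built-in fs module and call readAst(fs.readFileSync(process.argv[2], "utf8")).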

Thus you should build the class hierarchy and check everything related to that. For example:

Check to see if a class inherits from Int (etc.).
Check to see if a class inherits from an undeclared class.
Check for cycles in the class hierarchy.
Check for duplicate method or attribute definitions in the same class.
Check for a child class that redefines a parent method but changes the parameters.
Check for a missing method main in class Main.
Check for self and SELF_TYPE mistakes in classes and methods.
This list is not exhaustive -- read the Cool Reference Manual carefully and find everything you might check for without typechecking expressions.
Basically, you'll look at classes, methods, and attributes (but not method bodies).
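The hierarchy checks listed above can be sketched as follows. This is an illustration under assumptions, not a complete PA4c: each user class is a hypothetical object { name, line, parent, parentLine }, errors are thrown as Error objects carrying a line property, and the line choices follow the Line Number Error Reporting guidelines (redefinitions at the second identifier, other inheritance problems at the inherited type, cycles at line 0). On success it returns a parent map object that later phases can reuse.

```javascript
// PA4c class-hierarchy checks (sketch).
// ASSUMPTION (hypothetical shape): each user class is
// { name, line, parent, parentLine }, with parent null when there is no
// inherits clause. Errors are thrown as Error objects carrying .line.
const BASIC_CLASSES = ["Object", "IO", "Int", "String", "Bool"];
const UNINHERITABLE = ["Int", "String", "Bool"];

function typeError(line, message) {
  const err = new Error(message);
  err.line = line; // line number for "ERROR: line: Type-Check: ..."
  return err;
}

function checkHierarchy(classes) {
  // No prototype, so `in` only sees class names we added.
  const parentOf = Object.create(null);
  BASIC_CLASSES.forEach(function (b) {
    parentOf[b] = b === "Object" ? null : "Object";
  });

  for (const c of classes) {
    // Redefining a class (basic or user-defined): second identifier location.
    if (c.name in parentOf) throw typeError(c.line, "class " + c.name + " redefined");
    parentOf[c.name] = c.parent || "Object";
  }

  for (const c of classes) {
    if (!c.parent) continue;
    // Other inheritance problems: inherited type identifier location.
    if (UNINHERITABLE.indexOf(c.parent) >= 0)
      throw typeError(c.parentLine, c.name + " inherits from " + c.parent);
    if (!(c.parent in parentOf))
      throw typeError(c.parentLine, c.name + " inherits from undeclared class " + c.parent);
  }

  // Cycle detection: walk up from each class; revisiting a class on the
  // same walk means a cycle. Inheritance cycles are always line 0.
  for (const c of classes) {
    const seen = new Set();
    for (let k = c.name; k !== null; k = parentOf[k]) {
      if (seen.has(k)) throw typeError(0, "inheritance cycle involving " + k);
      seen.add(k);
    }
  }
  return parentOf;
}
```

Checking each kind of problem in a separate pass keeps the line-number logic simple; since test cases contain at most one error, the order of the passes should not matter.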

Note

No exact list of errors is provided for this assignment. Part of your task is to think up all possible checks that do not involve expressions. If you were designing the tools for a new language, you wouldn't know the possible errors in advance; part of your job as the language designer is to consider corner cases of your language's specification. Use your tests from PA4t as a starting point.

PA4

Your final submission for PA4 will perform all of the same checks from PA4c, and it will also check expressions (method bodies).

Error Reporting

To report an error, write the string

ERROR: line_number: Type-Check: message

to standard output and terminate the program. You may write whatever you want in the message, but it should be fairly indicative.
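A minimal sketch of the reporting step in JavaScript. The function names are hypothetical, and the exit status is an assumption, because the handout only says to terminate.

```javascript
// Error report in the required format (sketch). The message text is
// free-form; only the line number has to match the reference compiler.
function formatTypeError(line, message) {
  return "ERROR: " + line + ": Type-Check: " + message;
}

function reportTypeError(line, message) {
  console.log(formatTypeError(line, message)); // standard output, not stderr
  // The handout only says to terminate; this exit status is an assumption.
  process.exit(0);
}
```

Calling reportTypeError(3, "arithmetic on String Int instead of Ints") would print the example report that follows.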

Example erroneous input:

class Main inherits IO { 
 main() : Object { 
   out_string("Hello, world.\n" + 16777216) -- adding string + int !? 
 } ; 
} ;

Example error report output:

ERROR: 3: Type-Check: arithmetic on String Int instead of Ints

Line Number Error Reporting

The typing rules do not directly specify the line numbers on which errors are to be reported. As of v1.11, the Cool reference compiler uses these guidelines (some of which may be surprising):

Errors related to parameter-less method main in class Main: always line 0
Inheritance cycle: always line 0
Other inheritance type problem: inherited type identifier location
self or SELF_TYPE used in wrong place: self (resp. SELF_TYPE) identifier (resp. type) location
Redefining a feature: (second) feature location
Redefining a formal or class: (second) identifier location
Other attribute problems: attribute location
Redefining a method and changing types: (second) type location
Other problems with redefining a method: method location
Method body type does not conform: method name identifier location
Attribute initializer does not conform: attribute name identifier location
Errors with types of arguments to relational/arithmetic operations: location of relational/arithmetic operation expression
Errors with types of while / if subexpression(s): location of (enclosing) while or if expression (not the location of the conditional expression)
Errors with case expression (e.g., lub): location of case expression
Errors with conformance in let: location of let expression (not location of initializer)
Errors in blocks: location of (beginning of) block expression
Errors in actual arguments: location of method invocation expression (not the location of any particular actual argument)
Assignment does not conform: assignment expression location (not right-hand-side location)
Unknown identifier: location of identifier
Unknown method: location of method name identifier
Unknown type: location of type
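To make the two method-redefinition rules concrete, here is a hypothetical override check. It assumes methods are represented as { name, line, formals: [{ name, type, typeLine }], type, typeLine } and returns the line the guidelines above call for: the method location when the number of formals changes, and the (second) type location when a formal or return type changes. Formal names are not compared in this sketch; only types are.

```javascript
// Override check for a method redefined in a child class (sketch).
// ASSUMPTION (hypothetical shape): a method is
// { name, line, formals: [{ name, type, typeLine }], type, typeLine }.
// Returns null if the redefinition is legal, else { line, message }.
function checkOverride(parentMethod, childMethod) {
  const p = parentMethod;
  const c = childMethod;
  if (p.formals.length !== c.formals.length) {
    // "Other problems with redefining a method": method location.
    return { line: c.line, message: c.name + " redefined with a different number of formals" };
  }
  for (let i = 0; i < c.formals.length; i++) {
    if (p.formals[i].type !== c.formals[i].type) {
      // "Redefining a method and changing types": (second) type location.
      return { line: c.formals[i].typeLine, message: c.name + " redefined with a different formal type" };
    }
  }
  if (p.type !== c.type) {
    // Changed return type: also the (second) type location.
    return { line: c.typeLine, message: c.name + " redefined with a different return type" };
  }
  return null; // only types are compared here, not formal names
}
```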

Remember that you do not have to match the English prose of the reference compiler's error messages at all. You just have to get the line number right.

Semantic checks are unordered; if a program contains two or more errors, you may indicate whichever you like. You can infer from this that all of our test cases will contain at most one error.

The .cl-type File Format

If there are no errors in file.cl-ast your program should create file.cl-type and serialize the class map, implementation map, parent map, and annotated AST to it.

The class and implementation maps are described in the Cool Reference Manual.

A .cl-type file consists of four sections:

The class map.
The implementation map.
The parent map.
The annotated AST.

Simply output the four sections in order, one after the other.

We will now describe exactly what to output for the class map, implementation map, parent map, and annotated AST. The general idea and notation (one string per line, recursive descent) are the same as in PA3.

The Class Map
- Output class_map \n.
- Output the number of classes and then \n.
- Output each class in turn (in ascending alphabetical order):
  - Output the name of the class and then \n.
  - Output the number of attributes and then \n.
  - Output each attribute in turn (in order of appearance, with inherited attributes from a superclass coming first):
    - Output no_initializer \n and then the attribute name \n and then the type name \n.
    - or output initializer \n and then the attribute name \n and then the type name \n and then the initializer expression.
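A sketch of the class map output in JavaScript, under stated assumptions: parentOf is a hypothetical object mapping every class (basic classes included) to its parent, with Object mapped to null; ownAttrs maps a class to its own attributes in source order; and writeExp is a caller-supplied function that appends an expression's lines. Inherited attributes come first because allAttributes recurses to the parent before appending the class's own list. The default string sort compares UTF-16 code units (ASCII order for Cool identifiers); that this matches the reference compiler's "alphabetical" order is an assumption.

```javascript
// Class map serializer (sketch). ASSUMPTIONS (hypothetical shapes):
// parentOf maps every class name, basic classes included, to its parent
// (Object -> null); ownAttrs maps a class name to its own attributes in
// source order, each { name, type, init } with init null when absent;
// writeExp(exp, out) appends an initializer expression's lines to out.
function allAttributes(cls, parentOf, ownAttrs) {
  const parent = parentOf[cls];
  const inherited = parent ? allAttributes(parent, parentOf, ownAttrs) : [];
  return inherited.concat(ownAttrs[cls] || []); // inherited attributes first
}

function writeClassMap(parentOf, ownAttrs, writeExp) {
  // Default sort compares UTF-16 code units (ASCII order for Cool names).
  const classes = Object.keys(parentOf).sort();
  const out = ["class_map", String(classes.length)];
  for (const cls of classes) {
    const attrs = allAttributes(cls, parentOf, ownAttrs);
    out.push(cls, String(attrs.length));
    for (const a of attrs) {
      out.push(a.init === null ? "no_initializer" : "initializer", a.name, a.type);
      if (a.init !== null) writeExp(a.init, out);
    }
  }
  return out.join("\n") + "\n";
}
```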
The Implementation Map
- Output implementation_map \n.
- Output the number of classes and then \n.
- Output each class in turn (in ascending alphabetical order):
  - Output the name of the class and then \n.
  - Output the number of methods for that class and then \n.
  - Output each method in turn (in order of appearance, with inherited or overridden methods from a superclass coming first; internal methods are defined to appear in ascending alphabetical order):
    - Output the method name and then \n.
    - Output the number of formals and then \n.
    - Output each formal's name only:
      - Output the name and then \n
    - If this method is inherited from a parent class and not overridden, output the name of the ultimate parent class that defined the method body expression and then \n. Otherwise, output the name of the current class and then \n.
    - Output the method body expression.
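The ordering rule for methods (inherited or overridden methods first, in the parent's order) can be sketched by building each class's list from its parent's and replacing entries in place on override. The shapes of parentOf, ownMethods, and writeBody are hypothetical, and for the basic classes the caller is assumed to supply the internal methods in ascending alphabetical order, as the rule above requires.

```javascript
// Implementation map (sketch). ASSUMPTIONS (hypothetical shapes): parentOf
// maps every class to its parent (Object -> null); ownMethods maps a class to
// its own methods in source order, each { name, formals } where formals is a
// list of formal names; writeBody(method, out) appends the body's lines.
function methodsOf(cls, parentOf, ownMethods) {
  const parent = parentOf[cls];
  // A fresh array per call, so editing it never affects the parent's list.
  const list = parent ? methodsOf(parent, parentOf, ownMethods) : [];
  for (const m of ownMethods[cls] || []) {
    const entry = { name: m.name, formals: m.formals, definedIn: cls, method: m };
    const i = list.findIndex(function (x) { return x.name === m.name; });
    if (i >= 0) list[i] = entry; // override: keep the parent's position
    else list.push(entry);       // new method: append in order of appearance
  }
  return list;
}

function writeImplementationMap(parentOf, ownMethods, writeBody) {
  const classes = Object.keys(parentOf).sort();
  const out = ["implementation_map", String(classes.length)];
  for (const cls of classes) {
    const methods = methodsOf(cls, parentOf, ownMethods);
    out.push(cls, String(methods.length));
    for (const m of methods) {
      out.push(m.name, String(m.formals.length));
      for (const f of m.formals) out.push(f); // formal names only
      out.push(m.definedIn);                   // class that defined this body
      writeBody(m.method, out);
    }
  }
  return out.join("\n") + "\n";
}
```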
The Parent Map
- Output parent_map \n.
- Output the number of parent-child inheritance relations and then \n. This number is equal to the number of classes minus one (since Object has no parent).
- Output each child class in turn (in ascending alphabetical order):
  - Output the name of the child class and then \n.
  - Output the name of the child class's parent and then \n.
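Because the parent map needs only class names, its serializer is short. This sketch assumes the same hypothetical parentOf object (every class mapped to its parent, Object mapped to null).

```javascript
// Parent map serializer (sketch). ASSUMPTION: parentOf maps every class,
// basic classes included, to its parent name, with Object mapped to null.
function writeParentMap(parentOf) {
  const children = Object.keys(parentOf)
    .filter(function (c) { return parentOf[c] !== null; }) // skip Object
    .sort(); // ascending alphabetical order of child class names
  const out = ["parent_map", String(children.length)];
  for (const c of children) out.push(c, parentOf[c]);
  return out.join("\n") + "\n";
}
```

Applied to the six classes of the example program below, it produces the same lines as the example parent map (without the comments).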
The Annotated AST
- With two exceptions, the annotated AST format is identical to the normal AST from PA3.
- The first change involves expressions. To output an Expression:
  1. Output the line number of the expression and then a newline (as in PA3).
  2. Output the name of the type associated with the expression and then a newline. For example, the expression 3+x is associated with the type Int. This is new to PA4. It should be done for PA4, but not for PA4c.
  3. Output the name of the expression and then a newline and then any subparts (as in PA3).
- The second change is a new kind of expression, internal, used to represent the bodies of predefined methods. Internal expressions are those that are handled by the run-time system — you might think of them as part of the standard library. You output Internal Expressions (including the type annotation, as above) as follows:
  - 0 \n type \n internal \n Class.method \n
  The valid kinds of internal expressions (i.e., the values for Class.method) are:
  - IO.in_int IO.in_string IO.out_int IO.out_string Object.abort Object.copy Object.type_name String.concat String.length String.substr
  They are formally defined in the Cool Reference Manual.
  Note that you must output information about all classes and methods defined in the program as well as all base classes (and their methods). Do not just print out "classes actually used" or "methods actually called" or something like that. Output all classes and methods — no optimizations or shortcuts!
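The two changes can be sketched as a single recursive writer. The expression shape { line, type, kind, ... } and the names writeExp and internalBody are hypothetical, and only a few kinds are shown; the withTypes flag is one way to reuse the same writer for PA4c, which omits the type line.

```javascript
// Annotated expression output (sketch). ASSUMPTION (hypothetical shape):
// an expression is { line, type, kind, ... } with kind-specific fields;
// only a few kinds are shown. Pass withTypes = false for PA4c.
function writeExp(e, out, withTypes) {
  out.push(String(e.line));
  if (withTypes !== false) out.push(e.type); // new in PA4
  out.push(e.kind);
  switch (e.kind) {
    case "internal": out.push(e.detail); break; // e.g. "Object.abort"
    case "integer": case "string": out.push(String(e.value)); break;
    case "plus": case "minus": case "times": case "divide":
      writeExp(e.left, out, withTypes);
      writeExp(e.right, out, withTypes);
      break;
    default:
      throw new Error("kind not handled in this sketch: " + e.kind);
  }
}

// Body of a predefined method: line 0, its result type, then Class.method.
function internalBody(className, methodName, resultType) {
  return { line: 0, type: resultType, kind: "internal", detail: className + "." + methodName };
}
```

For example, writeExp(internalBody("Object", "copy", "SELF_TYPE"), out) appends 0, SELF_TYPE, internal, and Object.copy, matching the copy entry in the implementation map example below.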

Example input:

class Main inherits IO {
  my_attribute : Int <- 5 ; 
  main() : Object { 
    out_string("Hello, world.\n") 
  } ;
} ;

Example .cl-type class map output with comments:

class_map
6               -- number of classes
Bool            -- note: includes predefined base classes
0
IO
0
Int
0
Main
1               -- our Main has 1 attribute
initializer
my_attribute    -- named "my_attribute"
Int             -- with type Int
2               -- initializer expression line number
Int             -- initializer expression type (see above: this is an expression annotated with a type) -- do not emit these expression type annotations for the PA4c Checkpoint!
integer         -- initializer expression kind
5               -- which integer constant is it?
Object
0
String
0

Example .cl-type implementation map output with comments:

implementation_map  
6               -- six classes
Bool            -- first is Bool
3               -- it has three methods
abort           -- first is abort()
0               -- abort has 0 formal arguments
Object          -- name of parent class from which Bool inherits abort()
0               -- abort's body expression starts on line 0
Object          -- abort's body expression has type Object
internal        -- abort's body is an internal kind of expression (i.e., a system call; see above)
Object.abort    -- extra detail on abort's body expression
copy            -- second of Bool's three methods is copy()
0               -- copy has 0 formal arguments
Object          -- name of parent class from which Bool inherits copy()
0               -- copy's body expression starts on line 0
SELF_TYPE       -- copy's body expression has type SELF_TYPE
internal        -- copy's body is an internal kind of expression (i.e., a system call; see above)
Object.copy     -- extra detail on copy's body expression
... many lines skipped ...  
Main            -- another class is Main
8               -- it has 8 methods
... many lines skipped ...  
main            -- one of Main's methods is main()
0               -- main has 0 formal arguments
Main            -- the name of the class where Main.main() is defined
4               -- the body expression of Main.main starts on line 4
SELF_TYPE       -- the body expression of Main.main has type SELF_TYPE
self_dispatch   -- the body of Main.main() is a self_dispatch kind of expression	
... many lines skipped ...

Example .cl-type parent output with comments:

parent_map  
5               -- there are five classes with parents (Object is the sixth class)
Bool            -- Bool's parent ...
Object          -- ... is Object.
IO              -- IO's parent ...
Object          -- ... is Object.
Int             -- Int's parent ...
Object          -- ... is Object.
Main            -- Main's parent ...
IO              -- ... is IO.
String          -- String's parent ...
Object          -- ... is Object.

Writing the code to output a .cl-type text file given an AST may take a bit of time but it should not be difficult; the reference implementation (OCaml) does it in 35 lines and cleaves closely to the structure given above. Reading in the AST is similarly straightforward; our reference implementation (OCaml) does it in 171 lines. (My JavaScript implementation has a similar number of lines for each task.)

Commentary

You can do basic testing as follows:

Example testing

$ cool --parse file.cl
$ cool --out reference --type file.cl
$ my-checker file.cl-ast
$ diff -b -B -E -w file.cl-type reference.cl-type

You should implement all of the typing rules in the Cool Reference Manual. There are also a number of other rules and corner cases you have to check (e.g., no class can inherit from Int, you cannot redefine a class, you cannot have an attribute named self, etc.). They are sprinkled throughout the manual. Check everything you possibly can.
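Many of these rules (if, case, let, assignment, dispatch, attribute initializers) are phrased in terms of conformance and least upper bound. Below is a sketch of both, following the SELF_TYPE rules as we read them in the Cool Reference Manual; parentOf, currentClass, and the function names are assumptions, and undeclared types are assumed to have been rejected already.

```javascript
// Conformance and least upper bound over the class tree (sketch).
// ASSUMPTIONS: parentOf maps each class to its parent (Object -> null),
// every type name is already known to be declared, and currentClass is the
// class C whose SELF_TYPE (SELF_TYPE_C) is in scope.
function conforms(sub, sup, currentClass, parentOf) {
  // T <= SELF_TYPE_C holds only when T is SELF_TYPE_C itself.
  if (sup === "SELF_TYPE") return sub === "SELF_TYPE";
  // SELF_TYPE_C <= T holds when C <= T.
  for (let t = sub === "SELF_TYPE" ? currentClass : sub; t; t = parentOf[t]) {
    if (t === sup) return true;
  }
  return false;
}

function lub(a, b, currentClass, parentOf) {
  if (a === "SELF_TYPE" && b === "SELF_TYPE") return "SELF_TYPE";
  // Walk up from a; the first ancestor that b conforms to is the lub.
  for (let t = a === "SELF_TYPE" ? currentClass : a; t; t = parentOf[t]) {
    if (conforms(b, t, currentClass, parentOf)) return t;
  }
  return "Object";
}
```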

Getting Started with JavaScript

If you've never used JavaScript in the past, there's no need to worry. You already learned one programming language in this class (Reason), so you can learn another. While JavaScript looks something like Java, you're actually better off thinking about your programs as if you're writing Reason or Python.

Douglas Crockford (who is heavily involved in the development of JavaScript) has many helpful resources on his personal website. I also recommend watching a talk he gave at Google many years ago. Be sure to note these resources in your references.txt if you use them.

One of my favorite things about programming in JavaScript is Chrome's developer tools. You can use them with NodeJS with a couple additional flags on the command line:

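$ nodejs --inspect --debug-brk main.js

On newer NodeJS releases, --debug-brk is deprecated (and eventually removed); use nodejs --inspect-brk main.js instead.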

Not only do the developer tools let you single-step through your program and inspect program state, but there is also an interactive shell available for you to use.

Getting Started Video

As additional assistance for getting up to speed with JavaScript, I recorded a screen cast in which I begin implementing the parsing functions to read in a Cool AST. The video contains discussion of function scoping, callbacks, higher-order functions, debugging, and JavaScript Objects. The quality of the jokes cannot be guaranteed.


Symbol Tables

Remember how OCaml has a nice Hashtbl module that is useful for symbol tables? Here is a limited version written in JavaScript:

symboltable.zip
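If you prefer to write your own, here is an independent sketch (not the contents of symboltable.zip) that mimics OCaml Hashtbl shadowing: add hides any earlier binding for the same name, and remove restores it. The class and method names are hypothetical.

```javascript
// Scoped symbol table with OCaml Hashtbl-style shadowing (sketch; not the
// course's symboltable.zip). Each name maps to a stack of bindings.
class SymbolTable {
  constructor() {
    this.table = new Map(); // name -> array used as a stack of values
  }
  add(name, value) { // push a binding that hides any earlier one
    if (!this.table.has(name)) this.table.set(name, []);
    this.table.get(name).push(value);
  }
  remove(name) { // pop the newest binding, revealing the previous one
    const stack = this.table.get(name);
    if (stack) {
      stack.pop();
      if (stack.length === 0) this.table.delete(name);
    }
  }
  find(name) { // newest binding, or undefined if the name is unbound
    const stack = this.table.get(name);
    return stack ? stack[stack.length - 1] : undefined;
  }
  mem(name) {
    return this.table.has(name);
  }
}
```

For a let binding x : T, you might call add("x", "T") before checking the body and remove("x") afterwards, so an outer x becomes visible again.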

Hint

Hint: because you can find "positive" bugs in your typechecker more easily (e.g., by running your typechecker on the correct Cool programs from cool-examples.zip), the PA4t exercise is strongly biased toward "negative" bugs (i.e., the secret buggy typecheckers usually fail to report certain semantic errors).

If you are still stuck, you can post on the forum, approach the TAs, or approach the professor.

Video Guides

Wes Weimer has developed a number of Video Guides that you might find helpful. The Video Guides are walkthroughs in which Wes manually completes and narrates, in real time, the first part of a similar assignment — including a submission to his grading server. They include coding, testing and debugging elements.

These videos are considered an outside resource for completing this assignment. Be sure to note these videos in your references.txt if you use them.

Note: Wes's videos use a different submission site from this class.

What to Submit For PA4t

You must turn in a zip file containing these files:

A set of up to 99 .cl files: Cool typechecker test cases
- Each testcase you submit must be syntactically valid (i.e., must pass cool --parse).
- Each testcase you submit may be semantically valid or semantically invalid; the choice is yours (i.e., it can pass or fail cool --type).
- No testcase should be named bug... or ref... because the testing server uses those prefixes internally. If you submit a test case with such a name, it will be ignored.
- If you submit more than 99 tests, some will be ignored.

Your zip file may also contain:

team.txt — an optional file listing your other team member's UVA ID

What to Submit For PA4c

You must turn in a zip file containing these files:

source_files: including
- main.js

Your zip file may also contain:

team.txt — an optional file listing your other team member's UVA ID

What to Submit For PA4

You must turn in a zip file containing these files:

readme.txt: your README file
references.txt: your file of citations
good.cl: a novel positive testcase
bad1.cl, bad2.cl, and bad3.cl: novel negative testcases
source_files: including
- main.js

Your zip file may also contain:

team.txt — an optional file listing your other team member's UVA ID

Working In Pairs

You may complete this assignment in a team of two. Teamwork imposes burdens of communication and coordination, but has the benefits of more thoughtful designs and cleaner programs. Team programming is also the norm in the professional world.

Students on a team are expected to participate equally in the effort and to be thoroughly familiar with all aspects of the joint work. Both members bear full responsibility for the completion of assignments. Partners turn in one solution for each programming assignment; each member receives the same grade for the assignment. If a partnership is not going well, the teaching assistants will help to negotiate new partnerships. Teams may not be dissolved in the middle of an assignment.

If you are working in a team, exactly one team member should submit a PA4 zipfile. That submission should include the file team.txt, a one-line, one-word flat ASCII text file that contains the email ID of your teammate. Don't include the @virginia.edu bit. Example: If ph4u and kaa2nx are working together, ph4u would submit ph4u-pa4.zip with a team.txt file that contains the word kaa2nx. Then ph4u and kaa2nx will both receive the same grade for that submission.

This seems minor, but in the past we've had students fail to correctly format this one word file. Thus you now get a point on this assignment for either formatting this file correctly (i.e., including only a single word that is equal to your partner's UVA email ID) or not including it (and thus not working in a pair).

Legacy Grading

The legacy server has an older version of NodeJS installed. Be careful of language features you choose to use!

Grading Rubric

PA4 Grading (out of 100 points):

75 points: autograder tests
- PA4t — (-0.5 point per missed defect [you only need to reveal 15 defects]) Each missed test removes points, to a minimum of 0/5, even if there are more tests than total points.
- PA4c — (-0.5 point per incorrect test, minimum score of 0) Each missed test removes points, to a minimum of 0/5, even if there are more tests than total points.
- PA4 — (-1 point per incorrect test, minimum score of 0) Each missed test removes points, to a minimum of 0/65, even if there are more tests than total points.
1 point: a correct team.txt file
- 1 — file contains a single word: the UVA ID of your partner (e.g., kaa2nx)
- 1 — file is not present at all, and you are working alone
- 0 — file is present but contains something other than the single word of your partner's UVA ID
8 points: a clear description in your README and references
- 8 — thorough discussion of design decisions (e.g., the handling of let) and choice of test cases; a few paragraphs of coherent English sentences should be fine. Citations provided are well-formatted.
- 4 — vague or hard to understand; omits important details. Citations provided are well-formatted.
- 0 — little to no effort, or submitted an RTF/DOC/PDF file instead of plain TXT. Citations do not provide correct information.
8 points: valid and novel good.cl, bad1.cl, bad2.cl, and bad3.cl files
- 8 — wide range of test cases added, stressing most Cool features and three error conditions, novel files
- 4 — added some tests, but the scope is not sufficiently broad
- 0 — little to no effort, or submitted an RTF/DOC/PDF file instead of plain TXT, or submitted part of course files as test cases
8 points: code cleanliness
- 8 — code is mostly clean and well-commented
- 4 — code is sloppy and/or poorly commented in places
- 0 — little to no effort to organize and document code
3.5 points extra credit: Early/Complete Test Suite
- 0.5 — every 5 revealed defects by 2017-03-03 at 11:50 pm (maximum 2 points)
- 0.5 — every defect over 15 you reveal (maximum 1.5 points)

PA4: The Semantic Analyzer (JavaScript)

PA4t Due: 2017-03-14 at 11:50pm

PA4c Due: 2017-03-19 at 11:50pm

PA4 Due: 2017-03-28 at 11:50pm
