EE 360N - Programming Assignment 1 Clarifications

Constants can be expressed in hex or in decimal. A hex constant consists of an 'x' or 'X' followed by one or more hex digits. A decimal constant consists of a '#' followed by one or more decimal digits. A negative constant is identified by a minus sign immediately after the x or #. For example, #-10 is the negative of decimal 10 (i.e., -10), and x-10 is the negative of x10 (i.e., -16).
Since the sign is explicitly specified, the rest of the constant is treated as an unsigned number. For example, x-FF is equivalent to -255. The 'x' tells us the number is in hex, the '-' tells us it is a negative number, and "FF" is treated as an unsigned hex number (i.e., 255). Putting it all together gives us -255.
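The rules above can be sketched as a small parser. This is only an illustration and is not the toNum() function provided for the lab; the name parseConstant, its return convention, and the 0xFFFF cap on the magnitude are all assumptions made for this sketch.

```c
#include <ctype.h>

/* Hypothetical helper (not the course's toNum()).
   Parses constants such as "#-10", "x-FF", or "X1f".
   Returns 1 and stores the value in *out on success, 0 if malformed. */
int parseConstant(const char *s, long *out)
{
    int base, digit, negative = 0, ndigits = 0;
    long magnitude = 0;

    /* The prefix selects the base. */
    if (*s == '#')
        base = 10;
    else if (*s == 'x' || *s == 'X')
        base = 16;
    else
        return 0;
    s++;

    /* An optional minus sign comes immediately after the prefix. */
    if (*s == '-') {
        negative = 1;
        s++;
    }

    /* The remaining characters form an unsigned number in that base. */
    for (; *s != '\0'; s++) {
        if (isdigit((unsigned char)*s))
            digit = *s - '0';
        else if (base == 16 && isxdigit((unsigned char)*s))
            digit = tolower((unsigned char)*s) - 'a' + 10;
        else
            return 0;               /* e.g. "#Invalid" or "xInvalid" */
        magnitude = magnitude * base + digit;
        if (magnitude > 0xFFFFL)
            return 0;               /* assumed cap, avoids overflow in this sketch */
        ndigits++;
    }
    if (ndigits == 0)
        return 0;                   /* "#", "x", "x-" have no digits */

    *out = negative ? -magnitude : magnitude;
    return 1;
}
```

With this sketch, parseConstant("x-FF", &v) stores -255 in v, which matches the worked example above. Whether a given value fits a particular instruction field is a separate check that this sketch does not attempt.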

Your assembler does not have to check for multiple .ORIG pseudo-ops.

Since the .END pseudo-op is used to designate the end of the assembly language file, your assembler does not need to process anything that comes after the .END.

The trap vector for a TRAP instruction and the shift amount for SHF instructions must be positive values. If they are not, you should return error code 3.

The same label should not appear in the symbol table more than once. During pass 1 of the assembly process, you should check to make sure a label is not already in the symbol table before adding it to the symbol table. If the label is already in the symbol table, you should return error code 4.

An invalid label (i.e., one that contains non-alphanumeric characters, or one that starts with the letter 'x' or a digit) should also produce error code 4.

The standard C function isalnum() can be used to check if a character is alphanumeric.
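As a rough illustration, the label rules stated in these notes could be checked as follows. The function name isValidLabel is hypothetical, and the length check against MAX_LABEL_LEN is an assumption based on the symbol table definitions given in these notes. The sketch covers only the rules listed here; any other restrictions in the lab description would need additional checks.

```c
#include <ctype.h>
#include <string.h>

#define MAX_LABEL_LEN 20

/* Hypothetical helper: returns 1 if the label passes the checks
   described in these notes, 0 otherwise. */
int isValidLabel(const char *label)
{
    size_t i, len = strlen(label);

    /* Assumption: empty labels and labels longer than MAX_LABEL_LEN are rejected. */
    if (len == 0 || len > MAX_LABEL_LEN)
        return 0;

    /* A label may not start with the letter 'x' or a digit. */
    if (label[0] == 'x' || isdigit((unsigned char)label[0]))
        return 0;

    /* Every character must be alphanumeric. */
    for (i = 0; i < len; i++) {
        if (!isalnum((unsigned char)label[i]))
            return 0;
    }
    return 1;
}
```

The cast to unsigned char before calling isalnum() or isdigit() avoids undefined behavior when a char holds a negative value.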

After you have gone through the input file for pass 1 of the assembler and your file pointer is at the end of the file, there are two ways you can get the file pointer back to the beginning. You can either close and reopen the file or you can use the standard C I/O function rewind().
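A minimal sketch of the rewind() approach is shown below. The function names are hypothetical, and scanFile() is only a stand-in for a real pass: it just counts newline characters so the effect of rewind() is easy to observe.

```c
#include <stdio.h>

/* Placeholder for one pass over the input.
   In this sketch it only counts newline characters. */
static int scanFile(FILE *in)
{
    int c, lines = 0;

    while ((c = fgetc(in)) != EOF) {
        if (c == '\n')
            lines++;
    }
    return lines;
}

/* Runs two passes over an already-open file and
   returns the number of lines seen during pass 2. */
int runTwoPasses(FILE *in)
{
    scanFile(in);           /* pass 1: leaves the file pointer at end of file */
    rewind(in);             /* back to the beginning; also clears the EOF indicator */
    return scanFile(in);    /* pass 2: reads the same contents again */
}
```

Without the call to rewind(), the second scan would start at end of file and see no lines at all.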

The following definitions can be used to create your symbol table:

```
#define MAX_LABEL_LEN 20
#define MAX_SYMBOLS 255

typedef struct {
  int address;
  char label[MAX_LABEL_LEN + 1];   /* Question for the reader: Why do we need to add 1? */
} TableEntry;

TableEntry symbolTable[MAX_SYMBOLS];
```

To check if two strings are the same, you can use the standard C string function strcmp(). To copy one string to another, you can use the standard C string function strcpy().
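One way these two functions might be combined with the symbol table definitions above is sketched here. The counter symbolCount and the functions findSymbol() and addSymbol() are hypothetical names introduced for illustration. Returning 4 for a duplicate follows these notes; returning -1 for a full table or an overlong label is an assumption, since these notes do not say how to handle those cases.

```c
#include <string.h>

#define MAX_LABEL_LEN 20
#define MAX_SYMBOLS 255

typedef struct {
    int address;
    char label[MAX_LABEL_LEN + 1];
} TableEntry;

TableEntry symbolTable[MAX_SYMBOLS];
int symbolCount = 0;    /* hypothetical: number of entries currently in use */

/* Returns the index of the label in the table, or -1 if it is absent. */
int findSymbol(const char *label)
{
    int i;

    for (i = 0; i < symbolCount; i++) {
        if (strcmp(symbolTable[i].label, label) == 0)
            return i;
    }
    return -1;
}

/* Pass 1 helper: returns 0 on success, 4 if the label is already present,
   and -1 (assumed convention) if the label cannot be stored. */
int addSymbol(const char *label, int address)
{
    if (findSymbol(label) != -1)
        return 4;                               /* duplicate label */
    if (symbolCount >= MAX_SYMBOLS || strlen(label) > MAX_LABEL_LEN)
        return -1;                              /* not covered by these notes */

    strcpy(symbolTable[symbolCount].label, label);  /* fits: length was checked */
    symbolTable[symbolCount].address = address;
    symbolCount++;
    return 0;
}
```

Checking the length before calling strcpy() matters because strcpy() does not know the size of the destination array.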

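If you decide to use any of the math functions in math.h, you also have to link the math library, for example with the command "gcc -ansi -o assemble assembler.c -lm". Placing -lm after the source file is the safer order, because some linkers resolve libraries in the order in which they appear on the command line.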

The specification for the lab classifies the error "ADD R0, R0, 1" as "invalid operand," error code 4. However, if you use the given toNum function to detect this error while checking a possible immediate operand, it produces error code 3 ("invalid constant"). Therefore, we will accept both error 3 and error 4 for this case.

When your assembler finds an error in the input assembly language program, you are not required to print an error message to the screen. If you choose to do so to make debugging easier, that is fine. What is required is that you exit with the appropriate error code. This is what we will check when we grade your program; we will ignore anything that is printed to the screen.
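For illustration only, an error exit might look like the sketch below. The function names errorMessage() and fail() are hypothetical, and the message strings are just debugging aids loosely based on the wording in these notes. Only the codes mentioned in these notes (1, 3, and 4) are given specific messages.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: maps an error code to a debugging message. */
const char *errorMessage(int code)
{
    switch (code) {
    case 1:  return "undefined label";
    case 3:  return "invalid constant";
    case 4:  return "invalid operand";   /* also used for duplicate or invalid labels */
    default: return "unspecified error";
    }
}

/* Prints an optional message and exits with the given error code.
   Only the exit code is graded; the message is ignored. */
void fail(int code)
{
    fprintf(stderr, "error %d: %s\n", code, errorMessage(code));
    exit(code);
}
```

Calling fail(4) from anywhere in the assembler would then terminate the program with exit status 4.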

A student pointed out a discrepancy between the lab description and the assembler we posted. It was related to the following example:
```
.ORIG x3000
JSR ADD         ; JSR is parsed as an opcode and then ADD is the undefined label
.END
```
The lab description states that this should be reported as error code 1 (undefined label). The old assembler we posted reported error code 4 instead. We have fixed the problem and have posted the new assembler.

A student pointed out that the toNum() function we provided did not detect errors for examples such as:
```
ADD R1, R1, #Invalid
.FILL xInvalid
```
We have posted a new and improved toNum() function on the useful code page. The use of this new toNum() function is optional.