CS241 Lecture Notes - Lecture 5: Binary File, Lexeme, Lexical Analysis
May 15, 2018
Assembly language ---cs241.binasm----> machine language (what an assembler does -
cs241.binasm is an assembler)
Assignment 3 is writing your own cs241.binasm
What are the different things you need to consider?
The input is a text file with lines that look like this:
add $3, $1, $2
jr $31
The output is a binary file that looks like this:
101101....
0101.....
How do we start to write our own assembler?
1. Read the input string and make sense out of it (is it valid, is it an add, etc.)
2. Translate it into binary
These steps have actual names...
Stages
1. Analysis: making sense of the input string
- understanding the meaning/intent
2. Synthesis: output equivalent machine instruction in binary
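A minimal sketch of the synthesis step for the two example instructions above. It assumes the standard MIPS register-format layout and that each 32-bit word is written most significant byte first. The names encodeAdd, encodeJr and wordToBytes are made up for illustration and are not part of A3.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: encode "add $d, $s, $t" as one 32-bit word.
// Bit layout: 000000 sssss ttttt ddddd 00000 100000
std::uint32_t encodeAdd(unsigned d, unsigned s, unsigned t) {
    assert(d < 32 && s < 32 && t < 32);  // register numbers are 5 bits wide
    return (s << 21) | (t << 16) | (d << 11) | 0x20u;
}

// Hypothetical helper: encode "jr $s" as one 32-bit word.
// Bit layout: 000000 sssss 000000000000000 001000
std::uint32_t encodeJr(unsigned s) {
    assert(s < 32);
    return (s << 21) | 0x08u;
}

// Split a word into four raw bytes, most significant byte first
// (assumed byte order for the output file).
std::string wordToBytes(std::uint32_t word) {
    std::string bytes;
    for (int shift = 24; shift >= 0; shift -= 8) {
        bytes.push_back(static_cast<char>((word >> shift) & 0xFFu));
    }
    return bytes;
}
```

With these, add $3, $1, $2 becomes 0x00221820 and jr $31 becomes 0x03E00008. Writing wordToBytes(...) to standard output gives raw bytes, not the characters '0' and '1'.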
Analysis
Can have multiple steps:
- First thing is to tokenize input: break down the input string into a sequence of tokens
ex.
add $1, $2, $3
Can break this into ID token (add), REG token ($1), COMMA token, REG token ($2), COMMA
token, REG token ($3)
Tokenization is already done for you in A3
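For intuition only, here is a rough sketch of how one line could be broken into tokens. It is not the A3 scanner: it handles only the three kinds named above (ID, REG, COMMA), and SimpleToken and tokenizeLine are hypothetical names.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a token: first = kind, second = lexeme.
using SimpleToken = std::pair<std::string, std::string>;

// Break one line of assembly into ID, REG and COMMA tokens.
// Anything unrecognized becomes an ERROR token holding the offending text.
std::vector<SimpleToken> tokenizeLine(const std::string& line) {
    std::vector<SimpleToken> tokens;
    std::size_t i = 0;
    while (i < line.size()) {
        unsigned char c = static_cast<unsigned char>(line[i]);
        if (std::isspace(c)) {
            ++i;  // whitespace separates tokens but is not a token itself
        } else if (c == ',') {
            tokens.push_back({"COMMA", ","});
            ++i;
        } else if (c == '$') {
            std::size_t start = i++;
            while (i < line.size() &&
                   std::isdigit(static_cast<unsigned char>(line[i]))) {
                ++i;
            }
            // A lone '$' with no digits after it is not a register.
            std::string kind = (i - start > 1) ? "REG" : "ERROR";
            tokens.push_back({kind, line.substr(start, i - start)});
        } else if (std::isalpha(c)) {
            std::size_t start = i++;
            while (i < line.size() &&
                   std::isalnum(static_cast<unsigned char>(line[i]))) {
                ++i;
            }
            tokens.push_back({"ID", line.substr(start, i - start)});
        } else {
            tokens.push_back({"ERROR", std::string(1, line[i])});
            ++i;
        }
    }
    return tokens;
}
```

Calling tokenizeLine("add $1, $2, $3") yields six tokens in the order ID, REG, COMMA, REG, COMMA, REG, matching the breakdown above.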
class Token {
std::string kind; // the type of token (ID, REG, COMMA)
std::string lexeme; // the actual input characters making up this token ($1, $2)
public:
...
};
This class is given in A3
The scanner produces a vector<vector<Token>>
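A sketch of how the analysis step might walk that nested vector, assuming each inner vector holds the tokens of one input line. The public interface of the A3 Token class is not shown in these notes, so the code uses a hypothetical struct Tok with public fields instead. matchesKinds, isAdd and countAdds are also illustrative names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the A3 Token class, with public fields.
struct Tok {
    std::string kind;    // e.g. "ID", "REG", "COMMA"
    std::string lexeme;  // e.g. "add", "$1", ","
};

// True if the line has exactly the given sequence of token kinds.
bool matchesKinds(const std::vector<Tok>& line,
                  const std::vector<std::string>& kinds) {
    if (line.size() != kinds.size()) return false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        if (line[i].kind != kinds[i]) return false;
    }
    return true;
}

// True if the line has the shape "add REG, REG, REG".
bool isAdd(const std::vector<Tok>& line) {
    return matchesKinds(line, {"ID", "REG", "COMMA", "REG", "COMMA", "REG"}) &&
           line[0].lexeme == "add";
}

// Walk the whole program (assumed: one inner vector per input line)
// and count the lines that are well-formed add instructions.
std::size_t countAdds(const std::vector<std::vector<Tok>>& program) {
    std::size_t count = 0;
    for (const auto& line : program) {
        if (isAdd(line)) ++count;
    }
    return count;
}
```

A real assembler would have one such check per instruction and would report an error for any line that matches none of them.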