Runtime Organization
Introduction
- We have covered the front-end phases: lexing, parsing, semantic analysis
- Now back-end: optimization, code generation
- Before that: we need to talk about runtime organization --- what we need to generate. Later we will talk about how to generate it (code generation)
What lives where?
What:
- code
- variables: local and global
- malloc'd memory
Where?
- Memory address space (how to manage, what is the layout)
- registers
- disk
Who manages what?
Who
- code generated by compiler
- OS
- runtime libraries
- hardware
Execution of a program is initially under the control of the operating system
- The OS allocates space for a program
- Code is loaded into part of the space
- OS jumps to the entry point (e.g., "main")
- Other space is used for data. Some of the data space is initialized by the OS, rest is managed by compiler-generated code and/or libraries.
- The data space can be dynamically (de)allocated using calls to the OS.
Compiler is responsible for (a) generating code, and (b) orchestrating the use of the data area.
Draw a picture of memory with "code" and "other space".
- Will draw memory as a rectangle with low address at bottom and high address at top
- Lines delimiting different areas of the memory
- Simplification: assume contiguous memory (not strictly necessary)
Programming model
We will assume the following programming model:
- Sequential execution
- Procedure call/return semantics
For now, we are ignoring the following programming constructs:
- Exceptions (throw/catch)
- call/cc : call with current continuation
These additional programming constructs will depend on the ideas we will discuss.
Activations
Activation Records
The active activations can be maintained in a stack, also called the call stack. The information for each activation is stored in an activation record, also called a frame.
What information to keep in an activation?
- The information needed to manage one procedure activation is called an activation record (AR) or frame
- If procedure F calls procedure G, then G's activation record contains a mix of info about F and G
- F is "suspended" until G completes, at which point F resumes
- G's AR contains information needed to
- Complete execution of G
- Resume execution of F
- AR: (1) result, (2) arguments, (3) control-link: pointer to the caller's activation, (4) return address
- Executing this program by hand, show the AR for each function (except main). There can be two return addresses for f, and show them as "*" and "**" respectively. These represent addresses of the instructions in memory. Show the ARs as abstract structures with pointers to each other --- do not show them as a part of the stack yet.
- This stack is not as abstract as a general stack. This stack has an array-like layout, and hence compiler writers will often play tricks to exploit this fact that activation records are adjacent in memory.
- Because the lifetime of a local variable is contained within the lifetime of the activation that created it, a local variable can also be stored in an AR. Further, an AR may shrink and grow dynamically as the activation executes, e.g., to (de)allocate local variables. The AR format can be specialized for variadic procedures.
- This is only one of many possible AR designs. e.g., many compilers don't use a control link, they rely on the fact that the stack is laid out linearly as an array. Similarly, the return address may be in a register (and not in the stack). The compiler determines the AR. Different compilers may employ different ARs. Some ARs are more efficient than others. Some ARs are easier to generate code for.
- The advantage of placing the result in the first word of the AR is that the caller can find the result at a fixed offset from its own activation (size of its AR + 1).
- Can divide responsibility between caller/callee differently
- Real compilers make careful tradeoffs on which part of the activation frame should be in registers and which part in memory
- The compiler must determine, at compile-time, the layout of activation records and generate code that correctly accesses locations in the activation record. Thus AR layout and the code generator must be designed together!
Globals and Heap
- A global variable's lifetime transcends all procedure lifetimes. It has a global lifetime
- Can't store a global in an activation record
- Globals are assigned a fixed address once
- Variables with fixed address are "statically allocated"
- Depending on the language, there may be other statically allocated values
- Memory layout: Code, static data, stack (grows downwards)
- Mention the possibility of self-referential and self-modifying code, with possibly the example of variable shifting through constant-shifting instruction opcodes (using code modification at runtime).
Heap objects
- A value that outlives the procedure that creates it cannot be kept in the AR
- e.g., new A should survive deallocation of the current procedure's AR.
Usually
- The code area contains object code: fixed size and read only
- Static area contains data (not code) with fixed addresses: fixed size, may be readable or writeable
- Stack contains an AR for each currently active procedure. Each AR usually fixed size, contains locals
- Heap contains all other data: In C, heap is managed through malloc and free
Both heap and stack grow. Must take care they don't grow into each other. Simple solution: put them at opposite ends of memory and let them grow towards each other. If the two regions start touching each other, then the program is out of memory (or it may request more memory, etc.). This design allows different programs to use these areas as they see fit, e.g., some programs may have large stacks, others may have large static data regions.
Alignment
Low-level but important detail that every compiler writer needs to be aware of.
- Most modern machines are 32 or 64 bit
- 8 bits in a byte
- 4 or 8 bytes in a word on 32- and 64-bit machines, respectively.
- Machines are either byte or word addressable
- Machines are optimized for word-sized accesses (as all pointers are word-sized).
- A word-sized access should ideally be word-aligned (to avoid straddling across cache lines for example)
- For this reason, an unaligned access either throws an exception or is much slower, e.g., up to 10x slower.
- Alignment restrictions may be opcode specific.
- Did you know: C requires an object of type int to be aligned by sizeof(int), which is typically four. The compiler may need to generate padding bytes on stack to meet this requirement for local variables.
- Example: a string "Hello" takes 5 characters (without a terminating '\0').
- Show an array of bytes with word-boundaries marked with darker lines
- To word-align the next word, add 3 "padding" characters after the string
- The padding is not part of the string, it's just unused memory
- Consider the following C declaration
struct foo {
int i1;
char s[5];
int i2; //the compiler will likely introduce three "pad" bytes between s and i2, to ensure that integer accesses are four-byte aligned
};
The expression sizeof(struct foo) would include the pad bytes.
- Here is an example translation of the declaration of a local variable x:
void foo(...) {
int x;
...
}
to the following x86 code:
foo:
and $0xfffffffc, %esp
sub $4, %esp
The mask through the and instruction implements the alignment restriction required in C. Some calling conventions require esp to be aligned at the start of every procedure call (to avoid some of these masking instructions).
Stack machines
Begin talking about code generation. Stack machines are the simplest model of code generation.
In a stack machine
- Only storage is a stack
- An instruction r = F(a1, ..., an):
- Pop the top n words off the stack
- Apply F
- Push the result on the stack
- Example: two instructions push i and add
- Location of the operands/result is not explicitly stated
- Always the top of the stack
- Invariant: after evaluating an expression e, the value of e is on the top of the stack
- In contrast to a register machine: add instead of add r1,r2,r3
- More compact programs
- One reason that Java bytecode uses stack evaluation model
- In early days, Java was meant for mobility and memory was scarce
- There is an intermediate point between a pure stack machine and a pure register machine
- An n-register stack machine: Conceptually, keep the top n locations of the stack in the registers
- A one-register stack machine: the one register is called an accumulator.
- Advantage of a one-register stack machine:
- In a pure stack machine
- An add does 3 memory operations (load two arguments and store one result)
- In a one-register stack machine
- The add does acc <-- acc + top_of_stack
- Just one memory read
- Consider an expression op(e1,...,en)
- Note e1,...,en are subexpressions
- For each ei with i < n, compute ei with the result in the accumulator; push the result on the stack
- For en, just evaluate into the accumulator, do not push on stack.
- Evaluate op using values on stack and en in accumulator to produce result in accumulator
- e.g., 7+5: push 7; acc <-- 5; acc <-- acc + [top]
- Invariant: after evaluating an expression e, the accumulator holds the value of e and the stack is unchanged
- Another example: 3 + (7 + 5)
- First evaluate 3
- Save 3
- Evaluate 7
- Save 7
- Evaluate 5
- Add
- Add
- Another important property: the expressions are evaluated left-to-right, and this evaluation order determines the order of the operands on the stack.
- Some code generation strategies, like this one, depend on the evaluation order
- Others may not, e.g., strategies that use named identifiers (e.g., registers) to store the result of each intermediate expression