Runtime Organization
Introduction
- We have covered the front-end phases: lexing, parsing, semantic analysis
- Now back-end: optimization, code generation
- Before that: we need to talk about runtime organization --- what we need to generate. Later we will talk about how to generate it (code generation)
What lives where?
What:
- code
- variables: local and global
- malloc'd memory
Where?
- Memory address space (how to manage, what is the layout)
- registers
- disk
Who manages what?
Who
- code generated by compiler
- OS
- runtime libraries
- hardware
Execution of a program is initially under the control of the operating system
- The OS allocates space for a program
- Code is loaded into part of the space
- OS jumps to the entry point (e.g., "main")
- Other space is used for data. Some of the data space is initialized by the OS, rest is managed by compiler-generated code and/or libraries.
- The data space can be dynamically (de)allocated using calls to the OS.
Compiler is responsible for (a) generating code, and (b) orchestrating the use of the data area.
Draw a picture of memory with "code" and "other space".
- Will draw memory as a rectangle with low address at bottom and high address at top
- Lines delimiting different areas of the memory
- Simplification: assume contiguous memory (not strictly necessary)
Programming model
We will assume the following programming model:
- Sequential execution
- Procedure call/return semantics
For now, we are ignoring the following programming constructs:
- Exceptions (throw/catch)
- call/cc : call with current continuation
These additional programming constructs will depend on the ideas we will discuss.
Activations
Activation Records
The active activations can be maintained in a stack, also called the call stack. The information for each activation is stored in an activation record, also called a frame.
What information to keep in an activation?
- The information needed to manage one procedure activation is called an activation record (AR) or frame
- If procedure F calls procedure G, then G's activation record contains a mix of info about F and G
- F is "suspended" until G completes, at which point F resumes
- G's AR contains information needed to
- Complete execution of G
- Resume execution of F
- AR: (1) result, (2) arguments, (3) control-link: pointer to the caller's activation, (4) return address
- Executing this program by hand, show the AR for each function (except main). There can be two return addresses for f, and show them as "*" and "**" respectively. These represent addresses of the instructions in memory. Show the ARs as abstract structures with pointers to each other --- do not show them as a part of the stack yet.
- This stack is not as abstract as a general stack. This stack has an array-like layout, and hence compiler writers will often play tricks to exploit this fact that activation records are adjacent in memory.
- Because the lifetime of a local variable is contained within the lifetime of the activation that created it, a local variable can also be stored in an AR. Further, an AR may shrink and grow dynamically as the activation executes, e.g., to (de)allocate local variables. The AR format can be specialized for variadic procedures.
- This is only one of many possible AR designs. e.g., many compilers don't use a control link, they rely on the fact that the stack is laid out linearly as an array. Similarly, the return address may be in a register (and not in the stack). The compiler determines the AR. Different compilers may employ different ARs. Some ARs are more efficient than others. Some ARs are easier to generate code for.
- The advantage of placing the result in the first word of the AR is that the caller can find the result at a fixed offset from its own activation (size of its AR + 1).
- Can divide responsibility between caller/callee differently
- Real compilers make careful tradeoffs on which part of the activation frame should be in registers and which part in memory
- The compiler must determine, at compile-time, the layout of activation records and generate code that correctly accesses locations in the activation record. Thus AR layout and the code generator must be designed together!
Globals and Heap
- A global variable's lifetime transcends all procedure lifetimes. It has a global lifetime
- Can't store a global in an activation record
- Globals are assigned a fixed address once
- Variables with fixed address are "statically allocated"
- Depending on the language, there may be other statically allocated values
- Memory layout: Code, static data, stack (grows downwards)
- Mention the possibility of self-referential and self-modifying code, with possibly the example of variable shifting through constant-shifting instruction opcodes (using code modification at runtime).
Heap objects
- A value that outlives the procedure that creates it cannot be kept in the AR
- e.g., new A should survive deallocation of the current procedure's AR.
Usually
- The code area contains object code: fixed size and read only
- Static area contains data (not code) with fixed addresses: fixed size, may be readable or writeable
- Stack contains an AR for each currently active procedure. Each AR usually fixed size, contains locals
- Heap contains all other data: In C, heap is managed through malloc and free
Both heap and stack grow. Must take care they don't grow into each other. Simple solution: put them at opposite ends of memory and let them grow towards each other. If the two regions start touching each other, then the program is out of memory (or it may request more memory, etc.). This design allows different programs to use these areas as they see fit, e.g., some programs may have large stacks, others may have large static data regions.
Alignment
Low-level but important detail that every compiler writer needs to be aware of.
- Most modern machines are 32 or 64 bit
- 8 bits in a byte
- 4 or 8 bytes in a word on 32- and 64-bit machines, respectively.
- Machines are either byte or word addressable
- Machines are optimized for word-sized accesses (as all pointers are word-sized).
- A word-sized access should ideally be word-aligned (to avoid straddling across cache lines for example)
- For this reason, an unaligned access either throws an exception or is much slower, e.g., up to 10x slower.
- Alignment restrictions may be opcode specific.
- Did you know: C requires an object of type int to be aligned by sizeof(int), which is typically four. The compiler may need to generate padding bytes on stack to meet this requirement for local variables.
- Example: a string "Hello" takes 5 characters (without a terminating '\0').
- Show an array of bytes with word-boundaries marked with darker lines
- To word-align the next word, add 3 "padding" characters after the string
- The padding is not part of the string, it's just unused memory
- Consider the following C declaration
struct foo {
int i1;
char s[5];
int i2; //the compiler will likely introduce three "pad" bytes between s and i2, to ensure that integer accesses are four-byte aligned
};
The expression sizeof(struct foo) would include the pad bytes.
- Here is an example translation of the declaration of a local variable x:
void foo(...) {
int x;
...
}
to the following x86 code:
foo:
and $0xfffffffc, %esp
sub $4, %esp
The mask through the and instruction implements the alignment restriction required in C. Some calling conventions require esp to be aligned at the start of every procedure call (to avoid some of these masking instructions).
Stack machines
Begin talking about code generation. Stack machines are the simplest model of code generation.
In a stack machine
- Only storage is a stack
- An instruction r = F(a1, ..., an):
- Pop the top n words off the stack
- Apply F
- Push the result on the stack
- Example: two instructions push i and add
- Location of the operands/result is not explicitly stated
- Always the top of the stack
- Invariant: after evaluating an expression e, the value of e is on the top of the stack
- In contrast to a register machine: add instead of add r1,r2,r3
- More compact programs
- One reason that Java bytecode uses stack evaluation model
- In early days, Java was meant for mobility and memory was scarce
- There is an intermediate point between a pure stack machine and a pure register machine
- An n-register stack machine: Conceptually, keep the top n locations of the stack in the registers
- A one-register stack machine: the one register is called an accumulator.
- Advantage of a one-register stack machine:
- In a pure stack machine
- An add does 3 memory operations (load two arguments and store one result)
- In a one-register stack machine
- The add does acc <-- acc + top_of_stack
- Just one memory read
- Consider an expression op(e1,...,en)
- Note e1,...,en are subexpressions
- For each ei with i < n, compute ei with the result in the accumulator; push the result on the stack
- For en, just evaluate into the accumulator, do not push on stack.
- Evaluate op using values on stack and en in accumulator to produce result in accumulator
- e.g., 7+5: push 7; acc <-- 5; acc <-- acc + [top]
- Invariant: after evaluating an expression e, the accumulator holds the value of e and the stack is unchanged
- Another example: 3 + (7 + 5)
- First evaluate 3
- Save 3
- Evaluate 7
- Save 7
- Evaluate 5
- Add
- Add
- Another important property: the expressions are evaluated left-to-right, and this evaluation order determines the order of the operands on the stack.
- Some code generation strategies, like this one, depend on the evaluation order
- Others may not, e.g., strategies that use named identifiers (e.g., registers) to store the result of each intermediate expression