Register allocation

Intermediate code uses unlimited temporaries:

Simplifies code generation and optimization
Complicates final translation to assembly

Typical IR code uses too many temporaries

The problem: Rewrite the intermediate code to use no more temporaries than there are machine registers

Potential Method:

Assign multiple temporaries to each register
But without changing the program behaviour

Many-to-one mapping

Consider the program

a := c + d
e : = a + b
f : = e - 1

Assume a and e dead after use. A dead temporary can be "reused".

Can allocate a, e, and f all to one register (r1). Assume that c, d and b to r2, r3, and r4 resp.

r1 := r2 + r3
r1 := r1 + r4
r1 := r1 - 1

One approach: Temporaries t1 and t2 can share the same register if at any point in the program at most one of t1 or t2 is live. If t1 and t2 is live at the same time, they cannot share a register.

Compute live variables at each point

BB1:
           <-- b,e,f
 a := b+e
           <-- a,c,f
 d := -a
           <-- c,d,f
 e := d + f
           <-- c,d,f,e

BB1->BB2
BB1->BB3

BB2:
           <-- c,e
 f := 2*e
           <-- c,f,e

BB3:
           <-- c,d,e,f
 b := d + e
           <-- b,c,e,f
 e := e - 1
           <-- b,c,f,e

BB2->BB4
BB3->BB4
BB3->BB5 (b live)

BB4:
          <-- c,f,e
 b := f + c
          <-- b,e
BB4->BB1
BB4->BB6 (b live)

Construct an undirected graph

A node for each temporary
An edge between t1 and t2 if they are live simultaneously at some point in the program.

This is the register interference graph (RIG): two temporaries can be allocated to the same register if there is no edge connecting them.

For our example

a -> f
a -> c
f -> b
f -> c
f -> d
f -> e
b -> e
b -> c
e -> c
e -> d
c -> d

E.g., b and c cannot be in the same register while b and d can be in the same register.

RIG extracts exactly the information needed to characterize legal register assignments, and gives a global picture (i.e., over the entire graph) of the register requirements.

Graph coloring: This register allocation problem is identical to the graph-coloring problem. A coloring of a graph is an assignment of colors to nodes, such that nodes connected by an edge have different colors. A graph is k-colorable if it has a coloring with k colors. For us, colors = registers.

Our example graph: There is no coloring with less than 4 colors. There are 4 colorings of the graph. Can assign temporaries to registers using one of the 4 colorings, and change the original program accordingly.

Spilling

What if the graph cannot be colored with the available number of registers? e.g., what if the number of available registers is 3 in our example.

Pick a node as a candidate for spilling. This temporary will "live" in memory. Remove the picked node and all its incident edges from the RIG.

Picking the right node is hard. We will just use heuristics to decide which one to pick. Let's say we pick a. But we are again stuck as the remaining graph is also not 3-colorable. Let's say we now pick f. After removing f, the remaining graph can be colored with three colors. But now we try to add back f. It is possible that after we add back f, the colors of all four neighbors of f allow that f can be colored too. THis is called optimistic coloring. In this example however, optimistic coloring will fail.

If optimistic coloring fails, we spill f:

Allocate a memory location for f
- Typically in the current stack frame.
- Call this address fa.
Before each operation that reads f, insert:
- f := load fa
After each operation that writes f, insert:
- store f, fa
Also rename each use of f to separate variables, say f1, f2, ...
Recompute liveness and retry

It is possible that even after spilling the RIG is not colorable, in which case we should spill another variable (that has not already been spilled). Eventually we should get a colorable graph, because we are making the RIG much sparser by by significantly reducing the live range of the variables. For example, the new variables f1, ... are only live at the instruction using them and the one preceding/succeeding them (for load/store resp.).

Limitations of this method of register allocation

The primary limitation of this algorithm is that it allocates registers statically. For example, if we have two regions of the program, one where t1 is used heavily and another where t1 is used sparingly, then it would make sense to register-allocate t1 in the first region and spill it in the second region. However the current algorithm makes an all-or-nothing choice.

Region-based register allocator: partition the program into regions, assign temporaries to registers or memory within each region, and resolve discrepancies at region boundaries. The overall quality of the solution depends heavily on the region partitioning. Some example heuristics:

A region for each loop region.
Regions based on register pressure.

Even after doing this, there are issues relating to some opcodes requiring only register operands; further other opcodes may require certain specific registers. To deal with these problems, compilers add extra passes (like the "reloading" pass in GCC's register allocator) to resolve these issues (by adding instructions to spill/copy registers). These problems arise due to this layered pass-by-pass nature of our analysis and transformation. For example, it is possible that a special opcode called mul_and_add operates only on registers, but our register allocator spills the corresponding operand to memory causing unnecessary memory accesses; these will now need to be cleaned up by the reloading pass. Similarly, it is possible that a certain opcode operates only on memory, and yet the register allocator (being unaware) register-allocated the operand.

An alternative technique is to do instruction-selection (opcode selection) simultaneously with register-allocation. More expensive in general but can use heuristics to prune the search space while still getting most of the benefits. Some previous work on binary translation demonstrates one such algorithm based on dynamic-programming --- the performance results in this work clearly bring out the weaknesses of the pass-by-pass register allocation approach in compilers like GCC.