Typical IR code uses too many temporaries
The problem: Rewrite the intermediate code to use no more temporaries than there are machine registers
Consider the program
a := c + d e : = a + b f : = e - 1Assume
edead after use. A dead temporary can be "reused".
f all to one
r1 := r2 + r3 r1 := r1 + r4 r1 := r1 - 1
One approach: Temporaries
t2 can share the same register if at any point in the program at most one of
t2 is live. If
t2 is live at the same time, they cannot share a register.
Compute live variables at each point
BB1: <-- b,c,f a := b+c <-- a,c,f d := -a <-- c,d,f e := d + f BB1->BB2 BB1->BB3 BB2: <-- c,e f := 2*e <-- c,f BB3: <-- c,d,e,f b := d + e <-- b,c,e,f e := e - 1 <-- b,c,f BB2->BB4 BB3->BB4 BB4->BB5 (b live) BB4: <-- c,f b := f + c BB4->BB1 BB4->BB6 (b live)
Construct an undirected graph
t2if they are live simultaneously at some point in the program.
For our example
a -> f a -> c f -> b f -> c f -> d f -> e b -> e b -> c e -> c e -> d c -> dE.g.,
ccannot be in the same register while
dcan be in the same register.
RIG extracts exactly the information needed to characterize legal register assignments, and gives a global picture (i.e., over the entire graph) of the register requirements.
Graph coloring: This register allocation problem is identical to the graph-coloring problem. A coloring of a graph is an assignment of colors to nodes, such that nodes connected by an edge have different colors. A graph is k-colorable if it has a coloring with
k colors. For us, colors = registers.
Our example graph: There is no coloring with less than 4 colors. THere are 4 colorings of the graph. Can assign temporaries to registers using one of the 4 colorings, and change the original program accordingly.
Pick a node as a candidate for spilling. This temporary will "live" in memory. Remove the picked node and all its incident edges from the RIG.
Picking the right node is hard. We will just use heuristics to decide which one to pick. Let's say we pick
a. But we are again stuck as the remaining graph is also not 3-colorable. Let's say we now pick
f. After removing
f, the remaining graph can be colored with three colors. But now we try to add back
f. It is possible that after we add back
f, the colors of all four neighbors of
f allow that
f can be colored too. THis is called optimistic coloring. In this example however, optimistic coloring will fail.
If optimistic coloring fails, we spill
f := load fa
store f, fa
fto separate variables, say
It is possible that even after spilling the RIG is not colorable, in which case we should spill another variable (that has not already been spilled).
Eventually we should get a colorable graph, because we are making the RIG much sparser by by significantly reducing the live range of the
variables. For example, the new variables
f1, ... are only live at the instruction using them and the one preceding/succeeding them (for load/store resp.).
t1is used heavily and another where
t1is used sparingly, then it would make sense to register-allocate
t1in the first region and spill it in the second region. However the current algorithm makes an all-or-nothing choice.
Region-based register allocator: partition the program into regions, assign temporaries to registers or memory within each region, and resolve discrepancies at region boundaries. The overall quality of the solution depends heavily on the region partitioning. Some example heuristics:
Even after doing this, there are issues relating to some opcodes requiring only register operands; further other opcodes may require certain specific registers. To deal with these problems, compilers add extra passes (like the "reloading" pass in GCC's register allocator) to resolve these issues (by adding instructions to spill/copy registers). These problems arise due to this layered pass-by-pass nature of our analysis and transformation.
An alternative technique is to do instruction-selection (opcode selection) simultaneously with register-allocation. More expensive in general but can use heuristics to prune the search space while still getting most of the benefits. Some previous work on binary translation demonstrates one such algorithm based on dynamic-programming --- the performance results in this work clearly bring out the weaknesses of the pass-by-pass register allocation approach in compilers like GCC.