Typical IR code uses too many temporaries
The problem: Rewrite the intermediate code to use no more temporaries than there are machine registers
Potential Method:
Consider the program
a := c + d e : = a + b f : = e - 1Assume
a
and e
dead after use. A dead temporary can be "reused".
Can allocate a
, e
, and f
all to one
register (r1
). c
, d
and b
to r2
, r3
, and r4
resp.
r1 := r2 + r3 r1 := r1 + r4 r1 := r1 - 1
One approach: Temporaries t1
and t2
can share the same register if at any point in the program at most one of t1
or t2
is live. If t1
and t2
is live at the same time, they cannot share a register.
Compute live variables at each point
BB1: <-- b,c,f a := b+c <-- a,c,f d := -a <-- c,d,f e := d + f BB1->BB2 BB1->BB3 BB2: <-- c,e f := 2*e <-- c,f BB3: <-- c,d,e,f b := d + e <-- b,c,e,f e := e - 1 <-- b,c,f BB2->BB4 BB3->BB4 BB4->BB5 (b live) BB4: <-- c,f b := f + c BB4->BB1 BB4->BB6 (b live)
Construct an undirected graph
t1
and t2
if they are live simultaneously at some point in the program.For our example
a -> f a -> c f -> b f -> c f -> d f -> e b -> e b -> c e -> c e -> d c -> dE.g.,
b
and c
cannot be in the same register while b
and d
can be in the same register.
RIG extracts exactly the information needed to characterize legal register assignments, and gives a global picture (i.e., over the entire graph) of the register requirements.
Graph coloring: This register allocation problem is identical to the graph-coloring problem. A coloring of a graph is an assignment of colors to nodes, such that nodes connected by an edge have different colors. A graph is k-colorable if it has a coloring with k
colors. For us, colors = registers.
Our example graph: There is no coloring with less than 4 colors. THere are 4 colorings of the graph. Can assign temporaries to registers using one of the 4 colorings, and change the original program accordingly.
Pick a node as a candidate for spilling. This temporary will "live" in memory. Remove the picked node and all its incident edges from the RIG.
Picking the right node is hard. We will just use heuristics to decide which one to pick. Let's say we pick a
. But we are again stuck as the remaining graph is also not 3-colorable. Let's say we now pick f
. After removing f
, the remaining graph can be colored with three colors. But now we try to add back f
. It is possible that after we add back f
, the colors of all four neighbors of f
allow that f
can be colored too. THis is called optimistic coloring. In this example however, optimistic coloring will fail.
If optimistic coloring fails, we spill f
:
f
fa
.f
, insert:
f := load fa
f
, insert:
store f, fa
f
to separate variables, say f1
, f2
, ...It is possible that even after spilling the RIG is not colorable, in which case we should spill another variable (that has not already been spilled).
Eventually we should get a colorable graph, because we are making the RIG much sparser by by significantly reducing the live range of the
variables. For example, the new variables f1
, ... are only live at the instruction using them and the one preceding/succeeding them (for load/store resp.).
Does it help if the IR is in SSA form? It does --- because different assignments to the same temporary now become different temporaries, the register allocator can now take different allocation decisions for these different temporaries.
t1
is used heavily and another where t1
is used sparingly, then it would make sense to register-allocate t1
in the first region and spill it in the second region. However the current algorithm makes an all-or-nothing choice.
Region-based register allocator: partition the program into regions, assign temporaries to registers or memory within each region, and resolve discrepancies at region boundaries. The overall quality of the solution depends heavily on the region partitioning. Some example heuristics:
Even after doing this, there are issues relating to some opcodes requiring only register operands; further other opcodes may require certain specific registers. To deal with these problems, compilers add extra passes (like the "reloading" pass in GCC's register allocator) to resolve these issues (by adding instructions to spill/copy registers). These problems arise due to this layered pass-by-pass nature of our analysis and transformation.
An alternative technique is to do instruction-selection (opcode selection) simultaneously with register-allocation. More expensive in general but can use heuristics to prune the search space while still getting most of the benefits. Some previous work on binary translation demonstrates one such algorithm based on dynamic-programming --- the performance results in this work clearly bring out the weaknesses of the pass-by-pass register allocation approach in compilers like GCC.