Optimization Overview

Most complexity in modern compilers is in the optimizer

Also by far, the largest phase in terms of compile-time and in terms of source code size (recall: lexing, parsing, semantic analysis, optimization, code generation)

When should we perform optimizations?

On AST?
- Pro: Machine independent
- Con: Too high level
  - Too abstract --- need to have more details to be able to express the "kind" of machine for which the AST needs to be compiled (e.g., register or stack machine or quantum computer)
On assembly language
- Pro: exposes optimization opportunities
- Con: may be too low level, making optimization difficult (need to undo/redo certain decisions)
- Con: machine dependent
- Con: must reimplement optimizations when retargetting
On IR
- Pro: can be machine independent, if designed well (can represent a large family of machines)
- Pro: can expose optimization opportunities, if designed well.
- Con: IR design critical for exposing optimization opportunities.

We will be looking at optimizations on an IR which has the following grammar:

P --> S P | S
S --> id := id op id
       | id := op id
       | id := id
       | push id
       | id := pop
       | if id relop id goto L
       | L:
       | jump L

id's are register names
constants can replace ids
typical operators: +, -, *

A basic bloack is a maximal sequence of instructions with

no labels (except at the first instruction), and
no jumps (except in the last instruction)

Idea:

Cannot jump into a basic block (except at beginning)
Cannot jump out of a basic block (except at end)
A basic block is a single-entry, single-exit, straight-line code segment

Once we reach the start of a basic block, we are guaranteed to execute all instructions in the BB. Furthermore, the only way into the basic block is through the first statement.

Consider the basic block:

1. L:
2.   t := 2*x
3.   w := t + x
4.   if w > 0 goto L'

(3) executes only after (2)

We can change (3) to w := 3 * x
Can we eliminate (2) as well? Need to be sure that t is not used in other basic blocks.

A control-flow graph is a directed graph with

Basic blocks as nodes
An edge from block A to block B if the execution can pass from the last instruction in A to the first instruction in B
- e.g., the last instruction in A is jump Lb
- e.g., the last instruction in A is if id relop id then goto Lb
- e.g., execution can fall-through from block A to block B

Example control-flow graph:

BB1:
  x := 1
  i := 1

BB1-->BB2

BB2:
L:
  x := x * x
  i := i + 1
  if i < 10 goto L

BB2 --> BB2

BB2 --> BB3

The body of a method (or procedure) can be represented as a control-flow graph. There is one initial node (entry node). All "return" nodes are terminal.

Optimization seeks to improve a program's resource utilization

Execution time (most often)
Code size
Memory usage
Network messages sent, disk accesses, etc.
Power consumption

Optimization should not alter what the program computes

The answer must still be the same.

For languages like C, there are typically three granularities of optimization

Local optimizations
- Apply to a basic block in isolation
Global optimizations
- Apply to a control-flow graph (method body) in isolation
Inter-procedural optimizations
- Apply across method boundaries

Production compilers do all these types of optimizations. In general, easies to implement local optimizations and hardest to implement inter-procedural optimizations.

In practice, often a conscious decision is made not to implement the fanciest optimization known. Why?

Some optimizations are hard to implement
Some optimizations are costly in compilation time
Some optimizations have low payoff. But hard to establish the payoff. Often one optimization may trigger another one, and this is hard to predict in advance.
Many fancy optimizations are all three!

Current state-of-the-art: the goal is "Maximum benefit for minimum cost"