A common framework for expressing algorithms that compute information about a program.
Control Flow Graph
Each statement becomes a node
Edges between nodes represent control flow
Dataflow Analysis
Worklist Algorithm
- Initialize all d to empty set…
- add nodes to worklist…
- while worklist is not empty
- remove a node n
- info_in will be map of sets of incoming information
- info_out will be map of sets of outgoing information computed through F
- for each outgoing edge
if info_out is different from what has been previously calculated, update with info_out and add all destinations of the outgoing edges…- let new_info be superset of previously calculated and info_out
- if new_info is different from what has been previously calculated, update with new_info and add all destinations of the outgoing edges…
Issue: (1) Ordering? Random order will take long due to redundancy, so use reverse depth-first post-order (2) Does this algorithm terminate? May not terminate oscillating in the original algorithm, so make supersets to make guarantees…
May vs Must

Forward vs Backward
Although constraints are not directional, flow functions are
Forward
– the constraints are of the form out = F(in)
Backward
– In some cases, the constraints are of the form in = F(out)
– e.g. live variables
Can formalize backward analysis in two ways
– Option 1: reverse flow graph, and then run forward problem
– Option 2: re-develop the theory, but in the backward direction
Precision
Pretty much any problem in compiler optimization is undecidable.
In order to find solutions for such problem, we approximate! Therefore, we may lose precision.
There are several factors that may cause imprecision:
– Unreacheable code
– Path sensitivity
. Branch correlations: some paths are infeasible
. Path merging: can lead to loss of precision
MOP: meet over all paths
We find all paths that lead to certain program point, do analysis on each path, and merge all information.
MOP is the “best” possible answer given a fixed set of flow functions
However, there are too many paths, so it is not computable.
MOP != dataflow
However, by imposing some restrictions on the flow functions, we could guarantee that dataflow is the same as MOP.
If F(a U b) == F(a) U F(b) (distributivity holds), then dataflow is the same as MOP.
Reaching Definitions
For each program point in the CFG, compute the set of definitions (statements) that may reach that point.
D = 2^A
Top = All, Bottom = Empty (May)
Join = U, Meet = ^
Constant Propagation
D = 2^A
Top = Empty, Bottom = All (MUST)
Join = ^, Meet = U
We could improve the lattice by reducing the infinite height lattice to simpler one
– D = {Top, Bottom} U Z
– Two possibilities
. Option 1: tuple of lattices
. Option 2: map from variables to single lattice
Common Subexpression Elimination
Want to compute when an expression is available in a var
S = {X–>E | X \in Var, E \in Expr}
D = 2^S
Top = Empty, Bottom = All (MUST)
Join = ^, Meet = U
Live Analysis
A variable is live at a program point if it will be used before being redefined
A variable is dead at a program point if it is redefined before being used
D = 2^{Vars}
Top = All, Bottom = Empty (MAY)
Join = U, Meet = ^