A common framework for expressing algorithms that compute information about a program.

#### Control Flow Graph

Each statement becomes a node

Edges between nodes represent control flow

#### Dataflow Analysis

__Worklist Algorithm__

- Initialize all d to empty set…
- add nodes to worklist…
- while worklist is not empty
- remove a node n
- info_in will be map of sets of incoming information
- info_out will be map of sets of outgoing information computed through F
- for each outgoing edge
~~if info_out is different from what has been previously calculated, update with info_out and add all destinations of the outgoing edges…~~- let new_info be superset of previously calculated and info_out
- if new_info is different from what has been previously calculated, update with new_info and add all destinations of the outgoing edges…

Issue: (1) Ordering? Random order will take long due to redundancy, so use reverse depth-first post-order (2) Does this algorithm terminate? May not terminate oscillating in the original algorithm, so make supersets to make guarantees…

__May vs Must__

__Forward vs Backward__

Although constraints are not directional, flow functions are

Forward

– the constraints are of the form out = F(in)

Backward

– In some cases, the constraints are of the form in = F(out)

– e.g. live variables

Can formalize backward analysis in two ways

– Option 1: reverse flow graph, and then run forward problem

– Option 2: re-develop the theory, but in the backward direction

__Precision__

Pretty much any problem in compiler optimization is undecidable.

In order to find solutions for such problem, we approximate! Therefore, we may lose precision.

There are several factors that may cause imprecision:

– Unreacheable code

– Path sensitivity

. Branch correlations: some paths are infeasible

. Path merging: can lead to loss of precision

#### MOP: meet over all paths

We find all paths that lead to certain program point, do analysis on each path, and merge all information.

MOP is the “best” possible answer given a fixed set of flow functions

However, there are too many paths, so it is not computable.

MOP != dataflow

However, by imposing some restrictions on the flow functions, we could guarantee that dataflow is the same as MOP.

If F(a U b) == F(a) U F(b) (distributivity holds), then dataflow is the same as MOP.

#### Reaching Definitions

For each program point in the CFG, compute the set of definitions (statements) that may reach that point.

D = 2^A

Top = All, Bottom = Empty (May)

Join = U, Meet = ^

#### Constant Propagation

D = 2^A

Top = Empty, Bottom = All (MUST)

Join = ^, Meet = U

We could improve the lattice by reducing the infinite height lattice to simpler one

– D = {Top, Bottom} U Z

– Two possibilities

. Option 1: tuple of lattices

. Option 2: map from variables to single lattice

#### Common Subexpression Elimination

Want to compute when an expression is available in a var

S = {X–>E | X \in Var, E \in Expr}

D = 2^S

Top = Empty, Bottom = All (MUST)

Join = ^, Meet = U

#### Live Analysis

A variable is live at a program point if it will be used before being redefined

A variable is dead at a program point if it is redefined before being used

D = 2^{Vars}

Top = All, Bottom = Empty (MAY)

Join = U, Meet = ^