[Compilers] Dataflow Analysis

A common framework for expressing algorithms that compute information about a program.

Control Flow Graph

Each statement becomes a node
Edges between nodes represent control flow

Dataflow Analysis

Worklist Algorithm

  • Initialize all d to empty set…
  • add nodes to worklist…
  • while worklist is not empty
    • remove a node n
    • info_in will be map of sets of incoming information
    • info_out will be map of sets of outgoing information computed through F
    • for each outgoing edge
      • if info_out is different from what has been previously calculated, update with info_out and add all destinations of the outgoing edges…
      • let new_info be superset of previously calculated and info_out
      • if new_info is different from what has been previously calculated, update with new_info and add all destinations of the outgoing edges…

Issue: (1) Ordering? Random order will take long due to redundancy, so use reverse depth-first post-order (2) Does this algorithm terminate? May not terminate oscillating in the original algorithm, so make supersets to make guarantees…

May vs Must

Forward vs Backward

Although constraints are not directional, flow functions are

Forward
– the constraints are of the form out = F(in)

Backward
– In some cases, the constraints are of the form in = F(out)
– e.g. live variables

Can formalize backward analysis in two ways
– Option 1: reverse flow graph, and then run forward problem
– Option 2: re-develop the theory, but in the backward direction

Precision

Pretty much any problem in compiler optimization is undecidable.
In order to find solutions for such problem, we approximate! Therefore, we may lose precision.

There are several factors that may cause imprecision:
– Unreacheable code
– Path sensitivity
. Branch correlations: some paths are infeasible
. Path merging: can lead to loss of precision

MOP: meet over all paths

We find all paths that lead to certain program point, do analysis on each path, and merge all information.

MOP is the “best” possible answer given a fixed set of flow functions
However, there are too many paths, so it is not computable.
MOP != dataflow

However, by imposing some restrictions on the flow functions, we could guarantee that dataflow is the same as MOP.
If F(a U b) == F(a) U F(b) (distributivity holds), then dataflow is the same as MOP.

Reaching Definitions

For each program point in the CFG, compute the set of definitions (statements) that may reach that point.

D = 2^A
Top = All, Bottom = Empty (May)
Join = U, Meet = ^

Constant Propagation

D = 2^A
Top = Empty, Bottom = All (MUST)
Join = ^, Meet = U

We could improve the lattice by reducing the infinite height lattice to simpler one
– D = {Top, Bottom} U Z
– Two possibilities
. Option 1: tuple of lattices
. Option 2: map from variables to single lattice

Common Subexpression Elimination

Want to compute when an expression is available in a var

S = {X–>E | X \in Var, E \in Expr}
D = 2^S
Top = Empty, Bottom = All (MUST)
Join = ^, Meet = U

Live Analysis

A variable is live at a program point if it will be used before being redefined
A variable is dead at a program point if it is redefined before being used

D = 2^{Vars}
Top = All, Bottom = Empty (MAY)
Join = U, Meet = ^

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.