Program Representations
Goals
– Make analysis easy and effective
– Make transformation easy to perform
– General across input languages and target machines
– Compact in memory
– Easy to translate and form
– Tracks information from source through to binary, from source-level debugging, profiling, typed binaries
– Extensible and displayable
Components
– Control dependencies: sequencing of operations; evaluation of if & then, side-effects of statements occur in right order (JTS comes before JMP in SRP)
– Data dependencies: fl0ow of definitions from defs to uses; operands computed before operations
– Only ones that are pertinent should be represented
High-level vs Low-level
High-level syntax based
Analysis can exploit high-level knowledge of constructs
Easy to map to source code (debugging, profiling)
e.g. Abstract Syntax Tree (AST)
Low-level syntax based
Can do low-level, machine specific reasoning
Can be language-independent
e.g. Assembly code, Virtual machine code, LLVM IR, three-address code, RTL
Components
Data dependencies
Def/Use Chains… MAYBE?
– Ignores control flow: misses some optimization opportunities since conservatively considers all paths, not executable by itself and is not appropriate for code motion transformations
– Must update after each transformation
– Space consuming
Static Single Assignment (SSA)… HELL YEAH!
– To SSA : Create a new variable for each def, and insert PHI pseudo-assignments at merge points
– From SSA: Insert assignments at the end of pertinent predecessors, which can be removed later if register allocator assigns them to the same registers
– Invariant since each use of a variable has only one def…
– Captures more chances for optimization…
– Pointers complicate things
. don’t use SSA for pointed to variables
. adapt SSA to account for pointers (LLVM)
. define src language so that variables cannot be pointed to (Java)
Control dependencies
A node (basic block) Y is control-dependent on another x if X determines whether Y executes
– X is not post-dominated by Y (there exists a path from X to Y such that every node in the path order than X and Y is post-dominated by Y)
Control Dependence Graph… EMM
– Y becomes descendent of X if Y is control dependent on X
– Cannot trash CFG after making control dependence graph, because data dependence information is not present in control dependence graph
Program Dependence Graph… EMM
– Super-impose dataflow graph (in SSA form or not) on top of the control dependence graph
Control Dependence Graph & Program Dependence Graph
– More flexible way of representing control dependencies than CFG (less constraining), even makes code motion a local transformation
– However, much harder to convert back to an executable form, so not used much
Examples
Common subexpression Elimination


Loop-invariant Code Motion (LICM)
Step 1: detecting loop invariants



Step 2: move them outside loop (hoisting / sinking)
Need to preserve relative order of invariant computations to preserve dataflow among move statements
– To move statement S to loop pre-header, S must dominate all loop exits
– Domination restrictions can be circumvented through loop normalization

In Static Single Assignment, such restriction is unnecessary… HELL YEAH!