Insert “No-op”s. This is the simplest method, but leads to underutilization
Hoist invariant expressions that come after branch to inside the if-else statement (on the premise that if-else statement and the hoisted expressions do not conflict with each other). The compiler inserts these expressions inside if-else statement (copied to both –> increased code size) so that instructions executed do not have to be flushed.
* The term “hoisting” seems to be more widely used in compilers to name some cases of Loop-invariant code motion (LICM)
Change control dependencies into data dependency (by using conditional instructions).
Static branch prediction
All Taken/Not Taken
T/NT for all branches.
We profile all the executed to choose between T and NT, but this is done at static time.
In LLVM there exists methods to help weight the branch to T or NT.
Dynamic branch prediction
Pattern History Table (PHT)
Table that uses a set of saturating counters (1-bit or 2-bit mostly) to record the past, and checks the table using PC as the hash key to get prediction when branch instruction hits instruction fetch stage.
* 2-bit adds transient/hysteresis states that removes some anomalies.
* different branches can contaminate (aliasing) the table
Branch History Table (BHT)
Saves local branch history, where the key would be the PC and the local branch history for each entry will be shifted in to be used as a key for PHT.
Global History Register (GHR)
This register saves all branch history. All branch history will be shifted in to be used as a key for PHTs or to be combined with PC in G-share.
Local history table here is a BHT of 1024 entries with 10 bits per entry. All prediction are PHT with 3 or 2 bits per entry. Path history is a GHR.
Combine PC and GHR using XOR to better capture both local and global behavior.
Add more non-linearity to prediction using a perceptron.
Constructive: when branch contaminations occur, reinforces the prediction of branches that share the same entry
Destructive: when branch contaminations occur, the branches disrupt the predictions of other branches occupying the same entry
Branch Target Buffer (BTB)
BTB is used to reduce the branch penalty by identifying a branch and its predicted target in the IF stage.
BTB is a cache where the full PC is compared to get the prediction for the branch target. It also has a bit for preliminary branch prediction (T or NT), because branch prediction could happen over multiple cycles in modern processors and this bit can be an initial guess before the real prediction.