[Computer Architecture] Branch Prediction

No Speculation

“Bubble” insertion

Insert “No-op”s. This is the simplest method, but leads to underutilization

Hoisting

Hoist invariant expressions that come after branch to inside the if-else statement (on the premise that if-else statement and the hoisted expressions do not conflict with each other). The compiler inserts these expressions inside if-else statement (copied to both –> increased code size) so that instructions executed do not have to be flushed.

* The term “hoisting” seems to be more widely used in compilers to name some cases of Loop-invariant code motion (LICM)

Predication

Change control dependencies into data dependency (by using conditional instructions).

Speculation

Static branch prediction

All Taken/Not Taken

T/NT for all branches.

Profile based

We profile all the executed to choose between T and NT, but this is done at static time.

In LLVM there exists methods to help weight the branch to T or NT.

Dynamic branch prediction

Pattern History Table (PHT)

Table that uses a set of saturating counters (1-bit or 2-bit mostly) to record the past, and checks the table using PC as the hash key to get prediction when branch instruction hits instruction fetch stage.

* 2-bit adds transient/hysteresis states that removes some anomalies.
* different branches can contaminate (aliasing) the table

Branch History Table (BHT)

Saves local branch history, where the key would be the PC and the local branch history for each entry will be shifted in to be used as a key for PHT.

Global History Register (GHR)

This register saves all branch history. All branch history will be shifted in to be used as a key for PHTs or to be combined with PC in G-share.

Tournament Predictor

Local history table here is a BHT of 1024 entries with 10 bits per entry. All prediction are PHT with 3 or 2 bits per entry. Path history is a GHR.

Gshare

Combine PC and GHR using XOR to better capture both local and global behavior.

Perceptron

Add more non-linearity to prediction using a perceptron.

https://www.cs.utexas.edu/~lin/papers/hpca01.pdf

Aliasing

Constructive: when branch contaminations occur, reinforces the prediction of branches that share the same entry
Destructive: when branch contaminations occur, the branches disrupt the predictions of other branches occupying the same entry

Branch Target Buffer (BTB)

BTB is used to reduce the branch penalty by identifying a branch and its predicted target in the IF stage.

BTB is a cache where the full PC is compared to get the prediction for the branch target. It also has a bit for preliminary branch prediction (T or NT), because branch prediction could happen over multiple cycles in modern processors and this bit can be an initial guess before the real prediction.

Reference

https://www.cs.cmu.edu/afs/cs/academic/class/15740-s17/www/lectures/L19-BranchPrediction.pdf
http://web.engr.oregonstate.edu/~benl/Projects/branch_pred/
https://courses.cs.washington.edu/courses/csep548/06au/lectures/branchPred.pdf
http://www-ee.eng.hawaii.edu/~tep/EE461/Notes/ILP/buffer.html

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.