[Computer Architecture] Instruction Level Parallelism

Parallelism

Thread: a program running on a piece of hardware, together with its execution context

Single-threaded program: we can exploit ILP on SISD hardware (e.g., a single core)
Multi-threaded program: we can exploit TLP on MIMD hardware (e.g., a CMP)
Many-threaded program: we can exploit DLP on SIMD hardware (e.g., a GPU)

What limits infinite simultaneous multithreading?
– Data hazard
– Control hazard
– Memory bandwidth
– Memory latency

Instruction Level Parallelism

Instruction Level Parallelism is the property of a program that certain instructions are independent of one another and can potentially be executed in parallel.
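
As a tiny illustration, treating plain Python assignments as stand-in instructions (an assumption purely for illustration, not output of any real pipeline):

    a = 2 + 3   # I1
    b = 4 * 5   # I2: uses no result of I1, so I1 and I2 could execute in parallel
    c = a + b   # I3: reads the results of I1 and I2 (true dependences), so it must wait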

Compiler optimization (SW)

Instruction scheduling: reorders instructions to exploit ILP within a basic block
Loop unrolling: exposes ILP across iterations by putting instructions from multiple iterations in the same basic block (see the sketch after this list)
Trace scheduling: exposes ILP across multiple basic blocks by scheduling along the most likely execution path (trace)
Software pipelining: interleaves multiple iterations of a loop to form a kernel that better utilizes hardware resources
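
A minimal sketch of loop unrolling, written in Python purely for illustration (a real compiler applies this transformation to intermediate or machine code, and the function names here are made up):

    def sum_rolled(a):
        # one add per iteration, plus branch/index overhead every iteration
        s = 0
        for i in range(len(a)):
            s += a[i]
        return s

    def sum_unrolled_by_4(a):
        # Unrolling by 4 places loads/adds from what used to be four separate
        # iterations into the same basic block, exposing ILP across iterations.
        s0 = s1 = s2 = s3 = 0
        n = len(a) - len(a) % 4
        for i in range(0, n, 4):
            s0 += a[i]        # the four accumulators have no dependences
            s1 += a[i + 1]    # on each other, so these adds are independent
            s2 += a[i + 2]
            s3 += a[i + 3]
        for i in range(n, len(a)):   # clean-up loop for leftover elements
            s0 += a[i]
        return s0 + s1 + s2 + s3

Splitting the accumulator is what removes the loop-carried dependence on a single sum; a compiler would only do this when the semantics (e.g., floating-point reassociation rules) allow it.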

Dynamic scheduling

Instructions issue in order; they execute and write back out of order.

Scoreboarding

Instruction storage is added to each functional execution unit, but control and data buffering are centralized in the scoreboard and the register file.
Both source registers are read together, only once both are available (execution is limited by WAR/WAW hazards).
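
A minimal sketch of the scoreboard's centralized bookkeeping, under a simplified model (the class and field names are my own, and the WAR check at write-back is omitted):

    class Scoreboard:
        def __init__(self, functional_units):
            self.fu_busy = {fu: False for fu in functional_units}
            self.pending_write = {}   # dest register -> functional unit producing it

        def can_issue(self, fu, dest):
            # stall on a structural hazard (busy FU) or a WAW hazard
            return not self.fu_busy[fu] and dest not in self.pending_write

        def issue(self, fu, dest):
            self.fu_busy[fu] = True
            self.pending_write[dest] = fu

        def can_read_operands(self, srcs):
            # both sources are read together, only once neither has a
            # pending write outstanding (RAW hazard check)
            return all(r not in self.pending_write for r in srcs)

        def write_back(self, fu, dest):
            # a real scoreboard also stalls here while an earlier instruction
            # still needs the old value of dest (WAR hazard); omitted for brevity
            self.fu_busy[fu] = False
            self.pending_write.pop(dest, None)

    sb = Scoreboard(["fp_add", "fp_mul"])
    sb.issue("fp_add", "F2")                    # F2 = ...
    print(sb.can_issue("fp_mul", "F2"))         # False: WAW hazard on F2
    print(sb.can_read_operands(["F2", "F4"]))   # False: RAW hazard on F2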

Tomasulo

  1. Issue (get instruction from the FP Instruction Queue): if a reservation station is free, issue the instruction and send the operands (renaming the registers).
  2. Execute (operate on the operands): when both operands are ready, execute; if not, watch the Common Data Bus (CDB) for the result.
  3. Write result (finish execution): write the result on the CDB to all waiting units and mark the reservation station available.

Instruction storage is added to each functional execution unit; control and buffers are distributed as reservation stations.
Each source register is read as soon as it is available (only RAW hazards matter).
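
A minimal sketch of how Tomasulo renames registers through reservation-station tags and broadcasts results on the CDB (the data structures and function names below are assumptions, not a full simulator):

    class ReservationStation:
        def __init__(self, tag):
            self.tag = tag
            self.busy = False
            self.vj = self.vk = None   # operand values, once they are known
            self.qj = self.qk = None   # tags of the stations still producing them

    def issue(rs, dest, src1, src2, regfile, producer):
        # Issue: read ready operands now, or record the producing tag (renaming);
        # later readers of dest are redirected to this station's tag.
        rs.busy = True
        for v_attr, q_attr, src in (("vj", "qj", src1), ("vk", "qk", src2)):
            if src in producer:                       # value not ready yet
                setattr(rs, q_attr, producer[src])    # wait for that tag on the CDB
            else:
                setattr(rs, v_attr, regfile[src])     # value available: read it now
        producer[dest] = rs.tag

    def ready(rs):
        # Execute only when both operands have arrived (only RAW delays us).
        return rs.busy and rs.qj is None and rs.qk is None

    def write_result(tag, value, stations, regfile, producer):
        # Write result: broadcast (tag, value) on the CDB to every waiting station.
        for rs in stations:
            if rs.qj == tag:
                rs.vj, rs.qj = value, None
            if rs.qk == tag:
                rs.vk, rs.qk = value, None
        # Update the register file only if no newer instruction renamed the register.
        for reg, t in list(producer.items()):
            if t == tag:
                regfile[reg] = value
                del producer[reg]

Because a later write to the same register simply overwrites its entry in the producer map, WAR and WAW hazards disappear at issue time; only true (RAW) dependences force a station to wait.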

Tomasulo + ROB

  1. Issue (get instruction from the FP Instruction Queue): if a reservation station is free, issue the instruction and send the operands (renaming the registers).
  2. Execute (operate on the operands): when both operands are ready, execute; if not, watch the Common Data Bus (CDB) for the result.
  3. Write result (finish execution): write the result on the CDB to all waiting units and mark the reservation station available.
  4. Commit (update the register with the reordered result): when the instruction reaches the head of the reorder buffer and its result is present, update the register with the result (or store to memory) and remove the instruction from the reorder buffer.

The ROB (in-order commit) enables speculative execution, provides precise exceptions in an out-of-order machine, and allows a larger window of instructions across branch boundaries.
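
A minimal sketch of in-order commit through a reorder buffer (the class and method names are assumptions):

    from collections import deque

    class ROBEntry:
        def __init__(self, dest):
            self.dest = dest       # architectural destination register
            self.value = None      # filled in at write-result
            self.done = False      # set when the result arrives on the CDB

    class ReorderBuffer:
        def __init__(self):
            self.entries = deque()   # kept in program (issue) order

        def allocate(self, dest):
            entry = ROBEntry(dest)
            self.entries.append(entry)
            return entry

        def write_result(self, entry, value):
            entry.value, entry.done = value, True   # may happen out of order

        def commit(self, regfile):
            # Only the head may commit, so the architectural register file is
            # updated strictly in program order even though execution finished
            # out of order; a mispredicted branch or an exception flushes the
            # entries behind it before they ever commit.
            while self.entries and self.entries[0].done:
                head = self.entries.popleft()
                regfile[head.dest] = head.value

    rob, regs = ReorderBuffer(), {}
    e1, e2 = rob.allocate("F2"), rob.allocate("F4")
    rob.write_result(e2, 7.0)    # the later instruction finishes first...
    rob.commit(regs)             # ...but nothing commits until e1 is also done
    print(regs)                  # {}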

Instruction Queue

It uses explicit register renaming. Registers are not read until the instruction dispatches (begins execution), and register renaming ensures there are no name (WAR/WAW) conflicts.
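
A minimal sketch of explicit renaming with a map table and a free list of physical registers (the register and structure names are assumptions, loosely in the style of physical-register-file machines):

    class RenameTable:
        def __init__(self, num_arch, num_phys):
            # initial 1:1 mapping of architectural to physical registers
            self.map = {f"r{i}": f"p{i}" for i in range(num_arch)}
            # physical registers not yet holding any architectural value
            self.free = [f"p{i}" for i in range(num_arch, num_phys)]

        def rename(self, dest, srcs):
            # Sources map to whichever physical register currently holds them;
            # the destination gets a fresh physical register, so WAR and WAW
            # name conflicts vanish and only true dependences remain.
            phys_srcs = [self.map[s] for s in srcs]
            phys_dest = self.free.pop(0)
            self.map[dest] = phys_dest
            return phys_dest, phys_srcs

    rt = RenameTable(num_arch=4, num_phys=8)
    print(rt.rename("r1", ["r2", "r3"]))   # ('p4', ['p2', 'p3'])
    print(rt.rename("r1", ["r1", "r2"]))   # ('p5', ['p4', 'p2']): the second write to r1 gets a new name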
