Parallelism
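Thread: a program in execution on a piece of hardware (a hardware context)
Single-threaded program: can exploit instruction-level parallelism (ILP) on SISD hardware (e.g., a single core)
Multithreaded program: can exploit thread-level parallelism (TLP) on MIMD hardware (e.g., a CMP, chip multiprocessor)
Many-threaded program: can exploit data-level parallelism (DLP) on SIMD hardware (e.g., a GPU)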
What limits infinite simultaneous multithreading?
– Data hazard
– Control hazard
– Memory bandwidth
– Memory latency
Instruction Level Parallelism
Instruction-level parallelism (ILP) is the property of a program that some of its instructions are independent of one another and can therefore potentially be executed in parallel.
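As a rough illustration of this definition, the sketch below estimates the ILP available in a short instruction sequence by dividing the instruction count by the length of the longest dependence chain. It is a minimal sketch under stated assumptions, not a model of any real processor: every instruction takes one step, only true (read-after-write) dependences are counted, and the register names and the helper names `critical_path_length` and `available_ilp` are invented for this example.

```python
from typing import Dict, List, Tuple

# Each instruction is (destination, sources). Register names are illustrative.
Instr = Tuple[str, Tuple[str, ...]]


def critical_path_length(program: List[Instr]) -> int:
    """Length of the longest RAW dependence chain, assuming unit latency."""
    ready_at: Dict[str, int] = {}  # register -> step at which its value exists
    longest = 0
    for dest, sources in program:
        # An instruction can start once all of its producers have finished.
        start = max((ready_at.get(s, 0) for s in sources), default=0)
        finish = start + 1
        ready_at[dest] = finish
        longest = max(longest, finish)
    return longest


def available_ilp(program: List[Instr]) -> float:
    """Average instructions per step on an idealized, unlimited-width machine."""
    return len(program) / critical_path_length(program)


program = [
    ("r1", ("a", "b")),    # r1 = a + b
    ("r2", ("c", "d")),    # r2 = c + d    (independent of r1)
    ("r3", ("r1", "r2")),  # r3 = r1 * r2  (needs both earlier results)
    ("r4", ("e", "f")),    # r4 = e - f    (independent of everything above)
]
print(critical_path_length(program))  # 2
print(available_ilp(program))         # 2.0
```

Four instructions with a two-step critical path give an ILP of 2; a fully serial chain would give 1.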
Compiler optimization (SW)
Instruction scheduling: reorders instructions within a basic block to expose more ILP
Loop unrolling: allows ILP across iterations by putting instructions from multiple iterations in the same basic block
Trace scheduling: allows ILP across multiple basic blocks by scheduling along a likely execution path (a trace) through them
Software pipelining: interleaves instructions from multiple loop iterations to form a kernel that makes better use of hardware resources
Example
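A small sketch of loop unrolling, written at source level for readability. A compiler would apply the transformation to machine instructions, and the Python interpreter itself gains nothing from it; the point is only to show the shape of the code. The function names and the unroll factor of 4 are choices made for this illustration. Integer data is assumed so that reassociating the additions cannot change the result.

```python
def sum_rolled(values):
    """Original loop: every add depends on the previous one (a serial chain)."""
    total = 0
    for v in values:
        total += v
    return total


def sum_unrolled_by_4(values):
    """Unrolled by 4 with separate accumulators, so the four adds in one
    iteration are independent and could be scheduled in parallel."""
    n = len(values)
    s0 = s1 = s2 = s3 = 0
    i = 0
    # Main body: four iterations of the original loop in one basic block.
    while i + 4 <= n:
        s0 += values[i]
        s1 += values[i + 1]
        s2 += values[i + 2]
        s3 += values[i + 3]
        i += 4
    # Cleanup loop for the leftover elements when n is not a multiple of 4.
    while i < n:
        s0 += values[i]
        i += 1
    return s0 + s1 + s2 + s3


data = list(range(10))
print(sum_rolled(data), sum_unrolled_by_4(data))  # 45 45
```

The unrolled body also runs the loop-control instructions (increment, compare, branch) once per four elements instead of once per element.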



Dynamic scheduling
Instructions issue in order, but execute and write back out of order.
Scoreboarding
Instruction storage is added to each functional unit, but control and buffering are centralized in the scoreboard and the register file.
Both source registers are read together, only once both are available, so WAR and WAW hazards still cause stalls.
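The three hazard types mentioned in these notes can be told apart by comparing the registers of two instructions in program order. The sketch below is only a classifier, not a scoreboard: the `Instr` type and the `hazards` function are hypothetical names, and the register names are illustrative.

```python
from typing import NamedTuple, Set, Tuple


class Instr(NamedTuple):
    dest: str                 # register written
    sources: Tuple[str, ...]  # registers read


def hazards(earlier: Instr, later: Instr) -> Set[str]:
    """Classify the dependences of `later` on `earlier` (program order)."""
    found = set()
    if earlier.dest in later.sources:
        found.add("RAW")  # later reads what earlier writes (true dependence)
    if later.dest in earlier.sources:
        found.add("WAR")  # later overwrites a register earlier still reads
    if later.dest == earlier.dest:
        found.add("WAW")  # both write the same register
    return found


mul = Instr("F0", ("F2", "F4"))    # F0 = F2 * F4
add = Instr("F6", ("F0", "F8"))    # F6 = F0 + F8
sub = Instr("F8", ("F10", "F14"))  # F8 = F10 - F14
ld = Instr("F6", ("F10", "F12"))   # F6 = F10 + F12

print(hazards(mul, add))  # {'RAW'}
print(hazards(add, sub))  # {'WAR'}
print(hazards(add, ld))   # {'WAW'}
```

A scoreboard has to stall on all three kinds; only RAW is a real data dependence, while WAR and WAW come from reusing register names.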
Tomasulo

- Issue (get an instruction from the FP instruction queue): If a reservation station is free, the instruction queue issues the instruction and sends the operands (renaming registers).
- Execution (operate on operands): When both operands are ready, execute; if they are not ready, watch the Common Data Bus (CDB) for the result.
- Write result (finish execution): Write the result on the CDB to all waiting units; mark the reservation station as available.
Instruction storage is added to each functional unit; control and buffers are distributed as reservation stations.
Each source operand is captured as soon as it becomes available, so only RAW hazards delay execution (register renaming removes WAR and WAW hazards).
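The three stages above can be sketched as a toy model. This is a heavy simplification under stated assumptions, not the actual hardware algorithm: there is no timing, the number of reservation stations is unlimited (real hardware stalls issue when none is free), and execute and write-result are merged into one step. The names `Station`, `Machine`, `issue` and `execute` are invented for this illustration.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

OPS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


@dataclass
class Station:
    tag: str                    # name of this reservation station
    op: str
    vj: Optional[float] = None  # operand values, once known
    vk: Optional[float] = None
    qj: Optional[str] = None    # tags of the stations that will produce them
    qk: Optional[str] = None

    def ready(self) -> bool:
        return self.qj is None and self.qk is None


class Machine:
    def __init__(self, regs: Dict[str, float]) -> None:
        self.regs = dict(regs)
        self.producer: Dict[str, str] = {}  # register -> tag that will write it
        self.stations: List[Station] = []
        self.issued = 0

    def _read(self, reg: str) -> Tuple[Optional[float], Optional[str]]:
        tag = self.producer.get(reg)
        return (self.regs[reg], None) if tag is None else (None, tag)

    def issue(self, op: str, dest: str, src1: str, src2: str) -> Station:
        self.issued += 1
        rs = Station(tag=f"RS{self.issued}", op=op)
        rs.vj, rs.qj = self._read(src1)  # copy the value now, or record a tag
        rs.vk, rs.qk = self._read(src2)
        self.producer[dest] = rs.tag     # renaming: dest now means this station
        self.stations.append(rs)
        return rs

    def execute(self, rs: Station) -> float:
        if not rs.ready():
            raise RuntimeError(f"{rs.tag} is still waiting on the CDB")
        result = OPS[rs.op](rs.vj, rs.vk)
        self.stations.remove(rs)         # station becomes available again
        for other in self.stations:      # CDB broadcast to waiting stations
            if other.qj == rs.tag:
                other.vj, other.qj = result, None
            if other.qk == rs.tag:
                other.vk, other.qk = result, None
        for reg in [r for r, t in self.producer.items() if t == rs.tag]:
            self.regs[reg] = result      # only the latest writer updates it
            del self.producer[reg]
        return result


m = Machine({"F0": 0.0, "F2": 2.0, "F4": 3.0, "F6": 0.0, "F8": 10.0})
i1 = m.issue("MUL", "F0", "F2", "F4")  # F0 = F2 * F4
i2 = m.issue("ADD", "F6", "F0", "F8")  # RAW on F0: waits for RS1's result
i3 = m.issue("SUB", "F8", "F2", "F4")  # WAR on F8 with i2
m.execute(i3)  # safe to run first: i2 already copied the old F8 at issue
m.execute(i1)  # broadcast wakes up i2
m.execute(i2)
print(m.regs["F6"], m.regs["F8"])  # 16.0 -1.0
```

The ADD still sees the old value of F8 (10.0) even though the SUB finished first, which is why the WAR hazard needs no stall here.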
Tomasulo + ROB

- Issue (get an instruction from the FP instruction queue): If a reservation station and a reorder buffer entry are free, the instruction queue issues the instruction and sends the operands (renaming registers).
- Execution (operate on operands): When both operands are ready, execute; if they are not ready, watch the Common Data Bus (CDB) for the result.
- Write result (finish execution): Write the result on the CDB to all waiting units and to the reorder buffer; mark the reservation station as available.
- Commit (update register with the reorder buffer result): When the instruction is at the head of the reorder buffer and its result is present, update the register with the result (or store to memory) and remove the instruction from the reorder buffer.
The ROB commits in order, which enables speculative execution, provides precise exceptions in an out-of-order machine, and allows a larger instruction window that spans branch boundaries.
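The commit rule can be sketched with a queue. This is a minimal sketch assuming register destinations only (no stores) and unlimited capacity; the class `ReorderBuffer` and its method names are hypothetical, chosen for this illustration.

```python
from collections import deque


class ReorderBuffer:
    def __init__(self):
        self.entries = deque()  # oldest instruction at the left (the head)
        self.regs = {}          # architectural register state

    def issue(self, dest):
        """Allocate an entry at the tail, in program order."""
        entry = {"dest": dest, "value": None, "done": False}
        self.entries.append(entry)
        return entry

    def write_result(self, entry, value):
        """Results may arrive in any order; they are only buffered here."""
        entry["value"] = value
        entry["done"] = True

    def commit(self):
        """Retire finished instructions from the head only, in order."""
        committed = []
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            self.regs[entry["dest"]] = entry["value"]
            committed.append(entry["dest"])
        return committed

    def squash_after(self, entry):
        """Drop every entry younger than `entry`, e.g. after a mispredicted
        branch. Nothing younger has touched `regs`, so no undo is needed."""
        while self.entries and self.entries[-1] is not entry:
            self.entries.pop()


rob = ReorderBuffer()
a, b, c = rob.issue("F0"), rob.issue("F6"), rob.issue("F8")
rob.write_result(c, 3.0)  # youngest finishes first
print(rob.commit())       # []  (head is not done yet)
rob.write_result(a, 1.0)
print(rob.commit())       # ['F0']
rob.write_result(b, 2.0)
print(rob.commit())       # ['F6', 'F8']
```

Because architectural state changes only at commit, an exception or misprediction can discard all younger entries and leave the registers exactly as in-order execution would have left them.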
Instruction Queue
This design uses explicit register renaming. Registers are not read until the instruction dispatches (begins execution), and register renaming ensures that there are no conflicts.
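Explicit renaming is commonly described in terms of a map table from architectural to physical registers plus a free list. The sketch below assumes that organization; it is not taken from these notes or from any specific processor. The class `Renamer`, the `P0`, `P1`, ... naming, and the register counts are made up for the example, and freeing of old physical registers at commit is left out.

```python
class Renamer:
    def __init__(self, arch_regs, num_physical):
        # Initially each architectural register owns one physical register.
        self.map = {r: f"P{i}" for i, r in enumerate(arch_regs)}
        self.free = [f"P{i}" for i in range(len(arch_regs), num_physical)]

    def rename(self, dest, sources):
        """Return (new physical dest, physical sources) for one instruction."""
        # Look up the sources first, so an instruction that reads and writes
        # the same register still reads the old mapping.
        phys_sources = tuple(self.map[s] for s in sources)
        # Every write gets a fresh physical register. This raises IndexError
        # when none is free; real hardware would stall instead.
        new_dest = self.free.pop(0)
        self.map[dest] = new_dest
        return new_dest, phys_sources


r = Renamer(["F0", "F2", "F4", "F6", "F8"], num_physical=8)
print(r.rename("F0", ("F2", "F4")))  # ('P5', ('P1', 'P2'))
print(r.rename("F6", ("F0", "F8")))  # ('P6', ('P5', 'P4'))
print(r.rename("F8", ("F2", "F4")))  # ('P7', ('P1', 'P2'))
```

After renaming, the second instruction reads P4 while the third writes P7, so the WAR conflict on F8 is gone and only the RAW dependence through P5 remains.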
References
http://courses.csail.mit.edu/6.888/spring13/lectures/L2-multicore.pdf