Processor Performance

* VAX-11: the reference design, a real machine used as the baseline against which other processors are compared
* The Y-axis is logarithmic, so the straight-line trend means performance improved exponentially until 2011. Since then, the curve has been flattening.
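To see why a straight line on a logarithmic axis means exponential growth, here is a small sketch assuming a hypothetical constant improvement of 50% per year (an illustrative rate, not a measured one). Each year adds the same constant, log10(1 + rate), to the logarithm of performance, so the points fall on a straight line.

```python
import math

rate = 0.5  # hypothetical: 50% performance improvement per year
perf = [(1 + rate) ** year for year in range(6)]  # performance relative to year 0
log_perf = [math.log10(p) for p in perf]

# On a log axis, exponential growth becomes a straight line:
# the year-to-year step is always log10(1 + rate).
steps = [b - a for a, b in zip(log_perf, log_perf[1:])]
print([round(s, 4) for s in steps])
```

A flattening curve on the same plot therefore means the yearly step in log10(performance), and hence the growth rate, is shrinking.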
Pareto optimality
Pareto optimal point: a state of resource allocation from which it is impossible to reallocate resources to make any one individual or preference criterion better off without making at least one other individual or criterion worse off.
Pareto frontier: the curve connecting all the Pareto optimal points in the design space. Design points beyond the frontier are unreachable, and pushing further along it to improve one metric typically costs increasingly more of another (diminishing returns).
* For power vs. area, area used to be the main cost, but because of the dark silicon phenomenon and continued increases in transistor density, power is now the primary concern. You would trade area for power efficiency!
* Designing a computer architecture, including deciding which technology to use, is a Pareto optimality analysis (a cost-benefit analysis of trade-offs).
No Free Lunch!
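The dominance test behind a Pareto frontier can be sketched in a few lines. This is a minimal brute-force illustration, assuming two metrics (area and power) where lower is better; the `Design` type, the helper names, and the candidate numbers are all hypothetical.

```python
from typing import NamedTuple


class Design(NamedTuple):
    name: str
    area_mm2: float  # lower is better
    power_w: float   # lower is better


def dominates(a: Design, b: Design) -> bool:
    """True if a is no worse than b in every metric and strictly better in at least one."""
    no_worse = a.area_mm2 <= b.area_mm2 and a.power_w <= b.power_w
    strictly_better = a.area_mm2 < b.area_mm2 or a.power_w < b.power_w
    return no_worse and strictly_better


def pareto_frontier(designs: list[Design]) -> list[Design]:
    """Return the designs that no other design dominates (O(n^2) brute force)."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]


candidates = [
    Design("A", 10.0, 5.0),
    Design("B", 12.0, 3.0),
    Design("C", 15.0, 2.5),
    Design("D", 14.0, 4.0),  # dominated by B: more area and more power
    Design("E", 11.0, 6.0),  # dominated by A: more area and more power
]
print([d.name for d in pareto_frontier(candidates)])
```

Here D and E are dominated, so the frontier is A, B, C: moving from A to B to C buys lower power only by spending more area. The O(n²) scan is fine for a handful of design points; for two metrics, sorting by one metric first would bring it down to O(n log n).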
What is performance?
- How much time it takes to run a benchmark.
- Instructions per unit time (MIPS, FLOPS) is only a proxy for performance, since the number of instructions a task needs depends on the ISA and the compiler.
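A minimal sketch of why MIPS is only a proxy, assuming made-up numbers for the same program compiled for two different ISAs. The `mips` helper is hypothetical and simply computes MIPS = IC / (time × 10^6).

```python
def mips(instruction_count: float, exec_time_s: float) -> float:
    """Millions of instructions executed per second."""
    return instruction_count / (exec_time_s * 1e6)


# Hypothetical: the same program compiled for two different ISAs.
x_ic, x_time = 10e9, 5.0  # machine X: simple ISA, many instructions
y_ic, y_time = 4e9, 4.0   # machine Y: complex ISA, fewer but heavier instructions

print(f"X: {mips(x_ic, x_time):.0f} MIPS in {x_time} s")
print(f"Y: {mips(y_ic, y_time):.0f} MIPS in {y_time} s")
```

Machine X reports twice the MIPS (2000 vs 1000), yet Y finishes the same program sooner (4 s vs 5 s) because it needs fewer instructions.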
Benchmark
- No single program can represent the workload everybody runs, so we use a suite of benchmarks.
- Early on, people used synthetic programs: randomly generated instruction sequences run on the processor. Real programs are not random, so synthetic programs do not capture the behavior of real programs.
- It is important to capture what programs have in common in order to design general-purpose CPUs.
- SPEC: real benchmarks drawn from real-world applications, e.g., gcc.
- FP: floating point, including C/C++ and Fortran programs
- INT: integer, including C/C++ programs
- Running these gives you a SPEC score (the SPECmark).
* Moore’s law is about the cost of performance, i.e., the economics of manufacturing processors. The price per transistor is no longer going down, power limits how much we can improve performance, and we keep getting transistors that we cannot all switch on at once.
Dataset
A benchmark alone is not enough: without a clearly specified input dataset, benchmark results have no value. For example, the time gcc takes depends on how many files you compile with it.
So time, benchmark, and dataset are the three elements that define performance.
Time
Wall-clock time: the elapsed time from start to end, including the time it takes to load the program.
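One way to see the difference between wall-clock time and CPU time in Python is `time.perf_counter()` (elapsed time) versus `time.process_time()` (CPU time consumed by the current process). The `workload` function below is a hypothetical stand-in: its `sleep` imitates waiting to load data, which wall-clock time counts but CPU time does not.

```python
import time


def workload() -> int:
    time.sleep(0.2)  # hypothetical stand-in for waiting on I/O, e.g. loading input
    return sum(i * i for i in range(1_000_000))  # CPU-bound part


wall_start = time.perf_counter()  # wall-clock (elapsed) timer
cpu_start = time.process_time()   # CPU time of this process only
workload()
wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start

print(f"wall-clock: {wall_elapsed:.3f} s, CPU: {cpu_elapsed:.3f} s")
```

The wall-clock figure exceeds the CPU figure by roughly the 0.2 s spent sleeping.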
How do we measure performance?
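There are many vague notions of performance that are not real metrics, e.g., core count, RAM size, and storage type.
Execution time = Instruction Count × CPI × Cycle Time, where each factor is set by a different layer of the design:
- Cycle time (CT) = 1/f
  - determined by circuit and microarchitecture
- Cycles per instruction (CPI) = 1/IPC
  - determined by microarchitecture and architecture
  - When we measure gains in IPC or CPI, we change neither the clock frequency nor the program.
- Instruction count (IC)
  - determined by the program (along with the compiler and ISA)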
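The three factors multiply: execution time = IC × CPI × CT, with CT = 1/f. A minimal sketch with hypothetical numbers, comparing a baseline against a hypothetical design that raises the clock frequency at the cost of a higher CPI (same program, so same IC):

```python
def cpu_time(instruction_count: float, cpi: float, freq_hz: float) -> float:
    """Execution time = IC x CPI x CT, where CT = 1 / f."""
    cycle_time_s = 1.0 / freq_hz
    return instruction_count * cpi * cycle_time_s


ic = 2e9  # hypothetical instruction count, identical for both designs
baseline = cpu_time(ic, cpi=1.5, freq_hz=2e9)  # 2 GHz, CPI 1.5
faster_clock = cpu_time(ic, cpi=2.0, freq_hz=3e9)  # 3 GHz, but worse CPI

print(f"baseline {baseline:.2f} s, faster clock {faster_clock:.2f} s, "
      f"speedup {baseline / faster_clock:.3f}x")
```

Here the faster clock still wins despite the worse CPI (about 1.33 s vs 1.5 s, a 1.125× speedup); at a CPI of 2.25 the two designs would tie.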
Amdahl’s Law
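The overall speedup from any enhancement is limited by the fraction of execution time the enhancement affects:
Speedup_overall = 1 / ((1 - F) + F / S)
where F is the fraction of the original execution time that can use the enhancement and S is the speedup of that fraction.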
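Amdahl's law is usually written as Speedup_overall = 1 / ((1 - F) + F / S), where F is the fraction of execution time that benefits from an enhancement and S is the speedup of that fraction. A small sketch with a hypothetical F = 0.8; `amdahl_speedup` is an illustrative helper name:

```python
def amdahl_speedup(fraction_enhanced: float, local_speedup: float) -> float:
    """Overall speedup when a fraction F of execution time is sped up by a factor S."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / local_speedup)


# Hypothetical enhancement: 80% of the run time becomes 10x faster.
print(f"{amdahl_speedup(0.8, 10):.2f}")    # 3.57

# Even an (almost) infinitely fast enhancement is capped by the untouched 20%.
print(f"{amdahl_speedup(0.8, 1e12):.2f}")  # 5.00
```

The second call shows the ceiling: however large S gets, the overall speedup cannot exceed 1 / (1 - F) = 5.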

Geometric mean
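Even the unweighted arithmetic mean of normalized execution times implies a weighting, so which machine looks faster can depend on the choice of baseline. The ratio of two machines' geometric means, by contrast, never changes, regardless of which machine is used as the baseline.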
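A small sketch of the baseline problem, using the standard library's `statistics.fmean` and `statistics.geometric_mean` (Python 3.8+) on hypothetical execution times for two programs on machines A and B. The `normalized` helper is an illustrative name.

```python
from statistics import fmean, geometric_mean

# Hypothetical execution times (seconds) for two programs on two machines.
times = {
    "A": [1.0, 1000.0],
    "B": [10.0, 100.0],
}


def normalized(machine: str, baseline: str) -> list[float]:
    """Per-program execution times of `machine` divided by those of `baseline`."""
    return [t / r for t, r in zip(times[machine], times[baseline])]


for baseline in ("A", "B"):
    b_norm, a_norm = normalized("B", baseline), normalized("A", baseline)
    am_ratio = fmean(b_norm) / fmean(a_norm)
    gm_ratio = geometric_mean(b_norm) / geometric_mean(a_norm)
    print(f"baseline {baseline}: AM ratio B/A = {am_ratio:.3f}, "
          f"GM ratio B/A = {gm_ratio:.3f}")
```

Normalized to A, the arithmetic mean makes B look about 5× slower (ratio about 5.05); normalized to B, it makes B look about 5× faster (ratio about 0.198). The geometric-mean ratio is 1.0 either way, because the baseline times cancel out of the ratio.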