[Computer Architecture] Processor Performance and Benchmarks

Processor Performance

* VAX-11: the reference design, a real machine used as the baseline against which real processors are compared
* The Y-axis is logarithmic, so a straight line means exponential improvement. That held until about 2011; since then the curve has been flattening

Pareto optimality

Pareto optimal point: a state of resource allocation from which it is impossible to reallocate so as to make any one individual or preference criterion better off without making at least one other individual or preference criterion worse off.
Pareto frontier: the curve formed by all the Pareto optimal points in the design space. Pushing further along it gives diminishing returns.

* For power vs. area: area used to be the main cost, but because of the dark silicon phenomenon and other advances in area, power is now the primary concern. You would trade area for power efficiency!
* Designing a computer architecture and deciding which technology to use comes down to Pareto optimality analysis (cost-benefit analysis, trade-offs)
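As a rough illustration of the definition above, here is a minimal sketch that filters a set of candidate designs down to the Pareto optimal ones. The design names and the (power, area) numbers are made up for the example, and it assumes lower is better on both axes.

```python
def pareto_frontier(designs):
    """Return names of designs that no other design dominates.

    Each design is (name, power, area); lower is better for both.
    A design is dominated if another one is at least as good on both
    criteria and strictly better on at least one.
    """
    frontier = []
    for name, power, area in designs:
        dominated = any(
            p <= power and a <= area and (p < power or a < area)
            for _, p, a in designs
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Hypothetical design points, not measurements of real chips.
designs = [
    ("A", 10.0, 4.0),
    ("B", 6.0, 6.0),
    ("C", 4.0, 9.0),
    ("D", 8.0, 7.0),  # worse than B on both power and area
]

print(pareto_frontier(designs))  # ['A', 'B', 'C']
```

Design D is dropped because B beats it on both criteria. A, B, and C each win on one axis and lose on the other, so choosing among them is a genuine trade-off.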

No Free Lunch!

What is performance?

  • How long it takes to run a benchmark.
  • Instructions per unit time (MIPS, FLOPS) is only a proxy for performance

Benchmark

  • A single program cannot represent the workloads everybody runs, so we use a suite of benchmarks.
  • Early on, people used synthetic programs: randomly generated instructions run on the processor. Real programs are not random, so synthetic programs do not capture their behavior.
  • It is important to capture what programs have in common in order to design general-purpose CPUs.
  • SPEC: real benchmarks from the real world, e.g. gcc
    • FP: floating point, including C/C++ and Fortran programs
    • INT: integer, including C/C++ programs
    • Running these gives you a SPECmark

* Moore’s law is about cost per unit of performance, i.e. the economics of manufacturing processors. The price per transistor is no longer going down, power limits further performance gains, and we end up with transistors that we cannot switch on.

Dataset

A benchmark alone is not enough. Without a clearly specified dataset, benchmark results have no value, e.g. the number of source files you compile with gcc changes the time it takes.

So time, benchmark, and dataset are the three elements that define performance.

Time

Wall clock time: the elapsed time from start to end, including the time it takes to load the program.
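A minimal sketch of measuring elapsed time around a piece of work, using Python's `time.perf_counter`. The helper name `wall_clock` and the toy workload are assumptions for the example. Note that timing inside the process like this does not include the program's load time; to capture that, the whole process would have to be timed from outside.

```python
import time


def wall_clock(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


def workload(n):
    # Toy stand-in for a benchmark: sum of the first n integers.
    return sum(range(n))


result, elapsed = wall_clock(workload, 1_000_000)
print(f"result={result}, elapsed={elapsed:.6f} s")
```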

How do we measure performance?

Many commonly quoted numbers are vague notions rather than real metrics, e.g. core count, RAM, storage type. Execution time is instead determined by three factors:

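  • Cycle Time (CT) = 1/f, where f is the clock frequency
    • determined by the circuit and the microarchitecture
  • Cycles Per Instruction (CPI) = 1/IPC
    • determined by the microarchitecture and the architecture
    • We change neither the clock frequency nor the program when we measure gains in IPC or CPI
  • Instruction Count (IC)
    • determined by the program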
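The three factors above multiply together to give execution time: time = IC × CPI × CT. The sketch below works through one example. The function names and the input numbers are assumptions chosen for round arithmetic, not measurements.

```python
def execution_time(instruction_count, cpi, clock_hz):
    """Execution time in seconds: IC * CPI * CT, with CT = 1 / f."""
    cycle_time = 1.0 / clock_hz
    return instruction_count * cpi * cycle_time


def mips(instruction_count, seconds):
    """Millions of instructions per second, the proxy metric."""
    return instruction_count / (seconds * 1e6)


# Hypothetical program: 2 billion instructions, CPI of 1.5, 2 GHz clock.
ic = 2_000_000_000
t = execution_time(ic, cpi=1.5, clock_hz=2e9)
print(f"time = {t:.3f} s, MIPS = {mips(ic, t):.1f}")

# Halving CPI (doubling IPC) with the same clock and program halves the time.
t_better = execution_time(ic, cpi=0.75, clock_hz=2e9)
print(f"time with CPI 0.75 = {t_better:.3f} s")
```

Keeping the clock and the program fixed while changing only CPI mirrors how IPC gains are measured: any change in time is then attributable to the microarchitecture alone.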

Amdahl’s Law

It gives the overall speedup you get from any enhancement, which is limited by the fraction of execution time the enhancement applies to.
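In its usual form, Amdahl's Law reads: overall speedup = 1 / ((1 - f) + f / s), where f is the fraction of execution time that benefits from the enhancement and s is the speedup of that fraction. The symbols f and s and the numbers below are assumptions for this sketch.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when a fraction of execution time is sped up."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    remaining = 1.0 - fraction_enhanced
    return 1.0 / (remaining + fraction_enhanced / speedup_enhanced)


# 80% of the time is made 4x faster: overall speedup is only 2.5x.
print(amdahl_speedup(0.8, 4.0))

# Even with an enormous speedup of that 80%, the untouched 20% caps
# the overall gain just below 1 / 0.2 = 5x.
print(amdahl_speedup(0.8, 1e9))
```

The second call shows the No Free Lunch flavor of the law: the part you do not improve sets a hard ceiling on the overall gain.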

Geometric mean

Even the unweighted arithmetic mean implies a weighting, whereas ratios of geometric means never change (regardless of which machine is used as the baseline).
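A small sketch of why this matters. The two machines, the two programs, and their execution times are invented for the example. With the arithmetic mean of normalized times, each machine looks slower than the other depending on the baseline, while the ratio of geometric means stays the same.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical execution times in seconds for two programs per machine.
times = {"A": [2.0, 1000.0], "B": [10.0, 100.0]}


def normalized(machine, baseline):
    """Execution times of `machine` divided by those of `baseline`."""
    return [t / b for t, b in zip(times[machine], times[baseline])]


def geomean_ratio(baseline):
    """Geometric mean of A over geometric mean of B, both normalized."""
    return (geometric_mean(normalized("A", baseline))
            / geometric_mean(normalized("B", baseline)))


# Arithmetic mean: the conclusion flips with the baseline.
print(arithmetic_mean(normalized("B", "A")))  # above 1: B looks slower than A
print(arithmetic_mean(normalized("A", "B")))  # above 1: A looks slower than B

# Geometric mean: the ratio is the same whichever machine is the baseline.
print(geomean_ratio("A"), geomean_ratio("B"))
```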
