4th September 2018 at 6:34am

Evolution of the General Purpose CPU:

Overview

  • Unpipelined
  • In-Order Pipeline
    • Classic RISC Pipeline
      • Address Stage
      • Pentium Pipeline
      • MIPS 8-stage Pipeline
      • UltraSPARC-III Pipeline
    • Superscalar
  • Out-Of-Order

In-Order Extensions/Improvements

Out-Of-Order Optimizations

  • Branch Target Buffer
  • Return Stack
  • Aliasing
  • Tomasulo's Algorithm
  • Scoreboarding
  • Several ISAs aimed at OOE cores (e.g. Alpha, RISC-V) don't have a flags register. Why?
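
A rough sketch of the Return Stack item above, as a software model of a return-address predictor: push the return address on a call, pop it as the predicted target on a return. Everything here is an assumption for illustration (the names `ret_stack`, `ras_on_call`, `ras_on_return`, the depth of 4, and the overwrite-oldest overflow policy); real cores differ in depth and in how they recover after misspeculation.

```c
#include <stdint.h>

#define RAS_DEPTH 4u  /* assumption: hypothetical depth, real designs vary */

/* Hypothetical circular return-address stack. */
typedef struct {
    uint32_t slots[RAS_DEPTH];
    unsigned top;    /* index of the next free slot, wraps around */
    unsigned count;  /* number of valid entries, saturates at RAS_DEPTH */
} ret_stack;

void ras_init(ret_stack *s)
{
    s->top = 0;
    s->count = 0;
}

/* On a call: remember where the matching return should go.
 * When the stack is full, the oldest entry is silently overwritten. */
void ras_on_call(ret_stack *s, uint32_t return_addr)
{
    s->slots[s->top] = return_addr;
    s->top = (s->top + 1u) % RAS_DEPTH;
    if (s->count < RAS_DEPTH)
        s->count++;
}

/* On a return: pop the predicted target.
 * Returns 1 and writes *predicted if an entry is available, 0 if empty. */
int ras_on_return(ret_stack *s, uint32_t *predicted)
{
    if (s->count == 0)
        return 0;
    s->top = (s->top + RAS_DEPTH - 1u) % RAS_DEPTH;
    s->count--;
    *predicted = s->slots[s->top];
    return 1;
}
```

With a depth of 4, five nested calls lose the outermost return address, so the fifth return has no prediction. A BTB alone handles this case poorly because one return instruction has many possible targets.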

Spectre Attacks

  • In-order core attacks? Possible mitigations?
  • Explicit Address Space regions for accessing cache (L1, L2, etc).
    • Is this what is meant by making cache explicit?
      • "Architecture researchers: having caches be implicit was a decision; it doesn't have to be that way. Perhaps high-speed memories can be exposed in the address space with fixed timings at each level of the hierarchy. (1/4)"
    • Why is constant-time code required to be written like that? And even w/ explicit cache, won't entries need to be evicted still and thus leaked?
      • "With constant-time code we're basically working hard to nullify implicit caches. If things were explicit, we wouldn't have to read a whole table every time to get a single value etc. (2/4)"
  • DRAM Controller leaks?
    • "Ultimately some structures (e.g. DRAM controllers) are shared and misspeculation side-effects there might be measurable."
  • Constant Time Speculation
    • "If we have to make speculation appear to run in constant time, can it still gain perf?"
      • I don't see how...
  • I really hope the tweet I linked is wrong
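
To make the "read a whole table every time to get a single value" remark concrete, here is a minimal sketch of a constant-time table lookup. The function name `ct_lookup` and the byte-sized table are assumptions for illustration. The idea is to touch every entry and select the wanted one with a mask, so that the sequence of addresses accessed does not depend on the secret index. Whether the compiled code stays branch-free depends on the compiler and target, so treat this as a sketch and not as vetted crypto code.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns table[secret_idx] while reading all n entries.
 * Returns 0 if secret_idx is out of range. */
uint8_t ct_lookup(const uint8_t *table, size_t n, size_t secret_idx)
{
    const unsigned top_bit = (unsigned)(sizeof(size_t) * CHAR_BIT - 1);
    uint8_t result = 0;

    for (size_t i = 0; i < n; i++) {
        /* diff is zero only when i == secret_idx. */
        size_t diff = i ^ secret_idx;
        /* (diff | -diff) has its top bit set exactly when diff != 0,
         * so nonzero is 1 for a mismatch and 0 for a match. */
        size_t nonzero = (diff | ((size_t)0 - diff)) >> top_bit;
        /* mask is 0xFF for a match and 0x00 otherwise, with no branch. */
        uint8_t mask = (uint8_t)(nonzero - 1);
        result |= (uint8_t)(table[i] & mask);
    }
    return result;
}
```

Every call costs n loads regardless of the index, which is the overhead the quoted tweet attributes to working around implicit caches.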

General Design Decisions

  • Explicit PC?
    • Where does PC point?
  • Stack direction?
  • Branch/Load Delay Slots?
  • Flags Register?
  • Number of Registers?
  • Zero Register?
  • Caches?
    • Eviction Policy?
    • Write-through/Write-back?
    • Levels of Cache
      • Can locking the bus for atomics even work w/ write-back cache each w/ separate L1?
        • I don't think bus signals are propagated. Of course, it would be slow anyway.
  • Global Bus Lock Signal?
    • Can Global Lock, keeping cache caveat in mind, be an alternative to MESI, etc? Has it been deployed?
  • Does a fault restart the instruction or continue in the middle?
  • Explicit Stack Pointer?
  • One Reg, Two Reg, Three Reg Assembly?
  • User/Supervisor Modes? More modes?
  • Special Address Space Allocations?
    • MIPS kseg*
    • Cell SPEs
    • Cached/Uncached Regions
      • Possibly user-programmable?
    • Zero/Direct Page
  • Shadow Registers
    • Register Windows
  • Weak/Strong Memory Model?
  • TAS, CAS, LL/SC?
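
For the TAS, CAS, LL/SC question, a small sketch using C11 `<stdatomic.h>` shows how the primitives look from software. The names `tas_lock`, `tas_try_acquire`, and `cas_fetch_add` are made up for illustration. C has no direct LL/SC primitive; on LL/SC machines the weak compare-exchange is typically lowered to an LL/SC pair, which is why it is allowed to fail spuriously and is used in a retry loop.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* TAS: a spinlock built on test-and-set (hypothetical type name). */
typedef struct {
    atomic_flag held;
} tas_lock;

void tas_lock_init(tas_lock *l)
{
    atomic_flag_clear(&l->held);
}

/* test_and_set returns the previous value: false means we took the lock. */
bool tas_try_acquire(tas_lock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

void tas_acquire(tas_lock *l)
{
    while (!tas_try_acquire(l)) {
        /* spin until the holder releases */
    }
}

void tas_release(tas_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* CAS: lock-free add via a compare-and-swap retry loop.
 * Returns the value observed just before the successful update. */
int cas_fetch_add(atomic_int *p, int delta)
{
    int old = atomic_load_explicit(p, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(
               p, &old, old + delta,
               memory_order_acq_rel, memory_order_relaxed)) {
        /* on failure, old now holds the current value; just retry */
    }
    return old;
}
```

The TAS lock serializes everyone on one flag, while the CAS loop only retries when another thread changed the value in between. That difference is part of what the choice of primitive decides.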

MMU Design Decisions

  • Hardware/Software Walk?
  • Any regions hardcoded non-mapped?
    • Again, MIPS kseg*
  • Shared/non-shared VA?
  • Extra bits (permission) beyond mapping stored in TLB?
  • Pinnable TLB entries?
  • Hardcoded page-table format (even for software walk)?
  • Why is copy_from_user needed if the kernel (typically?) has user memory mapped into its address space?
  • Special instructions to interact with TLB?
  • Special instructions to move data between kernel and user space?
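
As a reference point for the hardware/software walk and page-table format questions, here is a sketch of what a software walk might do on a TLB miss. The format is entirely made up: 32-bit virtual addresses, 4 KiB pages, two levels of 1024 entries, and the flag bits `PTE_VALID`, `PTE_WRITE`, `PTE_USER`. It models the tables as ordinary C structures, not physical memory, and leaves out the TLB insert itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical PTE layout: frame address in bits 31..12, flags in the low bits. */
#define PTE_VALID 0x1u
#define PTE_WRITE 0x2u
#define PTE_USER  0x4u

typedef struct { uint32_t entries[1024]; } page_table;  /* leaf level */
typedef struct { page_table *tables[1024]; } page_dir;  /* top level  */

/* Walk the two-level tree for va.
 * Returns true and stores the PTE on success; false means page fault. */
bool walk(const page_dir *pd, uint32_t va, uint32_t *pte_out)
{
    uint32_t top = va >> 22;             /* bits 31..22 index the directory  */
    uint32_t mid = (va >> 12) & 0x3FFu;  /* bits 21..12 index the page table */

    const page_table *pt = pd->tables[top];
    if (pt == NULL)
        return false;

    uint32_t pte = pt->entries[mid];
    if (!(pte & PTE_VALID))
        return false;

    *pte_out = pte;
    return true;
}

/* Combine the frame address from the PTE with the page offset from va. */
uint32_t translate(uint32_t pte, uint32_t va)
{
    return (pte & ~0xFFFu) | (va & 0xFFFu);
}
```

With a software walk, this layout is purely an OS convention and the permission bits only matter once they are copied into the TLB entry. With a hardware walk, the same layout has to be fixed by the architecture.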

Benchmarking