Evolution of the General Purpose CPU:
Overview
- Unpipelined
- In-Order Pipeline
- Classic RISC Pipeline
- Address Stage
- Pentium Pipeline
- MIPS 8-stage Pipeline
- UltraSPARC-III Pipeline
- Superscalar
- Out-Of-Order
In-Order Extensions/Improvements
Out-Of-Order Optimizations
- Branch Target Buffer
- Return Stack
- Aliasing
- Tomasulo's Algorithm
- Scoreboarding
- Some ISAs (e.g. MIPS, Alpha, RISC-V) don't have a flags register. Does that help OOE cores? Why?
Spectre Attacks
- In-order core attacks? Possible mitigations?
- Explicit Address Space regions for accessing cache (L1, L2, etc).
- Is this what is meant by making cache explicit?
- "Architecture researchers: having caches be implicit was a decision; it doesn't have to be that way. Perhaps high-speed memories can be exposed in the address space with fixed timings at each level of the hierarchy. (1/4)"
- Why does constant-time code have to be written like that? And even with an explicit cache, won't entries still need to be evicted, and thus leak?
- "With constant-time code we're basically working hard to nullify implicit caches. If things were explicit, we wouldn't have to read a whole table every time to get a single value etc. (2/4)"
- DRAM Controller leaks?
- "Ultimately some structures (e.g. DRAM controllers) are shared and misspeculation side-effects there might be measurable."
- Constant Time Speculation
- "If we have to make speculation appear to run in constant time, can it still gain perf?"
- I really hope the tweet I linked is wrong
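The "read a whole table every time to get a single value" remark quoted above can be made concrete. What follows is only a minimal sketch of the usual constant-time lookup idiom, not code from the linked thread. The names ct_eq_mask and ct_lookup are hypothetical, and the sketch assumes the compiler does not turn the masking back into a branch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 0xFF if a == b, 0x00 otherwise, computed without a branch. */
static uint8_t ct_eq_mask(size_t a, size_t b) {
    size_t x = a ^ b;                                    /* zero iff a == b */
    /* (x | -x) has its top bit set iff x != 0 */
    size_t nonzero = (x | (0 - x)) >> (sizeof(size_t) * 8 - 1);
    return (uint8_t)(nonzero - 1);                       /* 0 -> 0xFF, 1 -> 0x00 */
}

/* Hypothetical helper: returns table[secret_index], but touches every entry,
 * so the sequence of addresses accessed does not depend on the secret index. */
static uint8_t ct_lookup(const uint8_t *table, size_t len, size_t secret_index) {
    uint8_t result = 0;
    for (size_t i = 0; i < len; i++) {
        result |= table[i] & ct_eq_mask(i, secret_index);
    }
    return result;
}
```

A plain table[secret_index] touches one cache line chosen by the secret. The loop above touches all of them, which is the cost the tweet is complaining about.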
General Design Decisions
- Explicit PC?
- Stack direction?
- Branch/Load Delay Slots?
- Flags Register?
- Number of Registers?
- Zero Register?
- Caches?
- Eviction Policy?
- Write-through/Write-back?
- Levels of Cache
- Can locking the bus for atomics even work with write-back caches when each core has its own L1?
- I don't think bus signals are propagated. Of course, it's slow anyway.
- Global Bus Lock Signal?
- Can a Global Lock, keeping the cache caveat in mind, be an alternative to MESI and similar protocols? Has it been deployed?
- Does a Fault Restart the Instruction or Continue in the Middle?
- Explicit Stack Pointer?
- One Reg, Two Reg, Three Reg Assembly?
- User/Supervisor Modes? More modes?
- Special Address Space Allocations?
- MIPS kseg*
- Cell SPEs
- Cached/Uncached Regions
- Possibly user-programmable?
- Zero/Direct Page
- Shadow Registers
- Weak/Strong Memory Model?
- TAS, CAS, LL/SC?
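To compare the primitives in the last bullet, here is a rough sketch using C11 <stdatomic.h> rather than any particular ISA. The names tas_lock, tas_lock_acquire, tas_lock_release and cas_fetch_add are hypothetical. C has no portable LL/SC; on LL/SC machines the weak compare-exchange is typically built from an LL/SC pair, which is why it is allowed to fail spuriously and is used in a retry loop.

```c
#include <stdatomic.h>

/* Test-and-set: a spinlock built on atomic_flag (hypothetical type name). */
typedef struct { atomic_flag held; } tas_lock;

static void tas_lock_acquire(tas_lock *l) {
    /* test_and_set returns the previous value; spin while it was already set */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        /* busy-wait */
    }
}

static void tas_lock_release(tas_lock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Compare-and-swap: retry loop that adds delta and returns the previous value. */
static int cas_fetch_add(atomic_int *p, int delta) {
    int old = atomic_load_explicit(p, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(
               p, &old, old + delta,
               memory_order_acq_rel, memory_order_relaxed)) {
        /* on failure, old has been reloaded with the current value; retry */
    }
    return old;
}
```

TAS only gives a one-bit lock, while CAS (or LL/SC) can update a whole word without a lock. That difference is the main thing to weigh when choosing which primitive the ISA exposes.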
MMU Design Decisions
- Hardware/Software Walk?
- Any regions hardcoded non-mapped?
- Shared/non-shared VA?
- Extra bits (permission) beyond mapping stored in TLB?
- Pinnable TLB entries?
- Hardcoded page-table format (even for software walk)?
- Why is copy_from_user needed if the kernel (typically?) maps user code into its address space?
- Special instructions to interact with TLB?
- Special instructions to move data between kernel and user space?
Benchmarking