Domain-Specific CPU Architectures

Most of the speedup in a domain-specific architecture (DSA) comes from parallelism, and specialization, the main source of efficiency, is what enables that parallelism. The underlying algorithms are volatile, and they are often reworked to trade hardware-friendly computation for reduced memory bandwidth. Accelerator design is really parallel programming, with the cost model acting as a forcing function: arithmetic is nearly free, while global memory access is expensive. Memory dominates the area and power of domain-specific processors. Specialized instructions capture much of the advantage of a DSA at a fraction of the development cost. DSAs are one of the few ways left to continue scaling performance and efficiency.
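
The cost-model point (cheap arithmetic, expensive global memory) can be made concrete with a toy calculation. The following is a minimal sketch, not a model of any real accelerator: the names CostModel and matmul_traffic are hypothetical, and the relative energy values (1 unit per arithmetic operation, 100 units per word of global memory traffic) are placeholder assumptions chosen only to illustrate the trend, not measured figures. It counts operations and global-memory words for an n x n matrix multiply with and without tiling, assuming each output tile streams the input panels it needs once.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    """Toy energy model. Both constants are illustrative placeholders."""

    op_energy: float = 1.0        # assumed cost of one arithmetic operation
    global_energy: float = 100.0  # assumed cost of one word of global memory traffic

    def energy(self, ops: int, global_words: int) -> float:
        # Total energy is arithmetic cost plus global memory traffic cost.
        return ops * self.op_energy + global_words * self.global_energy


def matmul_traffic(n: int, tile: int) -> tuple[int, int]:
    """Return (ops, global_words) for an n x n matmul blocked into tile x tile outputs."""
    if tile <= 0 or n % tile != 0:
        raise ValueError("tile must be positive and divide n")
    # One multiply and one add per term of each inner product.
    ops = 2 * n ** 3
    # Each output tile reads a tile x n panel of A and an n x tile panel of B,
    # then writes tile x tile results back to global memory.
    blocks = (n // tile) ** 2
    global_words = blocks * (2 * tile * n + tile * tile)
    return ops, global_words


if __name__ == "__main__":
    model = CostModel()
    for tile in (1, 4, 16):
        ops, words = matmul_traffic(64, tile)
        print(f"tile={tile:2d} ops={ops} global_words={words} "
              f"energy={model.energy(ops, words):.0f}")
```

Under these assumptions the arithmetic count is identical for every tile size, while global traffic falls roughly in proportion to the tile size. The total is therefore dominated by the memory term until on-chip reuse is high, which is the sense in which the cost model, rather than the arithmetic, drives the design.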