At its Next Horizon event today, AMD unveiled substantial new details about its 7nm server CPU, codenamed Rome. The new chip debuts at a significant time for AMD. Its first server CPU, Epyc, has won accolades and adoption, including a new announcement of support from Amazon. Epyc 2 is, in some ways, even more important. Businesses are intrinsically conservative and companies don’t tend to leap from CPU vendor to CPU vendor at the drop of a hat. Pushing AMD CPUs into more businesses means demonstrating a sustained roadmap and ability to deliver new product generations that continue to compete effectively. AMD’s Rome disclosures imply that their first 7nm chip will indeed deliver on these gains. (Apologies for potato photos — I am an acknowledged miserable photographer.)
According to AMD’s CTO, Mark Papermaster, second-generation Epyc CPUs will feature a number of significant improvements over the original design. Floating point throughput has been doubled, thanks to the move to 256-bit-wide AVX2 execution. Load/Store bandwidth has been doubled as well, and the CPU’s dispatch and retire bandwidth have both increased, as has the size of the micro-op cache.
These improvements should collectively boost Epyc and Ryzen performance substantially, though AMD did not state whether it would reduce the clock speed of 7nm Ryzen and Epyc CPUs when running AVX2 in the same way Intel does. Executing AVX2 on 128-bit units actually worked quite well for AMD in first-generation Ryzen and Epyc: server tests and comparisons showed that while Intel had a definite advantage in some FPU workloads, AMD was quite strong, or even performance-leading, in others.
As for PCIe 4.0 support, AMD will offer backwards compatibility with existing Naples platforms and future compatibility with AMD’s Milan platform, guaranteed. That means the CPU can use either PCIe 3.0 or 4.0 depending on the platform in question.
Infinity Fabric is also getting a major update, though some details weren’t disclosed. As some have predicted, Epyc 2 will be AMD’s first CPU to deploy chiplets built on 7nm, while the I/O block is built on 14nm. This isn’t necessarily a bad thing. As node shrinks have progressed, contact and interconnect resistance have become major limiting factors to improving overall performance. There’s not necessarily much intrinsic benefit to simply packing more wires and pads into smaller and smaller spaces, so AMD is splitting its I/O and its CPU chiplets into separate dies.
AMD’s current Infinity Fabric implementation is wired together as below (focus on the lighter-colored arrows within each CPU, not the cross-CPU linkages).
The new second-generation Infinity Fabric looks rather different:
It’s not clear what impact this will have on latency, but it shows how AMD will avoid what could have been a significant problem. With eight DDR4 channels and presumably doubled chip density (AMD alluded to this without giving any formal core counts for Epyc 2), AMD would’ve had just one DDR4 channel per eight CPU cores. That’s substantially lower than previous designs. This approach should avoid that problem by making full DDR4 bandwidth available to whatever set of cores need to access it.
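The cores-per-channel arithmetic above is easy to check. Here is a minimal sketch in Python; the function name is ours, and the 64-core figure for Epyc 2 is an assumption based on the maximum core count AMD mentioned, not a confirmed product configuration.

```python
def cores_per_channel(cores: int, channels: int) -> float:
    """Return how many CPU cores share one memory channel."""
    if channels <= 0:
        raise ValueError("channels must be positive")
    return cores / channels


# First-generation Epyc tops out at 32 cores with eight DDR4 channels.
naples = cores_per_channel(32, 8)

# Assumption: a 64-core Epyc 2 that keeps the same eight DDR4 channels.
rome = cores_per_channel(64, 8)

print(f"Naples: {naples:.0f} cores per DDR4 channel")
print(f"Rome (assumed): {rome:.0f} cores per DDR4 channel")
```

Doubling the cores while holding the channel count steady doubles the number of cores contending for each channel, which is why the way memory is attached to the cores matters so much in this design.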
There are still a number of specific details we don’t have, including information on how much Infinity Fabric power consumption has improved or how much bandwidth it provides in the new CPUs. I would caution readers against concluding that these load/store and FPU throughput improvements will have a huge impact on performance. The degree of uplift will depend on application specifics and where the bottlenecks were in the original Epyc design. Haswell, if you recall, promised a number of substantial low-level bandwidth and throughput gains, but the actual uplift in most software was far smaller.
Still, Epyc 2 looks like a potent chip based on what we’ve seen of it thus far. We don’t know exact clocks or core distribution, beyond a maximum 64-core CPU (a comment from stage seemed to imply that clock scaling on 7nm is small, but we haven’t been able to confirm that yet). But between the IPC gains and the expected core count increases, Epyc 2 should deliver significant uplift compared with its predecessor.
According to Lisa Su, Rome will offer a 2x performance improvement per socket and a 4x improvement in FPU performance per socket compared with previous-generation CPUs. That’s a huge claimed improvement and we expect it reflects best-case scenarios. Applications that don’t scale perfectly from 32 cores to 64 cores obviously won’t hit that target, but in the right circumstances, Epyc 2 should be a performance titan.
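To get a rough feel for how far short of 2x an imperfectly scaling application might land, here is a back-of-the-envelope sketch using Amdahl's law. This is our own simplification, not AMD's methodology: it ignores clock speed, IPC, and memory bandwidth changes, and the parallel fractions below are illustrative values, not measurements.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over a single core under Amdahl's law."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def generational_gain(parallel_fraction: float,
                      old_cores: int = 32,
                      new_cores: int = 64) -> float:
    """Relative gain from moving from old_cores to new_cores, all else equal."""
    return (amdahl_speedup(parallel_fraction, new_cores)
            / amdahl_speedup(parallel_fraction, old_cores))


# Illustrative parallel fractions (hypothetical workloads).
for p in (1.0, 0.99, 0.95, 0.90):
    print(f"parallel fraction {p:.2f}: {generational_gain(p):.2f}x from core count alone")
```

Only a perfectly parallel workload gets the full 2x from the core count by itself; anything with a serial component gets less, which is why the per-socket claims are best read as upper bounds.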
The one caveat to this? AMD gave absolutely no guidance on when the CPU might launch, beyond “2019.” No 1H, no 2H. This last piece of information, delivered at the tail end of the presentation, makes it much harder to judge the potential impact of the launch. If “1H” is typically read to mean “June,” then “2019” can be read to mean “December” under precisely the same theory. It seems unlikely that this would be true, but the lack of even a quarterly time frame sapped the energy from AMD’s announcement overall.