Will AMD Get Back Into Arm Server Chips? – The Next Platform

There was a bit of a kerfuffle this week when it looked like AMD was changing its position a little bit on whether or not it would get back into designing and selling server chips based on the Arm architecture.
The funny thing is that the world has changed around AMD, and a lot of it for the good, as its Epyc X86 server processors have attained around ten percent market share (by shipments), but its position on Arm servers has not really changed. AMD’s top brass has said all along that if customers wanted Arm chips, it would make them.
Well, maybe AMD’s position, internally, has changed just a little now that the server market has changed so much since the summer of 2015 when AMD stopped its “SkyBridge” X86-Arm shared socket effort, launched in 2014 with much fanfare. Prior to that announcement, AMD’s Opteron Arm server chips based on its homegrown “K12” core were expected in 2016 and were pushed out to 2017, and they were SoCs that combined Arm cores and GPUs, much as the “Seattle” Opteron A1100 was. Seattle fizzled because it was too underpowered to do anything useful in the datacenter. So AMD started shutting things down, killing off SeaMicro microservers and its interconnect, killing off Arm server chips, so it could just hunker down on the Epyc server CPU. Because, frankly, the Arm server chip demand never materialized.
Maybe it never will in the ways we thought it might. But as we pointed out as this crazy year began, hope springs eternal for Arm server chips.
For good reason, in this case. Amazon Web Services is making and using its own Graviton Arm server chips, and Ampere Computing is getting traction with its Altra line at Microsoft, Oracle, Tencent, and Baidu. That is five of the Super 8 that we know about, and those eight hyperscalers and cloud builders comprise around 50 percent of shipments and 40 percent of revenues in the server racket. And maybe AMD is giving some thought to that Neoverse roadmap from Arm Holdings, which looks pretty impressive if anyone will actually implement it. Nvidia is in the game with its “Grace” Arm CPU, aimed at HPC and AI workloads, and is actually trying to buy Arm Holdings for $40 billion, and the HiSilicon arm of Huawei Technologies is certainly designing and using its Kunpeng line of chips, too. And don’t forget Fujitsu with its homegrown A64FX supercomputer chip; Fujitsu helped create the Scalable Vector Extension (SVE) vector math for the Arm architecture, aimed at HPC and AI workloads. Also don’t forget the “Rhea” and “Chronos” Arm server chips being designed by SiPearl for European exascale supercomputers.
We were not able to attend the Deutsche Bank Technology Conference last week, where AMD Chief Financial Officer Devinder Kumar spoke, but we did catch the transcript that appeared on SeekingAlpha, and thanks go to our good buddy Paul Alcorn of Tom’s Hardware for spotting this and starting all the tongues wagging on Wall Street. And ours, too. It is an opportunity for us to re-examine the competitive landscape and speculate about what AMD might do.
Kumar was asked about the competitive landscape in the datacenter, with Intel trying to rejuvenate itself and at least one of the hyperscalers and cloud builders being vertically integrated and designing its own Arm server chips. (As we showed above, the competitive landscape is a hell of a lot more hilly than this question implies.) Specifically, he was asked to put on his CFO hat, look at the landscape, and ponder whether the datacenter market was big enough and growing fast enough that it didn’t matter if others came into the glass house. Kumar conceded right off the bat that AMD Chief Executive Officer Lisa Su and Chief Technology Officer Mark Papermaster were better people to answer that question, but he was a good sport all the same and gave an answer:
“I’ll tell you from my standpoint, when you look at compute solutions, whether it’s X86 or Arm or even other areas, that is an area for our focus and investment for us,” Kumar explained. “You read about the Tau VM announcement, that Google selected AMD’s Epyc product, and we feel good about that because in the end, we want to deliver high-performance compute solutions to our customers. And it’s really the solutions that are important. We know compute really well, even Arm as you referenced. We have a very good relationship with Arm, and we understand that some of our customers want to work with us with that particular product to deliver the solutions. We stand ready to go ahead and do that even though it’s not X86, although we believe X86 is our dominant strength in that area. But there are other areas where we are willing to partner with customers of ours to go ahead and deliver those compute solutions. Because that’s what we believe, from my standpoint at the CFO level: that it’s all about delivering the solutions, what the customers want from a compute standpoint.”
That doesn’t sound like much of an Arm server CPU roadmap to us, and seems even thinner than what Marvell is saying about its very interesting “Triton” ThunderX3 Arm server chip, which we profiled a little more than a year ago in anticipation of its launch and which Marvell first moved from a standard product to a custom one and then shut down in late 2020 with barely a whimper. I mean, if you want Marvell to grab the VHDL and spin those out for you, it will do it. So technically the ThunderX3 is not dead, but the server CPU chip team is gone and development on the ThunderX4 has stopped as far as we know. This is more of an Arm server chip than AMD has — but only by a little, if you want to be honest.

AMD has plenty of interesting options when it comes to Arm server chips. So let’s play out this thought experiment on a Friday afternoon in late summer when the clusters of the Concord, Niagara, and Reliance grapes in the garden are magical — just amazing, such wealth — and the creative juices are flowing here at The Next Platform.
First of all, as Cavium, which is now part of Marvell, proved so well and as AMD proved with its Seattle Arm chips, it is not all that hard to do a global replace and put Arm cores in the same place where an Octeon NPU core or an X86 CPU core is in a processor design. The uncore stuff can be recycled, and if you are an Arm licensee as AMD is, you can take some of the stuff in the guts of the core (branch predictors, caches and cache hierarchies, vector units, other accelerators, memory controllers, and peripheral controllers) and reuse these elements in an Arm server chip.
So if a hyperscaler (not likely a public cloud since no one has a lot of Arm server workloads outside of the hyperscalers) doesn’t want to do an Arm server chip design all by its lonesome, it could hire AMD to do it. Or Marvell, for that matter. But to what end? Picking up the phone and calling Ampere Computing is a lot cheaper than plunking down a few tens of millions of dollars for development. If this is about intense hardware-software co-design, we get that and believe in it. But for a general-purpose CPU, the cost/benefit analysis doesn’t really work out. Google’s Tau instances show this: they are based on a special AMD “Milan” Epyc SKU that can deliver better price/performance than the Graviton2 chip at AWS for certain workloads.
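To make the cost/benefit point a little more concrete, here is a minimal Python sketch of how one-time development (NRE) spending amortizes over chip volume when compared against buying a merchant part. The function names and every dollar figure are hypothetical assumptions for illustration, not numbers from AMD, Ampere Computing, or any hyperscaler, and the model ignores software, validation, and time-to-market costs.

```python
# Toy cost model: custom Arm server chip vs. a merchant part.
# All figures are hypothetical placeholders, not vendor or customer data.


def custom_chip_unit_cost(nre_dollars: float, units: int, unit_cost: float) -> float:
    """Per-chip cost once one-time development (NRE) is spread over `units` chips."""
    if units <= 0:
        raise ValueError("units must be positive")
    return nre_dollars / units + unit_cost


def breakeven_units(nre_dollars: float, unit_cost: float, merchant_price: float) -> float:
    """Volume at which the custom chip costs the same per unit as the merchant chip."""
    savings_per_unit = merchant_price - unit_cost
    if savings_per_unit <= 0:
        # The custom part is no cheaper to build, so the NRE is never recovered.
        return float("inf")
    return nre_dollars / savings_per_unit


if __name__ == "__main__":
    NRE = 50e6             # hypothetical "few tens of millions" of development spend
    CUSTOM_UNIT = 1_500.0  # hypothetical manufacturing cost per custom chip
    MERCHANT = 2_000.0     # hypothetical price of an off-the-shelf Arm server chip
    for units in (10_000, 100_000, 1_000_000):
        cost = custom_chip_unit_cost(NRE, units, CUSTOM_UNIT)
        print(f"{units:>9,} chips: ${cost:,.0f} per chip vs ${MERCHANT:,.0f} merchant")
    print(f"break-even volume: {breakeven_units(NRE, CUSTOM_UNIT, MERCHANT):,.0f} chips")
```

With these made-up inputs, the custom design only matches the merchant price at 100,000 units, and it only pulls ahead beyond that, which is the kind of volume that few buyers outside the largest hyperscalers have.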
There are some neat things in the Arm architecture, and sometimes they can yield better performance or lower power, but in general, the advantage is not the kind of 20 percent or 30 percent or 40 percent swing that is necessary to justify an architectural shift on server compute. The hassle of refactoring and recertifying your entire application stack for Arm is too much for the market at large and for many SaaS suppliers as well. But the Super 8 can afford to do anything, pretty much. Which is why we see them doing anything they please. The rest of the world does not have that luxury.
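One way to see why a modest architectural advantage rarely justifies a platform shift is a simple payback calculation. The Python sketch below assumes the advantage shows up as better price/performance (the same work costs spend / (1 + gain)) and that refactoring and recertifying the stack is a one-time cost; the function names and dollar figures are hypothetical, and the 20 to 40 percent range is the article's rough threshold rather than a measured result.

```python
# Toy payback model for moving a software stack from X86 to Arm.
# All inputs are hypothetical; real migrations also carry risk and ongoing costs.


def annual_savings(annual_compute_spend: float, price_perf_gain: float) -> float:
    """Dollars saved per year when the same work costs spend / (1 + gain)."""
    if price_perf_gain < 0:
        raise ValueError("price_perf_gain must be non-negative")
    return annual_compute_spend * price_perf_gain / (1.0 + price_perf_gain)


def payback_years(porting_cost: float, annual_compute_spend: float,
                  price_perf_gain: float) -> float:
    """Years of savings needed to repay a one-time refactoring/recertification cost."""
    savings = annual_savings(annual_compute_spend, price_perf_gain)
    if savings == 0:
        return float("inf")
    return porting_cost / savings


if __name__ == "__main__":
    PORTING = 4e6  # hypothetical one-time cost to refactor and recertify the stack
    SPEND = 10e6   # hypothetical annual server compute spend
    for gain in (0.05, 0.10, 0.25, 0.40):
        years = payback_years(PORTING, SPEND, gain)
        print(f"{gain:.0%} better price/perf -> payback in {years:.1f} years")
```

With these made-up inputs, a 25 percent gain repays the porting cost in two years while a 5 percent gain takes more than eight; a much larger annual spend shortens the payback, which fits the point that the Super 8 can afford to do pretty much anything.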
Ampere Computing is setting itself up to be able to cram more cores into a socket than even AMD can do, and this is going to turn into a competitive advantage for certain kinds of workloads. But customers looking at the 80-core “Quicksilver” Altra and 128-core “Mystique” Altra Max processors from Ampere Computing, as well as the “Siryn” kicker due in 2022 (with perhaps 192 cores, more bandwidth, and a homegrown A1 core) and the follow-on we are calling “Polaris” that will presumably turn all of the possible cranks in 2023 (possibly up to 256 cores), are seeing a roadmap with aggressive throughput performance and core counts. AMD could drop a streamlined X86 core into a new line of Epyc chips to take on Ampere Computing indirectly, or come up with its own Arm variants with similar core counts and take it on directly.
While that would be interesting, what we think would actually be far more interesting is for AMD to pull a sequel to the SkyBridge effort and create a common socket for X86 and Arm server chips that is completely neutral about what kind of chip is plugged in. This would be of real value to those who want flexibility, and if AMD really wanted to hedge its bets, it could make as many components as possible common between its X86 and Arm (and maybe someday RISC-V) server CPU designs. AMD is the only Arm licensee other than Intel that also has an X86 license. AMD got its X86 rights by winning lawsuits, and it seems unlikely that Intel would ever sell an X86 license at a reasonable price to anyone who asked for one, at least not until it was too late for the license to be worth much. (Should that day come.) So the SkyBridge II common socket idea might take off. And by the way, the SkyBridge I socket didn’t take off because the X86 and Arm server CPUs that AMD had aimed at it were not particularly interesting. If customers were telling AMD that SkyBridge was not all that interesting, we suspect this is why.
And we think all hell could break loose if there was a common socket that the entire industry — minus Intel, of course, who would never go for that — could get behind. Imagine if compute engines were absolutely interchangeable. Only users — and only the hyperscalers and the cloud builders — could compel this. It is a bit of a wonder why they have not. Four different socket SKUs per generation should cover most use cases, we think.
With that thought experiment done, we don’t think a universal server socket will ever come to pass. This is a control point no server CPU maker wants to let go of. But we can dream — of AMD Arm servers, and of common sockets — can’t we? All good fruits come from continuous and long labors.
K12 was not Seattle, as that SKU made use of ARM Holdings Cortex-A57 reference design cores. The K12 custom ARM core project borrowed a lot from Zen-1 but was engineered to execute the ARMv8-A ISA. I’m sure that AMD retains the Verilog for K12, which was rumored to be SMT-capable like Zen/x86. The nice thing about ARM/RISC is that the instruction decoders are rather small affairs compared to the large x86 instruction decoders bloated by the legacy ISA, and that can be seen in Apple’s A14 Firestorm custom ARM core design, which has 8 ARM ISA instruction decoders. So AMD could really make a very wide custom ARM core for all sorts of applications and net much higher IPC and power efficiency for servers and for mobile devices.
So if one is looking for loads of IPC from a very wide out-of-order superscalar design, then RISC ISAs like Power8 and later and ARM, as used in Apple’s A14/Firestorm, are the way to go: at 3.2GHz the A14 matches what current x86 cores need clock speeds at least 1.5GHz higher to achieve, which shows how much better the raw IPC of the A14/Firestorm core is. It’s rather hard to go wider with x86, in terms of both power budget and transistor count.
The Next Platform is published by Stackhouse Publishing Inc in partnership with the UK’s top technology publication, The Register.
It offers in-depth coverage of high-end computing at large enterprises, supercomputing centers, hyperscale data centers, and public clouds.
All Content Copyright The Next Platform
