Intel has adopted the “Domain-Specific Architecture” strategy espoused by John L. Hennessy, Alphabet Chairman and former President of Stanford University. Consequently, the company has at least one of everything: CPUs, GPUs, various ASICs, and FPGAs. While this may look like throwing everything against the wall to see what sticks, it does have some appeal as a path to truly heterogeneous computing. However, this approach is capital intensive, requires a lot of engineering, and places a significant burden on the software developer. Let’s talk about one of these architectures that is a net-new addition to Intel: high-performance GPUs.
Intel recently announced details on its forthcoming data center GPU, the Xe HPC, code-named Ponte Vecchio (PVC). Intel daringly implied that the peak performance of the PVC GPU would be roughly twice that of today’s fastest GPU, the Nvidia A100. PVC and Sapphire Rapids (the multi-tile next-gen Xeon) are being used to build Aurora, Argonne National Laboratory’s Exascale supercomputer, in 2022, so this technology should finally be just around the corner.
Intel is betting on this first-generation data center GPU to finally catch up with Nvidia and AMD, both for HPC (64-bit floating point) and AI (8- and 16-bit integer and 16-bit floating point). The Xe HPC device is a multi-tile, multi-process-node package with new GPU cores, HBM2e memory, a new Xe Link interconnect, and PCIe Gen 5, implemented with more than 100 billion transistors. That is nearly twice the transistor count of the 54-billion-transistor Nvidia A100. At that size, power consumption could be an issue at high frequencies. Nonetheless, the Xe design clearly demonstrates that Intel gets it: packaging smaller dies helps reduce development and manufacturing costs, and can improve time to market.
Ponte Vecchio is a multi-tile, multi-process node package.
Ponte Vecchio is expected to start shipping early next year to the Argonne National Laboratory for Aurora, with tens of thousands of GPUs installed to deliver at least 1.1 exaflops in the first HPC Exascale supercomputer, funded by the US DOE.
Ponte Vecchio promises a major step function in Intel HPC and AI performance.
Initial performance claims are quite impressive, roughly doubling Nvidia A100 performance with 45 teraflops of FP32 from the vector engine and 1,468 INT8 TOPS from the matrix-processing units. The monster chip is rumored to consume some 600 watts, which puts it into rarefied air indeed. The slide below was shared at the Intel Architecture Day event in August. While the slide is missing an X axis, and surely reflects peak performance, it seems to reinforce the “twice A100” claims.
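As a rough sanity check on the “twice A100” framing, the quoted peaks can be divided by the A100’s published peaks. The sketch below is only back-of-the-envelope arithmetic. The Ponte Vecchio numbers are Intel’s claims as quoted above. The A100 numbers (19.5 FP32 teraflops, and 624 INT8 TOPS dense or 1,248 with structured sparsity) are an assumption taken from Nvidia’s public datasheet, not from this article. The `ratio` helper is a hypothetical name used only for illustration.

```python
# Peak figures quoted in the article for Ponte Vecchio (Intel's claims).
PVC_FP32_TFLOPS = 45.0
PVC_INT8_TOPS = 1468.0

# Assumption: Nvidia A100 peak figures from Nvidia's public datasheet,
# not from the article itself.
A100_FP32_TFLOPS = 19.5
A100_INT8_TOPS_DENSE = 624.0    # Tensor Core INT8 without sparsity
A100_INT8_TOPS_SPARSE = 1248.0  # Tensor Core INT8 with structured sparsity


def ratio(pvc_peak: float, a100_peak: float) -> float:
    """Return how many times larger the PVC peak is than the A100 peak."""
    return pvc_peak / a100_peak


if __name__ == "__main__":
    print(f"FP32:          {ratio(PVC_FP32_TFLOPS, A100_FP32_TFLOPS):.2f}x")
    print(f"INT8 (dense):  {ratio(PVC_INT8_TOPS, A100_INT8_TOPS_DENSE):.2f}x")
    print(f"INT8 (sparse): {ratio(PVC_INT8_TOPS, A100_INT8_TOPS_SPARSE):.2f}x")
```

Under these assumptions the peak ratios come out a little above 2x for FP32 and dense INT8, and closer to parity if the A100’s sparsity-enabled INT8 figure is used instead. These are peak numbers only and say nothing about delivered application performance.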
The Xe platform includes on-die interconnect links and a switch that enables efficient scaling up to …
It will be interesting to see how Intel positions Xe HPC versus the Habana Labs Gaudi. A likely bifurcation would be to focus Ponte Vecchio in HPC supercomputing, and Gaudi as a scalable training platform for Cloud Service Providers. A lot will depend on Intel’s ability to get software teams lined up for both.
Perhaps more important than all the specs and competitive comparisons, Intel should be able to leverage Aurora to begin building credibility and a developer community around the Ponte Vecchio GPU, including serious use of oneAPI for AI as well as HPC. A lot has been said about Intel’s ambitious goal to provide a single abstraction for high-performance computing and AI, and the company has not given up on this noble but difficult goal. In the recent briefings, the company reiterated its intentions and provided some pretty compelling evidence that oneAPI is getting market traction. We remain concerned that Habana is not yet supporting the software, but to put this into perspective, it is not and should not be the priority at this point.
Intel says over 80 HPC and AI applications now support oneAPI on early Ponte Vecchio chips. That’s …
I am very impressed with what Intel has accomplished on the GPU front. However, Ponte Vecchio will be an alternative to AMD and Nvidia only if the platform has reasonable power consumption, and is complemented with the software that can assure ease of use and optimization. The latter point is critical; performance matters a lot, and Intel will need to make it simple and effective to optimize code and models to produce the claimed performance level.
In my mind, there is no doubt that Ponte Vecchio has a good shot at becoming the poster child for the New Intel under Pat Gelsinger’s leadership.