{"id":31688,"date":"2022-06-01T06:26:02","date_gmt":"2022-06-01T06:26:02","guid":{"rendered":"https:\/\/harchi90.com\/next-gen-ai-successor-to-ponte-vecchio-xe-hpc-gpu-with-up-to-160-xe-cores-over-20000-alus-oam-2-0-sampling-in-2023\/"},"modified":"2022-06-01T06:26:02","modified_gmt":"2022-06-01T06:26:02","slug":"next-gen-ai-successor-to-ponte-vecchio-xe-hpc-gpu-with-up-to-160-xe-cores-over-20000-alus-oam-2-0-sampling-in-2023","status":"publish","type":"post","link":"https:\/\/harchi90.com\/next-gen-ai-successor-to-ponte-vecchio-xe-hpc-gpu-with-up-to-160-xe-cores-over-20000-alus-oam-2-0-sampling-in-2023\/","title":{"rendered":"Next-Gen AI Successor To Ponte Vecchio Xe-HPC GPU With Up To 160 Xe Cores, Over 20,000 ALUs, OAM 2.0, Sampling in 2023"},"content":{"rendered":"\n

Intel has officially unveiled Rialto Bridge, the next-generation successor to its flagship Xe-HPC GPU, Ponte Vecchio. The new graphics chip is designed for the next generation of AI & HPC data center workloads, taking aim at AMD's CDNA and NVIDIA's CUDA accelerators.

## Intel Rialto Bridge GPU Unveiled: Successor To Ponte Vecchio With 25% More Cores, Increased FLOPs, Targeting AMD & NVIDIA Data Center GPUs

The Intel Rialto Bridge GPU can be seen as an upgraded version of Ponte Vecchio with more cores, more FLOPs, more bandwidth, and higher GT/s transfer rates. Intel hasn't disclosed many details but claims Rialto Bridge will feature up to 160 Xe cores. We don't know yet whether these are the same cores as in the current Ponte Vecchio GPUs or are based on a brand-new architecture, but the latter looks likely.

> Today we are announcing our successor to this powerhouse data center GPU, code-named Rialto Bridge. By evolving the Ponte Vecchio architecture and combining enhanced tiles with next process node technology, Rialto Bridge will offer significantly increased density, performance, & efficiency, while providing software consistency.

Intel's Rialto Bridge takes its name from the oldest of the four bridges spanning the Grand Canal in Venice, Italy. The same was true of Ponte Vecchio, and it looks like the generation after Rialto Bridge will also be named after an iconic bridge. According to Intel, the Rialto Bridge GPU will power the next generation of AI & HPC data center solutions while taking aim at AMD's CDNA and NVIDIA's CUDA accelerators.

\"\"<\/p>\n

In terms of specifications, we only know that the Rialto Bridge GPU will house up to 160 Xe cores in its brand-new OAM v2 form factor. But besides the unveiling of the specs, Intel also gives us a first look at the chip itself, and there are some things we can dissect. The biggest change is the GPU die layout. While Ponte Vecchio has 16 Xe-HPC dies, each with 8 Xe cores, for a total of 128 cores or 16,384 ALUs, the Rialto Bridge GPU comes with 8 Xe-HPC dies. That works out to 20 Xe cores per die for a total of 160 Xe cores across the 8 dies, or 20,480 ALUs, a 25 percent increase over its predecessor.
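For a quick sanity check, the core math above can be reproduced in a few lines. Note that the 128-ALUs-per-Xe-core ratio below is derived from Ponte Vecchio's published figures (16,384 ALUs across 128 Xe cores); Intel hasn't confirmed that this ratio carries over to Rialto Bridge, so treat this as a sketch under that assumption:

```python
# Back-of-the-envelope ALU math from the figures Intel has disclosed.
# ASSUMPTION: 128 ALUs per Xe core, derived from Ponte Vecchio
# (16,384 ALUs / 128 Xe cores); unconfirmed for Rialto Bridge.

ALUS_PER_XE_CORE = 16_384 // 128  # = 128, from Ponte Vecchio's specs

ponte_vecchio_alus = 128 * ALUS_PER_XE_CORE           # 16,384 ALUs
rialto_bridge_alus = 160 * ALUS_PER_XE_CORE           # 20,480 ALUs
cores_per_die = 160 // 8                              # 20 Xe cores per Xe-HPC die
uplift = rialto_bridge_alus / ponte_vecchio_alus - 1  # 0.25 -> 25% increase

print(f"Rialto Bridge: {rialto_bridge_alus} ALUs, {cores_per_die} cores/die, "
      f"{uplift:.0%} over Ponte Vecchio")
```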

The rest of the Rialto Bridge GPU structure is much the same as Ponte Vecchio's: two Xe Link tiles and eight HBM tiles (HBM3), with four HBM stacks tied to each compute tile (4 Xe-HPC dies). There are also passive die stiffeners located around the compute tiles, while the Xe Link and HBM3 tiles are connected to the compute tile through an EMIB tile. The Foveros chip interconnect is used by the compute tile to communicate with the rest of the Xe dies. We don't know the actual variation of each tile yet, but it should be based on the new Foveros Omni (3rd Gen) design. The Rambo Cache tile also appears to be missing; given the die-size increase of each compute tile, it is highly possible that the cache now sits on the compute tile itself rather than on a separate tile of its own.

\"\"<\/p>\n

As for performance, Intel hasn't revealed any concrete numbers and only stated that we should expect more FLOPs, higher GT/s, and increased bandwidth. The increased bandwidth should come from the upgraded HBM3 memory dies. Ponte Vecchio GPUs are already equipped with up to 128 GB of VRAM, so that is likely what we will also see on the Rialto Bridge GPUs, though Intel could stack it even higher.
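As a rough illustration of where such bandwidth figures come from, peak HBM bandwidth is simply bus width times per-pin data rate. The 8192-bit bus and 3.2 Gbps pin rate below match the spec table further down; the higher HBM3 pin rate used for Rialto Bridge is purely a placeholder assumption, since Intel hasn't quoted one:

```python
# Peak HBM bandwidth = bus width (bits) * per-pin rate (Gbps) / 8 bits-per-byte.
# The 8192-bit bus and 3.2 Gbps rate come from the spec table below; the
# 4.0 Gbps HBM3 rate for Rialto Bridge is an ASSUMPTION, not an Intel figure.

def hbm_bandwidth_tb_s(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Return peak memory bandwidth in TB/s."""
    return bus_width_bits * pin_rate_gbps / 8 / 1000

print(f"Ponte Vecchio (HBM2e @ 3.2 Gbps): {hbm_bandwidth_tb_s(8192, 3.2):.2f} TB/s")
print(f"Rialto Bridge (HBM3 @ 4.0 Gbps, assumed): {hbm_bandwidth_tb_s(8192, 4.0):.2f} TB/s")
```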

Following is the full Intel Rialto Bridge die configuration that we can dissect at the moment (tallied in the sketch after the list):

- 8 Xe HPC (internal/external)
- 2 Xe Base (internal)
- 11 EMIB (internal)
- 2 Xe Link (external)
- 8 HBM (external)
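Tallying that breakdown gives the per-package tile count; this is a minimal sketch based solely on the list above, as Intel hasn't published an official tile total for Rialto Bridge:

```python
# Tile breakdown for Rialto Bridge as dissected above; the total is
# inferred from the list, not an official Intel figure.
rialto_bridge_tiles = {
    "Xe HPC (compute)": 8,
    "Xe Base": 2,
    "EMIB": 11,
    "Xe Link": 2,
    "HBM": 8,
}

total_tiles = sum(rialto_bridge_tiles.values())
print(f"Total tiles per package: {total_tiles}")  # 31
```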

Intel hasn't given a release date or any details regarding the process node for the Rialto Bridge GPU, but we will likely hear more in mid-2023, when the chip is expected to sample to first customers ahead of a launch in either late 2023 or the first half of 2024.

## Next-Gen Data Center GPU Accelerators

| GPU Name | AMD Instinct MI250X | NVIDIA Hopper GH100 | Intel Ponte Vecchio | Intel Rialto Bridge |
| --- | --- | --- | --- | --- |
| Packaging Design | MCM (Infinity Fabric) | Monolithic | MCM (EMIB + Foveros) | MCM (EMIB + Foveros) |
| GPU Architecture | Aldebaran (CDNA 2) | Hopper GH100 | Xe-HPC | Xe-HPC |
| GPU Process Node | 6nm | 4N | 7nm (Intel 4) | 5nm (Intel 3)? |
| GPU Cores | 14,080 | 16,896 | 16,384 ALUs (128 Xe Cores) | 20,480 ALUs (160 Xe Cores) |
| GPU Clock Speed | 1700 MHz | ~1780 MHz | TBA | TBA |
| L2 / L3 Cache | 2 x 8 MB | 50 MB | 2 x 204 MB | TBA |
| FP16 Compute | 383 TFLOPs | 2000 TFLOPs | TBA | TBA |
| FP32 Compute | 95.7 TFLOPs | 1000 TFLOPs | ~45 TFLOPs (A0 Silicon) | TBA |
| FP64 Compute | 47.9 TFLOPs | 60 TFLOPs | TBA | TBA |
| Memory Capacity | 128 GB HBM2e | 80 GB HBM3 | 128 GB HBM2e | 128 GB HBM3? |
| Memory Clock | 3.2 Gbps | 3.2 Gbps | TBA | TBA |
| Memory Bus | 8192-bit | 5120-bit | 8192-bit | 8192-bit |
| Memory Bandwidth | 3.2 TB/s | 3.0 TB/s | ~3 TB/s | ~3 TB/s |
| Form Factor | OAM | OAM | OAM | OAM v2 |
| Cooling | Passive / Liquid | Passive / Liquid | Passive / Liquid | Passive / Liquid |
| TDP | 560W | 700W | 700W | 800W |
| Launch | Q4 2021 | 2H 2022 | 2022? | 2024? |
