{"id":54012,"date":"2022-08-23T16:33:48","date_gmt":"2022-08-23T16:33:48","guid":{"rendered":"https:\/\/harchi90.com\/1074mm2-on-7nm-77-billion-transistors-up-to-2-8x-faster-than-nvidia-ampere-at-550w\/"},"modified":"2022-08-23T16:33:48","modified_gmt":"2022-08-23T16:33:48","slug":"1074mm2-on-7nm-77-billion-transistors-up-to-2-8x-faster-than-nvidia-ampere-at-550w","status":"publish","type":"post","link":"https:\/\/harchi90.com\/1074mm2-on-7nm-77-billion-transistors-up-to-2-8x-faster-than-nvidia-ampere-at-550w\/","title":{"rendered":"1074mm2 on 7nm, 77 Billion Transistors, Up To 2.8x Faster Than NVIDIA Ampere at 550W"},"content":{"rendered":"<div id=\"\">\n<p>Earlier this month, we reported that Birentech, a company hailing from China, was working on its fastest GPU to date, the Biren BR100.  Based on what the company has publicly revealed, the Biren BR100 aims to be a General-Purpose GPU that offers faster performance than NVIDIA&#8217;s A100 GPUs in AI processing.  Now, at Hot Chips 34, the company has presented more details on the specs and architecture of its Biren GPGPU lineup.<\/p>\n<h2>China&#8217;s Fastest General-Purpose MCM GPU, The Birentech Biren BR100, Architecture Detailed<\/h2>\n<p>The Birentech BR100 is the flagship General-Purpose GPU that China has to offer, featuring an in-house GPU architecture that utilizes a 7nm process node and houses 77 Billion transistors.  The GPU is packaged using TSMC&#8217;s 2.5D CoWoS technology and comes with 300 MB of on-chip cache, 64 GB of HBM2e memory with 1.64 TB\/s of bandwidth (2.3 TB\/s of total external I\/O bandwidth), and support for PCIe Gen 5.0 with the CXL interconnect protocol.  
The whole chip measures 1074mm2 which is beyond the reticle limit of the process node.<\/p>\n<figure class=\"wp-lightbox\"><\/figure>\n<p>Some of the fundamentals that went into designing the BR100 GPU included:<\/p>\n<ul>\n<li>To break the reticle size limit and integrate more transistors on a chip<\/li>\n<li>One tape out to empower multiple SKUs<\/li>\n<li>Smaller die for better yield, hence lower cost<\/li>\n<li>896GB\/s high-speed die-to-die interconnect<\/li>\n<li>30% more performance, and 20% better yield compared with a monolithic design<\/li>\n<\/ul>\n<figure class=\"story-gallery\">\n<div class=\"swiper-container\">\n<div class=\"swiper-wrapper\">\n<div class=\"swiper-slide\" data-src=\"https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-Chinas-Fastest-General-Purpose-GPU-Hot-Chips-34_4-1480x833.png\"><img srcset=\"https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-Chinas-Fastest-General-Purpose-GPU-Hot-Chips-34_4-1480x833.png 2x, https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-Chinas-Fastest-General-Purpose-GPU-Hot-Chips-34_4-740x416.png 1x\" src=\"https:\/\/i0.wp.com\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-Chinas-Fastest-General-Purpose-GPU-Hot-Chips-34_4-265x166.png?resize=265%2C166&#038;ssl=1\" alt=\"birentech-biren-br100-chinas-fastest-general-purpose-gpu-hot-chips-34_4\" data-recalc-dims=\"1\"\/><\/div>\n<div class=\"swiper-slide\" data-src=\"https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-Chinas-Fastest-General-Purpose-GPU-Hot-Chips-34_3-1480x833.png\"><img srcset=\"https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-Chinas-Fastest-General-Purpose-GPU-Hot-Chips-34_3-1480x833.png 2x, https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-Chinas-Fastest-General-Purpose-GPU-Hot-Chips-34_3-740x416.png 1x\" 
src=\"https:\/\/i0.wp.com\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-Chinas-Fastest-General-Purpose-GPU-Hot-Chips-34_3-265x166.png?resize=265%2C166&#038;ssl=1\" alt=\"birentech-biren-br100-chinas-fastest-general-purpose-gpu-hot-chips-34_3\" data-recalc-dims=\"1\"\/><\/div>\n<\/div>\n<\/div>\n<\/figure>\n<p>Talking about the architecture itself, the Biren BR100 is made up of two chiplets, each housing 16 Streaming Processing Clusters (SPCs).  Each SPC has 16 Execution Units (EUs), and four of these EUs form an internal Compute Unit (CU) that is attached to 64 KB of L1 cache (LSC), while each SPC features 8 MB of L2 cache shared across all of its Execution Units.  That makes for a total of 32 SPCs with 512 Execution Units, 256 MB of L2 cache (32 x 8 MB), and 8 MB of L1 cache (128 CUs x 64 KB).<\/p>\n<p>A deeper look at the Execution Unit reveals 16 streaming processing cores (V-Cores) and a single Tensor Engine (T-Core).  There&#8217;s also 40 KB of TLR (Thread Local Register), 4 SFUs, and a TDA (Tensor Data Accelerator).  Interestingly, each CU can contain 4, 8, or up to 16 EUs.  The V-Core itself is a general-purpose SIMT processor with 16 cores that support FP32, FP16, INT32 &#038; INT16 along with SFU, Load\/Store, and Data Processing functions, and it handles deep learning operations such as Batch Norm, ReLU, etc.  It also features an enhanced SIMT model that can run up to 128K threads on 32 SPCs in a super-scalar mode (static and dynamic).  The T-Cores, in turn, are used to accelerate AI operations such as MMA, Convolution, etc.<\/p>\n<p>Birentech disclosed various performance metrics of the chip.  It offers up to 2048 TOPS (INT8), 1024 TFLOPS (BF16), 512 TFLOPS (TF32+), and 256 TFLOPS (FP32).  Based on these figures, the chip looks to be faster than the NVIDIA Ampere A100, at least on paper.  
The GPU has been compared against the NVIDIA Ampere A100 in various HPC workloads, where it is said to offer a 2.6x average speedup and up to a 2.8x peak speedup over its main competitor.<\/p>\n<figure class=\"wp-lightbox\"><img loading=\"lazy\" class=\"alignnone size-large wp-image-1369507\" src=\"https:\/\/i0.wp.com\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-Chinas-Fastest-General-Purpose-GPU-Hot-Chips-34_2-740x416.png?resize=740%2C416&#038;ssl=1\" alt=\"\" width=\"740\" height=\"416\" srcset=\"https:\/\/i0.wp.com\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-Chinas-Fastest-General-Purpose-GPU-Hot-Chips-34_2-740x416.png?resize=740%2C416&#038;ssl=1 740w, https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-Chinas-Fastest-General-Purpose-GPU-Hot-Chips-34_2-768x432.png 768w, https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-Chinas-Fastest-General-Purpose-GPU-Hot-Chips-34_2-1536x864.png 1536w, https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-Chinas-Fastest-General-Purpose-GPU-Hot-Chips-34_2-2048x1152.png 2048w, https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-Chinas-Fastest-General-Purpose-GPU-Hot-Chips-34_2-550x309.png 550w, https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-Chinas-Fastest-General-Purpose-GPU-Hot-Chips-34_2-1100x619.png 1100w, https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-Chinas-Fastest-General-Purpose-GPU-Hot-Chips-34_2-1480x833.png 1480w, https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-Chinas-Fastest-General-Purpose-GPU-Hot-Chips-34_2-1030x579.png 1030w, https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-Chinas-Fastest-General-Purpose-GPU-Hot-Chips-34_2-1920x1080.png 1920w\" sizes=\"(max-width: 740px) 100vw, 740px\" 
data-recalc-dims=\"1\"\/><\/figure>\n<p>The Hopper H100 GPU, however, offers nearly 2x to 2.5x the performance in the same metrics.  The BR100 also supports 64-channel video encoding and 512-channel video decoding.  As for the interconnects, the chip comes with eight BLink ports and offers 2.3 TB\/s of external I\/O bandwidth.<\/p>\n<p>What&#8217;s interesting is that the BR100 isn&#8217;t that far behind the NVIDIA H100 in terms of overall transistor count.  The H100 features 80 Billion transistors on the new N4 process node, whereas the BR100 is only 3 Billion transistors behind while being built on the older 7nm process node.  This results in a much bigger die size.<\/p>\n<figure class=\"story-gallery\">\n<div class=\"swiper-container\">\n<div class=\"swiper-wrapper\">\n<div class=\"swiper-slide\" data-src=\"https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-GPU-low_res-scale-4_00x-1480x1367.png\"><img srcset=\"https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-GPU-low_res-scale-4_00x-1480x1367.png 2x, https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-GPU-low_res-scale-4_00x-740x683.png 1x\" src=\"https:\/\/i0.wp.com\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-GPU-low_res-scale-4_00x-265x166.png?resize=265%2C166&#038;ssl=1\" alt=\"birentech-biren-br100-gpu-low_res-scale-4_00x\" data-recalc-dims=\"1\"\/><\/div>\n<div class=\"swiper-slide\" data-src=\"https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-low_res-scale-4_00x-1480x987.png\"><img srcset=\"https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-low_res-scale-4_00x-1480x987.png 2x, https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-low_res-scale-4_00x-740x494.png 1x\" 
src=\"https:\/\/i0.wp.com\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-low_res-scale-4_00x-265x166.png?resize=265%2C166&#038;ssl=1\" alt=\"birentech-biren-br100-low_res-scale-4_00x\" data-recalc-dims=\"1\"\/><\/div>\n<\/div>\n<\/div>\n<\/figure>\n<table class=\"table table-hover\" style=\"width: 100%;\">\n<tbody>\n<tr>\n<th style=\"width: 99.1892%;\" colspan=\"2\">Birentech Biren BR100<\/th>\n<\/tr>\n<tr>\n<td style=\"width: 44.4595%;\">Process<\/td>\n<td style=\"width: 54.7297%;\">7nm<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 44.4595%;\">System interface, bandwidth, interconnection protocol<\/td>\n<td style=\"width: 54.7297%;\">PCIe 5.0 x16, 128GB\/s, supports CXL<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 44.4595%;\">FP32 TFLOPS (peak)<\/td>\n<td style=\"width: 54.7297%;\">256<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 44.4595%;\">TF32+ TFLOPS (peak)<\/td>\n<td style=\"width: 54.7297%;\">512<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 44.4595%;\">BF16 TFLOPS (peak)<\/td>\n<td style=\"width: 54.7297%;\">1,024<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 44.4595%;\">INT8 TOPS (peak)<\/td>\n<td style=\"width: 54.7297%;\">2,048<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 44.4595%;\">Memory capacity, interface bit width, bandwidth<\/td>\n<td style=\"width: 54.7297%;\">64GB HBM2E; 4,096-bit, 1.64TB\/s<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 44.4595%;\">Interconnection<\/td>\n<td style=\"width: 54.7297%;\">512GB\/s BLink\u2122, supports 8 x8 ports<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 44.4595%;\">Secure virtual instance<\/td>\n<td style=\"width: 54.7297%;\">Up to 8 instances<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 44.4595%;\">Video codec (FHD@30fps)<\/td>\n<td style=\"width: 54.7297%;\">64-channel HEVC\/H.264 encoding, 512-channel HEVC\/H.264 decoding<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 44.4595%;\">TDP<\/td>\n<td style=\"width: 54.7297%;\">550W<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 44.4595%;\">Product form<\/td>\n<td style=\"width: 54.7297%;\">OAM 
module<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The Biren BR100 isn&#8217;t the only chip that the China-based company has announced.  There&#8217;s also the Biren BR104, which offers half the performance metrics of the BR100, though its full architectural details have not been disclosed yet.  What is known is that, unlike the Biren BR100, which uses a chiplet design, the BR104 is a monolithic die and comes in a standard PCIe form factor with a TDP of 300W.<\/p>\n<figure class=\"wp-lightbox\"><img loading=\"lazy\" class=\"alignnone size-large wp-image-1367073\" src=\"https:\/\/i0.wp.com\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR104-low_res-scale-4_00x-740x390.png?resize=740%2C390&#038;ssl=1\" alt=\"\" width=\"740\" height=\"390\" srcset=\"https:\/\/i0.wp.com\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR104-low_res-scale-4_00x-740x390.png?resize=740%2C390&#038;ssl=1 740w, https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR104-low_res-scale-4_00x-768x404.png 768w, https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR104-low_res-scale-4_00x-1536x809.png 1536w, https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR104-low_res-scale-4_00x-2048x1078.png 2048w, https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR104-low_res-scale-4_00x-550x290.png 550w, https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR104-low_res-scale-4_00x-1100x579.png 1100w, https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR104-low_res-scale-4_00x-1480x779.png 1480w, https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR104-low_res-scale-4_00x-1030x542.png 1030w, https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR104-low_res-scale-4_00x-1920x1011.png 1920w\" sizes=\"(max-width: 740px) 100vw, 740px\" data-recalc-dims=\"1\"\/><\/figure>\n<table 
class=\"table table-hover\">\n<tbody>\n<tr>\n<th colspan=\"2\">Birentech Biren BR104<\/th>\n<\/tr>\n<tr>\n<td>Process<\/td>\n<td>7nm<\/td>\n<\/tr>\n<tr>\n<td>System interface, bandwidth, interconnection protocol<\/td>\n<td>PCIe 5.0 x16, 128GB\/s, supports CXL<\/td>\n<\/tr>\n<tr>\n<td>FP32 TFLOPS (peak)<\/td>\n<td>128<\/td>\n<\/tr>\n<tr>\n<td>TF32+ TFLOPS (peak)<\/td>\n<td>256<\/td>\n<\/tr>\n<tr>\n<td>BF16 TFLOPS (peak)<\/td>\n<td>512<\/td>\n<\/tr>\n<tr>\n<td>INT8 TOPS (peak)<\/td>\n<td>1,024<\/td>\n<\/tr>\n<tr>\n<td>Memory capacity, interface bit width, bandwidth<\/td>\n<td>32GB HBM2E; 2,048-bit, 819GB\/s<\/td>\n<\/tr>\n<tr>\n<td>Interconnection<\/td>\n<td>192GB\/s BLink\u2122, supports 3 x8 ports<\/td>\n<\/tr>\n<tr>\n<td>Secure virtual instance<\/td>\n<td>Up to 4 instances<\/td>\n<\/tr>\n<tr>\n<td>Video codec (FHD@30fps)<\/td>\n<td>32-channel HEVC\/H.264 encoding, 256-channel HEVC\/H.264 decoding<\/td>\n<\/tr>\n<tr>\n<td>TDP<\/td>\n<td>300W<\/td>\n<\/tr>\n<tr>\n<td>Product form<\/td>\n<td>Full-height full-length, dual-slot PCIe card<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<figure class=\"story-gallery\">\n<div class=\"swiper-container\">\n<div class=\"swiper-wrapper\">\n<div class=\"swiper-slide\" data-src=\"https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-Chinas-Fastest-General-Purpose-GPU-Hot-Chips-34_7-1480x833.png\"><img srcset=\"https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-Chinas-Fastest-General-Purpose-GPU-Hot-Chips-34_7-1480x833.png 2x, https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-Chinas-Fastest-General-Purpose-GPU-Hot-Chips-34_7-740x416.png 1x\" src=\"https:\/\/i0.wp.com\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-Chinas-Fastest-General-Purpose-GPU-Hot-Chips-34_7-265x166.png?resize=265%2C166&#038;ssl=1\" alt=\"birentech-biren-br100-chinas-fastest-general-purpose-gpu-hot-chips-34_7\" data-recalc-dims=\"1\"\/><\/div>\n<div 
class=\"swiper-slide\" data-src=\"https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-Chinas-Fastest-General-Purpose-GPU-Hot-Chips-34_6-1480x833.png\"><img srcset=\"https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-Chinas-Fastest-General-Purpose-GPU-Hot-Chips-34_6-1480x833.png 2x, https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-Chinas-Fastest-General-Purpose-GPU-Hot-Chips-34_6-740x416.png 1x\" src=\"https:\/\/i0.wp.com\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-Chinas-Fastest-General-Purpose-GPU-Hot-Chips-34_6-265x166.png?resize=265%2C166&#038;ssl=1\" alt=\"birentech-biren-br100-chinas-fastest-general-purpose-gpu-hot-chips-34_6\" data-recalc-dims=\"1\"\/><\/div>\n<\/div>\n<\/div>\n<\/figure>\n<p>The company states that a chip with 77 Billion transistors can mimic the nerve cells of the human brain.  The chip itself will be used for DNN and AI workloads, so it is positioned to reduce China&#8217;s dependence on NVIDIA&#8217;s AI GPUs.<\/p>\n<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>Earlier this month, we reported that Birentech, a company hailing from China, was working on its fastest GPU to date, the Biren BR100. Based on what the company has publicly revealed, the Biren BR100 aims to be a General-Purpose GPU that would offer faster performance than NVIDIA&#8217;s A100 GPUs in AI processing. 
Now at Hot &hellip;<\/p>\n<p class=\"read-more\"> <a class=\"\" href=\"https:\/\/harchi90.com\/1074mm2-on-7nm-77-billion-transistors-up-to-2-8x-faster-than-nvidia-ampere-at-550w\/\"> <span class=\"screen-reader-text\">1074mm2 on 7nm, 77 Billion Transistors, Up To 2.8x Faster Than NVIDIA Ampere at 550W<\/span> Read More &raquo;<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"default","ast-global-header-display":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","spay_email":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true},"categories":[4],"tags":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":53423,"url":"https:\/\/harchi90.com\/intel-details-ponte-vecchio-gpu-sapphire-rapids-hbm-performance-up-to-2-5x-faster-than-nvidia-a100\/","url_meta":{"origin":54012,"position":0},"title":"Intel Details Ponte Vecchio GPU &#038; Sapphire Rapids HBM Performance, Up To 2.5x Faster Than NVIDIA A100","date":"August 23, 2022","format":false,"excerpt":"During Hot Chips 34, Intel once again detailed its Ponte Vecchio GPUs running on a Sapphire Rapids HBM server platform. 
Intel Shows off Ponte Vecchio 2-Stack GPU & Sapphire Rapids HBM CPU Performance Against NVIDIA's A100 In the presentation by Intel Fellow & Chief GPU Compute Architect, Hong Jiang, we\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/cdn.wccftech.com\/wp-content\/uploads\/2021\/11\/Intel-Ponte-Vecchio-GPU-1030x579.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":53006,"url":"https:\/\/harchi90.com\/nvidia-hopper-h100-with-4th-gen-tensor-core-is-twice-as-fast-clock-for-clock-frequency-delivers-30-performance-gain\/","url_meta":{"origin":54012,"position":1},"title":"NVIDIA Hopper H100 With 4th Gen Tensor Core Is Twice As Fast Clock-For-Clock, Frequency Delivers 30% Performance Gain","date":"August 22, 2022","format":false,"excerpt":"NVIDIA is further dissecting its Hopper H100 GPU at Hot Chips 34, giving us a taste of what the 4th Gen Tensor Core architecture has to offer. NVIDIA Kepler GK110 GPU Is Equivalent To A Single GPC on Hopper H100 GPU, 4th Gen Tensor Cores Up To 2x Faster While\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"NVIDIA Kepler GK110 GPU Is Equivalent To A Single GPC on Hopper H100 GPU, 4th Gen Tensor Cores Up To 2x Faster 2","src":"https:\/\/i0.wp.com\/cdn.wccftech.com\/wp-content\/uploads\/2022\/05\/NVIDIA-GH100-H100-Hopper-GPU-very_compressed-scale-6_00x-Custom-740x370.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":43793,"url":"https:\/\/harchi90.com\/amd-rdna-3-gpus-for-radeon-rx-7000-graphics-cards-detailed\/","url_meta":{"origin":54012,"position":2},"title":"AMD RDNA 3 GPUs For Radeon RX 7000 Graphics Cards Detailed","date":"August 13, 2022","format":false,"excerpt":"Tech outlet Angstronomics has a very detailed roundup published the specifications of AMD's RDNA 3 GPUs such as the Navi 31, Navi 32, and Navi 33 chips regarding that will go on to power their next-gen Radeon RX 7000 series graphics cards. 
AMD Navi 31, Navi 32, Navi 33 \"RDNA\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"AMD RDNA 3 Navi 3x GPU illustration show possible chip configurations.  (Image Credits: Olrak_29)","src":"https:\/\/i0.wp.com\/cdn.wccftech.com\/wp-content\/uploads\/2022\/05\/AMD-RDNA-3-Navi-3x-GPU-SKUs-1030x525.jpg?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":52398,"url":"https:\/\/harchi90.com\/apple-will-be-first-to-receive-3nm-chips-from-tsmc-but-not-for-the-device-youre-thinking-of\/","url_meta":{"origin":54012,"position":3},"title":"Apple will be first to receive 3nm chips from TSMC, but not for the device you&#8217;re thinking of","date":"August 22, 2022","format":false,"excerpt":"The world's leading foundry is Taiwan Semiconductor Manufacturing Company, Limited (TSMC). Both it and Samsung are shipping chips this year made using their 3nm process node. The smaller the process node, the higher the transistor count in chips. With the iPhone 14 series expected to be released around the second\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":31296,"url":"https:\/\/harchi90.com\/chinese-made-zhaoxin-kx-6000g-cpu-with-gt10c0-integrated-gpu-features-the-same-performance-as-nvidias-gt-630\/","url_meta":{"origin":54012,"position":4},"title":"Chinese-Made Zhaoxin KX-6000G CPU With GT10C0 Integrated GPU Features The Same Performance As NVIDIA&#8217;s GT 630","date":"July 31, 2022","format":false,"excerpt":"Chinese domestic chipmaker, Zhaoxin, is entering the realm of APUs with their first product, the KX-6000G CPU, offering up to 1.5 TFLOPs of GPU horsepower. 
Chinese Domestic Chipmaker, Zhaoxin, Preps KX-6000G CPU With A 1.5 TFLOPs Integrated GPU That's As Fast As NVIDIA's Decade-Old GT 630 To elaborate things, Zhaoxin\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/cdn.wccftech.com\/wp-content\/uploads\/2022\/06\/13f355de5ed37b5349727eeaeef55d61baa32a95.jpg@942w_707h_progressive-low_res-scale-6_00x-Custom-Custom-740x555.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":55814,"url":"https:\/\/harchi90.com\/72-arm-v9-0-cores-117-mb-l3-cache-68-pcie-gen-5-lanes-tsmc-4n-process-500w-tdp\/","url_meta":{"origin":54012,"position":5},"title":"72 Arm V9.0 Cores, 117 MB L3 Cache, 68 PCIe Gen 5 Lanes, TSMC 4N Process &#038; 500W TDP","date":"August 25, 2022","format":false,"excerpt":"NVIDIA has revealed new details of its Grace CPU, Orin SOC, and NVLINK chip interconnects during Hot Chips 34. NVIDIA's Grace CPU Breaks Cover, Features 72 Arm v9.0 Cores Per Chip, 117 MB L3 Cache, 68 Gen 5 Lanes, All on TSMC 4N Process Node NVIDIA first announced its Grace\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"NVIDIA Grace CPU Detailed: 72 Arm V9.0 Cores, 117 MB L3 Cache, 68 PCIe Gen 5 Lanes, TSMC 4N Process & 500W TDP 
2","src":"https:\/\/i0.wp.com\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/NVIDIA-Grace-CPU-Superchips-_-Hot-Chips-34-_1-740x416.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]}],"fifu_image_url":"https:\/\/cdn.wccftech.com\/wp-content\/uploads\/2022\/08\/Birentech-Biren-BR100-Chinas-Fastest-General-Purpose-GPU-Hot-Chips-34-low_res-scale-2_00x-740x437.png","_links":{"self":[{"href":"https:\/\/harchi90.com\/wp-json\/wp\/v2\/posts\/54012"}],"collection":[{"href":"https:\/\/harchi90.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/harchi90.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/harchi90.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/harchi90.com\/wp-json\/wp\/v2\/comments?post=54012"}],"version-history":[{"count":0,"href":"https:\/\/harchi90.com\/wp-json\/wp\/v2\/posts\/54012\/revisions"}],"wp:attachment":[{"href":"https:\/\/harchi90.com\/wp-json\/wp\/v2\/media?parent=54012"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/harchi90.com\/wp-json\/wp\/v2\/categories?post=54012"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/harchi90.com\/wp-json\/wp\/v2\/tags?post=54012"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}