July_GAI Topic｜Analyzing the market competition among Intel, AMD and NVIDIA in terms of computing architecture development (in Chinese)

Published On: 2023/07/17 | Categories: Technology

Having briefly reviewed the revenue performance of the three major chip vendors across their main product lines, we now turn to their technological competitiveness in CPUs, GPUs, and products built on other computing architectures. Looking at their recent history, all three have acquired semiconductor companies whose products fall outside the CPU and GPU categories. The best-known examples are Intel's acquisition of Altera, a major FPGA manufacturer, and AMD's recently completed acquisition of Xilinx. NVIDIA, for its part, completed its acquisition of Mellanox, which strengthens its capabilities in DPU and network transmission technologies and makes its own server system solutions more complete. Whether in laptops or in servers, the discussion rarely strays far from CPUs and GPUs. In the server field, however, FPGAs and DPUs have recently been introduced to offload work from the CPU, so that the CPU can focus on its core computing tasks and the performance of the overall system can be maximized.

(i) CPU Competition Analysis

From the CPU perspective, the x86 architecture has long dominated both the server and PC markets. As smartphone performance kept improving, the Arm architecture gradually rose to prominence, with Arm's CPU IP focused on optimizing energy efficiency. The Arm camp has made considerable efforts in servers and PCs over the years, but the overall results were far from satisfactory. With Apple and NVIDIA entering the PC and server markets, respectively, with their own Arm-based designs, the situation has clearly begun to change. The most noteworthy recent product is NVIDIA's Grace CPU, which adopts Arm's Neoverse V2 core design, integrates 72 CPU cores on a single die, and is offered in a dual-die package in which the two dies are connected by NVIDIA's own NVLink-C2C technology. NVIDIA calls this the Grace CPU Superchip; it is paired with LPDDR5X memory (up to 960 GB), and the power consumption of the whole module is 500 W. According to NVIDIA's official announcement in 2022, the Grace CPU and Grace Hopper product lines have been adopted by ASUS, Foxconn Industrial Internet, GIGABYTE, QCT, Supermicro, and Wiwynn, with the related server systems scheduled to launch starting in the first half of 2023. It is thus evident that, with NVIDIA's backing, the Arm camp has gained a firm foothold in the server ecosystem. Another representative of the Arm camp is Ampere, a relatively young company that has nonetheless been cultivating the server and CSP markets for years and is releasing its next-generation server product line, AmpereOne, in 2023. According to the company, AmpereOne uses custom cores developed in-house on Arm's v8.6 instruction set, with up to 192 cores, TSMC's 5nm process, and a TDP of 350 W, which shows that its technical strength is on par with that of large players such as AMD and Intel.

Figure 1. Ampere Server Processor Development History

Source: Ampere Computing; Collated by Ji-Pu Industrial Trend Research Institute 2023/07

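The x86 camp, however, has not settled into a passive stance. In mid-June, AMD introduced two new product lines in its 4th Gen EPYC processor family for the server market. The first is the AMD EPYC 97X4 series, which uses Zen 4c cores, offers up to 128 cores, and has a maximum TDP of only 360 W. The other is the EPYC 9004 series, which uses Zen 4 cores, offers up to 96 cores, and also carries AMD's 3D V-Cache technology, with a maximum TDP of 400 W. AMD's approach with 4th Gen EPYC is broadly similar to that of NVIDIA's Grace CPU Superchip: both combine multiple dies through advanced packaging. The CPU dies are built on TSMC's 5nm process, while the part that handles I/O and memory control is built on TSMC's 6nm process, and the dies are then integrated as an MCM (Multi-Chip Module).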

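Intel's 4th Gen Xeon Scalable processors are known to the market as Sapphire Rapids and are built on the company's own Intel 7 process. They were released in the first quarter of 2023, and the product line was updated again in June. Overall, Sapphire Rapids tops out at 60 cores with a TDP of only 350 W. Across generations, Intel's Xeon Scalable processors can be broadly divided by workload and requirements into Platinum, Gold, and Silver tiers, which differ mainly in the number of sockets they support: up to eight for Platinum, four for Gold, and two for Silver. The 60-core part mentioned above is the Platinum 8490H. Taken as a whole, the main features of the 4th Gen Xeon Scalable processors are support for PCIe Gen5 together with CXL, and support for DDR5 memory (4,800 MT/s at 1 DIMM per channel or 4,400 MT/s at 2 DIMMs per channel). Acceleration features such as the AVX-512 instruction set and the AMX instruction set, which supports deep learning inference and model training, are all present. Average performance is about 1.53 times that of the previous generation, and PyTorch performance improves by as much as ten times. For virtualized radio access networks (vRAN), the fourth generation delivers up to twice the network processing capacity of its predecessor at the same power, and average power efficiency is as much as 2.9 times higher. On power, 4th Gen TDPs range from 250 W to 350 W, compared with 150 W to 270 W for the third generation, so TDP has clearly risen. Looking further back, however, the second generation included a processor with a TDP as high as 400 W, which suggests that Intel is trying to strike the best balance between performance and power consumption.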
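The core counts and power figures quoted in this section can be put side by side with a little arithmetic. The sketch below is only a rough illustration: it divides the maximum core count by the quoted TDP for the Grace CPU Superchip and the x86 parts. The names `Cpu`, `cores_per_watt`, and `rank_by_core_density` are made up for this example. The ratio says nothing about per-core performance, and the Grace figure is module power that includes the LPDDR5X memory, so the comparison is not strictly like for like.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    """One processor with the core count and power figure quoted in the text."""
    name: str
    cores: int
    tdp_watts: int

    @property
    def cores_per_watt(self) -> float:
        # Crude density metric: cores divided by the quoted power envelope.
        return self.cores / self.tdp_watts


# Figures as quoted in this section (assumption: maximum configurations only).
CPUS = [
    Cpu("NVIDIA Grace CPU Superchip", 2 * 72, 500),  # module power, incl. memory
    Cpu("AMD EPYC 97X4 (Zen 4c)", 128, 360),
    Cpu("AMD EPYC 9004 with 3D V-Cache (Zen 4)", 96, 400),
    Cpu("Intel Xeon Platinum 8490H", 60, 350),
]


def rank_by_core_density(cpus):
    """Return the processors sorted from most to fewest cores per watt."""
    return sorted(cpus, key=lambda c: c.cores_per_watt, reverse=True)


if __name__ == "__main__":
    for cpu in rank_by_core_density(CPUS):
        print(f"{cpu.name:<40} {cpu.cores:>4} cores / {cpu.tdp_watts} W "
              f"= {cpu.cores_per_watt:.3f} cores/W")
```

On these numbers the Zen 4c part packs the most cores into its power budget and the 60-core Xeon the fewest, which is consistent with the different design targets described above (core density versus per-core performance and built-in accelerators).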

(ii) GPU Competition Analysis

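It is well known that Intel is also highly active in GPUs, but in both PCs and servers it remains a considerable distance behind NVIDIA, the clear market leader, and AMD, which entered the market earlier. NVIDIA holds more than 80% of the GPU market. The reason is that NVIDIA has spent years painstakingly building up CUDA (Compute Unified Device Architecture), using a complete hardware and software offering to lay a solid foundation in the PC and server markets. In recent years automotive and embedded applications have also begun to shine, allowing NVIDIA's GPU solutions to all but break free of the traditional PC domain and move toward the so-called metaverse. After NVIDIA decided to enter AI and machine learning, its GPU architectures evolved accordingly: the Volta architecture formally introduced the Tensor Core design, which further improves the efficiency of AI model training and inference. The Turing architecture then added RT Cores for ray tracing, which make lighting and shadows in rendered images more lifelike. In the subsequent Ampere, Hopper, and Ada Lovelace architectures, the Tensor Cores and, where present, the RT Cores have been upgraded with each generation, so that each architecture can deliver further gains in AI and graphics computing. For AI model training, NVIDIA's flagship products are the much-discussed H100 and the L40, announced in September 2022. AMD's GPUs have moved somewhat more slowly than NVIDIA's in AI and ray tracing. In response to NVIDIA, AMD follows two separate paths in GPU design. For gaming, it relies on the RDNA (Radeon DNA) architecture and continues to optimize the gaming experience, introducing ray tracing in the second-generation RDNA architecture in an attempt to catch up with NVIDIA. For AI computing, it offers the CDNA architecture and the Instinct GPU product line, which began with the MI100 and targets data centers and servers. In June, AMD disclosed more specification details for the MI300 series in San Francisco; the MI300X, for example, is aimed squarely at the H100 in order to capture a share of the booming generative AI and AI server market. Intel has also recently returned to discrete GPUs in an attempt to take share from NVIDIA and AMD, but at present its market share leaves a great deal of room for improvement. With NVIDIA already holding a substantial share, and AMD, despite its small share, continuing to chase NVIDIA, Intel is unlikely to achieve a major breakthrough in the discrete GPU market in the short term.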

(iii) FPGA Competition Analysis

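Intel's acquisition of Altera and AMD's acquisition of Xilinx, and the results they have delivered, have long been a focus of industry discussion; the two deals were also separated by a considerable span of time. In Intel's case, the results of the Altera acquisition have been less impressive than expected, with revenue in recent years falling short and at times even declining. For AMD, although the honeymoon period following the Xilinx acquisition has only just ended, the revenue of its Embedded segment alone is still a strong showing. For 2023, it is at least reasonable to infer that Xilinx's FPGA products can maintain some growth momentum and thereby offset the impact of weak Client revenue. Although the revenue of Intel's FPGA product line has been disappointing, Intel continues to bring its own advanced process nodes to the product line. Its current top-end Agilex family mainly uses Intel's 10nm SuperFin and Intel 7 processes, combined with different Arm Cortex-A CPUs and PCIe generations to form a complete portfolio. Within it, the Agilex 7 FPGAs add CXL and PCIe Gen5 support, which lets them help Intel's server CPUs with memory resource allocation and take on other workloads. The Agilex 9 FPGA line focuses on RF applications and was recently adopted by the US Department of Defense, which shows that Intel's FPGA product line still has considerable technical strength.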

Figure 2. Overview of the Intel Agilex Series and Related Application Scenarios

Source: Intel; Collated by Ji-Pu Industrial Trend Research Institute 2023/07

*Note: Specifications and application areas for Agilex 3 have not yet been announced.

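AMD's FPGA products, meanwhile, are led by the Versal family, which is built on TSMC's 7nm process and uses a dual-core Arm Cortex-A72 CPU design, combined with different silicon IP blocks to address different application areas. At present, apart from the AI RF series, for which no further specification details are available, the other five series each have corresponding application areas.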

Figure 3. AMD Versal FPGA Product Lines and Their Application Areas

Source: AMD; Collated by Ji-Pu Industrial Trend Research Institute 2023/07

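FPGAs are well known for their relatively broad range of applications, and they can adopt process nodes somewhat behind those used for CPUs and GPUs. At present, the nodes used in Intel's and AMD's FPGA product lines are not far behind. Judging by both companies' strategies, however, the FPGA is essentially cast in a supporting role: it assists the CPU as much as possible by taking on various workloads, so that the overall system performs at its best. In server scenarios, therefore, FPGAs mostly play this supporting role in the form of accelerator cards. Once certified by the major OEMs, they can be put to work in areas such as 5G signal processing, video encoding and decoding, and real-time AI inference.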