July AI Chip Series | AI Inference Chip Trend Analysis Amid a Hundred Contending Players (Part 1)

Published On: 2025/07/30 | Categories: Technology

As mentioned earlier, if AI agents are to reach mass adoption, relying on large models alone will not be enough; computing capability must be integrated across multiple tiers, from the cloud to the edge to endpoint devices. Once an LLM goes into production, inference costs often exceed training costs, with daily inference volume potentially reaching hundreds of millions of tokens, and at that scale the compute efficiency and power consumption of the underlying chips come under a magnifying glass (a quick back-of-the-envelope calculation follows the list below). Take NVIDIA's Blackwell as an example: it is claimed to reduce the energy consumption and OPEX of LLM inference by 25x, which highlights the importance of specialized hardware (e.g., GPUs, ASICs) in inference scenarios. Furthermore, cloud inference demands millisecond-level response latency, while edge devices face power and thermal constraints. In summary, I believe the two most critical challenges in today's inference technology are the following, and they will be the deciding factors in the inference-chip race:

 

Challenge 1: Optimizing the balance between throughput and latency

Challenge 2: Specialization for the prefill stage versus the decode stage (see the sketch at the end of this section)
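
To make the daily-volume claim above concrete, here is the promised back-of-the-envelope calculation. A minimal sketch, assuming an illustrative workload of 300 million tokens per day; the number is my own placeholder chosen to sit in the "hundreds of millions" range, not a figure from any vendor:

```python
# Back-of-the-envelope: what "hundreds of millions of tokens per day" means
# in sustained throughput. TOKENS_PER_DAY is an assumed, illustrative figure.
TOKENS_PER_DAY = 300_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

sustained_rate = TOKENS_PER_DAY / SECONDS_PER_DAY
print(f"Sustained throughput required: {sustained_rate:,.0f} tokens/s")
# Roughly 3,472 tokens/s, every second of the day, before accounting for
# traffic peaks. At this scale, per-token energy and chip efficiency
# dominate operating cost, which is the point of the Blackwell 25x claim.
```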

 

Throughput, in the first challenge, refers to the amount of work a system can complete in a given window of time (e.g., within a single hour); for LLM inference it is most commonly measured in tokens processed per second.
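
To see why throughput and latency pull against each other, consider batching. The sketch below is a deliberately simplified cost model, not a measurement of any real chip; STEP_OVERHEAD_MS and PER_SEQ_MS are invented parameters. Batching more requests into one decode step amortizes the fixed cost of each forward pass, so aggregate throughput climbs, but every request in the batch now waits for the whole step, so per-token latency climbs with it:

```python
# Toy cost model for batched decoding: a fixed overhead per step plus a small
# marginal cost per sequence. Both constants are assumed for illustration.
STEP_OVERHEAD_MS = 20.0  # fixed cost of one forward pass (weight reads, launches)
PER_SEQ_MS = 0.5         # marginal cost of one extra sequence in the batch

for batch in (1, 4, 16, 64, 256):
    step_ms = STEP_OVERHEAD_MS + PER_SEQ_MS * batch  # wall time of one decode step
    throughput = batch * 1000.0 / step_ms            # tokens/s across the whole batch
    print(f"batch={batch:>3}  per-token latency={step_ms:6.1f} ms  "
          f"throughput={throughput:7.0f} tokens/s")
```

Running it shows aggregate throughput rising from roughly 49 to roughly 1,730 tokens/s as the batch grows from 1 to 256, while per-token latency stretches from about 21 ms to about 148 ms; an inference chip (and its scheduler) has to pick an operating point along that curve.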

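As for Challenge 2, inference runs in two structurally different stages, and a toy sketch helps show why they reward different hardware. The code below is a conceptual illustration only: forward is a hypothetical stand-in for a transformer forward pass, and the KV cache is a plain Python list rather than per-layer tensors:

```python
# Conceptual sketch of the prefill and decode stages of LLM inference.
# "forward" is a hypothetical stand-in; no real model is involved.

def forward(tokens, kv_cache):
    """Fake forward pass: caches one entry per input token and returns a
    placeholder next-token id. A real model would store K/V tensors per layer."""
    kv_cache.extend(tokens)
    return len(kv_cache) % 50000

prompt = list(range(1024))  # e.g., a 1,024-token prompt
kv_cache = []

# Prefill: the whole prompt is processed in one parallel pass. High
# arithmetic intensity, so this stage tends to be compute-bound.
next_token = forward(prompt, kv_cache)

# Decode: one token per step, and every step rereads the weights and the
# growing KV cache, so this stage tends to be memory-bandwidth-bound.
generated = []
for _ in range(128):
    next_token = forward([next_token], kv_cache)
    generated.append(next_token)

print(f"KV cache entries: {len(kv_cache)}, tokens generated: {len(generated)}")
```

Because the two stages stress such different parts of a chip, hardware tuned for one is rarely optimal for the other, which is exactly why specializing for prefill versus decode has become a deciding factor in inference-chip design.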