Latest Interview With Jensen Huang (Part 2): Why Doesn’t Nvidia Build Hyperscalers Itself?

ChainNewsAbmedia

In the second segment of Jensen Huang’s interview, he directly addressed the threat that TPUs and ASICs pose to NVIDIA. He emphasized that NVIDIA is building not a single AI chip but an accelerated computing platform, with the focus on integration across the entire ecosystem. As with the U.S.-China chip war, the AI race isn’t about winning or losing at a single point: what matters is whether the whole technology stack can grow stronger at the same time.



TPUs and ASICs pose real threats, but NVIDIA’s battlefield is bigger

Regarding the trend of Google TPU, AWS Trainium, and even large customers such as OpenAI and Anthropic developing in-house or adopting alternative accelerators, Jensen Huang did not take a defensive posture. Instead, he repeatedly brought the focus back to one point: “What NVIDIA is doing is not a single AI chip, but an accelerated computing platform.”

He emphasized that NVIDIA is building accelerated computing, not just tensor processing. AI is certainly one of the most important applications today, but GPUs and CUDA can handle far more than AI: molecular dynamics, quantum chromodynamics, data processing, fluid dynamics, particle physics, drug discovery, image generation, and all kinds of scientific computing. This naturally gives NVIDIA broader market coverage than ASICs designed for a single workload.

In response to the criticism, “Since the essence of AI is massive matrix multiplication, why not let a more specialized TPU-like architecture take the lead?” Jensen Huang’s response is:

Matrix multiplication is important, but it is not everything about AI. From new attention mechanisms, to hybrid SSM, diffusion and autoregressive fusion, to distributed execution of models and architectural innovation, progress in AI often comes from algorithmic breakthroughs, not just pushing forward the hardware via Moore’s Law.

He put it very directly: relying only on transistor scaling might yield an improvement of about 25% per year. But from Hopper to Blackwell, NVIDIA can achieve energy-efficiency leaps on the order of 35x, even 50x. That doesn’t come from process-node advances alone; it comes from coordinated co-design across models, algorithms, networks, memory, system architecture, and CUDA.
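The arithmetic behind this contrast can be sketched in a few lines. The generation gap and the 35x figure are taken from Huang’s claim as reported above; treating the gap as roughly two years is an assumption for illustration only:

```python
# Rough arithmetic behind Huang's claim: transistor scaling alone at
# ~25%/year compounds slowly, so a 35x generational leap must come
# mostly from cross-layer co-design. The 2-year gap is an assumption.
years_between_generations = 2
scaling_only = 1.25 ** years_between_generations  # compounding 25%/yr
claimed_leap = 35.0
codesign_share = claimed_leap / scaling_only      # residual attributable to co-design

print(f"process scaling alone: {scaling_only:.2f}x")          # ~1.56x
print(f"implied co-design contribution: {codesign_share:.1f}x")  # ~22x
```

Even stretching the gap to three years, scaling alone gives under 2x, leaving more than an order of magnitude to be explained by co-design of the rest of the stack.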

That’s why Jensen Huang describes NVIDIA as an “extreme co-design company.” It’s not just making GPUs. It’s making coordinated changes to the processor, interconnects, networking, libraries, algorithms, and the entire system. Without a highly programmable stack like CUDA, this kind of large cross-layer optimization is extremely difficult to achieve.

The value of CUDA: installed base, trust, and global universality

When the host questioned whether CUDA still has such a strong moat, given that big customers like OpenAI, Anthropic, Google, and AWS already know how to write their own kernels and optimize their own frameworks, Jensen Huang responded from three angles.

First, ecosystem completeness and reliability. NVIDIA can provide extensive low-level support for frameworks such as Triton, vLLM, and SGLang, so researchers build on a foundation that has already been thoroughly validated. For developers, the biggest fear isn’t making a mistake themselves; it’s being unable to tell whether a fault lies in their own code or in the underlying platform. Part of CUDA’s value is that it has been exercised so heavily that it can be trusted.

Second, massive installed base. Jensen Huang said plainly that if you are a framework or model developer, what you absolutely want is an installed base. You don’t want to write software only for your own use; you want it to run on as many machines as possible. From A10 and A100 to H100 and H200, from cloud and on-prem to robots and workstations, CUDA is everywhere. This installed-base foundation means that once you develop something, it can reach a large number of systems worldwide.

Third, universality across clouds and across scenarios. Jensen Huang pointed out that NVIDIA is one of the very few compute platforms that exists across all major cloud environments and on-prem setups at the same time. For AI companies, this means they don’t have to lock themselves into a single cloud provider too early, and they can deploy products more easily across different markets and scenarios.

In other words, CUDA’s value isn’t just that the “toolchain is convenient.” Instead, it combines ecosystem completeness, global installed base, and scenario universality to form a flywheel that is very hard to dislodge.

High gross margins aren’t a software tax: they come from “tokens produced per watt” and total cost of ownership

Faced with outside skepticism that NVIDIA maintains high gross margins largely because of CUDA’s dominance, and that those margins could erode if more customers gain the capability to write their own kernels and build alternative software stacks, Jensen Huang’s response was highly confident.

He noted that NVIDIA assigns a surprising number of engineering support staff to major AI labs, because GPUs aren’t as easy to manage as CPUs. Jensen Huang likened CPUs to Cadillacs: steady, easy to use, and anyone can get started with them. NVIDIA’s accelerators are more like F1 race cars: in theory anyone can drive them, but squeezing performance to the absolute limit requires extremely high professional skill.

NVIDIA also makes extensive use of AI assistance to generate and optimize kernels itself. As a result, when tuning alongside customers, it can often improve performance for a given model or stack by another 50%, 2x, or even 3x. For customers running large GPU fleets, that kind of optimization is almost equivalent to directly multiplying revenue.

Jensen Huang went further, arguing that NVIDIA’s platform offers the best performance per unit of total cost of ownership (TCO) in the world. He said no one has truly proven that TPU, Trainium, or any other platform beats NVIDIA on overall cost and performance, and the market lacks credible, publicly available head-to-head comparisons.

In his view, NVIDIA’s success fundamentally isn’t because customers are locked into CUDA. It’s because, for the same energy and the same capital expenditure, NVIDIA can produce the most tokens, which then convert into the most revenue. For customers building AI data centers at the 1 GW scale, the most important factor isn’t whether any single chip is cheap; it’s whether the entire data center can generate the maximum revenue. As long as NVIDIA remains the global best in tokens per watt and performance per dollar, the high gross margins are justified.
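The tokens-per-watt argument can be made concrete with back-of-the-envelope arithmetic. The figures below (efficiency in tokens per joule, token price) are purely hypothetical placeholders, not numbers from the interview; only the 1 GW scale comes from the text:

```python
# Sketch of the argument: for a power-limited data center, revenue
# scales with tokens produced per watt, so whole-system efficiency
# matters more than any single chip's price. Numbers are hypothetical.
SECONDS_PER_YEAR = 365 * 24 * 3600

def annual_revenue(power_watts, tokens_per_joule, usd_per_million_tokens):
    """Yearly token revenue of a data center capped by its power budget."""
    tokens_per_year = power_watts * tokens_per_joule * SECONDS_PER_YEAR
    return tokens_per_year / 1e6 * usd_per_million_tokens

# Two hypothetical platforms in the same 1 GW facility:
baseline = annual_revenue(1e9, tokens_per_joule=0.5, usd_per_million_tokens=1.0)
improved = annual_revenue(1e9, tokens_per_joule=1.0, usd_per_million_tokens=1.0)

# Doubling tokens per joule doubles revenue under the same power budget,
# regardless of what the individual chips cost.
print(f"baseline: ${baseline/1e9:.1f}B/yr, improved: ${improved/1e9:.1f}B/yr")
```

The point of the sketch is structural, not numerical: once power is the binding constraint, per-watt efficiency flows straight to the revenue line, which is the frame in which Huang defends the margins.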

Why isn’t NVIDIA turning itself into a hyperscaler?

Since NVIDIA has a lot of cash on hand—and has also been deeply involved in AI infrastructure and the model layer through investments in CoreWeave, Nebius, Nscale, and even OpenAI and Anthropic—why doesn’t it simply enter the space and become a cloud services provider itself?

Jensen Huang’s answer still comes back to that line: “Do the most of what is necessary, and the least of what is not necessary.”

If NVIDIA didn’t build CUDA, NVLink, CUDA-X, a variety of domain libraries, and the underlying platform, then these things likely wouldn’t get built by anyone at all—so NVIDIA has to build them itself. But if it’s about cloud services, there are already many providers in the world, and that doesn’t fall under the scope of “If we don’t do it, nobody will.”

However, when a new generation of AI cloud providers is still small and may need a helping hand to get off the ground, NVIDIA is willing to provide funding, supply, and technical support to help that ecosystem grow. In other words, NVIDIA is willing to nurture the ecosystem, but it doesn’t want to become a financier or a hyperscaler itself.

As for investing in model companies like OpenAI and Anthropic, Jensen Huang admits this is a lesson NVIDIA has learned in recent years. In the past, NVIDIA didn’t realize that foundation model companies like OpenAI and Anthropic could not, in their early stages, meet their capital requirements through the traditional VC model. Only after truly understanding this did it realize that, given the opportunity, it could have supported these efforts earlier.

He even admitted frankly that this counts as one of his own misjudgments: “At the time, I didn’t deeply understand that without support from large technology companies, or capital at a similar scale, these companies would find it extremely hard to get established.” Now that NVIDIA is larger, he said he wouldn’t make the same mistake again.

The China issue: the sharpest segment of the entire conversation

The most intense back-and-forth in the entire interview centers on China and chip export restrictions. The host’s stance is that AI compute is a direct input for training and deploying high-risk models: if China gains access to more advanced compute, it could more quickly build models capable of cyberattacks and vulnerability discovery, posing a real risk to U.S. national security and corporate security.

Jensen Huang doesn’t deny that AI carries risks, nor that the U.S. should continue to maintain its lead. But he strongly opposes the extreme conclusion that equates AI chips with nuclear weapons material, or the idea that selling even a little more will cause disaster.

His core arguments include several points.

First, he believes China is not a compute vacuum. It has abundant energy, chip manufacturing capacity, and communications and network infrastructure, as well as an enormous share of the world’s AI research talent. In Jensen Huang’s account, it isn’t that China “can’t develop AI without NVIDIA chips.” Rather, if it can’t get the best, it will use its own, and be forced to build a local technology stack faster.

Second, he believes a side effect of export restrictions is that they force China’s open-source models, ecosystem, and chip industry to diverge faster from the U.S. technology stack. In his view, that is the risk the U.S. should worry about more in the long run, because AI isn’t only about models: it includes the chip layer, the developer tools layer, the open-source ecosystem layer, the application layer, and the entire stack. If the U.S., in order to protect one specific layer (say, the most cutting-edge model companies), sacrifices the entire chip and developer ecosystem’s reach into the Chinese market, then over the long term it may actually lose its position in global standards and platform competition.

China is the world’s second-largest technology market and also one of the largest contributors to open-source software and open-source models. If the U.S. voluntarily gives up this market, it effectively pushes an entire group of developers toward another technology stack. This doesn’t just harm NVIDIA—it harms the entire U.S. technology industry and national security.

Third, he repeatedly emphasized that the world isn’t an endless series of zero-sum extrapolations to extremes. The U.S. should certainly have the most, the best, and the earliest compute power; he fully agrees with that. But that doesn’t mean the U.S. should proactively give up the world’s second-largest market, or describe AI as an absolute weapon akin to enriched uranium. For him, overly extreme narratives don’t help policy-making; they may also scare off talent, weaken industry confidence, and ultimately cost the U.S. its competitive advantage.

He even brought this back to the context of domestic industrial policy: “If the U.S., out of fear, over-weaponizes AI, it will also cause more people to resist investing in software, engineering, and related fields.” In his eyes, this fear-based policy is a “loser mentality,” not the posture a country should have when leading a technological revolution.

What Jensen Huang really wants to say is: “The AI race isn’t a single-point contest. It’s about whether the entire technology stack can grow stronger at the same time.”

This article, “Latest Interview With Jensen Huang (Part 2): Why Doesn’t Nvidia Build Hyperscalers Itself?”, first appeared on ChainNews ABMedia.
