This is vol.6 of @hodlongpod Picks, a column by @mablejiang that features Mandarin-language podcasts and interviews with executives of top Chinese AI / robotics companies. This article was created with a proprietary AI workflow based on the only three public interviews of Liang Wenfeng (founder of DeepSeek) in Chinese. Specifically, it draws on two exclusive interviews with Liang Wenfeng by Yu Lili for 36Kr's Waves in May 2023 and July 2024, and a detailed report by Cheng Manqi for LatePost in February 2026.
Liang refused funding, refused overtime, refused to go closed-source. In three years, he built an AI lab that rattled Silicon Valley. Now comes the hard part.
When DeepSeek released its V2 model in May 2024, the pricing structure of China's entire large language model industry collapsed within a week.
Inference cost: one yuan per million tokens. One-seventh the price of Meta's Llama 3 70B. One-seventieth of GPT-4 Turbo. The press called it "the Pinduoduo of AI" — a reference to the discount e-commerce giant that had upended Chinese retail. Within five days, Zhipu, ByteDance, Alibaba, Baidu, and Tencent had all slashed their own prices. A price war nobody had anticipated was underway.
The real shock, though, wasn't the price. It was the math behind it: DeepSeek was turning a profit. This wasn't a cash-burning subsidy play. The company had genuinely engineered costs down to a level that made competitors' business models look broken.
The man behind DeepSeek was Liang Wenfeng. A year earlier, he had been virtually invisible — a quiet technologist running a quantitative hedge fund. A year later, Anthropic co-founder Jack Clark was publicly declaring that DeepSeek had hired "profoundly talented individuals" and predicting that Chinese AI models would become a force as formidable as drones and electric vehicles.
A year after that, in early 2025, DeepSeek would become the most talked-about tech company in China — and, within another year, an organization being reshaped by the pressures of its own success.
This is a story about curiosity, original innovation, and technological idealism. It is also a story about what happens when idealism meets reality.
From a Chengdu Apartment to a Hundred-Billion-Yuan Fund
Liang Wenfeng was born in 1985 in a small town in Guangdong province. His father was an elementary school teacher. While studying AI at Zhejiang University, Liang became convinced that artificial intelligence would change the world. In 2008, almost nobody agreed.
After graduating, he bypassed the obvious path — no big tech company, no comfortable engineering salary. He rented a cheap apartment in Chengdu and spent years trying, and failing, to apply AI across various industries before finally landing on finance. High-Flyer, his quantitative hedge fund, was born. A small footnote: in those early years, a friend tried to recruit him into a drone startup operating out of an urban village in Shenzhen. Liang passed. The friend's company became DJI.
High-Flyer was an outlier in Chinese quant — nearly every major fund in the country had founders with Wall Street or European hedge fund pedigrees. High-Flyer was purely homegrown. By 2021, six years after its founding, it was managing over 100 billion yuan and was counted among the "Big Four" of Chinese quantitative finance.
In May 2023, Liang spun out his AI ambitions into an independent organization called DeepSeek, transferring over ten thousand Nvidia A100 GPUs and a team of AI engineers. He quoted Truffaut:
"Be fiercely ambitious, and be fiercely sincere."
His explanation for the move was characteristically blunt: "AGI may be one of the hardest things to come next. For us, this is a question of how to do it, not why."
The plan: no vertical industry tools, no ChatGPT clone, no applications. Pure research toward artificial general intelligence. No fundraising, no closed source.
"If you insist on finding a business justification," he conceded, "you probably can't. It doesn't pencil out."
The Architecture Revolution
What made DeepSeek V2 possible wasn't incremental optimization. It was a ground-up redesign of how a language model computes.
Since the Attention mechanism was introduced in 2017, it has served as the computational backbone of every modern language model, and the number of teams worldwide that have successfully altered it at scale can be counted on one hand. DeepSeek's MLA (Multi-head Latent Attention) is one such radical modification: it slashed attention's memory footprint to between 5 and 13 percent of what the standard architecture requires. DeepSeek's proprietary Mixture of Experts sparse structure compressed computation further. Together, these innovations produced the cliff-drop in costs that detonated the price war.
Silicon Valley's reaction bordered on disbelief. SemiAnalysis' chief analyst called the V2 paper "possibly the best of the year." Former OpenAI engineer Andrew Carr described it as "remarkably insightful."
Liang seemed genuinely caught off guard. "We never intended to be a catfish," he said, using the Chinese metaphor for a market disruptor. "We accidentally became one. We just calculated our costs and set a price. Our principle: no subsidies, no excessive profits."
But the deeper story wasn't the price war — it was the choice that made it possible. Most Chinese AI companies had followed a straightforward playbook: adopt Meta's Llama architecture, replicate, deploy, monetize. DeepSeek had chosen the harder road.
Why?
"Because the most important thing right now is to participate in the global wave of innovation," Liang said. "For years, Chinese companies got used to a division of labor: the West innovates, we commercialize. But that's not some natural law. Our starting point in this wave is not to cash in — it's to reach the technological frontier and advance the entire ecosystem."
His diagnosis of the China-US AI gap cut against the consensus: "People talk about a one-to-two-year gap. But the real gap is between originality and imitation. Until that changes, China remains permanently subordinate."
He situated this in a longer arc: "In thirty-plus years of IT, China has essentially missed out on genuine technological innovation. We got used to Moore's Law appearing on schedule, to chips and software improving every eighteen months. We treated Scaling Laws the same way. But these were created across generations by Western technical communities. We overlooked them simply because we weren't part of the process."
Nvidia's dominance, he argued, was not one company's achievement but the cumulative product of an entire ecosystem. China needed to build something comparable — and that meant people willing to work at the frontier, not just follow from behind.
No Profoundly Talented Individuals
When Jack Clark praised DeepSeek's "profoundly talented" team, Liang's response carried a quiet defiance.
"There are no profoundly talented individuals here. Fresh graduates from top universities. PhD students who haven't finished their dissertations. People a few years out of school. The V2 team had no overseas returnees — they're all homegrown. The world's top fifty AI talents may not be at Chinese companies, but maybe we can develop them ourselves."
This was the same hiring philosophy that had built High-Flyer: select for raw ability, not credentials. "Experienced people will tell you without thinking how something should be done," Liang explained.
"Inexperienced people will experiment, struggle, think hard, and find a solution that actually fits the current situation."
The MLA breakthrough was a case in point — a young researcher, studying the mainstream evolution of Attention architectures, conceived an alternative design. It took months of team-building and validation to make it work. This kind of distributed inspiration wasn't a management outcome; it was a product of organizational design.
"DeepSeek operates entirely bottom-up. We don't preset divisions. Everyone carries their own ideas — you don't need to push them. But when an idea shows potential, we allocate resources from the top."
How did a company with no fundraising and no public profile attract talent?
"Because we're working on the hardest problems. The greatest talent magnet is tackling the world's most difficult challenges."
His answer to the moat question was equally unconventional: "Our moat is in our team. Our people gain experience, accumulate know-how, and build an innovative organization and culture. That's what's real." He dismissed closed-source as a viable strategy — "Even OpenAI's secrecy can't prevent being surpassed" — and rejected the premise that existing business models applied to AI at all: "All business models are products of the previous era. Applying internet monetization logic to AI is like discussing General Electric and Coca-Cola when Pony Ma was starting Tencent — it might be a complete anachronism."
After R1 Went Viral
In early 2025, DeepSeek-R1 exploded into mainstream attention. Almost overnight, DeepSeek became the most watched tech company in China.
And then every tension that idealism had held at bay came rushing in at once.
According to a detailed February 2026 investigation by LatePost, DeepSeek found itself at what the publication called "a turning point." The company experienced its most concentrated wave of talent departures since its founding — not a panicked exodus, but a series of precise, targeted poaching operations.
Wang Bingxuan, a key author of DeepSeek LLM, was recruited by Tencent's Yao Shunyu. Wei Haoran, lead author of the DeepSeek-OCR series, left around the Spring Festival for what was likely a major tech company. Guo Daya, a core contributor to R1, recently resigned. Ruan Chong, a core contributor to the multimodal Janus-Pro research, joined autonomous driving company DeepRoute.ai in January.
Four core researchers departed within months. These were not interchangeable engineers — they were named authors on DeepSeek's most important product lines.
The competing offers were blunt instruments. Headhunters told LatePost that doubling or tripling salaries for DeepSeek researchers was "not a problem," and that some companies were extending eight-figure total compensation packages including stock and options. DeepSeek, which had never raised outside funding and had no established valuation, found itself at a structural disadvantage. As rival startups MiniMax and Zhipu marched toward IPOs with eye-catching valuations, and as Stepfun and Moonshot's public listings moved onto the horizon, DeepSeek team members began asking an uncomfortable question: what were their unpriced stock options actually worth?
LatePost noted that "some left, but more chose to stay" and that the team "did not experience group departures." But the damage from precision poaching was real — each named author who walked out the door took with them the deepest experience on a specific research line.
A Man Who Resists Noise
Through all of this, Liang's response was remarkably still.
Someone close to him described him to LatePost as "a person who is exceptionally resistant to noise." In early 2026 — with DeepSeek dominating national headlines, competitors raiding his roster, and the outside world pressing for V4 release dates — Liang continued doing what he had always done: reading papers, writing code, joining small-group technical discussions.
He was described as "more researcher than CEO," someone with "no CEO aura." When he talked to people, the conversation stayed on technical specifics, not strategy or fundraising or market positioning.
His management philosophy held: do fewer things, but do them to the extreme.
This showed everywhere. While developers at Google, OpenAI, xAI, and ByteDance routinely worked seventy- to eighty-hour weeks, most DeepSeek employees left the office by six or seven in the evening. No morning check-in. No explicit performance reviews. No deadlines.
Liang's reasoning was precise: "A person can rarely produce more than six to eight hours of high-quality output per day." But the deeper logic was economic. At DeepSeek, every experiment consumed expensive GPU time. A bad call made under fatigue could waste millions of yuan in compute. As Liang put it:
"Sloppy judgments from exhaustion waste precious computing resources. The cost outweighs any benefit."
The team remained strikingly small — just over a hundred researchers, no second-in-command, only two layers: Liang and everyone else. Sometimes a new research direction started simply because three or four people agreed an idea was promising and organically coalesced into a working group.
This organizational style had worked almost perfectly when DeepSeek was a small, obscure research outfit. But when it became the most scrutinized AI company in China, the same qualities that had made it distinctive began generating friction.
Two Unfashionable Bets
While the AI industry chased agents and multimodal generation, Liang staked DeepSeek's future on two less glamorous priorities.
The first was building large models on domestic Chinese chip ecosystems. As US export restrictions on high-end semiconductors continued to tighten, this was not merely a technical preference — it was a survival imperative. DeepSeek invested in adapting to domestic GPUs, adopted the Chinese open-source compiler TileLang as a replacement for Nvidia's Triton, and was already working on low-precision FP8 data formats designed for "next-generation domestic chip architectures."
The second was what Liang called "original-style innovation" — research lines that wouldn't generate revenue anytime soon but pointed toward a fundamental understanding of intelligence: the Janus series for unified multimodal architectures, the Prover series for formal mathematical verification, OCR research, continuous learning, autonomous learning. DeepSeek had even begun recruiting advisors with backgrounds in neuroscience and brain science, looking for learning mechanisms closer to how the human brain actually works.
Meanwhile, competitors were iterating at speed. After R1's release, LatePost reported, Zhipu shipped five model updates, MiniMax shipped four, and Moonshot's Kimi shipped three — all with targeted improvements to agent and coding capabilities. DeepSeek released V3.2 with agent enhancements, but its overall iteration cadence was visibly slower.
The application gap was starker still. OpenRouter data showed that among the top ten applications by token consumption over the previous thirty days, six came from Chinese companies — but DeepSeek-V3.2 ranked only twelfth.
Liang's stance on multimodal generation was characteristically contrarian: he believed it was "not the main line of intelligence," and so DeepSeek had "barely invested" in the direction. In an industry where most players treated multimodal as the next major battlefield, this was a conspicuously lonely position.
The Tension at the Core
This was the central contradiction DeepSeek faced in early 2026.
Liang's priorities were ecosystem building and original exploration — domestic chip adaptation, novel architectures, probing the nature of intelligence itself. The industry's default priority was simpler: ship the strongest model, stay on top of the leaderboard, repeat.
These goals overlapped, but they did not perfectly align.
DeepSeek's distinctive approach to AGI was its greatest asset and the primary source of the friction it now faced. When your competitors release a new model every month, when your core researchers are being courted with doubled salaries, when your team members are uncertain about their equity's value — "do fewer things, do them to the extreme" becomes a harder sentence to live by.
And DeepSeek confronted a constraint that made every choice sharper: relative to its peers, it simply did not have as much computing power to throw at multiple exploratory directions simultaneously. Every research bet was a real bet.
The V4 timeline reflected these pressures. According to LatePost, a smaller-parameter version of V4 had been provided to open-source framework communities for adaptation as early as January 2026. The original optimistic target was a large-parameter release around the Spring Festival in mid-February, but this was pushed to April. The prevailing external expectation was that V4 would "most likely still be the strongest open-source model, but unlikely to be dominantly so."
Signs of Recalibration
LatePost's reporting also captured subtle signals of change.
Liang — the man who had been dismissive of fundraising, who had reportedly demanded that potential investors accept a cap on returns — was now "beginning to find ways to put a valuation on the company and give team members more certainty about their expectations."
DeepSeek would "invest more in products." For the first time, its job postings mentioned specific product names. It began recruiting "model strategy product managers" focused on agents. LatePost's assessment: "Going forward, you will certainly see more moves from DeepSeek on agent products."
Did this mean Liang's technological idealism was compromising?
Perhaps. But perhaps it was simply evolving. The Liang Wenfeng of 2023 could afford to say "if you insist on finding a business justification, you probably can't" — back then, DeepSeek had a few dozen people, zero public attention, and no poaching pressure. The Liang Wenfeng of 2026 faced a fundamentally different landscape: the livelihoods and expectations of over a hundred researchers, an accelerating global AI race, the urgency of domestic chip independence, and an organization that could no longer be described simply as "curiosity-driven."
"We Just Need Facts and Process"
In his July 2024 interview, Liang had told a small story.
"I grew up in a small Guangdong town in the eighties. My father was an elementary school teacher. In the nineties, money-making opportunities were everywhere. Parents would come to the school and say there was no point in education. But look at how things have changed now."
It was a compressed history of modern China: a generation that had chased fast money because fast money was there to be chased. But the shortcuts were vanishing. And as people reckoned with the fact that the old windfalls had been partly luck, some were discovering an appetite for genuinely difficult things.
"Hardcore innovation will only grow," Liang said. "We just need facts and process."
By early 2026, DeepSeek stood at a delicate juncture. People were leaving; more were staying. External competition intensified; internal tensions surfaced. Every strategic choice Liang had made — no fundraising, no overtime culture, no closed source, no trend-chasing — looked, by conventional business logic, like a mistake. And now he appeared to be recalibrating some of those choices.
But return to that moment in May 2023. Return to Truffaut's words. The logic becomes clearer: be fiercely ambitious, and be fiercely sincere. Sincere enough to acknowledge the weight of reality. Sincere enough to adjust without betraying the core conviction.
Not everyone can sustain that kind of intensity forever. But as Liang once said:
"Most people, in their young years, can throw themselves into something with absolutely no ulterior motive."
DeepSeek is not yet three years old. It is still young. But it is growing up.










