Yao Shunyu: Let Me Go a Little Crazy! Training Models at Anthropic & Gemini, Heroism Is Over

Yao Shunyu, a researcher at Google DeepMind, discusses the unique top-down approach at Anthropic and the challenges other AI companies face in implementing similar strategies. He emphasizes that reliability, detail-orientation, and responsibility are crucial traits in the AI industry, more so than raw intelligence. Shunyu also shares his personal career journey from theoretical physics to AI, highlighting the differences between himself and another prominent Yao Shunyu in Silicon Valley.

Full Transcript (Bilingual)

https://www.youtube.com/watch?v=ttkd0t5qTD4
Translation: zh-CN

[00:00] English subtitles were generated by AI and are for reference only.
英文字幕由人工智能生成,仅供参考。

[00:09] Hello everyone, I'm Xiaojun.
大家好,我是小军。

[00:11] Today our guest is Yao Shunyu, a researcher at Google DeepMind.
今天我们的嘉宾是姚舜宇,一位来自Google DeepMind的研究员。

[00:14] There are two famous Yao Shunyus in Silicon Valley.
硅谷有两个著名的姚舜宇。

[00:16] One previously worked at OpenAI, then jumped ship to Tencent to become their Chief AI Scientist.
一位之前在OpenAI工作,然后跳槽到腾讯担任首席人工智能科学家。

[00:22] He's been on our show before.
他以前上过我们的节目。

[00:23] Today I've invited the other Yao Shunyu.
今天我邀请的是另一位姚舜宇。

[00:26] He was previously at Anthropic.
他之前在Anthropic。

[00:28] Now he's at Google DeepMind.
现在他在Google DeepMind。

[00:30] We'll start by talking about the recent series of massive model changes.
我们将从讨论最近一系列大规模模型变化开始。

[00:34] So next is my interview with Shunyu.
那么接下来是我对舜宇的采访。

[00:37] Anthropic as a company.
Anthropic这家公司。

[00:38] Being able to implement this kind of relatively top-down mechanism is something quite unique.
它能够实施这种相对自上而下的机制,这一点相当独特。

[00:43] But is this difficult for other model companies?
但这对其他模型公司来说是否困难?

[00:45] Very difficult. For example, OpenAI can't do it.
非常困难。例如,OpenAI做不到。

[00:47] And Gemini also finds it difficult.
Gemini也觉得困难。

[00:49] Big companies and startups.
大公司和初创公司。

[00:51] Their strategies are fundamentally different.
它们的策略根本不同。

[00:53] Because for startups, what's important is making bets.
因为对初创公司来说,重要的是下注。

[00:57] I have to bet on something.
我必须押注于某事。

[00:59] I think everyone right now is basically
我认为现在每个人基本上

[01:01] Everyone is a surfer.
每个人都是冲浪者。

[01:02] Fundamentally, it's the wave that matters.
根本上说,它是波浪。

[01:04] Not the surfer.
不是冲浪者。

[01:05] But anyway, it just feels like this AI thing doesn't really require much brains.
但不管怎样,感觉这个人工智能的东西并不需要太多脑子。

[01:09] Doesn't require much brains.
不需要太多脑子。

[01:10] Really doesn't require much brains.
真的不需要太多脑子。

[01:11] Then what does it require?
那么它需要什么呢?

[01:12] I think in this industry, the most important trait is being reliable.
我认为在这个行业,最重要的特质是可靠。

[01:17] Being detail-oriented.
注重细节。

[01:19] And taking responsibility for what you do.
并对你所做的事情负责。

[01:21] These are the most important traits.
这些是最重要的特质。

[01:28] Aren't there two Yao Shunyus in Silicon Valley?
硅谷难道有两个姚舜宇吗?

[01:29] Why don't you first introduce yourself to everyone and then explain to everyone the difference between the two Yao Shunyus?
你为什么不先向大家介绍一下你自己,然后再向大家解释一下两个姚舜宇的区别呢?

[01:35] Ah.
啊。

[01:35] Sure, yeah.
当然,是的。

[01:36] So my name is Yao Shunyu.
所以我的名字是姚舜宇。

[01:39] And obviously there's also a friend with an almost identical name (Yao Shunyu, Chief AI Scientist at Tencent, former OpenAI researcher).
而且显然还有一位名字几乎相同的(姚舜宇,腾讯首席人工智能科学家,前OpenAI研究员)朋友。

[01:44] And our main career paths also have some overlap.
而且我们的主要职业道路也有一些重叠(重叠)。

[01:46] So it might look very difficult to tell us apart.
所以可能看起来很难区分我们。

[01:48] Yeah, and I used to study physics.
是的,我以前学物理的。

[01:52] I did my undergrad at Tsinghua.
我在清华读的本科。

[01:55] I worked on condensed matter theory back then.
当时我研究的是凝聚态理论。

[01:57] Then later went to Stanford to do theoretical high-energy physics.
后来又去了斯坦福研究理论高能物理。

[02:01] And quantum information and black hole-related areas.
以及量子信息和黑洞相关领域。

[02:04] After leaving Stanford, went to Berkeley.
离开斯坦福后,去了伯克利。

[02:08] Briefly stayed for two weeks as a postdoc (postdoctoral researcher).
作为博士后(博士后研究员)短暂地待了两周。

[02:11] Then quit.
然后辞职了。

[02:13] And went to Anthropic.
然后去了Anthropic。

[02:14] Stayed at Anthropic for a year.
在Anthropic待了一年。

[02:17] Around late September to early October last year, joined Gemini.
大约在去年的九月末到十月初,加入了Gemini。

[02:21] Yeah, and if everyone insists on telling us apart.
是的,如果每个人都坚持要区分我们的话。

[02:24] I think the biggest difference is that Shunyu, he has always been doing CS from the start.
我认为最大的区别是,舜宇,他从一开始就一直在做计算机科学。

[02:29] Computer science-related stuff.
计算机科学相关的东西。

[02:30] While I actually, in a sense, came to this halfway.
而我实际上,在某种意义上,是半路出家的。

[02:34] Yeah, I mainly did theoretical physics before.
是的,我之前主要做理论物理。

[02:37] Yeah.
是的。

[02:38] Are you two good friends?
你们两个是好朋友吗?

[02:39] You two seem to have known each other since college.
你们看起来从大学就认识了。

[02:40] And you were in the same year, right? (Yes)
而且你们是同一年级的,对吧?(是的)

[02:42] What kind of person is he?
他是什么样的人?

[02:43] What kind of person are you?
你是什么样的人?

[02:43] Evaluate him.
评价他。

[02:44] Evaluate yourself too (hahaha).
也评价一下你自己吧(哈哈哈)。

[02:46] Yeah yeah, we knew each other since undergrad.
是的,是的,我们从本科就认识了。

[02:47] Because we were in the same year in undergrad.
因为我们在本科是同一年级的。

[02:49] At Tsinghua.
在清华。

[02:49] But he, of course, he studied computer science from the start.
但是他,当然,他从一开始就学的计算机科学。

[02:51] So he was in that Yao Class, the computer science experimental class.
所以他在那个姚班,计算机科学实验班。

[02:53] And I studied physics.
而我学的物理。

[02:55] So I was in the Ji Class.
所以我是在集班。

[02:56] Yeah, and later he went to Princeton.
是的,后来他去了普林斯顿。

[02:58] I went to Stanford.
我去了斯坦福。

[02:59] This might also be another somewhat puzzling point.
这可能也是另一个有点令人费解的点。

[03:02] Which is, it seems like people generally think Stanford is where computer science people should go.
也就是说,似乎在普遍的认知里,人们认为斯坦福是计算机科学专业人士应该去的地方。

[03:08] And think Princeton is where physics people should go.
而认为普林斯顿是物理学专业人士应该去的地方。

[03:10] But we happened to do the opposite.
但我们恰恰做了相反的事情。

[03:11] Haha.
哈哈。

[03:12] So that might also have caused some confusion.
所以这可能也引起了一些困惑。

[03:15] And we really are quite different.
我们确实非常不同。

[03:18] I think he's a much more interesting person than I am.
我认为他比我是一个更有趣的人。

[03:20] I think I've also learned a lot from him.
我认为我也从他那里学到了很多。

[03:22] In the past as well.
过去也是如此。

[03:23] I've been able to learn things that are quite different from my own strengths.
我能够学到一些与我自身优势截然不同的东西。

[03:26] For example, he probably spends a lot of time thinking.
例如,他可能花了很多时间思考。

[03:29] Like in AI.
比如在人工智能领域。

[03:29] He spends a lot of time thinking about.
他花了很多时间思考。

[03:31] Human-AI interaction.
人机交互。

[03:33] And also some product-related things.
还有一些与产品相关的事情。

[03:36] And I think, for me,
我想,对我来说,

[03:38] He's a very different kind of friend.
他是一种非常不同的朋友。

[03:40] And I've also learned a lot from him.
我也从他那里学到了很多。

[03:41] When you were in Silicon Valley,
当你在硅谷的时候,

[03:42] How often did you meet?
你们多久见一次面?

[03:43] Do you still call each other frequently now?
你们现在还经常联系吗?

[03:45] How frequently?
多久一次?

[03:47] We did meet quite frequently when we were in Silicon Valley.
我们在硅谷的时候确实经常见面。

[03:51] Maybe every few weeks.
也许每几周一次。

[03:53] But it seems like we mainly met just to hang out.
但似乎我们见面主要是为了闲逛。

[03:56] Hahaha.
哈哈哈。

[03:57] Doing what?
做什么?

[03:58] Well.
嗯。

[03:59] It was really just purely for fun.
那真的纯粹是为了好玩。

[04:00] Like going out for a walk.
比如出去散步。

[04:02] And chatting about random stuff.
然后聊些无关紧要的事情。

[04:04] And sometimes having a meal.
有时也会一起吃饭。

[04:06] Playing cards or something like that.
打牌之类的。

[04:07] Right, haha, right.
对,哈哈,对。

[04:08] And after he went back.
然后他回去之后。

[04:09] We actually still.
我们其实还。

[04:10] Often call each other.
经常联系。

[04:12] What did you talk about in the most recent call?
最近一次通话聊了什么?

[04:13] I think it was one or two weeks ago.
我想大概是一两周前吧。

[04:15] Ah.
啊。

[04:16] How did you know?
你怎么知道的?

[04:17] Uh, probably just.
呃,可能只是。

[04:19] Every few months.
每隔几个月。

[04:20] Then we catch up a bit.
然后我们聊聊近况。

[04:23] Share recent updates, yeah.
分享一下最近的动态,嗯。

[04:25] Has he tried multiple times to get you to join him?
他有没有多次尝试让你加入他?

[04:27] Uh.
呃。

[04:29] Hmm.
嗯。

[04:30] Ha.
哈。

[04:31] Maybe he does, I guess.
也许他有吧,我猜。

[04:32] But, but.
但是,但是。

[04:33] I don't think it matters.
我觉得这不重要。

[04:34] It doesn't matter, hahaha.
不重要,哈哈哈。

[04:35] Why don't you go?
你为什么不去呢?

[04:36] I think for myself.
我想我自己。

[04:37] I.
我。

[04:38] Haven't figured it out yet.
还没有想清楚。

[04:39] Yeah, I think it's mostly my own reasons.
是的,我想这主要是我自己的原因。

[04:41] And then.
然后。

[04:43] I didn't join any.
我没有加入任何。

[04:46] Chinese companies either.
中国公司。

[04:46] And I think the main reason is.
我想主要原因是因为。

[04:49] Around September or August-September last year.
去年九月左右,或者八九月。

[04:53] I think.
我想。

[04:54] When I left.
当我离开的时候。

[04:55] Left Anthropic.
离开了Anthropic。

[04:56] And when deciding where to go after leaving, my biggest motivation was.
在决定离开后去哪里时,我最大的动力是。

[05:01] I wanted to learn something different.
我想学点不一样的东西。

[05:03] Yeah, for me I probably didn't consider.
是的,对我来说,我可能没有考虑过。

[05:06] No, no.
不,不。

[05:07] More seriously consider being able to lead a project or something.
更认真地考虑能够领导一个项目或领导一个项目之类的。

[05:12] At that time I was more focused on prioritizing learning something new.
我当时更专注于优先学习新东西。

[05:16] So that's why I chose to go to Gemini, right?
所以这就是我选择去Gemini的原因,对吧?

[05:19] I noticed you two are always being compared and discussed together.
我注意到你们两个总是被比较和讨论在一起。

[05:22] Is it more of a bother or more enjoyable for you?
这对你来说是更麻烦还是更享受?

[05:24] I don't really feel anything about it.
我对此并没有什么特别的感觉。

[05:27] And because I'm not really someone who pays attention to social media.
而且因为我并不是一个关注社交媒体的人。

[05:31] So I really don't feel anything about it.
所以我真的对此没有什么感觉。

[05:34] Yeah.
是的。

[05:35] Because Shunyu, he said last year, AI has entered the second half.
因为Shunyu,他说去年,AI已经进入下半场。

[05:40] Entered the second half.
进入了下半场。

[05:41] This became a very famous viewpoint.
这成了一个非常著名的观点。

[05:44] What do you think of today's AI?
你如何看待今天的AI?

[05:45] What stage is it at?
它处于什么阶段?

[05:46] Can you give it a definition?
你能给它下一个定义吗?

[05:47] Yeah, for me, I might not see so clearly what the first half means, what the second half means.
是的,对我来说,我可能看不清楚上半场意味着什么,下半场意味着什么。

[05:54] Or rather, this definition has never been particularly clear to me.
或者说,这个定义对我来说从来没有特别清楚过。

[05:57] For me, AI has indeed entered a stage where I think everyone has started to worry less about one thing.
对我来说,AI确实进入了一个阶段,我认为每个人都开始不太担心一件事了。

[06:04] Whether AI can do it.
人工智能能否做到这一点。

[06:06] And more about whether the problem itself is well-defined.
以及关于问题本身是否定义良好。

[06:09] Yeah, I think this is a huge difference.
是的,我认为这是一个巨大的区别。

[06:11] For example, I think a year ago.
例如,我认为一年前。

[06:13] Or maybe early last year.
或者可能是去年年初。

[06:15] At that time.
那时。

[06:16] I was at Anthropic.
我在 Anthropic。

[06:17] And what everyone was worried about was like, 'Hey,'
而大家担心的是,‘嘿,’

[06:21] OpenAI's reasoning is so strong.
OpenAI 的推理能力如此强大。

[06:23] Do we have a chance to catch up?
我们有机会赶上吗?

[06:25] And how likely are we to surpass them?
我们有多大可能超越他们?

[06:28] Everyone was still very worried about this.
大家对此仍然非常担心。

[06:29] I think now, at least among.
我认为现在,至少在。

[06:31] At least among Gemini, OpenAI, and Anthropic, these three.
至少在 Gemini、OpenAI 和 Anthropic 这三者之间。

[06:34] I don't think any of them.
我不认为他们中的任何一个。

[06:35] Is really worried about not catching up.
真的担心赶不上。

[06:38] Mm-hmm.
嗯哼。

[06:38] And I think what might be harder for everyone now.
而且我认为现在对大家来说可能更难的是。

[06:40] Is.
是。

[06:42] Figuring out what to actually do.
弄清楚到底该做什么。

[06:44] This is something that.
这是某件事。

[06:46] I think is.
我认为是。

[06:47] Is a bet.
是一个赌注。

[06:48] But also.
但也是。

[06:50] I think it's also.
我认为它也是。

[06:51] Something that requires a lot of human insight.
需要大量人类洞察力的事情。

[06:51] Yeah.
是的。

[06:54] So that also means model capabilities have been leveled out.
所以这也意味着模型能力已经趋于平缓。

[06:58] Right?
对吧?

[06:58] They've become homogenized.
它们已经变得同质化。

[06:59] Commoditized.
商品化。

[07:00] So there's not a huge difference between the models.
所以模型之间没有巨大的差异。

[07:03] In terms of good versus bad, there's not a huge difference.
在好与坏方面,没有巨大的差异。

[07:06] But they need to differentiate.
但它们需要区分开来。

[07:08] I think from the actual user experience, you can feel the differences between these three companies' models.
我认为从实际用户体验来看,你可以感受到这三家公司模型之间的差异。

[07:15] But the hard part is that, in the past, you could see this difference on paper too.
但难点在于,过去你也可以在纸面上看到这种差异。

[07:20] What do you mean by 'on paper'?
你说的‘纸面上’是什么意思?

[07:22] 'On paper' means, like, there are many kinds of publicly available benchmarks, these standardized measurement frameworks.
‘纸面上’的意思是,公开可用的,有许多种基准测试,这些标准化的测量框架。

[07:26] And for example, people used to look at SWE-bench.
例如,人们过去会看 SWE-bench。

[07:29] Yeah, yeah, yeah, you could look at SWE-bench.
是的,是的,是的,你可以看 SWE-bench。

[07:31] And for math, back then people would compare things like AIME on the simpler end and IMO on the harder end.
至于数学,那时人们会比较像 AIME 这样的简单题目和像 IMO 这样的难题。

[07:36] Back then it felt like you could tell just from the numbers.
那时感觉你仅凭数字就能分辨出来。

[07:39] 'Hey, this model seems stronger at reasoning', 'that model seems stronger at coding', 'that model is stronger at this.'
‘嘿,这个模型在推理方面似乎更强’,‘那个模型在编码方面似乎更强’,‘那个模型在这一点上更强’。

[07:44] Now, on paper, everyone is actually pretty close.
现在,纸面上看,大家其实都差不多。

[07:47] And when you look at the numbers on paper, like SWE-bench, you'll find, 'Hey, it seems like the best is only maybe one or two percentage points better than the not-so-good ones, and actually everyone is around 80%.'
当你查看纸面上的数字,比如看 SWE-bench 时,你会发现,‘嘿,似乎最好的只比不太好的高一到两个百分点,但实际上大家都在 80% 左右。

[07:55] A slightly higher number around there or a slightly lower one is mostly just noise.
稍高或稍低的数字大多只是噪音。

[08:02] It's mainly just noise rather than signal.
这主要是噪音而不是信号。
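
(Editor's note: a quick way to see why one or two points around 80% is mostly noise. Treating each benchmark task as an independent pass/fail trial, the standard error of a pass rate on a suite of roughly 500 tasks (the size assumed here, in the ballpark of SWE-bench Verified) is already close to two percentage points. A minimal sketch in Python:)

```python
import math

def pass_rate_stderr(pass_rate: float, n_tasks: int) -> float:
    """Standard error of a benchmark pass rate, treating each task
    as an independent Bernoulli (pass/fail) trial."""
    return math.sqrt(pass_rate * (1.0 - pass_rate) / n_tasks)

# Assumption for illustration: a ~500-task suite scored around 80%.
se = pass_rate_stderr(0.80, 500)
print(f"standard error: {se:.3f}")           # ~0.018, i.e. ~1.8 points
print(f"95% interval:  +/-{1.96 * se:.3f}")  # ~+/-3.5 points
```

(So on a single run, two models at, say, 80% and 81.5% are statistically indistinguishable, which is the "noise rather than signal" point above.)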

[08:03] Yeah. But on the other hand,
是的。但另一方面,

[08:06] In actual usage, people can still experience the differences.
在实际使用中,人们仍然可以体验到差异。

[08:08] I think, mm-hmm.
我认为,嗯哼。

[08:11] From what I personally know, Claude is still the most general-purpose and best-performing one as this kind of tool-using agent.
据我个人所知,在作为这种使用工具的智能体方面,Claude仍然是通用性更强、表现最好的一个。

[08:22] And in pure coding, maybe Codex has caught up a bit recently, narrowing the gap a little.
而在纯粹的编码方面,Codex最近可能已经迎头赶上了一点,缩小了差距。

[08:30] And Gemini might be better at pure reasoning and in some more everyday usage scenarios it might still be better for now.
而Gemini可能在纯粹的推理方面更胜一筹,并且在一些更日常的使用场景中,目前它可能仍然更好。

[08:40] And then in coding and agents, it's still in a state of catching up.
而在编码和智能体方面,它仍然处于追赶状态。

[08:45] Mm-hmm.
嗯哼。

[08:45] These capabilities—are they deliberately choosing which direction to prioritize or is it simply a matter of good versus bad?
这些能力——是他们在刻意选择优先哪个方向,还是仅仅是好与坏的问题?

[08:52] Is it a capability issue or a prioritization issue?
是能力问题还是优先事项问题?

[08:55] I think there is actually an element of prioritization involved.
我认为实际上存在优先事项的考量。

[08:59] Especially in the past, it was mainly about prioritization.
尤其是在过去,这主要是关于优先事项的。

[09:01] When everyone could see the differences on paper,
当每个人都能在纸面上看到差异时,

[09:06] Prioritization was definitely the dominant factor.
优先排序无疑是主导因素。

[09:09] Because maybe like Claude has always valued this tool-use capability more.
因为也许像 Claude 一直更看重这种工具使用能力。

[09:16] And including coding.
包括编码在内。

[09:17] So maybe OpenAI also placed a lot of emphasis on reasoning for a while.
所以也许 OpenAI 在一段时间内也很重视推理。

[09:20] Yeah, and of course now they're starting to focus on coding too.
是的,当然现在他们也开始关注编码了。

[09:21] So back then, prioritization definitely accounted for most of it.
所以当时,优先排序无疑占了大部分原因。

[09:23] Because if you're more willing to prioritize something, it means you can spend more effort building the right infrastructure.
因为如果你更愿意优先考虑某件事,就意味着你可以投入更多精力来构建正确的基础设施。

[09:30] The right infrastructure, building the right data; and data especially is something that, in a sense, takes a lot of time and effort.
正确的基础设施,构建正确的数据,尤其是数据,从某种意义上说,需要花费大量的时间和精力。

[09:38] Right, so back then, it was definitely driven by willingness.
对,所以当时,确实是出于意愿驱动的。

[09:41] But at this point, I think both factors are at play.
但现在,我认为这两个因素都在起作用。

[09:47] Because, well, on paper everyone looks pretty similar, and even if you do some more internal testing, the numbers aren't that different.
因为,嗯,纸面上看,每个人看起来都差不多,即使你做一些更内部的测试,数字也没有太大区别。

[09:59] And then the harder thing becomes how you define your problem, define the behavior you want.
然后更难的事情就变成了你如何定义你的问题,定义你想要的行为。

[10:08] And when the behavior isn't defined very clearly,
定义得不是很清楚。

[10:09] a lot of the model differences actually come from things
很多模型差异实际上来自于一些

[10:13] that you wouldn't even imagine.
你甚至都想不到的事情。

[10:15] right?
对吧?

[10:16] by 'things you wouldn't imagine,' I mean
我说的“你想象不到的事情”是指

[10:17] I mean, things you wouldn't imagine.
我指的是你想象不到的事情。

[10:21] if you ask me now,
如果你现在问我,

[10:23] it's hard for me to give you a very clear answer.
我很难给你一个非常清晰的答案。

[10:25] maybe after some time, looking back,
也许过一段时间,回过头来看,

[10:27] I'll be able to give a clear answer.
我将能够给出一个清晰的答案。

[10:28] but I can give an example
但我可以举一个例子

[10:31] of something you wouldn't imagine.
说明你想象不到的事情。

[10:33] like, if we go back
比如,如果我们回到

[10:36] maybe one, two, or even three years
也许一、两年前,甚至三年前

[10:38] back then, if you went online
那时,如果你上网

[10:42] to collect pre-training data,
收集预训练数据,

[10:44] you'd see models learning to write code.
你会看到模型在学习写代码。

[10:48] of course, there wasn't this agentic way of writing code back then.
当然,那时还没有这种代理式写代码的方式。

[10:50] it was just writing a piece of code (mm-hmm).
它只是写一段代码(嗯哼)。

[10:51] and you'd find that
你会发现

[10:52] models wrote code very well.
模型写代码写得非常好。

[10:54] but back then, people didn't know why.
但那时,人们不知道为什么。

[10:56] but the unexpected reason behind this might be
但其背后出乎意料的原因可能是

[10:58] if you just randomly collect from the web
如果你只是从网上随机收集

[10:59] without any data filtering,
没有任何数据过滤,

[11:03] naturally,
自然地,

[11:04] the quality of code data would be a bit higher than others.
代码数据的质量会比其他数据略高一些。

[11:07] because if you look at web pages
因为如果你看网页

[11:08] You'll find GitHub's quality is significantly higher than other normal web pages.
你会发现GitHub的质量明显高于其他普通网页。
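
(Editor's note: "data filtering" here refers to the heuristic quality scoring that pre-training pipelines apply to raw web text. Below is a toy illustration of the idea only; the features and weights are invented for this sketch, not any lab's actual filter:)

```python
def doc_quality_score(text: str) -> float:
    """Toy heuristic quality score for a crawled web document.
    Real pipelines use trained classifiers; the features and weights
    here are invented purely to illustrate the idea."""
    lines = text.splitlines() or [""]
    avg_line_len = sum(len(line) for line in lines) / len(lines)
    alnum_ratio = sum(c.isalnum() for c in text) / max(len(text), 1)
    boilerplate_hits = sum(
        kw in text.lower() for kw in ("click here", "subscribe now", "accept cookies")
    )
    # Text-dense documents score up; ad/navigation phrases score down.
    return alnum_ratio * min(avg_line_len / 80.0, 1.0) - 0.2 * boilerplate_hits
```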

[11:13] Before we get into today's topic, I'd like to talk about some recent news about our models.
在我们进入今天的话题之前,我想谈谈我们模型的一些最新消息。

[11:18] You see, everyone's been talking about OpenClaw recently.
你看,最近大家都在谈论OpenClaw。

[11:22] As a frontline researcher, what do you think of this new product form?
作为一名一线研究员,你如何看待这种新产品形式?

[11:25] What discussions are happening around you?
你周围正在发生哪些讨论?

[11:27] What's interesting is, I feel like the discussion outside the industry seems more intense than inside the industry.
有趣的是,我觉得行业外的讨论似乎比行业内的讨论更激烈。

[11:36] Oh, no one inside the industry is talking about it?
哦,行业内没有人谈论它吗?

[11:38] People inside are talking about it, but I think for industry insiders, it's not really, um, a particularly surprising thing.
业内人士在谈论它,但我认为对于业内人士来说,这并不是一件特别令人惊讶的事情。

[11:45] Oh, what do you mean?
哦,你是什么意思?

[11:46] Like, maybe inside the company, some people have already done similar experiments or demos like this.
比如,可能公司内部已经有人做过类似的实验或演示了。

[11:53] It's just that it wasn't packaged as a product and seriously marketed, polished and launched.
只是它没有被包装成产品并认真营销、打磨和发布。

[12:00] Right, and of course, the reality is, if you look at the earliest version of the OpenClaw code on GitHub, that code was, in a sense,
是的,当然,现实情况是,如果你看看GitHub上OpenClaw最早版本的代码,实际上,那段代码在某种意义上。

[12:08] not particularly clean.
不是特别干净。

[12:10] but I think what's important is it showed everyone this possibility.
但我觉得重要的是它向所有人展示了这种可能性。

[12:15] mm-hmm, and after showing this possibility, the OpenClaw author himself joined OpenAI.
嗯哼,在展示了这种可能性之后,OpenClaw的作者本人加入了OpenAI。

[12:22] and then probably these model labs or some larger startups will catch up quickly and polish this into a truly usable product.
然后,这些模型实验室或一些更大的初创公司将迅速跟进,并将其打磨成一个真正可用的产品。

[12:31] mm-hmm (right), so I understand.
嗯哼(对),我明白了。

[12:32] actually, before OpenClaw was released, people at Google were already working on this.
实际上,在OpenClaw发布之前,谷歌的人们就已经在研究这个了。

[12:35] it just hadn't been released yet.
只是还没有发布而已。

[12:36] because big companies have longer processes.
因为大公司有更长的流程。

[12:39] right, my, my, my.
对,我的,我的,我的。

[12:40] at least personally, that's the impression I've gotten.
至少就我个人而言,我得到了这样的印象。

[12:45] What we're seeing is exactly that.
我们所看到的正是如此。

[12:46] Right.
对。

[12:46] So behind this product form, similar to OpenClaw, what does that inherently tell us?
那么,在这种产品形式背后,类似于OpenClaw,这本身告诉我们什么?

[12:50] At this point in early this year, I think, actually, technically speaking, it doesn't really prove much.
在今年年初的这个时间点,我认为,实际上,从技术上讲,它并没有真正证明多少东西。

[12:59] I mean, this OpenClaw product, of course it relies on many things the model can do, but those capabilities didn't actually only become ready early this year.
我的意思是,这个OpenClaw产品,当然它依赖于模型能做的很多事情,但这些能力实际上并不是在今年年初才准备好的。

[13:10] I think maybe last year, like when Opus 4.5 (Claude series) was released, and then, and then...
我想也许是去年,就像 Opus 发布 4.5(Claude 系列)的时候,然后,然后……

[13:17] of course back then, Opus was actually ahead of OpenAI and Gemini 3 in terms of tool use capabilities.
当然,当时 Opus 在工具使用能力方面实际上领先于 OpenAI 和 Gemini 3。

[13:22] So I think at that point, doing this thing, it was already something you could demonstrate.
所以我觉得在那一点上,做这件事,已经是可以展示的东西了。

[13:26] And actually, it didn't blow up immediately upon release.
而且实际上,它发布时并没有立即爆火。

[13:29] It only went viral some time after the launch.
它是在发布后一段时间才开始走红的。

[13:32] Hmm.
嗯。

[13:32] So, for me personally, technically it's not really something so surprising.
所以,对我个人而言,技术上来说并没有什么特别令人惊讶的。

[13:42] It's a natural overflow of model capabilities.
这是模型能力的一种自然溢出。

[13:43] Right, right, right, I'd say so.
对,对,对,我倒是这么认为。

[13:45] But I think the surprise for everyone might be that perhaps nobody had realized this before.
但我觉得让大家感到惊讶的可能是,也许以前没有人意识到这一点。

[13:50] It made everyone realize this could actually be done.
它让每个人都意识到这实际上是可以做到的。

[13:51] Realize what?
意识到什么?

[13:52] Realized that you can, like, let the model do very...
意识到你可以,比如,让模型做非常……

[13:55] I mean, you can control many different models and do many different things, and then aggregate all of that, and after aggregating, do this kind of very, very, very long-horizon task.
我的意思是,你可以控制许多不同的模型,做许多不同的事情,然后将所有这些聚合起来,聚合之后,完成这种非常、非常、非常长周期的任务。

[14:02] This kind of work.
这种工作。

[14:03] I think maybe previously, people hadn't widely reached a consensus on this.
我想也许以前,人们并没有在这方面达成广泛的共识。

[14:08] This thing showed everyone this kind of possibility.
这件事向大家展示了这种可能性。

[14:13] You see, what went viral early last year was Manus,
你看,去年年初走红的是Manus,

[14:15] and what went viral early this year is OpenClaw.
今年年初走红的是OpenClaw。

[14:16] So from Manus to OpenClaw,
所以从Manus到OpenClaw,

[14:18] what changed?
有什么变化?

[14:19] Is it a change in model capabilities,
是模型能力的变化,

[14:19] or a change in the product?
还是产品本身的变化?

[14:20] This is also something I've never really understood.
这也是我一直没太搞懂的。

[14:23] Hmm.
嗯。

[14:23] Like,
比如,

[14:27] What is the qualitative difference between Manus and OpenClaw?
Manus和OpenClaw之间,质的区别是什么?

[14:32] It's something
这是

[14:32] I actually haven't quite figured out myself.
我自己其实还没完全搞清楚的。

[14:34] To be honest, haha.
说实话,哈哈。

[14:36] OK.
好的。

[14:37] Hmm, like,
嗯,比如,

[14:38] or in other words,
或者换句话说,

[14:39] maybe OpenClaw went viral,
也许OpenClaw火了,

[14:44] but if you were to ask me retroactively,
但如果让我回过头来看,

[14:46] why Manus couldn't do this,
为什么Manus做不到,

[14:49] I don't understand why Manus couldn't do it.
我也不理解Manus为什么做不到。

[14:51] Maybe they just didn't get it right.
可能就是没做好吧。

[14:54] But you see,
但你看,

[14:55] whether it's Manus or OpenClaw,
无论是Manus还是OpenClaw,

[14:56] they both chose to sell.
它们都选择了出售。

[14:57] Manus was sold to Meta (Note: This acquisition has since been revoked; our program was recorded before the revocation).
Manus被卖给了Meta(注意:此收购后来被撤销;我们的节目录制于撤销之前)。

[14:58] OpenClaw was sold to OpenAI.
OpenClaw被卖给了OpenAI。

[14:59] What does this phenomenon tell us?
这个现象说明了什么?

[15:01] Why did they both sell?
为什么它们都卖了?

[15:03] I think, hmm,
我觉得,嗯,

[15:04] my own feeling is that for something
我自己的感觉是,要让某个东西

[15:08] to survive long-term,
能够长期生存下去,

[15:11] it still needs to have some moats.
它仍然需要有一些护城河。

[15:14] The moat is the model.
护城河就是模型。

[15:15] I think at least for now, many moats are on the model side.
我认为至少目前,许多护城河都在模型方面。

[15:21] But whether product-side moats will emerge in the future, I think that's hard to say.
但未来是否会出现产品方面的护城河,我认为很难说。

[15:24] Because everyone... This is all an age-old topic in the market.
因为每个人……这在市场上是一个老生常谈的话题。

[15:29] Many people talk about this. Things like data flywheels and such.
很多人都在谈论这个。诸如数据飞轮之类的东西。

[15:33] For now, I don't think there's any scenario that has truly formed a data flywheel.
目前,我认为还没有任何真正形成数据飞轮的场景。

[15:39] Even purely AI-native application scenarios.
即使是纯粹的AI原生应用场景。

[15:43] I think currently, besides agentic coding, other than writing code, there's no scenario that is truly AI-native.
我认为目前,除了代理编码,除了编写代码,没有真正AI原生的场景。

[15:50] Chatbots became hugely successful because in a sense chatbots are actually an extension of search
取得了巨大的成功,因为从某种意义上说,聊天机器人实际上是搜索的延伸。

[15:57] A chatbot is an extension of search
聊天机器人是搜索的延伸。

[15:59] Right, that's why it's not independent of search
对,所以它不独立于搜索。

[16:01] It's because, think about it, the most common way people interact with chatbots is
这是因为,因为想想看,人们与聊天机器人互动最常见的方式是

[16:08] They have a question, and they ask the chatbot
我有一个问题,然后他们问聊天机器人。

[16:11] and that's essentially what search has always done
这本质上是搜索一直以来所做的。

[16:13] But what it offers
但它提供的是

[16:14] something far better than search is that it becomes very interactive.
比搜索好得多的东西是它变得非常交互(交互的)。

[16:19] It has interactivity.
它具有交互性。

[16:20] You can ask follow-up questions and it can even help you summarize some of the information you get through it.
你可以问后续问题,它甚至可以帮助你总结你从中获得的一些信息。

[16:26] helping you distill it into a condensed answer to your question.
帮助你将其提炼成你问题的精炼答案。

[16:30] Right, this is something search could never give you before.
对,这是搜索以前永远无法给你的东西。

[16:32] Mm-hmm (right).
嗯哼(对)。

[16:33] But of course it's not exactly the same need.
但当然它不完全是相同的需求。

[16:37] But in terms of demand, from a broad perspective, it's fairly similar to the demand that existed before.
但从广泛需求的角度来看,它的需求与以前的需求相当相似。

[16:42] Manus and OpenClaw.
Manus 和 OpenClaw。

[16:44] I think they're the most famous wrappers right now.
我认为它们是目前最著名的包装器。

[16:46] But wrappers ended up being sold to model companies (Note: Meta's acquisition of Manus was later reversed; our show was recorded before the reversal).
但包装器最终被卖给了模型公司(注意:Meta 对 Manus 的收购后来被撤销;我们的节目是在撤销之前录制的)。

[16:48] Doesn't that show that wrappers still can't escape the grip of model companies?
这难道不能说明包装器仍然无法摆脱模型公司的控制吗?

[16:52] The escape velocity isn't enough.
逃逸速度不够。

[16:53] It's not fast enough, is it?
它不够快,不是吗?

[16:55] I think.
我认为。

[16:58] I think for wrappers to survive in the current environment there are two approaches I can roughly imagine.
我认为包装器要在当前环境中生存,我大致可以想象两种方法。

[17:02] One approach is what you just said.
一种方法就像你刚才说的。

[17:09] Escape fast enough.
足够快地逃离。

[17:11] That is, my growth is fast enough that by the time model companies catch on,
也就是说,我的增长足够快,以至于模型公司能够跟上。

[17:16] I've already captured significant user mindshare.
我已经获得了重要的用户关注。

[17:19] And when model companies catch up to your product,
当模型公司赶上你的产品时,

[17:23] by that time,
届时,

[17:23] I've already evolved my own model.
我已经进化了自己的模型。

[17:25] I think Cursor is trying to take this path (mm-hmm).
我认为 Cursor 正在尝试走这条路(嗯哼)。

[17:29] So Cursor, in this AI-native scenario,
所以 Cursor 在这个 AI 原生场景下,

[17:33] is pretty much the fastest-growing startup I can think of.
几乎是我能想到的增长最快的初创公司。

[17:36] Even a company like that is feeling a strong sense of crisis right now.
即使是这样的公司,现在也感到强烈的危机感。

[17:40] How strong is that sense of crisis?
这种危机感有多强烈?

[17:42] Anyway, my feeling is that for Cursor,
总之,我的感觉是,对于 Cursor 来说,

[17:46] its relationship with Anthropic right now has entered a very delicate phase.
它与 Anthropic 目前的关系已经进入了一个非常微妙的阶段。

[17:50] It's like they used to be close, seamless partners.
就像他们曾经是亲密无间的合作伙伴一样。

[17:55] Anthropic provided the model, Cursor provided the product.
Anthropic 提供了模型,Cursor 提供了产品。

[17:57] Later Anthropic developed Claude Code itself.
后来 Anthropic 自己开发了 Claude Code。

[17:59] Claude Code has become very successful.
Claude Code 已经非常成功了。

[18:01] And then Cursor is now trying to build its own model.
然后 Cursor 现在正试图构建自己的模型。

[18:03] So Cursor is working hard training its Composer.
所以 Cursor 正在努力训练它的 Composer。

[18:08] So I don't even think we need to talk about the future.
所以,我认为我们甚至不需要谈论未来。

[18:11] It's already happening right now.
它现在已经发生了。

[18:11] They're already in a fairly competitive relationship (mm-hmm).
他们已经处于一种相当竞争的关系中(嗯哼)。

[18:14] If they lose in this competition
如果他们在这场竞争中失败了,

[18:17] I think it would be quite problematic.
我认为这会相当有问题。

[18:18] Because when it comes to coding, at its core it's essentially a professional need serving professional users.
因为说到编码,它的核心本质上是服务于专业用户的专业需求。

[18:25] It's a productivity tool.
它是一个生产力工具。

[18:26] A common scenario with productivity tools is winner takes all.
生产力工具的一个常见场景是赢家通吃。

[18:32] I think this applies whether to Cursor or to Anthropic or for any company doing coding.
我认为这适用于 Cursor、Anthropic 或任何从事编码的公司。

[18:38] It's probably something they're all quite worried about.
这可能是他们都相当担心的事情。

[18:40] Mm-hmm (right).
嗯哼(对)。

[18:41] So that's what I was saying.
所以这就是我刚才说的。

[18:42] That's one path.
这是一条路。

[18:43] (It has to be fast) That is, you grow fast enough.
(它必须快)也就是说,你成长得足够快。

[18:46] You grow like crazy before anyone even thinks about acquiring you.
在你被收购之前,你就会疯狂地增长。

[18:49] Just grow wildly.
只是疯狂地增长。

[18:50] By the time they want to acquire you, you're big enough.
到他们想收购你的时候,你已经足够大了。

[18:53] Another way is for the market to be small enough so small that model companies can't even be bothered.
另一种方式是市场足够小,小到模型公司甚至懒得去管。

[18:58] I think Midjourney is exactly that example.
我认为 Midjourney 就是这样的例子。

[19:01] That's it.
就是这样。

[19:02] The market is so small that perhaps even though you could say Gemini could make an effort to replicate what Midjourney does, it might take some effort, some money, some data to pull it off, but it's small enough.
市场太小了,以至于也许尽管你可以说 Gemini 可以努力复制 Midjourney 所做的事情,但这可能需要一些努力、一些金钱、一些数据才能实现,但它足够小。

[19:13] To the point where Gemini probably wouldn't want to spend much time on that.
以至于 Gemini 可能不想在这上面花太多时间。

[19:17] It's beneath them.
这在他们之下。

[19:18] Right, haha.
对,哈哈。

[19:20] I think that might also be a way to survive.
我认为那也可能是一种生存方式。

[19:22] Yeah.
是的。

[19:24] So even Cursor hasn't escaped the model's grasp today.
所以即使是Cursor今天也没有逃脱模型的掌控。

[19:29] Has anyone successfully escaped?
有人成功逃脱了吗?

[19:31] For the big ones, I haven't seen any so far.
对于大的来说,我到目前为止还没有看到任何一个。

[19:35] For smaller ones, maybe Midjourney is an example.
对于小的来说,也许Midjourney是一个例子。

[19:37] Of course there must be other examples.
当然肯定还有其他的例子。

[19:38] I just haven't seen them yet.
我只是还没有看到它们。

[19:39] Right, smaller ones.
对,小的。

[19:40] I think there will be.
我认为会有。

[19:40] There will be examples.
会有例子的。

[19:41] Does Lovart count?
Lovart算吗?

[19:43] I think they have a shot.
我认为他们有机会。

[19:46] They have a shot.
他们有机会。

[19:49] Anyway, you can't do the general-purpose thing.
总之,你不能做通用型的事情。

[19:55] I think this is something the founder has to decide.
我认为这是创始人必须决定的事情。

[19:59] Whether you want to bet on something big with a one-in-ten-thousand chance of survival and swing for the fences.
是想押注于一个生存几率为万分之一的大事物,并全力以赴。

[20:05] Or go with a one-percent chance of survival and lock down something small first.
还是选择一个百分之一的生存几率,先锁定一些小的。

[20:11] If it were you.
如果是我。

[20:11] What would you choose?
你会怎么选?

[20:13] Hahahaha.
哈哈哈。

[20:16] If it were me.
如果是我。

[20:16] Deep down I'd definitely want to swing for the fences.
内心深处我肯定想全力以赴。

[20:18] But honestly

[20:20] I genuinely think

[20:22] You can't get there overnight

[20:26] So if it were me

[20:26] I'd choose to secure a small win first

[20:28] But I'd pick a small one with huge upside potential

[20:33] Why do you think OpenAI acquired OpenClaw?

[20:35] Why did Meta want to buy Manus (Note: Meta's acquisition of Manus was later revoked; our show was recorded before the revocation)

[20:37] Why doesn't Google acquire anyone?

[20:39] Oh, Google did acquire someone

[20:40] Google bought the Windsurf team

[20:42] Okay, Windsurf

[20:44] Yeah

[20:46] I don't get it

[20:47] Haha

[20:49] What do you mean you don't get it?

[20:50] Honestly, it's just that

[20:51] I don't get it

[20:52] I think

[20:55] I think Meta's acquisition of Manus

[21:00] I think for them

[21:01] The biggest benefit was

[21:04] If

[21:04] aside from how much they spent

[21:06] The biggest benefit was gaining a really strong

[21:09] product team in Asia

[21:12] What does being in Asia signify?

[21:13] Because

[21:14] I think on one hand

[21:17] Obviously everyone knows

[21:19] China's AI talent pool is still quite deep

[21:22] Although perhaps currently in terms of technology

[21:25] Purely from a technical standpoint

[21:26] Chinese AI hasn't really caught up with the US yet

[21:30] But

[21:31] Obviously there are many talented AI people in China

[21:33] Whether in pure technology or in product

[21:36] In terms of product, I think China essentially

[21:39] has better talent than the US

[21:40] Right, so for them

[21:42] I think Manus became a

[21:44] foothold in Singapore

[21:46] So they can attract some

[21:48] For example, from China

[21:48] Or from Singapore or East Asian talent

[21:52] And I actually haven't fully figured out

[21:58] How important this product itself is to Meta

[22:02] Or in other words

[22:03] Why couldn't Meta just build this product themselves?

[22:05] But whether it's Manus or OpenClaw

[22:07] They were in fact born from outside teams

[22:10] Why

[22:10] weren't they built by this group of Silicon Valley researchers?

[22:13] Have you thought about that?

[22:14] Yeah, I think

[22:16] Hmm, for me this question

[22:17] Actually

[22:20] I think once a company gets big

[22:23] Its burden gets bigger too

[22:25] Like, I might be a researcher

[22:30] and we can build something really

[22:33] interesting-looking

[22:34] very distinctive products

[22:36] But once I make that product public

[22:39] There's a ton of responsibility that comes with it

[22:41] First, you can't just launch this product

[22:43] and tell all your users

[22:46] You need to go buy another computer to do this

[22:48] Otherwise it might gain access to everything on your computer

[22:50] All the permissions—

[22:51] and crash your system

[22:52] Mm-hmm

[22:52] So for a big company

[22:54] Take Google, for example

[22:54] Google would never release a product like this

[22:56] Right? Mm-hmm

[22:57] So it takes a lot of time to polish the product

[22:59] And you have to make sure

[23:01] there are no legal risks

[23:03] and that it won't damage your brand with users

[23:07] Plus,

[23:08] if you ship it

[23:10] you probably have to allocate

[23:12] some relatively fixed resources

[23:13] to serve this model

[23:15] or serve this

[23:17] product line

[23:18] So yeah, yeah

[23:19] For big companies

[23:20] I think there's quite a lot of burden

[23:22] But for individuals

[23:23] it doesn't matter

[23:24] I mean, it's an open-source project anyway

[23:26] So what if my code is terrible

[23:28] Come help me write it

[23:29] Right? Hahaha, yeah

[23:31] I think whether it's Manus or OpenClaw

[23:33] they actually point to a direction

[23:34] which is

[23:35] this is also a possible narrative for 2026

[23:38] What are your thoughts on 2026

[23:39] and what are your expectations

[23:42] I think there are really so many possibilities

[23:46] And for me

[23:48] in terms of model capabilities

[23:52] I think

[23:54] Models— I sometimes really love saying this slogan

[23:57] which is that models should achieve

[23:59] train with finite context, use with infinite context (finite in training, infinite in use)

[24:03] In other words

[24:04] you use this limited

[24:06] this context length (context window) to train it

[24:08] but in usage, it can use a very, very long

[24:10] even nearly infinite context length

[24:12] I think this

[24:14] has a chance of being realized this year

[24:17] And once this is achieved

[24:19] I think it will unlock many new applications

[24:22] because, to give the simplest example

[24:24] you could potentially let this model

[24:27] interact with you continuously

[24:28] and continuously receive your information

[24:30] And as it runs

[24:32] it will continuously evaluate the current context and your conversation

[24:35] and possibly discard information it deems unimportant

[24:37] And then it becomes

[24:40] the personal assistant everyone dreams of
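
(Editor's note: a minimal sketch of the "continuously evaluate and discard" idea described above: keep the live conversation under a fixed token budget by scoring stored items and evicting the least important ones. The class names, the importance scores, and the budget are all assumptions for illustration, not any lab's actual mechanism:)

```python
from dataclasses import dataclass, field

@dataclass
class ContextItem:
    text: str
    importance: float  # assumed to come from the model itself or a scorer
    tokens: int

@dataclass
class RollingContext:
    budget: int  # max tokens kept "live"; training only ever sees this much
    items: list[ContextItem] = field(default_factory=list)

    def add(self, item: ContextItem) -> None:
        self.items.append(item)
        # Evict lowest-importance items until the live window fits the
        # budget, so the conversation itself can grow without bound.
        while sum(i.tokens for i in self.items) > self.budget:
            self.items.remove(min(self.items, key=lambda i: i.importance))

ctx = RollingContext(budget=8000)
ctx.add(ContextItem("user prefers metric units", importance=0.9, tokens=8))
ctx.add(ContextItem("small talk about the weather", importance=0.1, tokens=12))
```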

[24:42] Yeah, I think technically speaking

[24:44] I think this will

[24:45] definitely be realized this year no matter what

[24:47] But of course, of course

[24:48] I think what people haven't reached consensus on yet is

[24:52] how to technically achieve this

[24:53] Mm-hmm

[24:54] Obviously there are many technical paths

[24:56] But I think right now it's more about

[24:58] trying to see which path can work

[25:03] There might be several paths that all work

[25:05] Then we'll have to test them experimentally

[25:08] under common user scenarios

[25:11] to see which path is the most efficient

[25:13] Yeah, I think we're more at this stage right now

[25:15] rather than a stage where no one has ideas

[25:19] Everyone has ideas

[25:20] but we need to figure out which idea is the right one

[25:22] Standing here in Q1 2026 as

[25:25] a frontline researcher

[25:26] do you think the pace of model improvement is slowing down

[25:29] I think not at all (not at all). I think not at all.

[25:32] How does its velocity curve compare to '25

[25:34] and what's changed from '24

[25:38] Mm, it's hard to say quantitatively

[25:40] because you need to give me a standard

[25:43] before I can quantitatively tell you

[25:44] because if the standard you give is, like

[25:46] I just look at some Benchmark

[25:48] like, say, SWE-bench

[25:49] how many points it gains each month

[25:51] then this will definitely slow down

[25:53] because by definition

[25:55] this Benchmark maxes out at 100%

[25:56] Mm-hmm

[25:57] so the closer you get

[25:57] the slower it definitely gets

[25:59] but this doesn't necessarily mean

[26:00] that users feel the model's capability growth has slowed

[26:03] because going from 50% to 60%

[26:06] it might feel like, hey

[26:07] that's a bit better

[26:08] but quite possibly

[26:09] For example, from 70% to 75%

[26:11] You might find the gains are even greater than from 50% to 60%

[26:13] Mm-hmm

[26:14] That's entirely possible

[26:15] If it's from 80% to 90%

[26:16] Or 90% to 100%, the difference would feel even more significant

[26:19] Not necessarily

[26:20] Because maybe past

[26:21] Maybe around 80% to 90%

[26:23] Users wouldn't notice any difference

[26:24] It might even get worse

[26:25] You said it doesn't get slower at all

[26:26] Based on what criteria?

[26:27] I think it's based on

[26:29] my personal feeling as a researcher

[26:32] Like

[26:32] My personal impression is

[26:34] the model's ability to learn things is getting stronger and stronger

[26:38] It used to take a lot of effort

[26:39] to get the model to learn to do something

[26:43] But now it probably doesn't require that much effort

[26:45] The most important thing is

[26:46] you need to clearly define the problem

[26:47] and figure out how to build the right data (Mm-hmm)

[26:51] Of course, data

[26:52] Data is broader now

[26:53] including environments and such

[26:56] And

[26:58] the rest

[26:59] often seems to fall into place naturally

[27:03] Right

[27:03] Why is the learning ability getting stronger?

[27:05] The model's learning ability has improved

[27:07] I think maybe on one hand

[27:11] There could be many reasons

[27:12] But I think one reason is pre-training

[27:15] Actually, over the past few months

[27:17] I think it has been getting stronger

[27:19] Pre-training

[27:19] Right, right

[27:20] Model pre-training

[27:21] has actually gotten stronger in the past few months (Mm-hmm)

[27:23] I think this might be

[27:26] somewhat controversial in a sense

[27:28] Because a few months ago

[27:30] I think

[27:31] many people were already discussing whether

[27:34] this Scaling Law

[27:34] had reached its limit

[27:35] Mm-hmm

[27:37] My experience is that it hasn't

[27:39] And my feeling is

[27:40] in the next four months

[27:43] I don't see any signs of it ending either

[27:46] Mm-hmm

[27:48] Why do people think it's reaching its limit?

[27:49] I think, well

[27:51] I-I-I

[27:51] obviously don't know

[27:53] why people think it's reached its limit

[27:54] Because I myself don't feel it's reached its limit

[27:56] But my guess would be

[28:00] When someone thinks a pattern has reached its limit

[28:03] it's basically

[28:04] one of these two situations

[28:06] Ah

[28:07] One situation is

[28:08] they feel the applicable range of this pattern has reached its limit

[28:12] Ah, maybe

[28:13] Maybe

[28:14] Fundamentally speaking

[28:15] Scaling Law

[28:16] simply can't extend infinitely

[28:18] which could be true

[28:19] But this is just a guess

[28:21] That is, this person might feel that

[28:22] the applicable range of this pattern has reached its limit

[28:25] Another possibility is

[28:25] this person feels that

[28:27] this pattern

[28:28] one of its conditions can no longer be met

[28:30] For example, they feel that data has already hit a wall

[28:35] And then it simply can't be extended further

[28:37] Another possibility

[28:38] But actually there's a third possibility

[28:41] The third possibility is that

[28:44] there's a bug somewhere in their work

[28:46] that they haven't noticed themselves

[28:48] So they think it's reached its limit

[28:49] Oh

[28:51] From my perspective

[28:53] From my observation

[28:55] I think

[28:59] probably the vast majority of people who hit a wall

[29:01] it's because of the third reason

[29:03] It's because there's a bug

[29:04] What kind of bug?

[29:05] I think

[29:06] There are many possible kinds of bugs

[29:08] For example, one possibility is

[29:10] When you're working on Scaling Laws

[29:12] Some scientific assumptions weren't quite right

[29:14] For example, what kind of token horizon you choose

[29:16] That is, for each model size, what kind of

[29:19] expected training data volume you pick

[29:21] And then this amount of data

[29:24] Where this data comes from

[29:25] And then

[29:27] It's possible that these more scientific choices

[29:30] weren't made clearly

[29:30] That's one possibility
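
(Editor's note: on "token horizon": a common public reference point is the Chinchilla-style compute-optimal heuristic of roughly 20 training tokens per parameter. The sketch below just applies that rule of thumb; in practice labs re-fit the ratio from their own runs, and a horizon far off from what the data mix supports is exactly the kind of assumption that can silently bend a scaling-law fit:)

```python
def chinchilla_token_horizon(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Compute-optimal training-token count under the Chinchilla
    heuristic (~20 tokens per parameter; a published rule of thumb,
    not a universal constant)."""
    return n_params * tokens_per_param

for size in (1e9, 7e9, 70e9):
    tokens = chinchilla_token_horizon(size)
    print(f"{size / 1e9:>4.0f}B params -> ~{tokens / 1e12:.2f}T tokens")
```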

[29:31] But I think there's another possibility, which is

[29:33] there's simply a bug

[29:35] Actually, I don't think this is surprising in the industry

[29:39] Many times

[29:41] Fixing a single bug

[29:42] The progress it brings

[29:43] is far greater than some fancy tricks

[29:47] Right

[29:48] And of course, there are other situations

[29:52] These two examples I just gave

[29:54] are situations I've seen quite often

[29:57] So how do you deal with bugs

[29:58] How do you solve bug problems

[30:01] I think, right

[30:01] I feel like this is more of a mindset issue

[30:03] Because when you encounter a bug

[30:04] If you think it can't be fixed

[30:06] You'll say we've hit a wall

[30:07] When you encounter a bug

[30:08] I think, oh

[30:08] This can definitely be fixed

[30:10] Then you'll feel like we haven't hit a wall yet

[30:12] Because everyone definitely encounters bugs

[30:14] I think, I think

[30:17] This might be like what you said

[30:18] That is, there are some things that are more about belief

[30:21] But for me

[30:22] A more important thing is the working system

[30:24] That is, when something

[30:30] is different from what you predicted

[30:31] Can you systematically rule out various possibilities

[30:34] I think this is a very important thing

[30:37] Mmhmm

[30:38] This is something I think Gemini and Anthropic do well

[30:41] That is

[30:43] Especially in pre-training

[30:44] That is, when behavior at a certain scale

[30:48] might be different from what you imagined

[30:50] People can design reasonable

[30:53] what we call ablation experiments

[30:55] reasonable experiments like this

[30:56] can help you see

[30:57] test whether some of your

[30:58] imagined possible factors

[31:00] are actually the real factors

[31:02] I think this

[31:03] systematic approach to problem-solving is the key
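
(Editor's note: a minimal sketch of what "systematically ruling out possibilities" via ablations can look like: hold a baseline configuration fixed and toggle one suspected factor at a time. train_and_eval, the config keys, and the values are placeholders invented for this sketch, stubbed with noise so it runs end to end:)

```python
import random

def train_and_eval(config: dict) -> float:
    """Placeholder for an expensive proxy-scale training run that
    returns an eval score; stubbed deterministically so the sketch runs."""
    random.seed(str(sorted(config.items())))
    return 0.70 + random.uniform(-0.02, 0.02)

baseline = {"data_mix": "v2", "lr_schedule": "cosine", "dedup": True}
alternatives = {"data_mix": "v1", "lr_schedule": "linear", "dedup": False}

base_score = train_and_eval(baseline)
for factor, alt in alternatives.items():
    ablated = {**baseline, factor: alt}  # change exactly one factor
    delta = train_and_eval(ablated) - base_score
    print(f"ablate {factor} -> {alt}: delta = {delta:+.4f}")
```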

[31:07] Mmhmm

[31:09] You think

[31:10] Model capabilities can still improve

[31:12] Then its driving force

[31:13] Data and compute

[31:13] Algorithms

[31:14] Which do you think is the main driving force

[31:18] I think they all contribute

[31:20] But in a sense

[31:24] Data and compute are two things

[31:25] that are actually very strongly correlated

[31:28] Data and compute, mmhmm

[31:29] Right, because

[31:30] When your compute goes up

[31:31] you'll naturally attract more data

[31:32] When data goes up

[31:33] you'll naturally need more compute

[31:34] Right, and then

[31:36] For algorithms, I think

[31:39] Algorithmic progress often has a phase transition

[31:43] That is, there's a phase

[31:46] where you haven't figured out what to do at all

[31:49] At that stage, algorithms are extremely critical

[31:52] Because when you haven't figured out what to do at all

[31:54] you might have no way to scale up at all

[31:56] And then you might get stuck there

[31:57] But at a certain point

[31:58] you might discover

[32:00] the most important thing in the algorithm

[32:01] Then it might suddenly go from

[32:03] completely impossible to possible

[32:04] And then after that, algorithmic improvements

[32:06] are more of a gradual improvement

[32:08] That is, it might improve your computational efficiency

[32:11] or the efficiency of using data

[32:12] Right, and then

[32:15] Let me give an example

[32:16] For example, from the perspective of language model pre-training

[32:22] Then this leap in algorithms

[32:24] Well

[32:25] I mean, the development of the Transformer

[32:28] But after the Transformer was discovered

[32:30] It's been mostly gradual and smooth

[32:32] Improving its efficiency

[32:34] Or your use of data

[32:35] Or the efficiency of compute usage has been improving

[32:37] Right

[32:38] So the current drivers are compute and data

[32:41] I think within the relatively clear frameworks we have now

[32:46] The main drivers are compute and data

[32:48] By clear framework, I mean

[32:49] For example, pre-training and post-training

[32:51] Whether it's post-training based on reinforcement learning

[32:53] Or based on supervised learning

[32:56] That is, post-training with supervised learning

[32:57] For example, within these two relatively clear

[33:00] paradigms

[33:02] Indeed, compute and data are the main drivers

[33:06] But it's undeniable

[33:07] That in some other directions, the driving factors might be different

[33:10] Hmm, what do you mean?

[33:11] To give a simple example

[33:12] For instance, multimodal generation

[33:15] Hmm

[33:15] Well I think it's probably something that, algorithmically speaking

[33:18] Hasn't been fully figured out yet

[33:21] So that's still a scientific problem

[33:22] That hasn't been solved yet

[33:23] Right

[33:25] But language is no longer a scientific problem

[33:29] Natural language generation

[33:31] I think, for now

[33:32] Before this technical approach hits a wall

[33:34] I think it's relatively clear scientifically

[33:37] But in terms of engineering

[33:38] There's still so, so, so much to be done

[33:41] How much more do you think pre-training can improve?

[33:43] Improving model capabilities through pre-training

[33:44] How much more

[33:45] How much further can it go

[33:46] Can we expect

[33:49] That's just how people are

[33:50] I mean, when you haven't hit the wall, you

[33:52] Don't actually know how long the road is

[33:54] What I can

[33:54] What I can see is that we haven't hit the wall yet

[33:57] But I don't know when we'll hit it either

[33:59] If I really had to estimate a timeline

[34:01] As I just said

[34:01] I think four months

[34:03] The next four months will still see progress

[34:06] But in the AI field

[34:07] No one can predict what happens after four months

[34:09] Hmm, so over the past few months

[34:10] When you look at pre-training and model capabilities

[34:12] You're still very excited

[34:14] Is this the general mindset and state around you?

[34:18] I think so

[34:21] Is this within a small environment at Google

[34:22] Or in the entire Silicon Valley environment

[34:24] I think it's hard to say for all of Silicon Valley

[34:25] Because Silicon Valley is too big a place

[34:27] People working on products might be excited about products

[34:29] Right, for product people

[34:29] What excites them most might be something like OpenClaw

[34:32] Hmm

[34:32] But for people working on models

[34:34] It's probably

[34:34] That we get more excited about this kind of model progress

[34:37] Hmm

[34:38] I think

[34:39] Uh

[34:41] For people working on models

[34:42] Is excitement a consensus?

[34:44] Over the past four months

[34:44] I personally think so

[34:46] Oh, I personally think so

[34:48] At least within the circle I have access to

[34:50] I think at

[34:51] Anthropic and Google, people

[34:53] Or at Gemini, people are probably thinking more about

[34:56] How our AI will keep progressing

[34:59] And soon we'll be replaced

[35:01] After being replaced, what should we do?

[35:03] Haha, rather than worrying about what to do when models hit a wall

[35:06] Hahaha

[35:08] Speaking of which

[35:09] Why

[35:10] Over the past few months

[35:11] Coding has been developing the fastest

[35:14] Why is this the case?

[35:16] I think the coding scenario

[35:18] First of all, coding itself

[35:20] Hasn't just been developing the fastest over the past few months

[35:23] I think coding itself

[35:24] Actually

[35:27] From Claude 3.5 (new)

[35:29] Or some people out there called it Claude 3.6 (yeah)

[35:32] After that

[35:32] It's been in a state of rapid development ever since

[35:35] And I think

[35:36] That was early last year

[35:37] Or the end of the year before

[35:38] That was October of the year before last

[35:43] Yeah, yeah

[35:44] It should be, maybe October or November

[35:47] But around that time

[35:49] From then on

[35:49] It's been in a state of rapid development

[35:51] I think the coding scenario has

[35:55] Two biggest advantages

[35:57] The first advantage is its reward signal

[36:01] That is, its feedback signal

[36:04] Is very well-defined

[36:07] Because

[36:10] For example, if you

[36:11] For example, something like a software engineering task

[36:14] Often the situation is

[36:16] I need to write some code

[36:18] To implement a feature

[36:20] A feature

[36:21] (Yeah) This feature needs certain inputs

[36:24] And produces certain outputs

[36:26] This is something very easy to

[36:28] Very easy to test

[36:30] So its feedback signal is very clear

[36:32] Your input and output match up

[36:36] Then it means your implementation is successful

[36:39] If not, then it's unsuccessful (Yeah)

[36:40] But this is just one example

[36:41] In coding-related work

[36:44] There are many, many

[36:45] Many such well-defined feedback signals
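
(Editor's note: this "inputs match outputs, therefore success" check is what a coding reward signal looks like in its simplest form: run the candidate code against a test suite and return pass/fail. A minimal sketch; the command and directory are illustrative placeholders, and real setups sandbox this heavily:)

```python
import subprocess

def coding_reward(repo_dir: str, test_cmd: list[str] | None = None) -> float:
    """Binary reward for a candidate patch: 1.0 if the repo's test
    suite passes, else 0.0. Command and paths are placeholders."""
    cmd = test_cmd or ["pytest", "-q"]
    result = subprocess.run(cmd, cwd=repo_dir, capture_output=True, timeout=600)
    return 1.0 if result.returncode == 0 else 0.0
```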

[36:47] And another big advantage is

[36:52] Coding data has a very natural foundation

[36:57] That foundation is GitHub

[36:59] GitHub has aggregated over roughly the past few

[37:03] Decades

[37:05] A lot of high-quality code written by many excellent programmers

[37:10] And starting from that code

[37:11] You can build a tremendous number of environments

[37:15] I think these two things, from a model perspective

[37:18] Are why coding can be done very well

[37:20] Of course, I think from a product perspective

[37:23] There's another reason

[37:24] Which is that coding

[37:27] The demand for this product

[37:29] Is in a sense

[37:31] Relatively uniform

[37:33] It's not like when you build something like a social media app

[37:38] Or a game

[37:39] Where everyone might have different tastes

[37:41] And it might be hard

[37:43] To satisfy everyone's needs

[37:46] Then you might need recommendation algorithms

[37:49] But with coding

[37:50] The good thing is that excellent programmers writing code

[37:54] Actually have fairly similar styles

[37:57] What kind of style

[37:58] Clean and concise

[37:59] Yeah, right, good code is

[38:01] (Not messy) There are some shared standards

[38:04] For example, like you said

[38:05] The code is concise

[38:07] Structurally clear

[38:09] Suitable for future development

[38:10] And has reasonable abstractions

[38:13] And of course many other standards

[38:14] But I think good programmers tend to have

[38:18] A fairly consensus-driven standard

[38:20] On this matter

[38:20] So from a product perspective

[38:23] It actually makes the coding product much simpler

[38:27] In your current work

[38:28] What percentage of code do you write with Claude Code

[38:33] How many times more productive does it make you

[38:37] You just asked a question that almost got me fired

[38:39] Google doesn't allow using Claude Code

[38:40] Hahahahaha, oh right

[38:44] I think, for me

[38:49] A conservative estimate

[38:51] Maybe 90% of the code is model-generated

[38:54] But it might be

[38:55] I need to spend a lot of time reviewing the code

[38:57] To see if it's written appropriately

[38:59] Written reasonably

[38:59] Whether it's really what I wanted it to write

[39:02] And I think after having AI-assisted tools

[39:06] The most important thing about writing code

[39:08] Has become

[39:10] How you design it

[39:12] How you design the logic of your code

[39:14] And which files it needs to interact with

[39:17] Files to associate with

[39:18] And what things need to be done

[39:20] And you need to give the model

[39:22] Maybe provide some reasonable context

[39:24] I mean, like

[39:25] For example, this code

[39:25] You can use it as a reference to take a look

[39:28] Right, actually outputting code

[39:31] I think models are way more capable than humans

[39:35] So for me

[39:36] If you actually count

[39:37] How many lines of code I wrote by hand

[39:40] How many lines of code the model wrote

[39:41] I'd say conservatively, the model wrote over 90%

[39:45] If not conservative, maybe 99% or 100%

[39:48] The remaining 10%, is that what it can't write

[39:50] Or what you didn't let it write?

[39:53] Conservatively, 90%

[39:55] Giving myself some credit

[39:56] Hahaha

[39:57] I think what it can't write

[39:58] And the part I can write is becoming less and less

[40:01] Less and less and less

[40:02] What was it like in the past

[40:04] It was what it couldn't write

[40:06] I think

[40:07] Very early on, maybe about a year and a half ago

[40:14] At that time

[40:14] To be honest, on the market

[40:15] Only Claude was able to

[40:17] Actually write this kind of software engineering code

[40:21] At that time

[40:21] You could still feel many flaws in the model

[40:26] For example

[40:26] Sometimes when it wrote code

[40:27] It would only focus on this one file

[40:29] It wouldn't pay much attention to

[40:32] The relationships between multiple files

[40:34] And if, say, a class

[40:37] Its definition was buried many layers deep

[40:40] Or it wasn't directly nested in this

[40:42] This direct tree structure

[40:44] The model probably couldn't find it

[40:46] Now I think this is happening less and less

[40:49] Hehe

[40:50] Really less and less

[40:51] As a researcher

[40:53] Your programming workload

[40:55] Is how many times what it used to be?

[40:57] Because from the perspective of writing code

[40:59] It's quite hard to quantify this

[41:00] But if we talk about, say, running experiments

[41:04] And the efficiency of implementing ideas

[41:07] I think compared to a year or even a year and a half ago

[41:10] It could be 20 or even 50 times faster

[41:15] Right, because models have really become

[41:18] It can be pretty insane

[41:19] You can open several at the same time

[41:21] And you have several ideas

[41:22] And test them simultaneously

[41:24] And sometimes even

[41:25] The model can help you monitor some experiments

[41:27] Monitor some results and stuff

[41:28] So

[41:29] It's really quite a significant efficiency boost

[41:32] Right, but

[41:36] If we talk about personal working hours

[41:38] I feel like it has made my working hours longer

[41:43] Why is that

[41:43] It's just that

[41:44] Because development speed has increased

[41:47] The more you try, the more you want to try

[41:48] There are more and more ideas to try

[41:50] So it feels like before, you might have had this situation

[41:54] You have something

[41:55] Like this file

[41:56] You haven't seen it before

[41:57] You might not quite understand it yourself

[42:00] Then you'd definitely have to spend time finding the person who wrote it

[42:02] And you'd schedule time with them

[42:03] Maybe a few hours later

[42:05] But now it's not like that

[42:05] You just see this file

[42:06] You don't understand it, just ask Claude or Gemini

[42:09] Gemini might tell you the result in five seconds

[42:12] And you just keep going

[42:13] Hahaha, so in terms of working hours

[42:16] I feel like working hours have actually gotten longer

[42:18] And the intensity has increased too

[42:20] Well, Google isn't that Google anymore

[42:22] Is that so

[42:22] Not that

[42:23] Google where you can coast along

[42:26] Not that work-life balance Google

[42:28] I feel like in the GenAI field

[42:31] No one can just coast along

[42:32] Hahaha

[42:32] So what hours are you keeping these days

[42:34] I usually start around 9 in the morning

[42:38] Well, not exactly get to the office at 9

[42:41] At 9 AM, I might first get up and check emails

[42:42] And look at the experiments from the night before

[42:45] Then get to the office around 10

[42:47] And then at night

[42:50] If I'm alone in the US

[42:52] I might stay until around 10 or 11

[42:55] Of course, if my family is here

[42:57] If my wife is here

[42:58] I might go home a bit earlier

[42:59] But at home I'd be working anyway

[43:01] So I think in the GenAI field

[43:04] No one is just lying around

[43:06] Unless

[43:07] You've completely lost interest in technology

[43:10] And have no ambition for yourself

[43:12] Then no one would care if you just lay there

[43:14] But I think most people are quite self-driven

[43:18] They just want to do it themselves, right

[43:20] Do you think other fields

[43:21] Will have more of these Claude Code moments

[43:24] Where will the next explosion happen after coding

[43:27] You asked a good question

[43:29] If I could see it clearly

[43:30] I might have gone out to start a company already

[43:32] Hahahaha

[43:35] Right, but but

[43:36] It's true that besides coding

[43:39] We can already see

[43:40] That many

[43:40] Other directions are already having a big impact

[43:43] But if we only talk about those directions

[43:44] They might not be a good

[43:45] market direction

[43:46] Because

[43:48] Coding is special in that

[43:51] It itself is

[43:52] A very large market

[43:53] But if you look at some other directions

[43:54] They might not be

[43:56] Such a large market

[44:00] For example

[44:01] Some people say the next direction is

[44:03] This kind of

[44:05] AI-generated content or something

[44:08] But AI-generated content

[44:11] How big is that market

[44:12] Right

[44:14] I think

[44:15] If you say this content

[44:17] Is for people to consume

[44:18] Then people have limited time

[44:20] No matter how much content you generate

[44:25] People's time is only 24 hours a day

[44:27] Right

[44:29] Unless it completely replaces people

[44:30] Like replacing TV

[44:31] Then that would be another story

[44:32] Like the Vision Pro that came out before

[44:35] Then that would be another story

[44:36] But that would be

[44:38] A bigger story

[44:39] So I think

[44:40] Besides coding

[44:43] Everyone is still looking for

[44:44] The next big market

[44:45] And if there is one

[44:49] I think there will be

[44:51] But it's just

[44:52] Not necessarily that big

[44:54] I think the most likely one might be

[44:55] This kind of interactive education

[44:56] Or maybe

[44:57] You said coding is not a direction for you

[45:01] Because coding itself is already very big

[45:04] Yeah, it's already a huge market

[45:05] Do you think AI researchers

[45:07] How should we treat coding

[45:11] Should we use coding to validate our ideas

[45:12] Or should we make coding itself the end goal

[45:16] I think there are two types of people

[45:19] One type is

[45:21] They genuinely want to make coding better

[45:22] Another type is

[45:25] They want to use coding

[45:26] As a means to validate AI capabilities

[45:27] I think both are fine

[45:28] Both directions are fine

[45:29] But I think

[45:30] The people who genuinely want to make coding better

[45:32] They need to think more about products

[45:33] And the people who want to validate AI capabilities

[45:34] They need to think more about

[45:35] How to build better benchmarks

[45:37] Right

[45:40] I think both directions are very meaningful

[45:42] Just their focus is different

[45:45] Do you think

[45:48] The current state of AI research

[45:50] Is more like

[45:50] A gold rush

[45:51] Or more like

[45:52] A scientific revolution

[45:54] I think it's a bit of both

[45:57] Like

[45:58] There are many things that AI actually can't easily do

[46:01] But conversely, humans might do better

[46:05] For example, being a product manager

[46:09] To be honest, I think

[46:11] Being a good product manager

[46:12] Is something I currently can't figure out

[46:14] How to train AI to do

[46:17] Why is that

[46:19] There's no standard

[46:19] There's no standard (no metric)

[46:21] Like what makes a good product

[46:23] I can't really figure it out

[46:25] There's no very objective standard

[46:26] You have to build it and let people use it

[46:29] Only then do you know it's good

[46:30] Then everyone will say it's good

[46:32] Right, I think

[46:33] That's something with very unclear feedback signals

[46:36] Then I don't know how to train AI to do that

[46:38] Right

[46:40] When will programmers be completely replaced

[46:43] Will there be such a day

[46:45] Mm-hmm

[46:48] I think that day will come

[46:52] But it won't come all at once

[46:55] It won't be like programmers are all still there

[46:58] And after one night

[46:59] The next day all programmers are fired

[47:01] It won't be like that

[47:02] It will definitely be a gradual process

[47:04] But everyone can already see this gradual process now

[47:06] Because some companies have already started laying people off

[47:09] Right, I think

[47:11] In a sense

[47:14] AI is

[47:16] Of course a very good thing

[47:17] But from another perspective

[47:18] It might also be

[47:19] A very unfortunate thing

[47:21] That AI is a very centralized technology

[47:24] It will make a small number of people stronger

[47:27] But will make most people lose

[47:30] Their unique value

[47:32] Right, so I think

[47:33] For traditional software engineering

[47:39] The final result might be

[47:41] Now 1/1000 of the people do the work of everyone in the past

[47:45] Earning 100 times the current salary

[47:49] Then what advice do you have for programmers

[47:54] I think

[47:55] Haha, I think maybe

[47:59] Embrace new things

[48:00] I think that's very important

[48:02] I think

[48:03] One very important thing for future programmers might be

[48:05] How to effectively collaborate with AI

[48:07] Mm-hmm, like

[48:08] There are many things that AI might do

[48:11] Not that well

[48:12] Like how to

[48:13] Reasonably design an implementation plan for something

[48:17] And how to design it

[48:19] So that it might align with the company's

[48:20] Future development

[48:22] Those kinds of things

[48:23] You might have a hard time telling a model

[48:25] To make it understand these things

[48:26] Those things might still need humans to do

[48:29] But maybe things like specific

[48:30] Very specific

[48:31] Like the work many programmers did in the past

[48:33] Where your manager tells you to implement this plan

[48:37] And give it to me by next Friday

[48:39] I think that kind of work

[48:40] Might not exist in the future

[48:42] Then what kind of programmers would be in that 1/1000

[48:45] What are their traits

[48:46] 1/1000 is just a figurative number

[48:47] I really don't know if it would be 1/1000

[48:48] Or 1/10,000

[48:49] Or 1/100,000

[48:50] Or maybe 1%

[48:51] Don't be so pessimistic

[48:54] I'm a famous pessimist

[48:55] So don't take it too seriously

[48:58] And

[49:00] I think

[49:01] Good programmers in the future

[49:02] First, technically speaking

[49:05] They will definitely be very strong

[49:07] Because if you're technically weak

[49:09] There's no reason

[49:10] Why AI can't replace you

[49:11] But being technically strong might not be the only thing

[49:13] It won't be a sufficient condition

[49:14] But it will probably be a necessary one

[49:16] Another thing I think will be very important

[49:18] is that you have to understand how your part of the work

[49:22] fits into a large organization or a big company

[49:25] how to

[49:27] how to adapt and integrate into it (Mm-hmm)

[49:29] This might also be an important thing

[49:31] Mm-hmm, and

[49:31] And of course there might be many other things

[49:34] For example

[49:35] whether this person's planning ability is strong enough

[49:38] If their planning ability is strong

[49:39] they can definitely take this big

[49:41] very complex thing

[49:42] and break it down into many relatively smaller things

[49:44] and hand them over to different AIs to do

[49:46] But right now these three abilities seem important

[49:51] Things that AI might not be able to fully do yet

[49:53] doesn't mean it won't be able to in six months

[49:54] Maybe in six months you come ask me

[49:56] And I find the last of those things AI can already do

[49:58] Then only two things remain

[49:59] Another six months later

[50:00] Maybe the remaining two can also be done

[50:01] Then maybe my answer would become more pessimistic

[50:04] So

[50:04] No one can predict what will happen in six months

[50:06] I can only speak from the current perspective

[50:10] This past Spring Festival

[50:11] Another thing many people paid attention to was Seedance

[50:13] Will Seedance make Google anxious

[50:15] I think actually

[50:19] Possibly yes

[50:20] But this anxiety

[50:22] Hasn't reached me yet

[50:24] Maybe it gives the Google DeepMind

[50:27] team responsible for multimodal generation

[50:29] some pressure

[50:31] But if you ask me

[50:35] I think

[50:36] I might not think they have much to be anxious about

[50:39] Like I think

[50:40] It doesn't reflect any paradigm shift

[50:43] More importantly, I think ByteDance

[50:45] whether it's the product effect

[50:48] or possibly in terms of data and such

[50:51] These details are done very very well

[50:53] I think indeed

[50:56] ByteDance has historically had

[50:57] a relatively strong advantage in multimodal generation

[51:01] But I think at least personally

[51:02] I haven't experienced

[51:03] that it's a paradigm shift

[51:06] Then maybe

[51:08] It's not enough to make everyone very anxious

[51:11] Right but there is definitely pressure

[51:14] Right

[51:14] Does Seedance's strength come from model capability

[51:17] Or product capability

[51:18] I haven't worked

[51:21] At ByteDance

[51:22] So I don't

[51:23] know the specific details

[51:24] But if you ask me to guess

[51:25] I think the model probably accounts for the majority

[51:27] Mm-hmm

[51:29] What does good model capability come from

[51:31] Comes from data

[51:32] Because there probably isn't fundamental innovation in algorithms

[51:34] I think algorithms

[51:36] First of all because multimodal belongs to

[51:37] what we just said, still belongs to that

[51:38] scientific problem

[51:39] Multimodal generation belongs to scientific problems

[51:41] Right, multimodal generation

[51:42] Still belongs to a relatively scientific problem (Has multimodal understanding been solved)

[51:45] Understanding, compared to generation, is definitely more systematic

[51:48] People have a more systematic grasp of it

[51:50] But compared to text tokens

[51:54] It's definitely still not at that point

[51:58] The paradigm isn't that fixed yet

[51:59] I think in generation it might be

[52:01] Because it's still something

[52:03] where the paradigm hasn't been fixed

[52:04] Maybe each company uses somewhat different techniques

[52:07] big or small differences

[52:09] And um right now we can mostly just see

[52:13] In terms of effects

[52:14] Maybe ByteDance and Google DeepMind are

[52:17] The two that do it better

[52:19] Mm-hmm, so it might also come from details

[52:21] Done better

[52:22] Right if you ask me to guess

[52:24] I would guess data

[52:26] Data

[52:26] If you ask me to guess I'd guess data but

[52:29] I haven't worked at ByteDance either

[52:30] So I'm just guessing blindly haha (Mm-hmm)

[52:34] What do you think about Wu Yonghui going from Google to ByteDance (ByteDance large model team Seed lead)

[52:38] Who am I to judge haha

[52:39] To evaluate Yonghui, I think

[52:43] Of course, I haven't worked with

[52:45] Yonghui in the past,

[52:46] so

[52:46] actually I can't really give a very good assessment

[52:50] or an objective evaluation

[52:51] But I think after I joined Gemini,

[52:55] I saw more of Yonghui's good side

[52:57] I think, by looking at him,

[53:01] sneaking a peek at his past code commits

[53:04] and the projects he's led,

[53:05] my feeling is that he's one of the few people I've met at such a high level

[53:10] and also very senior

[53:12] yet still has very strong technical skills

[53:17] I think that's extremely rare

[53:19] So I think

[53:21] I'm probably not yet at the level

[53:26] to evaluate someone at Yonghui's level

[53:27] But if you ask me

[53:28] I think Yonghui is extremely strong

[53:30] Say we take a snapshot in Q1 2026

[53:33] Do you think the capability gap between Chinese and US models

[53:36] is widening or narrowing?

[53:38] How far apart are they?

[53:40] I think

[53:40] Um

[53:42] If we take a snapshot right now

[53:44] and look at the development trends over the past year

[53:49] or the past year and a half

[53:51] Obviously

[53:51] the gap between China and the US is getting smaller and smaller

[53:55] But whether this gap will eventually close completely

[53:58] or even if China surpasses the US

[54:00] I think that's an open question

[54:04] I think for Chinese AI researchers

[54:08] and research institutions, it's also an opportunity

[54:11] And

[54:14] I think one very real thing is

[54:16] that

[54:17] China is indeed at a significant disadvantage in terms of actual compute resources

[54:20] It's at a big disadvantage

[54:23] But this significant disadvantage

[54:25] may have actually forced out some interesting things

[54:28] For example, Chinese model companies

[54:29] are actually quite good at distilling from others

[54:34] Right

[54:34] Recently Dario (Anthropic Co-founder and CEO) called out three companies for distilling from them

[54:39] I think distillation itself

[54:42] is actually an open secret

[54:47] But I think there are different ways to approach distillation

[54:51] There's brute-force distillation and smart distillation

[54:55] two different approaches

[54:58] Um

[54:59] What do you mean by brute-force distillation?

[55:01] To give the simplest example of brute-force distillation:

[55:04] It's

[55:05] taking a bunch of tokens generated by Claude

[55:10] and forcibly training on them

[55:15] If you do something like this

[55:16] I feel

[55:19] First, it's not very ethical from a business standpoint

[55:22] And intellectually, it's rather foolish

[55:26] Because the companies doing this

[55:29] essentially

[55:30] demonstrate one thing

[55:32] they don't even know what they want to do

[55:35] The only thing they can do is copy others

[55:37] and make their model

[55:38] look a bit better on the benchmarks

[55:40] Right, but essentially it shows that

[55:41] they don't even know what they should be doing

[55:43] That's brute-force distillation

[55:44] But

[55:45] actually, distillation also involves some very interesting scientific questions

[55:49] For example, is there a possibility that

[55:53] Just a random example

[55:54] Like, could it be that

[55:55] in the process of building

[55:56] my own training data pipeline

[55:58] I use other models as assistants

[56:01] Or the answers generated by my own model

[56:04] use other models as their evaluators

[56:08] This is actually, I think, commercially

[56:11] a bit of a gray area

[56:13] But from a technical perspective, it's quite interesting

[56:16] Because if you think about it, in a sense

[56:20] Chinese labs may have become

[56:22] pioneers in Multi-Agent training

[56:25] Oh

[56:26] And it's true Multi-Agent

[56:29] Because if they use models from different companies

[56:32] with these smarter approaches

[56:33] and integrate them into a single training system

[56:36] each model's distribution might be very different

[56:39] The distribution of their language is very different

[56:42] This is true Multi-Agent

[56:45] It might be more so than

[56:47] for example, using several Geminis together

[56:50] It's something more technically interesting

[56:52] So I think, for me, this smart kind of distillation

[56:57] I don't know, commercially

[56:58] whether it'll end up being clearly wrong

[57:01] or clearly right

[57:02] But technically it's actually quite interesting
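
To make the distinction concrete, here is a minimal sketch in Python. It is purely illustrative — nothing here is confirmed to be what any particular lab runs — and `teacher_generate`, `student_generate`, `teacher_score`, and `finetune` are hypothetical stubs standing in for real model APIs and training code. The first function trains a student directly on teacher tokens; the second keeps the student's own samples and uses another model only as a scorer, closer to the rejection-sampling-style setup described above.

```python
from dataclasses import dataclass

@dataclass
class Example:
    prompt: str
    completion: str

def brute_force_distillation(prompts, teacher_generate, finetune):
    """Brute-force distillation: train the student directly on whatever the
    teacher emits, token for token. Cheap, but the student simply inherits
    the teacher's distribution wholesale."""
    data = [Example(p, teacher_generate(p)) for p in prompts]
    finetune(data)  # supervised fine-tuning on teacher tokens

def teacher_as_evaluator(prompts, student_generate, teacher_score, finetune,
                         samples_per_prompt=8, keep_threshold=0.7):
    """'Smart' use of an external model: the student generates candidates in
    its own distribution; the other model only scores them. Only high-scoring
    self-generated data is kept, so the training distribution stays the
    student's own (a rejection-sampling / model-as-judge style filter)."""
    data = []
    for p in prompts:
        candidates = [student_generate(p) for _ in range(samples_per_prompt)]
        scored = [(teacher_score(p, c), c) for c in candidates]
        score, best = max(scored, key=lambda sc: sc[0])
        if score >= keep_threshold:
            data.append(Example(p, best))
    finetune(data)
```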

[57:05] Which companies are you referring to with these two types of distillation?

[57:08] Can we bleep out the names in post-production?

[57:10] (Sure) Hahahaha

[57:12] First of all, I haven't worked in a Chinese lab

[57:15] So I don't know exactly who

[57:17] But my feeling is

[57:19] XXX

[57:19] probably used brute-force distillation

[57:22] And XXX might have done brute-force distillation before

[57:25] But later they probably gradually tried

[57:27] to shift toward smart distillation

[57:29] I think it's fairly obvious

[57:31] The one that probably distills less is ByteDance

[57:34] I feel like ByteDance's model

[57:36] is still quite distinctive

[57:39] Hmm, what makes it distinctive?

[57:41] For example, this model

[57:42] How smart would you say it is?

[57:44] I think

[57:45] Doubao is definitely not as smart as Gemini or Claude

[57:50] But first of all, Doubao

[57:52] For example, Doubao's voice generation is extremely good

[57:55] Wait, is that difficult?

[57:56] Technically, Doubao is indeed the best at it

[57:59] Because I find that for life questions

[58:00] I just want to ask Doubao

[58:01] Because it's so fast

[58:02] But other models

[58:03] Why don't they optimize this product feature?

[58:05] I think it still has to do with their user base

[58:09] In the US

[58:09] I think people are more focused on

[58:15] how to improve work efficiency

[58:19] Don't you have life questions?

[58:22] I do in my life

[58:24] First of all

[58:25] I personally

[58:25] am indeed pretty boring in my personal life

[58:27] So I don't have many interesting life dilemmas

[58:28] to ask Doubao

[58:30] The questions I have more often in life

[58:31] are all technical ones

[58:32] Asking a smart model like Gemini is the best

[58:34] Hahahaha

[58:37] Right, I don't have this urge to open Doubao at midnight

[58:39] for late-night emotional support

[58:41] It's not just emotions, but many things

[58:43] Like when you're cooking

[58:44] Hmm You might run into some problem

[58:46] You might need someone to tell you right away

[58:50] But you don't have such a person

[58:52] Hmm, those

[58:53] I think it's probably more of a data issue

[58:54] And probably for US companies

[58:57] the main priority right now is intelligence

[59:02] or work efficiency

[59:03] Someday in the future

[59:05] Will it become these daily matters?

[59:07] I think it's possible

[59:08] The fact is

[59:10] If you ask about these daily topics

[59:11] actually

[59:13] you'll find that Gemini

[59:13] from generation to generation

[59:15] does better and better

[59:17] Hmm

[59:17] Actually, many of my friends

[59:19] including myself in the past

[59:20] When I was at Anthropic before

[59:22] I might ask Claude to write code

[59:24] But for daily lookups

[59:26] I would ask Gemini, right

[59:27] Have you used Doubao?

[59:30] I've actually only used it once or twice

[59:32] I noticed you guys don't really use it much

[59:34] Hmm, first of all

[59:35] Is it a pecking order thing?

[59:36] (There's an intelligence pecking order) Hahaha, no no, not that serious

[59:40] I just think first of all

[59:43] It's like people in China trying to use American models

[59:46] There are some complicated things involved

[59:48] Oh

[59:49] Me using Chinese models in the US

[59:50] is actually quite complicated too

[59:53] Second, I simply don't have the motivation for it

[59:57] Especially since I think in my life

[01:00:02] Work is work

[01:00:03] When I'm relaxing, I just find different work to do

[01:00:06] So for me

[01:00:07] My best companions are Claude and Gemini

[01:00:10] But it might not be like that for others

[01:00:13] So it might just be my personal thing

[01:00:15] The one or two times I used Doubao myself

[01:00:18] It was because someone showed me the Doubao phone

[01:00:21] Hahahaha, right

[01:00:23] So what do you think of the Doubao phone?

[01:00:25] I think it's a great idea

[01:00:29] Personally, in terms of results

[01:00:31] They actually did a pretty good job

[01:00:33] Of course, what I don't know is

[01:00:36] Technically, how well optimized it is

[01:00:39] I mean, it

[01:00:39] I think it executes some tasks in real time

[01:00:43] From a results perspective, there's no problem

[01:00:44] But I don't know how much overhead it has

[01:00:47] If that overhead is very, very large

[01:00:49] Then it's probably a technical issue that needs to be solved. Mm-hmm.

[01:00:51] Because you don't want, you know

[01:00:53] Your model to book a high-speed train ticket for you

[01:00:57] And end up costing more in compute than the ticket itself

[01:00:59] That would definitely be unacceptable

[01:01:02] Right, so

[01:01:03] Technically speaking

[01:01:05] I personally don't know how mature it is

[01:01:08] And from a product perspective

[01:01:10] For everyone, it's still quite

[01:01:12] Can't say surprising

[01:01:12] But it's something that gets people pretty excited

[01:01:14] And I think

[01:01:15] Apple probably wanted to do something like this before

[01:01:17] It's just that Apple's own models haven't been that great

[01:01:20] Apple doesn't seem to care much about its AI strategy

[01:01:23] Now, I think

[01:01:26] Apple definitely cares about AI strategy

[01:01:29] Because Siri, the phone assistant

[01:01:33] Was in Apple's product launches

[01:01:36] A very, very important highlight

[01:01:38] But their own models didn't catch up

[01:01:41] Now they might be trying to do this through a partnership with Gemini

[01:01:46] To try to make it happen

[01:01:48] As for whether they care about it now

[01:01:49] First of all, I don't know

[01:01:50] If you ask me to guess, I'd definitely say they care

[01:01:52] But if you ask me to explain

[01:01:53] Why from the outside it doesn't look like they care that much

[01:01:55] My only guess is that

[01:01:56] If from the outside it looks like you care a lot

[01:01:58] And you still can't pull it off

[01:01:59] Then you just look stupid

[01:02:00] Ah

[01:02:02] Saving face

[01:02:03] Ah, right, hahaha (I don't care)

[01:02:05] Then let's talk about Doubao's model

[01:02:07] You just said Doubao's model is quite distinctive

[01:02:10] Can you be more specific?

[01:02:12] One is that its voice is really well done

[01:02:13] That's the first point

[01:02:13] I think the voice is really well done

[01:02:14] It's the most distinctive thing I can feel

[01:02:17] I mean, I think the voice quality might be

[01:02:24] To put it politely, probably one of the best in the world

[01:02:26] To put it bluntly

[01:02:27] I think it's simply the best in the world

[01:02:28] Mm. Is that hard?

[01:02:30] Mm

[01:02:32] I haven't gotten to that level myself

[01:02:33] So I don't know if it's hard or not

[01:02:35] But I think it might be something that takes a lot of effort

[01:02:38] Whether in terms of data or various optimizations

[01:02:39] Is it a product thing or a model thing?

[01:02:41] It has to be a model thing

[01:02:42] It might also include some product aspects

[01:02:44] But it's definitely a model thing

[01:02:46] Right. And then

[01:02:48] I think that's one aspect

[01:02:48] And on the other hand

[01:02:50] On the other hand, I don't have that much personal experience

[01:02:52] Because I haven't actually used it that much

[01:02:54] So it's probably more from

[01:02:55] Feedback from friends and family

[01:02:56] That is: hey, this Doubao model is just fun to talk to

[01:03:00] It's just fun to chat with

[01:03:01] Haha, right

[01:03:02] But I think that

[01:03:03] Is more of some subjective feedback

[01:03:07] I think one is the voice

[01:03:10] And another is that it

[01:03:11] Generates very fast, which is also very important

[01:03:13] Because many models

[01:03:14] Are showing you their chain of thought

[01:03:16] But I'm talking about trivial things in your daily life

[01:03:18] I don't want to see its chain of thought

[01:03:20] Right. I don't think this is technically difficult

[01:03:21] It's just that maybe

[01:03:22] People haven't spent more time on it yet

[01:03:25] On this

[01:03:25] And the fact is

[01:03:26] If you try Gemini 2.5 Pro and Gemini 2.5 Flash

[01:03:31] You'll find

[01:03:32] Gemini 2.5 Flash

[01:03:32] When completing the same problem

[01:03:35] It's already much faster than before

[01:03:37] And much less fluff

[01:03:39] So I don't think this is a

[01:03:42] Mm-hmm, in my view it's not a technical difficulty

[01:03:44] It's more about when to pay attention to it

[01:03:46] And do something about it

[01:03:49] I think maybe it's now

[01:03:51] Right now these American companies

[01:03:53] Are all still in the stage of

[01:03:54] Working hard to push the upper limits of intelligence forward

[01:03:59] And ByteDance

[01:04:00] Of course it's also pushing the upper limits

[01:04:02] But I think

[01:04:03] It might just be doing very well in user optimization too

[01:04:05] Also doing quite well

[01:04:08] Recently there's another topic

[01:04:09] That Chinese robots are very hot right now

[01:04:11] At the Spring Festival Gala

[01:04:13] I don't know if you have any observations about this

[01:04:17] I've watched some performances

[01:04:18] Also searched for some prices on Amazon

[01:04:20] I was really surprised they're so cheap

[01:04:22] Haha, did you buy one

[01:04:24] No, haha

[01:04:25] I wouldn't have any use for it even if I bought one

[01:04:26] But indeed I used to

[01:04:29] I don't know, in my mind I thought humanoid robots

[01:04:31] Well

[01:04:32] Of course at the software level there's nothing special there

[01:04:34] But mainly the hardware

[01:04:34] I thought for the hardware to be this mature

[01:04:37] It would probably cost something like

[01:04:39] Several million dollars or something

[01:04:40] But it seems when I checked

[01:04:42] The price is much cheaper than that

[01:04:44] I think this still reflects

[01:04:46] China's hardware industry chain

[01:04:48] Still has a lot of advantages

[01:04:50] But I

[01:04:51] Don't really know how it fares

[01:04:54] As a robot overall

[01:04:55] In terms of hardware

[01:04:55] I think it's indeed very very strong

[01:04:57] And from the software perspective

[01:05:00] I haven't quite figured it out

[01:05:01] I think robot models

[01:05:04] Are also something where there's relatively large disagreement right now

[01:05:08] Right

[01:05:08] What do you mean

[01:05:09] What I mean is

[01:05:11] I think robot models are probably more in the

[01:05:14] Feature engineering era

[01:05:17] Like you have a given environment

[01:05:20] A given scenario

[01:05:21] You optimize for that scenario

[01:05:23] People know how to do that

[01:05:24] Mm-hmm but doing RL

[01:05:26] Doing reinforcement learning

[01:05:28] Building appropriate virtual environments

[01:05:29] Still virtual

[01:05:31] With this kind of data

[01:05:32] You then do training

[01:05:33] And it can improve

[01:05:35] But it doesn't have strong generalization

[01:05:38] I think this is

[01:05:40] Whether there is generalization

[01:05:41] Is actually a watershed for many AI directions

[01:05:45] A deterministic scenario

[01:05:49] A very narrow, single scenario

[01:05:50] Can you do this well

[01:05:51] This wasn't solved just in recent years

[01:05:54] It could be done more than ten years ago

[01:05:56] Language is the same

[01:05:58] In this era before Transformer-like architectures

[01:06:02] It wasn't completely impossible

[01:06:04] Right, back then

[01:06:05] You could also train very strong models to do translation

[01:06:06] Mm-hmm

[01:06:08] You could train a very strong model

[01:06:09] To do semantic analysis

[01:06:10] But what you couldn't do was

[01:06:12] Improve all abilities across the board

[01:06:14] By improving at one point

[01:06:16] Mm-hmm

[01:06:17] I think this is a watershed

[01:06:18] And I think language models

[01:06:22] After Transformer and GPT

[01:06:24] Entered that kind of stage

[01:06:26] Crossed a threshold

[01:06:27] Where you can improve all abilities by improving at one point

[01:06:28] And you might train at one point

[01:06:31] It will abstract this ability

[01:06:33] And generalize it to all related things

[01:06:35] But I think robots haven't reached that stage

[01:06:39] More still before that stage

[01:06:41] Where I have a single scenario

[01:06:43] A single thing

[01:06:46] Then I can optimize for that

[01:06:50] So what do you think

[01:06:51] About these robotics teams in Silicon Valley

[01:06:53] And there are also a lot of robotics people inside Gemini

[01:06:55] Mm

[01:06:56] What do you think

[01:06:56] That direction is a bit...

[01:06:59] What would you call it

[01:07:00] Is it a sub-direction of yours

[01:07:01] Or a parallel direction

[01:07:03] Or what

[01:07:04] I think

[01:07:05] In the past, it was quite a parallel direction

[01:07:07] But now, for robotics

[01:07:09] I think people are also trying

[01:07:10] To see if they can leverage language models

[01:07:13] As a base model

[01:07:14] And then train something like

[01:07:16] For example, VLA (Vision-Language-Action model)

[01:07:17] Especially multimodal models

[01:07:18] Right, right, right, and um

[01:07:22] So now

[01:07:23] It has become something closely related

[01:07:26] To the language model track

[01:07:27] Mm
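
As a rough illustration of the VLA idea just mentioned — and only an illustration; this toy layout is my own sketch, not any lab's published architecture — the basic shape is a vision encoder and a language backbone fused into one token sequence, with an action head on top:

```python
import torch
import torch.nn as nn

class ToyVLA(nn.Module):
    """Toy Vision-Language-Action model: image patches and instruction tokens
    are fused in one Transformer; an action head predicts a control vector
    (e.g. 7-DoF end-effector deltas). All sizes are arbitrary placeholders."""
    def __init__(self, vocab_size=32000, d_model=512, action_dim=7):
        super().__init__()
        self.vision_encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=8),  # 224x224 -> 28x28 patches
            nn.Flatten(2),                              # (B, 32, 784)
        )
        self.patch_proj = nn.Linear(32, d_model)
        self.token_embed = nn.Embedding(vocab_size, d_model)
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=4,
        )
        self.action_head = nn.Linear(d_model, action_dim)

    def forward(self, image, instruction_tokens):
        patches = self.vision_encoder(image).transpose(1, 2)  # (B, 784, 32)
        vis = self.patch_proj(patches)                        # (B, 784, D)
        txt = self.token_embed(instruction_tokens)            # (B, T, D)
        h = self.backbone(torch.cat([vis, txt], dim=1))       # fuse modalities
        return self.action_head(h[:, -1])                     # (B, action_dim)

# usage sketch:
# action = ToyVLA()(torch.randn(1, 3, 224, 224), torch.randint(0, 32000, (1, 12)))
```

In real systems the backbone would be a pretrained multimodal language model rather than a small Transformer trained from scratch — which is exactly the "leverage language models as a base" point above.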

[01:07:28] And personally, my feeling is

[01:07:32] They will become very important in the future

[01:07:34] But they haven't found their own path yet

[01:07:40] But what they're doing is really interesting

[01:07:43] I highly recommend everyone go check out

[01:07:44] Robotics labs

[01:07:46] They're way more interesting than language model labs

[01:07:48] Language model labs

[01:07:50] Feel like normal offices

[01:07:51] But robotics labs, they really

[01:07:53] Have people controlling these robots

[01:07:55] Collecting all kinds of data

[01:07:56] And watching the robot, like

[01:07:59] Picking up all sorts of items from shelves and stuff

[01:08:00] Doing things like that

[01:08:01] I think it's a very interesting thing

[01:08:03] Which one did you go to

[01:08:04] Ah, I went to

[01:08:05] Wait, Gemini's own lab

[01:08:07] No, not Gemini

[01:08:08] Google DeepMind's own lab

[01:08:10] I've been to see it

[01:08:11] And also that Dyna

[01:08:13] I've also been to see

[01:08:14] They have a clothes-folding robot

[01:08:16] Right, their scenario might be a bit more narrow

[01:08:18] Like folding clothes

[01:08:20] With one robot, and maybe some others doing other things

[01:08:22] Like pouring water and stuff

[01:08:23] Right, like that

[01:08:24] Your intuitive feeling

[01:08:25] In LLM years, where does robotics progress stand?

[01:08:29] It hasn't reached the GPT-1 moment yet, right

[01:08:30] Definitely not

[01:08:31] I think it definitely hasn't, right

[01:08:33] Mm

[01:08:34] It's like everyone still hasn't

[01:08:36] Figured out how to scale up

[01:08:39] I think for me

[01:08:40] Whether it's robotics or multimodal generation

[01:08:43] Neither has reached that point

[01:08:45] Then let's get into today's main topic

[01:08:47] We're still very interested in you

[01:08:49] And chat about

[01:08:50] How you went from someone who studied physics

[01:08:53] Into the world of AI

[01:08:55] Mm

[01:08:56] Where did you grow up

[01:08:57] How did you grow up

[01:08:59] I

[01:09:00] I was born in Ningxia

[01:09:01] In a very, very small city

[01:09:04] Called Dawukou

[01:09:06] See, that confused expression of yours

[01:09:08] Already shows how small this city is

[01:09:10] Mm

[01:09:10] This city existed in the past because of a coal mine

[01:09:13] Because of Shitanjing

[01:09:13] A coal mine there

[01:09:14] That's how the city came into being

[01:09:15] Right, so I was born there

[01:09:17] But I

[01:09:18] Went to Shanghai with my parents during elementary school

[01:09:21] And so

[01:09:21] The latter half of elementary school and my middle and high school were in Shanghai

[01:09:24] Then I went to Beijing for undergrad

[01:09:26] What I just mentioned

[01:09:28] Undergrad in Beijing

[01:09:29] Then PhD in the US

[01:09:31] Right

[01:09:31] You had good grades since you were young, right

[01:09:33] You got into university through physics competition

[01:09:35] And studied theoretical physics at Tsinghua and Stanford

[01:09:38] Right, I didn't get in through physics competition

[01:09:40] Hahaha

[01:09:41] I think I was quite mediocre when I was young

[01:09:43] Hahahaha

[01:09:44] Ah first of all

[01:09:46] The middle school and elementary school I attended were both no-name schools

[01:09:51] Hahaha

[01:09:54] I think I

[01:09:55] At the middle school I attended at that time, competitions

[01:10:00] Were not something you would even consider

[01:10:02] It was that kind of middle school

[01:10:04] Called Shangnan Middle School East Campus

[01:10:06] Another school that makes everyone confused

[01:10:07] A school that leaves people baffled

[01:10:09] Okay, since we're here, which elementary school was it

[01:10:11] What was the elementary school called (Dezhou Second Village Elementary School)

[01:10:13] My context management ability is too strong

[01:10:16] I can't even remember what it's called actually

[01:10:18] Hahaha, mm-hmm, right

[01:10:21] And right

[01:10:22] It was that middle school

[01:10:23] It was um

[01:10:25] In a small environment within one class

[01:10:27] There were still some classmates who wanted to do things properly

[01:10:30] But overall

[01:10:32] I think that middle school was in a relatively laid-back state

[01:10:35] Right, and

[01:10:38] I think maybe my grades were okay (What do you mean by okay)

[01:10:42] Okay means at that time the situation was

[01:10:45] Shanghai high schools had so-called

[01:10:47] At that time there were so-called four top schools

[01:10:48] Like Shanghai High School

[01:10:50] Then Hua Er

[01:10:51] Jiao Tong and Fudan affiliated high schools

[01:10:52] Right. And at that time the situation was I could get into these four schools

[01:10:56] But couldn't get into the best classes in these four schools

[01:10:59] But at that time I really wanted to do competitions

[01:11:01] Because I had never done competitions before

[01:11:02] You started competitions in middle school

[01:11:03] I didn't do competitions in middle school

[01:11:04] Oh, I never did competitions in middle school

[01:11:06] Why did you want to do competitions if you never did them

[01:11:07] Because I never did them

[01:11:08] So I wanted to do them

[01:11:08] How did you get that idea (Hahaha, that's just how I am)

[01:11:11] My personality is

[01:11:13] I always love doing things I'm not good at

[01:11:16] Hahahaha, right

[01:11:19] And at that time I hadn't done competitions

[01:11:22] But I knew about them

[01:11:24] So I felt that compulsory education

[01:11:27] Not compulsory education, but before going to college I should give it a try

[01:11:31] So but then

[01:11:32] My grades weren't good enough for that

[01:11:33] So

[01:11:35] Going to the four top schools, the best four schools

[01:11:37] I couldn't get into their competition classes

[01:11:39] Then I discovered there was a slightly worse school

[01:11:42] That school was Gezhi High School

[01:11:44] A slightly worse school

[01:11:45] But that school had a competition class

[01:11:47] And I felt this competition class

[01:11:51] In today's terms it's an underdog

[01:11:55] Hahahaha

[01:11:58] Impressive

[01:11:59] In the words of that time, I felt like the barefoot aren't afraid of those wearing shoes

[01:12:02] Hahahaha

[01:12:04] I think, mm-hmm

[01:12:06] Worth a shot

[01:12:08] So actually at that time, back then

[01:12:11] At that time

[01:12:12] Shanghai still had this so-called early admission system

[01:12:14] Where before the high school entrance exam

[01:12:15] You could sign a contract with a school

[01:12:17] And then you would reserve a spot at that school in advance

[01:12:19] And then go directly there

[01:12:21] And then it was very natural to go

[01:12:23] And then go do competition high school

[01:12:25] So you were actually between the regular classes of Shanghai's four top schools

[01:12:28] And the competition class of Gezhi High School

[01:12:31] Without hesitation

[01:12:32] Chose Gezhi High School's competition class

[01:12:33] Of course I can't say

[01:12:35] I can't say that when I made the choice

[01:12:36] Getting into the best four high schools

[01:12:38] Was a sure thing

[01:12:39] Although my score was indeed enough later

[01:12:41] At that time the high school entrance exam hadn't happened yet

[01:12:43] Right right but at that time I felt

[01:12:45] Even if I could get in

[01:12:46] I should go to an underdog place and take a gamble

[01:12:50] Why

[01:12:52] Because I wanted to do this

[01:12:53] What was your purpose for wanting to do competitions

[01:12:55] I think the main thing at that time was wanting to experience it

[01:12:59] I felt I hadn't done it

[01:13:00] I had to find an opportunity to do it

[01:13:01] Why did you have to do it

[01:13:05] First, I felt it was indeed difficult

[01:13:07] Ah, it was indeed more

[01:13:08] There was just this excitement about difficulty

[01:13:10] Right

[01:13:11] It's indeed

[01:13:12] At least at that time

[01:13:13] Before I started

[01:13:14] The impression everyone gave me was

[01:13:16] That this thing was much more challenging

[01:13:19] Than the stuff you learn without doing competitions

[01:13:22] Mm-hmm

[01:13:23] The people who do this seem really strong

[01:13:25] If you don't do it you're just the smoothest stone

[01:13:28] Among all the mediocre rocks

[01:13:30] So at that time I felt I should do it

[01:13:32] So I went and did it

[01:13:33] Of course doing it actually brought some benefits

[01:13:36] Looking back later

[01:13:37] If I hadn't done competitions at that time

[01:13:38] I probably wouldn't have gotten into Tsinghua

[01:13:39] Oh, did you get bonus points or something

[01:13:42] At that time actually

[01:13:44] The competition direct admission system had already declined significantly

[01:13:48] Only those who made the national training team could get direct admission

[01:13:50] My high school

[01:13:51] Anyway I think

[01:13:52] I wasn't at the level of making the national training team

[01:13:54] So let's not talk about that

[01:13:56] But before taking the senior year competition exam

[01:13:59] By a twist of fate I went to Tsinghua for a summer camp

[01:14:03] And by a twist of fate on the last day of the summer camp

[01:14:06] I heard they were doing

[01:14:08] Independent enrollment

[01:14:10] But mainly aimed at Beijing students

[01:14:13] I frantically texted the admissions office teacher

[01:14:16] Saying I wanted to take the exam with them

[01:14:19] He agreed

[01:14:20] And then he agreed to let us take the exam

[01:14:22] You all, or just you?

[01:14:24] He agreed to let

[01:14:24] Me and the few people from our high school who went together

[01:14:27] Those high school classmates from Shanghai who went to that summer camp

[01:14:31] Oh what reason did you use to convince him to text him

[01:14:34] I've forgotten the specifics of that text

[01:14:35] But the general idea of that text was

[01:14:37] You give Beijing students the exam

[01:14:39] Why not give Shanghai students the exam

[01:14:41] Oh, you were quite righteous about it

[01:14:42] Did you think they were playing favorites at that time

[01:14:46] I didn't think they were playing favorites

[01:14:47] I just felt they had this opportunity

[01:14:48] Why not give it to us

[01:14:50] Everyone's competing on the same playing field

[01:14:52] You were classmates at that time

[01:14:54] And so I sent this message

[01:14:56] And they actually let us take the exam

[01:14:58] How many people

[01:14:59] I can't quite remember

[01:15:01] Maybe from Shanghai

[01:15:03] There were probably about seven or eight people in that exam room

[01:15:06] You sent that text

[01:15:07] Maybe

[01:15:07] Maybe other high schools had other students who sent texts too

[01:15:10] But from our high school I was the one who sent it

[01:15:12] Oh so

[01:15:14] They were all Shanghai high schools

[01:15:15] Students who went to Beijing for that summer camp

[01:15:16] Students who attended the summer camp

[01:15:19] And then they let us take the exam

[01:15:21] And then we signed

[01:15:23] They were that easy to talk to?

[01:15:25] Right, so what I learned from that incident

[01:15:28] The most important life lesson is

[01:15:31] Be bold

[01:15:32] Haha

[01:15:34] If you don't fight for it you'll never get it

[01:15:36] Even if you fight for it you might not get it

[01:15:37] But if you don't fight for it you definitely won't get it

[01:15:39] Were you nervous when you sent that text

[01:15:41] You were still in high school

[01:15:44] I can't remember anymore

[01:15:46] At that time I felt

[01:15:46] Was this a very bold thing for me

[01:15:49] No, at that time I was completely thinking

[01:15:52] I have to fight for it now

[01:15:53] If I don't fight for it today I won't be able to fight for it tomorrow haha

[01:15:56] Like

[01:15:57] The day I heard about it I immediately started frantically texting

[01:16:00] Frantically texting who

[01:16:01] Texting the admissions office

[01:16:02] That Tsinghua admissions office teacher

[01:16:03] Texting one person or multiple people

[01:16:05] Can't remember, probably one teacher

[01:16:07] Did he reply quickly

[01:16:09] Mm-hmm mm-hmm I think Tsinghua

[01:16:11] Just said yes

[01:16:12] I don't know if they discussed it among themselves

[01:16:14] But anyway in the end they said they agreed

[01:16:18] And then we took the exam together

[01:16:19] Right

[01:16:20] So I so I

[01:16:21] Why do I feel like

[01:16:22] I've always had quite a soft spot for Tsinghua

[01:16:23] I just feel

[01:16:23] that this school is willing to give people opportunities

[01:16:28] to provide equal opportunities for everyone

[01:16:31] How did you do on that exam?

[01:16:33] Well, when I came out, I felt like I totally bombed it

[01:16:37] Because I couldn't solve even half a problem

[01:16:39] But later I found out others missed even more

[01:16:42] So I did get in after all

[01:16:43] Hahaha, yeah, exactly

[01:16:46] How many of your Shanghai classmates got in that year?

[01:16:49] Ah, I think two

[01:16:51] Independent recruitment

[01:16:52] Was it a score reduction or something?

[01:16:53] It lowered the cutoff to the first-tier university line

[01:16:54] Lowered to the first-tier line

[01:16:55] Oh

[01:16:57] So how did you do on the gaokao?

[01:16:59] Later, sure enough, my gaokao wasn't high enough for Tsinghua

[01:17:02] But I could get into any school except Tsinghua and Peking University

[01:17:06] Oh

[01:17:08] So why

[01:17:09] Online it says you were recommended for admission

[01:17:12] I think it's just that people

[01:17:14] who didn't go to school during those years find it

[01:17:17] hard to really understand what happened back then

[01:17:18] Because two cohorts before mine

[01:17:21] you could still get recommended admission with a provincial first prize

[01:17:24] A provincial first prize got you recommended admission

[01:17:27] What about your time?

[01:17:28] In our time, with a provincial first prize

[01:17:30] you made the provincial team

[01:17:31] then represented the provincial team at the national competition

[01:17:34] and only by making the national training team could you get recommended admission

[01:17:36] I made the provincial team and went to the national competition

[01:17:38] But I didn't make the national training team

[01:17:40] Right So in my year, I didn't have a recommended admission slot

[01:17:43] Oh

[01:17:44] Were you good at competitions?

[01:17:47] I think I was pretty mediocre

[01:17:49] Like

[01:17:50] Isn't not being the best basically the same as being mediocre?

[01:17:53] And I obviously wasn't the best

[01:17:54] So I was just mediocre

[01:17:58] What was your family's attitude toward you doing competitions?

[01:18:00] What was their attitude?

[01:18:03] The best thing about my parents is

[01:18:06] they didn't really interfere much

[01:18:07] They may have tried to control me at some point

[01:18:09] but later found they couldn't

[01:18:10] Oh, how so?

[01:18:11] I just didn't listen to them

[01:18:12] Oh

[01:18:14] I think most Chinese families

[01:18:20] it's already considered pretty good when kids discuss things with their parents

[01:18:23] I usually just informed them

[01:18:25] Haha, informed them of what?

[01:18:27] Informed them, oh, I'm going to the independent admissions exam

[01:18:30] Yeah, and

[01:18:32] Including filling out applications for high school and college

[01:18:35] My parents might not have even seen my application forms

[01:18:38] Oh, they're pretty laid-back, huh?

[01:18:41] I think they just

[01:18:47] when you can't understand what someone is doing

[01:18:49] the best thing is to not meddle

[01:18:51] I think my parents understood this very well

[01:18:53] Yeah, hahaha

[01:18:57] So you're pretty rebellious, huh?

[01:19:00] I think I am

[01:19:03] Pretty

[01:19:05] My personality is

[01:19:07] I really care about what I want to do

[01:19:09] If it's something I've figured out I want to do

[01:19:12] Don't try to stop me

[01:19:14] And I'll definitely do my absolute best

[01:19:18] But if it's something I don't want to do

[01:19:19] Forcing me won't help, I won't do it. Right

[01:19:22] Are you very competitive?

[01:19:24] Pretty strong

[01:19:26] Yeah, but I think I'm more competing with myself

[01:19:29] pushing myself, I guess

[01:19:31] Not really willing to compete with others

[01:19:34] Oh, right

[01:19:35] Of course, if

[01:19:36] well

[01:19:37] it's something I think is important

[01:19:39] and you also think it's important

[01:19:40] then I definitely have to outdo you, hehe

[01:19:44] So then you got to Tsinghua, that was even more amazing

[01:19:47] You studied quantum physics, why?

[01:19:49] Yeah, I was doing condensed matter theory at the time

[01:19:53] Why did you choose this major?

[01:19:56] A twist of fate

[01:19:57] Looking back now

[01:19:59] Of course I can

[01:20:00] come up with some very reasonable-sounding explanations

[01:20:04] But honestly, going back to that time

[01:20:06] I think it was just a twist of fate

[01:20:08] So at that time we were in the Jixian class

[01:20:11] And the Jixian class had a very good tradition

[01:20:13] First of all, although the Jixian class was in the physics department

[01:20:15] It didn't restrict what students could do

[01:20:17] Actually 2/3 of the students in the Jixian class wouldn't do physics

[01:20:20] Ah

[01:20:20] And for

[01:20:21] Why did you enter this class

[01:20:23] Uh

[01:20:24] At that time the entire Tsinghua physics department was Jixian class

[01:20:27] Maybe not anymore now

[01:20:28] Anyway it was at that time

[01:20:29] And another good tradition it had was

[01:20:29] It encouraged students to learn through practice

[01:20:33] So it encouraged students

[01:20:33] To enter research labs as early as possible

[01:20:37] And learn through research

[01:20:40] And at that time I really wanted to do theory

[01:20:46] Was it because you found it difficult

[01:20:48] It feels like you have a fascination with difficulty

[01:20:52] Maybe it's also a kind of illness

[01:20:54] I can talk more about this later

[01:20:56] What are the bad consequences of this illness

[01:20:57] Hahaha

[01:20:58] Right and then then right

[01:21:00] Then I wanted to do theory

[01:21:01] And of course the Jixian class

[01:21:04] Or what we call the Xuetang class

[01:21:05] Had a smaller class

[01:21:06] And then the

[01:21:07] Teacher recommended saying hey

[01:21:09] The Institute for Advanced Study is a great place

[01:21:10] Tsinghua Institute for Advanced Study

[01:21:11] The research institute founded by Mr. Chen-Ning Yang

[01:21:13] Is a great place

[01:21:14] So I went there to find a teacher

[01:21:16] And there happened to be

[01:21:19] A teacher who was still young at that time called

[01:21:21] Called Wang Zhong, he was my undergraduate teacher

[01:21:22] Mm-hmm, at that time he didn't have many students either

[01:21:24] And we chatted

[01:21:27] Of course I knew nothing

[01:21:28] But he was quite patient

[01:21:29] And gave me

[01:21:30] Gave me some papers to read

[01:21:32] And after reading I discussed with him

[01:21:34] Later I discovered condensed matter theory

[01:21:36] Especially the project we were doing at that time

[01:21:37] Was related to topological insulators

[01:21:39] And these kinds of directions

[01:21:42] Actually

[01:21:44] Was a direction very suitable for undergraduates to get started with

[01:21:47] It didn't require too much background knowledge

[01:21:50] You only needed to know

[01:21:52] The most basic thing is you need to know quantum mechanics

[01:21:54] Statistical mechanics

[01:21:55] Solid state physics

[01:21:56] Which are actually very very easy to learn

[01:21:59] Basic knowledge

[01:22:00] But it might really test

[01:22:01] The depth of your understanding of this knowledge

[01:22:03] So for undergraduates

[01:22:05] It's actually a particularly good direction

[01:22:07] Where you can get started quickly

[01:22:09] And do some actual projects

[01:22:10] And then we did some work together

[01:22:13] Among which possibly

[01:22:14] The work in open quantum systems

[01:22:17] Looking back now is still quite important work

[01:22:20] Right and then

[01:22:23] In a sense

[01:22:24] I think looking back now

[01:22:27] Doing that work

[01:22:28] Doing research during that period

[01:22:30] Is actually very very similar to doing AI now

[01:22:32] It's more that you have an idea

[01:22:34] You have an understanding

[01:22:36] And at that stage you can

[01:22:38] You can do a numerical experiment

[01:22:39] To verify whether your idea and understanding are correct

[01:22:42] You find AI is actually the same

[01:22:44] AI is also you have an idea

[01:22:45] You have an understanding

[01:22:46] You design some experiments

[01:22:48] To verify whether your understanding is correct

[01:22:49] And then you design some model

[01:22:52] Training pipeline

[01:22:53] To implement your ideas

[01:22:56] Right so actually these two are very similar

[01:23:00] Can you talk about your non-Hermitian system research

[01:23:04] Ah, I can talk about it

[01:23:05] I'll try to put it in plain language

[01:23:07] But it's also possible I'll actually be talking nonsense

[01:23:09] So those who don't want to listen can skip ahead

[01:23:12] Hahahaha

[01:23:14] Slide the progress bar

[01:23:16] You can set two markers on the progress bar

[01:23:18] Right and then right

[01:23:19] Non-Hermitian systems are like this

[01:23:22] One of the most basic assumptions of quantum mechanics is

[01:23:26] An isolated system

[01:23:27] Its evolution is described by unitary evolution

[01:23:32] 'Unitary evolution' is kind of jargon

[01:23:33] Sorry

[01:23:34] What unitary evolution means is

[01:23:35] It's a linear process

[01:23:37] And this linear process

[01:23:40] Can be described by an operator

[01:23:42] Called the Hamiltonian

[01:23:44] Ah, the Hamiltonian, in a certain sense

[01:23:47] It's somewhat like the energy of the system

[01:23:48] But not exactly

[01:23:49] It's somewhat analogous to that

[01:23:50] It determines how the system evolves over time

[01:23:53] And if it's

[01:23:54] An isolated system

[01:23:55] This Hamiltonian will be a Hermitian matrix

[01:23:57] A Hermitian matrix is one where you transpose it

[01:23:59] And then take the complex conjugate

[01:24:00] And it's the same as the original

[01:24:02] But real systems

[01:24:04] The vast majority are not isolated systems

[01:24:07] For example, you

[01:24:08] Me, as a human being

[01:24:09] Definitely have to exchange information with the outside world

[01:24:11] And exchange matter

[01:24:12] Materials are the same

[01:24:14] If you put a piece of material there

[01:24:17] Unless you create an extremely high vacuum

[01:24:19] You always have to interact with the substrate

[01:24:21] You have to exchange with the external environment

[01:24:23] So real systems

[01:24:24] Are mostly not isolated systems

[01:25:25] And non-isolated systems

[01:25:26] Won't be described by a unitary process

[01:25:29] And the corresponding Hamiltonian

[01:25:30] Won't be Hermitian either

[01:24:32] That's where the term 'non-Hermitian' comes from

[01:24:34] It's essentially for studying open quantum systems

[01:24:36] Quantum systems that exchange with the outside world

[01:24:38] Their behavior
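
For reference, here is the standard textbook version of what was just said, written out in LaTeX (this is generic quantum mechanics, not a claim about the specific models in the work discussed):

```latex
% Isolated system: Hermitian Hamiltonian, unitary time evolution
\[
H = H^{\dagger}, \qquad
|\psi(t)\rangle = U(t)\,|\psi(0)\rangle, \qquad
U(t) = e^{-iHt/\hbar}, \quad U^{\dagger}U = I,
\]
% so the total probability \langle\psi(t)|\psi(t)\rangle is conserved.
%
% Open system: with gain/loss or exchange with an environment, the
% effective Hamiltonian is no longer Hermitian,
\[
H_{\mathrm{eff}} \neq H_{\mathrm{eff}}^{\dagger},
\]
% evolution is non-unitary, and eigenvalues can become complex
% (imaginary parts encode growth or decay).
```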

[01:24:39] And at that time, something very puzzling was discovered

[01:24:43] We were initially trying to study

[01:24:45] Some topological phenomena in these open quantum systems

[01:24:48] And then we found

[01:24:50] The theoretical results from hand calculations

[01:24:52] Just couldn't match the numerical results no matter what

[01:24:57] More precisely

[01:24:58] The hand calculation result

[01:24:59] Assumed the system

[01:25:00] Had periodic boundary conditions

[01:25:01] For example, on a ring

[01:25:02] Or on the surface of a torus

[01:25:04] And numerically

[01:25:07] Because it's closer to the actual situation

[01:25:08] It would calculate with open boundaries

[01:25:11] For example, the behavior of a material in a square shape

[01:25:13] And these two results just couldn't be reconciled

[01:25:15] So we tried to understand this

[01:25:16] And later found

[01:25:18] The basic paradigm people used to describe Hermitian systems

[01:25:20] A fundamental paradigm

[01:25:24] Is the so-called Bloch wave

[01:25:26] Which assumes the eigenstates of the system are

[01:25:28] Linear combinations of waves

[01:25:32] Sine and cosine waves, that kind of thing

[01:25:34] This assumption

[01:25:38] In non-Hermitian systems, it actually

[01:25:43] breaks down — it becomes wrong
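
[Note: the Bloch assumption is that eigenstates are built from plane waves $\psi_n \sim e^{ikn}$ with real $k$. In the non-Bloch framework developed for these open-boundary non-Hermitian systems (the standard formulation in the literature; the interview doesn't spell it out), this gets generalized, roughly, to

$$\psi_n \sim \beta^{n}, \qquad |\beta| \neq 1,$$

so $|\psi_n|$ can grow or decay exponentially toward one edge rather than staying extended.]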

[01:25:45] The fact is

[01:25:46] Later we found

[01:25:47] In non-Hermitian systems

[01:25:48] Actually, the energy eigenstates

[01:25:51] Can all potentially accumulate at one edge of the system

[01:25:53] Right, and then we systematically established this

[01:25:55] Set of descriptive methods

[01:25:57] And then built a framework

[01:26:00] To describe a non-Hermitian system with open boundaries

[01:26:03] How to describe its eigenstates

[01:26:05] And thereby describe its time evolution

[01:26:07] And some dynamics
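
[Note: the edge accumulation he describes, now usually called the non-Hermitian skin effect, is easy to reproduce numerically. Below is a minimal sketch, not code from the original work, using the standard Hatano-Nelson toy model: a 1D chain whose rightward and leftward hopping amplitudes differ.]

```python
import numpy as np

# Hatano-Nelson chain: t_right != t_left makes H non-Hermitian.
N, t_right, t_left = 40, 1.0, 0.5
H = np.diag(np.full(N - 1, t_right), -1) + np.diag(np.full(N - 1, t_left), +1)

# Open boundary conditions: diagonalize the finite chain as-is.
evals_obc, evecs = np.linalg.eig(H)

# Skin effect: every open-boundary eigenstate piles up exponentially
# at the edge favored by the stronger hopping (here, the right edge).
weight_right = np.sum(np.abs(evecs[N // 2 :, :]) ** 2, axis=0)
print("mean weight on right half:", weight_right.mean())  # ~1.0, not ~0.5

# Periodic boundary conditions: add the wrap-around hoppings.
H_pbc = H.copy()
H_pbc[0, -1] = t_right  # hop from site N-1 around to site 0
H_pbc[-1, 0] = t_left   # hop from site 0 around to site N-1
evals_pbc = np.linalg.eigvals(H_pbc)

# The mismatch that puzzled them: the PBC spectrum traces a loop in the
# complex plane, while the OBC spectrum collapses onto the real axis.
print("max |Im E| with OBC:", np.abs(evals_obc.imag).max())  # ~0
print("max |Im E| with PBC:", np.abs(evals_pbc.imag).max())  # ~0.5 here
```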

[01:26:09] So

[01:26:10] That was the work at that time

[01:26:12] And later there was a lot of

[01:26:15] Because it was actually a

[01:26:17] A paradigm shift

[01:26:18] So later there was a lot of

[01:26:20] Follow-up work

[01:26:21] But later I actually switched directions

[01:26:22] So I didn't continue much in this direction

[01:26:25] Why didn't you continue with it

[01:26:28] It's hard to catch a paradigm shift, isn't it

[01:26:31] It's hard to catch a paradigm shift

[01:26:33] Yes, yes

[01:26:35] This is the weakness of human nature

[01:26:37] I feel like

[01:26:38] I always love challenging myself with things I don't know

[01:26:40] Hahaha especially at that time

[01:26:42] Just

[01:26:43] I don't know what I was feeling in that direction

[01:26:46] Maybe looking back at that work a few years later

[01:26:49] It would become the most important work in that direction

[01:26:52] Later when you do some more work

[01:26:53] It might indeed make you more famous

[01:26:55] Get more citations

[01:26:56] Write more good journal articles

[01:26:58] Find a good faculty position

[01:26:59] But it feels like for a scientific career

[01:27:03] It wouldn't be that exciting

[01:27:06] So at that time I wanted to switch to something else

[01:27:08] Switch to something I wasn't good at

[01:27:08] Do it right

[01:27:10] And then

[01:27:10] So when doing my PhD I switched directions

[01:27:12] To do high energy theory

[01:27:14] High energy theory, right

[01:27:15] High energy physics, right

[01:27:16] So your undergraduate and PhD were also different

[01:27:18] Also different

[01:27:20] It's not just jumping from physics to AI

[01:27:22] Actually your undergraduate and PhD both look like physics

[01:27:25] But the directions had already changed significantly

[01:27:26] Right, two directions with almost no connection

[01:27:28] Oh, that's quite amazing

[01:27:30] Including your choice of competitions

[01:27:32] Going to Gezhi High School was also quite amazing

[01:27:35] Right

[01:27:36] What kind of human nature is this

[01:27:38] I think it's just

[01:27:40] To put it badly, I love torturing myself

[01:27:43] Hahaha, to put it nicely, challenging myself

[01:27:46] Hahaha

[01:27:48] Mm-hmm, are you happy being tortured

[01:27:51] I think if someone tortures themselves just for the sake of being tortured

[01:27:54] Then that person has psychological issues

[01:27:56] But if a person is being tortured in order to learn more things

[01:27:59] And enrich their experiences and abilities

[01:28:02] I think it's worth it

[01:28:05] Your undergraduate teacher

[01:28:06] Teacher Wang Zhong was also an underdog, right

[01:28:08] Does he count

[01:28:09] No, hahaha

[01:28:10] He was doing quite well

[01:28:11] How can you say that about him haha (At that time)

[01:28:13] I just said he was very young

[01:28:15] No no no, he was very young

[01:28:16] But he

[01:28:17] My impression of him has always been

[01:28:19] He is a very sharp person

[01:28:20] Very capable of seeing problems

[01:28:23] Trying to understand problems

[01:28:24] Understanding them very clearly

[01:28:25] Indeed he might not be like many teachers who are

[01:28:32] Very famous

[01:28:34] In society or very dazzling

[01:28:35] At least not at that time

[01:28:36] Now he's very famous

[01:28:38] At that time he wasn't that famous yet

[01:28:39] But I think in terms of ability

[01:28:41] I think he's very strong

[01:28:43] Right, and actually he started out

[01:28:47] When he was doing his PhD he studied with Teacher Shoucheng

[01:28:49] Teacher Zhang Shoucheng

[01:28:51] So

[01:28:53] People who can be chosen by Teacher Shoucheng

[01:28:54] Basically won't be too bad

[01:28:55] Mm-hmm

[01:28:57] Did he say anything about you changing directions for your PhD

[01:29:04] He didn't say anything

[01:29:05] I think he is

[01:29:08] He is someone who doesn't like to interfere with others

[01:29:11] Hahahaha

[01:29:13] I don't know what he was thinking inside

[01:29:15] But I think

[01:29:16] He is someone who doesn't like to interfere with others

[01:29:18] Eh, quantum physics

[01:29:19] What kind of worldview is it as a whole

[01:29:21] And how does it compare with, um

[01:29:23] I think

[01:29:24] I think the biggest difference is I think, um

[01:29:27] There are many

[01:29:27] Many differences from classical physics

[01:29:30] But I think

[01:29:30] They are two corresponding concepts, right

[01:29:32] Classical physics and quantum physics

[01:29:34] They are theories at different energy and time

[01:29:38] Or spatial scales

[01:29:40] That is, essentially our world is all quantum

[01:29:43] Of course right now

[01:29:44] We don't know what exists at smaller scales

[01:29:45] Right, like at smaller scales

[01:29:46] There are many different ideas

[01:29:49] For example, string theory is an idea

[01:29:50] And then there are other ideas

[01:29:52] Quantum gravity is also an idea, things like that

[01:29:53] Right, but none of those can be verified

[01:29:56] At the smallest scales

[01:30:01] That can be experimentally verified

[01:30:04] The effective theory is quantum physics

[01:30:07] Of course, this includes quantum mechanics and quantum field theory

[01:30:10] And classical physics is

[01:30:12] When the spatial scale you're looking at

[01:30:16] Is relatively large

[01:30:17] This quantum physics

[01:30:18] Will gradually, gradually reduce to classical physics

[01:30:20] Actually, it's more about at different scales

[01:30:23] Having different effective theories

[01:30:24] This, this thing

[01:30:26] Is actually a very profound idea in physics

[01:30:28] It's what's called the renormalization group

[01:30:29] What the renormalization group says

[01:30:31] Is that

[01:30:34] The theory describing a system

[01:30:37] At different energy scales

[01:30:39] May look completely different

[01:30:41] Right, and even if ultimately, at the root

[01:30:45] They all come from one grand unified theory

[01:30:46] Of course, right now

[01:30:47] There isn't really a true grand unified theory

[01:30:49] But if one exists

[01:30:50] Even if they share the same root at the origin

[01:30:52] At different scales

[01:30:53] They may still look completely different

[01:30:55] So classical physics and

[01:30:57] Quantum physics

[01:30:57] Are more like two descriptions at different scales

[01:31:00] Speaking of quantum physics

[01:31:01] There are several terms that seem related

[01:31:03] For example, the butterfly effect

[01:31:05] For example, quantum entanglement

[01:31:06] Can you talk about these

[01:31:08] I think this is something everyone can understand

[01:31:10] And I don't know physics either

[01:31:11] Don't blame me, everyone

[01:31:12] I don't know quantum physics either

[01:31:14] Right, I think

[01:31:15] Quantum entanglement

[01:31:16] Is indeed something relatively well-known

[01:31:19] And quite unique to quantum physics

[01:31:22] And then it's very simple

[01:31:23] It's like, say I have two particles

[01:31:24] For example, they're in an entangled state

[01:31:26] And then maybe they're actually very far apart

[01:31:29] But actually

[01:31:29] If I perform some measurement on one of them

[01:31:31] Or perturbation

[01:31:32] It will also affect the state of the other

[01:31:34] This is real

[01:31:35] This is real, right
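
[Note: the textbook example of such a pair is a Bell state of two spins,

$$|\Phi^{+}\rangle = \tfrac{1}{\sqrt{2}}\big(|{\uparrow\uparrow}\rangle + |{\downarrow\downarrow}\rangle\big),$$

where measuring one spin as up immediately fixes the other to be up, however far apart the two are.]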

[01:31:37] What kinds of things have quantum entanglement

[01:31:39] What kinds of two objects, there are many

[01:31:41] There are many

[01:31:43] Just, there are many

[01:31:44] Actual situations

[01:31:45] It's actually

[01:31:46] When you look closely enough

[01:31:49] At a small enough, microscopic scale

[01:31:50] The vast majority of particles may be in entangled states

[01:31:54] But practically speaking

[01:31:55] You can, for example, create one spin and another spin

[01:31:58] First bring them together

[01:31:59] Then prepare them into an entangled state

[01:32:02] Then you can pull one of them very far away

[01:32:04] Then it becomes an entanglement

[01:32:06] A state entangled over a long distance

[01:32:07] And I think even, I remember a few years ago

[01:32:10] There were people who specifically did experiments

[01:32:13] Putting a bacterium and some other thing

[01:32:15] Into a quantum entangled state

[01:32:17] What do you mean by prepare

[01:32:19] Into a quantum entangled state

[01:32:22] This can be manually operated

[01:32:23] This is something that can be manually operated

[01:32:25] Why, how do you operate it

[01:32:26] Generally speaking

[01:32:27] It's through some

[01:32:29] Some measurements and the action of evolution operators

[01:32:32] Can put it

[01:32:32] Into this state

[01:32:34] But the hard part here

[01:32:35] Is actually how to implement this experimentally

[01:32:37] This process

[01:32:38] You can imagine

[01:32:39] It's like you perform some quantum measurements

[01:32:41] And some, some so-called quantum gate operations
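
[Note: the standard minimal example of such a preparation, assuming two qubits that both start in $|0\rangle$, is a Hadamard gate followed by a CNOT:

$$|00\rangle \xrightarrow{\,H \otimes I\,} \tfrac{1}{\sqrt{2}}\big(|0\rangle + |1\rangle\big)|0\rangle \xrightarrow{\,\mathrm{CNOT}\,} \tfrac{1}{\sqrt{2}}\big(|00\rangle + |11\rangle\big),$$

which, with $|0\rangle$ and $|1\rangle$ standing for the two spin states, is exactly the entangled Bell state above.]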

[01:32:43] Actually

[01:32:45] It's quite difficult

[01:32:46] Which brings us back to the question just now

[01:32:49] That every system is actually not isolated

[01:32:51] You might have these two spins

[01:32:52] And you think, hey

[01:32:53] If I prepare them this way

[01:32:54] Don't I get an entangled state?

[01:32:55] Then I just separate them and I'm done

[01:32:57] But the real problem is

[01:32:58] These two particles actually live in our world

[01:33:01] Other particles constantly

[01:33:01] Bump into them

[01:33:02] Or external heat disturbs them a bit

[01:33:04] And the state is gone just like that

[01:33:05] So the hard part is

[01:33:06] How to actually implement this process experimentally

[01:33:08] Right, and then

[01:33:10] Another example of entanglement might be more well-known

[01:33:13] I should actually mention that example

[01:33:14] Which is Schrödinger's cat

[01:33:17] That's a much more famous example

[01:33:21] It says its state is actually a superposition

[01:33:24] Of a radioactive source emitting a particle

[01:33:26] And the cat being dead

[01:33:28] That's one state

[01:33:29] The other state is the radioactive source not emitting a particle

[01:33:31] And the cat being alive, a superposition of these two

[01:33:34] So for example

[01:33:34] If you measure that radioactive source

[01:33:36] And find that it emitted a particle

[01:33:37] You know the cat is dead

[01:33:39] No matter how far apart the cat and the source are

[01:33:42] Right, so that's entanglement
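
[Note: written out, the entangled state in this thought experiment is

$$|\psi\rangle = \tfrac{1}{\sqrt{2}}\big(|\text{decayed}\rangle\otimes|\text{dead}\rangle + |\text{not decayed}\rangle\otimes|\text{alive}\rangle\big),$$

so a measurement on the source alone determines the cat's state.]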

[01:33:44] But the butterfly effect is a

[01:33:48] Is a different thing

[01:33:49] And the butterfly effect

[01:33:52] Well the famous part of the butterfly effect

[01:33:54] Is actually from classical physics

[01:33:56] What people hear about in classical physics

[01:33:58] The butterfly effect is that famous example

[01:34:01] Where maybe a butterfly in South America

[01:34:02] Flaps its wings

[01:34:03] Half a month later

[01:34:04] A typhoon hits North America

[01:34:07] But from a more mathematical formulation

[01:34:10] It says that at time

[01:34:15] At the initial moment

[01:34:16] If you make a very tiny perturbation

[01:34:19] And then measure the impact of this perturbation

[01:34:21] How large it becomes in the future

[01:34:22] You'll find

[01:34:23] That this perturbation grows exponentially

[01:34:27] Right, that's mathematically

[01:34:28] A description of the classical butterfly effect
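
[Note: the mathematical statement he's paraphrasing is that chaotic classical systems have a positive Lyapunov exponent $\lambda$: an initial perturbation $\delta(0)$ grows as

$$|\delta(t)| \approx |\delta(0)|\, e^{\lambda t}, \qquad \lambda > 0.$$]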

[01:34:31] But something people were puzzled about before

[01:34:35] Is how could this phenomenon exist in quantum systems

[01:34:37] Because as we just said, isolated

[01:34:39] An isolated quantum system undergoes unitary evolution

[01:34:40] It's a very linear process

[01:34:42] So in a certain sense

[01:34:44] If you have one state

[01:34:46] That is, one vector and another vector

[01:34:48] With not too large an angle between them initially

[01:34:50] Then after some evolution

[01:34:51] This angle shouldn't change

[01:34:54] And so this situation, which should always exist

[01:34:56] Where initial states are

[01:34:58] Very slightly different

[01:34:59] And in the future, bam, the difference grows exponentially

[01:35:02] Seems, from quantum mechanics, like

[01:35:03] Something unlikely to happen
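
[Note: this follows in one line from unitarity, which preserves overlaps between states:

$$\langle \psi_1(t)|\psi_2(t)\rangle = \langle \psi_1|\,U^{\dagger}(t)\,U(t)\,|\psi_2\rangle = \langle \psi_1|\psi_2\rangle,$$

so the "angle" between two nearby states never changes, and naive state-to-state distances cannot grow exponentially.]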

[01:35:05] But as we just said

[01:35:06] Our world is actually quantum at the microscopic level

[01:35:09] And becomes classical at the macroscopic level

[01:35:11] But they're part of the same continuum

[01:35:12] How can one have it and not the other

[01:35:13] That's what people were trying to understand

[01:35:15] And of course later people gained a better understanding

[01:35:17] Which is that actually

[01:35:19] When discussing the butterfly effect in quantum systems

[01:35:21] You shouldn't discuss the change between two states

[01:35:24] This change

[01:35:24] Instead you should discuss something

[01:35:27] Called local observable(局域可观测量)

[01:35:29] That is, the change in local observables

[01:35:31] That actually corresponds to what you see

[01:35:33] In classical physics, those changes
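
[Note: he doesn't name the tool here, but the common way this is made precise in the literature is the out-of-time-order correlator, which measures how a later local observable $W(t)$ fails to commute with an earlier local perturbation $V$:

$$C(t) = \big\langle\, [W(t), V]^{\dagger}\, [W(t), V] \,\big\rangle \sim e^{\lambda_L t},$$

with $\lambda_L$ playing the role of a quantum Lyapunov exponent in chaotic systems.]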

[01:35:35] So after four years of studying quantum physics

[01:35:37] What were you thinking at the time

[01:35:40] What do you think physics helped you with

[01:35:42] When you were about to graduate as a senior

[01:35:44] I think the biggest benefit of studying physics as an undergraduate

[01:35:47] Is first of all

[01:35:50] Think things through clearly

[01:35:51] Reading isn't about reading a lot

[01:35:53] But about reading deeply

[01:35:54] Reading a lot doesn't mean you can discover new things

[01:35:59] But if you have

[01:36:00] A perspective different from others on something

[01:36:02] That's what's more valuable

[01:36:04] To society

[01:36:05] This one thing

[01:36:05] And another thing is don't trust theory too much

[01:36:09] Don't trust pure theory too much

[01:36:11] Because

[01:36:12] I came to this conclusion

[01:36:13] Because the main reason that discovery happened at that time

[01:36:16] Was because we could do numerics

[01:36:19] It started because numerics and theory didn't match

[01:36:22] Then we carefully studied that problem

[01:36:24] And discovered this thing

[01:36:27] Then why did you go study high energy physics for your PhD

[01:36:29] That's also a theory

[01:36:30] This brings us back to the topic we just discussed

[01:36:32] That always loving to challenge very difficult things

[01:36:34] Sometimes also brings some bad results

[01:36:37] What bad results

[01:36:39] For example I feel like

[01:36:40] I think my PhD, for myself personally

[01:36:44] I learned a lot

[01:36:46] Grew a lot

[01:36:47] But for this world

[01:36:49] It didn't produce any contribution

[01:36:52] Haha, this high energy theory direction

[01:36:53] It's difficult enough

[01:36:55] Very very difficult

[01:36:56] And um

[01:36:58] But the bad thing about it is

[01:37:01] It's actually not particularly verifiable

[01:37:03] There are no objective evaluation criteria

[01:37:06] Because

[01:37:07] High energy theory has developed to the point where

[01:37:10] Experiments completely can't catch up at this stage

[01:37:12] Experiments completely can't catch up to what you're discussing in theory

[01:37:15] Whether it's energy scales

[01:37:16] Or these microscopic scales

[01:37:18] Right

[01:37:19] How does it progress

[01:37:21] What does its progress depend on

[01:37:23] If not experiments

[01:37:26] One source of progress

[01:37:29] Comes from mathematical self-consistency

[01:37:31] Mm-hmm, like for example

[01:37:32] You propose a framework

[01:37:35] To describe these things

[01:37:36] Then can you be self-consistent with existing

[01:37:39] Already verified theories at lower energy scales

[01:37:43] Like for example

[01:37:43] You study string theory

[01:37:45] Then naturally the question everyone asks is

[01:37:46] Can string theory at low energy

[01:37:48] Return to quantum field theory

[01:37:49] And then return to classical physics

[01:37:51] Then this self-consistency is one criterion

[01:37:54] I think this is very reasonable

[01:37:55] A very scientific thing

[01:37:57] Of course there are also some unscientific factors

[01:37:59] That when this field completely lacks experiments

[01:38:03] And objective standards

[01:38:06] There definitely won't be just one framework that appears

[01:38:08] There definitely won't be just one self-consistent framework that appears

[01:38:10] At this time who does well

[01:38:12] Who doesn't do well

[01:38:13] Actually depends on

[01:38:16] The subjective judgments of some old-timers in the field

[01:38:20] Did someone hurt you

[01:38:22] I wasn't hurt by anyone

[01:38:23] It's just that the longer I stayed in that field

[01:38:27] The more I felt this thing was stupid, like

[01:38:32] A person's life isn't that long

[01:38:34] Why waste your own time

[01:38:36] Serving old-timers

[01:38:40] Right

[01:38:41] So it feels like spending 5 years learning a lot of knowledge

[01:38:46] Buying a big lesson

[01:38:48] This lesson is

[01:38:49] This big lesson is to (do experiments)

[01:38:52] Hey, it's about doing

[01:38:54] Things with relatively objective evaluation criteria

[01:38:57] Mm-hmm, or from another perspective

[01:38:58] Or from another perspective

[01:39:00] Like

[01:39:01] Do things that can have an impact on this world

[01:39:06] So actually your undergraduate went relatively smoothly, right

[01:39:09] In the quantum physics research field

[01:39:10] Very quickly

[01:39:11] You very quickly had very good academic results

[01:39:13] And it was paradigm-level change

[01:39:15] But you quickly felt it wasn't attractive anymore

[01:39:17] So you wanted to challenge something more difficult in your PhD

[01:39:20] Right

[01:39:20] And during the PhD period it was actually quite lonely

[01:39:24] At least in terms of results it was like that

[01:39:25] Hahaha

[01:39:26] The outside world couldn't tell

[01:39:27] From the outside it all looks like a very glamorous resume

[01:39:29] PhD at Stanford

[01:39:30] Right, I think

[01:39:31] In terms of actual research output

[01:39:35] I think

[01:39:35] No one would say my PhD papers were bad

[01:39:39] But if I'm being completely honest

[01:39:40] How much impact did they have on the world?

[01:39:41] I think almost none

[01:39:42] No impact, practically zero

[01:39:44] Right, so for me personally

[01:39:46] I was really unhappy with that

[01:39:48] But it's also not like, you know

[01:39:52] People could say I was slacking off

[01:39:55] I really wasn't slacking off

[01:39:57] You can still meet all the external expectations

[01:40:00] Right

[01:40:01] How do you pull that off?

[01:40:02] Well, this is something that

[01:40:05] You know how it really feels, right?

[01:40:06] Right, exactly

[01:40:07] I think meeting external expectations

[01:40:09] Or meeting the standards of a small circle

[01:40:12] It's like training a model

[01:40:15] Once you're in that small circle

[01:40:17] And you know what their evaluation criteria are

[01:40:19] It's easy to do well

[01:40:21] Even if you don't actually believe in those standards

[01:40:23] You can still meet them

[01:40:24] Mhm

[01:40:24] But deep down, you know you don't buy into them

[01:40:26] Because sometimes even when you don't believe in it

[01:40:29] And you hit those marks

[01:40:29] You can fool yourself and just keep moving forward

[01:40:32] But I eventually realized I couldn't fool myself

[01:40:37] Couldn't lie to myself

[01:40:38] Mhm

[01:40:39] Right

[01:40:40] When did you realize that?

[01:40:41] I think probably around

[01:40:44] The last two years of my PhD

[01:40:47] I started having that feeling

[01:40:48] But back then, I hadn't really figured it out yet

[01:40:52] Hadn't figured out what to do if not this

[01:40:55] So I spent some time

[01:40:57] Exploring different directions

[01:40:59] For example, at first I mostly looked into

[01:41:02] Quantum computing

[01:41:04] Or quantum information, that kind of direction

[01:41:06] Then I got a postdoc offer

[01:41:09] After getting the postdoc offer

[01:41:10] It felt more urgent

[01:41:13] Because when you're still in school

[01:41:16] You can still have a student mindset

[01:41:18] After leaving school, it's your own career(事业)

[01:41:21] You have to carve out a path for yourself

[01:41:23] So at the time I felt

[01:41:25] Quantum computing and AI were probably the two directions

[01:41:29] I think they offer young people

[01:41:32] More opportunities

[01:41:34] So what was your postdoc direction?

[01:41:37] The postdoc had no direction

[01:41:37] It was basically just theoretical physics

[01:41:40] A postdoc is a very independent position

[01:41:41] You basically do whatever you want

[01:41:43] Right, it's more like

[01:41:45] In a way, it's kind of like doing charity

[01:41:48] Huh?

[01:41:48] Who's doing charity?

[01:41:50] Well, there are probably some

[01:41:53] Whether it's government organizations that care about research

[01:41:55] Or private organizations

[01:41:56] They donate money

[01:41:57] To the university

[01:41:58] Or allocate funding to the school

[01:42:00] The school uses that money to hire postdocs

[01:42:01] Who then do research in a department

[01:42:04] And share their research

[01:42:05] Broadly with other people in the department

[01:42:08] I think it's more about creating a kind of social atmosphere

[01:42:12] This kind of work

[01:42:13] Right, and so

[01:42:14] So there really aren't many restrictions

[01:42:15] You can basically do whatever you want

[01:42:17] But I didn't actually do

[01:42:19] The postdoc for very long

[01:42:20] I was probably at Berkeley for two or three months in reality

[01:42:23] But officially, I was only there for two weeks

[01:42:27] What do you mean by officially?

[01:42:28] I mean

[01:42:28] I had actually already gone there before I officially started

[01:42:30] Because I was already in the Bay Area anyway

[01:42:32] I went there before I officially started

[01:42:33] But after I officially started

[01:42:34] I only stayed for two weeks before quitting

[01:42:37] What happened during those two weeks?

[01:42:39] Nothing happened in those two weeks

[01:42:40] I wasn't even planning to start the position

[01:42:41] But the people at Berkeley were just too nice

[01:42:42] They were like, uh

[01:42:44] No worries, just wait until things are settled

[01:42:45] Come for as long as you can

[01:42:47] Oh, so you told them you were actually talking to Anthropic

[01:42:50] Right

[01:42:51] I told them

[01:42:51] Actually I think I might go do AI

[01:42:55] Maybe I shouldn't join

[01:42:56] Mm-hmm. But Berkeley wasn't unique

[01:42:58] Not just Berkeley

[01:42:59] I think the Bay Area

[01:42:59] Teachers at both these schools are very nice

[01:43:02] They really take care of you

[01:43:03] They felt you haven't fully finalized things yet

[01:43:05] So better hold onto the current job first

[01:43:09] Do you think physics helped you later when doing AI

[01:43:10] In what ways

[01:43:14] I think in terms of hard skills there wasn't much help

[01:43:17] In terms of pure tool-based skills

[01:43:21] Actually the transfer from physics to AI

[01:43:25] Is very very little

[01:43:27] But I think if you really have to ask

[01:43:29] I think maybe the main

[01:43:31] Main

[01:43:33] No

[01:43:33] Can't say it's ability

[01:43:34] It's personality

[01:43:35] Maybe

[01:43:35] Maybe physics people want to get to the bottom of things more

[01:43:39] Want to understand something more

[01:43:40] And want to do things more systematically

[01:43:42] Because we're used to this very systematic

[01:43:44] Whether it's experimental methods

[01:43:45] Or theoretical methods

[01:43:47] So I think this might be

[01:43:50] A good thing

[01:43:51] But I don't think this is unique to physics people either

[01:43:55] Like Why wouldn't computer science people have this trait

[01:43:57] I know many computer science people

[01:43:59] Who also have this trait

[01:44:01] Many chemistry people also have this trait

[01:44:02] Biology students also have this trait

[01:44:03] So I don't think it's unique to physics

[01:44:06] Right but actually it's quite interesting

[01:44:08] There are indeed many in this field

[01:44:10] Especially with language models

[01:44:13] This kind of large scale AI

[01:44:14] There are indeed many people from physics backgrounds

[01:44:17] Who have been very successful

[01:44:18] Right especially at Anthropic this company

[01:44:22] When many people describe this generation of AI

[01:44:23] They all say it's a black box

[01:44:26] Can you use a scientific perspective

[01:44:27] To understand this black box

[01:44:28] The operating principles of artificial intelligence

[01:44:30] I think

[01:44:33] Everything in this world is a black box

[01:44:36] Like even physics

[01:44:39] Something everyone thinks they understand

[01:44:41] Actually doesn't really have

[01:44:44] An understanding from its microscopic behavior

[01:44:47] All the way to macroscopic manifestations

[01:44:50] Like whether it's quantum mechanics

[01:44:52] Or quantum field theory

[01:44:52] They all describe behavior at that energy scale

[01:44:55] Essentially the system is still a black box

[01:44:56] You still don't know at its most microscopic level

[01:44:58] What kind of dynamics it has

[01:45:00] AI is the same

[01:45:01] Whether it's a black box or not

[01:45:02] Is actually all relative

[01:45:03] We indeed don't understand language models to the level of

[01:45:07] Neurosurgery-level precision

[01:45:09] It's not that I understand this behavior

[01:45:11] To the extent of

[01:45:12] Saying this behavior is caused by which neuron

[01:45:15] Which artificial neuron's which activation

[01:45:18] Producing this behavior

[01:45:19] We don't have that

[01:45:21] Haven't reached that level of understanding

[01:45:22] Except in some very sparse

[01:45:24] Very small networks

[01:45:26] Like Anthropic

[01:45:27] Has this so-called Interpretability

[01:45:28] Interpretability team

[01:45:29] They might do some similar work

[01:45:30] But in practically usable language models

[01:45:33] We haven't reached such understanding

[01:45:34] But it doesn't mean we have no understanding at all

[01:45:37] For example Scaling Law

[01:45:38] It describes how models

[01:45:41] Improve in perplexity as model size and data scale up

[01:45:47] Getting better and better under this metric
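
[Note: one concrete published form of such a law, the "Chinchilla" fit of Hoffmann et al. (2022), writes the loss as a function of parameter count $N$ and training tokens $D$:

$$L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},$$

with $E$, $A$, $B$, $\alpha$, $\beta$ fitted from families of training runs.]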

[01:45:50] Mm-hmm so you say there's no understanding at all

[01:45:53] Well if

[01:45:54] Scaling Law

[01:45:54] Doesn't count as a small part of understanding

[01:45:56] Then can we also say

[01:45:57] We actually don't understand this world at all either

[01:45:58] This world is also a complete black box

[01:46:01] So Scaling Law is a scientific law

[01:46:05] It's an empirical law

[01:46:06] An empirical law

[01:46:07] Right

[01:46:08] But

[01:46:09] The boundary between empirical laws and scientific laws

[01:46:11] is quite blurry

[01:46:14] For example

[01:46:17] If we look back at these thermodynamic

[01:46:19] various different laws

[01:46:21] The first law, the second law

[01:46:22] The Clapeyron equation and whatnot

[01:46:23] all this messy stuff

[01:46:24] When they were first discovered

[01:46:26] they were also empirical laws

[01:46:28] It's just that later on

[01:46:29] as time went by

[01:46:30] we gradually understood their microscopic mechanisms

[01:46:32] Then they might have become scientific laws

[01:46:34] Right, I think maybe something like Scaling Law

[01:46:36] or things like that

[01:46:38] Right now it's definitely still very impressive

[01:46:41] But in the future, when the technology becomes more fixed

[01:46:44] and people start to understand it more and more

[01:46:46] the microscopic process

[01:46:47] will it become a scientific law

[01:46:48] if such a definition exists

[01:46:51] I think it's possible

[01:46:55] Can you explain in scientific terms

[01:46:57] this so-called intelligence emergence

[01:47:01] First of all, this term itself isn't very scientific

[01:47:04] So naturally there's no way to use scientific language

[01:47:06] to describe something unscientific

[01:47:08] Intelligence emergence?

[01:47:10] Well, I think intelligence emergence

[01:47:14] to me it's more of a subjective feeling

[01:47:17] rather than an objective phenomenon

[01:47:19] When many people talk about intelligence emergence

[01:47:21] what they might have in mind is that previous language

[01:47:23] models could only do one type of thing

[01:47:26] like only translation

[01:47:27] only analysis

[01:47:28] only certain things

[01:47:29] But now it seems like the model

[01:47:30] can do everything

[01:47:32] But this thing

[01:47:35] Again, I think it's like

[01:47:37] to me

[01:47:38] it's more of a technical emergence

[01:47:40] rather than a behavioral emergence

[01:47:43] It's that through research

[01:47:45] we discovered

[01:47:45] how to do this kind of large-scale training

[01:47:49] and then be able to lift all capabilities across the board

[01:47:52] I think this is the more fundamental thing

[01:47:54] As for intelligence emergence itself

[01:47:56] Actually, I think, um

[01:47:58] everyone probably has a different definition in mind

[01:48:00] Right

[01:48:01] Your definition is

[01:48:02] To me, there's no definition

[01:48:04] Haha, to me

[01:48:07] The only qualitative difference is

[01:48:08] whether there's been a technical breakthrough

[01:48:11] that allows us to scale up

[01:48:13] and lift all capabilities across the board

[01:48:15] This, to me

[01:48:16] is a well-defined thing

[01:48:19] You ended up choosing AI

[01:48:22] between quantum computing and AI

[01:48:23] How did this shift happen

[01:48:26] Right, I think I still spent

[01:48:28] some time understanding

[01:48:29] where the bottlenecks lie in both directions

[01:48:32] I think the good thing is they both give young people opportunities

[01:48:34] The good thing is

[01:48:35] both have opportunities

[01:48:35] But quantum computing seemed to you

[01:48:38] to be closer to your main path

[01:48:40] at that time, right

[01:48:41] Well, that's why I needed to understand the details

[01:48:44] Because after understanding the details, I found out it's not

[01:48:46] It's the opposite

[01:48:47] Because quantum mechanics

[01:48:48] Oh, not quantum mechanics

[01:48:49] I mean quantum computing

[01:48:49] I think its main bottleneck right now

[01:48:51] is actually in the experiments

[01:48:53] It's not about how you design those algorithms

[01:48:56] or design those operators

[01:48:57] It's more about how you implement it experimentally

[01:49:00] That's something I'm actually not good at

[01:49:01] It's actually quite unrelated to many things

[01:49:04] I'm interested in

[01:49:05] It's actually relatively unrelated

[01:49:08] On the other hand, the things related to me are more

[01:49:11] Like AI, as I just mentioned

[01:49:12] It's more about having an idea

[01:49:14] and then you can use some numerics to verify it

[01:49:16] This numerical aspect in AI

[01:49:17] might be training a model or something like that

[01:49:19] Right and this is actually quite similar to doing physics

[01:49:22] It even is

[01:49:23] That's why

[01:49:24] I've always liked to compare this

[01:49:26] With 18th century physics

[01:49:28] Make comparisons

[01:49:29] It's more like physics of that era

[01:49:32] In that era theory and experiment weren't separated

[01:49:34] There were no theoretical physicists

[01:49:35] Experimental physicists

[01:49:36] You just did physics

[01:49:37] Just did physics

[01:49:38] You could do experiments yourself

[01:49:40] And also do theoretical speculation

[01:49:41] I think AI is a bit like that era

[01:49:43] So actually

[01:49:45] The distance from theoretical physics to experimental physics

[01:49:48] Is farther than directly jumping to AI

[01:49:50] Farther, mm-hmm

[01:49:51] Actually farther

[01:49:52] And in terms of interest it's also farther

[01:49:54] You don't like experimental physics

[01:49:56] (I think) You don't like doing experiments

[01:49:57] I think, um

[01:49:59] It's indeed not where my interest lies

[01:50:01] Mm-hmm although I'm not willing to do it myself

[01:50:03] But I am indeed very interested

[01:50:04] In knowing how other people's experiments are going

[01:50:05] Hahahaha

[01:50:08] Doesn't AI require doing experiments

[01:50:10] Yes, but it's more like numerics

[01:50:12] Right it's not quite like

[01:50:14] That thing where you go to the lab and build an optical table

[01:50:17] And whatnot

[01:50:18] You also have to

[01:50:19] I think experiments are really something

[01:50:21] Maybe because I don't understand

[01:50:22] I haven't reached that level

[01:50:24] So some things seem quite mystical to me

[01:50:28] For example

[01:50:29] Everyone knows how to build this optical table

[01:50:32] But some people can build it for you

[01:50:34] Some people just can't build it after 6 years

[01:50:37] This is hands-on ability

[01:50:39] I just don't get it

[01:50:40] Hahahaha

[01:50:42] I sometimes think

[01:50:42] This thing is a bit mystical

[01:50:45] Oh

[01:50:46] Mm-hmm so numerics are still better

[01:50:49] Numerics are much clearer

[01:50:51] Right right right, for me

[01:50:52] Doing numerical experiments

[01:50:54] Or like AI

[01:50:55] Training models

[01:50:56] And studying various different techniques

[01:50:58] To look at certain details

[01:50:59] This thing is actually um, is

[01:51:02] I can understand why it's done this way

[01:51:06] Mm-hmm but when it comes to building the table

[01:51:08] I'm completely at a loss

[01:51:10] You've done it before

[01:51:12] I of course have

[01:51:14] Everyone has probably done basic

[01:51:15] Physics students definitely all

[01:51:17] Done basic experimental training

[01:51:18] But more importantly I have many friends who do experiments

[01:51:21] Whether visiting their labs

[01:51:23] And watching how they do experiments

[01:51:24] Or chatting with them about how to design experiments

[01:51:28] I feel like there are many things I can't quite understand

[01:51:30] But indeed some of them do it well

[01:51:31] Some don't do it well

[01:51:33] So you say doing AI research now

[01:51:34] Is like doing thermodynamics research in the 17th century

[01:51:37] What it's actually expressing is

[01:51:38] Although everyone can't very clearly

[01:51:42] Scientifically explain and understand this thing

[01:51:44] That won't stop it from developing

[01:51:47] Right it's more like

[01:51:50] Why

[01:51:51] Comparing to thermodynamics of that era

[01:51:52] In that era

[01:51:53] Everyone actually didn't understand the microscopic theory of heat

[01:51:57] Everyone didn't know what heat was

[01:51:59] Just like now we can't understand

[01:52:01] Right just like now

[01:52:01] Everyone can't understand

[01:52:02] Which matrix element in this language model

[01:52:04] Is doing what

[01:52:07] Actually everyone doesn't understand

[01:52:08] But it doesn't prevent you from having some good empirical laws

[01:52:12] Like various laws of thermodynamics

[01:52:14] And various Scaling Laws now

[01:52:16] So

[01:52:18] From this perspective, yes

[01:52:24] At this level

[01:52:25] It's something like that

[01:52:26] And from a researcher's perspective

[01:52:28] It's that other point I was making

[01:52:29] Theory and experiment actually go hand in hand

[01:52:32] So how did you end up interviewing at Anthropic

[01:52:35] How did your Anthropic journey unfold

[01:52:40] I think the main thing was

[01:52:42] I had former colleagues at Anthropic

[01:52:44] Haha, yeah

[01:52:45] Former colleagues

[01:52:46] So Anthropic

[01:52:47] actually has a lot of people from

[01:52:48] physics backgrounds

[01:52:49] especially theoretical physics backgrounds

[01:52:50] Why is that

[01:52:51] In terms of their hiring choices

[01:52:53] why did they choose this group of people

[01:52:54] I think

[01:52:55] Of course, many

[01:52:57] Mmm

[01:52:59] A lot of people might come up with reasons like

[01:53:02] physicists are good at this or that

[01:53:04] But from my personal perspective

[01:53:07] I think the main reason is still connections

[01:53:09] Just connections

[01:53:10] Because in Anthropic's founding team

[01:53:14] there were actually

[01:53:16] three or four fairly technical people at the time

[01:53:18] and two of them

[01:53:19] are still very much on the technical front lines

[01:53:22] in leadership

[01:53:22] Both of them came from physics backgrounds

[01:53:24] And the people they might have recruited

[01:53:27] also came from physics backgrounds

[01:53:28] So it just continued that way

[01:53:30] But actually, at this stage

[01:53:32] after I joined

[01:53:33] they barely hired any more

[01:53:35] people with no AI background at all. Right.

[01:53:37] So it's also a

[01:53:38] I think it's also a product of its era

[01:53:40] Right, and then

[01:53:42] Anyway, I decided to go into AI at that point

[01:53:43] So I tried to reach out to a few places

[01:53:46] And then

[01:53:47] You only looked at

[01:53:48] Anthropic?

[01:53:49] No, I also reached out to OpenAI and GDM

[01:53:51] That is, Google DeepMind

[01:53:52] But Google DeepMind

[01:53:53] because it was too slow back then

[01:53:56] Hahaha, so I didn't

[01:53:59] Just didn't

[01:54:00] end up in consideration

[01:54:02] But

[01:54:03] Too slow

[01:54:03] You mean their interview process was slow

[01:54:06] But later

[01:54:07] Obviously later

[01:54:07] They made huge strides with Gemini

[01:54:10] They moved really fast after that

[01:54:12] Haha, yeah

[01:54:13] And then Anthropic

[01:54:15] Well, anyway

[01:54:16] What about OpenAI

[01:54:17] I reached out to OpenAI too

[01:54:18] But OpenAI

[01:54:20] probably didn't find a particularly good fit in terms of projects and people

[01:54:22] And Anthropic was because I reached out at that time

[01:54:26] And then it was

[01:54:29] my first manager

[01:54:30] And he used to do theoretical physics too

[01:54:32] And he said at the time

[01:54:36] We're trying to do reinforcement learning

[01:54:39] Trying to do this kind of large-scale reinforcement learning

[01:54:41] There are many scientific questions to understand

[01:54:43] That was in '24

[01:54:45] Around August or September

[01:54:46] At that time

[01:54:47] actually

[01:54:47] reinforcement learning wasn't as mature as it is now

[01:54:50] Back then most people didn't really know how to do it

[01:54:51] Because o1 hadn't been released yet

[01:54:53] Back then, o1 was just, yeah, yeah, yeah

[01:54:56] It was just

[01:54:57] Just

[01:54:58] Everyone knew it was out there

[01:55:00] But no one had seen the results yet

[01:55:01] But Anthropic didn't actually know how to do it back then

[01:55:03] They had a general idea at the time

[01:55:07] But there were many details that needed careful study

[01:55:10] So he told me, hey

[01:55:12] There's this thing

[01:55:13] Would you like to come interview

[01:55:15] And I thought, hey

[01:55:17] It might be a good opportunity

[01:55:18] How did you perceive reinforcement learning back then

[01:55:22] No clue, haha

[01:55:24] You roughly know pre-training

[01:55:25] Post-training, yeah, exactly

[01:55:26] I roughly knew the pipeline

[01:55:27] But I didn't really know the specifics of

[01:55:31] how industrial-grade language models are trained

[01:55:33] Mm I only knew how it's done in academia

[01:55:36] Right, and then

[01:55:37] So looking back, what I knew then

[01:55:40] In hindsight, it was basically nothing

[01:55:42] Right, and then, mm

[01:55:44] More than anything

[01:55:46] I felt at the time that this was an uncertain thing

[01:55:50] But it was a good opportunity

[01:55:52] So I just went for it

[01:55:53] Mm Of course there was some interview prep and the interview process, right

[01:55:56] How did you prepare

[01:55:57] What did you talk about

[01:55:58] At the time

[01:55:59] Who did I interview with

[01:56:00] At Anthropic, some of my later colleagues were the interviewers

[01:56:02] And then

[01:56:04] The interview questions weren't too hard

[01:56:05] Anyway, haha, right

[01:56:07] But for me

[01:56:08] I didn't know how to prepare back then either

[01:56:10] I just went through all the courses I could find

[01:56:14] Learned everything I could on my own

[01:56:16] Did all the assignments I could do

[01:56:18] And then I hand-rolled a whole system myself

[01:56:20] That Andrej Karpathy

[01:56:22] He has that famous project called

[01:56:24] I think it's called nanoGPT or something

[01:56:27] Anyway, he has one where

[01:56:27] You can train a tiny GPT model inside a Google Colab Notebook

[01:56:30] And I hand-rolled that
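
[Note: for flavor, the core of such a hand-rolled tiny GPT is just embeddings, a causal self-attention block, and a language-model head. The sketch below is illustrative only, in the spirit of nanoGPT rather than a reproduction of it or of his interview prep.]

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyGPT(nn.Module):
    """A deliberately minimal GPT: token + position embeddings,
    one single-head causal self-attention block, and an LM head."""
    def __init__(self, vocab_size, block_size, d_model=64):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(block_size, d_model)
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.ln1, self.ln2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, idx):  # idx: (batch, seq) of token ids
        B, T = idx.shape
        x = self.tok_emb(idx) + self.pos_emb(torch.arange(T, device=idx.device))
        # Causal self-attention: each position attends only to the past.
        h = self.ln1(x)
        q, k, v = self.qkv(h).chunk(3, dim=-1)
        att = (q @ k.transpose(-2, -1)) / (k.size(-1) ** 0.5)
        mask = torch.tril(torch.ones(T, T, dtype=torch.bool, device=idx.device))
        att = att.masked_fill(~mask, float("-inf")).softmax(dim=-1)
        x = x + self.proj(att @ v)
        x = x + self.mlp(self.ln2(x))
        return self.head(x)  # logits: (batch, seq, vocab)

# One next-token-prediction training step on random tokens:
model = TinyGPT(vocab_size=50, block_size=32)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
batch = torch.randint(0, 50, (8, 33))
logits = model(batch[:, :-1])
loss = F.cross_entropy(logits.reshape(-1, 50), batch[:, 1:].reshape(-1))
loss.backward()
opt.step()
print("loss:", loss.item())
```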

[01:56:33] And then I went to the interview

[01:56:35] And that was it

[01:56:36] Right

[01:56:37] And got the offer pretty quickly

[01:56:39] And then, right

[01:56:39] Got the offer

[01:56:41] And then Your first direction was large-scale reinforcement learning

[01:56:44] Actually, back then two teams reached out

[01:56:47] Two team managers

[01:56:49] Came to talk to me

[01:56:50] One was doing evaluation

[01:56:51] Basically model evaluation

[01:56:54] And the other was doing reinforcement learning

[01:56:57] I chose reinforcement learning

[01:56:59] You chose reinforcement learning back then

[01:57:00] Because it was more unclear, right

[01:57:05] Mm-hm, and back then

[01:57:07] Anthropic wasn't the big company it is now

[01:57:08] The company was actually quite small back then

[01:57:10] How many people

[01:57:11] When I joined

[01:57:12] Our big team only had about 10 people

[01:57:17] Maybe 11

[01:57:17] What was the big team called

[01:57:19] It was called Horizon

[01:57:21] Right, and then

[01:57:23] Back then that big team

[01:57:24] So like the parallel teams to this big team

[01:57:26] What were they

[01:57:28] That big team later basically became

[01:57:30] The team that covered every aspect of reinforcement learning

[01:57:33] Right, but back then

[01:57:33] Its whole larger group

[01:57:34] Was just reinforcement learning

[01:57:36] The whole larger group

[01:57:38] Well, for a startup

[01:57:39] It's hard to say what that group's goal was

[01:57:43] Because

[01:57:43] They probably had many different goals at various points

[01:57:46] But just at that stage

[01:57:47] The main goal was probably doing reinforcement learning

[01:57:48] Right, and then

[01:57:49] Of course there were also teams more focused on data below that

[01:57:53] Teams more focused on environments and infrastructure

[01:57:57] And teams more focused on research and algorithms

[01:58:02] And the team I joined

[01:58:03] Was more on the research and algorithms side

[01:58:05] Mm, how many people did Anthropic have back then

[01:58:09] Uh, back then probably

[01:58:13] Around seven or eight hundred in total

[01:58:15] But the whole company

[01:58:18] Seven or eight hundred, right

[01:58:20] What was your first impression when you joined

[01:58:23] I think

[01:58:25] I think my impression of Anthropic

[01:58:27] Has actually been pretty

[01:58:28] Pretty consistent

[01:58:30] I mean, after joining

[01:58:31] My impression of the company was that it had very strong execution

[01:58:36] It's just that

[01:58:37] It's actually a relatively top-down company

[01:58:39] Right and then

[01:58:40] So after many things are decided

[01:58:43] They go all in

[01:58:44] And

[01:58:45] The atmosphere between employees in the company is actually very good

[01:58:48] Everyone

[01:58:51] Doesn't hide things

[01:58:52] And especially when I first joined it was very small

[01:58:54] So

[01:58:55] Everyone knew each other

[01:58:56] So the atmosphere was very good

[01:58:57] And

[01:59:00] I think

[01:59:02] If you're doing

[01:59:04] Just doing language model related things

[01:59:06] Actually looking back now

[01:59:08] That was a very very good learning opportunity

[01:59:11] Where you could get exposed to every aspect of

[01:59:12] Training this model

[01:59:14] And could find corresponding people to ask

[01:59:18] Did Anthropic at that time already have

[01:59:20] What we all know now

[01:59:21] That very firm bet

[01:59:24] Yes yes

[01:59:26] Where did this bet come from

[01:59:27] Why did this bet exist

[01:59:30] I don't know its complete source

[01:59:33] One obvious source I could see

[01:59:36] Was the previous generation model

[01:59:38] After Claude 3 was released

[01:59:41] On Twitter, which might not have been called X yet

[01:59:43] Many people on Twitter were discussing

[01:59:48] That Claude 3 seems to write code better than GPT-4

[01:59:53] In that era

[01:59:54] GPT-4 was still a model with a huge lead over everyone else

[01:59:57] So being able to do one important thing better than GPT-4

[02:00:01] Was quite impressive

[02:00:02] So it was discovered through trial

[02:00:04] I think at least that's one of the reasons

[02:00:06] It was very quick feedback on the market

[02:00:10] Right, this is also something I think this company is very strong at

[02:00:12] Its execution is very very strong

[02:00:17] Once it gets a signal

[02:00:19] That makes it feel very reasonable

[02:00:21] Something this company should do

[02:00:22] Then it will go all in

[02:00:24] It doesn't have that redundancy of large organizations

[02:00:27] Why was its coding definitely better than GPT-4

[02:00:31] Can't say haha

[02:00:33] Oh there is a reason

[02:00:34] There is a reason

[02:00:34] There is a reason, right

[02:00:36] But it's a random reason

[02:00:37] Not because I chose this

[02:00:39] So this result happened

[02:00:40] It's a purely technical reason

[02:00:42] But

[02:00:44] Indeed, I don't

[02:00:45] I can't determine whether it was randomly tried at first

[02:00:47] Or deliberately chosen

[02:00:48] If you ask me to guess

[02:00:49] I would definitely think it was randomly tried

[02:00:51] Oh

[02:00:52] A purely technical reason

[02:00:54] There was someone who did something

[02:00:56] There was indeed a certain team that did something

[02:01:01] Was it top-down

[02:01:02] Or bottom-up

[02:01:05] I think at first it might have been bottom-up

[02:01:08] But later it became a top-down thing

[02:01:11] To quickly capture some market

[02:01:13] Right, internal and market signals

[02:01:15] Right right

[02:01:15] I think this is

[02:01:16] Need to quickly go all in

[02:01:18] Right right

[02:01:18] I think this is something Anthropic is very very strong at

[02:01:21] It's very very reactive

[02:01:22] Reacts very quickly

[02:01:23] Where does its execution come from

[02:01:24] Comes from this person Dario

[02:01:26] Comes from his certain trait

[02:01:28] I feel like

[02:01:30] Mm-hmm, Anthropic as a company

[02:01:31] It can implement this

[02:01:35] Relatively top-down mechanism

[02:01:36] Is a very unique thing

[02:01:38] Why

[02:01:39] Because

[02:01:40] Implementing top-down actually has one very difficult point

[02:01:43] That the person making technical decisions

[02:01:47] Must also be the company's decision maker

[02:01:49] Mm-hmm

[02:01:51] First of all you have to be technically convincing

[02:01:54] Then you can

[02:01:58] Convince the researchers below to do this thing

[02:02:01] On the other hand, you have to be the decision-maker at the company

[02:02:03] You have to be able to take responsibility for the company

[02:02:06] Anthropic has that going for it

[02:02:07] That is, its technical leader

[02:02:11] Is actually a cofounder of the company

[02:02:14] Who are you referring to?

[02:02:15] Not Dario Amodei

[02:02:17] Like Jared Kaplan

[02:02:19] And Sam McCandlish

[02:02:20] And both of them are cofounders of the company

[02:02:22] They make this decision themselves

[02:02:24] It's their company

[02:02:25] So they have the authority to do this top-down

[02:02:27] Then Dario, as CEO

[02:02:28] Does he get to say yes or no?

[02:02:31] I don't know about their decision-making discussions

[02:02:35] Hahaha, okay

[02:02:36] What role did Dario play?

[02:02:38] I can only say

[02:02:39] The technical leader has the decision-making power

[02:02:42] I can only say

[02:02:43] For my work at that time

[02:02:45] The person I worked with the most was Jared

[02:02:49] But is this hard for other model companies?

[02:02:51] Very hard. For example, OpenAI couldn't do it

[02:02:54] When Ilya was

[02:02:54] there, wasn't it possible?

[02:02:55] When Ilya was there, it might have been possible

[02:02:56] But Ilya later, on one hand

[02:02:59] I don't know for what reason

[02:03:00] He seemed to have lost the ability to make decisions

[02:03:04] And then he left

[02:03:05] So...

[02:03:07] What about other companies?

[02:03:09] I think other companies all find it pretty difficult

[02:03:11] Even Gemini finds it pretty difficult

[02:03:13] But I think Gemini has a completely different playbook

[02:03:15] It's a bit different

[02:03:15] That is, um

[02:03:17] I think big companies and startups

[02:03:20] Their playbooks are fundamentally different

[02:03:21] Because for startups, what's important is to make bets

[02:03:25] That is, I have to bet on something

[02:03:27] If I want to bet

[02:03:29] It means there's risk

[02:03:31] So that means

[02:03:33] I can make decisions very quickly

[02:03:36] And push decisions through strongly

[02:03:38] So perhaps in this situation

[02:03:40] Top-down is a big advantage, I think

[02:03:42] So I think organizationally, Anthropic

[02:03:44] Has an advantage over OpenAI

[02:03:46] But as a big company

[02:03:47] It might have a different mindset

[02:03:49] Because a big company's mindset might be

[02:03:51] Not only can I minimize the gambling aspect

[02:03:55] But I can also have reserves in every area

[02:03:58] And then if anything succeeds

[02:04:00] I can catch up

[02:04:01] And if I succeed at something myself

[02:04:03] I might even take the lead

[02:04:04] That's probably the big company mindset

[02:04:06] So at Gemini

[02:04:07] Google is a very traditional

[02:04:08] Very bottom-up organization

[02:04:10] At the company level

[02:04:11] There may be some well-defined frameworks

[02:04:14] To evaluate whether your work is good or bad

[02:04:17] To guide you to do things the company needs

[02:04:21] But essentially

[02:04:21] It's still you deciding what you do yourself

[02:04:23] So you think Anthropic can make bets (referring to betting heavily on coding)

[02:04:26] Because of its unique culture

[02:04:29] Organization and culture, yes

[02:04:32] This sounds like

[02:04:34] Something other companies should be able to do too

[02:04:35] But it's very strangely found that

[02:04:38] Other companies find it hard to do

[02:04:40] While Anthropic can do it

[02:04:41] Yes, I think it still requires technical credibility

[02:04:45] Or the company's leaders need to have credibility

[02:04:48] I think this is actually quite difficult

[02:04:50] You're not even talking about the CEO having credibility

[02:04:51] It's the #1 technical person having credibility

[02:04:53] Yes, to me

[02:04:54] I think it's very important for the #1 technical person to have credibility

[02:04:57] But at the same time

[02:04:59] The CEO also must not become an obstacle

[02:05:01] Yes

[02:05:02] Is this hard?

[02:05:03] Ah, I think it depends on your

[02:05:07] This cofounding team

[02:05:08] Whether there's enough mutual trust

[02:05:12] This is also crucial

[02:05:14] I think Anthropic is also strong in this regard

[02:05:16] Very strong among startups

[02:05:17] Its cofounding team

[02:05:18] Not a single person has left the company

[02:05:21] If you look at their past

[02:05:23] They are a group of people who have truly fought battles together

[02:05:26] In the past

[02:05:26] They were all former OpenAI employees

[02:05:29] Mm-hmm right

[02:05:30] And

[02:05:31] Many of them were even

[02:05:33] Co-authors on a series of key papers

[02:05:37] Co-authors, because like

[02:05:39] The Scaling Law paper

[02:05:41] Was Jared Kaplan and Sam

[02:05:42] And of course Dario

[02:05:44] And some others

[02:05:44] Maybe Tom Brown was there too

[02:05:45] I can't quite remember if Tom Brown was there

[02:05:48] And the GPT-3 paper had Tom Brown

[02:05:50] And Benjamin Mann

[02:05:51] And Jared Kaplan and Sam were both there

[02:05:53] Dario was also there

[02:05:54] So they are people who have been in the trenches together

[02:05:58] I think mutual trust is still very key

[02:06:01] Mm-hmm, many companies might just be doing their thing

[02:06:04] And can't even keep this small group united

[02:06:06] Then how can you expect

[02:06:07] This big company to stay united

[02:06:11] You're talking about OpenAI right

[02:06:12] Mm-hmm, hahaha

[02:06:15] When you joined Anthropic

[02:06:16] What was the most important

[02:06:17] Project the company was working on

[02:06:18] Did you participate in that big project

[02:06:20] Right

[02:06:20] At that time the goal was to do large-scale

[02:06:22] Large-scale reinforcement learning

[02:06:24] And use it to improve coding ability

[02:06:29] That was the most important thing at that time

[02:06:30] Mm-hmm and we were doing this

[02:06:32] This team

[02:06:34] The research focus at that time was this thing

[02:06:36] This is also why this team later gradually grew bigger

[02:06:38] And became more and more important

[02:06:40] And

[02:06:41] The final result was

[02:06:42] Everyone trained this 3.7 together

[02:06:45] The Claude 3.7 model

[02:06:47] Hey you said internally there was a 3.6

[02:06:48] This is

[02:06:49] Not internally called

[02:06:50] It's from the outside

[02:06:51] Claude 3.5 actually had two versions

[02:06:53] One might be the June version

[02:06:55] Another October version, and then

[02:06:58] You can also see

[02:06:59] Anthropic this company

[02:07:00] Used to have no product capability either

[02:07:03] Actually calling two models by one name

[02:07:05] Hahahaha

[02:07:07] So later outsiders to distinguish

[02:07:09] Called the later version of 3.5 as 3.6

[02:07:13] So Anthropic followed this outside convention

[02:07:16] Treated that version as 3.6

[02:07:17] And called this newer model 3.7

[02:07:21] So

[02:07:21] If you look at the actual product timeline of this company

[02:07:24] It's actually 3.5, 3.5new, 3.7

[02:07:27] How could there be a 3.5new

[02:07:29] What were they thinking

[02:07:31] Haha

[02:07:32] I can only say

[02:07:32] Anthropic at that time

[02:07:34] Probably really had no product ideas

[02:07:36] So your first project was 3.7 or 3.5

[02:07:39] 3.7, 3.7

[02:07:40] Or 3.5new

[02:07:41] Actually I

[02:07:43] Didn't participate, almost didn't participate

[02:07:45] But 3.5new

[02:07:46] Already showed signs of coding

[02:07:48] Really? When you first started

[02:07:50] At the time of 3.5new

[02:07:50] Already saw

[02:07:51] Anthropic's model

[02:07:52] Would be stronger than other models in agentic coding

[02:07:55] Why is that

[02:07:57] Can't say hahaha

[02:08:01] So when you went in

[02:08:02] It was exactly when

[02:08:03] They knew about this thing

[02:08:04] That management also knew about this sign

[02:08:07] Right and when they wanted to make bets

[02:08:09] You had very good luck I think

[02:08:10] I think, right

[02:08:11] I think when I joined

[02:08:12] Everyone had definitely already seen

[02:08:13] This thing could be done and was important

[02:08:16] But didn't quite know how to do it

[02:08:19] And when I went in

[02:08:20] I was researching with everyone how to do it

[02:08:23] Right so the method was large-scale reinforcement learning

[02:08:26] Right from the big picture perspective

[02:08:29] But of course

[02:08:30] There are many technical details that need to be researched

[02:08:33] What know-how is in here

[02:08:37] Haha there are lots of NDA (Non-Disclosure Agreement) contents

[02:08:39] Hahaha

[02:08:41] Would NDAs be written in such detail

[02:08:43] Actually in principle

[02:08:46] In principle

[02:08:48] Employees cannot during their employment and after leaving

[02:08:52] Disclose any information related to the company's internals

[02:08:54] Of course in reality

[02:08:55] Everyone probably has a sense of degree in their mind

[02:08:56] That is

[02:08:58] If this technology hasn't been made public

[02:08:59] Definitely won't discuss it publicly

[02:09:01] But I think although I can't discuss it publicly

[02:09:04] But

[02:09:07] I think

[02:09:09] Doing simple things cleaner than anyone else

[02:09:11] Is the most critical thing

[02:09:13] What do you mean by clean

[02:09:14] You also used this word just now

[02:09:15] Right it's it's

[02:09:17] I think there are many fancy techniques

[02:09:20] For example doing reinforcement learning

[02:09:22] The simplest algorithm is Policy Gradient

[02:09:26] But that doesn't mean it's the only algorithm

[02:09:28] There are other algorithms

[02:09:29] Like various complex

[02:09:30] Search algorithms and such

[02:09:32] But

[02:09:34] Are these complexities necessary

[02:09:36] And these complexities might bring you

[02:09:39] Some efficiency

[02:09:42] That is efficiency improvements

[02:09:43] But they might bring you some

[02:09:45] For example

[02:09:46] Infrastructure difficulties

[02:09:48] Then how do you trade off these things

[02:09:50] These things actually need to be understood in research

[02:09:55] How to balance these different factors

[02:09:57] And choose the best path

[02:09:58] The most stable path

[02:10:00] Right and I think a lot of know-how

[02:10:02] Is actually in these

[02:10:04] These details

[02:10:05] How to handle all these aspects of details
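
As a rough illustration of the "simplest algorithm" he names, here is a minimal, self-contained policy-gradient (REINFORCE) loop on a toy action space. The reward function and every name in it are invented for illustration and have nothing to do with any lab's training stack:

```python
import torch

# Toy policy over 4 "actions" (stand-ins for a model's choices).
policy_logits = torch.zeros(4, requires_grad=True)
optimizer = torch.optim.SGD([policy_logits], lr=0.1)

def reward(action: int) -> float:
    # Pretend action 2 is the one that "passes the tests".
    return 1.0 if action == 2 else 0.0

for step in range(200):
    dist = torch.distributions.Categorical(logits=policy_logits)
    action = dist.sample()
    # REINFORCE: minimize -reward * log pi(action), i.e. push up
    # the log-probability of actions that earned reward.
    loss = -reward(action.item()) * dist.log_prob(action)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(policy_logits.softmax(dim=0))  # mass should concentrate on action 2
```

The point of the sketch is how little machinery the core update needs; per the discussion above, the real difficulty is everything around it.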

[02:10:08] Then how was coding described as important at that time

[02:10:12] I think

[02:10:13] Is it considered a branch of large language models

[02:10:16] An important branch

[02:10:16] Or what

[02:10:17] I think everyone might have different ideas

[02:10:18] For me

[02:10:21] There are two reasons it's important

[02:10:23] One reason is

[02:10:26] What Anthropic has been talking about

[02:10:27] That coding itself

[02:10:28] Is also part of language model research

[02:10:31] If you can do coding very well

[02:10:33] It might make your research efficiency

[02:10:35] Improve by multiples

[02:10:36] Mm-hmm, forming a research flywheel

[02:10:40] This is one reason

[02:10:41] For me

[02:10:41] Another reason

[02:10:42] Is because coding is actually a model

[02:10:44] Using tools and interacting with the environment

[02:10:47] A very good abstraction

[02:10:49] First of all the benefits of this abstraction

[02:10:51] What are the benefits of this abstraction

[02:10:53] For example the feedback signal is clear

[02:10:54] And data is abundant

[02:10:57] And

[02:10:59] Actually it's very hard in other scenarios

[02:11:01] To find

[02:11:02] Tool-using scenarios that have both these traits simultaneously

[02:11:06] So for me this is a good abstraction

[02:11:08] Some research done in this area

[02:11:10] Might be useful for more general

[02:11:12] Those abilities to use tools and interact with the environment

[02:11:16] Some useful

[02:11:17] Useful lessons

[02:11:19] What was Cursor's status at that time

[02:11:23] At that time Cursor was still a

[02:11:25] Pure product company

[02:11:28] I think in a sense

[02:11:29] It seems like before I went to Anthropic

[02:11:32] During that period

[02:11:36] Claude and Cursor were both in relatively underdog states

[02:11:40] And somehow at 3.5new, which is 3.6

[02:11:45] The outside world's 3.6 generation

[02:11:48] First the model capability went up

[02:11:50] Then Cursor discovered

[02:11:51] This model

[02:11:52] Could really do this kind of Agentic coding tool

[02:11:55] It's just a shell

[02:11:57] Right but this shell wrapping this model

[02:11:59] Suddenly let the public experience

[02:12:01] Not the public

[02:12:02] The public here means the software engineering community

[02:12:04] At that time, um

[02:12:05] I realized

[02:12:06] Wow, this really seems like a productivity tool

[02:12:08] So after that, it just took off

[02:12:11] So around that time

[02:12:12] Anthropic realized

[02:12:12] Cursor is a future competitor

[02:12:15] I don't know about that

[02:12:15] You'd have to ask Dario, hahahaha, alright

[02:12:18] How was 3.7 made

[02:12:21] This was a watershed moment

[02:12:22] For Anthropic

[02:12:23] It was a watershed model

[02:12:25] I think for Anthropic's post-training

[02:12:27] It was a watershed

[02:12:29] Before 3.7

[02:12:31] Post-training was in a relatively

[02:12:33] Um

[02:12:35] Small-scale

[02:12:36] And

[02:12:37] It was more like patching things up

[02:12:40] That kind of state for the model

[02:12:41] People didn't value post-training, right?

[02:12:43] It's not that they didn't value it

[02:12:43] Everyone from the start

[02:12:44] For a long time

[02:12:45] No one really figured out

[02:12:47] How post-training should scale up

[02:12:49] Oh, but during that period

[02:12:51] Whether OpenAI or Anthropic

[02:12:52] Or even like China's DeepSeek, right

[02:12:55] They realized how to scale this up

[02:12:57] And how to scale it up

[02:12:58] You have to find

[02:13:00] The right environment

[02:13:01] Where the feedback signal is clear enough

[02:13:04] And the environment itself is a strong data source

[02:13:08] And then

[02:13:10] On top of that

[02:13:12] You can make the training very stable

[02:13:14] Then it can work

[02:13:18] Yeah, I remember back then

[02:13:19] Actually no one knew

[02:13:20] What OpenAI's secret project was

[02:13:21] Just knew it was called Strawberry

[02:13:22] Called Strawberry

[02:13:23] And then, um

[02:13:24] People thought it would bring a new paradigm

[02:13:26] A new paradigm of post-training reinforcement learning

[02:13:29] But no one knew much more than that

[02:13:31] Yeah, actually

[02:13:35] I think when I joined Anthropic

[02:13:38] People already had a pretty good idea

[02:13:41] About how this should roughly be done

[02:13:44] The general direction of how to do it

[02:13:46] And then

[02:13:48] Later on, as time went on

[02:13:50] As I learned more and more about this field

[02:13:53] I discovered

[02:13:53] At that moment

[02:13:54] The way OpenAI was doing things

[02:13:55] And Anthropic were actually quite different

[02:13:57] How so?

[02:13:58] In terms of the specific algorithms

[02:13:59] And the way they used data

[02:14:02] They were actually quite different

[02:14:03] Although both are called post-training and reinforcement learning

[02:14:05] Um, although both are called that

[02:14:06] But of course I don't think those are the fundamental differences

[02:14:08] In terms of the big picture

[02:14:11] They're the same

[02:14:12] They found some

[02:14:13] Found some very regression-like

[02:14:15] Very clear signals

[02:14:16] Very objective

[02:14:16] And the data itself is relatively clean

[02:14:19] And learnable for the model

[02:14:21] And do stable reinforcement learning training on top of it

[02:14:25] In the big picture, that's the direction

[02:14:26] But the specific implementations differ quite a lot

[02:14:28] But later it was proven

[02:14:29] The specific implementation

[02:14:30] Each company actually went in different directions

[02:14:32] But they all succeeded

[02:14:34] Um, and at the time OpenAI's goal wasn't coding either

[02:14:37] From what I understood, the narrative was

[02:14:40] Pre-training as the first paradigm

[02:14:42] The gold mine is almost exhausted

[02:14:43] So now we're opening a second gold mine

[02:14:46] Which is post-training and reinforcement learning

[02:14:47] To let the Scaling Law continue, right

[02:14:50] I think for a long time

[02:14:52] OpenAI had this idea

[02:14:55] I don't know if their thinking has changed now

[02:14:57] For me

[02:14:59] My thinking has gone through shifts

[02:15:01] Around the era of 3.7

[02:15:03] I actually felt like I

[02:15:04] At that time I also had the feeling that pre-training was almost

[02:15:08] Party is over

[02:15:09] This kind of feeling

[02:15:10] And right when you were about to join

[02:15:13] Right when I first joined

[02:15:14] And at that time when doing these 3.7 related

[02:15:16] These kinds of experiments

[02:15:18] I also once had this idea

[02:15:21] But later as my understanding deepened

[02:15:24] I felt I discovered

[02:15:25] Actually there's still room to do things

[02:15:28] And um

[02:15:31] Pre-training Scaling Law

[02:15:35] It isn't just telling you to keep getting bigger

[02:15:37] It's actually a very systematic framework

[02:15:40] That can tell you what kinds of things are more effective

[02:15:44] Right, mm-hmm
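
For reference, the published Chinchilla-style form of the pre-training scaling law (Hoffmann et al., 2022) is one concrete example of such a framework, predicting loss jointly from model and data size rather than just saying "bigger is better":

```latex
% N = parameter count, D = training tokens;
% E, A, B, alpha, beta are constants fitted from experiments.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

Once the constants are fitted, the formula tells you, for a fixed compute budget, how to trade parameters against tokens, which is the sense in which it says "what kinds of things are more effective".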

[02:15:46] And

[02:15:47] So later discovered

[02:15:48] Actually there are still many things to do

[02:15:49] The fact is

[02:15:51] Later Anthropic

[02:15:51] And Gemini's pre-training

[02:15:53] Have also been continuously progressing

[02:15:55] OpenAI itself was stuck for a long time

[02:15:57] Haha, are they paying attention to pre-training again now

[02:15:59] They should have been paying attention to pre-training for quite a while

[02:16:02] It's just recently they might have made some progress

[02:16:07] So pre-training and post-training as two paradigms

[02:16:11] Neither has reached its plateau

[02:16:13] I think neither has

[02:16:15] But you say predicting how far it will go

[02:16:18] Can't do that

[02:16:19] Right I think

[02:16:20] I think reaching a plateau has

[02:16:23] Two possibilities

[02:16:27] One possibility is the technology itself has reached

[02:16:29] Where you still have things you want the model to do

[02:16:32] But these two technologies just can't teach it

[02:16:35] Another possibility is

[02:16:36] The things you want to do have reached a plateau

[02:16:38] I think now it's the latter

[02:16:40] Right now we know oh

[02:16:41] There's a Chatbot

[02:16:42] You can teach it to do this

[02:16:43] And then there's coding

[02:16:44] You can teach it to do this

[02:16:46] And then we don't know

[02:16:47] Right, don't know what else to teach it

[02:16:50] That is to say

[02:16:51] This model is still a very smart kid

[02:16:53] Right you can actually teach it many things

[02:16:55] Right but we humans as teachers

[02:16:56] Now don't know what the next thing to teach is

[02:16:58] Right right

[02:16:59] Or how to reasonably teach it

[02:17:01] Using current paradigms

[02:17:04] Speaking of 3.7, what other know-how

[02:17:07] How many months did this take

[02:17:11] This finally all in all

[02:17:15] From starting training to release

[02:17:16] Probably took about four or five months

[02:17:20] From when you first joined

[02:17:22] From when everyone started

[02:17:23] Doing research for this thing

[02:17:26] That probably took two or three months

[02:17:27] And then later from starting training to training completion

[02:17:30] With bumps along the way

[02:17:31] Many things to handle

[02:17:32] And there was a lot of new infrastructure

[02:17:35] Actually infrastructure is really important

[02:17:36] Very time-consuming

[02:17:37] And then probably took about two months or so

[02:17:40] What important work did you do in it

[02:17:41] I don't think I did anything important

[02:17:42] Hahaha

[02:17:44] I think

[02:17:47] My personal contribution

[02:17:47] I personally

[02:17:52] My contribution to any model

[02:17:53] My statement

[02:17:55] Is always

[02:17:56] I feel like I'm not that important to that thing

[02:17:57] I think more importantly I was very lucky

[02:17:59] To have the opportunity

[02:18:02] To join an important project at that time

[02:18:03] And did some things

[02:18:06] Mm-hmm, because in a sense

[02:18:07] I think AI in recent years

[02:18:10] This thing itself is unstoppable

[02:18:14] It doesn't depend on whether you do it or not

[02:18:18] If you don't do it someone else can do it just as well

[02:18:20] So I think in this era

[02:18:23] Actually all things that give individuals credit

[02:18:25] Are somewhat hyped

[02:18:29] Suspicious

[02:18:30] Of being hyped

[02:18:31] But indeed I think for me

[02:18:33] I am very lucky

[02:18:33] Being able to join at that stage was a big deal

[02:18:36] And, well, I learned a few things

[02:18:38] So you were lucky to be there at that stage

[02:18:41] At Anthropic

[02:18:42] this company's

[02:18:44] large-scale reinforcement learning team

[02:18:47] what did you do

[02:18:48] I think around the 3.7 era, what we mainly worked on was still

[02:18:51] working on this agentic coding thing

[02:18:54] how to scale this thing up

[02:18:56] or how to prepare

[02:18:58] like how to set up

[02:18:59] all kinds of environments and data

[02:19:02] including what algorithmic problems you'd run into

[02:19:03] Most of the research at the time was on this part

[02:19:05] Any tips on this?

[02:19:10] Looking back, there aren't really any particularly useful tips, haha

[02:19:12] I think

[02:19:15] When it comes to technical tips

[02:19:16] this is actually something

[02:19:17] that on one hand, people are really eager to hear about

[02:19:21] but companies won't let you talk about

[02:19:22] and in reality isn't very useful

[02:19:24] Why?

[02:19:25] Because a lot of algorithm design isn't actually independent

[02:19:27] independent of the algorithm itself

[02:19:29] It's very strongly

[02:19:30] dependent on your infrastructure

[02:19:32] A simple example is

[02:19:34] some companies

[02:19:35] there's a problem people often discuss

[02:19:36] which is during reinforcement learning

[02:19:38] the sampler machine, the one that generates these

[02:19:42] these traces

[02:19:42] these tokens — that machine, and the trainer

[02:19:46] used to actually train the model

[02:19:47] and then update the model weights — that machine

[02:19:50] these two machines might be different

[02:19:53] But the difference

[02:19:54] is partly due to numerical differences

[02:19:56] and partly because

[02:19:58] of using this kind of asynchronous training architecture

[02:20:00] so naturally

[02:20:01] fundamentally they're different

[02:20:03] So different companies might have different

[02:20:06] degrees of this difference

[02:20:07] so your algorithm design will also differ

[02:20:09] For some companies, these two differences

[02:20:12] might be very, very large

[02:20:14] then the biggest part of your algorithm might be

[02:20:16] how to control this

[02:20:17] and how to keep the training stable

[02:20:19] Things like the actual training effectiveness

[02:20:23] will be weighted slightly less

[02:20:25] But some companies might

[02:20:26] have particularly excellent infrastructure

[02:20:29] so the difference between these two isn't that big

[02:20:30] then you can probably spend more effort

[02:20:31] on the training effectiveness
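
One public, generic way to handle the sampler/trainer gap described above is an importance-weighted, clipped policy-gradient loss. The sketch below is a minimal illustration of that general idea, not what any of these companies actually runs:

```python
import torch

def corrected_pg_loss(logp_trainer: torch.Tensor,
                      logp_sampler: torch.Tensor,
                      advantages: torch.Tensor,
                      max_ratio: float = 5.0) -> torch.Tensor:
    """Policy-gradient surrogate loss with off-policy correction.

    logp_trainer: log-probs of the sampled tokens under the trainer's
                  current weights (carries gradients).
    logp_sampler: log-probs recorded by the sampler at generation time
                  (stale and/or numerically different weights).
    advantages:   reward signal per sample.
    """
    # Importance ratio pi_trainer / pi_sampler corrects for the mismatch.
    ratio = torch.exp(logp_trainer - logp_sampler.detach())
    # Clipping keeps badly mismatched samples from destabilizing training.
    ratio = torch.clamp(ratio, max=max_ratio)
    return -(ratio * advantages).mean()
```

How aggressively you clip is exactly the kind of infrastructure-dependent trade-off he describes: the larger the sampler/trainer gap, the more the algorithm has to spend on stability rather than raw training effectiveness.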

[02:20:33] So a lot of these small tips

[02:20:35] are actually not very useful

[02:20:37] A lot of know-how is actually not very useful

[02:20:40] I say this because I've indeed noticed

[02:20:44] that many people

[02:20:45] at other labs — well, people not

[02:20:47] in these three labs, probably really want to know

[02:20:50] like how Anthropic does this

[02:20:51] or how Gemini does that

[02:20:53] But sometimes I'm reluctant to answer

[02:20:54] One main reason

[02:20:55] is that fundamentally I think

[02:20:57] answering this question would mislead them

[02:21:00] Modern AI training is a large system

[02:21:03] You actually need to understand

[02:21:06] all aspects of this system

[02:21:07] to have a holistic understanding

[02:21:08] of what makes something useful because of what

[02:21:11] rather than saying the thing itself is useful

[02:21:14] What happened from 3.7 to 4.5?

[02:21:17] Both pre-training and post-training, yes

[02:21:20] And, um

[02:21:21] Of course it's just

[02:21:22] more scaling up

[02:21:24] And data

[02:21:25] Whether it's data or training

[02:21:27] the compute is at a much larger scale

[02:21:32] But I think in terms of paradigm, there wasn't

[02:21:34] anything particularly major that changed

[02:21:39] How many people was it when you left Anthropic?

[02:21:43] Close to 2,000, I think

[02:21:45] More than doubled

[02:21:46] Ah, um

[02:21:47] So during your time at Anthropic

[02:21:49] it happened to be going through its most dramatic transformation

[02:21:52] Ah, I probably just caught

[02:21:54] the tail end of it being a small company

[02:21:57] Actually, I think after three

[02:22:00] or four months, the company already started expanding

[02:22:02] and suddenly there were way more people

[02:22:04] Did the culture change?

[02:22:06] There were still some rather chaotic phases

[02:22:08] And then

[02:22:09] Especially around the time when I left

[02:22:12] The period right before I left

[02:22:13] I think culturally, it went through some

[02:22:15] some chaos

[02:22:17] Because some people came in from outside

[02:22:19] and there was probably some conflict with the original culture

[02:22:23] Oh, the previous culture was

[02:22:25] I think before, it was just

[02:22:26] pretty simple

[02:22:27] Yeah, it was very simple

[02:22:27] It was

[02:22:28] more like a small workshop

[02:22:30] Everyone was friends

[02:22:31] And everyone knew what the others were doing

[02:22:33] And

[02:22:34] No one was particularly

[02:22:38] you know, doing

[02:22:39] too much self-promotion or anything like that

[02:22:41] Doing pointless things

[02:22:42] No one was doing pointless things

[02:22:44] Everyone had a lot on their plate

[02:22:46] And the company back then

[02:22:47] probably had a stronger sense of urgency

[02:22:49] And later on, people probably felt that

[02:22:51] with more people

[02:22:52] this kind of culture would definitely take some hits

[02:22:56] What kind of atmosphere did it bring?

[02:22:59] I think

[02:23:00] There were indeed some people I personally didn't like very much

[02:23:05] Of course, that doesn't mean they're actually bad

[02:23:07] I'm just saying I personally didn't like them

[02:23:08] I mean, I probably don't like

[02:23:11] people who talk a lot in this field

[02:23:16] Like, I think 'idea is cheap'

[02:23:19] Ideas are cheap

[02:23:21] Many ideas

[02:23:21] are actually quite obvious, everyone knows them

[02:23:23] The hard part is how to implement them

[02:23:24] How to break it down into small

[02:23:26] actionable steps

[02:23:27] and actually get it done

[02:23:28] I don't think I like those

[02:23:32] who spend a large part of their day

[02:23:34] on Slack, I mean Slack

[02:23:35] is a workplace software used in the US

[02:23:37] and spending a lot of time on Slack

[02:23:39] talking about grand principles

[02:23:42] I think it's just

[02:23:45] not very useful, haha

[02:23:49] Why did you suddenly leave later on?

[02:23:50] Had you completed some milestone at the time?

[02:23:53] How long had you been thinking about it?

[02:23:55] At the time, I think I'd been thinking about it for

[02:23:59] a month or two

[02:24:00] about a month or so

[02:24:00] a little over a month

[02:24:01] That was fast, yeah yeah

[02:24:03] I think one aspect was

[02:24:06] Um, it was

[02:24:08] I actually didn't really agree with Dario's anti-China stance

[02:24:13] Ah, I think as a company CEO

[02:24:19] For him personally

[02:24:20] whatever views he holds, I think it's fine

[02:24:22] But as a company CEO

[02:24:24] I think

[02:24:25] pushing this view to such an extreme

[02:24:28] was a very emotional reaction

[02:24:30] Yeah, and this was a relatively minor reason

[02:24:33] But on the bigger picture

[02:24:34] There are many companies

[02:24:34] Like I just mentioned

[02:24:35] There were some cultural shocks at the company

[02:24:37] And including myself

[02:24:38] I probably wanted to learn some different things

[02:24:42] I mean, Anthropic

[02:24:43] is after all very focused

[02:24:45] And you might be doing

[02:24:47] If you really want to work on everything related to language

[02:24:48] models in all aspects

[02:24:49] And

[02:24:50] working on this kind of tool use, this Agentic stuff

[02:24:53] and coding and such

[02:24:54] then Anthropic is actually great

[02:24:56] You can learn a lot

[02:24:57] But there are many things Anthropic doesn't do

[02:24:59] For example, no one at Anthropic is doing

[02:25:01] this kind of multimodal generation

[02:25:03] You want to learn but there's nowhere to learn it

[02:25:04] And Anthropic probably didn't spend too much energy

[02:25:06] on this kind of more low-level

[02:25:08] engineering

[02:25:10] infrastructure

[02:25:12] Right

[02:25:12] So probably wanting to learn more things

[02:25:16] was also one of my motivations for leaving at the time

[02:25:19] What percentage was the anti-China stance?

[02:25:22] Because of Dario's personal reasons

[02:25:23] I've said in public

[02:25:24] Combined, maybe 40%

[02:25:25] But take this number loosely

[02:25:28] This number just tells you

[02:25:29] It's not the main reason

[02:25:30] But it is indeed a very big reason

[02:25:33] Not controlling

[02:25:34] Not a controlling reason

[02:25:35] Right not a controlling reason

[02:25:36] But it's a major-shareholder kind of reason

[02:25:42] Your choice is also quite amazing

[02:25:44] Because most people

[02:25:45] When it's still an underdog

[02:25:50] Joining will create more emotional attachment

[02:25:51] Willing to accompany the company for a longer time

[02:25:53] But you instead jumped to Google

[02:25:56] Because many researchers once they enter Google

[02:25:58] They feel Google doesn't give enough scope

[02:26:02] Mm-hmm

[02:26:03] So they instead want to jump to places like xAI

[02:26:05] Or smaller organizations like Anthropic

[02:26:08] Your move seems to be the opposite

[02:26:11] Right I think

[02:26:12] Actually depends on what you yourself want

[02:26:14] If what you really want is I have a very clear

[02:26:18] Like you said a very clear scope

[02:26:19] And this thing

[02:26:20] Is closely related to my final product model

[02:26:23] I must get one of my ideas

[02:26:27] Into this model

[02:26:29] Then Google might be a very bad place

[02:26:31] Because after all there are so many researchers

[02:26:33] So many already mature organizations

[02:26:35] Doing this thing

[02:26:37] Has a very complicated process

[02:26:40] But I think Gemini is very

[02:26:43] If what you want is research freedom

[02:26:46] Freedom to explore

[02:26:47] And want to learn from broader humanity

[02:26:52] I think in this world

[02:26:52] You probably can't find a second place stronger than Gemini

[02:26:56] So

[02:26:58] So it's

[02:26:59] I think

[02:27:00] Essentially it still depends on what you yourself want

[02:27:03] But I think many people when they leave

[02:27:05] Regardless of where they leave from

[02:27:07] After switching to another place

[02:27:08] The main reason they might feel unhappy

[02:27:10] Is because they didn't figure out what they wanted

[02:27:11] For example if you came to Google

[02:27:14] But told me

[02:27:15] At first you thought you wanted research freedom

[02:27:18] And more motivation was learning

[02:27:20] And after you went

[02:27:21] Discovered you still wanted product impact

[02:27:25] Then you might feel very uncomfortable haha

[02:27:27] You don't pursue impact

[02:27:29] You also said this

[02:27:31] Now AI is a very large system

[02:27:33] And is a

[02:27:35] Very large collaborative effort

[02:27:38] What are you pursuing in it

[02:27:39] I think it's divided into stages

[02:27:40] I think

[02:27:42] At Anthropic

[02:27:43] After experiencing too much

[02:27:44] Product-related things

[02:27:48] I might also want to change my mindset

[02:27:51] To learn some different things

[02:27:53] But you say is there any day

[02:27:54] I might switch back to this mindset

[02:27:55] And want to produce some product influence

[02:27:58] That's also possible

[02:27:59] How do you quantify product influence

[02:28:02] Is this very clear internally?

[02:28:03] Really?

[02:28:04] Hard to quantify

[02:28:05] I think

[02:28:06] Because when publishing papers there was still first author

[02:28:09] This kind of lead author

[02:28:11] Now

[02:28:12] Mm-hmm actually there's no way to quantify

[02:28:14] The reality is there's no way to quantify

[02:28:15] This is also why I think in this era

[02:28:18] Actually talking about each individual's influence

[02:28:20] Is a very, very nebulous thing

[02:28:24] I think essentially it's still the organization that did

[02:28:28] Such a thing

[02:28:29] Or the world needs this

[02:28:30] So producing product impact is a subjective feeling

[02:28:32] At least on the model side it is

[02:28:36] Right and then

[02:28:37] Of course actually you can

[02:28:38] I think you can

[02:28:39] The details are about what things you yourself have done

[02:28:44] Specific technical contributions

[02:28:45] And the effects produced technically

[02:28:47] This can be discussed objectively

[02:28:49] But more subjective things are

[02:28:50] You were saying

[02:28:51] how much did this effect account for in the final product

[02:28:53] No one can really say for sure

[02:28:57] Can you describe what you did on 3.7

[02:28:59] What kind of technical work did you do

[02:29:01] that actually had an impact on the model

[02:29:03] It was mainly related to agentic coding

[02:29:04] and the environment around it

[02:29:08] And some algorithmic work as well

[02:29:10] On the algorithmic side, it was mainly about making the training more

[02:29:14] stable

[02:29:14] To be honest

[02:29:17] But I do think there were definitely some algorithmic improvements

[02:29:20] but they didn't achieve particularly ideal results

[02:29:22] To be honest

[02:29:23] It's definitely better than the previous algorithms

[02:29:26] Yeah

[02:29:27] But I don't think that was my personal contribution

[02:29:30] I think it was a collective effort from everyone, haha

[02:29:32] Right, every time I ask you

[02:29:33] you always say it's a collective effort

[02:29:35] It's not an era of individual heroism anymore

[02:29:39] Right, I think the era of individual heroism

[02:29:42] for language models

[02:29:43] has probably passed

[02:29:45] When was it?

[02:29:46] It was the Transformer moment

[02:29:48] Right, at that point when the technology

[02:29:49] hadn't yet reached the scale-up stage

[02:29:52] The person who discovered that technology

[02:29:54] might be a hero

[02:29:55] Or a small group that discovered it

[02:29:56] might be heroes

[02:29:57] After that technology was found

[02:29:58] for probably a long time

[02:29:59] from the model side, it's all been

[02:30:01] I think more about collectivism

[02:30:02] whether this group can work together

[02:30:05] whether they can toward a common goal

[02:30:07] spending their own time together

[02:30:09] and their own energy

[02:30:10] That's the most important thing

[02:30:11] Rather than what each individual

[02:30:13] contributed

[02:30:15] The reason you say collectivism

[02:30:17] is because the capability actually comes from AI, is that right?

[02:30:20] The reason I say collectivism

[02:30:21] is because I think AI as a field is fundamentally simple

[02:30:24] Like

[02:30:26] I don't think there's any

[02:30:27] Except maybe that leap moment

[02:30:29] where the idea might require

[02:30:31] some really deep insights

[02:30:32] In the process after that

[02:30:33] many ideas are actually very trivial

[02:30:36] Very stupid, basically

[02:30:39] Anyone could think of them

[02:30:40] Anyone could do them

[02:30:41] It's just that you got lucky

[02:30:43] and happened to seize the opportunity to do it

[02:30:44] Including when you described Anthropic

[02:30:46] doing coding, it seemed like there was some randomness to it too

[02:30:48] But you have to seize it

[02:30:50] Right, right. But I think when it comes to coding

[02:30:52] it might still involve more

[02:30:53] than the technical stuff on the model side

[02:30:55] a bit more corporate heroism, perhaps

[02:30:58] That is, whether you can bet on it fast enough

[02:31:02] Yeah, Anthropic was indeed very strong in that regard

[02:31:04] But if Anthropic hadn't done it today

[02:31:05] some other company probably would have

[02:31:06] I think so. It's inevitable

[02:31:08] So it's all about emergent capabilities in AI

[02:31:11] It's just about whether you can seize that capability

[02:31:12] Whether it's a company or an individual

[02:31:13] Right, right

[02:31:14] I think before usable language models

[02:31:19] before large-scale language models emerged

[02:31:21] a lot of things were not inevitable

[02:31:24] Like whether someone could invent something

[02:31:26] whether a language model could be trained at scale

[02:31:28] and whether the GPT paradigm

[02:31:30] could be discovered

[02:31:32] There was a lot of uncertainty

[02:31:34] But like you said, for example

[02:31:37] if there had been no Google Brain back then

[02:31:39] Transformer might not have been discovered

[02:31:42] It might have taken many, many years

[02:31:43] before another well-funded organization with talented people discovered it

[02:31:46] That would have been a huge impact

[02:31:48] But after entering that stage

[02:31:50] especially now, the situation has reversed

[02:31:53] Any organization that wants to stop AI progress

[02:31:57] can't do it

[02:32:00] Anthropic has

[02:32:02] Anthropic is very concerned about AI safety

[02:32:03] But does Anthropic have the ability to stop AI development?

[02:32:06] It doesn't

[02:32:07] If you stop developing

[02:32:08] Others will continue

[02:32:08] Your voice will only get smaller

[02:32:10] Right, actually right now it's

[02:32:12] It's more like this kind of situation

[02:32:14] The world is pushing us forward

[02:32:16] Rather than us pushing the world forward

[02:32:20] I feel like in the future it'll be even harder for us to stop AI

[02:32:22] Haha, I think we already can't stop it

[02:32:25] I just think

[02:32:28] Trying to prevent one specific thing from happening with AI

[02:32:30] Probably isn't the right mindset to begin with

[02:32:34] This also relates to what we were just talking about

[02:32:37] Because we were just talking about Anthropic

[02:32:38] One of Anthropic's very important motivations

[02:32:41] Is so-called AI safety

[02:32:44] I think when it comes to AI safety

[02:32:47] The motivation when it was founded

[02:32:48] Right

[02:32:49] What does that have to do with it now

[02:32:51] The relationship now is complicated, meaning

[02:32:56] A natural

[02:32:58] Question people might ask is

[02:32:59] A company focused on AI safety

[02:33:01] Why is it now training frontier models

[02:33:04] Anthropic's explanation is that

[02:33:07] First, I need to have the most cutting-edge model

[02:33:09] Only then do I have a voice to push my AI safety agenda

[02:33:12] So actually, its thinking all along has been

[02:33:16] I want to build the best model in the world

[02:33:18] Everyone will have to listen to me

[02:33:20] To push forward my safety policies

[02:33:22] But from my personal perspective

[02:33:24] I think this idea is very naive

[02:33:26] Looking at this now

[02:33:28] It's not going to happen

[02:33:30] What's more likely to happen is

[02:33:30] Everyone will have great frontier models

[02:33:33] And you won't be able to stop anything from happening

[02:33:36] Maybe for this issue

[02:33:40] What we should focus on and think more about now is

[02:33:43] If you really want to avoid AI

[02:33:45] Bringing about some crisis

[02:33:47] What would be a more self-enforcing approach

[02:33:52] Let me give an example of a self-enforcing mechanism

[02:33:53] Like nuclear weapons, for example

[02:33:55] Nuclear weapons are also something that everyone thinks, hey

[02:33:57] This might have the power to destroy the world

[02:33:59] But with nuclear weapons, in the end

[02:34:00] The way they were ultimately controlled

[02:34:02] Is multi-party control

[02:34:04] In this world

[02:34:05] There are many countries with nuclear weapons

[02:34:09] They all have the ability to destroy each other

[02:34:11] So stability is maintained through this kind of balance of power

[02:34:14] I think if you want to stop AI from doing bad things

[02:34:17] Maybe

[02:34:18] Ultimately, you'll need a similar mechanism to achieve that

[02:34:21] Rather than hoping

[02:34:22] Pinning your hopes on

[02:34:23] One company setting a law to do something

[02:34:25] Mm, right

[02:34:25] And it sets it itself

[02:34:26] It can only govern itself

[02:34:28] Mm, you also just mentioned

[02:34:29] Anthropic has an interpretability team

[02:34:31] Right How far has their interpretability gotten

[02:34:34] In some relatively simple

[02:34:37] Relatively sparse neural networks

[02:34:41] They can do some interesting research

[02:34:44] For example

[02:34:44] Look at what a certain output

[02:34:48] Or input text or image

[02:34:51] What its internal representation looks like

[02:34:54] And then maybe you invert that representation somehow

[02:34:57] And see what kind of thing it outputs after that

[02:34:59] Doing this kind of research

[02:35:03] You also just mentioned a viewpoint

[02:35:05] That AI is essentially simple

[02:35:06] Can you describe what you mean by this

[02:35:07] This is a conclusion

[02:35:09] Right, I think this is

[02:35:10] This isn't even a conclusion

[02:35:11] It's just my statement

[02:35:13] It's my statement

[02:35:15] It could be right or wrong

[02:35:16] Oh, and my explanation for this

[02:35:18] This is your view

[02:35:19] Right, my explanation for this

[02:35:20] My explanation for this statement is

[02:35:24] I think the reason it's essentially simple is

[02:35:26] That you can run experiments

[02:35:28] Like, compared to things that are fundamentally difficult

[02:35:32] Like physics, for example

[02:35:34] The difference is

[02:35:35] With that

[02:35:36] Without experimental data at that energy scale

[02:35:38] You simply can't understand the theory at that energy scale

[02:35:41] But AI isn't bound by this

[02:35:43] It doesn't matter if you don't understand it

[02:35:45] It can still move forward

[02:35:46] And also right now

[02:35:47] The fact is

[02:35:48] I can do any experiment I can think of

[02:35:50] It's just that possibly

[02:35:51] I need some time

[02:35:52] To scale up the compute

[02:35:54] Or get the infrastructure ready

[02:35:57] But there's no fundamental difficulty

[02:35:59] Right

[02:36:01] So I've always been saying

[02:36:04] I feel AI doesn't give people the sense

[02:36:07] That it's hitting a wall because

[02:36:09] First, you can try many things

[02:36:12] Second

[02:36:13] It's not that everyone has run out of ideas

[02:36:16] With no ideas left to try

[02:36:17] More often it's that there are too many ideas

[02:36:19] Need to try them one by one

[02:36:20] Take time

[02:36:21] Mm-hmm

[02:36:24] Feels like humans are so insignificant

[02:36:26] In front of these experiments

[02:36:27] Yes, so

[02:36:29] I think very soon

[02:36:30] AI might start doing experiments itself

[02:36:33] How soon is very soon

[02:36:34] Within 4 months

[02:36:35] I think in the next 6-12 months

[02:36:38] AI will do experiments itself

[02:36:39] I think of course this statement

[02:36:40] Is not very well-defined

[02:36:41] Sorry I said something very vague

[02:36:43] Like um

[02:36:45] AI improving itself

[02:36:47] Or speeding up its own development process

[02:36:50] This is actually already happening

[02:36:53] Right

[02:36:53] Like we discussed earlier

[02:36:55] It's already helping us

[02:36:56] To achieve some of the things we want

[02:37:00] And speed up our experimental pace

[02:37:02] But I think in the next six to twelve

[02:37:04] Sorry

[02:37:05] What it currently can't do is

[02:37:07] Whether it can

[02:37:09] From start to finish complete an AI research project

[02:37:13] Like not only can it write this code

[02:37:15] It can also run this experiment

[02:37:16] Run this experiment

[02:37:17] Can also see the results

[02:37:18] See the results

[02:37:19] Can also analyze the results

[02:37:20] Analyze the results

[02:37:20] Know where it did wrong

[02:37:21] Then propose new hypotheses

[02:37:23] Design new code

[02:37:25] Run new experiments

[02:37:27] This chain is not yet complete

[02:37:30] But I think

[02:37:30] This chain

[02:37:30] Might be the next thing to gradually become complete
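
A sketch of the loop he is describing, with each step as a call to a hypothetical `llm` function and a sandboxed `run_experiment`; both are placeholders, and the stopping rule is deliberately crude:

```python
def research_loop(llm, run_experiment, max_iters: int = 5):
    """Closed-loop AI research sketch: hypothesize, code, run, analyze, revise."""
    hypothesis = llm("Propose a concrete, testable ML hypothesis.")
    analysis = ""
    for _ in range(max_iters):
        code = llm(f"Write experiment code to test: {hypothesis}")
        results = run_experiment(code)  # execute in an isolated sandbox
        analysis = llm(f"Analyze these results and say what went wrong: {results}")
        if "hypothesis supported" in analysis.lower():  # crude stopping rule
            break
        hypothesis = llm(f"Given this analysis, propose a revised hypothesis: {analysis}")
    return hypothesis, analysis
```

Today's models can already do several of these steps individually; his point is that closing the whole chain end to end is what remains.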

[02:37:34] Based on your various reasons

[02:37:36] At the moment you left

[02:37:37] Decided to leave Anthropic

[02:37:39] What were your expectations for this company's future

[02:37:41] I think when I left

[02:37:43] I was actually quite pessimistic about this company

[02:37:47] But later obviously I was overly pessimistic

[02:37:49] Hehehe why pessimistic

[02:37:50] The reason I was pessimistic at that time was

[02:37:53] I think when I left Anthropic

[02:37:55] Anthropic actually um

[02:37:58] Its main revenue source was API

[02:38:01] Selling tokens

[02:38:02] And

[02:38:04] This is a bad business

[02:38:06] Is a bad business

[02:38:07] Because this business

[02:38:08] Is only a good business for one company

[02:38:10] Which is Google

[02:38:11] Because this

[02:38:11] This business eventually leads to price wars

[02:38:15] Eventually it will be price wars

[02:38:17] In price wars if you don't have the complete chain

[02:38:21] There's not much advantage

[02:38:24] But later Anthropic obviously on the product side

[02:38:28] I think indeed there were many clever ideas

[02:38:30] Did many good things

[02:38:31] Whether it's Claude Code getting better and better

[02:38:33] And Claude Cowork

[02:38:34] And various

[02:38:35] Work and efficiency related things

[02:38:37] All slowly converged

[02:38:39] So it feels like it has now become more than

[02:38:42] What I thought at the time

[02:38:43] If you ask me which of OpenAI and Anthropic would die first

[02:38:46] Of course they won't really die

[02:38:47] Just which would become less important first

[02:38:50] At that time I would think hey

[02:38:50] Maybe Anthropic would become less important first

[02:38:53] But later first OpenAI got punched by Google

[02:38:56] Then Anthropic itself got on track

[02:38:58] So now it seems Anthropic has more advantage

[02:39:01] Haha

[02:39:02] Have you ever regretted it

[02:39:04] Mm-hmm not really

[02:39:04] I think for me personally

[02:39:06] My personal motivation was still wanting to switch places

[02:39:10] Improve myself

[02:39:11] I think for this

[02:39:13] For the thing I wanted to do

[02:39:14] This choice wasn't wrong

[02:39:17] You also mentioned Anthropic's products have many clever ideas

[02:39:20] Especially this year

[02:39:22] Like Cowork and such

[02:39:24] Where does this come from

[02:39:27] I think I didn't see Cowork's development process

[02:39:29] So I don't know

[02:39:30] And Claude Code

[02:39:31] I think the person, the product

[02:39:35] Might also

[02:39:35] Really have some opportunities for individual heroism

[02:39:39] Is it a researcher or a product manager

[02:39:40] Boris Cherny

[02:39:43] I think Claude Code almost

[02:39:46] At least the beginning of this thing

[02:39:47] Was him wanting to do this thing himself

[02:39:49] To improve his own or colleagues' work efficiency

[02:39:52] Finally became something

[02:39:54] Important to everyone

[02:39:56] What kind of person is Boris

[02:39:58] I didn't have too much personal contact with him

[02:39:59] I mostly just saw his work, when at the company

[02:40:02] He's a researcher right

[02:40:04] Right but he's mainly on the product side

[02:40:07] So Anthropic does have a dedicated product department

[02:40:10] Didn't used to be so separated

[02:40:11] Later had a separate one

[02:40:13] Right, Anthropic seems to really understand AI products

[02:40:16] Right I think

[02:40:18] I think this is why

[02:40:19] When we first started talking

[02:40:21] Felt that product managers

[02:40:23] Might still be quite hard to replace with AI currently

[02:40:25] Hahaha mm-hmm

[02:40:27] Good product managers

[02:40:28] Hey he doesn't seem to be the previous generation of product managers

[02:40:31] He's not the kind who arranges features and such

[02:40:35] He seems to know how to collaborate with AI

[02:40:37] Some kind of product manager

[02:40:38] Right I think the previous generation of product managers might

[02:40:41] But not entirely

[02:40:43] The previous generation also had some

[02:40:44] Interaction

[02:40:45] Interaction-level changes

[02:40:46] But every interaction-level change

[02:40:48] Actually brings a very big product

[02:40:49] Like maybe Douyin

[02:40:52] Is a product with interaction-level change

[02:40:54] Then it immediately brought huge

[02:40:56] Mm-hmm opened new directions

[02:40:58] And I think

[02:40:59] Maybe Claude Code is also a product at this level

[02:41:04] Claude Code and Cowork were both by Boris

[02:41:06] I don't know who did Cowork

[02:41:07] OK I already left

[02:41:09] I see

[02:41:10] Then tell me about after you arrived at Google

[02:41:12] DeepMind, has your work focus changed

[02:41:13] Work focus changed or not

[02:41:15] Mm-hmm, still

[02:41:17] Some changes happened

[02:41:19] And

[02:41:21] I anyway mainly focus on

[02:41:24] Doing ML coding

[02:41:27] And some relatively long horizon things

[02:41:30] These two things

[02:41:31] Both were actually roughly mentioned just now

[02:41:32] Like ML coding

[02:41:36] It's mainly trying to achieve

[02:41:37] The complete AI-trains-itself process we just talked about

[02:41:39] Of course in this process

[02:41:41] There are many practical problems

[02:41:44] Many practical details to solve

[02:41:46] I think in the big picture

[02:41:48] Everyone actually has quite a consensus on how to do it

[02:41:50] But still back to details

[02:41:52] There are many things to handle in details

[02:41:53] Like how to choose appropriate data

[02:41:55] How to choose appropriate feedback signals

[02:41:58] And it brings new infrastructure challenges

[02:42:02] And

[02:42:03] Now it's about slowly figuring out these things

[02:42:05] Slowly figuring them out

[02:42:06] And

[02:42:10] Like long horizon

[02:42:11] Is the other thing we just talked about

[02:42:12] That is wanting to achieve

[02:42:14] That this model can

[02:42:16] Still that slogan

[02:42:18] Train with finite context

[02:42:20] But use as if infinite

[02:42:22] I think wanting to make this training

[02:42:26] Length longer and longer and longer

[02:42:29] Might not be making a single training

[02:42:32] This segment's length keep increasing

[02:42:34] Might not be a very realistic solution

[02:42:37] But a very realistic thing is

[02:42:38] How do you under limited context

[02:42:40] Do longer work

[02:42:43] Actually if you think about it

[02:42:44] Humans are actually like this

[02:42:45] Human context is actually very very short

[02:42:47] If you ask me now what I ate last night

[02:42:48] I can't remember at all

[02:42:50] Ah you might still remember

[02:42:51] Hahaha I can't remember at all

[02:42:53] Because why

[02:42:54] Because it's not critical to my current scenario

[02:42:56] Right Like even if I knew what I ate last night

[02:42:57] So what

[02:42:58] So I choose to forget it

[02:43:00] So human context is essentially very short

[02:43:02] But they can selectively forget

[02:43:04] And selectively retrieve

[02:43:07] To bring back these important

[02:43:09] Information relevant to the current scenario

[02:43:12] So

[02:43:13] I think that might also be for me

[02:43:15] A very interesting direction

[02:43:18] These two things are actually somewhat related

[02:43:20] Somewhat complementary

[02:43:21] Why, these two things

[02:43:22] Actually both are within the large category of models using tools and interacting with the environment

[02:43:26] And with different models

[02:43:28] And different people

[02:43:30] Within this category

[02:43:30] The node everyone completed in the past

[02:43:32] Is Agentic coding, which is both tools and environment

[02:43:37] Environment is this virtual machine

[02:43:39] Or interacting within your own computer

[02:43:42] And this thing

[02:43:45] Actually, horizontally it grows into different usage scenarios

[02:43:50] Then doing AI research

[02:43:52] Is actually horizontally

[02:43:52] Another scenario within this space

[02:43:55] This scenario

[02:43:56] Actually not only horizontally is it a new scenario

[02:43:58] Vertically

[02:43:59] It also makes the time scale of this thing longer

[02:44:03] Because completing a code completion or something

[02:44:07] Is a very quick thing

[02:44:08] But doing a complete AI research

[02:44:11] Or doing this kind of computer science research

[02:44:13] Is a very long process

[02:44:16] Right so

[02:44:17] It's actually like a T-shape

[02:44:19] Horizontal extension

[02:44:20] Vertical extension too

[02:44:23] Is long horizon still a scientific problem

[02:44:26] Mm-hmm there are scientific problems

[02:44:28] Also engineering problems

[02:44:28] I think its scientific problems are more about

[02:44:32] How to try different solutions

[02:44:34] After trying in a more scientific way

[02:44:37] To find the path we ultimately want to take

[02:44:40] This solution

[02:44:41] What are the ways

[02:44:42] Mm-hmm

[02:44:45] I might not be able to say too specifically

[02:44:47] But broadly speaking

[02:44:49] Some solutions are from the pre-train perspective

[02:44:52] From the pre-training perspective

[02:44:54] Some solutions

[02:44:54] Are similar to this sparse attention

[02:44:57] Sparse attention

[02:44:58] For example DeepSeek also has some work

[02:45:00] And academia also has a lot of work
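
As one public, minimal instance of the sparse-attention family just mentioned, a sliding-window mask restricts each query to the most recent keys; real systems, including DeepSeek's published work, are far more sophisticated than this:

```python
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean attention mask: position i may attend to positions j
    with i - window < j <= i (causal and local)."""
    i = torch.arange(seq_len).unsqueeze(1)  # query positions, column vector
    j = torch.arange(seq_len).unsqueeze(0)  # key positions, row vector
    return (j <= i) & (j > i - window)

# Attention cost per query drops from O(seq_len) to O(window).
mask = sliding_window_mask(seq_len=8, window=3)
```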

[02:45:03] And from the post-training perspective

[02:45:04] Also have post-training solutions

[02:45:05] Like for example externally

[02:45:09] Like what you use every day, Cursor and such

[02:45:11] They have very strong context management

[02:45:13] Managing this context ability

[02:45:14] Like it can let the model choose

[02:45:16] I think this middle segment is unimportant

[02:45:18] Just throw it away

[02:45:19] And that segment is important so store it in some file

[02:45:21] Retrieve it when needed

[02:45:22] These two broadly speaking

[02:45:25] These two solutions

[02:45:27] Both have people researching

[02:45:29] Of course the specific implementation details

[02:45:30] Are more than the examples I just mentioned

[02:45:32] The examples I just mentioned

[02:45:33] Are relatively public examples

[02:45:34] The specific implementation details

[02:45:36] Of course each company has its own little secrets

[02:45:38] Well, I think ultimately it all comes down to that
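
A toy sketch of the post-training-side context management just described: segments the agent marks as important get spilled to a note store it can retrieve later, and everything else is simply forgotten. All names here are invented for illustration:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ToyContextManager:
    budget: int                                  # max segments kept live
    live: list = field(default_factory=list)     # (key, text, important)
    notes: dict = field(default_factory=dict)    # spilled "files"

    def add(self, key: str, text: str, important: bool) -> None:
        self.live.append((key, text, important))
        if len(self.live) > self.budget:
            k, t, imp = self.live.pop(0)         # evict the oldest segment
            if imp:
                self.notes[k] = t                # important: store to a note
            # unimportant: thrown away, like last night's dinner

    def retrieve(self, key: str) -> Optional[str]:
        return self.notes.get(key)               # bring it back when needed
```

This mirrors the human analogy he gives next: a short live context, plus selective forgetting and selective retrieval.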

[02:45:43] And then

[02:45:45] I personally spend a lot of

[02:45:48] more time on post-training approaches

[02:45:52] Because

[02:45:53] Well, first of all,

[02:45:54] because I myself

[02:45:56] haven't actually spent official work time on pre-training

[02:45:59] Pre-training is more of an interest to me

[02:46:01] something I want to learn about

[02:46:02] But I myself

[02:46:03] haven't actually done that much work on it

[02:46:05] And on the other hand,

[02:46:07] I think post-training approaches

[02:46:10] actually align better with my own understanding of this

[02:46:13] My understanding of this

[02:46:14] is exactly what we've been talking about

[02:46:16] whether you can train with short context

[02:46:19] but still handle long-context tasks

[02:46:23] Pre-training approaches

[02:46:23] essentially still require you to have long context

[02:46:25] Training it requires the data to contain it

[02:46:27] Right, yeah.

[02:46:28] Right, right. So

[02:46:29] so it doesn't quite fit my philosophy on this problem

[02:46:32] Oh, right.

[02:46:33] So do you think it's possible now?

[02:46:35] Training for long with short

[02:46:37] I think

[02:46:37] It's definitely possible

[02:46:40] but we're not sure which approach works best

[02:46:44] Gemini does long-context really well

[02:46:46] Why is that?

[02:46:49] There are some tricks

[02:46:50] [laughter]

[02:46:55] There are some tricks that really surprised me, haha

[02:46:58] Oh, this is about pre-training, right?

[02:47:00] Doing long context well

[02:47:02] definitely requires both sides

[02:47:03] But I'm just saying, for me,

[02:47:05] the pre-training side

[02:47:06] that trick still really surprised me

[02:47:08] [laughter]

[02:47:11] Right, OpenAI doesn't do it as well as Gemini

[02:47:13] on long context

[02:47:14] But there are also different opinions

[02:47:16] Some people say that with this Gemini 3 generation

[02:47:19] long context actually got a bit worse

[02:47:20] and stuff like that. Right.

[02:47:22] Again, when you joined Gemini

[02:47:23] it felt like people didn't have high expectations for Gemini

[02:47:27] No, I already had pretty high expectations for Gemini at the time

[02:47:29] Haha, what year and month was that?

[02:47:32] I joined at the end of September last year

[02:47:36] That was before Gemini

[02:47:37] released Gemini 3

[02:47:39] You had high expectations for it

[02:47:40] What about others?

[02:47:43] I think people in the industry

[02:47:44] still had a pretty good impression of Gemini back then

[02:47:47] I mean, I think

[02:47:49] before, everyone thought Google was in real trouble

[02:47:51] under OpenAI's impact

[02:47:53] I think people's perception

[02:47:55] probably shifted with the Gemini 2.5 generation

[02:47:59] Because 2.5 was clearly

[02:48:01] you could tell Google was getting the hang of it

[02:48:04] Of course, even before that, Gemini's

[02:48:06] 1.5 also had some, you know,

[02:48:12] small things

[02:48:14] where it was already pretty strong in specific areas

[02:48:17] It was clearly no longer far behind

[02:48:19] But 2.5 was really

[02:48:20] truly a generation

[02:48:21] I think it was when people actually started using the model

[02:48:23] Anyway, I myself have used 2.5 quite a bit

[02:48:25] used it quite a lot

[02:48:26] You went to Gemini because you saw 2.5?

[02:48:28] My going to Gemini had nothing to do with that

[02:48:30] Mainly it's because I knew

[02:48:31] what kind of atmosphere Gemini had

[02:48:33] There were a lot of people doing different kinds of research

[02:48:35] And I also knew some people

[02:48:37] actually

[02:48:40] doing really interesting research

[02:48:40] And many Gemini engineers

[02:48:43] I think their technical skills are extremely, extremely strong

[02:48:46] I think

[02:48:47] I learned so, so much from them

[02:48:51] And, um,

[02:48:53] that's the reason for me

[02:48:54] But I think from everyone's perception

[02:48:56] I think people in the industry, after seeing Gemini 2.5

[02:49:00] probably realized

[02:49:02] that Gemini was catching up

[02:49:06] So for you

[02:49:07] that wasn't a signal for you to join Gemini, right?

[02:49:09] It wasn't a signal for me to join

[02:49:10] Then why did you join Gemini?

[02:49:12] Well, like I just said,

[02:49:13] Mainly because I wanted to accomplish something back then

[02:49:14] Actually, I wanted to have that

[02:49:16] But you know

[02:49:17] Gemini has strong people

[02:49:17] Right? Yeah, exactly

[02:49:19] It's because when they came

[02:49:22] When they approached me, they'd definitely want me to

[02:49:25] Go talk to their people, right?

[02:49:27] So from those conversations

[02:49:28] You can actually get a sense of how things are

[02:49:32] Oh, so they came to you

[02:49:34] Yeah

[02:49:34] But I think in the end it became a two-way street

[02:49:36] So, hahaha

[02:49:37] Wasn't OpenAI an option for you back then?

[02:49:39] If you wanted to leave Anthropic

[02:49:41] OpenAI was also an option at the time

[02:49:42] OpenAI should still have been stronger than Gemini

[02:49:44] In terms of momentum, right?

[02:49:46] At that time

[02:49:47] But

[02:49:48] Though back then

[02:49:49] Weren't there all those internal politics

[02:49:52] Infighting was starting to emerge

[02:49:53] I think so

[02:49:55] So OpenAI was indeed an option for me back then

[02:49:57] And of course there were also options like xAI

[02:49:59] And I think

[02:50:00] The main reason I didn't end up at OpenAI

[02:50:02] Was that I had concerns about its

[02:50:03] Culture, at least at that time

[02:50:06] I had pretty big concerns about its culture

[02:50:09] I just felt that

[02:50:13] To put it bluntly, people who actually get things done

[02:50:16] There weren't as many as at Gemini

[02:50:18] Even fewer than at Anthropic

[02:50:20] Right? I really care about that

[02:50:22] Hahaha, yeah

[02:50:24] So a sense of cultural and personal connection brought you to Gemini

[02:50:26] Yeah

[02:50:28] And then you also caught that Gemini 3 inflection point, right?

[02:50:31] Hmm

[02:50:32] Gemini 3 should have been a major turning point for them

[02:50:36] A turning point period, right?

[02:50:37] I think in terms of actual impact

[02:50:40] I think it was two things

[02:50:41] That created a major turning point for Gemini

[02:50:45] Turning it into a heavyweight

[02:50:48] player in the market

[02:50:49] The player is Nano Banana

[02:50:51] Nano Banana and Gemini 3

[02:50:52] Two things back to back, which is

[02:50:55] I think if there were only Gemini 3

[02:50:57] It probably wouldn't have had such great results

[02:50:59] Because when your market share is less than

[02:51:01] Even 10%

[02:51:03] Whether your model is slightly better or worse

[02:51:04] It just spreads too slowly

[02:51:08] But what Nano Banana did was

[02:51:10] First, it went viral in the market, it was a huge hit

[02:51:13] Then a ton of people downloaded the Gemini app

[02:51:16] And then Gemini 3 was released right after

[02:51:18] Retaining those users

[02:51:20] So

[02:51:21] Now it's become a major player

[02:51:23] I think if Gemini hadn't thrown this punch

[02:51:25] OpenAI's position would be really comfortable

[02:51:28] Its market share is so high that

[02:51:29] Whatever you do with the model

[02:51:30] It doesn't actually matter that much to them

[02:51:35] To be honest

[02:51:36] I think when ordinary people use models

[02:51:39] Their perception of the model's capabilities

[02:51:42] Is actually very, very weak

[02:51:45] Most people don't even use the o-series models

[02:51:47] Most people just use the regular

[02:51:48] ChatGPT one

[02:51:50] Right, so I think for Gemini

[02:51:52] This Nano Banana built up the user volume

[02:51:56] And then Gemini 3 retained those users

[02:51:57] Was something critical

[02:52:00] How many ChatGPT users did it actually take away?

[02:52:03] Hmm, I don't know the exact numbers now

[02:52:07] But my feeling is

[02:52:10] Gemini's market share is probably around 20%

[02:52:15] But I haven't really checked the current data carefully

[02:52:19] Looking at it with hindsight

[02:52:21] These two factors

[02:52:22] Together contributed to Gemini's challenge to OpenAI today

[02:52:25] So from an insider's perspective you must have known earlier

[02:52:28] What happened and why

[02:52:30] Google would undergo such changes

[02:52:32] Yeah, I think

[02:52:34] First of all, Google's technical reserves

[02:52:37] Have always been sufficient

[02:52:37] Hmm, enough talent

[02:52:38] Yeah, they've always been sufficient

[02:52:40] And then

[02:52:42] Organizationally speaking

[02:52:42] It became increasingly clear later on

[02:52:44] It's about having a better framework to let

[02:52:48] Everyone work together on this thing

[02:52:50] So there might slowly be some progress

[02:52:53] Right and then

[02:52:56] I think in a sense

[02:52:58] As an outsider

[02:53:00] In a sense

[02:53:01] I think OpenAI saved Google's life

[02:53:04] Oh because everyone used to worry

[02:53:08] This chatbot

[02:53:09] Would completely replace search

[02:53:11] Right if this really happened

[02:53:13] Google would actually be in a tough spot

[02:53:14] But fortunately

[02:53:15] OpenAI did this thing first

[02:53:19] Then made Google realize this thing is important

[02:53:21] But it didn't take this thing all the way

[02:53:23] Didn't take this thing to the extreme

[02:53:25] Didn't completely kill off search

[02:53:28] Maybe just ate some market share

[02:53:30] As a result

[02:53:31] Let Google itself catch up on chatbots too

[02:53:34] Now the one in a tough spot is them

[02:53:37] What if

[02:53:38] For example there's a company, just hypothetically

[02:53:41] In a fictional world

[02:53:42] A company not only made a chatbot

[02:53:44] But also marched forward triumphantly

[02:53:46] Doing better and better

[02:53:47] Really just ate up your search in one go

[02:53:50] Completely didn't give you a chance to fight back

[02:53:51] Then it would be very tough

[02:53:53] Did the chatbot not eat up search

[02:53:54] Because OpenAI didn't do it well

[02:53:56] Or why

[02:53:57] Or because it can't kill off search

[02:53:59] I think

[02:54:00] Both sides actually have reasons

[02:54:01] That is first um

[02:54:03] Current chatbot interaction methods

[02:54:05] Actually won't completely eat up search

[02:54:08] Because it's stronger than search

[02:54:10] Like we said earlier

[02:54:12] The one point it's stronger than search

[02:54:13] Is that it has strong interactivity

[02:54:15] You can follow up

[02:54:16] And

[02:54:18] It can help you condense some very complex information

[02:54:21] This is where it's very strong

[02:54:22] So this portion of usage scenarios

[02:54:24] It will indeed steal people from search but

[02:54:27] There are still some very stupid scenarios in search

[02:54:28] Where you have a very simple thing

[02:54:31] You don't want to waste this time

[02:54:32] On a chatbot

[02:54:33] Like I just

[02:54:35] I just search "buy rice"

[02:54:37] I search, buy it, and it's done

[02:54:39] Just

[02:54:40] Do I have to ask ChatGPT

[02:54:41] Do I have to ask which one is good

[02:54:43] And it's still spinning there

[02:54:44] Spinning for half a day

[02:54:46] Then gives you a link

[02:54:46] You click again

[02:54:47] Then go to the webpage to buy

[02:54:48] Right there's no need for that

[02:54:49] So from actual usage

[02:54:52] Its current form

[02:54:53] Is not enough to completely eat up search

[02:54:55] Right and

[02:54:58] Of course from another perspective

[02:55:00] It might not have reached the peak in the chatbot thing either

[02:55:01] It really let Google catch up

[02:55:03] Now it's not quite caught up yet

[02:55:06] In terms of product

[02:55:08] I think in terms of product it's not caught up

[02:55:09] But in terms of model it has already caught up

[02:55:11] But if you want investors to invest in OpenAI

[02:55:15] They would say

[02:55:18] When they placed their bet

[02:55:19] They recognized clearly

[02:55:21] OpenAI is actually a product company

[02:55:22] Its moat is actually product and brand

[02:55:24] Then from today's perspective

[02:55:27] It seems that in this matter, Google hasn't been able to

[02:55:29] Catch up

[02:55:32] Not to say surpass OpenAI

[02:55:34] Even just catch up to OpenAI

[02:55:35] Right I think

[02:55:37] This is actually

[02:55:39] Anyway this is all from my perspective as an outsider

[02:55:41] An observer's perspective

[02:55:43] You're a commentator today

[02:55:44] Hahaha

[02:55:45] From an observer's perspective

[02:55:47] I think Google has traditionally been a bit slow with products

[02:55:49] Has always been relatively slow

[02:55:53] And so

[02:55:57] Do you think OpenAI has an advantage when it comes to products?

[02:55:59] I think it's possible.

[02:56:01] Right.

[02:56:01] And what's one thing Google is particularly good at?

[02:56:04] Finding an extremely simple product form.

[02:56:08] Everyone looks the same.

[02:56:10] Then it just competes with you relentlessly on technology.

[02:56:13] And you can't outcompete it.

[02:56:15] Oh, right.

[02:56:16] That's exactly what Google is good at.

[02:56:18] Because search engines are exactly like that.

[02:56:21] Search is a classic example.

[02:56:22] Everyone has the same search box.

[02:56:24] One button, but it just searches faster than you.

[02:56:26] And more accurately than you.

[02:56:26] There's nothing you can do about it.

[02:56:28] Mm-hmm.

[02:56:30] So that's why.

[02:56:31] Like.

[02:56:33] It feels like all along.

[02:56:35] Google has been in this state of doing very well, but...

[02:56:40] Wall Street never really bought into it.

[02:56:42] Everyone always wondered where this company's moat really is.

[02:56:45] There's no product ingenuity.

[02:56:47] No retention mechanisms either.

[02:56:49] But it has survived until now.

[02:56:51] So what's the reason its technology is so good?

[02:56:53] I think it's still about the people, right?

[02:56:54] I think it's the culture.

[02:56:56] It's said to be.

[02:56:58] A place that particularly, particularly values.

[02:56:59] In the past, it particularly valued engineers.

[02:57:01] Later, it particularly valued research.

[02:57:03] That's the kind of culture.

[02:57:05] So it's very well suited for.

[02:57:05] Products where technological capability spills over.

[02:57:07] Capability-based products.

[02:57:09] Right, if you look at it from this angle.

[02:57:11] Then do you think OpenAI's position is secure?

[02:57:13] Now?

[02:57:15] I don't think anyone's position is secure right now.

[02:57:16] Hahahaha, right.

[02:57:19] I think the form of AI.

[02:57:23] Still has a long way to go.

[02:57:25] Mm-hmm.

[02:57:26] We're not at any endgame yet.

[02:57:29] That's the feeling about this.

[02:57:32] Right.

[02:57:33] It feels like back home there's already a bit of this sentiment.

[02:57:36] Yeah, I don't get it.

[02:57:36] Like, why don't I get it?

[02:57:38] I'm really puzzled.

[02:57:38] Like.

[02:57:39] So back home, people think we're fighting over a super app.

[02:57:42] A super app is zero-sum, right?

[02:57:44] I think conditioned on the chatbot thing (taking the chatbot

[02:57:49] as the condition to build on) that's the super app.

[02:57:51] Then maybe there's something to fight over.

[02:57:54] But the problem is.

[02:57:55] Is this form the super app form?

[02:57:58] What if someone else.

[02:57:59] Comes out with a completely different form one day.

[02:58:01] And your functionality becomes a subset.

[02:58:04] Of that thing.

[02:58:05] That's quite possible, right?

[02:58:06] I don't think there's anything.

[02:58:08] I don't see anything impossible.

[02:58:11] Why wouldn't the chatbot be the ultimate form?

[02:58:13] But after all these years, this is all we've seen.

[02:58:16] Right, it's all just a chat box.

[02:58:18] I think on this matter, I really don't have any.

[02:58:21] Rational or quantitative criteria.

[02:58:24] To explain it.

[02:58:24] More like you just feel like this whole thing is stupid.

[02:58:27] Like this model clearly has so many capabilities.

[02:58:30] But the way we use it is a chatbot (Note: This video was recorded over 2 months ago, when the agent paradigm was not yet clear).

[02:58:32] It just doesn't quite make sense.

[02:58:34] You know what I mean, so.

[02:58:35] We need a product manager.

[02:58:36] To unlock the model's capabilities.

[02:58:39] Hahaha.

[02:58:41] Humans have only communicated with AI through chatbots until now.

[02:58:44] That seems stupid to you, right?

[02:58:45] It's stupid because.

[02:58:46] Then what should we use to communicate with AI?

[02:58:48] Haven't figured it out.

[02:58:48] If I had figured it out, I'd already be doing it.

[02:58:50] Hahahaha.

[02:58:53] Hey, you didn't tell me.

[02:58:53] What exactly changed inside Google.

[02:58:55] To lead to what the outside world saw.

[02:58:57] The rapid leap in model capabilities.

[02:58:59] Right, like I just said, it's one thing.

[02:59:00] I think the organization has more clarity now.

[02:59:01] And.

[02:59:03] Once the organization is clear.

[02:59:04] Did the organization change?

[02:59:06] Right.

[02:59:06] Especially pre-training.

[02:59:08] Has become very, very clear now.

[02:59:09] That is who is responsible for what

[02:59:12] And every point

[02:59:13] Who is the responsible person at every node

[02:59:15] These things are very clear

[02:59:16] Was it chaotic before

[02:59:17] It was very chaotic in the earliest days

[02:59:19] I wasn't there in the earliest days

[02:59:21] But according to colleagues

[02:59:22] Based on descriptions from colleagues or people I knew

[02:59:25] It was still more chaotic before

[02:59:26] Mm-hmm right right

[02:59:27] And now

[02:59:28] At least pre-training has also become very very clear

[02:59:30] And plus

[02:59:31] This Google

[02:59:32] Has always had

[02:59:33] This relatively strong technical background

[02:59:36] And it does things relatively systematically

[02:59:37] So I feel

[02:59:39] Pre-training at Google

[02:59:40] Is a very very controllable thing

[02:59:42] Mm-hmm predictable thing

[02:59:43] You can

[02:59:45] You can know

[02:59:47] The next generation won't be bad

[02:59:50] Oh you might even know how good it will be
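
That predictability is essentially the scaling-law logic he alludes to. A toy sketch (Python; all numbers and the power-law form are illustrative assumptions, not Gemini data): fit past runs in log-log space, then extrapolate to the next generation's compute budget.

```python
import numpy as np

# Invented (compute, loss) pairs from past training runs.
compute = np.array([1e20, 1e21, 1e22, 1e23])
loss = np.array([2.8, 2.45, 2.15, 1.9])

# Fit loss ~ a * compute^b by linear regression in log-log space.
b, log_a = np.polyfit(np.log(compute), np.log(loss), 1)
a = np.exp(log_a)

# Extrapolate to the next generation's (hypothetical) budget:
# "knowing in advance roughly how good it will be".
next_budget = 1e24
predicted = a * next_budget ** b
print(f"predicted loss at {next_budget:.0e} FLOPs: {predicted:.2f}")
```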

[02:59:53] Anthropic, through its top-down management, also achieved this

[02:59:56] Mm-hmm not bad

[02:59:58] Then Google is this bottom-up

[02:59:59] It's still bottom-up right

[03:00:01] It's definitely more top-down than before

[03:00:04] Compared to the earliest days

[03:00:05] But compared to Anthropic

[03:00:07] It's still more bottom-up

[03:00:08] Like different cultures can both work

[03:00:11] Right right

[03:00:12] For model training

[03:00:13] Right that's

[03:00:13] I think big companies have big company ways

[03:00:15] Startups have startup ways

[03:00:17] So big companies are

[03:00:17] You also just said

[03:00:18] It's a completely different narrative

[03:00:19] It's a different

[03:00:22] Method, what is Google's method

[03:00:23] Now I think Google's view is more that

[03:00:27] For this kind of relatively deterministic thing

[03:00:28] Like pre-training

[03:00:29] Which is already a relatively deterministic paradigm

[03:00:31] Then maybe Google will be more like

[03:00:33] Making it into an engineering project

[03:00:35] Google's engineering management ability is very strong

[03:00:38] So it can slowly do it well

[03:00:40] Mm-hmm what is an engineering project

[03:00:41] An engineering project means

[03:00:43] You actually have

[03:00:45] A very, very

[03:00:46] Top-down organization

[03:00:47] And very clear

[03:00:49] What we need to do in the next stage

[03:00:51] Then go do this thing

[03:00:53] What nodes need to be handled in between

[03:00:56] And even doing research is like

[03:00:59] Having a very clear framework

[03:01:01] Telling you how to

[03:01:03] Verify whether your results are good or bad

[03:01:05] Evaluate whether your results are good or bad

[03:01:07] Right so this is

[03:01:08] Something Google is very strong at

[03:01:10] In any big engineering project in the past

[03:01:13] So pre-training

[03:01:15] Actually I think has now entered

[03:01:17] Google's comfort zone

[03:01:19] And

[03:01:20] Post-training of course has more uncertainty

[03:01:22] Then maybe post-training currently

[03:01:23] Is still more bottom-up

[03:01:25] Everyone can try more broadly

[03:01:28] You said pre-training is also a kind of RL

[03:01:30] Why do you say that

[03:01:31] I think it's

[03:01:33] It's hard to say, from a purely technical perspective,

[03:01:36] What the essential difference is between pre-training

[03:01:38] Or supervised learning

[03:01:39] That is, SFT, and RL

[03:01:43] Because pre-training and SFT

[03:01:45] Of course pre-training and SFT are essentially not that different

[03:01:46] That is

[03:01:47] You just take the data you get

[03:01:50] As your ground truth

[03:01:52] Then you treat that as your expert

[03:01:55] Treat that as your expert output

[03:01:56] Then you align toward the distribution of that expert output

[03:02:00] Reinforcement learning might be one level broader

[03:02:03] It's saying, first, this

[03:02:07] This original output

[03:02:09] Is also not a given expert

[03:02:10] But something I produced myself

[03:02:12] And among them there are good results

[03:02:14] And also bad results

[03:02:15] So you want good results to move closer to that

[03:02:16] and bad results to move away from it, something like that

[03:02:18] So in a sense

[03:02:20] pre-training and SFT are a subset of reinforcement learning

[03:02:24] But

[03:02:25] these two things do, in this era

[03:02:27] have their differences

[03:02:28] Of course, for me

[03:02:29] the biggest difference lies in the data

[03:02:32] For pre-training data

[03:02:35] what matters more is having a good distribution

[03:02:37] The distribution needs to be broad enough, or aligned well enough

[03:02:40] with the scope you want to cover

[03:02:43] But data quality

[03:02:45] doesn't need to be extremely high

[03:02:47] But for post-training

[03:02:48] it's the opposite

[03:02:49] In terms of distribution, it may be much narrower

[03:02:53] But for the data it does have

[03:02:55] the quality requirements are very high

[03:02:56] Yeah, right

[03:02:56] So for now

[03:02:58] for me

[03:02:58] the most fundamental difference between the two

[03:02:59] is still in the data distribution

[03:03:01] rather than in algorithms or training paradigms
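
To make the contrast concrete, here is a minimal sketch (PyTorch; function names and shapes are illustrative assumptions, not code from any lab): SFT maximizes the likelihood of given expert data, while RL reweights the model's own samples by how good their outcomes were.

```python
import torch
import torch.nn.functional as F

def sft_loss(logits, expert_tokens):
    # SFT / pre-training: treat the given data as expert output and
    # maximize its likelihood, aligning toward the expert distribution.
    return F.cross_entropy(logits.view(-1, logits.size(-1)),
                           expert_tokens.view(-1))

def rl_loss(logits, sampled_tokens, advantages):
    # RL (REINFORCE-style): the tokens are the model's own samples;
    # weight their log-likelihood by the outcome, so good results move
    # closer (advantage > 0) and bad results move away (advantage < 0).
    logp = F.log_softmax(logits, dim=-1)
    token_logp = logp.gather(-1, sampled_tokens.unsqueeze(-1)).squeeze(-1)
    return -(advantages * token_logp.sum(dim=-1)).mean()

# Setting every advantage to 1 and sampling only expert trajectories
# recovers the SFT loss — the sense in which pre-training and SFT
# are a subset of reinforcement learning.
```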

[03:03:04] So how do different labs

[03:03:05] organize these teams?

[03:03:07] Are pre-training and post-training different?

[03:03:08] Or are they the same?

[03:03:09] Anthropic

[03:03:11] and Google are pretty similar

[03:03:12] Both of them

[03:03:13] have one team for pre-training

[03:03:15] and another team for post-training

[03:03:19] OpenAI might be more chaotic

[03:03:23] In the early days

[03:03:26] initially they had three teams

[03:03:28] They had pre-training

[03:03:30] and they also had reinforcement learning

[03:03:33] the Strawberry team

[03:03:34] and they also had a post-training team

[03:03:38] And my

[03:03:39] I never worked there

[03:03:40] but my understanding is

[03:03:41] its post-training wasn't really

[03:03:43] its RL team, Strawberry

[03:03:45] and its post-training

[03:03:47] are actually what other companies call post-training and product

[03:03:50] Oh, so

[03:03:51] they might have divided it in a different way

[03:03:52] and sliced it up

[03:03:53] They treat the later stages as product work

[03:03:55] As part of it

[03:03:56] their post-training is actually intertwined with product

[03:03:58] they're building the product

[03:04:00] Is it just that the name hasn't been updated?

[03:04:03] Not entirely

[03:04:04] Because

[03:04:05] at most companies, the product team

[03:04:06] doesn't really train models anymore

[03:04:08] They mostly communicate the desired

[03:04:11] traits

[03:04:13] the model traits, to the team training the model

[03:04:15] But it seems like their post-training

[03:04:18] is in a sense its own product team

[03:04:20] but it can also train models

[03:04:22] Is that because

[03:04:23] their understanding of product is that

[03:04:24] people who train models should also build the product

[03:04:27] Yeah, yeah, possibly

[03:04:27] It could be a good thing

[03:04:28] Yeah, but their org has also changed a lot since then

[03:04:31] So I don't know what their org looks like now

[03:04:34] You guys have released several models recently

[03:04:36] and I saw you were involved in all of them

[03:04:39] Gemini 3 Deep Think

[03:04:41] Gemini 3.1 Pro

[03:04:42] Well, I think I can only say

[03:04:45] that I was fortunate to be involved

[03:04:47] Hahaha, yeah

[03:04:49] Again, it all feels like collective work

[03:04:51] Hahaha, yeah

[03:04:52] How did you become such a public figure now

[03:04:54] getting singled out and mentioned separately every time

[03:04:57] I don't get it

[03:04:58] I actually don't think it's great

[03:05:00] Every time I see it

[03:05:02] I feel like

[03:05:03] how am I going to face my colleagues in the office tomorrow

[03:05:05] Hahaha

[03:05:08] Does it feel awkward?

[03:05:09] At the office it's fine

[03:05:10] I think my colleagues are just good people

[03:05:13] Like they probably don't care too much about these things

[03:05:18] But honestly

[03:05:19] I feel like every project I've been part of

[03:05:22] whether at Google or at Anthropic

[03:05:24] It would happen even without me

[03:05:26] Would all happen the same

[03:05:27] The results wouldn't

[03:05:29] Get worse

[03:05:30] I just, I

[03:05:31] I think everyone now is

[03:05:33] Everyone is a surfer

[03:05:35] Essentially it's a wave

[03:05:36] Not you the surfer

[03:05:38] Mm-hmm, is the wave AI

[03:05:40] Right it's AI

[03:05:41] This thing itself is this wave

[03:05:43] It will move forward

[03:05:44] Whether you surf this wave or not

[03:05:46] This wave will crash on shore

[03:05:48] Just that some people might surf this wave

[03:05:50] Some people might be a bit late

[03:05:52] Didn't catch the crest of the wave

[03:05:54] Okay

[03:05:54] You were fortunate to participate in these two projects

[03:05:56] What

[03:05:58] Mainly probably some

[03:06:00] Small details in algorithmic design

[03:06:03] Then we would

[03:06:04] Discuss together

[03:06:04] And

[03:06:06] Some

[03:06:08] Some things on the data side

[03:06:09] But things on the data side

[03:06:10] I think might have more impact on future work

[03:06:14] Do these models have paradigm changes

[03:06:18] Mm-hmm I don't think any

[03:06:23] No change is big enough to

[03:06:25] From not knowing how to do large-scale reinforcement learning

[03:06:30] To large-scale reinforcement learning

[03:06:31] That level of change

[03:06:32] No change is big enough to that extent

[03:06:34] There are definitely some small changes

[03:06:37] Can you talk about these small changes

[03:06:39] These new models

[03:06:44] There are definitely some small changes

[03:06:49] Recently I already feel numb about model releases

[03:06:51] A bunch of domestic models

[03:06:53] And many foreign models too

[03:06:55] OpenAI you all

[03:06:57] Mm-hmm domestic GLM, ByteDance

[03:07:02] DeepSeek has been expected but hasn't released yet

[03:07:05] Kimi can you highlight the key points for everyone

[03:07:09] I think

[03:07:12] In a sense

[03:07:14] None are that worth paying attention to

[03:07:15] Hey what are people competing over now

[03:07:17] Feels like chaos

[03:07:20] I think some things people are competing over

[03:07:21] Actually looking at it now

[03:07:22] In this era

[03:07:23] Already not that important

[03:07:24] Because of inertia from the past

[03:07:26] Everyone would compete for first place on various Benchmarks

[03:07:29] To prove their model's basic capability is strong

[03:07:32] This thing

[03:07:33] Actually by now

[03:07:35] The benchmarks that get public attention

[03:07:36] Are pretty much maxed out

[03:07:40] Actually, think about it: at first everyone paid attention to SWE-bench

[03:07:43] Casually, everyone hit 80-something

[03:07:45] Fortunately no one exceeded 83

[03:07:47] Though recently OpenAI just released a post saying they exceeded 83

[03:07:50] Some of those problems are not well-defined

[03:07:52] Fortunately no one had exceeded it

[03:07:53] Whoever exceeds it would be embarrassed

[03:07:54] Anyway

[03:07:55] And for reasoning, everyone first finished AIME, then IMO

[03:08:00] After IMO what

[03:08:03] Can't think of more, ARC-AGI and such

[03:08:05] Benchmarks, then ARC-AGI

[03:08:06] Mm-hmm before Gemini 3

[03:08:07] Everyone probably forgot the highest

[03:08:10] At that time was maybe 10 points or so

[03:08:11] And everyone was like wow

[03:08:13] Hard as climbing to heaven

[03:08:14] Then Gemini 3 made it 30-something

[03:08:17] Then Claude 4.5 or 4.6 became

[03:08:21] 4.6 should have become 60-something

[03:08:23] Then Gemini 3 Deep Think hit 80-something

[03:08:28] So this is also maxed out

[03:08:30] So now it feels like

[03:08:34] Just relying on grinding these publicly recognized model-capability benchmarks

[03:08:38] Actually doesn't have much meaning anymore

[03:08:42] And um

[03:08:45] So from this perspective

[03:08:46] I just

[03:08:47] Essentially there aren't too many key points

[03:08:50] Although everyone is releasing very fast

[03:08:52] Mm-hmm

[03:08:54] Releasing fast also shows

[03:08:55] Actually this problem has become easy

[03:08:56] For everyone

[03:08:57] Everyone knows the know-how now

[03:08:59] There are no secrets anymore

[03:09:00] Right, right

[03:09:00] It's still this, it's still that

[03:09:03] It's still that same thing

[03:09:04] The surfing theory, right

[03:09:05] It's still this

[03:09:05] The wave is moving forward

[03:09:09] What's the next goal everyone might be looking for

[03:09:14] What's the next paradigm-level change

[03:09:16] Will there still be one

[03:09:18] Ah, I think

[03:09:19] The two things I just mentioned are

[03:09:21] I think ML coding and long horizon, right

[03:09:24] And these two are

[03:09:26] I think, I think

[03:09:28] Um

[03:09:30] Yes, yes

[03:09:30] I think it might be something that hasn't reached paradigm-level change

[03:09:33] But I think it is

[03:09:35] Something very valuable for Google

[03:09:38] Because first of all, ML coding is

[03:09:41] Because

[03:09:41] Google itself is a major player in AI research

[03:09:44] And it's also the most full-stack in AI research

[03:09:46] That is, not only does it have these model-training parts

[03:09:50] It also has hardware design

[03:09:52] The part connecting hardware to models

[03:09:55] If this entire system can be accelerated

[03:09:58] Or better managed

[03:10:01] That could be very valuable for this company

[03:10:03] Long horizon goes without saying

[03:10:04] Everyone knows

[03:10:04] Everyone thinks it's very important

[03:10:07] Right So I think that might be, for me

[03:10:10] Can't say it's paradigm-level

[03:10:11] Definitely not at the paradigm level

[03:10:13] But it's something I think is very valuable

[03:10:15] That needs to be able to, within the next few months

[03:10:19] Show some light at the end of the tunnel, and um

[03:10:25] I think paradigm-level

[03:10:26] Might still be those more uncertain things

[03:10:28] Like multimodal generation, that kind of thing

[03:10:31] I think there might be a hero

[03:10:33] Or a group of heroes

[03:10:35] Haha, and um, right

[03:10:38] That kind of thing might have some

[03:10:40] Um, another thing talked about a lot is continual learning

[03:10:43] What about world models

[03:10:45] I think continual learning and this kind of long horizon

[03:10:47] As I just said, have no fundamental difference

[03:10:50] Because, um

[03:10:51] Because people used to think these two things were very different

[03:10:53] It's because continual learning changes some of the model's weights

[03:10:56] And when you do this kind of

[03:10:58] For example, like open

[03:10:59] Open source

[03:10:59] Everyone does a lot of this kind of

[03:11:01] This kind of context management

[03:11:02] Doesn't change model weights

[03:11:05] But actually, if you think about it, there's no fundamental difference between these two things

[03:11:06] Because those tokens in the context

[03:11:08] Their own KV cache is also a kind of weight, isn't it
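
A toy illustration of that point (PyTorch; purely a sketch): appending tokens to the context changes what the attention layer computes for the next query, much as a weight update would, because the cached keys/values enter the computation like data-dependent "fast weights".

```python
import torch

def attend(q, K, V):
    # Single-query attention over a KV cache: softmax(q K^T / sqrt(d)) V.
    w = torch.softmax(q @ K.T / K.size(-1) ** 0.5, dim=-1)
    return w @ V

d = 16
q = torch.randn(1, d)                        # the next query
K, V = torch.randn(8, d), torch.randn(8, d)  # cached keys/values (context)

# "Learning" via context management: append new key/value rows.
k_new, v_new = torch.randn(1, d), torch.randn(1, d)
K2, V2 = torch.cat([K, k_new]), torch.cat([V, v_new])

# The output for the same query changes, just as it would after a
# weight update: the cached K/V behave like data-dependent fast weights.
print(attend(q, K, V))
print(attend(q, K2, V2))
```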

[03:11:11] So

[03:11:11] You think between these two approaches, which one can

[03:11:14] Which one will be more useful

[03:11:15] More useful in the long run

[03:11:16] I think it's unclear

[03:11:17] But essentially they

[03:11:18] Are both for doing what I just mentioned, long horizon

[03:11:20] This type of thing

[03:11:22] And world models

[03:11:26] Ten thousand people have ten thousand world models

[03:11:30] What does that mean? The definition isn't clear

[03:11:31] That is

[03:11:33] First of all, I don't know what a world model is

[03:11:36] And secondly

[03:11:36] When everyone talks about the world models they're building

[03:11:39] They might be talking about different things

[03:11:42] For example, the world model that Gemini builds might be different from

[03:11:44] For example, like

[03:11:45] Fei-Fei Li

[03:11:46] The world models they're building are not the same thing

[03:11:47] Um, sigh

[03:11:49] Describe the difference

[03:11:50] I don't particularly understand what labs like Fei-Fei Li's

[03:11:55] What these labs are doing

[03:11:56] What it's actually like

[03:11:57] But, um

[03:11:58] Gemini's world model is more of a

[03:12:02] It's a kind of end-to-end level of training

[03:12:05] The result it wants is that I can, for example

[03:12:08] For example, video generation

[03:12:09] Is that given a description

[03:12:12] Then generate a video

[03:12:13] But the result it wants to achieve is

[03:12:14] Not only can I generate a video

[03:12:15] I am able to generate a scenario

[03:12:17] What is a scenario

[03:12:18] Scenario means I generate

[03:12:20] The state at this moment

[03:12:22] And then I can also give it a condition

[03:12:25] A condition

[03:12:25] This condition is that under this state I did some

[03:12:27] What kind of

[03:12:28] Action

[03:12:29] And then its next moment state

[03:12:30] Will become a function of my previous moment

[03:12:31] State and action

[03:12:33] And it's end-to-end training this kind of capability

[03:12:35] Right so this might be one solution
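
A minimal sketch of that interface (PyTorch; the module and dimensions are invented for illustration, not Gemini's actual design): the next moment's state is a learned function of the current state and the action taken in it.

```python
import torch
import torch.nn as nn

class WorldModel(nn.Module):
    # Hypothetical world-model interface: next_state = f(state, action).
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        # Condition the next moment's state on this moment's state
        # and the action performed in it.
        return self.net(torch.cat([state, action], dim=-1))

# Rolling out a scenario: start from a generated state and repeatedly
# feed back actions to produce the following states.
model = WorldModel(state_dim=32, action_dim=8)
state = torch.randn(1, 32)
for action in torch.randn(5, 1, 8):
    state = model(state, action)
```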

[03:12:39] And I

[03:12:40] First I don't know

[03:12:41] What result everyone ultimately wants

[03:12:42] And I also don't know what everyone's

[03:12:45] Definition of their own world model is

[03:12:47] So I think it's more of an exploratory state

[03:12:51] We haven't talked about one organization just now, xAI

[03:12:53] We just talked about Anthropic

[03:12:55] Talked about OpenAI

[03:12:56] Talked about DeepMind

[03:12:58] What about xAI

[03:13:00] xAI I don't understand haha

[03:13:02] As a commentator let's talk about it

[03:13:05] Why are they so turbulent recently

[03:13:07] I think they've always been quite turbulent

[03:13:08] Hahaha why so turbulent recently

[03:13:11] I don't know either

[03:13:13] And

[03:13:14] Actually I don't have that much contact with xAI

[03:13:18] And

[03:13:20] Some people I contacted have also left now

[03:13:22] Actually I don't know what happened to them

[03:13:24] Hahaha

[03:13:28] When you were talking about Anthropic just now

[03:13:29] You said

[03:13:30] The technical number one being able to make bets

[03:13:33] Is very important

[03:13:33] Then at Google who is this number one

[03:13:36] Who is this hero

[03:13:38] I think heroes

[03:13:41] Might be different people at different stages

[03:13:43] Mm-hmm but behind every hero there is one person

[03:13:45] Sergey Brin

[03:13:47] Google's cofounder

[03:13:49] Oh right

[03:13:50] I think ultimately many many big decisions

[03:13:56] Might not be decided by him on how to do them

[03:13:58] But in the end he has to be the one to make the final call

[03:14:00] Mm-hmm even now

[03:14:02] What about Demis Hassabis

[03:14:07] I think the person who appears more on the front lines

[03:14:09] Is Koray Kavukcuoglu

[03:14:11] Right

[03:14:12] Yes

[03:14:12] DeepMind CTO

[03:14:13] And he's now also that Google SVP

[03:14:17] Oh what is Demis responsible for

[03:14:19] I think Demis might manage more of those

[03:14:22] Things leaning toward science

[03:14:25] Like for example drug design

[03:14:27] Isomorphic Labs and such things

[03:14:28] Right right right

[03:14:30] Oh Gemini

[03:14:31] He doesn't manage much

[03:14:33] At least from my perspective

[03:14:35] The person I see more is Koray

[03:14:38] Of course it's possible that

[03:14:39] Company management matters

[03:14:40] Actually there are many parts I can't see

[03:14:43] Then I'm not clear about that

[03:14:46] You also mentioned AI is a whole system

[03:14:48] Mm-hmm

[03:14:48] What understanding do you have about how to systematically do AI

[03:14:50] Now

[03:14:53] After these two years of your work

[03:14:55] Several aspects

[03:14:56] One aspect is from the whole system perspective

[03:14:58] It needs a relatively scientific attitude

[03:15:01] That you need to clearly understand like Scaling Law

[03:15:03] You need to clearly understand

[03:15:04] What assumptions you have made

[03:15:06] And when I make a change

[03:15:08] What factors are actually related to it

[03:15:10] What factors are not related

[03:15:12] Right

[03:15:13] And this is from the organizational perspective

[03:15:14] From the people's perspective

[03:15:16] Actually requires people to be very reliable

[03:15:19] Requires very responsible people

[03:15:22] Actually every system

[03:15:25] Every evaluation framework

[03:15:26] Is very easily hacked

[03:15:28] Because you can always do something

[03:15:29] To make your metrics look very good

[03:15:32] But a trustworthy

[03:15:34] Or down-to-earth person

[03:15:35] He would actually think

[03:15:38] If the thing he did works well

[03:15:40] Is it really

[03:15:41] For example effective at large scales

[03:15:43] Did I miss some factors in between

[03:15:45] Right

[03:15:46] Actually doing things systematically

[03:15:48] Sounds like one sentence

[03:15:50] But actually doing it is very complex

[03:15:52] There are many details

[03:15:53] Many resistances

[03:15:54] It actually goes against human nature

[03:15:57] Oh

[03:15:57] Because every individual's human nature

[03:15:59] Might be to make their own things

[03:16:00] Show up better

[03:16:02] But for a company or an organization

[03:16:04] The most beneficial thing

[03:16:05] Is to make the entire company's system

[03:16:07] Very solid systematically

[03:16:08] This is actually the best for you personally

[03:16:09] Because once this system is solid

[03:16:11] You can leverage this system

[03:16:13] To produce more output

[03:16:15] But the bad thing is

[03:16:16] This system will make your individual heroism

[03:16:17] Not shine

[03:16:19] But you can rest assured that others' individual heroism

[03:16:20] Also won't shine

[03:16:25] But if you are in a system

[03:16:27] Where individual heroism can shine

[03:16:28] Then this system might

[03:16:31] Not be particularly stable

[03:16:36] Because one person leaving

[03:16:39] Might cause the entire thing to collapse

[03:16:41] For example like OpenAI

[03:16:45] You say you love to challenge difficult things

[03:16:48] But this industry seems to require

[03:16:50] Doing simple things well repeatedly

[03:16:51] Actually

[03:16:53] I think the so-called simple things

[03:16:54] Doing them well repeatedly

[03:16:56] Is actually a very difficult thing

[03:16:59] Because human nature doesn't like

[03:17:01] Doing repetitive things

[03:17:03] Because the most difficult thing in this industry

[03:17:04] Is actually doing simple things cleanly

[03:17:06] Why

[03:17:10] Because everyone can do simple things

[03:17:11] If you can't do them cleaner than others

[03:17:12] It requires researchers themselves to have

[03:17:14] A good understanding of how

[03:17:15] This system operates

[03:17:16] And to be responsible to the company, to pull this off

[03:17:20] Otherwise, one thing becomes very easy to do

[03:17:22] Which is

[03:17:23] Maybe, for example, when you consider just training

[03:17:26] Your method is better than others'

[03:17:26] But when you consider training plus sampling, it's worse than others'

[03:17:29] You can always choose to present only the training part

[03:17:31] But that's really bad

[03:17:33] Mm

[03:17:33] So this requires both your personal sense of responsibility

[03:17:35] And that the system the organization has built

[03:17:39] Can catch, as much as possible, these

[03:17:42] Intentional or unintentional

[03:17:43] Boundary-pushing things

[03:17:45] But as an individual

[03:17:46] You don't know what's best for the big picture, do you

[03:17:50] Actually it requires

[03:17:51] I think if a researcher can't manage

[03:17:53] To think about the big picture

[03:17:57] Then he's not a good researcher

[03:17:59] In this day and age

[03:18:00] Mm, it's just that

[03:18:02] I think this

[03:18:03] Is very different from

[03:18:05] Doing research in academia

[03:18:05] Mm

[03:18:06] Because doing research in academia

[03:18:07] Is essentially a state of "one person fed,

[03:18:09] the whole family worry-free"

[03:18:10] I'm responsible for my own project, right

[03:18:13] I'm responsible for my own reproducibility

[03:18:16] But in a company

[03:18:17] More often it's actually

[03:18:18] That I have to be responsible for the company

[03:18:21] These are two completely different mindsets

[03:18:22] So where does this self-discipline of yours come from

[03:18:26] No idea, hahahaha

[03:18:29] I think maybe I just couldn't bring myself to do otherwise

[03:18:32] Hahaha

[03:18:33] What do you mean, couldn't bring yourself to?

[03:18:34] It's just that

[03:18:35] Being responsible to a company

[03:18:36] Is part of your contract with that company

[03:18:39] Actually I don't think there's any reason not to do that

[03:18:43] There's just no reason not to

[03:18:47] So this individual heroism undermines this kind of cohesion

[03:18:51] I think

[03:18:53] If you're just doing it for personal heroism

[03:18:55] and acting on that basis

[03:18:56] it's very likely to undermine the bigger picture

[03:18:59] Of course, in reality you might be very capable

[03:19:00] and you actually become a hero

[03:19:01] that's also possible

[03:19:04] Since you've also been through two organizations

[03:19:06] what kind of organization do you think is better at fostering intelligence

[03:19:08] in this era

[03:19:09] I think

[03:19:12] this is actually a

[03:19:15] very controversial topic

[03:19:17] I mean

[03:19:20] as we were just discussing

[03:19:20] different organizations

[03:19:21] some tend to be more top-down

[03:19:23] some more bottom-up

[03:19:24] so the natural question is

[03:19:25] for example which of these two types fosters more innovation

[03:19:30] The traditional view was

[03:19:31] bottom-up was a necessary condition for fostering innovation

[03:19:33] because everyone needs freedom, right

[03:19:35] only with freedom can there be innovation

[03:19:37] But purely bottom-up

[03:19:39] you find it doesn't actually work either

[03:19:40] because it just becomes chaotic

[03:19:41] That's what Google was like before

[03:19:42] Was it?

[03:19:43] Yes

[03:19:44] At least in my impression

[03:19:45] from what I understand, that's how it was

[03:19:46] It was just chaotic

[03:19:47] People didn't even know

[03:19:48] what the point of what I was doing was

[03:19:50] That might not be great either

[03:19:51] So you probably need someone

[03:19:53] or a small group

[03:19:55] who can blend these two approaches somewhat

[03:19:57] Mm-hmm

[03:19:59] That's why I think

[03:20:01] whether an organization runs well or not

[03:20:04] it looks like an organizational issue

[03:20:06] but ultimately it comes down to the tech leader

[03:20:10] Mm-hmm

[03:20:11] It's about whether this tech leader has the qualities

[03:20:12] to keep the organization running stably

[03:20:16] Because the optimal state

[03:20:18] is often the most unstable one

[03:20:20] It easily collapses toward a worse state

[03:20:23] Right, so you need a leader to control that

[03:20:26] So do you think it should always be the tech leader doing this

[03:20:28] rather than the CEO

[03:20:31] Well of course every company's CEO

[03:20:33] may have different responsibilities

[03:20:34] But there needs to be a leader

[03:20:35] I think you need at least one leader

[03:20:37] who has two qualities

[03:20:40] to be able to do this

[03:20:41] One quality is that they can fight fires themselves

[03:20:47] It's not just talking about what to do

[03:20:49] What to do

[03:20:50] but rather when something really runs into trouble

[03:20:52] they can step in and lead the team

[03:20:54] to solve the problem

[03:20:56] Of course most of the time

[03:20:57] a leader probably

[03:20:58] won't have time to do this

[03:20:59] But at least they have the capability

[03:21:01] The second important quality

[03:21:02] is that they need to understand others

[03:21:07] Even if it's something

[03:21:08] that they wouldn't do themselves

[03:21:09] they can understand

[03:21:11] why what others are doing matters

[03:21:12] They can tolerate and accommodate others

[03:21:14] That might be another quality

[03:21:18] What do you think about Google's TPU

[03:21:19] In what ways does it outperform GPUs

[03:21:22] What are its weaknesses

[03:21:23] I think

[03:21:24] From a purely hardware perspective

[03:21:26] it's hard to say which hardware is truly better or worse

[03:21:28] especially at this kind of large-scale commercial deployment

[03:21:31] Because fundamentally

[03:21:32] GPUs and TPUs

[03:21:34] In terms of usage

[03:21:35] the biggest difference, setting aside

[03:21:37] the hardware differences

[03:21:38] in terms of usage

[03:21:39] the biggest difference is

[03:21:39] GPUs have a better open-source ecosystem

[03:21:42] TPUs don't

[03:21:43] But this actually isn't an issue at large-scale commercial deployment

[03:21:44] It's not a problem

[03:21:45] Because for example, Google itself uses TPUs

[03:21:47] so naturally they'll spend time building

[03:21:49] this infrastructure

[03:21:50] And infrastructure is

[03:21:52] For example, if you're only running a thousand cards

[03:21:54] it could be a heavy burden

[03:21:56] But if you're running a cluster of hundreds of thousands of cards

[03:21:57] then building out the infrastructure

[03:21:58] isn't really that big of a deal

[03:22:01] And in practice

[03:22:02] So basically

[03:22:03] when it comes to large-scale commercial deployment

[03:22:05] neither one is inherently superior or inferior

[03:22:07] But these two do

[03:22:09] have some differences in design philosophy

[03:22:12] Take GPUs, for example

[03:22:13] At least for the more recent GPU generations

[03:22:16] I haven't used them much

[03:22:17] Like the Hopper generation of GPUs

[03:22:19] The H-series GPUs

[03:22:20] The design philosophy is that

[03:22:22] inside one pod (node)

[03:22:23] there might not be that many cards

[03:22:24] say, just eight cards

[03:22:25] and these eight cards can all interconnect with one another

[03:22:27] NVLink (NVIDIA's high-speed interconnect bus) is extremely fast

[03:22:28] So within one pod, there's basically

[03:22:30] no communication bandwidth bottleneck (insufficient bandwidth between GPUs)

[03:22:33] But TPUs take the opposite approach

[03:22:34] It means that

[03:22:35] they've abandoned pairwise interconnection between cards

[03:22:38] but they try as much as possible to

[03:22:39] fit as many cards as possible

[03:22:40] into one big rack

[03:22:42] It has this kind of

[03:22:45] 3D Torus topology design

[03:22:47] So each card

[03:22:48] only connects to its three nearest neighbors in three directions

[03:22:50] but the entire cluster can be connected into one big

[03:22:52] Torus

[03:22:53] And if your compilers

[03:22:56] or your sharding (data sharding strategy)

[03:22:57] logic is written well enough

[03:23:00] you can take advantage of this architecture

[03:23:02] Effectively speaking

[03:23:04] you get more memory capacity

[03:23:07] and also avoid a lot of communication bottlenecks
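
A small sketch of the topology idea (Python; the grid shape is an assumed example): in a 3D torus, each chip links only to its nearest neighbors along the three axes, and the edges wrap around so the grid closes on itself.

```python
def torus_neighbors(x, y, z, dims=(4, 4, 4)):
    """Coordinates adjacent to chip (x, y, z) in a 3D torus: one step
    along each of the three axes, wrapping around at the edges."""
    dx, dy, dz = dims
    return [
        ((x + 1) % dx, y, z), ((x - 1) % dx, y, z),
        (x, (y + 1) % dy, z), (x, (y - 1) % dy, z),
        (x, y, (z + 1) % dz), (x, y, (z - 1) % dz),
    ]

# Even a corner chip has a full set of neighbors thanks to wraparound,
# so collectives like all-reduce can flow ring-wise along each axis
# instead of requiring all-to-all links between every pair of chips.
print(torus_neighbors(0, 0, 0))
```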

[03:23:12] What's the downside?

[03:23:14] I think one downside is that

[03:23:16] compared to GPUs, it definitely

[03:23:19] at least at a small scale

[03:23:21] is more

[03:23:24] of a rigid structure

[03:23:26] So its ease of use

[03:23:28] or its general versatility might not be as strong

[03:23:33] Recently many neo labs have emerged in Silicon Valley

[03:23:35] What do you think of this trend?

[03:23:36] Why are they all leaving

[03:23:38] jumping ship from these big model companies

[03:23:40] to start neo labs

[03:23:41] I don't really get it

[03:23:42] Haha, my feeling is that

[03:23:45] the vast majority of neo labs will die. And

[03:23:50] Well, I think

[03:23:53] some labs genuinely have good people

[03:23:55] And some labs

[03:23:56] might actually be starting to do some real work

[03:23:57] For example, like Thinking Machines

[03:23:59] is still delivering some new things

[03:24:01] But some neo labs

[03:24:04] Please bleep out the names

[03:24:05] Haha, like XXX, that XXX

[03:24:09] I have absolutely no idea what they're trying to do

[03:24:11] And

[03:24:11] These two have actually been away from the field for a long time

[03:24:15] I think in 2026

[03:24:16] China will place a lot of emphasis on the consumer-side narrative

[03:24:19] Who becomes that super app

[03:24:21] What do you think?

[03:24:22] Do you think this

[03:24:22] It seems like nobody in Silicon Valley talks about this

[03:24:25] Right, because American enterprise is just...

[03:24:29] It's companies

[03:24:30] Or rather, the productivity software market is just too big

[03:24:33] and the profit margins are too high

[03:24:36] So for the US

[03:24:37] there was basically only ChatGPT doing consumer before

[03:24:40] and there wasn't much money in it

[03:24:41] Not much profit

[03:24:43] So

[03:24:44] now everyone will probably focus first on

[03:24:46] productivity software

[03:24:46] or enterprise

[03:24:48] And

[03:24:49] So the trends in China and the US have already diverged

[03:24:51] I think

[03:24:52] Not just AI

[03:24:53] The entire internet industry in the past was like this too

[03:24:55] It was all different

[03:24:56] What China is really strong at is the consumer side

[03:24:58] It can come up with, like

[03:25:00] really, really complex product features

[03:25:03] or structures

[03:25:04] and in a way that seems very indirect to you

[03:25:07] In a very unnatural way

[03:25:09] to snowball that profit

[03:25:10] For example

[03:25:11] What do I mean by indirect?

[03:25:12] (laughs)

[03:25:12] Like, take something like Douyin (TikTok)

[03:25:16] It's not like

[03:25:17] you watch a video

[03:25:18] and I charge you 20 cents per video, right?

[03:25:21] It says you can watch videos for free

[03:25:23] but I can quietly slip in ads

[03:25:24] I can quietly do live streaming

[03:25:25] I can quietly do e-commerce

[03:25:28] But that doesn't work for productivity software

[03:25:30] Productivity software is very straightforward

[03:25:32] Like, I help you write code

[03:25:35] My cost is 150 a month

[03:25:36] I sell it to you for 200, I make 50

[03:25:38] It's that straightforward

[03:25:39] Mm

[03:25:40] Yeah, I think what the US has shown in the past

[03:25:43] is that with these very straightforward products

[03:25:45] it can push technology to the extreme

[03:25:47] But there's never been

[03:25:48] a product that felt so sophisticated

[03:25:51] that you can't live without it

[03:25:54] yet you don't feel like it's taking your money

[03:25:56] but it's actually making money from you

[03:26:01] Hearing you say that, I suddenly feel Meta should just copy ByteDance

[03:26:05] Yeah, but I don't think Meta is as strong as ByteDance

[03:26:06] Because Meta can't find its own niche either

[03:26:09] And

[03:26:10] there's no American company doing this

[03:26:13] No one has found the niche that Doubao occupies

[03:26:15] Then Meta should just copy Doubao

[03:26:17] It doesn't need such strong model capabilities either

[03:26:21] But I still think the Americans making products

[03:26:23] fundamentally, the people doing consumer products aren't good enough

[03:26:26] Far behind China

[03:26:29] This is the accumulation of the past decade, right?

[03:26:31] Yeah

[03:26:32] Mm

[03:26:33] Because the positive feedback loop in the US over the past decade

[03:26:36] all came from doing B2B

[03:26:37] A lot of enterprise stuff

[03:26:39] Or it's just too easy to make money in the US

[03:26:42] Mm

[03:26:42] When it's too easy to make money

[03:26:43] you won't rack your brains over how to make money

[03:26:46] Hey

[03:26:46] Haven't a lot of people come to chat with you?

[03:26:48] Any interesting people?

[03:26:51] Oh, well, a lot of people from China came

[03:26:53] Tech companies

[03:26:54] I think they're all pretty interesting

[03:26:58] And I did find that Chinese people doing products

[03:27:01] probably think in more sophisticated ways

[03:27:04] More sophisticated

[03:27:05] Yeah, they think more...

[03:27:07] Their thought process is more convoluted

[03:27:09] Yeah, it's a completely different style from the US

[03:27:11] America is like

[03:27:12] As I just said about America

[03:27:14] It's like

[03:27:15] you build something and sell it directly

[03:27:16] Yeah, it's simple

[03:27:18] That's how it is

[03:27:20] You just need this capability

[03:27:23] Once you have it, you just need to be cheaper than others

[03:27:25] Then I can earn more than you

[03:27:28] And you can't do anything about it

[03:27:29] Okay

[03:27:30] What about China?

[03:27:31] China seems to be all about this pattern

[03:27:32] Not making money at first

[03:27:35] But once it starts making money

[03:27:36] you can't stop it

[03:27:37] It's just that

[03:27:38] it can really form that

[03:27:40] that self-sustaining

[03:27:41] that loop

[03:27:43] When it really gets that flywheel spinning

[03:27:44] you can't break in anymore

[03:27:47] Do you think American companies

[03:27:48] understand ByteDance now?

[03:27:51] My feeling is no

[03:27:52] Not yet

[03:27:54] It's already so big

[03:27:56] Oh, you mean whether they take it seriously?

[03:27:58] Of course they do

[03:27:59] Everyone definitely knows

[03:28:00] ByteDance is a severely undervalued

[03:28:02] In terms of its valuation

[03:28:04] It's a severely undervalued company

[03:28:05] I think that's very clear to everyone

[03:28:07] And

[03:28:09] I think it's also clear

[03:28:10] that in the consumer market

[03:28:12] On this end, I actually think

[03:28:13] No American company can compete with ByteDance

[03:28:18] But after all it's a Chinese company

[03:28:22] At least in terms of public perception

[03:28:24] After all it's a Chinese company

[03:28:27] So do people understand it

[03:28:29] I don't think people understand it

[03:28:31] But look at Meta

[03:28:31] It's also actively poaching people from ByteDance

[03:28:34] Mm-hmm, do you have any idols in the AI industry

[03:28:37] Or people you admire

[03:28:40] Although you've been in the AI industry for a short time

[03:28:42] No no no, nothing

[03:28:43] I just feel

[03:28:47] When I came to this industry

[03:28:49] The era of individual heroism had already passed

[03:28:52] So there are no heroes

[03:28:54] Sometimes you even think old-era heroes are a bit stupid

[03:28:57] Ah, right

[03:28:58] So really there's nothing

[03:29:01] Who do you think is quite stupid

[03:29:03] Let's not talk about this

[03:29:04] No comment hahahaha

[03:29:08] Right, I think it's

[03:29:10] Different from doing physics

[03:29:11] I think when doing physics

[03:29:12] There were still some

[03:29:13] People I think really much smarter than me

[03:29:17] Like me

[03:29:18] When I was doing my PhD my young advisor was

[03:29:20] I think he, Douglas Stanford

[03:29:21] I think he's just much smarter than me

[03:29:24] I think he

[03:29:26] Maybe also seeing him

[03:29:27] Made me feel in that field

[03:29:29] Not very useful

[03:29:30] With him around what do they need me for

[03:29:31] Right, haha

[03:29:33] You came to AI to do a dimensionality reduction attack right

[03:29:34] Not a dimensionality reduction attack

[03:29:35] But anyway it feels like AI this thing

[03:29:36] Doesn't really need brains

[03:29:41] Really doesn't need brains

[03:29:42] Then what does it need

[03:29:43] I think this

[03:29:43] The most important trait in this industry

[03:29:46] Is being reliable

[03:29:48] Doing things carefully

[03:29:50] And being responsible for what you do

[03:29:51] This is the most important trait

[03:29:53] If you ask how much brains those things need, I think

[03:30:00] They're all things undergraduates can do

[03:30:04] But you say AI has no individual heroism

[03:30:06] Now an AI researcher is priced so high

[03:30:09] Like a star player transfer

[03:30:10] I don't know if it's a good thing or bad thing

[03:30:13] For me personally

[03:30:13] Of course I'm very happy

[03:30:15] I benefit from this

[03:30:16] Right hehehe

[03:30:18] But um

[03:30:20] Actually speaking

[03:30:21] I don't know if this thing

[03:30:24] Is a good thing

[03:30:25] Why do you think the price has become so high

[03:30:27] I think maybe on one hand

[03:30:29] Everyone thinks this thing is scarce

[03:30:32] But actually it might not be that scarce

[03:30:34] Because training a person

[03:30:36] Although this thing isn't that hard

[03:30:37] But training a person requires an environment

[03:30:39] You need to have that opportunity to be exposed to this thing

[03:30:42] To learn this thing

[03:30:44] Without that opportunity

[03:30:44] No matter how smart you are it's useless

[03:30:46] Maybe in the past people who could encounter that opportunity

[03:30:49] Weren't that many

[03:30:51] So in the market it might be relatively scarce

[03:30:54] From this perspective

[03:30:54] Mm-hmm

[03:30:56] But I think another aspect is also

[03:30:58] Maybe the hype about people is a bit excessive

[03:31:01] Right

[03:31:02] Really like to mythologize individuals

[03:31:03] Now

[03:31:04] Right I think

[03:31:06] Really

[03:31:08] I'll just say it again

[03:31:10] This is a collectivist thing haha

[03:31:13] Then many people are also very curious

[03:31:15] Because

[03:31:17] Maybe many companies also want to recruit AI people

[03:31:21] Then you think the most important thing is still being reliable

[03:31:24] What metrics are there for this

[03:31:25] How can you quickly judge whether a person is reliable

[03:31:28] Whether they do things carefully

[03:31:29] Everyone has some methods they use to measure

[03:31:33] I of course also have some of my own tricks

[03:31:36] It's just that I

[03:31:37] I used to design an interview question

[03:31:41] Let me briefly explain it

[03:31:42] This

[03:32:43] It shouldn't be confidential

[03:31:44] So I should be able to talk about it

[03:31:45] Um

[03:31:46] So the interview question is actually quite simple

[03:31:47] I need this person to, within 24 hours,

[03:31:51] complete a reinforcement learning project

[03:31:55] from scratch

[03:31:57] They have to choose on their own

[03:31:59] what kind of model

[03:32:00] I tell them what resources are available

[03:32:01] and they choose what model to use

[03:32:03] what data to use

[03:32:04] what algorithm to use

[03:32:05] and train the model

[03:32:07] Within 24 hours

[03:32:08] I give them 24 hours to get this done

[03:32:11] And after the 24 hours are up

[03:32:12] they'll have a one-hour discussion with me

[03:32:15] So this thing

[03:32:16] isn't that hard in the AI era

[03:32:19] Without AI

[03:32:20] this would be impossible

[03:32:20] No one could do it in 24 hours

[03:32:22] But with AI, it's actually quite easy

[03:32:23] Because AI can do the whole thing for you

[03:32:25] But why still do this?

[03:32:26] There are two reasons

[03:32:27] There are many reasons

[03:32:28] Among them

[03:32:29] Two reasons why it was designed this way

[03:32:31] One reason is that I think in this era, evaluating someone

[03:32:35] like whether they write good code

[03:32:37] is actually useless

[03:32:38] Because most people don't need to write code themselves anymore

[03:32:41] What's more important is

[03:32:44] whether they can effectively leverage AI

[03:32:46] So that's one aspect of evaluating this

[03:32:48] The second aspect is that there's a trap here

[03:32:51] If you let AI do everything

[03:32:53] but you don't really try to understand

[03:32:54] what AI did for you

[03:32:56] you'll be exposed during that one-hour discussion

[03:32:59] That's a

[03:33:00] That's where people fail

[03:33:02] So the other thing this tests

[03:33:04] is whether you've truly formed a collaboration with AI

[03:33:06] Or if you just completely handed it off

[03:33:08] That's something I personally value very much

[03:33:11] That also

[03:33:12] reflects whether this person

[03:33:13] is someone reliable

[03:33:15] Of course, this

[03:33:17] The design of this question itself

[03:33:18] also has some rather dark cleverness to it

[03:33:21] Like why it was designed as 24 hours

[03:33:23] is to see how much this person values this opportunity

[03:33:27] Can they stay up all night

[03:33:27] Right, hahaha

[03:33:29] If they're willing to pull an all-nighter

[03:33:30] they can survive these 24 hours

[03:33:32] If they can't make it

[03:33:34] then it just means

[03:33:34] they probably don't value this opportunity that much

[03:33:36] Haha

[03:33:39] So for people younger than you

[03:33:40] Do you think AI is still

[03:33:42] a blue ocean

[03:33:45] a place with lots of opportunities

[03:33:46] I think purely working on language models

[03:33:48] is no longer a blue ocean

[03:33:50] I think it's too late, the last train has already left

[03:33:53] The last train has already left

[03:33:54] Which last train is that?

[03:33:57] I feel like I got in on that last train

[03:34:00] And there might have been some people after I got in

[03:34:02] some new people

[03:34:03] But I think they won't have the opportunity

[03:34:05] to encounter such good opportunities

[03:34:06] Like being able to

[03:34:08] do something in a relatively small team

[03:34:10] Chances to encounter such opportunities will be rare

[03:34:12] Right, and then

[03:34:14] But I think AI

[03:34:15] is a very vast field

[03:34:18] Language models are just a tiny, tiny part of it

[03:34:20] A very small part

[03:34:21] There are many other things

[03:34:22] Like the multimodal generation we just mentioned

[03:34:24] There may still be many opportunities there

[03:34:26] Robotics probably has even more opportunities

[03:34:28] And even more extreme, there's

[03:34:30] like whether you can use AI

[03:34:32] to help with real scientific problems

[03:34:34] Like helping with

[03:34:37] quantum control and things like that

[03:34:38] Then it might be more blue ocean

[03:34:40] Those are all blue sky things

[03:34:42] Right so

[03:34:43] I think for

[03:34:47] People young enough

[03:34:48] Maybe doing the hottest thing right now

[03:34:50] Is not the right choice

[03:34:52] Doing things no one has done now

[03:34:54] Might be more of a good choice

[03:34:56] Right

[03:34:57] How will you develop in the future

[03:34:59] Will you be at Google for a long time

[03:35:03] I think probably not

[03:35:04] Hahahaha

[03:35:05] Saying this so publicly

[03:35:06] I think probably not

[03:35:10] I think I will still try to challenge myself

[03:35:14] Right and

[03:35:15] Need to torture myself

[03:35:16] Right need to torture myself

[03:35:17] But

[03:35:18] I just might need to find something

[03:35:19] Worth torturing myself for

[03:35:22] If AI is not fundamentally difficult

[03:35:24] Won't you find it boring

[03:35:25] Where is your challenge

[03:35:27] Although it's not difficult

[03:35:28] But knowing and not knowing

[03:35:29] There is still a gap

[03:35:33] From completely not knowing the details

[03:35:36] To slowly understanding the details

[03:35:38] Understanding how it works and such

[03:35:40] These things

[03:35:40] I think still require spending time and effort

[03:35:43] And after you understand

[03:35:43] I think this thing will also be helpful for your future

[03:35:47] Like whether you do product-related work

[03:35:48] Or develop toward other AI directions

[03:35:51] I think it all

[03:35:52] In the long term

[03:35:53] Will be helpful

[03:35:54] Where do you want to develop in the future?

[03:35:57] I think anything is possible

[03:35:58] Haha, haven't figured out how to torture myself yet

[03:36:01] You probably won't jump to another big company again

[03:36:04] Probably not

[03:36:05] Mm-hmm

[03:36:06] What differences do you feel between what you learned at Anthropic

[03:36:08] And what you learned at Google DeepMind?

[03:36:11] I think they're quite different

[03:36:12] I think Anthropic

[03:36:13] Is where you can understand one thing

[03:36:16] One line of work: language models

[03:36:17] Every aspect of that line, very thoroughly

[03:36:21] It gives you that opportunity

[03:36:24] And at Google

[03:36:24] It's more horizontal

[03:36:26] It has many different aspects

[03:36:28] Many different people

[03:36:29] And you can also see different perspectives

[03:36:30] Also see different research directions

[03:36:33] You can see all of them

[03:36:35] Right

[03:36:35] Anthropic bets firmly enough

[03:36:38] That you can understand things more vertically

[03:36:41] Right

[03:36:42] Have you thought about using AI to solve physics problems?

[03:36:45] (Given your theoretical physics background) Someone is already doing it

[03:36:48] So I don't think I need to do it, haha

[03:36:50] You don't have a genuine interest in this?

[03:36:51] I think this thing

[03:36:52] First

[03:36:53] Currently it's not the highest priority for me

[03:36:57] I think if one day

[03:36:58] I've solved the highest-priority thing on my hands

[03:37:01] And haven't found anything else to do

[03:37:02] I might go do this thing

[03:37:03] What is your highest priority now?

[03:37:05] My highest priority now is

[03:37:06] To push the two things I just mentioned

[03:37:07] Oh, ML coding and long horizon

[03:37:11] To at least a point

[03:37:14] Where colleagues can

[03:37:15] Push it to

[03:37:16] A relatively stable state

[03:37:18] That I think is my highest priority

[03:37:21] Of course there might be other priorities later

[03:37:22] But

[03:37:24] Using AI to do physics

[03:37:25] I think is something

[03:37:27] Many people are already trying to do

[03:37:29] One more of me is not too many

[03:37:32] One less of me is not too few

[03:37:33] Might as well let others do it first

[03:37:34] Do you have any physicists you particularly admire?

[03:37:37] Not really

[03:37:37] Well, yes, but there are quite a few

[03:37:40] I don't know where to start

[03:37:41] Hahahaha

[03:37:42] Physicists yes

[03:37:43] AI scientists

[03:37:45] No

[03:37:47] But this is related to a person's formative experiences

[03:37:49] I think

[03:37:50] Like

[03:37:51] It's hard for an adult to truly worship someone

[03:37:54] A child might

[03:37:57] Who have you worshipped?

[03:37:58] I think in physics

[03:38:00] Actually there are many who are really quite strong

[03:38:05] But as for the ones everyone talks about

[03:38:07] Let's set aside the people from 100 years ago

[03:38:09] Like Einstein

[03:38:10] And Heisenberg

[03:38:11] And also the later ones everyone knows

[03:38:13] Like Frank Yang

[03:38:14] (Chen-Ning Yang), let's set those aside too

[03:38:16] And

[03:38:17] Like when I was doing topology before

[03:38:20] Actually there was someone who later also won the Nobel Prize

[03:38:23] That Haldane

[03:38:24] You'll find these people

[03:38:26] Have some uncanny foresight

[03:38:30] They seemed out of place in their era

[03:38:32] But look at Haldane

[03:38:34] When he first did the Haldane model and these fractional

[03:38:37] Quantum Hall effect related things

[03:38:40] It was decades before everyone finally figured out these topological states

[03:38:42] Many decades later

[03:38:45] Mm-hmm

[03:38:45] At that time he could feel this thing was important

[03:38:47] And kept pushing this thing himself

[03:38:49] I think this is not easy

[03:38:51] Of course I think

[03:38:52] If you really want to find a similar person in AI

[03:38:53] I think maybe Geoffrey Hinton

[03:38:55] When everyone felt this thing

[03:38:58] Was dispensable or far from certain

[03:39:00] He kept working in this direction

[03:39:01] Then I think

[03:39:02] This might be a hero-level figure

[03:39:05] After him

[03:39:07] In AI after that

[03:39:08] I think

[03:39:11] I think there might also be some heroic collectives

[03:39:13] Like for example the Transformer

[03:39:15] Noam (Shazeer)

[03:39:16] And those others

[03:39:19] Ashish (Vaswani)

[03:39:20] Niki (Parmar) and them

[03:39:20] That might be a heroic collective

[03:39:23] You said something that made a very deep impression on me

[03:39:26] I don't have any mentors in this industry

[03:39:28] Don't have any old friends

[03:39:29] I can criticize whoever I want

[03:39:31] This might be the benefit of not doing AI

[03:39:33] Hahaha the benefit of not coming from AI

[03:39:35] Right, like

[03:39:38] Really have no burden

[03:39:40] No old-timer is a relative of yours

[03:39:46] So if you think he's stupid

[03:39:47] He is stupid

[03:39:48] Can just say he's stupid directly

[03:39:49] It doesn't matter

[03:39:52] Were you like this before too?

[03:39:55] I think I was quite restrained when I was a student

[03:39:57] Oh

[03:39:58] But later I found that restraint was useless

[03:40:02] No benefit to myself

[03:40:03] No benefit to others either

[03:40:04] Better to be more direct

[03:40:07] Expressing your own ideas is the most critical thing

[03:40:08] I think directly expressing your own ideas

[03:40:10] Is something where in the short term people will definitely hate you

[03:40:13] But in the long term everyone will appreciate it

[03:40:17] Who have you heard saying particularly stupid things recently?

[03:40:19] Bleep out that name

[03:40:20] Thank you. I think XXX has always been quite stupid, haha

[03:40:25] And consistently stupid, haha

[03:40:29] Could he possibly turn out to be the one who's right?

[03:40:32] I think what he says

[03:40:35] In Pauli's words, is 'not even wrong'

[03:40:38] Because it's not well-defined

[03:40:39] It's hard to say whether what he says is right or wrong

[03:40:42] Right, like one day

[03:40:44] Maybe a different paradigm happens

[03:40:47] He can jump out and say hey

[03:40:48] I said this this this this back then

[03:40:51] But then you discover

[03:40:51] Maybe if the paradigm had turned out differently

[03:40:53] He could have said the same thing

[03:40:55] This is why I hate this kind of very vague

[03:40:58] Very vague people

[03:41:01] Because a thing being vague is meaningless

[03:41:05] Why do you think he speaks so vaguely?

[03:41:06] There are no proper definitions

[03:41:08] Like

[03:41:09] It's kind of ambiguous

[03:41:12] If it has a proper definition

[03:41:13] I can explain why it's properly defined

[03:41:14] But if it doesn't have a proper definition

[03:41:16] I have no way to explain

[03:41:16] Why it isn't properly defined

[03:41:17] Because it really isn't properly defined

[03:41:18] Hahaha

[03:41:20] What about XXX?

[03:41:22] I think at least

[03:41:23] I think XXX is still a well-defined thing

[03:41:25] Like, it's trying to do XXX

[03:41:29] And their approach might lean more toward this

[03:41:32] More traditional kind of

[03:41:34] Neural network model approach

[03:41:37] Rather than a more end-to-end approach

[03:41:41] I think at least it's well-defined

[03:41:42] As for whether it's right or wrong

[03:41:44] I think that's something the future will test

[03:41:48] Most older people are actually fine

[03:41:49] I think

[03:41:50] I think when people get old

[03:41:51] They don't necessarily turn into old geezers

[03:41:53] When people get old, they split into two types

[03:41:55] One type is the venerable elder

[03:41:58] They might stop nitpicking so much

[03:42:01] And actually put effort into mentoring young people

[03:42:04] The other type is the old geezer

[03:42:05] They don't know what they're talking about

[03:42:06] Yet love to nitpick and boss people around

[03:42:07] Yeah, so getting old doesn't necessarily make you an old geezer

[03:42:10] Hey, who got you all riled up?

[03:42:12] I don't even know who got me riled up

[03:42:13] But I've definitely met plenty of old geezers

[03:42:15] Hahaha

[03:42:15] When did you change?

[03:42:17] Like, becoming so direct when you speak

[03:42:18] Did you stop holding back, or have you always thought this way

[03:42:22] But just didn't say it?

[03:42:23] I think in the past I might have been pretty direct too

[03:42:27] But not this direct

[03:42:29] But after getting into AI, I became even more direct

[03:42:31] So it's like there's nothing holding you back, right?

[03:42:32] One, there's nothing holding me back

[03:42:33] Two, this field is objective enough

[03:42:36] Like

[03:42:37] You don't really have to worry too much

[03:42:39] About offending people with your opinions

[03:42:40] As long as your views are internally consistent

[03:42:42] Like, you have a coherent framework for your views

[03:42:44] You're not just randomly trashing people

[03:42:45] That would definitely offend people

[03:42:47] You have your own understanding of things

[03:42:50] I think people will actually respect you for it

[03:42:53] Because ultimately, how well you do in this field

[03:42:55] Is judged by objective standards

[03:42:58] Every guest we have recommends a life-changing book

[03:43:02] It has to be a book that genuinely had a major impact on you

[03:43:05] What book would you say?

[03:43:07] This is the hardest question of the day

[03:43:12] I feel like you're overestimating my cultural sophistication

[03:43:15] Hahahahaha

[03:43:18] Honestly, I don't really have a life-changing book

[03:43:21] Okay, I read a book recently

[03:43:24] Recently

[03:43:24] Last time Ji Yichao mentioned 'The Line Puppy'

[03:43:29] The book I recently read is Yukawa's autobiography

[03:43:32] Hideki Yukawa's (1949 Nobel Prize in Physics winner) autobiography

[03:43:34] 'Tabibito' (The Traveler)

[03:43:36] And then

[03:43:37] If I had to say, books that left an impression

[03:43:40] First of all, I genuinely don't like reading

[03:43:41] I feel like I'm not very well-read

[03:43:47] And the books I read

[03:43:48] Other than professional ones

[03:43:51] All feel like leisure reading to me

[03:43:53] Like Yukawa's autobiography

[03:43:55] It's essentially leisure reading too

[03:43:57] But I found it quite interesting

[03:43:59] Like

[03:44:00] You get to see

[03:44:02] A scientist who later seemed so successful

[03:44:04] Struggling in his youth

[03:44:07] Very authentic

[03:44:09] And then maybe some other leisure reads

[03:44:13] Like novels and stuff

[03:44:14] There's a novel I really like

[03:44:15] 'From the New World', a Japanese novel

[03:44:19] Yeah, if you really force me to recommend some leisure reading

[03:44:22] I could recommend that one

[03:44:24] Have you watched any movies or anything lately

[03:44:28] TV shows, or played any games?

[03:44:32] Nothing at all

[03:44:33] Hahaha

[03:44:35] A favorite food from anywhere in the world

[03:44:37] Sushi, probably

[03:44:39] A favorite place anywhere in the world

[03:44:47] I... I think if you really force me to choose

[03:44:50] I'd probably choose Hawaii

[03:44:52] Because I really love the ocean

[03:44:54] Yeah, but it's hard to say for sure

[03:44:55] Because after I visit more coastal places

[03:44:57] I might have a new favorite

[03:44:58] Hahaha

[03:44:59] Something not many people know

[03:45:00] But probably should

[03:45:05] Don't trust old-timers, does that count? Hahaha

[03:45:09] Have you ever been superstitious?

[03:45:12] Hmm

[03:45:14] I

[03:45:15] I haven't, fundamentally

[03:45:16] But I think

[03:45:17] Sometimes superstition can be a way to comfort yourself

[03:45:19] I meant, have you ever been superstitious about old-timers?

[03:45:21] Oh, superstitious about old-timers

[03:45:22] Never?

[03:45:27] Really, never

[03:45:28] But I probably didn't hate old-timers this much before

[03:45:30] Then I started hating them more and more

[03:45:31] Why?

[03:45:35] Maybe it's just that

[03:45:37] When you develop more judgment of your own

[03:45:40] Stupid people just look even stupider

[03:45:42] But they haven't hurt you

[03:45:43] So why hate them?

[03:45:44] It's just stupidity intolerance

[03:45:45] Everyone has stupidity intolerance

[03:45:47] Hey, what's your MBTI?

[03:45:48] No idea

[03:45:50] Why, in recent years,

[03:45:53] I mean, among young people

[03:45:54] Has such an unfriendly term for older people

[03:45:56] Been emerging?

[03:45:59] Where does it come from?

[03:46:01] No idea

[03:46:01] No, no, no

[03:46:03] Haven't looked into it

[03:46:03] Could ask Gemini

[03:46:04] Have it do a Deep Research

[03:46:06] See where the term "laodeng" (old geezer) comes from

[03:46:08] So what are the papers that have influenced AI progress the most, in your mind?

[03:46:11] Sequence-to-sequence is one

[03:46:13] And then that

[03:46:15] I think language models

[03:46:17] At the peak of the feature engineering era

[03:46:21] And then

[03:46:24] Scaling Laws is one

[03:46:25] The one by Jared Kaplan

[03:46:27] And his team at OpenAI

[03:46:29] It's a paper that introduced this systematic research methodology

[03:46:34] Into the field

[03:46:36] Of course, the actual methods in Scaling Laws

[03:46:40] May not have been exactly right

[03:46:43] But it was the first

[03:46:44] To introduce this idea

[03:46:46] I think that's crucial

[03:46:48] Based on your current understanding

[03:46:49] What's a key bet?

[03:46:52] Long horizon (long-horizon tasks)

[03:46:53] Hahaha

[03:46:55] Our studio is called Language is World Studio

[03:46:57] When you first heard this name

[03:46:58] What were you thinking?

[03:47:00] I think this name is a bit...

[03:47:04] Too normal, too mediocre

[03:47:05] Hahahaha, fair enough, hahahaha

[03:47:10] I think this name is something that

[03:47:13] Maybe ten years ago

[03:47:15] Was a very unique perspective

[03:47:18] But now there's just too much consensus around it

[03:47:21] I think ten years ago it really was

[03:47:23] Maybe it's been more than ten years now

[03:47:24] Sorry, I feel like I'm getting old too

[03:47:25] Maybe it's been more than ten years

[03:47:26] Like around 2014, 2015

[03:47:30] Everyone thought vision was the most important thing

[03:47:34] Back then

[03:47:35] I think realizing

[03:47:36] That language is an important carrier of intelligence

[03:47:39] Was probably something different

[03:47:41] But I don't think our name

[03:47:43] Was meant in an AI context

[03:47:45] Hmm

[03:47:50] Hahaha

[03:47:52] Well then that's worth deep thought, hahaha
