# Andrej Karpathy: From Vibe Coding to Agentic Engineering

https://www.youtube.com/watch?v=96jN2OCOfLs
Translation: zh-CN

[00:02] We're so excited for our very first special guest.
  我们非常激动能请到我们的第一位特别嘉宾。

[00:06] He has helped build modern AI, then explain modern AI, and then occasionally rename modern AI.
  他帮助构建了现代人工智能，然后解释了现代人工智能，偶尔还给现代人工智能改名。

[00:14] He actually helped co-found OpenAI right inside of this office.
  他实际上是在这间办公室里共同创立了OpenAI的。

[00:18] Was the one who actually got Autopilot working at Tesla back in the day, and he has a rare gift of making the most complex technical shifts feel both accessible and inevitable.
  他是在特斯拉早期让Autopilot投入工作的，他有一种罕见的才能，能让最复杂的技术转变既易于理解又显得顺理成章。

[00:30] You all know him for having coined the term vibe coding last year, but just in the last few months, he said something even more startling.
  大家都知道他去年创造了“vibe coding”这个词，但在过去的几个月里，他说了一些更令人震惊的话。

[00:38] That he's never felt more behind as a programmer.
  那就是他从未感觉自己作为一个程序员如此落后。

[00:41] That's where we're starting today.
  这就是我们今天节目的开端。

[00:43] Thank you, Andrej, for joining us.
  谢谢你，Andrej，来参加我们的节目。

[00:44] Yeah. Hello. Excited to be here and to kick us off.
  是的。你好。很高兴来到这里并开始我们的节目。

[00:47] Okay. So, just a couple months ago, you said that you've never felt more behind as a programmer.
  好的。那么，就在几个月前，你说你从未感觉自己作为一个程序员如此落后。

[00:53] That's startling to hear from you of all people.
  听到你这么说真是令人震惊。

[00:55] Um, can you help us unpack that?
  嗯，你能帮我们详细解释一下吗？

[00:57] Was that feeling exhilarating or unsettling?
  那种感觉是令人振奋还是令人不安？

[01:00] Uh yeah, a mixture of both for sure.
  呃，是的，两者肯定兼而有之。

[01:05] Uh, well, first of all, I guess like many of you, I've been using agentic tools like Claude Code and adjacent things for a while, maybe over the last year as they came out. They were very good at, you know, chunks of code, and sometimes they would mess up and you'd have to edit them, so they were kind of helpful. And then I would say December was this clear point where, for me, I was on a break, so I had a bit more time.
  呃，嗯，首先，我想和你们中的许多人一样，我使用 Claude Code 之类的代理工具以及相关的东西已经有一段时间了，大概是在过去一年它们陆续推出的时候。它们很擅长生成代码块，有时会出错，你得自己去修改，所以还算有帮助。然后我想说，十二月是一个明确的节点，当时我在休假，所以有多一点时间。

[01:22] I think many other people were similar, and I just started to notice that with the latest models, the chunks just came out fine. Then I kept asking for more, and it just came out fine, and I can't remember the last time I corrected it. I just, you know, trusted the system more and more, and then I was vibe coding. [laughter] So I do think it was a very stark transition.
  我想很多人的经历也类似。我开始注意到，用最新的模型，代码块直接就生成得很好；我不断要求更多，它也照样做得很好，我都记不清上次纠正它是什么时候了。然后我就，你知道，越来越信任这个系统，然后我就在 vibe coding 了。（笑声）所以我确实认为这是一个非常鲜明的转变。

[01:43] I think a lot of people, actually, I tried to stress this on Twitter, or X, because I think a lot of people experienced AI last year as a ChatGPT-adjacent thing.
  我认为很多人，实际上，我试图在推特（或 X）上强调这一点，因为我认为去年很多人是把 AI 当作一种类似 ChatGPT 的东西来体验的。

[01:54] But you really had to look again, and you had to look as of December, because things have changed fundamentally, especially in this agentic, coherent workflow that really started to actually work.
  但你真的必须重新审视，而且要以十二月为节点来审视，因为事情已经发生了根本性的变化，尤其是这种代理式的连贯工作流程，它真的开始切实奏效了。

[02:07] And so I would say it was just that realization that really had me go down the whole rabbit hole of, you know, infinity side projects.
  嗯，所以我想说，正是那个认识让我真正钻进了无穷无尽的副业项目的兔子洞。

[02:16] My side projects folder is extremely full with lots of random things, and I'm just vibe coding all the time.
  我的副业项目文件夹塞满了各种随机的东西，我就是一直在 vibe coding。

[02:23] So that kind of happened in December, I would say, and I've been looking at the repercussions of it since.
  所以这大概发生在十二月，我想说，从那以后我一直在观察它带来的影响。

[02:28] >> Um, you've talked a lot about this idea of LLMs as a new computer.
  >> 嗯，你已经谈了很多关于将LLM视为新计算机的想法。

[02:33] um that it isn't just better software, it's a whole new computing paradigm.
  嗯，它不仅仅是更好的软件，它是一种全新的计算范式。

[02:35] And software 1.0 was explicit rules, software 2.0 was learned weights, and software 3.0 is this.
  嗯，软件1.0是显式规则，软件2.0是学习到的权重，而软件3.0就是这个。

[02:43] If that's actually true, what does a team build differently the day they actually believe it?
  嗯，如果这是真的，那么一个团队在真正相信这一点的那一天，会以怎样不同的方式去构建？

[02:50] >> Right. So, yeah, exactly. Software 1.0, I'm writing code; software 2.0, I'm actually programming by creating data sets and training neural networks.
  >> 对。所以是的，没错。软件1.0，我在写代码；软件2.0，我实际上是通过创建数据集和训练神经网络来编程。

[02:59] So the programming is kind of like arranging data sets and maybe some objectives and neural network architectures.
  所以这种编程有点像是在安排数据集，也许还有一些目标函数和神经网络架构。

[03:07] And then what happened is that basically, if you train one of these GPT models or LLMs on a sufficiently large set of tasks (implicitly, because by training on the internet you have to multitask all the things that are in the data set),
  然后发生的事情是，基本上，如果你在足够大的任务集上训练这些 GPT 模型或 LLM（这是隐式的，因为在互联网上训练，你必须同时处理数据集中的所有任务），

[03:20] Uh these actually become kind of like a programmable computer in a certain sense.
  呃，这些实际上在某种意义上变成了一种可编程的计算机。

[03:21] So software 3.0 is kind of about, you know, your programming now turns into prompting, and what's in the context window is your lever over the interpreter, which is the LLM that is kind of interpreting your context and performing computation in the digital information space.
  所以软件 3.0 大概是说，你知道，你的编程现在变成了提示（prompting），而上下文窗口里的内容就是你操控解释器的杠杆；这个解释器就是 LLM，它在解释你的上下文，并在数字信息空间中执行计算。

[03:37] So I guess, yeah, that's kind of the transition, and I think there are a few examples that really drove it home for me, and maybe those might be instructive.
  所以我想，是的，这大概就是那个转变，我认为有几个例子真正让我体会到了这一点，也许它们会有些启发。

[03:44] So for example, when OpenClaw came out, when you want to install OpenClaw, you would expect that normally this is a bash script, like a shell script.
  呃，举个例子，当 OpenClaw 发布时，当你想安装 OpenClaw，你通常会以为这是一个 bash 脚本，就是一个 shell 脚本。

[03:52] So you run the shell script to install OpenClaw.
  所以你运行 shell 脚本来安装 OpenClaw。

[03:57] But the thing is that in order to target lots of different platforms and lots of different types of computers you might run OpenClaw on,
  嗯，但问题是，为了兼容你可能用来运行 OpenClaw 的许多不同平台和不同类型的计算机，

[04:03] these shell scripts usually balloon up and become extremely complex.
  这些 shell 脚本通常会不断膨胀，变得极其复杂。

[04:05] But the thing is, you're still stuck in a software 1.0 universe of wanting to write the code.
  但问题是，你仍然被困在想要亲手编写代码的软件1.0世界里。

[04:09] And actually, the OpenClaw installation is a copy-paste of a bunch of text that you're supposed to give to your agent.
  而实际上，OpenClaw 的安装就是复制粘贴一段文本，你把它交给你的代理。

[04:15] So basically it's a little skill of, you know, copy-paste this and give it to your agent, and it will install OpenClaw.
  所以基本上这是一个小技能（skill）：复制粘贴这段文本交给你的代理，它就会安装 OpenClaw。

[04:20] And the reason this is a lot more powerful is you're working now in the software 3.0 paradigm where you don't have to precisely spell out you know all the individual details of that setup.
  而这之所以强大得多，是因为你现在正在软件3.0范式下工作，你不需要精确地拼写出你知道的那个设置的所有细节。

[04:29] The agent has its own intelligence that it packages up and then it kind of like follows the instructions and it looks at your environment, your computer and it kind of like performs intelligent actions to make things work and it debugs things in the loop and it's just like so much more powerful, right?
  代理有自己的智能，它会打包起来，然后它会遵循指令，它会查看你的环境、你的电脑，然后它会执行智能操作来让事情正常工作，它会在循环中调试事物，这真的强大得多，对吧？

[04:43] So I think that's a very different way of thinking about it: what is the piece of text to copy-paste to your agent?
  所以我认为这是一种非常不同的思考方式：要复制粘贴给你的代理的那段文本是什么？

[04:47] That's the programming paradigm.
  这就是编程范式。
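The contrast between the two install styles described here can be sketched in code. Everything below is hypothetical: `some-tool`, both text payloads, and the `run_agent` stub stand in for a real installer and a real coding agent, and are not OpenClaw's actual install flow.

```python
# Hypothetical contrast: a software-1.0 shell script that must spell out
# every platform case, versus a software-3.0 "skill" that is plain text
# handed to an agent, which fills in the details itself.

INSTALL_SCRIPT_1_0 = """\
#!/bin/sh
# Software 1.0: every platform branch is written out by hand.
if [ "$(uname)" = "Darwin" ]; then
    brew install some-tool
elif [ -f /etc/debian_version ]; then
    sudo apt-get install -y some-tool
else
    echo "unsupported platform" && exit 1
fi
"""

INSTALL_SKILL_3_0 = (
    "Install some-tool on this machine. Inspect the OS and package manager "
    "yourself, resolve any errors you hit along the way, and verify the "
    "install at the end by running `some-tool --version`."
)

def run_agent(instructions: str) -> str:
    """Stub for a coding agent: a real one would execute shell commands
    in a loop, observe the output, and debug failures on its own."""
    return f"agent received {len(instructions)} chars of instructions"

result = run_agent(INSTALL_SKILL_3_0)
```

The skill text does not enumerate platforms at all; it delegates that intelligence to the agent, which is exactly the shift being described.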

[04:50] Now, one more example that comes to mind, which is even more extreme than that, is when I was building MenuGen.
  现在我想到的另一个例子，比那个还要极端，就是我构建 MenuGen 的时候。

[04:54] So, MenuGen is this idea where you come to a restaurant and they give you a menu.
  MenuGen 是这样一个想法：你来到一家餐馆，他们给你一份菜单。

[05:01] There's no pictures usually.
  通常没有图片。

[05:03] So I don't know what any of these things are; usually like 30% of the things, I have no idea what they are, maybe 50%.
  所以我不知道这些东西是什么；通常大概 30% 的菜品我完全不知道是什么，有时甚至是 50%。

[05:09] So, I wanted to take a photo of the restaurant menu and to get pictures of what those things might look like in a generic sense.
  所以，我想拍下餐厅菜单的照片，并获取那些东西大概是什么样子的图片。

[05:16] And so I built, I vibe-coded, this app that basically lets you upload a photo, and it does all this stuff and runs on Vercel. It basically re-renders the menu: it gives you all the items, and it gives you a picture, using an image generator. It basically OCRs all the different titles, uses the image generator to get pictures of them, and then shows it to you.
  于是我构建了，我用 vibe coding 的方式写了这个应用：你上传一张照片，它做完所有这些事情，运行在 Vercel 上。它基本上会重新渲染菜单：列出所有菜品，并给你配图，用的是图像生成器。它先 OCR 识别所有菜名，再用图像生成器生成对应的图片，然后展示给你。

[05:39] And then I saw the software 3.0 version of this, which blew my mind, which is literally: just take your photo, give it to Gemini, and say, use Nano Banana to overlay the things onto the menu.
  然后我看到了它的软件 3.0 版本，这让我大吃一惊：真的就是拍下照片，交给 Gemini，然后说，用 Nano Banana 把那些菜品叠加到菜单上。

[05:51] And Nano Banana basically returned an image that is exactly the picture of the menu that I took, but it actually rendered, right into the pixels, the different things on the menu. And this blew my mind, because it means all of MenuGen is spurious.
  Nano Banana 返回的图片和我拍的菜单照片一模一样，但它真的把菜单上的各个菜品渲染进了像素里。这让我大吃一惊，因为这意味着整个 MenuGen 都是多余的。

[06:04] It's working in the old paradigm; that app shouldn't exist.
  它是在旧范式下工作的；那个应用根本不应该存在。

[06:11] And yeah, the software 3.0 paradigm is a lot more raw.
  是的，软件 3.0 范式要原始得多。

[06:14] It's just that your neural network is doing more and more of the work: your prompt or context is just the image, the output is an image, and there's no need to have any of the app in between.
  就是说，你的神经网络承担了越来越多的工作：你的提示或上下文就是那张图片，输出也是一张图片，中间根本不需要任何应用。
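The two MenuGen designs contrasted here can be sketched as follows. Every function below is a hypothetical stub invented for illustration: real versions would call OCR and image-generation models, and this is not the actual MenuGen code.

```python
# Sketch of the two MenuGen designs: an app-shaped pipeline versus a
# single multimodal call. All stubs below are illustrative assumptions.

def ocr_menu_items(photo: bytes) -> list[str]:
    """Stub: a real version would OCR the dish names out of the photo."""
    return ["margherita pizza", "carbonara"]

def generate_dish_image(name: str) -> bytes:
    """Stub: a real version would call an image-generation model."""
    return f"<image of {name}>".encode()

def multimodal_model(photo: bytes, prompt: str) -> bytes:
    """Stub for a Gemini / Nano Banana-style image-in, image-out model."""
    return b"<menu photo with dish pictures rendered in place>"

def menugen_v1(photo: bytes) -> dict[str, bytes]:
    """The app-shaped pipeline: OCR the titles, generate one image per
    dish, and let the app re-render the menu itself."""
    return {name: generate_dish_image(name) for name in ocr_menu_items(photo)}

def menugen_v3(photo: bytes) -> bytes:
    """The software-3.0 version: one multimodal call does all the work,
    and the app in between disappears."""
    prompt = "Overlay a picture of each dish onto this menu photo."
    return multimodal_model(photo, prompt)

pictures = menugen_v1(b"<raw menu photo>")
```

The point of the sketch is structural: `menugen_v1` is a pipeline the developer must build and maintain, while `menugen_v3` collapses to a prompt plus one model call.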

[06:21] So I think people have to kind of reframe, you know: don't work in the existing paradigm of what already existed and just think of this as a speed-up of what exists.
  所以我认为人们必须重新构建认知：不要停留在既有事物的旧范式里，只把这当成对现有事物的加速。

[06:33] It's actually like new things are available now.
  实际上，现在有新的事物可用了。

[06:36] And going back to your programming question, I think that's also an example of working in the old mindset, because it's not just about programming, and programming becoming faster.
  回到你关于编程的问题，我认为那也是一种旧思维模式下的表现，因为这不仅仅是关于编程，以及编程变得更快。

[06:42] This is more general information processing that is automatable now.
  这是现在可以自动化的更通用的信息处理。

[06:47] So um it's not just even about code.
  所以，嗯，这甚至不仅仅是关于代码。

[06:49] So previous code worked over kind of like structured data, right?
  所以以前的代码是针对结构化数据工作的，对吧？

[06:53] And you write code over structured data.
  嗯，你编写处理结构化数据的代码。

[06:55] But for example, with my LLM knowledge bases project, basically you get LLMs to create wikis for your organization, or for you personally, etc.
  但举个例子，在我的 LLM 知识库项目中，基本上你让 LLM 为你的组织或你个人创建维基，等等。

[07:03] This is not even a program.
  这甚至不是一个程序。

[07:04] This is not something that could exist before, because there was no code that would create a knowledge base based on a bunch of facts.
  这是以前不可能存在的东西，因为从前没有什么代码能够基于一堆事实来创建一个知识库。

[07:11] But now you can just take these documents, basically recompile them in a different way, reorder them, and create something new and interesting as a reframing of the data.
  但现在你可以直接拿这些文档，以不同的方式重新编译、重新排序，创造出一些新的、有趣的东西，作为对数据的重新组织。

[07:22] And so these are new things that weren't possible.
  所以这些是以前不可能实现的新事物。

[07:24] And so I think this is something I keep trying to come back to: not only what existed that is faster now, but I think there are new opportunities, things that just weren't possible before, and I almost think that's more exciting.
  所以我认为这是我一直想回到的一点：不只是已有的东西现在变快了，我认为还有新的机会，一些以前根本不可能实现的事情，我甚至觉得那更令人兴奋。

[07:37] I love the MenuGen progression and dichotomy that you laid out, and I'm sure many folks here followed your own progression of programming from last October to early January, February of this year.
  我很喜欢你描述的 MenuGen 的演进和这种二分法，我相信在座的许多人也关注了你自己从去年十月到今年一二月初的编程历程。

[07:51] If you extrapolate that further, what is the 2026 equivalent of building websites in the '90s, building mobile apps in the 2010s, building SaaS in the last cloud era? What will look completely obvious in hindsight that is still mostly unbuilt today?
  如果你进一步推演，2026 年相当于什么？就像 90 年代建网站、2010 年代做移动应用、上一个云时代做 SaaS 那样。有什么东西事后看来会显而易见，而今天却大多还没有被构建出来？

[08:10] Um, well, going with the menu example, I guess a lot of this code shouldn't exist, and it's just the neural network doing most of the work.
  嗯，好吧，沿用菜单的例子，我想很多这样的代码本不应该存在，其实就是神经网络在做大部分工作。

[08:15] I do think the extrapolation looks very weird, because you could basically imagine completely neural computers, in a certain sense.
  我确实认为这种外推看起来非常奇怪，因为在某种意义上，你基本上可以想象完全神经化的计算机。

[08:25] You feed it raw video; like, imagine a device that takes raw video or audio into what's basically a neural net, and uses diffusion to render a UI that is kind of, you know, unique to that moment, in a certain sense.
  你给它输入原始视频；想象一个设备，它把原始视频或音频输入到一个本质上是神经网络的东西里，然后用扩散模型渲染出一个在某种意义上为那个时刻量身定制的用户界面。

[08:37] And um I kind of feel like in the early days of computing actually people were a little bit confused as to whether computers would look like calculators or computers would look like neural nets.
  而且，嗯，我有点觉得在计算的早期，人们实际上有点困惑，计算机是看起来像计算器还是像神经网络。

[08:46] And in the '50s and '60s it was not really obvious which way it would go, and of course we went down the calculator path and ended up building classical computing.
  在五六十年代，并不清楚会走向哪条路；当然，我们最终选择了计算器那条路，建立了经典计算。

[08:53] And then neural nets are currently running virtualized on existing computers.
  然后神经网络目前在现有计算机上虚拟运行。

[08:58] But you could imagine I think that uh a lot of this will flip and that the neural net becomes kind of like the host process and uh the CPUs become kind of like the co-processor.
  但你可以想象，我认为这很多都会翻转，神经网络将成为类似主进程的东西，而 CPU 将成为类似协处理器的东西。

[09:05] So we saw the diagram of, you know, how the compute of neural networks is going to take over and become the dominant spend of flops. So you could imagine something really weird and foreign, where neural nets are doing most of the heavy lifting.
  所以我们看到过那张图：神经网络的计算将会接管一切，成为浮点运算开销的主导。所以你可以想象一些非常奇怪、陌生的景象，神经网络在做大部分繁重的工作。

[09:20] They're using tool use as this, you know, historical appendage for some kinds of deterministic tasks.
  它们把工具使用当作一种，你知道，历史遗留的附属物，用于某些确定性的任务。

[09:25] But what's really running the show is these neural nets, in a certain way.
  但在某种意义上，真正主导全局的是这些神经网络。

[09:29] So you can imagine something extremely foreign as the extrapolation, but I think we're probably going to get there sort of piece by piece.
  所以你可以把某种极其陌生的东西想象成外推的结果，但我认为我们多半会一点一点地走到那里。

[09:36] And yeah, that progression is TBD, I would say.
  是的，我会说那个演进路径还有待确定。

[09:41] >> [snorts]
  >> [哼哧]

[09:41] >> I'd like to talk a little bit about this concept of verifiability: the fact that AI will automate faster and more easily the domains where the output can be verified.
  >> 我想稍微谈谈可验证性这个概念：AI 会更快、更容易地自动化那些输出可以被验证的领域。

[09:52] If that framework is right, what work is about to move much faster than people realize, and what professions do people actually think are safe but are in fact highly verifiable?
  如果这个框架是对的，那么哪些工作即将以远超人们想象的速度被推进？又有哪些职业被人们认为是安全的，实际上却是高度可验证的？

[10:02] Uh yes.
  呃，是的。

[10:02] So I spent some time writing about verifiability, and basically, traditional computers can easily automate what you can specify in code, and this latest round of LLMs can easily automate what you can verify, in a certain sense.
  我花了一些时间写关于可验证性的文章。基本上，传统计算机可以轻松自动化你能用代码明确规定的东西，而这一轮最新的 LLM 在某种意义上可以轻松自动化你能验证的东西。

[10:19] because the way this works is that when frontier labs are training these LLMs these are giant reinforcement learning environments.
  因为它的工作方式是，当前沿实验室在训练这些大型语言模型时，它们是巨大的强化学习环境。

[10:24] So they are given verification rewards, and because of the way these models are trained, they end up progressing into these jagged entities that really peak in capability in verifiable domains, like math and code and adjacent areas, and kind of stagnate and are a little rough around the edges when things are not in that space.
  所以它们被给予验证奖励，而由于这些模型的训练方式，它们最终成长为这些「锯齿状」的实体：在数学、代码以及相邻的可验证领域能力登峰造极，而在那个空间之外的事情上则有些停滞，显得粗糙。
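A "verification reward" can be illustrated in a few lines. This is a toy sketch, not a real lab's training environment: the point is only that the reward comes from an automatic check against test cases rather than human judgment, which is what makes code-like domains so trainable with RL.

```python
# Toy illustration of a verification reward: the candidate output is
# scored by an automatic checker, not a human. The task (square a number)
# and the checker are invented for illustration.

def candidate_program(x: int) -> int:
    # Pretend this body was produced by the model being trained.
    return x * x

def verify(program, test_cases) -> float:
    """Reward 1.0 only if every hidden test passes, else 0.0."""
    return 1.0 if all(program(x) == y for x, y in test_cases) else 0.0

reward = verify(candidate_program, [(2, 4), (3, 9), (-1, 1)])
```

Because `verify` is cheap and unambiguous, it can be run millions of times inside an RL loop; domains without such a checker are exactly the ones that stay "rough around the edges."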

[10:45] So I think the reason I wrote about verifiability is I'm trying to understand why these things are so jagged.
  所以我想我写关于可验证性的原因是我试图理解为什么这些东西如此锯齿状。

[10:49] Um and some of it has to do with how the labs train the models but I think some of it also has to do with um the focus of the labs and what they happen to put into the data distribution.
  嗯，其中一些与实验室如何训练模型有关，但我认为其中一些也与呃，实验室的重点以及它们碰巧放入数据分布的内容有关。

[10:58] Because some things are basically significantly more valuable in the economy and end up creating more environments, because the labs wanted to work in those settings.
  因为有些东西在经济上明显更有价值，最终也会催生更多的环境，因为实验室想要在那些场景中下功夫。

[11:05] So I think code is a good example of that.
  所以我想代码是其中的一个好例子。

[11:06] There are probably lots of verifiable environments they could think of that happen not to make it into the mix, because they're just not that useful to have the capability around.
  他们大概能想到很多可验证的环境，但这些环境恰好没有被纳入训练组合，因为拥有那种能力并没有多大用处。

[11:15] But to me, I guess the big mystery is this. The favorite example for a while was how many letters are in "strawberry"; the models would famously get this wrong, and it's an example of jaggedness.
  但对我来说，我想最大的谜团是这个。有一段时间大家最爱举的例子是「strawberry」里有几个字母，模型出了名地会答错，这就是锯齿状的一个例子。

[11:27] The models have patched this now, I think, but the new one is: I want to go to a car wash to wash my car, and it's 50 meters away.
  我想模型现在已经修复了这个问题，但新的例子是：我想去洗车场洗车，它离我 50 米远。

[11:32] Should I drive or should I walk?
  我应该开车还是走路？

[11:36] And state-of-the-art models today will tell you to walk because it's so close.
  而如今最先进的模型会告诉你走路，因为它太近了。

[11:41] How is it possible that a state-of-the-art Opus 4.7 will simultaneously refactor a 100,000-line codebase [laughter] or find zero-day vulnerabilities, and yet tells me to walk to this car wash?
  最先进的 Opus 4.7 怎么可能一边重构十万行的代码库（笑声）、发现零日漏洞，一边却告诉我应该走路去洗车场？

[11:52] This is insane.
  这太疯狂了。

[11:56] And to whatever extent these models remain jagged, it's an indication that, number one, maybe something's slightly off, or number two, you need to actually be in the loop a little bit: you need to treat them as tools, and you do have to stay in touch with what they're doing.
  而无论这些模型在多大程度上仍然是锯齿状的，这都表明：第一，也许有什么地方稍有偏差；第二，你确实需要稍微参与其中，把它们当作工具，并且要随时了解它们在做什么。

[12:11] And so, long story short, I think all of my writing about verifiability is just trying to understand why these things are jagged.
  所以长话短说，我认为我所有关于可验证性的文章，都只是在试图理解这些东西为什么是锯齿状的。

[12:18] Is there any pattern to it?
  有什么规律吗？

[12:20] And I think it's some kind of a combination of verifiable plus labs care.
  我认为这是可验证性和实验室关怀的某种结合。

[12:25] Maybe one more instructive anecdote: from GPT-3.5 to GPT-4, people noticed that chess improved a lot, and I think a lot of people thought, oh, it's just a progression of the capabilities. But actually, and I think this is public information, I think I saw it on the internet, a huge amount of chess data made it into the pre-training set, and just because it's in the data distribution, the model improved a lot more than it would have by default.
  也许再讲一个有启发性的轶事：从 GPT-3.5 到 GPT-4，人们注意到下棋水平提高了很多，很多人以为这只是能力的自然进步。但实际上，我想这是公开信息，我应该是在网上看到的，大量的国际象棋数据被纳入了预训练集；仅仅因为它在数据分布里，模型的提升就远超默认情况。

[12:53] So someone at OpenAI decided to add this data and now you have a capability that just peaked a lot more.
  所以 OpenAI 的某个人决定添加这些数据，现在你拥有了一个能力，它得到了极大的提升。

[12:58] And so that's why I think I'm stressing this dimension of it: we are slightly at the mercy of whatever the labs are doing, whatever they happen to put into the mix.
  这就是为什么我要强调这个维度：我们在一定程度上受制于实验室在做什么，受制于他们恰好往训练组合里放了什么。

[13:06] And you have to actually explore this thing that they give you that has no manual.
  你必须真正探索他们给你的这个没有说明书的东西。

[13:10] And it works in certain settings, but maybe not in some settings.
  它在某些设置下有效，但在某些设置下可能无效。

[13:11] And you have to kind of explore it a little bit.
  你必须稍微去探索一下它。

[13:17] And uh if you're in the circuits that were part of the RL, you fly.
  而且，如果你在RL的电路中，你就会成功。

[13:19] And if you're in the circuits that are out of the data distribution, you're going to struggle, and you have to figure out which circuits you're in, in your application.
  而如果你处在数据分布之外的回路里，你就会举步维艰，你必须弄清楚你的应用处在哪些回路之中。

[13:26] And if you're not in the circuits, then you have to really look at fine-tuning and doing some of your own work, because it's not necessarily going to come out of the LLM out of the box.
  而如果你不在那些回路里，那你就得认真考虑微调，做一些自己的工作，因为它不一定能从 LLM 里开箱即得。

[13:36] I'd love to come back to the concept of jagged intelligence in a little bit.
  我很想稍后再回到“锯齿状智能”的概念上来。

[13:40] Um, if you are a founder today thinking about building a company, you are trying to solve a problem that you think is tractable, something in a domain that is verifiable, but you look around and you think, "Oh my gosh, the labs have really started getting to escape velocity in the ones that seem most obvious: math, coding, and others."
  嗯，如果你今天是一位正在考虑创业的创始人，你想解决一个你认为可以攻克的问题，一个可验证领域里的问题，但你环顾四周会想：「天哪，在那些最显而易见的领域，比如数学、编程等等，实验室真的已经开始达到逃逸速度了。」

[14:00] What would your advice be to to the founders in the audience?
  你会给在场的创始人什么建议？

[14:08] Um, so I think maybe that comes back to the previous question, of, I do think that verifiability... um, let me think.
  嗯，所以我想这可能要回到之前那个问题。我确实认为可验证性……嗯，让我想想。

[14:17] So verifiability makes something tractable in the current paradigm, because you can throw a huge amount of RL at it.
  可验证性使一件事在当前范式下变得易于攻克，因为你可以对它投入大量的强化学习。

[14:24] Um so maybe one way to see it is that uh that remains true even if the labs are not focusing on it directly.
  嗯，所以也许一种看待它的方式是，即使实验室没有直接关注它，这一点仍然成立。

[14:28] So if you are in a verifiable setting where you could create these RL environments or examples then that actually sets you up to potentially do your own fine tuning and you might benefit from that.
  所以，如果你处于一个可验证的环境中，在那里你可以创建这些RL环境或示例，那么这实际上会让你能够进行自己的微调，你可能会从中受益。

[14:36] But that is fundamentally technology that just works.
  但那从根本上说是有效即可的技术。

[14:38] You can pull a lever if you have huge amount of diverse data sets of RL environments etc.
  如果你有海量多样化的RL环境等数据集，你就可以采取行动。

[14:43] You can use your favorite fine-tuning framework, pull the lever, and get something that actually works pretty well.
  你可以用你最喜欢的微调框架，拉下这个杠杆，得到一个实际上效果相当不错的东西。
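What "creating your own RL environments" might look like in its simplest form is sketched below: each environment bundles a prompt with an automatic verifier, so a fine-tuning framework can score rollouts without human labels. The domain (unit conversion) and every name here are illustrative assumptions, not any particular framework's API.

```python
# Minimal sketch of a bank of verifiable RL environments: a prompt plus
# a checker that maps a model output to a reward. All names are invented.

class VerifiableEnv:
    def __init__(self, prompt: str, checker):
        self.prompt = prompt      # what the model is asked to do
        self.checker = checker    # maps a model output to a reward in [0, 1]

    def score(self, model_output: str) -> float:
        return self.checker(model_output)

envs = [
    VerifiableEnv("Convert 2 km to meters. Answer with a number only.",
                  lambda out: 1.0 if out.strip() == "2000" else 0.0),
    VerifiableEnv("Convert 3 hours to minutes. Answer with a number only.",
                  lambda out: 1.0 if out.strip() == "180" else 0.0),
]

# A fine-tuning framework would sample rollouts per environment and feed
# these scores back as rewards; here we just score two canned outputs.
rewards = [env.score(out) for env, out in zip(envs, ["2000", "200 minutes"])]
```

The "lever" in the surrounding discussion is then just: the more of these environments you can author in your domain, the more RL you can throw at it.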

[14:51] So um I don't know what the examples of this might be.
  所以，嗯，我不知道这方面可能的例子是什么。

[14:55] But I do think there are some very valuable reinforcement learning environments that people could think of that I think are not part of the mix.
  但我确实认为，人们能想到一些非常有价值的强化学习环境，而我认为它们目前并不在训练组合之中。

[15:01] Yeah, I don't want to give away the answer, but there is one domain that I think is very uh Oh, okay.
  是的，我不想给出答案，但有一个领域我认为非常呃哦，好的。

[15:04] Sorry, I don't mean to vaguepost on the stage, but there are some examples of this.
  抱歉，我不是想在台上故弄玄虚，但这方面确实有一些例子。

[15:09] >> On the flip side, what do you think still only feels automatable from a distance?
  >> 反过来说，你认为什么东西只是远看才显得可以自动化？

[15:15] >> I do think that ultimately almost everything can be made verifiable to some extent.
  >> 我确实认为，最终几乎所有东西都可以在某种程度上被变得可验证。

[15:19] Some things easier than others.
  有些东西比其他东西更容易。

[15:23] Um, because even for things like writing and so on, you can imagine having a council of LLM judges and probably getting something reasonable out of that kind of approach.
  嗯，因为即使是写作之类的事情，你也可以设想一个由 LLM 评委组成的评审团，用这种方法大概也能得到还算合理的结果。
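The "council of LLM judges" idea for softer domains can be sketched as follows. The judges below are deterministic stand-ins invented for illustration; real ones would be separate LLM calls with a grading rubric, and the aggregation rule (median here) is just one plausible choice.

```python
# Sketch of a judge council: several independent, imperfect judges score
# an output, and an aggregate of their scores becomes the reward.
from statistics import median

def judge_length(text: str) -> float:
    """Crude proxy for 'substantive but not rambling'."""
    return 1.0 if 10 <= len(text.split()) <= 200 else 0.0

def judge_has_conclusion(text: str) -> float:
    """Crude proxy for 'finishes a complete thought'."""
    return 1.0 if text.rstrip().endswith(".") else 0.0

def judge_no_repetition(text: str) -> float:
    """Crude proxy for 'not degenerate repetition'."""
    words = text.lower().split()
    return 1.0 if len(set(words)) / max(len(words), 1) > 0.5 else 0.0

def council_score(text: str, judges) -> float:
    """Aggregate many imperfect judges into one reward signal."""
    return median(j(text) for j in judges)

judges = [judge_length, judge_has_conclusion, judge_no_repetition]
essay = ("Verifiable rewards make training tractable, and councils of "
         "judges extend that idea to softer domains.")
score = council_score(essay, judges)
```

The design choice is that no single judge needs to be trustworthy; the median tolerates one judge being wrong, which is the whole appeal of a council over a single grader.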

[15:33] So it's more about what's easy or hard.
  所以这更多是关于什么容易或困难。

[15:36] So I do think that ultimately, yeah, everything is automatable.
  所以我确实认为，最终，是的，一切都是可以自动化的。

[15:45] Amazing. Okay. So, last year you coined the term vibe coding, and today we're in a world that feels a little bit more serious, more agentic engineering.
  太棒了。好的。去年你创造了 vibe coding 这个词，而今天我们身处的世界感觉更严肃了一些，更偏向代理式工程（agentic engineering）。

[15:54] What do you think is the difference between the two and what would you actually call what we're in today?
  你认为两者之间有什么区别，你今天会把我们所处的状况称为什么？

[15:57] Uh, yeah. So I would say vibe coding is about raising the floor for everyone in terms of what they can do in software.
  呃，是的。所以我想说，vibe coding 是在提高每个人在软件上所能做之事的下限。

[16:03] So the floor rises, everyone can vibe code anything, and that's amazing, incredible.
  所以下限提高了，每个人都可以 vibe code 任何东西，这太棒了，不可思议。

[16:06] But then I would say agentic engineering is about preserving the quality bar of what existed before in professional software.
  但我想说，代理工程是关于保持专业软件中先前存在的质量标准。

[16:11] So you're not allowed to introduce vulnerabilities due to vibe coding.
  所以你不能因为 vibe coding 而引入漏洞。

[16:20] You're still responsible for your software, just as before, but can you go faster?
  你仍然像以前一样对你的软件负责，但你能不能更快？

[16:22] And spoiler: you can. But how do you do that properly?
  剧透一下：你可以。但你要怎么把这件事做对？

[16:24] And so, to me, agentic engineering, and I call it that because I do think it's kind of an engineering discipline.
  所以对我来说，是代理式工程，我之所以这么称呼它，是因为我确实认为它是一门工程学科。

[16:28] You have these agents which are these like spiky entities.
  你有这些代理，它们就像尖锐的实体。

[16:31] They're a bit fallible, a little bit stochastic, but they are extremely powerful.
  它们有点容易出错，有点随机，但它们极其强大。

[16:35] The question is how do you coordinate them to go faster without sacrificing your quality bar, and doing that well and correctly is the realm of agentic engineering. So I kind of see them as different: one is maybe about raising the floor, and the other is about, you know, extrapolating the ceiling. And what I'm seeing, I think, is that there is a very high ceiling on agentic engineering capability. You know, people used to talk about the 10x engineer; I think this is magnified a lot more, and 10x is not the speed-up you gain.
  问题在于，你如何协调它们更快地推进，而又不牺牲你的质量标准；把这件事做好、做对，就是代理式工程的领域。所以我把两者看作不同的东西：一个大概是提高下限，另一个是，你知道，拉高上限。而我所看到的是，代理式工程能力的上限非常高。以前人们常说 10 倍工程师；我认为这被放大了许多，10 倍已经不是你能获得的加速了。

[16:46] And it does seem to me like people who are very good at this peak at a lot more than 10x, from my perspective right now.
  在我现在看来，那些非常擅长此道的人，峰值表现远不止 10 倍。

[17:18] >> I really like that framing.
  >> 我真的很喜欢这个框架。

[17:23] Um, one thing: when Sam Altman came to AIN last year, one memorable thing he said was that people of different generations use ChatGPT differently.
  嗯，有一件事：去年 Sam Altman 来 AIN 的时候，他说过一句令人印象深刻的话：不同世代的人使用 ChatGPT 的方式不同。

[17:29] So if you're in your 30s, you use it as a Google search replacement.
  所以如果你在 30 多岁，你会把它当作谷歌搜索的替代品。

[17:32] But if you're in your teens, ChatGPT is your gateway to the internet.
  但如果你是十几岁的青少年，ChatGPT 就是你通往互联网的门户。

[17:37] What is the parallel here in coding today?
  那么今天在编程方面有什么相似之处呢？

[17:39] If we were to watch two people code using OpenClaw, Claude Code, Codex, one you'd consider mediocre at it and one you would consider fully AI-native,
  如果我们观察两个人用 OpenClaw、Claude Code、Codex 写代码，一个你认为水平平庸，另一个你认为是完全的 AI 原生，

[17:47] How would you describe the difference?
  你会如何描述这种差异？

[17:51] I mean, I think it's just trying to get the most out of the tools that are available, utilizing all of their features, investing in your own kind of setup.
  我的意思是，我认为这就是尽量用好现有的工具，利用它们的全部功能，投入精力打磨你自己的一套配置。

[18:02] So, just like before, all engineers are used to getting the most out of the tools they use, whether it's vim or VS Code, or now Claude Code or Codex and so on.
  嗯，就像以前一样，所有工程师都习惯于充分利用自己所用的工具，无论是 vim 还是 VS Code，或者现在的 Claude Code、Codex 等等。

[18:13] So, just investing in your setup, and utilizing a lot of the tools that are available to you.
  所以，就是投入精力打造你的工作环境，并利用你可用的各种工具。

[18:18] Um and I think it just kind of looks like that.
  嗯，我认为它看起来就是这样。


[18:26] I do think a maybe related thought is that a lot of people are hiring for this right now,
  我确实认为，一个可能相关的想法是，现在很多人正在为此招聘，

[18:29] because they want to hire strong agentic engineers.
  因为他们想聘请强大的代理工程师。

[18:34] I do think that what I'm seeing is that most people have still not refactored their hiring process for agentic engineering capability, right?
  我认为我所看到的是，大多数人仍然没有针对智能体工程能力重构他们的招聘流程，对吧？

[18:44] Like, if you're giving out puzzles to solve, that's still the old paradigm. I would say that hiring has to look like: give me a really big project, and watch someone implement that big project.
  就像如果你还在出谜题让人解，那仍然是旧范式。我会说，招聘必须变成：给我一个非常大的项目，然后看一个人如何实现这个大项目。

[18:53] Like, let's say, write a Twitter clone for agents, and make it really good, make it really secure, and then have some agents simulate some activity on this Twitter.
  比如说，写一个给智能体用的 Twitter 克隆，把它做得非常好、非常安全，然后让一些智能体在这个 Twitter 上模拟一些活动。

[19:03] And then I'm going to use 10 Codex agents to try to break this website that you deployed.
  然后我会用 10 个 Codex 智能体来尝试攻破你部署的这个网站。

[19:15] and they're going to try to basically break it and they should not be able to break it.
  他们将尝试基本上破解它，而且他们应该无法破解它。

[19:18] And so maybe it looks like that, right?
  所以也许它看起来是那样的，对吧？

[19:20] And so, yeah, watching people in that setting, building bigger projects and utilizing the tooling, is maybe what I would look at for the most part.
  所以，是的，观察人们在这种环境下构建更大的项目并善用工具，可能是我最看重的部分。

[19:29] And as agents do more, what human skill do you think becomes more valuable, not less?
  随着代理能力的增强，您认为哪些人类技能会变得更有价值，而不是更不重要？

[19:34] Uh so um yeah, it's a good question.
  嗯，所以，嗯，是的，这是一个好问题。

[19:37] I think, well, right now the answer is that the agents are kind of like these intern entities, right? It's remarkable, but you basically still have to be in charge of the aesthetics, the judgment, the taste, and a little bit of oversight. Maybe one of my favorite examples of the weirdness of agents is MenuGen: you sign up with a Google account, but you purchase credits using a Stripe account, and both of them have email addresses. When you purchased credits, my agent assigned them by matching the email address from Stripe to the Google email address. There wasn't a persistent user ID; it was trying to match up the email addresses. But you could use different email addresses for your Stripe and your Google accounts, and then it basically would not associate the funds.
  我认为，嗯，目前的答案是，这些智能体有点像实习生，对吧？这很了不起，但你基本上仍然要负责美学、判断、品味，以及一点点监督。也许我最喜欢的一个关于智能体之怪异的例子是 MenuGen：你用谷歌账户注册，但用 Stripe 账户购买积分，两者都有电子邮件地址。当你购买积分时，我的智能体是通过把 Stripe 的邮件地址和谷歌的邮件地址进行匹配来分配积分的。系统里没有一个持久的用户 ID；它只是在尝试匹配邮件地址。但你的 Stripe 和谷歌完全可以使用不同的邮件地址，这样资金就根本关联不上。

[20:26] And so this is the kind of thing that these agents will still make mistakes about: like, why would you use email addresses to try to cross-correlate the funds? They can be arbitrary. You can use different emails, etc. It's such a weird thing to do.
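The failure mode he describes can be sketched in a few lines: matching records by email silently orphans funds whenever the two services see different addresses, while keying on a persistent user ID does not. All names below are illustrative, not MenuGen's actual code.

```python
# Hypothetical sketch of the bug class described above: linking a Stripe
# payment to an account by email instead of a persistent user ID.

users = {
    "u_123": {"google_email": "alice@gmail.com", "credits": 0},
}

def credit_by_email(stripe_email: str, amount: int) -> bool:
    """Fragile: relies on the Stripe and Google emails matching."""
    for user in users.values():
        if user["google_email"] == stripe_email:
            user["credits"] += amount
            return True
    return False  # funds silently orphaned if the emails differ

def credit_by_user_id(user_id: str, amount: int) -> bool:
    """Robust: the checkout flow carries the persistent user ID."""
    user = users.get(user_id)
    if user is None:
        return False
    user["credits"] += amount
    return True

# Same person, different emails on the two services:
assert credit_by_email("alice@icloud.com", 10) is False  # payment lost
assert credit_by_user_id("u_123", 10) is True            # correctly credited
```

In a real integration, Stripe's Checkout Session exposes a `client_reference_id` field for carrying your own user ID through the payment flow, which avoids email matching entirely.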

[20:40] So I think people have to be in charge of this spec, this plan. And I actually don't even like the plan mode. I mean, obviously it's very useful, but I think there's something more general here, where you have to work with your agent to design a spec that is very detailed, and maybe it's basically the docs, and then get the agents to write them. You're in charge of the oversight and the top-level categories, but the agents are doing a lot of the work under the hood. So I think you're no longer caring about some of the details.

[21:05] So, as an example, with arrays or tensors in neural networks: there's a ton of little API details between PyTorch and NumPy and pandas and so on. I already forgot about keepdims versus keepdim, or whether it's dim or axis, or reshape or permute or transpose. I don't remember this stuff anymore, right? Because you don't have to. These are the kinds of details that are handled by the intern, because they have very good recall. But you still have to know, for example, that there's an underlying tensor and an underlying view, and that you can manipulate a view of the same storage, or you can have different storage, which would be less efficient. So you still have to have an understanding of what this stuff is doing, and some of the fundamentals, so that you're not copying memory around unnecessarily and so on.
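The distinction he is drawing can be shown in a few lines of NumPy: the spelling differences between libraries are trivia an agent can recall for you, but whether an operation returns a view of the same storage or a fresh copy is a fundamental you still have to track yourself. (The PyTorch spellings appear only in comments; torch itself isn't used here.)

```python
import numpy as np

# API trivia the intern-agent remembers for you: NumPy spells it
# `axis`/`keepdims`, PyTorch spells it `dim`/`keepdim`
# (e.g. x.mean(dim=0, keepdim=True) in torch).
x = np.arange(12, dtype=np.float32).reshape(3, 4)
col_means = x.mean(axis=0, keepdims=True)
assert col_means.shape == (1, 4)

# The fundamental you still need yourself: view vs. copy of storage.
v = x.reshape(4, 3)            # a view: same underlying buffer, no copy
assert np.shares_memory(x, v)

t = x.T                        # transpose is also just a view
assert np.shares_memory(x, t)

c = np.ascontiguousarray(t)    # forces a copy: new storage, extra memory traffic
assert not np.shares_memory(x, c)
```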

[21:50] But the details of the APIs are now handed off. So you're in charge of the taste, the engineering, the design, and that it makes sense, and that you're asking for the right things, and that you're saying, okay, these have to be unique user IDs that we're going to tie everything to. So you're doing some of the design and development, and the agents are doing the fill-in-the-blanks. That's currently kind of where we are, and I think that's what everyone is seeing right now.

[22:13] >> Do you think there's a chance that this taste and judgment matters less over time, or will the ceiling just keep rising?

[22:21] >> Um, yeah, it's a good question.

[22:25] Okay. I mean, I'm hoping that it improves. I think probably the reason it doesn't improve right now is, again, that it's not part of the RL. There's probably no aesthetics cost or reward, or it's not good enough, or something like that. I do think that when you actually look at the code, sometimes I get a little bit of a heart attack, because it's not super amazing code all the time: it's very bloated, there's a lot of copy-paste, and there are awkward abstractions that are brittle. It works, but it's just really gross. And I do hope that this can improve in future models. A good example is also the microGPT project, where I was trying to simplify LLM training to be as simple as possible. The models hate this. They can't do it. I kept trying to prompt an LLM to simplify more, simplify more, and it just can't. You feel like you're outside of the RL circuits; it feels like you're pulling teeth. It's not like light speed. So I do think that people still remain in charge of this. But I also think there's nothing fundamental preventing it. It's just that the labs haven't done it yet, almost.

[23:30] >> Yeah.

[23:31] >> So I'd love to come back to this idea of jagged forms of intelligence. You wrote a little bit about this in a very thought-provoking piece on animals versus ghosts. The idea is that we're not building animals, we are summoning ghosts: jagged forms of intelligence that are shaped by data and reward functions, but not by intrinsic motivation or fun or curiosity or empowerment, things that came about via evolution. Why does that framing matter, and what does it actually change about how you build and deploy and evaluate, or even trust, them?

[24:08] >> Uh, yeah. I think the reason I wrote about this is that I'm trying to wrap my head around what these things are, right? Because if you have a good model of what they are or are not, then you're going to be more competent at using them. And I'm not sure it actually has, like, real power. [laughter] I think it's a little bit of philosophizing. But I do think it's about coming to terms with the fact that these things are not, you know, animal intelligences. If you yell at them, they're not going to work better or worse; it doesn't have any impact. It's all just kind of these statistical simulation circuits, where the substrate is pre-training, so statistics, with RL bolted on top, which kind of extends its appendages. Maybe it's just a mindset for what I'm coming into, what's likely to work or not work, or how to modify it. But I don't know that I have, like, here are the five obvious takeaways for how to make your system better. It's more just being suspicious of it and...

[25:14] >> Figuring it out over time.

[25:16] >> That's where it starts. Okay, so you are so deep in working with agents that don't just chat. They have real permissions, they have local context, and they actually take action on your behalf. What does the world look like when we all start to live in that world?

[25:31] >> Uh, yeah. I think a lot of people here are probably excited about what this agent-native environment looks like, and everything has to be rewritten. Everything is still fundamentally written for humans and has to be moved around. Most of the time, when I use different frameworks or libraries or things like that, they still have docs that are fundamentally written for humans. This is my favorite pet peeve: why are people still telling me what to do? I don't want to do anything. What is the thing I should copy-paste to my agent? [laughter] Every time I'm told, you know, go to this URL or something like that, it's just like, ah. [laughter] So everyone, I think, is excited about how we decompose the workloads that need to happen into, fundamentally, sensors over the world and actuators over the world. How do we make it agent-native? Basically, describe it to agents first, and then have a lot of automation around data structures that are very legible to the LLMs.
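One concrete version of "describe it to agents first" is the proposed llms.txt convention: a markdown file served at a site's root that gives an agent a copy-paste-ready map of the docs instead of a human navigation flow. A minimal sketch, with MenuGen used only as an illustrative example (these paths and docs are hypothetical):

```markdown
# MenuGen

> Turns a photo of a restaurant menu into illustrated menu items. Sign-up is
> via Google OAuth; credits are purchased through Stripe Checkout.

## Docs

- [API reference](https://example.com/docs/api.md): endpoints for menu upload and image generation
- [Auth and billing](https://example.com/docs/billing.md): how a persistent user ID ties Google and Stripe together
```

The point is that the file is structured markdown an agent can ingest in one shot, rather than a settings UI a human has to click through.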

[26:32] So I'm hoping that there's a lot of agent-first infrastructure out there. You know, for MenuGen, famously (well, I'm not sure how famously), when I wrote the blog post about MenuGen, [laughter] a lot of the trouble was not even writing the code for MenuGen; it was deploying it on Vercel, because I had to work with all these different services, and I had to string them up, and I had to go to their settings and menus and, you know, configure my DNS, and it was just so annoying. So that's a good example: I would hope that I could give a prompt to an LLM, "build MenuGen," and then I didn't have to touch anything, and it's deployed in that same way on the internet. I think that would be a good test for whether a lot of our infrastructure is becoming more and more agent-native. And then ultimately, I do think we're going towards a world where there's agent representation for people and for organizations, and, you know, I'll have my agent talk to your agent to figure out some of the details of our meetings, or things like that. [laughter] I do think that's roughly where things are going, and I think everyone here is excited about that.

[27:38] >> I really like the visual analogy of sensors and actuators. I actually hadn't thought of that. That's super interesting.

[27:43] >> Right?

[27:43] >> Okay, I think we have to end on a question about education, because you are probably one of the very best in the world at making complex technical concepts simple, and deeply thoughtful about how we design education around them. What still remains worth learning deeply when intelligence gets cheap, as we move into the next era of AI?

[28:05] >> Yeah. There was a tweet that blew my mind recently, and I keep thinking about it, like, every other day. It was something along the lines of: you can outsource your thinking, but you can't outsource your understanding.

[28:17] >> I think that's really nicely put.

[28:21] >> Yeah, because I'm still part of the system. Information still has to somehow make it into my brain, and I feel like I'm becoming a bottleneck of just knowing: what are we trying to build, why is it worth doing, how do I direct my agents, and so on. So I do still think that ultimately something has to direct the thinking and the processing, and that's still fundamentally constrained somehow by understanding. This is one reason I was also very excited about all the LLM knowledge bases, because I feel like that's a way for me to process information. Anytime I see a different projection onto information, I feel like I gain insight. So it's really just a lot of prompts for me to do synthetic data generation over some fixed data. I really enjoy it: whenever I read an article, I have my wiki that's being built up from these articles, and I love asking questions about things. I think that ultimately these are tools to enhance understanding in a certain way, and that's still a bit of a bottleneck, because you can't be a good director if you don't understand. The LLMs certainly don't excel at understanding; you are still uniquely in charge of that. So, yeah, I think tools to that effect are incredibly interesting and exciting.

[29:33] >> I'm excited to be back here in a couple of years and to see if we've been fully automated out of the loop, and they actually take care of understanding as well. Thank you so much for joining us, Andrej. We really appreciate it. [applause]
