# Stanford CS153 Frontier Systems | Jensen Huang from NVIDIA on the Compute Behind Intelligence

https://www.youtube.com/watch?v=tsQB0n0YV3k

[00:08] I would like to welcome back Preacher Huang.
[00:19] We have now been locked in a global race, way faster than NASCAR racing, and it's partly your fault.
[00:30] Jensen's been the preacher that's given us all the power we need, all the energy, and some more to have what I think has been the craziest 12 months of my life, certainly for many of you.
[00:43] And we're just getting started.
[00:45] The energy with which you approach every single thing you do, including the class last year, and every time I've had the chance to hang out with you, you've given so much time to the students, the founders.
[00:58] Thank you.
[01:01] Should we jump right in?
[01:02] Yeah, let's go.
[01:03] All right, we're going to rapid fire.
[01:05] What is co-design and why is it so important?
[01:12] I'll answer that in a second.
[01:13] Yes, please.
[01:16] But this is a great time to be in computer science, and obviously the reason is that computing is being reinvented, as dramatically as it is, for the first time really in about 60-plus years.
[01:30] The computer that we know, that you all use, and our computing model, our mental model of a computer, the architecture of a computer, how you write the program, run the program, how you think about even taking computers to market, what it's used for:
[01:46] for 64 years, it has been largely the same, since the IBM System/360.
[01:48] In fact, my first architecture book for learning about computer architecture was the System/360 manual.
[01:59] And so a lot has changed as we went from PCs to internet and mobile and cloud and all those things.
[02:07] But the fact of the matter is the computing model, the fundamental part of computer science, has largely remained the same until now.
[02:15] You know, for the first time, the way you write the software, how you process the neural network versus the software, and what the applications can do have now dramatically changed.
[02:26] Everything is fundamentally different.
[02:30] At the highest level, one simple way to think about it is that computing as we knew it before was largely pre-recorded.
[02:39] It's content that we pre-recorded, images, videos, software that we largely pre-recorded, but now everything is generated.
[02:47] And the nice thing about generating everything in real time is that it can be contextually consistent, contextually relevant to what it is that you're dealing with.
[02:59] And of course it can respond to your intention, not just explicitly to the things that you instruct, and so the computer is fundamentally different in that way.
[03:15] Now the question is what does that mean at every single layer of the stack.
[03:21] How the software is now developed, the methodology of it, how you organize your company to be able to develop the software of today, completely changed. And so the methodology, the tools we use, the approach that we think about software coding, completely changed.
[03:39] How we run the software, neural network versus compiled binaries, is very, very different. And so what does that mean to the computer system, the network, the storage?
[03:49] Um what does that mean to the software stack and the cloud services that sit on top of that?
[03:53] And of course, you know, everything about the applications, what did it open up?
[03:57] And somebody just came and mentioned this piece of software we just opened up, called Alpamayo.
[04:06] And I've been working on self-driving cars now for about 13 years.
[04:08] And robotaxis are going to be literally everywhere.
[04:17] You know, everything that moves will be robotic.
[04:19] And that's an example of an application that we wouldn't have considered doing until deep learning and artificial intelligence came along.
[04:27] That was such a big unlock that I said, aha, all of these problems that we wanted to solve in the past, that we needed computer vision for, are now fundamentally unlocked.
[04:43] And so it's how you think about every single stage of that.
[04:48] What is a software engineer?
[04:50] How do you organize the company?
[04:53] What is a computer for the age of AI?
[04:56] How do you architect that?
[04:59] All the way to what you can use it for and therefore where you would deploy it.
[05:05] All of that has fundamentally changed, and for me the journey really started about 15 years ago. I had the benefit of seeing some early work in the area and, as all Stanford students do,
[05:19] you break the problem down, you reason about it from first principles, and you come to the conclusion that literally everything has changed. And so here you are, computer science students.
[05:29] This is really the first generation of AI becoming useful. Where we were a couple of years ago was in the generative part of AI, and as you guys know, generative AI not only made it cool for us to do image generation and text summarization and translation and whatnot, but generative AI also enabled us to think. And so when I saw generative AI, what other people saw was that it was able to generate images, and I surely appreciated that as well.
[06:04] But the fact is that you can generate thoughts, in the form of images, but you can generate thoughts; you can also reason with it, and the ability for AI to think after GPT was very, very obvious.
[06:18] Now the question is how would you train, how would you fine-tune an AI to be able to reason step by step, and how would you teach it to do so at fairly large scale in a kind of semi-supervised way. Those are the engineering problems you had to solve, but the moment you see GPT you say, aha, thinking is just around the corner. Thinking is generating tokens that you consume internally, and generating tokens that you consume externally would be called tool use.
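The internal-versus-external distinction in the last sentence can be pictured with a toy loop. This is only an illustrative sketch, not how any production agent is built: `Step`, `ToyAgent`, the `add` tool, and the scripted steps are all hypothetical stand-ins, and the steps are hard-coded where a real system would have a model generate them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    kind: str  # "think" = tokens consumed internally, "tool" = consumed externally
    text: str


@dataclass
class ToyAgent:
    tools: Dict[str, Callable[[str], str]]
    context: List[str] = field(default_factory=list)

    def consume(self, step: Step) -> None:
        if step.kind == "think":
            # Internal consumption: the generated text goes straight back
            # into the agent's own context for later steps to read.
            self.context.append(f"thought: {step.text}")
        elif step.kind == "tool":
            # External consumption: the generated text is handed to a tool,
            # and only the tool's result comes back into the context.
            name, _, arg = step.text.partition(":")
            result = self.tools[name](arg)
            self.context.append(f"{name} -> {result}")
        else:
            raise ValueError(f"unknown step kind: {step.kind}")


def add(expr: str) -> str:
    """Hypothetical tool: sums integers written as 'a+b+c'."""
    return str(sum(int(part) for part in expr.split("+")))


agent = ToyAgent(tools={"add": add})
scripted_steps = [
    Step("think", "I need the sum of 2 and 3"),
    Step("tool", "add:2+3"),
]
for step in scripted_steps:
    agent.consume(step)

print(agent.context)
```

The only difference between the two branches is who reads the generated tokens: the agent itself, or something outside it.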
[06:48] And so the idea that, after GPT happened two years ago, we would be at this moment was fairly easy to predict.
[06:56] Now of course an unbelievable amount of technology was invented and a lot of amazing people did amazing work, but you could almost see that moment here.
[07:06] And so here we are. You now have agentic systems, and so now the question is what's next?
[07:11] And what happens in a world where a computer is not responsive to what you ask it to do? It's not on demand.
[07:26] You know, today's computing is really on-demand computing.
[07:28] The term "on demand" was actually created in our generation to talk about how you think about using computers.
[07:35] Time-sharing computers that you would use on demand became cloud computers.
[07:40] And cloud computing of course is on demand, but in your new world of agentic systems the computers are now continuously running.
[07:48] And so what happens in a world where the computers are continuously running? What happens to cloud services?
[07:55] What happens to your personal computer?
[07:59] What happens to all of these different systems?
[08:01] Now there's a great opportunity again to rethink all of that.
[08:05] And so that is kind of my introduction: everything about computer science has changed, and everything about every field of science has changed because of the things that we've changed, and so this is a good time to go to school.
[08:20] Okay, that's it.
[08:21] What was your question?
[08:24] You know what? I'm just going to turn it over to the kids.
[08:27] Co-design. Co-design.
[08:29] Let's just go into it. The students have questions.
[08:31] They've all been asking questions in Discord.
[08:32] Are they all voting on each other?
[08:34] Co-design is really interesting, but it's not it.
[08:36] Co-design is super interesting, and basically co-design says: back in the old days we abstracted computing so that the people who designed microprocessors designed microprocessors.
[08:49] People who worked on compilers worked on compilers, and people who worked on languages worked on languages, and so on and so forth.
[08:56] You guys know that, and we actually had different fields.
[08:59] But the problem, and in fact this happened at Stanford: what's the beauty of RISC?
[09:05] What was the beauty of the work that John Hennessy did?
[09:07] The beauty of it is that you got to think about compilers and microprocessor architectures harmoniously co-designed, because otherwise you could end up creating a microprocessor that's super tight, where everything is maximally optimized, but unfortunately it's hard to compile for.
[09:28] It's difficult.
[09:31] It's not compilable.
[09:33] And so they created a simpler instruction set that exposed simplicity to compilers so that compilers could do a better job generating code.
[09:41] And it turns out a simpler machine co-designed with a compiler creates better performance than two systems that were optimized individually.
[09:51] That's you know that's very Stanford.
[09:53] Okay.
[09:55] And this is part of your heritage, all of you, and John Hennessy's trail of amazing work that's left behind.
[10:03] And so you take that and you think about, well, what happens in the world after general-purpose computing.
[10:09] Why is it that every problem in computer science would be solvable by a general-purpose instrument?
[10:17] At some level you could say, well, if you had a general-purpose instrument you'd prefer that.
[10:22] However, there are some extreme problems, whether it's computer graphics back in the old days, or molecular dynamics, or quantum chemistry, or fluid dynamics and large multi-scale, mesoscale multiphysics problems, or deep learning.
[10:35] These problems are so computationally intense.
[10:37] Why would you use a general purpose computer to go do that?
[10:40] And so the big insight is: what if you understood the algorithms, understood the computer systems, understood, if you will, the compilers, the frameworks, and understood the architecture of chips, and you were optimizing all of it at the same time?
[11:00] And so the facts here are the facts.
[11:02] This is what happens when you do what I just described.
[11:03] NVIDIA is probably the first computer systems company that does extreme co-design.
[11:09] We literally co-design across all of that, including CPUs, GPUs, networking and switches, and storage.
[11:17] And so the question is what do you get?
[11:18] Well, Moore's law back in the old days, you guys all know about that.
[11:21] Moore's law was about 2x every 18 months.
[11:25] So call it 10x every 5 years.
[11:28] Okay?
[11:28] So 10x every 5 years is 100x every 10 years.
[11:34] And that was in the good old days of Moore's law.
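The rounding in that arithmetic is easy to check. A minimal sketch, assuming only the figure quoted in the talk (a doubling every 18 months); the function name `growth_factor` is made up for illustration.

```python
def growth_factor(doubling_months: float, years: float) -> float:
    """Total speedup after `years` if performance doubles every `doubling_months`."""
    doublings = years * 12.0 / doubling_months
    return 2.0 ** doublings


# 2x every 18 months, as quoted in the talk.
five_year = growth_factor(18, 5)    # 60 / 18 = 3.33 doublings
ten_year = growth_factor(18, 10)    # 120 / 18 = 6.67 doublings

print(f"5 years:  {five_year:.1f}x")
print(f"10 years: {ten_year:.1f}x")
```

Five years comes out slightly above 10x and ten years slightly above 100x, so "10x every 5 years, 100x every 10 years" is a fair round-number summary.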
[11:35] And for all the computer scientists in the room, you know that Moore's law was underpinned by a concept called Dennard scaling, and Dennard scaling ran out of steam several years ago.
[11:43] Probably about a decade ago, in fact, and we kept squeezing it.
[11:50] But over the course of the last 10 years, if you just allowed microprocessors to continue to scale, and you didn't touch the software and just benefited from the speedup of semiconductors and microprocessor design, in the best case you would have gotten 100x, but because Dennard scaling slowed down and Moore's law largely ended, you probably got something along the lines of 10x over the course of 10 years.
[12:13] Well, in the case of NVIDIA and co-design, we got 1,000,000x over 10 years.
[12:17] 1,000,000x.
[12:20] And so somewhere between 100,000x and 1,000,000x.
[12:23] Okay.
[12:23] When you're talking about numbers that big, it really doesn't matter.
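To put the ten-year figures quoted above on a common footing, they can be converted to an equivalent per-year factor and doubling time. This is a sketch that assumes smooth exponential growth over the decade, which is a simplification; the helper names are hypothetical.

```python
import math


def annual_factor(total_factor: float, years: float) -> float:
    """Constant per-year multiplier that compounds to `total_factor`."""
    return total_factor ** (1.0 / years)


def doubling_months(total_factor: float, years: float) -> float:
    """Months per doubling implied by `total_factor` over `years`."""
    return 12.0 * years / math.log2(total_factor)


# Ten-year speedups quoted in the talk.
scenarios = {
    "Moore's law, best case": 100.0,
    "Post-Dennard estimate": 10.0,
    "Co-design, low end": 100_000.0,
    "Co-design, high end": 1_000_000.0,
}

summary = {
    name: (annual_factor(total, 10), doubling_months(total, 10))
    for name, total in scenarios.items()
}

for name, (per_year, months) in summary.items():
    print(f"{name:24s} {per_year:5.2f}x per year, doubling every {months:4.1f} months")
```

Under that assumption, 100,000x and 1,000,000x both correspond to a doubling roughly every six to seven months, which is one way to read the remark that at this scale the exact number does not matter much.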
[12:27] And so, a millionx over 10 years: we were able to get computation to scale so large, so fast, that AI researchers said, why don't we just take all of the internet?
[12:42] Why even worry about what data to go curate and what data to create?
[12:47] Let's just take all of the world's data and just give it to the computer.
[12:50] And that's really the big breakthrough, when you're able to do something so insanely fast.
[12:54] You know, for example, if you were able to travel at the speed of light, where we choose to live doesn't matter.
[13:02] If you were able to go from New York to California in 10 minutes, our freedom, everything about society would change, right? And so if you were able to do computing a million times faster,
[13:16] everything about computing changes, and that's really the big breakthrough. Because of co-design, because of the way NVIDIA approached it, we accelerated computing so far that it created all this abundance of opportunity for everybody
[13:30] to think about the future. And so, anyways,
[13:32] here we are. Cool.
[13:34] I have a bunch of follow-up questions, but I'm not going to ask.
[13:36] That one word led to that.
[13:39] GPT 10.
[13:39] That's what it's like to work at NVIDIA.
[13:41] You give me one word and you get ranted at for about half an hour because I've got too much to share with you.
[13:50] The question is, how should education evolve in response to the industry?
[13:53] Is it changing?
[13:54] Yeah.
[13:54] And that's a really excellent question, and I think the answer clearly is AI should be part of your curriculum, not just in learning about AI but using AI for the curriculum.
[14:07] The problem with textbooks, as you know, is that they take an enormous amount of effort to write, and when I was taking classes at Stanford, Professor Hennessy was still writing his textbook.
[14:13] It was all handwritten, and each week it seemed like he was writing a chapter.
[14:23] I don't even know how he writes a chapter a week, but every week he was writing about a chapter, and then over time all of those notes turned into a textbook, into the first edition.
[14:32] And that must have taken several years.
[14:34] And so I think it's not possible for universities, for pre-recorded textbooks, to keep up with information and knowledge that's being generated in real time by AI.
[14:50] And so I think the future probably has to be some union of the two.
[14:52] And I don't know about you guys, but I can't learn anymore without AI.
[14:57] And so not only do I have the AI read the papers, but once it reads the papers, I might ask it to go read a whole bunch of the other papers that are associated with them.
[15:09] And then now it becomes a super researcher.
[15:11] And then first I ask it to summarize.
[15:14] I ask it some basic questions, and then after that you interact with that paper as if it's a researcher that's dedicated to you. Most people don't realize that; I think a lot of people still think that you just summarize a document, but in the process of summarizing the document that AI learned a lot. And so I think
[15:34] that in the future, I do hope that curriculums are tightly integrated.
[15:38] I will say in defense of the textbooks, though, that the first principles don't change. In the final analysis, Mead and Conway is still as solid a fundamental methodology as before.
[15:53] It is true that the scaling process that led to constant current density, constant power density, all of those design optimizations associated with modern semiconductor design, we've kind of exhausted all of that.
[16:12] None of that is iso-anything anymore.
[16:14] But it's still good to know where we came from, and so I would still encourage that, to appreciate the first principles. While I was going to Stanford I was already working at AMD, and I was designing microprocessors at the time, and it was still really good to see simultaneously
[16:35] how we design things in practice versus the first-principle methods associated with learning, eventually, how to design these things.
[16:47] I really enjoyed having feet on both sides of it, and I ended up learning a lot more. And so what that means is when you're using AI, which is real world, contextually relevant, contemporary, and meanwhile you have first-principle knowledge that you're learning at the same time, you're kind of getting the same thing that I experienced.
[17:08] The question is: what are your thoughts on open source? How does open source stay at the frontier?
[17:13] Yeah, there's really the question of closed proprietary software versus open source.
[17:19] There's a question of my intentions with open source.
[17:21] And so I'll start with my intentions with open source.
[17:22] First of all,
[17:27] NVIDIA uses more Anthropic and OpenAI tokens than just about anybody, right?
[17:33] And the reason for that is obviously we do a lot of coding and design, and 100% of our engineers are now agentically supported.
[17:39] And so I want them to be working with agents, using the latest-generation tools, and to modernize how NVIDIA does work altogether.
[17:50] Okay.
[17:53] So number one, if you can use OpenAI and Anthropic, I would highly recommend you use them, and the reason for that is because it's useful.
[17:58] It works really well.
[18:00] It's getting better all the time. As you know, large language models are the technology inside, but Claude is a product and Claude Code is a whole harness around it, and that harness is getting better all the time, the model is getting better all the time. It's not likely that anybody can go to GitHub, download something open source, and it's going to work nearly as well. Okay, so I highly recommend, and we do, use off-the-shelf frontier AI models. The question is why is it that we're advancing and working so hard on open models.
[18:34] The reason for that is because language models are very important, because they represent the codification of our intelligence, and we want to automate ourselves; that is a very important part.
[18:46] But you need to know that AI is about learning the representation, the meaning, the structure of information.
[18:53] And so the question is, where is information? Well, we're living in information right now as we speak.
[18:59] The reason why there's structure is the reason why every day you show up it's kind of largely the same; otherwise it would be practically white noise.
[19:06] And so the fact is that biological systems and physical systems have structure, and from that structure I must be able to learn a higher-level representation.
[19:15] And if I can learn the representation, then I can manipulate it. Does that make sense?
[19:21] And so just because I can learn the representation of language, I can then generate it. I can manipulate it, I can put it to use.
[19:28] And so I want to do the same thing for chemicals and proteins and genes and physics and physical systems, robotics for example.
[19:37] And so notice the way you represent all of those things is fundamentally different, because the structure is different and the dimensionality is different.
[19:46] How you train it is fundamentally different, right?
[19:48] Because you don't have a whole internet corpus of human language for it.
[19:51] So you've got to come up with new strategies for all of that stuff.
[19:57] And so we decided that we would dedicate ourselves to some fundamental pillars, and because the company has the talent and the scale, we have the ability to put the first artifact out in the world, the data, the model, how to train it, in several different domains.
[20:15] And so some of the domains I care very much about: one of them, of course, is Nemotron, which is language, and I'll come back in a second to why it is that we're doing it.
[20:21] And then second is BioNeMo; that's for biology.
[20:24] And we have Alpamayo, somebody mentioned it earlier, for autonomous vehicles.
[20:29] Basically artificial intelligence navigation.
[20:36] And then we have GR00T, which is humanoid articulation, robotics, artificial general robotics, and then we have climate science, basically mesoscale multiphysics. Okay.
[20:48] And so all of these different domains, we decided that we should go and pioneer them.
[20:55] And the reason for that is because otherwise the scientists in these different domains simply won't have the scale and the technology necessary to go build that foundation model.
[21:05] And so we decided that we would do that. Okay.
[21:07] And as a result of doing that, we activated healthcare, we activated life sciences.
[21:14] We're working with every single self-driving car company in the world. It doesn't matter which one it is, there's NVIDIA in there somewhere.
[21:19] And so we enabled that entire ecosystem to really flourish, and we're working with robotics right now, and so on and so forth. Okay.
[21:27] Without us making that first effort and building a foundation model, it's hard to activate the whole industry downstream, and so it's really about expanding AI and democratizing this capability.
[21:42] There are two reasons why we do language models. One, there are too many societies where the scale of their language is not big enough for somebody else to decide to make it a high priority.
[21:54] They'll understand Swedish, but making Swedish a top priority is not likely, because the country is big but not so big.
[22:04] Chinese, of course, is well taken care of. Certain Indian dialects are very well taken care of, but as you know you have like 230 others.
[22:11] And so there are too many others; unless you deeply care, it's never going to be great.
[22:20] And human intelligence, no matter the size of your population, somebody should care.
[22:25] And so we created a large language model that's near frontier, Nemotron is close to frontier, and we make everything available so that if somebody wants to then fine-tune it into whatever language of their choice, they've got no trouble doing that. Okay.
[22:40] And then the second reason, which is very important, is that we want to also take these language models and fuse them with the domain-specific models, because of human priors.
[22:51] So for example, Alpamayo is a language model fused with a world model.
[22:59] And so on the one hand it's really designed to detect cars and roads and things like that.
[23:04] But on the other hand, we also believe that if the AI model, if Alpamayo, the self-driving car model, can reason like a human, and it can reason with human priors, then the amount of experience it needs to have before it could be an extremely good and safe driving car is dramatically reduced.
[23:23] The amount of training data is reduced, and we've proven that: Alpamayo is probably one of the most effective self-driving car systems in the world, and it's really only experienced a few million miles, not billions of miles, and so that kind of tells you the system actually works. Okay.
[23:42] So anyways, I just gave you a long-winded answer; I broke it all down. You can't just ask a
[23:48] >> simple question
[23:48] >> what we talked about
[23:49] >> But open models are really important, and then one more thing, okay, if that wasn't enough, one more.
[23:54] Okay, if you care to have AI be safe and secure, it has to be open.
[24:00] And the reason for that is you can't defend against a black box and you can't secure a black box. And you can't put a black box of some incredible capability into your system with it completely opaque.
[24:12] Now, of course, there are a lot of different ways you could solve the opaqueness.
[24:15] For example, you could say, before you do anything, you have to reason about it to me step by step. Before you do anything at all, you have to come up with a plan. You have to reason about it step by step. But it could always lie.
[24:29] And so the nice thing about transparent systems is that everybody gets to interrogate them. If you have a transparent system, then researchers get to use it.
[24:39] If you have a transparent system, an open system, then the way you defend against super agentic systems in the future for cyber security is obviously not to go into a battle of who gets the better one.
[24:50] You know, you come up with some model, a model 7.0, and I'm completely vulnerable until I come back with an 8.0, and then you've got to come back with a 9.0, and we just go back and forth driving each other nuts.
[25:04] And obviously that's not the smartest way to do it.
[25:10] The smartest way to do it is this: you're going to create these incredible cyber security threats.
[25:17] And what we're going to do is we're going to have millions, billions, swarms of cheap AIs, and we're going to systematically surround it.
[25:24] And so it's kind of, if you will, a giant dome.
[25:28] So, for example, Nemotron Nano is being used for cyber security.
[25:31] And so all these cyber security firms take Nemotron Nano, because it's so fast and so cost-effective: you can train it to detect cyber attacks and then just deploy trillions of them.
[25:43] Yeah. On the topic of open scaling, you know, we hung out in January and we
[25:52] >> I feel like, you know that one scene in Thor, do you remember, he was just hanging and he kept rotating in that direction?
[26:00] >> It's zero gravity here at AI Coachella. We got no gravity.
[26:04] Thor: Ragnarok. Do you guys remember that?
[26:06] >> We can move a little bit back so you can hear this. Okay.
[26:08] >> You guys don't watch movies.
[26:10] >> Well, we had a whiteboard too if you want to get off and walk. But in January we met and we talked about this topic, open scaling.
[26:15] We talked about bottlenecks. We talked about data as one bottleneck, compute as another bottleneck.
[26:21] There's at least one experiment that we announced at GTC together, which was the coalition scaling idea.
[26:28] The second is on how to improve utilization of compute, which is increasingly scarce.
[26:33] It came out last week that there was a memo at xAI that said their Memphis cluster pool is running at 11% MFU, which I think corresponds to something like 11 billion or something of unutilized flops.
[26:48] How can the open space... well, maybe you could talk a little bit about why coalition scaling is an experiment we're trying, and we have Ryan coming actually in the final office hours to talk about progress, and then how do we get utilization to be better for the open ecosystem, when you don't have fully integrated companies that can optimize up and down the stack?
[27:08] >> Yeah. Do you guys know what MFU is? And so FU, you guys know, you guys don't use that anymore.
[27:15] So, MFU is just simply wrong. Okay, it's the percentage of flops, basically, that you consume while doing your work. All right.
[27:32] >> Model flops utilization.
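For reference, MFU is commonly computed as the model FLOPs actually performed per second divided by the hardware's peak FLOPs per second. The sketch below assumes the widely used training approximation of about 6 FLOPs per parameter per token; the model size, throughput, and per-GPU peak are hypothetical round numbers chosen to land near the 11% figure mentioned above, not measurements from any real cluster.

```python
def model_flops_utilization(
    tokens_per_second: float,
    params: float,
    peak_flops_per_second: float,
    flops_per_param_per_token: float = 6.0,  # assumed training approximation
) -> float:
    """Fraction of peak FLOPs spent on model math."""
    achieved = tokens_per_second * flops_per_param_per_token * params
    return achieved / peak_flops_per_second


# Hypothetical cluster: 1,024 accelerators at 1e15 FLOP/s peak each.
peak = 1024 * 1e15
# Hypothetical job: a 70e9-parameter model training at 268,000 tokens/s.
mfu = model_flops_utilization(tokens_per_second=268_000, params=70e9,
                              peak_flops_per_second=peak)
idle_fraction = 1.0 - mfu

print(f"MFU: {mfu:.1%}, idle share of peak FLOPs: {idle_fraction:.1%}")
```

The metric says nothing about why the remaining FLOPs are idle, which is the point taken up in the answer that follows.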
[27:33] >> Yeah. And so unfortunately, with every metric, depending on what you measure, you could be measuring the wrong thing. And so let me tell you why.
[27:42] If you ask me whether I want to be at high MFU personally or low MFU, I would like to be at low MFU all the time.
[27:52] And the reason for that is because I want to be so smart I'm overprovisioned for it to work.
[27:57] >> Okay? Because I'm overprovisioned. I've got so many flops sitting idle.
[27:59] And the reason for that is because the way that computing works in these large-scale data centers is you have flops, you have memory bandwidth, you have memory capacity, you have network capacity, and at any given point in time something is bottlenecked.
[28:17] And so what you want to do is overprovision on everything
[28:23] >> so that you could avoid Amdahl's law.
[28:26] >> Otherwise you're fighting Amdahl's law all the time.
[28:27] But then if you're provisioning for peak and not your base loads, then you're going to have a bunch of those flops sitting overprovisioned when you don't need them, because it's spiky. But at the right time, it goes to 100% MFU,
[28:40] >> but only for a short period of time.
[28:42] >> And if in that short period of time you don't get all those overprovisioned flops, right?
[28:47] >> Then that short period of time becomes a long period of time.
[28:50] >> And so what are you seeing for teams that are trying to
[28:52] >> And flops are cheap.
[28:55] >> No, flops are cheap. H100s are going up in price.
[28:59] >> Well, not because of its flops, but because of H100, Hopper, you know, its bandwidth, its architecture, its everything else, not just its flops.
[29:06] >> So should we think about compute as not a scarce resource?
[29:12] >> No, no, that's not the question. It's like this: when you ask about a car, back in the old days when we were unsophisticated, we used to say, how many horsepower is your car,
[29:22] >> right?
[29:23] >> But these days, who does that?
[29:24] >> So, what's the right measure you think
[29:25] we should be thinking about? performance
[29:27] >> And when you tell the teams, "Guys,
[29:29] this is the perf we've got to hit next year,"
[29:31] what are you finding is the eval
[29:33] you're reaching for more and more?
[29:34] >> You have to come up with a real eval, a
[29:36] really serious eval,
[29:39] because otherwise you'd be, like, improving
[29:41] your flops. You know, you
[29:43] figure out something that you guys
[29:44] can improve, and your improving that
[29:46] number doesn't make you smarter; your
[29:48] improving that number doesn't make you
[29:49] more successful.
[29:50] >> And so there's nothing wrong,
[29:54] >> there's nothing wrong with having a lot
[29:56] of flops. But it's not the complete picture:
[29:59] necessary, not sufficient. That's all.
[30:01] >> In one sense, you could think about the
[30:02] output of tokens as intelligence. So, it
[30:05] should be some unit of intelligence per
[30:08] watt or
[30:08] >> Yeah. Yeah.
[30:09] >> Notice that tokens per watt is
[30:13] more than
[30:15] flops. And in fact,
[30:17] >> we know that now because for decoding
[30:20] these large language models, the single
[30:22] most important thing for generating
[30:23] tokens per watt, right, is actually the
[30:26] aggregate bandwidth across the NVLink 72,
[30:30] and the MFU is incredibly low because
[30:32] the prefill is not that much. It's
[30:34] mostly decode,
[30:35] >> but you can decouple decoding and
[30:36] prefill.
[30:36] >> It's disaggregated. And so notice I just
[30:39] delivered incredibly high tokens per
[30:41] watt with extremely low MFU. But not
[30:44] all tokens are born equal, right? And so
[30:46] how do we account for that? Like when
[30:48] you're designing the systems of the
[30:49] future, what is
[30:51] the right way to measure, without a
[30:52] standard measure of intelligence, when
[30:54] you have coding tokens being more
[30:55] valuable per watt than, I don't know, some
[30:57] other kind of token? Does
[30:59] that question make sense?
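The earlier point, that decode delivers high tokens per watt at very low MFU because aggregate bandwidth is what matters, can be sketched with a roofline-style estimate. Everything here is an assumption made for illustration: batch size 1, every weight read once per generated token, about 2 FLOPs per parameter per token, perfect sharding across chips, and no KV-cache or interconnect overhead. The parameter count, bandwidth, and peak-flops figures are hypothetical round numbers, not specifications of any NVIDIA product.

```python
def decode_step(params: float, bytes_per_param: float,
                mem_bw_bytes_per_s: float, peak_flops: float,
                chips: int = 1) -> dict:
    """Toy roofline estimate for generating one token at batch size 1."""
    weight_bytes = params * bytes_per_param        # bytes streamed per token
    flops_per_token = 2.0 * params                 # rough multiply-add count
    t_memory = weight_bytes / (mem_bw_bytes_per_s * chips)
    t_compute = flops_per_token / (peak_flops * chips)
    t_step = max(t_memory, t_compute)              # the slower resource wins
    return {
        "tokens_per_s": 1.0 / t_step,
        "mfu": t_compute / t_step,
        "bound": "memory" if t_memory >= t_compute else "compute",
    }


# Hypothetical figures: 70e9 parameters at 2 bytes each,
# 3e12 bytes/s of memory bandwidth and 1e15 FLOP/s per chip.
ONE_CHIP = decode_step(70e9, 2.0, 3e12, 1e15, chips=1)
RACK_OF_72 = decode_step(70e9, 2.0, 3e12, 1e15, chips=72)

if __name__ == "__main__":
    print(ONE_CHIP)
    print(RACK_OF_72)
```

Under these assumptions the step is memory-bound, MFU stays well under one percent, and token throughput scales with the aggregate bandwidth of the ganged chips rather than with their flops.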
[31:00] >> Makes perfect sense. You always have to
[31:01] come back to not just optimizing for SAT
[31:05] scores,
[31:07] you're optimizing for something bigger.
[31:09] And so so that's that's basically it.
[31:11] It's the same idea. You have to
[31:13] decide on the evaluation. As you know,
[31:16] how you evaluate success matters a lot
[31:19] in how people perform and so what Nvidia
[31:23] does extremely well inside the company
[31:25] is the systems that we create for
[31:27] evaluating architectures, and flops
[31:30] is too contrived,
[31:32] >> because if it was that easy, then
[31:34] >> and so do we have
[31:35] >> I wouldn't be here
[31:37] >> you have a hard job which is to try to
[31:39] design an index of different
[31:40] intelligences, right? Because, like,
[31:43] I think when
[31:44] our teams are researching on the
[31:46] NVIDIA architecture, we've got one lab
[31:48] doing coding, another one pushing the
[31:49] frontier of superconductivity, and so
[31:51] on, and they all have
[31:52] completely different evals they're
[31:54] measuring for but they're all using
[31:55] Nvidia chips
[31:56] >> So how do you solve
[31:58] that problem when your customers all
[32:00] have their own evals yeah
[32:01] >> but the architecture of the underlying
[32:03] platform
[32:03] >> That's why it's so hard. And
[32:06] it's true, it is that hard. The
[32:08] problem is this: if you build
[32:10] something that's too overfit for
[32:13] something. You could be incredibly good
[32:15] at it.
[32:16] >> And so you're overfit for this one
[32:17] problem. You're insanely amazing at it,
[32:20] but then the problem is that that market,
[32:22] you know, that problem space may not be
[32:24] good, may not be big enough to fund a
[32:26] sufficiently large R&D, right?
[32:28] >> And so you want to be good at many
[32:30] domains, multi-domain, on the one hand.
[32:34] On the other hand, if you're good at
[32:35] everything, then you're good at nothing.
[32:37] You became general purpose. And so
[32:40] riding that balance, by the way, is
[32:42] artistry. You know, that's what I do
[32:45] for a living. What should we not do?
[32:47] What should we double down on? What
[32:49] should we 10x on? That, you know,
[32:51] requires some amount of vision,
[32:53] strategy, you know, some amount of trial
[32:56] and error, some just personal enjoyment
[32:59] and entertainment, you know, iteration,
[33:03] all of that. Can we talk about the
[33:04] canvas of Feynman, which is a chip I'm
[33:06] very excited about, but it's been hard
[33:08] to get info on it. What's the canvas
[33:10] telling you now about what your art
[33:12] piece is going to look like for
[33:14] Feynman?
[33:14] >> Well, I can tell you the journey that we
[33:15] came on. And so, so if you look at
[33:17] Hopper, Hopper was designed for a
[33:21] problem space that was rather new. It
[33:23] was called pre-training. And so
[33:24] pre-training came along and we
[33:27] came to the conclusion that,
[33:31] although the generation before it was
[33:33] fairly significant already,
[33:36] we should build even larger ones,
[33:37] tremendously larger ones, larger than any
[33:41] of the largest scientific
[33:44] supercomputers in the world.
[33:45] >> Okay. Okay, so that's a very big deal
[33:46] that the largest supercomputer
[33:49] in the world was about $350 million and
[33:51] we thought, you know what,
[33:53] pre-training is going to be such a large
[33:54] domain and such an important problem. We
[33:56] should design systems that could be
[33:58] multi-billion dollars
[33:59] >> At the time that we were thinking about
[34:01] doing this, it sounded insane. You know,
[34:03] you would have precisely zero customers.
[34:05] And the reason for that is because the
[34:07] most expensive thing that has ever been
[34:08] sold was $350 million and you're
[34:11] building something that's multiple
[34:12] billions of dollars. So
[34:14] you're building for precisely a
[34:16] marketplace of zero. But we went and did
[34:18] it anyways, on first-principles reasoning,
[34:20] and so Hopper was designed for
[34:22] pre-training and that was a great call.
[34:24] The second thing that we did was we said
[34:25] okay, well, after training, and
[34:28] we're going to keep making training
[34:30] better, but the goal of AI isn't
[34:32] training; the goal of AI is inference.
[34:35] >> And what kind of a system
[34:37] would inference really care about? And so
[34:39] we created a system called NVLink72,
[34:42] and the reason we did that was because
[34:44] decode: in processing the
[34:47] neural network there's the prefill,
[34:48] which is really context processing and
[34:50] things like that, and attention
[34:51] processing, and then the decode, which is
[34:53] generating all these tokens. The
[34:55] generation of the tokens requires really
[34:57] high memory bandwidth, and the amount
[35:00] of memory bandwidth you need is way more
[35:03] than one chip can possibly provide. And
[35:06] so we said why don't we gang up like 72
[35:08] of these things and so we had to invent
[35:10] all kinds of new systems for switching
[35:12] and interconnects, and created all
[35:14] kinds of new SerDes, and we
[35:16] created essentially the world's first
[35:18] rack-scale computer. It's called Grace
[35:20] Blackwell NVLink72. The speedup over
[35:23] the previous generation: 50 times. In two
[35:26] years, we improved something by 50
[35:27] times; Moore's law would have improved it
[35:29] by 2x. Okay, so the architecture and the
[35:33] insight was fantastic, and decode and
[35:36] inference and large language models and
[35:39] token generation, you know, all of that
[35:41] kind of landed at exactly the time that
[35:43] Grace Blackwell came out and boom took
[35:45] off. So Grace Blackwell uh another
[35:47] incredible generation. Now the question
[35:48] is what happened to Vera Rubin and
[35:52] what's the big idea? Well, the big idea
[35:54] is that the goal isn't just to
[35:56] think. The goal is to do work. And so,
[35:59] Vera Rubin is designed for agents. And
[36:02] so, the question is what is the compute
[36:04] pattern? What is the processing pattern
[36:05] of agents?
[36:06] >> And agents, of course, you
[36:09] have to load a fair amount of
[36:11] memory. Long memory. It's got working
[36:13] memory. So long-term memory we put into
[36:15] storage. And that storage needs
[36:17] to be able to directly communicate with
[36:19] the GPU. You can't be copying
[36:21] the data off of the
[36:23] network storage; you want
[36:25] storage to be connected right into the
[36:27] processor itself. And so we
[36:29] have storage that's connected to the
[36:30] fabric. We're going
[36:33] to use a lot of tools, and so CPUs are
[36:35] going to be really important. But the
[36:37] CPUs of the current
[36:38] generation were really designed for cloud
[36:40] computing. And so you have these CPUs
[36:43] with hundreds of cores like you know 200
[36:45] cores. Well, for the CPUs of agents,
[36:50] the AI is this multi-billion
[36:52] dollar system, and it sends off an
[36:56] instruction to use a tool, and that tool
[36:58] is going to run on the CPU. Meanwhile,
[37:00] this GPU
[37:03] supercomputer, this multi-billion dollar
[37:05] system, is waiting for this one CPU, and
[37:08] so that CPU really wants to have
[37:09] extremely low latency. So we designed
[37:12] Vera, which, for the current
[37:15] generation, for single-threaded, you know,
[37:18] multiple-core single-threaded code, is
[37:20] by far the most performant, and
[37:22] so we created a CPU just for that.
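The point about an expensive GPU system waiting on a single CPU tool call can be made concrete with a small calculation. This is a simplified sketch under stated assumptions: a strictly serial agent loop in which each step is some GPU time followed by one blocking tool call on a CPU, with no other requests overlapping the wait. The function name and the timings are hypothetical.

```python
def gpu_idle_fraction(gpu_seconds_per_step: float,
                      tool_seconds_per_step: float) -> float:
    """Share of wall-clock time the GPU side spends waiting on the tool call."""
    if gpu_seconds_per_step < 0 or tool_seconds_per_step < 0:
        raise ValueError("durations must be non-negative")
    total = gpu_seconds_per_step + tool_seconds_per_step
    if total == 0:
        raise ValueError("a step must take some time")
    return tool_seconds_per_step / total


if __name__ == "__main__":
    # Hypothetical timings: 2.0 s of model work per step, then a tool call.
    for tool_latency in (0.5, 0.25, 0.1):
        idle = gpu_idle_fraction(2.0, tool_latency)
        print(f"tool latency {tool_latency:.2f}s -> GPU idle {idle:.1%}")
```

In this model, lowering the single-threaded tool latency is the only way to shrink the idle share, which is the motivation given in the talk for a latency-oriented CPU. Real deployments can also hide the wait by interleaving other requests, which the sketch ignores.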
[37:24] Notice the way you solve this
[37:26] problem intuitively is you kind of
[37:28] think about what is the computing
[37:29] pattern. Um how is it different than the
[37:31] past? Um you have to have some mental
[37:34] model about it, and you create a system
[37:36] that you can go and
[37:38] build to run that. And so
[37:41] now agents are here. We're going to run
[37:43] that on Vera Rubin, and hopefully
[37:45] when Feynman gets here, it's going to be
[37:48] like all software.
[37:51] We call them agents today, but, you know,
[37:53] they could be modules in the past or, you
[37:55] know, submodules. And so in the
[37:57] future, you're going to clearly have
[37:58] systems of agents, and agents with
[38:00] subagents, and subagents with subagents,
[38:03] and so you're going to have, you know,
[38:05] this swarm of agents, and what
[38:08] kind of computer, you know, does that
[38:09] manifest? And so that's
[38:11] likely what Feynman's about. I have one
[38:12] more follow-up question on that which is
[38:14] you know one of the things you've always
[38:16] done well is kind of spot bottlenecks
[38:18] one generation ahead and then try to
[38:19] sort of pre-solve for that in the supply
[38:20] chain. A year ago that was photonics, which
[38:24] ended up becoming a huge solution. As we
[38:26] look to energy as a
[38:28] bottleneck, you know, copper wires,
[38:29] literally copper wires, are one of the
[38:31] transmission sort of bottlenecks. How
[38:34] does that get solved in your view?
[38:37] >> Energy is just everywhere. Well,
[38:40] first, the first thing that we could do
[38:41] is what is in our control.
[38:45] You know, as with everything in life,
[38:47] whatever the problem is, whatever the
[38:50] external concerns are, you
[38:53] should do something that's in your
[38:54] control, and in our control is energy
[38:56] efficiency. So if you look at
[38:58] tokens per watt, we improved it by 50x,
[39:01] and then we'll have to keep on improving
[39:03] it by, you know, significant factors,
[39:05] and it compounds. So that's the
[39:07] first thing we can do. We can control
[39:08] that through co-design, architectures,
[39:10] and things like that. And the second
[39:12] thing that we could do is
[39:14] inspire people, and that's
[39:16] through a lot of education: inspire the
[39:18] ecosystem to get ready for this.
[39:21] And I've been, over the
[39:23] last half decade, helping people
[39:26] understand the amount of compute that's
[39:27] likely to be coming and I just told you
[39:29] guys something about how I reason
[39:31] through uh how much energy is going to
[39:33] be necessary. The amount of energy that
[39:35] we need for computing is
[39:37] likely, you know, probably a thousand
[39:40] times more than we currently have, and
[39:41] that's an enormous amount of energy. Um
[39:43] however the way to think about that is
[39:45] in the future computing is going to be
[39:47] two things. One, it's always going to be
[39:49] generated, because it's intelligent. It's
[39:51] contextually aware. So it's going to be
[39:53] generated. And number two, it's going
[39:54] to be continuous. And so, with this
[39:56] generative computing in a continuous way,
[39:59] compared to pre-recorded,
[40:02] retrieval-based computing that is only
[40:06] initiated, you know, per use, the
[40:08] question is how do you think
[40:10] about the amount of energy necessary for
[40:11] that? So I think if you say we
[40:13] need a thousand times, I
[40:15] wouldn't be surprised if we're off by a
[40:16] couple of orders of magnitude. And
[40:18] so we need a lot more compute. We need a
[40:20] lot more energy, and so you've got to go and
[40:21] explain this to people. And so, you
[40:23] know, you've got to explain it to people in
[40:25] a way that's kind of common sense,
[40:27] that they can observe, and there
[40:29] are, you know, indicators along the way
[40:31] that in fact this is happening. And
[40:33] notice, just as I was breaking it down
[40:35] for you guys, you know, I'm reasoning
[40:36] about it for you so that it's common
[40:38] sense to you. And so the amount of energy
[40:40] is high. And then lastly, the source of
[40:43] energy
[40:45] There are all
[40:48] kinds of sources of energy. But
[40:49] unfortunately, because of great
[40:52] concerns about the cost of
[40:55] sustainable energy, we underinvested
[40:57] in sustainable energy. But this is
[41:00] the best time ever in the history of
[41:02] humanity to go and invest in sustainable
[41:04] energy. And the reason for that is
[41:06] because the market forces are so strong.
[41:08] Back in the old days, you needed
[41:10] government subsidies to go build solar
[41:13] farms and government subsidies to go
[41:15] build nuclear plants. And now the
[41:17] market will just pay you to do it. And
[41:20] so market forces are so powerful right
[41:22] now. This is our best chance to upgrade
[41:25] our grid, our, you know, archaic grid, and
[41:28] add sustainable energy of all kinds.
[41:30] And you know this is a great time
[41:32] >> in terms of education. What I've learned
[41:34] as well, we designed the class for the
[41:35] students here. Turns out a lot more
[41:37] people especially a lot of investors and
[41:39] capital allocators are watching this
[41:41] right.
[41:42] >> Oh sh Why don't we put it up?
[41:44] >> Yeah. Um, and so if there's
[41:47] >> I'm just kidding.
[41:48] >> If there's education you'd like to do to
[41:49] that audience, feel free to drop in. You
[41:51] know,
[41:52] >> repeating yourself after a while with
[41:54] with capital allocators can get
[41:56] >> repetitive. I don't mind that.
[41:58] >> So, if you'd like to transmit, feel free
[41:59] feel free to um what is the next
[42:01] question we should take?
[42:03] >> The question is how best to spend their
[42:05] mental faculties over the next few
[42:07] years.
[42:07] >> Yeah. So first of all, on the pain
[42:10] and suffering comment, there's
[42:13] some advice
[42:15] that says you
[42:17] should choose what you love and what
[42:19] you're passionate about. That's what
[42:22] your career should be. And I think
[42:24] that's terrific. I think that's
[42:26] terrific, you know, if you
[42:27] happen to know what you're
[42:29] passionate about, if you happen to know
[42:31] what you love. But I think
[42:34] there are a lot of people who don't know
[42:36] what they're passionate about and they
[42:37] don't know what they love. And the
[42:38] reason for that is because nobody knows
[42:40] everything. How could
[42:42] you know what you don't know? So in a
[42:45] lot of ways, the idea that you would
[42:48] only choose careers
[42:50] that give you passion, that
[42:52] make you happy,
[42:55] is a bar that I think is too high,
[42:57] number one. Um, and the reason for that
[42:59] is because whatever you decide to do for
[43:02] a living, whether it's you found
[43:03] something that you're passionate about,
[43:05] uh, or this is your job.
[43:09] And in my case, you know, it used to be
[43:11] cleaning toilets and bussing tables. It
[43:13] was my job and I will do the best I can
[43:17] in my job. Whatever you give me as a
[43:19] job, I will do the best I can possibly
[43:22] do. And I do that today. And now
[43:25] there's a misunderstanding that
[43:28] somehow CEOs, we love our job. And,
[43:32] you know, many CEOs say, "Oh, I'm passionate
[43:34] about my job, I love my job." They're
[43:36] lying. There's
[43:40] not one CEO who can say that,
[43:43] you know, from the moment I wake up to
[43:45] the moment I go to bed, it's just
[43:47] zip-a-dee-doo-dah. The fact of the matter is, I
[43:50] really love doing 10% of my work and 90%
[43:53] of my work is hard, and I do it to the
[43:57] best of my ability anyhow and I suffer
[44:00] through it. I literally suffer through
[44:02] it. I'd prefer to do something else: that
[44:05] other 10%. But that other 10%, there's
[44:07] only so much quantity of that, and
[44:09] every company has an abundance of problems,
[44:11] and they come in different types. And
[44:13] going through life, you're going
[44:14] to have an abundance of problems that are going
[44:16] to come in different types, and you just
[44:18] have to learn how to condition yourself
[44:20] to want to get to a better state, no
[44:23] matter how hard, to get better no matter
[44:26] how hard. And that's suffering. You
[44:28] know, you don't like doing it,
[44:30] but you're doing it with all your might
[44:31] anyways. What do you call that? That's
[44:33] suffering. And so I think that when
[44:36] you suffer and you have the
[44:38] benefit of struggle and you've been
[44:41] presented with many opportunities like
[44:43] that, it teaches you resilience. And
[44:46] when the time comes and the
[44:48] world or your family or your company or
[44:50] your colleagues, they need you to be
[44:52] tough. They need you to be resilient.
[44:54] They just need you to be able to fight
[44:56] through it. You can't have those. You
[44:59] don't have that character about you. You
[45:01] don't have that muscle unless you've
[45:04] gone through it a whole bunch of times.
[45:05] And so, you know, I'm advising
[45:09] that you not seek just
[45:12] joy, that you also seek some
[45:16] pain, some suffering, because you're
[45:18] going to need it someday. And and then
[45:21] lastly, it's also it's just your job.
[45:23] You know,
[45:24] >> As Preacher Huang once said, don't wake
[45:25] up with a loser mindset.
[45:29] The question is, what's your favorite
[45:31] order at Denny's? Yeah, Corvallis really
[45:34] should have a Denny's.
[45:38] you know, after all these years,
[45:39] frankly, it's about time, right? And so,
[45:42] there was that there was that that one
[45:44] Chinese restaurant.
[45:46] And Woodstock's, of course,
[45:49] right? Corvallis, Woodstock's Pizza. It's
[45:51] still pretty good, isn't it? Woodstock's.
[45:53] >> I like American Dream better.
[45:55] >> American Dream's better. Okay. All right.
[45:57] I'll be back there soon enough.
[45:59] And so, Denny's: I would say,
[46:04] surprisingly, the fried chicken is really
[46:05] good. You know, it's slightly
[46:07] on the sweet side. Superbird is
[46:09] excellent, done right. And
[46:13] then another one if they're willing to
[46:14] make it for you. Make it like a
[46:16] Superbird, okay, but it's a grilled ham and
[46:19] cheese with tomato and mustard. And if
[46:22] they're willing to make it for you that
[46:23] they're willing to make it for me. And
[46:25] so,
[46:28] but that's
[46:30] because I'm an alum. They know that:
[46:32] "Hey, he used to bus tables here. Yeah,
[46:34] yeah, we'll make it special for you." But
[46:37] but those are all good. You know, the
[46:39] Grand Slam, you know, I enjoy it. I like
[46:42] a pigs in a blanket. So, that's pretty
[46:44] good. Um there's a whole bunch of stuff.
[46:47] Good. I go all day. I at Denny's I had
[46:50] my first fudge hot fudge sandwich. I had
[46:53] my first uh apple pie with cheese on
[46:56] top. I that's like for for a Chinese
[46:59] kid, it's like what is that about? That
[47:00] doesn't make any sense. And but now you
[47:02] think about it makes perfect sense, you
[47:03] know, apple and cheese. But anyways, I I
[47:06] had a whole bunch of it was I had my
[47:07] first milkshake when I was at Denny's.
[47:09] Um I had a whole bunch of firsts. Yeah,
[47:12] Denny's Denny's was eye opening for me,
[47:14] >> Man. Before we lose you to memory
[47:16] lane, next question, please.
[47:19] >> Those are some of the most important
[47:21] questions. Agreed. Yes, the question's
[47:23] about your thoughts on adversarial
[47:25] countries getting access to uh Nvidia
[47:27] chips.
[47:29] >> First of all, you know what
[47:31] we make for a living? We make GPUs,
[47:34] and GPUs are used for video
[47:37] games. Uh they're used for uh delivering
[47:40] soy sauce. They're used for medical
[47:41] imaging. Uh if you had a CT scan done
[47:45] yesterday, I'm fine. Uh but that behind
[47:47] it was Nvidia. Uh Nvidia is in every
[47:49] single medical imaging system in the
[47:51] world. And uh and so the question is
[47:54] what is it that you build? Um what I'm
[47:56] what I'm fundamentally against and it
[47:58] makes no sense. It makes no sense in
[48:00] this moment is to compare Nvidia GPUs to
[48:02] atomic bombs.
[48:06] There are a billion people with Nvidia
[48:07] GPUs. I advocate Nvidia GPUs to all of
[48:10] you. Uh I advocate Nvidia GPUs to my
[48:12] family, to my kids, uh to people I love,
[48:14] but I don't advocate atomic bombs to
[48:16] anybody.
[48:18] >> So that analogy is stupid.
[48:22] And so, so if you start from there, you
[48:25] can't finish a thought.
[48:27] If you start from believing that, you
[48:29] can't finish the rest of the thoughts.
[48:31] The second idea that I
[48:33] consider completely ridiculous:
[48:36] Uh, why should American companies go
[48:38] compete in foreign countries? You're
[48:40] going to lose it anyways.
[48:43] You're going to lose it anyways. So, why
[48:45] go? Well, if you guys all apply that
[48:48] same philosophy, why wake up in the
[48:49] morning?
[48:51] And so I don't subscribe to "we
[48:54] are going to lose anyways." I don't
[48:56] subscribe to that. If you want me to
[48:58] lose, you're going to have to deal it to
[48:59] me. But, you know, I'm going to have to
[49:02] put up a fight.
[49:04] And I put up a lot of fights over the
[49:06] years. I'm doing okay. And so I
[49:08] think that, as you know, the
[49:11] battle, the competition, serves
[49:13] markets. It enhances your
[49:16] company. I'm not a little bit afraid of
[49:18] having to go and compete in the
[49:20] marketplace, but the idea that I'm going
[49:22] to lose anyway, so why go compete makes
[49:24] no sense to me. And then lastly, uh the
[49:27] idea that somehow we should deprive
[49:30] certain countries of general purpose
[49:33] computing and we can all acknowledge now
[49:34] Nvidia is a general purpose computing
[49:36] company. I just gave you a whole bunch
[49:37] of general purpose use cases. Is a
[49:39] general purpose computing company to be
[49:41] deprived of that. so that one or two
[49:43] companies uh could benefit from
[49:46] depriving other people of it: that makes
[49:48] no sense either. Why should one
[49:50] industry suffer so that another
[49:52] company benefits, another one or two
[49:54] companies benefit? The
[49:56] American technology industry is one
[49:59] of our national treasures. You are going
[50:01] to be part of it.
[50:04] And if I do my job, when you are done
[50:07] graduating, you're going to graduate
[50:09] into the mightiest computer industry and
[50:12] the mightiest industry in the history of
[50:14] humanity.
[50:16] But if we give it up for some reason or
[50:19] we through policy decide that we can't
[50:21] go and sell, and concede two-thirds of the
[50:24] market, two-thirds of the world, to
[50:28] other companies, by the time that you
[50:30] graduate, you would have gone into a
[50:32] shell of an industry. that shell of an
[50:35] industry we've seen before. A long time
[50:37] ago, the same arguments went went
[50:40] against America in telecommunications.
[50:44] Today, America has no telecommunications
[50:46] fundamental technology anymore. It was
[50:49] all lit. It was all completely policied
[50:52] out of our country. And so, somebody has
[50:54] to put up a fight for that. some of
[50:56] these reasoning systems to to to say
[50:59] that AI is AI is going to come and it's
[51:01] going to be a singularity moment that
[51:04] singularity that moment the moment it
[51:05] comes it's going to be the most powerful
[51:07] thing in the world, it's going to come as a
[51:09] flash we have no idea whether it's going
[51:12] to come on Wednesday or Thursday at 7:00
[51:15] but when it comes it's going to be game
[51:17] over some percentage chance that it'll
[51:20] be the end of society as we know it come
[51:22] on we all watch Dune
[51:26] we don't have to repeat it. And so I
[51:29] think that living their fantasies
[51:32] out, their science fiction fantasies out,
[51:36] in public demonstration,
[51:39] when everybody is relying on their words
[51:41] and believing in the words, is
[51:42] irresponsible. It is not true. It is not
[51:46] true that we have no idea how these
[51:48] systems work. It is not true. It is not
[51:50] true that the technology is going to
[51:52] somehow, in some nanosecond, become
[51:56] infinitely powerful and therefore it's
[51:57] going to take over the world. It is not
[51:59] true. It is not true. There's no way to
[52:01] defend against it. It is not true. These
[52:04] things are all being made up and it's
[52:07] made up in a way that unfortunately even
[52:09] harms all of you.
[52:12] You're in computer science. You're
[52:13] hoping that when you graduate people
[52:15] care about computers.
[52:20] We want to create a future that is
[52:22] optimistic about the technology that you
[52:24] are learning to master.
[52:27] We want to create that future. We want
[52:30] to make sure that America, we want to
[52:31] make sure that everybody benefits from
[52:33] AI. Everybody should have AI. Nobody
[52:35] should have nuclear bombs. Can you guys
[52:37] agree with that?
[52:38] >> And so, okay.
[52:42] And so, so young man, young man, thank
[52:45] you for triggering me. I'm just kidding.
[52:48] >> Okay,
[52:49] >> I'm just kidding. I'm just kidding. I
[52:51] just wanted to get get it out.
[52:52] >> So, we're rational optimists here at
[52:54] AI Coachella. So, we believe in
[52:55] optimism. I'm gonna push back a little
[52:57] bit on a different angle. I completely
[52:58] agree. Reasoning by analogy is a
[53:00] problem once you start with bombs. From
[53:03] first principles, what we are observing
[53:05] is that we are compute-constrained in
[53:08] America. Independent teams, startups,
[53:11] universities, they can't get compute. So
[53:14] from a preference order perspective,
[53:16] shouldn't America get first priority to
[53:18] a scarce resource before we start
[53:19] shipping it off? Absolutely.
[53:21] >> But that's not happening.
[53:22] >> Absolutely not.
[53:26] There's the gotcha. Yeah. Absolutely and
[53:28] absolutely not. And the question is why
[53:30] not?
[53:30] >> There's plenty of chips. You guys, if
[53:33] the president of Stanford
[53:35] places an order, I promise you I'll
[53:36] deliver it.
[53:37] >> I have no absolutely.
[53:39] >> You guys heard it here.
[53:41] >> All right.
[53:43] ahead of this is not funny. This is not
[53:47] funny. We are dying out there.
[53:48] >> No. No. This is not funny. That's right.
[53:50] This is a serious matter. Um it is not
[53:52] it is not true. It is not true that
[53:55] people are giving me orders, placing
[53:57] orders, and we're not delivering chips.
[53:58] It is just not true. You got to you got
[54:00] to place orders. The fact of the matter
[54:02] is the fundamental problem is actually
[54:04] something very different.
[54:06] >> Stanford needs compute.
[54:10] Science needs compute.
[54:12] The fundamental problem is the system is
[54:15] no longer built to be able to deliver
[54:19] massive scale compute. And the reason
[54:21] for that is because, just think, all of
[54:23] the research departments here
[54:25] at Stanford, they're all in different
[54:27] departments. You all raise your own
[54:29] funding. You all get your own grants.
[54:31] Nobody's going to go share their grants.
[54:33] But none of the grants are big enough to
[54:35] have a large enough compute that you use
[54:38] some of the time, but when you use it,
[54:40] you need it to be incredible.
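The budgeting argument here, that many small grants cannot each fund compute that is used only some of the time but must be large when it is used, is a demand-pooling argument, and a toy calculation shows its shape. The schedule below is entirely hypothetical: four departments that each need a burst of 1,000 GPUs for two of eight time slots, with bursts that never overlap. That staggering is a best-case assumption; overlapping bursts would shrink the benefit.

```python
def sum_of_peaks(schedules: list) -> int:
    """GPUs needed if every department provisions for its own peak."""
    return sum(max(schedule) for schedule in schedules)


def pooled_peak(schedules: list) -> int:
    """GPUs needed if all departments share one cluster."""
    return max(sum(slot) for slot in zip(*schedules))


def utilization(schedules: list, capacity: int) -> float:
    """Fraction of available GPU-slots that are actually used."""
    slots = len(schedules[0])
    used = sum(sum(schedule) for schedule in schedules)
    return used / (capacity * slots)


# Hypothetical demand: department d bursts to 1000 GPUs in slots 2d and 2d+1.
SCHEDULES = [
    [1000 if slot // 2 == dept else 0 for slot in range(8)]
    for dept in range(4)
]

if __name__ == "__main__":
    separate = sum_of_peaks(SCHEDULES)
    shared = pooled_peak(SCHEDULES)
    print("separate clusters:", separate, "GPUs,",
          f"{utilization(SCHEDULES, separate):.0%} utilized")
    print("shared cluster:", shared, "GPUs,",
          f"{utilization(SCHEDULES, shared):.0%} utilized")
```

With staggered bursts, one shared cluster a quarter of the combined size serves every department at its full burst size, which is the case for aggregating budgets into a campus-wide system.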
[54:42] The world has moved away from
[54:44] those centralized computing environments
[54:47] towards everybody just using laptops.
[54:50] This is today's computing
[54:51] environment, and fundamentally,
[54:55] all the universities, Stanford is
[54:57] not alone. You don't have a budget for a
[54:59] billion dollar compute. It doesn't
[55:00] exist.
[55:02] >> But whose fault is that?
[55:03] >> Stanford's.
[55:05] And the reason the reason why you have
[55:07] to say that is because I'm empowering.
[55:10] When somebody is at fault, you empower
[55:13] them to solve it. Do you agree? When you
[55:15] Oh, yeah. It's not your fault, son. It's
[55:16] not your fault. You're a failure. It's
[55:18] not your fault.
[55:20] >> It's not your fault.
[55:20] >> Talking to me right here, you know. Uh
[55:23] yeah. Hey son, you're an idiot. It's not
[55:26] your fault.
[55:27] >> No, it's absolutely your fault. And and
[55:29] so by saying that it's absolutely your
[55:31] fault, you're also empowering yourself
[55:33] to solve it. Isn't that right?
[55:35] you're empowering yourself to solve it.
[55:37] And so the question that you just talked
[55:39] to somebody who kind of feels, you know,
[55:42] I can do something about my future.
[55:45] You're talking to somebody who who's who
[55:47] believes in that. Okay? And so if I were
[55:49] Stanford, you just have to you have to
[55:51] find a way to to to change the way you
[55:53] do budgeting, the way you deal with
[55:55] computing. You have to find a way to
[55:57] aggregate and build yourself a linear
[56:00] accelerator just like Stanford has done
[56:01] in the past. We need to build campus-wide
[56:04] supercomputers that everybody shares.
[56:06] Now, you could also go and just contract
[56:08] somebody else to do it. I mean, that's
[56:09] that's all possible. But you do need to
[56:11] have, you know, a billion dollars. You
[56:13] need to have some reasonable fund to go
[56:16] build something like this because
[56:17] that's how much it costs. But that's
[56:19] that's just what it takes.
[56:20] >> I mean, last I checked, we've got a what
[56:21] $40 billion endowment here. How would
[56:23] you put that to use if you were if you
[56:24] were stepping in?
[56:24] >> Cut a billion dollars of
[56:26] it right away and give it to somebody as
[56:28] a cloud service and have every single
[56:30] student and every researcher here uh
[56:32] have access to to uh to uh uh AI
[56:35] supercomputers. I would do that right
[56:37] away. Now, of course, of course, you've
[56:39] got to go plan things. You don't if you
[56:41] want to buy a billion dollars worth of
[56:42] tomatoes, you don't show up to the
[56:44] grocery store and hire and then and then
[56:46] and then they don't have a billion
[56:47] dollars of tomatoes and you go, "Aha,
[56:50] you're withholding tomatoes from me."
[56:54] That's just ridiculous. And so so you
[56:57] know, so you got to do some planning.
[56:58] And so what you got to do is you got to
[57:00] say, "Next year we need to have a
[57:01] billion dollars worth of computing for
[57:02] Stanford and and uh we'll go build it."
[57:07] >> All right. You know what? We'll move on.
[57:08] But thank you for that.
[57:09] >> Yeah.
[57:11] >> Yeah. Yeah. Exactly.
[57:17] >> We'll come back to that one.
[57:20] >> Yeah. What is the best and worst part of
[57:22] your job?
[57:22] >> When you're CEO of
[57:24] a company, you you have the benefit you
[57:26] have the benefit of of a lot of uh
[57:28] really fun things. Like for example, you
[57:30] you're really the person who has to
[57:32] conceive of the intersection between
[57:35] vision and strategy and execution. Okay?
[57:37] And so so you have to live in that in
[57:39] that world. And it gives you and when
[57:41] you're a company with capability and I'm
[57:43] surrounded by amazing computer
[57:44] scientists and many of them from from
[57:46] Stanford and when you're surrounded by
[57:48] people like that when you have a vision
[57:50] it's very realizable and because you're
[57:52] with amazing people your vision is more
[57:54] ambitious. Okay. And so so I think I
[57:57] think that's the fun part. The not fun
[57:59] part. So and so that fun part I get to
[58:02] do almost all the time. I'm always
[58:04] constantly um updating my my my view of
[58:07] the future and my vision of the future
[58:09] and and what our role in it and and how
[58:12] we ought to reinvent ourselves so that
[58:13] we could you know contribute more to
[58:15] that future or or go invent that future
[58:17] in the first place and and so as a CEO
[58:19] you have you get to live in that world
[58:21] and that's fun. You're it's very
[58:23] imaginative. It's very strategic. It's
[58:26] you know highly complicated. There's no
[58:28] right answer. uh and in a lot of ways
[58:31] it's it's creativity at at at its most.
[58:34] Okay. On the other hand, what comes with
[58:37] that power is the responsibility for a
[58:40] bunch of people who joined you in that
[58:42] spaceship that joined you in that in
[58:44] that vessel and they want to be they
[58:47] want to help you create this future and
[58:49] they're part of your team and you feel a
[58:51] deep responsibility for their
[58:53] well-being. And so when the company's
[58:55] not doing well or the company in the
[58:57] older days, you know, when we were in
[58:59] the beginning trying to find our way,
[59:01] uh, we probably nearly went out of
[59:03] business, you know, four or five times.
[59:05] I mean, literally almost went out of
[59:07] business and we were on fumes or or
[59:09] we're really flat on our back. And so
[59:12] during those times, it's embarrassing,
[59:15] it's humiliating, it's hard. Um, you
[59:18] don't know what the answer is. Often
[59:19] times you're in the dark. Uh, you're
[59:22] afraid. uh you know all of those the
[59:24] feelings that that we have as humans
[59:27] just multiplied by you know a thousand a
[59:30] million and and uh uh you know when
[59:33] you're a public CEO uh your face is
[59:35] always out there and when you do well uh
[59:38] people are happy when you don't do well
[59:39] they're fast to tell you and and um and
[59:42] so you're you know and so it's a
[59:45] vulnerable you know for me it's it's a
[59:47] highly vulnerable profession and and so
[59:51] Yeah, you're not naked, but you feel it.
[59:54] You know,
[59:54] >> question is, what's the biggest mistake
[59:56] you made in the early days of NVIDIA and
[59:57] what did you learn from it?
[59:58] >> Um,
[01:00:00] let me let me give you an example of of
[01:00:03] what somebody might say and and I will
[01:00:05] say I I won't I'll say that that's not
[01:00:08] and so so anybody who knows our history
[01:00:10] would know that the first generation of
[01:00:12] our products uh the architecture the
[01:00:15] technology we used was completely wrong.
[01:00:19] It's not like a little bit wrong. It's
[01:00:21] like completely wrong. The fact that
[01:00:23] that that smart engineers and
[01:00:26] professionals and we were actually
[01:00:27] funded and we created this thing and
[01:00:29] it's like check it out doesn't work at
[01:00:31] all you know and so uh that that using
[01:00:36] curved surfaces instead of triangles, no
[01:00:38] Z-buffer instead of Z-buffer, forward
[01:00:41] texture mapping instead of inverse
[01:00:42] texture mapping we did everything wrong.
[01:00:44] We did everything wrong. No floating
[01:00:46] point inside. We did everything wrong.
[01:00:48] And so we made a lot of tremendously bad
[01:00:50] choices. Um,
[01:00:53] and I I'll say that that uh those are
[01:00:56] technical bad choices, but it led to
[01:00:58] strategic
[01:01:00] genius moves. Um, how do you take a
[01:01:03] company that um had that reputation and
[01:01:07] wasted a bunch of money and a bunch of
[01:01:08] time two and a half years doing it the
[01:01:11] wrong way and surrounded by competition
[01:01:13] and now here we are
[01:01:16] the only one remaining. Okay. And so so
[01:01:19] that that transformation taught me a lot
[01:01:21] about the importance of technology is
[01:01:24] important but strategy is so important
[01:01:29] and so how you see the world uh how you
[01:01:31] approach competition how do you approach
[01:01:33] the market uh how do you conserve
[01:01:35] resources and apply resources th those
[01:01:39] decisions um I learned more in my early
[01:01:42] 30s through that deep failure I and the
[01:01:45] company almost vaporizing I learned so
[01:01:48] much about strategy and strategic
[01:01:50] thinking and and maneuvering and things
[01:01:52] like that and it's lasted a whole whole
[01:01:54] long time. The mistake that I made that
[01:01:57] I I would say um was a genuinely
[01:02:01] straightup mistake is when the PC
[01:02:06] or or when mobile devices took off uh we
[01:02:09] were approached by very important
[01:02:11] companies that are important in
[01:02:13] the mobile space uh to work on some
[01:02:15] mobile devices and and um
[01:02:20] I
[01:02:22] and the choices that that I made Um I I
[01:02:26] think the answer when they approached us
[01:02:30] the answer should have been nah not
[01:02:32] interested
[01:02:34] but we decided to shift a bunch of our
[01:02:37] resources to go build mobile devices and
[01:02:40] um I and I thought that we could add a
[01:02:42] lot of value but it turn you know I
[01:02:44] think if I would have thought through it
[01:02:46] a couple more clicks uh the amount of
[01:02:48] value you could really deliver in for
[01:02:50] for the things that we know how to do
[01:02:52] and what we're good at is probably
[01:02:54] marginal. at best. And so, uh, I shifted
[01:02:56] the company to go into mobile devices.
[01:02:58] Uh, it grew into a billion dollar
[01:03:00] business and and that kind of positive
[01:03:02] reinforcement. And then shortly after,
[01:03:05] uh, during the 3G to 4G transition, uh,
[01:03:08] we were just 100% locked out and and,
[01:03:11] um, uh, Qualcomm was the leader in that
[01:03:14] 3G to 4G modem, and that's the most
[01:03:17] important part of the phone. Not the
[01:03:19] SoC, not computer graphics, not even the
[01:03:21] application processor. The modem is
[01:03:23] obviously the most important part. And
[01:03:25] so during that transition, uh, they were
[01:03:27] able to block us out. I could have
[01:03:29] probably called it, you know, to to if
[01:03:32] you if that circumstance were to happen
[01:03:34] again. I would have said, "Yeah, it's it
[01:03:36] it would be a really interesting
[01:03:38] opportunity for a couple years, but
[01:03:39] we're going to get shut out after that,
[01:03:40] so what's the point? Why? Let's go
[01:03:42] conserve our resources somewhere else."
[01:03:44] But the the recovery, so we got shut
[01:03:46] out. We built it up to about a billion
[01:03:48] dollars and it went back to zero. But
[01:03:50] the recovery was I took all of that
[01:03:52] expertise that extreme low power and
[01:03:54] energy efficiency expertise and I
[01:03:56] shifted all to um an application that
[01:04:00] didn't exist at the time called
[01:04:01] robotics. And so all of the the somebody
[01:04:04] mentioned Thor uh Thor is the great
[01:04:07] great great great grandson of the chip
[01:04:09] that we were using um uh in mobile
[01:04:11] devices. and that that entire
[01:04:14] genealogy and all the teams and all the
[01:04:16] expertise that we we built up was really
[01:04:19] helpful to getting here. And so it
[01:04:21] doesn't
[01:04:22] that's rationalization.
[01:04:25] Um going into that market in the first
[01:04:26] place was a waste of time and so that
[01:04:28] that I think is a strategic mistake. Um
[01:04:31] >> On strategy, is there you know sometimes
[01:04:34] strategy is about forecasting sort of
[01:04:36] precisely enough. Is there uh from a
[01:04:39] systems perspective, what do you think
[01:04:40] you've updated your priors on or what
[01:04:42] what is the forecasting mechanism you've
[01:04:44] developed to give yourself some
[01:04:46] confidence that like this fog of war
[01:04:47] here? Don't quite know where things are
[01:04:49] going to go, but generally speaking,
[01:04:51] we're like shooting in the right
[01:04:52] direction. Is there any is there sort of
[01:04:53] a systems piece of systems design advice
[01:04:56] you'd give folks on when the shape of
[01:04:58] the future is not entirely clear?
[01:05:00] >> Yeah, and and in fact in fact you used
[01:05:02] all the right words already. Um the
[01:05:05] first thing I do is is I what am I
[01:05:07] observing? What am what am I observing?
[01:05:10] And um based on what I observe, uh let's
[01:05:13] reason about it back to first
[01:05:15] principles. Break it all back down and
[01:05:17] ask ourselves uh so what's going to
[01:05:20] happen next? And first, so what? Is this a
[01:05:23] big deal? Hey, deep learning, computer
[01:05:24] vision, AlexNet, you know, big deal. Is
[01:05:27] that a big deal or not a big deal? And
[01:05:29] so the big deal part of it is my
[01:05:31] goodness. uh in just one you know here
[01:05:34] here's two engineers right Alex and and
[01:05:37] Ilya and uh and and Hinton of course and
[01:05:40] they came up with a neural network
[01:05:41] model and boom it crushed the the
[01:05:44] computer vision capabilities of all the
[01:05:46] computer scientists you know decades
[01:05:48] before them in one shot and so is that a
[01:05:50] big deal is that a big deal um the the
[01:05:54] the step up in in quality and
[01:05:56] performance was a big deal now the next
[01:05:58] question is so what's going to happen
[01:06:00] next how far can you take And then if
[01:06:02] you could do it in this way, what else
[01:06:04] can you what else can you solve? Um and
[01:06:06] if if this was able to solve some really
[01:06:08] amazing problems, uh what does that mean
[01:06:10] to computers and computing? And so you
[01:06:12] just keep asking yourself these
[01:06:13] questions, right? And so you're just
[01:06:15] iterating it like that all the way to
[01:06:16] first principles. And then from that you
[01:06:19] create a mental model about the future
[01:06:21] of computing and uh where is it going to
[01:06:24] be? What can it do? For example,
[01:06:25] self-driving cars and robotics. um this
[01:06:28] uh how large would models become and uh
[01:06:31] if if so what would computers look like?
[01:06:33] uh what with processing neural
[01:06:35] networks how's that different than
[01:06:36] processing you know floating-point numbers
[01:06:39] and integers and first principle
[01:06:41] mathematics you know we express
[01:06:42] everything in FP64, FP32, but obviously
[01:06:45] neural networks don't have to do that
[01:06:46] and so so you you reason through it kind
[01:06:48] of like this and then you build up a
[01:06:51] mental model of a future you know of the
[01:06:53] future and then where your your company
[01:06:55] where you are going to be within it and
[01:06:57] then you just work backwards from there
[01:06:59] and and then and then now the question
[01:07:01] of course is you could be wrong and
[01:07:03] often times you're, you know, if you
[01:07:05] reason about things properly, you're not
[01:07:07] completely wrong, but you're not
[01:07:08] completely right. And so I tend to I
[01:07:11] tend to be very comfortable saying,
[01:07:13] okay, these are the things that that
[01:07:16] will likely happen and these are things that
[01:07:18] will absolutely happen and these things
[01:07:20] may happen. And based on that, I think
[01:07:22] we got to go in that general direction.
[01:07:24] We'll feel our way through. And now that
[01:07:26] now that the the skill of of building
[01:07:28] companies then of being successful along
[01:07:30] the way is you're going into this
[01:07:32] direction and it's going to take energy.
[01:07:34] It's going to take time. It's going to
[01:07:36] take money, and all that time,
[01:07:39] energy and money takes away from
[01:07:40] something else, right? So the cost the
[01:07:43] the the opportunity cost of pursuing a
[01:07:46] strategy is the real cost. And so you
[01:07:49] just got to ask yourself how can you be
[01:07:52] smart enough such that the opportunity
[01:07:55] cost is reduced and your optionality is
[01:07:58] increased. And so you're trying to think
[01:07:59] through all of that stuff all the time.
[01:08:01] You know it's no simple answer but but
[01:08:04] um in a lot of ways uh you're trying to
[01:08:06] get the journey to pay for itself
[01:08:10] >> Given uh everybody's going to mob you
[01:08:12] for more signatures. That's where we're
[01:08:14] going to end.
[01:08:15] >> Thank you. Thank you very much.