# Class #2 | MS&E435: Economics of the AI Supercycle Stanford University Spring '26 Apoorv Agrawal

https://www.youtube.com/watch?v=4faCRNl9Bi4

[00:00] The premise of the class today, we're going to talk about everybody knows how software ate the world.
[00:06] Software produced had near zero incremental cost of distribution.
[00:11] That is not the case with AI.
[00:12] More users on AI apps require a lot of compute, and so they're not it's not near zero.
[00:20] The that's the topic of our discussion.
[00:22] We're going to do a presentation by the by the group.
[00:25] We're going to do a fireside chat, and then we'll open it up for questions.
[00:30] So, without further ado, I'm really excited for today's guests.
[00:33] Our first guest is Brad Gerstner.
[00:36] Brad is the founder and CEO of Altimeter.
[00:37] Brad started Altimeter in 2008 with a few million dollars from friends and family.
[00:44] Today, 18 years later, Altimeter manages over $15 billion across public and private markets.
[00:50] You know, Brad, I've known you for a little bit, and the single consistent thing I've known about you is that the best investors have invested across super cycles, across up markets, down
[01:01] markets, recession, crisis, COVID.
[01:04] And Brad has done all of that and more.
[01:06] Brad started his career, trained as a lawyer,
[01:08] helped start General Catalyst back in the dot-com era,
[01:10] started a couple of businesses after that,
[01:12] Altimeter being the fourth.
[01:15] And at Altimeter, you know, early in the internet, early to Google, early with mobile and and meta and and many others, early to cloud and software, led our investment in Snowflake, Confluent, GitLab.
[01:31] And now with AI, one of the largest investors in in OpenAI and Anthropic, which I know you guys from last week loved, in Nvidia, and he's on the board of Cerebras, and led investment in Groq, which we're going to get into deep today.
[01:44] And outside all of that, perhaps the most important movement that Brad has started is Invest America.
[01:50] Can I get a quick show of hands?
[01:51] How many of you have heard about Invest America?
[01:54] Wow, look at that.
[01:57] Got a lot Got a lot lot of opportunity it looks like.
[01:59] That's right.
[02:00] In brief, Invest America is a
[02:03] a federal legislation creating an investment account at the time of birth for every child born in America.
[02:10] The biggest impact Invest America is going to have according to me is independence away from dependence from our state and making every child in America an owner of our economy.
[02:21] Brad, I have the great honor of calling you my mentor, coach, and partner.
[02:23] Thank you so much for doing it.
[02:25] Please join us.
[02:27] It's great to be here.
[02:29] Thanks for having me.
[02:31] And you know, a special thanks to Dr. Goel for green-lighting the class.
[02:37] I think this is a really important one and you know, I'm lucky enough to have my son junior in high school sitting here today and you know, in in a lot of schools today in particular colleges on the East Coast as well, you know, there's this people don't really know to do with AI.
[02:50] And I say all the time you got to make yourself bionic with AI, right?
[02:55] Like you can't consume enough AI today because it doesn't you know, it used to be you go to this school, you get a job with a place like Grok or
[03:04] Altimeter and today I don't really care where you went to school.
[03:08] I want somebody who shows up and delivers abnormal value, bionic value, right?
[03:14] And the way you do that is going to be leveraging the latest technology.
[03:17] So I'm glad that you are enabling the students to sit at the intersection of such important topics.
[03:22] You know, I'm going to introduce Sunny and then I'm going to share a couple slides and then I'll invite Sunny up.
[03:27] But you know, I was thinking Sunny and I been great friends for a long time.
[03:31] You know, we we play we we're going to play poker tonight in the all-in poker game here in Silicon Valley.
[03:38] So we're buddies inside and outside of work, but I was thinking about the introduction and then I asked ChatGPT and Claude to give me an introduction and you know, ChatGPT's wasn't great to be perfectly honest and Claude's blew me away.
[03:48] So I I I'd just read to you, you know, what what what Claude had to say about Sunny.
[03:55] So, our next guest is a serial entrepreneur who apparently can't stop getting acquired by bigger and bigger companies.
[04:01] And honestly, the trajectory is incredible.
[04:05] He co-founded Xtreme Labs, a mobile development shop acquired by Pivotal.
[04:09] Then he co-founded Automa- Autonomic, a smart mobility platform acquired by Ford, which he became the VP running Ford X, their internal in- innovation lab.
[04:19] Then he co-founded Definitive Intelligence, that was acquired by Grok, where he helped where he became president and helped launch Grok Cloud.
[04:27] And then Nvidia bought the platform, of course, recently for $20 billion, their largest acquisition ever.
[04:31] So, if you're keeping score at home, Pivotal, Ford, Grok, Nvidia, the man's career is basically a SPAC that only goes up, according to Claude, unlike Chamath, okay?
[04:46] He has a computer engineering degree from the University of Ottawa, which proves that even Canadians can disrupt things when they put their minds to it.
[04:55] Please welcome Sunny Madra.
[04:57] Thank you.
[05:03] So, I'd like to share a couple slides
[05:05] to to set the context for the for the moment that we're living in.
[05:10] And inference, the conversation we're going to have today is really a subset of this important conversation.
[05:17] But but this is global GDP per capita over the course of the last 2,000 years.
[05:23] All right?
[05:24] And if you look at that, you realize that basically for 1,800 years, nothing happened.
[05:29] It was survival.
[05:32] Right?
[05:32] There was no excess productivity from a fixed amount of labor and capital, right?
[05:38] It was what we could use to survive.
[05:40] And then all of this stuff starts happening in the 1800s and 1900s.
[05:44] The number of years it takes to double GDP, and think like, I like to say GDP is what creates the excess in life for enjoyment, right?
[05:52] It's beyond survival.
[05:56] Right?
[05:56] The surplus that we all have.
[05:58] And so, the number of years it takes to double GDP, right?
[06:00] Has plummeted and now we're doubling global GDP or you might think of it as quality of life every 25
[06:08] years.
[06:09] But you may say, "Well, Brad, what does GDP have to do anything?"
[06:12] Well, it has to do with everything.
[06:13] So, what happens when you have higher rates of of GDP?
[06:18] You have lower rates of poverty.
[06:19] You have higher rates of basic education.
[06:21] You have higher rates of literacy.
[06:23] You have more democracy, more freedom, higher rates of vaccination, you know, lower child mortality.
[06:30] So, it turns out that innovation in and of itself is a societal good.
[06:36] And it happens to be correlated and accelerating.
[06:40] So, technology as an investor has gone from 5% of global GDP to about 13% of global GDP.
[06:45] And if I asked you guys 10 years from now, are we going to be at below 13% of global GDP or above 13% of global GDP?
[06:55] I think you would all say that technology as a percentage of global GDP is going to be a lot bigger number.
[07:02] We're sitting in the heart of Silicon Valley.
[07:04] Technology out-earns non-technology.
[07:07] So, the the dotted blue
[07:09] line here is this is the NASDAQ has compounded earnings per share at 15% for the last 10 years compared to 6% for non-tech companies.
[07:21] So, why do technology companies tend to be better investment than non-tech companies?
[07:25] Because they compound their earnings per share faster.
[07:27] Again, I think we'll be accelerated by AI.
[07:32] And of course, AI is going to massively accelerate all of this because when we look at all the knowledge work in the world, the TAM for it is measured in the trillions.
[07:43] Demis said it well.
[07:46] It'll be 10x the impact of the industrial revolution, but happening at 10x the speed, probably unfolding in a decade rather than a century.
[07:55] So, I think that is the context.
[07:57] Like, what we're doing here and the acceleration that will come with AI should be something that's better for all of society.
[08:03] We're going to have to talk about the guardrails and the societal change we're going to have to make to to to be that.
[08:08] But, sitting at
[08:11] The very root of all of this is compute.
[08:14] You know, you guys all know that the atomic unit of AI or intelligence is the token, right?
[08:22] And there's nobody better able to talk about the production um of this atomic unit than Sunny.
[08:30] And so, Sunny, I want to kind of go back in the wayback machine a little bit.
[08:34] You know, tell us what Grok is, right?
[08:38] And what your observations were in 20 '23 and '24 about what was going to happen um with inference.
[08:50] Yeah, uh so, a little bit of a quick background.
[08:52] So, Grok was founded by uh Jonathan Ross.
[08:55] Jonathan Ross was the creator of the TPU at Google.
[08:58] Um and Jonathan Ross's background is interesting.
[09:01] He was a high school dropout.
[09:03] Uh not cuz he couldn't complete it cuz it was like probably too boring for him.
[09:06] And went straight from being a high school dropout and probably
[09:11] complete his complete his GED or something.
[09:14] He went straight into like a PhD math program at NYU.
[09:17] Um and then gets recruited into Google and like every great engineer over the last 20 years was made to work on like ad optimization or ad testing, which is terrible um in some ways.
[09:28] But, what he did was um he listened to a talk by Jeff Dean.
[09:32] And Jeff Dean had come in and basically he "Hey, um good news, bad news.
[09:36] Good news, we've I think we've found an algorithm to solve automatic speech recognition, which could be useful in many places.
[09:42] Bad news, there's not powerful enough compute, so we can never run it."
[09:45] And Jonathan took it amongst himself to come up with a design, um and coming, you know, from a completely different area, a design using an FPGA for the first version of what became the TPU.
[09:58] And then ultimately, you know, Jonathan left Google because he thought he, you know, the rest of the world should have this, it shouldn't just be embedded inside Google.
[10:04] And so he left and started it.
[10:06] And so quickly, you know, what Groq is, um and it continues to be inside Nvidia
[10:12] as well, it's a chip that's the designed with a data flow architecture.
[10:16] And what makes it very significantly different than any other computer architecture is that it's fully deterministic.
[10:20] So hand in hand with the architecture is a compiler, and a compiler which predetermines where all the calculations are going to happen.
[10:30] And that last bit is really important because the, you know, the underlying thing to any AI problem in token generation is lots and lots of math.
[10:38] And that's why we're seeing this compute explode.
[10:40] And I I, you know, implore everyone to go look at the following, uh you know, we talked about one of Brad's great investment, Snowflake.
[10:47] Snowflake is a, you know, database retrieval company, right?
[10:50] And so you have to go get a record and bring it back.
[10:53] And if you look at the number of cycles it takes to do that in a compute cycles, you can really see and it's not a really large amount.
[10:59] But you look at the number of tokens it takes to generate a number of compute cycles or flops it takes to generate a single token, it's mind-boggling.
[11:06] And the best way to think about it is it's usually the parameter size of the model times
[11:13] the context length squared.
[11:15] Right? And so that's for each token.
[11:17] And, you know, you're doing something and have lots and lots of tokens.
[11:20] So we're in this era where we have this incredible technology, but it's incredibly compute intensive, several, several orders of magnitude larger than any other computing paradigm we've had before.
[11:30] You're at your own startup, right?
[11:32] You have a conversation with Jonathan about merging.
[11:36] I was an investor in in Cerebras, which is also building a fast inference chip.
[11:39] Groq was building a fast inference chip.
[11:43] These two companies had been in existence for upwards of 10 years.
[11:47] Yep.
[11:48] Um and the extraordinary thing is like in year nine, they're they're they're both fighting for survival.
[11:54] They're not thriving.
[11:56] Right? Like it they're they're they're they're building for a market that didn't really exist, okay?
[12:03] But you saw something, you know, Jensen came on my podcast BG2 and he said, "Everything just changed."
[12:09] I said, "What do you mean?"
[12:10] He said, "Inference time reasoning."
[12:12] He said, "We've gone from
[12:14] pre-training models to inference time.
[12:16] reasoning and inference is about to 1 billion X.
[12:20] So, not 10 X, not 100 X, not a million X, it's going to a billion X and our systems of compute are not designed for what's coming.
[12:27] I remember you and I had a conversation and shortly thereafter, you helped broker, you know, kind of this vision for Jonathan that said, "Jonathan, I think I see your future more clearly than you do."
[12:38] So, tell us about that moment.
[12:40] Yeah, I think, you know, at that moment a couple things are happening, right?
[12:43] So, um the market had been dominated by Nvidia um because Nvidia is what the researchers used to create the models.
[12:50] And so, naturally, you know, in part of creating a model, inference is the forward pass of training, right?
[12:58] And the back prop, that's what's different.
[12:59] And so, um you're always doing inference when you're when you're creating models and so, it's very natural to just run it on the same hardware that you've created the model on.
[13:07] And so, one of the things that, you know, we saw with the Groq architecture was that we could complete inference much more efficiently, right?
[13:14] And so, if you look at our V1 chip,
[13:18] which we, you know, put into the cloud and I'll get back to in a second, that's a silicon designed in 2018, silicon from 2019, 14 nanometer, and super competitive against Hoppers, right?
[13:30] Which is, you know, five generations newer in terms of silicon technology.
[13:34] And so, really what we saw was we thought it would be very difficult for convincing people to buy our hardware and use it, but if we built the cloud and put it in the cloud, developers really don't care, right?
[13:44] If there's an API, and developers, you know, we've seen they're quite fungible there.
[13:48] So, our big insight was take these things, start putting them in the cloud, run the data centers, and make them available via an API, and make the best open-source models available for everyone, and even including OpenAI models, like OpenAI Whisper, which was open-source from the beginning.
[14:04] And so, we had put a lot of those models there, and that's what really took off, and we launched the cloud, and within a few weeks we went to a couple hundred thousand users, and today it's like something at 4 million users, and it took Nvidia almost 17 years to get to,
[14:17] you know, 7 million users.
[14:19] So, effectively, reasoning models come along.
[14:23] Reasoning models have are much more voracious in their token consumption.
[14:28] This is even before we get to agents.
[14:30] This is just deeper thinking than what, you know, one-shot pre-trained models were doing.
[14:36] And so, when you looked at token consumption curves, they were just going parabolic, and our hardware, our clouds were starting to break.
[14:45] You know, OpenAI only had a gigawatt of compute, Anthropic only had a gigawatt of compute.
[14:50] So, we had to figure out how to make both more token-efficient models, but also more token-efficient architectures.
[14:57] So, now remember, Cerebras and Nvidia were big-time competitors.
[15:02] Nvidia and Groq were perceived as big-time, you know, competitors.
[15:07] So, Sonny, you sent me a text and said, "I have an idea."
[15:11] We we We at the time were major shareholders, and still are, in Nvidia.
[15:16] And um and and good friends with
[15:19] Jensen.
[15:21] And you had an idea, and the reason I want to point this out is just like how one person's idea, you know, we see these big transactions, but sometimes we don't unpack like that it's just one decision on one day that causes these things to occur.
[15:35] So, what was your you know, you had a vision that was pretty orthogonal when thinking about Nvidia.
[15:42] Tell me tell us about how that came to be.
[15:44] Yeah, and you know, Brad, you did lead that email, so that that was awesome.
[15:51] But basically, when we were looking at the problem of inference, even as Groq, what became obvious to us is if you started to dissect how inference works, there's first a dissection which happens between say prefill and decode, right?
[16:04] And so, many people were starting to do that where you basically, you know, use a separate set of machines for, you know, prefill and another set of machines for decode, and you can basically get some efficiency a lots of efficiency by doing that.
[16:16] What we further did, and this is like a good lesson for everyone, we further
[16:20] looked at, you know, prefill and decode.
[16:22] and within the decode we realized that we could disaggregate the decode.
[16:26] Because within the decode there's many different functions that are happening,
[16:29] and some of those functions are compute intensive,
[16:30] and some of those functions are memory bandwidth intensive.
[16:32] And so, one of the big differences with Groq over a GPU is GPUs have lots and lots of compute and lots of external memory,
[16:42] which you know, is HBM for them, which is slower.
[16:44] We don't have a lot of compute on Groq chips, we have a lot of SRAM, and that SRAM is very high bandwidth, almost more than an order of magnitude faster.
[16:54] And so, typically on a CPU you'd see that as like your L1 cache.
[16:55] But we have you know, we have lots of that in our chips, and so, when we looked at the problem, and what the email to to Jensen was about was basically connecting to their chips via something they call NVLink.
[17:05] So, Nvidia chips speak to each other via protocol called NVLink, and that allows you to basically not run something on a single GPU, you can run it on lots and lots of GPUs together.
[17:14] I think today we have 72 you can do, and we're scaling up to 576.
[17:20] Groq has a similar protocol, and we've
[17:21] been running thousands of chips together.
[17:22] In fact, we had many models that we were running on 4 to 8,000 chips at a time.
[17:26] So, basically, NVLink Fusion was a way for us to allow our chips to speak to the NVIDIA chips so we could take part of the problem, which we knew the Groq chips were faster at and more performant at, and run it there.
[17:36] And the net result of all that is if you take the same footprint of power, you can get 2 and 1/2 times more tokens out by basically, you know, combining those two systems together, which in today's world of, you know, constrained compute is really valuable.
[17:50] So, Sunny sends me a text, and he said, "I think we can partner with NVIDIA."
[17:58] That in and of itself is a pretty big change because if somebody's your your your chief competitor, you know, the idea that you can partner with them is is a pretty big change.
[18:05] He said, "Would you mind sending Jensen, you know, a text?"
[18:09] And I'm thinking to myself, "Man, you know, I I'm going to spend some political capital with Jensen, so I like I I need to know that this isn't a crazy idea."
[18:16] And so, I kind of sit on it for a week or something.
[18:19] And then Sunny texts me again.
[18:21] He's like, "Have you sent
[18:22] Jensen, you know, that text yet?
[18:24] And so, I said, "Okay, I'm going to send it to him."
[18:27] And Jensen immediately got back to us and said, "Interesting idea.
[18:30] Like, you know, let's let's let's have a chat."
[18:34] And you guys started working with him, and what was really compelling, I think, to Jensen was you had, you know, obviously, somebody had built a competitive chip, but they had mentally thought about how can we produce a lot more tokens together?
[18:49] So, what what Sunny just said is really important.
[18:51] OpenAI is got a fixed footprint of let's call it a gigawatt that they're going to take in September of Vera Rubin's.
[18:59] In one end of the factory goes power and chips.
[19:03] You obviously have the building all the costs, and out the other end comes tokens.
[19:06] Okay?
[19:08] When they bought Groq for the exact same power footprint, for the exact same building, they're now generating two and a half times the number of tokens.
[19:18] And the constraint we have in the world is power and memory.
[19:21] So, if you can double or triple the
[19:23] amount of tokens for the exact same footprint, it leads to an enormous economic outcome for OpenAI or for Anthropic.
[19:31] And so, you've seen as the demand on inference because of inference time reasoning, and we'll talk next about agents, as the demand for these tokens of intelligence have exploded.
[19:43] And literally, we're consuming tens of trillions of tokens now per week around around the world.
[19:51] We've had to come up with more power, more chips, more of these inputs in order to produce those.
[19:58] And so, it's not just about fast chips, it's also just about fast inference.
[20:01] It's just about the ability to get more tokens into the world in a in a world that is constrained.
[20:05] Yeah.
[20:08] How many days from the time, you know, you showed Jensen a working system?
[20:12] Yeah.
[20:12] How many days from that until him greasing you with $20 billion?
[20:18] Uh probably just over a month.
[20:20] Yeah.
[20:22] Yeah, 30 days.
[20:23] >> Yeah. And Jensen is like, you know, did
[20:26] they have any competitive efforts going
[20:28] on at at Nvidia?
[20:29] >> Yeah, I mean, I I think like the
[20:31] you know, Nvidia and you see it and we
[20:34] talked about it at GTC. Nvidia has an
[20:36] ecosystem already of seven chips in five
[20:38] different racks, right? So, Nvidia is no
[20:39] longer making like a GPU. And I think
[20:42] that's what is one of Nvidia's
[20:44] superpowers that they've started to look
[20:46] at disaggregating the problem in all
[20:47] different ways, whether it's storage,
[20:49] whether it's CPUs, whether it's, you
[20:51] know, compute or networking chips. And
[20:53] so, that already exists. So, they had
[20:55] already thought about building a
[20:57] decode-only chip and something that was
[20:59] powered by a lot of SRAM. Right.
[21:01] >> But I think sort of the and then, you
[21:03] know, it's a good lesson for everyone,
[21:04] like us putting that email in, starting
[21:07] to work together, and building a
[21:08] prototype that was working with their
[21:09] systems was a real proof of concept for
[21:12] them, and and you know, in the in these
[21:14] large systems, making these things work
[21:16] together, making them performant, and
[21:18] this is, you know, across two different
[21:19] companies with two different completely
[21:21] different stacks. I think when they saw
[21:22] that we were able to do that, it showed
[21:24] that, you know, we'd be a good
[21:25] integration. And the last thing I'll say
[21:27] is we're two very different types of
[21:28] companies. I think, you know, if we were
[21:30] kind of making a better GPU, there'd be
[21:32] a lot of conflict within Nvidia, you
[21:34] know, after a the the type of deal that
[21:36] we did. But because we were making this
[21:37] SRAM chip deterministic, compiler-based,
[21:40] which is completely different than how
[21:41] GPUs work, it's very complimentary for
[21:43] the cultures and the engineering teams
[21:45] to come together, as well. How many
[21:47] people in here you have used Open Claw?
[21:51] I mean, that's pretty incredible
[21:53] penetration.
[21:54] I saw a stat, Marc Andreessen may have
[21:57] tweeted this today, you know, that most
[22:00] of the people he talks to are somewhere
[22:01] between a hundred and a thousand dollars
[22:04] now a day. Yeah. On on token consumption
[22:08] with Open Claw. And he said, basically,
[22:11] the next 20 years of Silicon Valley is
[22:13] going to be producing technologies to
[22:15] drive down the cost of intelligence.
[22:18] Right? And so, I want to talk about
[22:20] that, Sunny.
[22:21] If we look at the cost of inference,
[22:24] it's dropped by basically 90% over the
[22:27] course of the last year. It's dropped by
[22:29] closer to 99%
[22:31] over the course of the last two two and
[22:33] a half years. So, talk to us about the
[22:35] the the in like, what's driving the unit
[22:39] cost of inference? And if I take a
[22:42] like-for-like,
[22:44] let's call it a unit of intelligence,
[22:46] >> Yeah.
[22:46] whether it's a basic question I ask, or
[22:49] whether it's a little bit more
[22:50] complicated question I ask,
[22:52] do you expect that unit cost to continue
[22:55] to go down? And if so, why? What are the
[22:57] inputs to that unit?
[22:59] >> Yes. So, the the inputs are, I think,
[23:01] the following three major things. The
[23:03] supply chain, right? Like, you know,
[23:05] what can you do across the supply chain,
[23:07] which is, you know, mostly centered
[23:09] around like Taiwan today.
[23:11] Um you know, TSMC and the different
[23:13] packaging technologies and the
[23:14] lithography technologies that they buy
[23:16] from others. The innovation that your
[23:18] engineers can perform, right? And and
[23:21] what I would say,
[23:23] um is like the
[23:25] you know, the amount of power you have,
[23:27] right? And so, like, those are the kind
[23:28] of things we're talking about. And so,
[23:30] what what we see today is,
[23:33] um
[23:33] you know, lithography technology is
[23:36] starting to reach a limit, right? We're
[23:38] not we're not going as fast as we used
[23:39] to, right? And so, we're not getting
[23:41] sort of the the Moore's law. But, so we
[23:43] have to exceed that. So, we're exceeding
[23:44] that in a couple different ways. We're
[23:46] exceeding that by making bigger and
[23:47] bigger chips. And so, if you see that,
[23:49] you know, these chips become quite
[23:51] large, um now, which is very exciting,
[23:54] but also leads to a lot of
[23:56] interesting issues. Cerebras, you know,
[23:58] as Brad was talking about, you know,
[23:59] their chips are kind of size like a
[24:00] pizza box, right? Versus, you know, the
[24:03] CPUs you guys all would have seen. And
[24:04] so, there's a lot of energy and
[24:06] technology there. It's like, how big of
[24:08] a package can you make and how much
[24:09] silicon can you pack in there? Then,
[24:11] there's the innovations. And so, the
[24:12] innovations is really where we're seeing
[24:15] most of this work happen, right? Because
[24:17] that's hand in hand with the models,
[24:19] right? You know, we're seeing this
[24:21] really interesting force today. And
[24:23] there was a bunch of stuff that I don't
[24:25] know if it was leaked or put out there.
[24:26] And Elon is at at the center of some of
[24:28] that, which is they're discussing like
[24:30] these newer models are approaching like
[24:31] 10 trill like 1 trillion to 10 trillion
[24:33] parameters. And those 10 trillion
[24:35] parameter models go back to that first
[24:36] thing I told you. That's in the
[24:38] fundamental flop calculation of how how
[24:40] much compute it takes to generate a
[24:42] token. So, as fast as companies like us
[24:45] are making better and better technology,
[24:48] you know, through lithography upgrades,
[24:50] through memory bandwidth upgrades, um
[24:52] through innovation and you know, how we
[24:53] lay out our circuits, through
[24:55] quantization efforts, NVFP4 was another
[24:57] one. The models are getting bigger and
[25:00] then the demand is increasing. So it's a
[25:02] I'm going to put it back to you. There's
[25:03] this like this three kind of it's a
[25:05] cube, but it's it's really difficult to
[25:08] navigate right now because all the
[25:10] factors are growing in ways which are
[25:13] really challenging. So the demand keeps
[25:15] going up, the models keep getting
[25:17] bigger, and as fast as we're innovating,
[25:19] even if we get a 50x over 5 years, the
[25:21] models of the demand are going faster.
[25:23] And that's why we're seeing, you know,
[25:25] this unique phenomenon like H100 prices,
[25:27] if you're building a startup or using
[25:28] them, they're going up in fact. Yes.
[25:31] Like one of the things that I think is
[25:33] important for everybody to understand, I
[25:36] mean
[25:37] when OpenAI and Anthropic started, their
[25:40] their gross margins, right, on the
[25:43] businesses were highly negative.
[25:45] Okay? So that's a scary thing to do. Go
[25:48] raise a lot of money
[25:49] and it basically produce a a widget for
[25:53] a dollar and you're selling it for 20
[25:55] cents, and you have a big negative gross
[25:57] margin.
[25:58] But why was that, right? They were going
[26:01] out and they were charging you all to
[26:03] use ChatGPT or
[26:05] you know, their APIs were charging a
[26:07] certain amount of money.
[26:08] They weren't that capable, so there was
[26:10] only so much money we were willing to
[26:12] pay, right? And 2 years ago, the cost of
[26:16] inference was a lot higher.
[26:18] But the bet they were making is that the
[26:21] cost of inference would come down a lot
[26:24] and your willingness to pay would go up
[26:26] a lot as intelligence got a lot more
[26:28] valuable.
[26:30] So, you know, I like to think of the
[26:31] first inning of AI was just getting to a
[26:34] place where we could yield answers,
[26:37] right? In code generation, it was
[26:39] basically like auto complete, tab
[26:41] complete. In the case of ChatGPT, it was
[26:43] like, you know, basically telling a
[26:45] slightly better version of Google. But
[26:47] now we're entering into this phase of
[26:50] action.
[26:51] Right? Where agents do things on Go
[26:53] build me an app. Go build me a website.
[26:56] Figure out how to resolve this customer
[26:58] service problem. Sell more of my
[27:00] product. Find a cure for cancer. Book me
[27:03] a hotel in New York. It starts doing
[27:05] things. And when it does things, the
[27:07] amount of tokens it has to consume in
[27:09] order to do those things explodes by an
[27:12] order of magnitude, but the value
[27:14] delivered to the end consumer as a unit
[27:17] of of of intelligence goes up by a 100
[27:20] X. So, your willingness to pay goes up
[27:22] dramatically.
[27:23] >> Can I add one to that?
[27:24] >> Yes. You know, this week we saw Mythos,
[27:26] which is the unreleased model by
[27:28] Anthropic, find a bug in BSD, which, you
[27:33] know,
[27:34] think about how many engineers and
[27:36] software developers and everyone else
[27:38] and, you know, companies using that have
[27:39] looked at that code. So, we've gone to a
[27:42] place where it's doing things beyond
[27:43] human capability.
[27:44] >> Correct. Which is And we're in we're in
[27:46] year three.
[27:47] >> Exactly. We're in year three. To give
[27:49] you another you know, like what's my
[27:51] best evidence to convince you of the
[27:54] value of AI? Well, my best evidence is
[27:56] that Anthropic in the month of March
[27:58] just added $10 billion
[28:01] in in in annualized revenue in a single
[28:03] month.
[28:04] Okay?
[28:05] That is the total amount of annual
[28:08] revenue for Data
[28:09] Databricks plus Palantir combined.
[28:13] And they added it one month.
[28:16] And they didn't add it because they
[28:17] hired a million sales people, went out
[28:20] to a million companies, and convinced
[28:21] them to buy their product. Right? They
[28:24] added because
[28:26] their product crossed a threshold
[28:29] of intelligent capability that millions
[28:31] of customers around the world said, "I
[28:33] have to have this product to make my
[28:35] company better." The amount that
[28:38] Altimeter is spending went up, but but
[28:40] millions of self-interested actors
[28:42] around the world independently made a
[28:45] judgment, "I have to buy a lot of those
[28:47] tokens, a lot of those capabilities,
[28:49] both Claude code and co-work. And the
[28:51] same thing is happening at Open AI. Um
[28:54] not quite
[28:55] you know, on the same exponential in
[28:57] terms of revenue. But, I think for me,
[29:01] this was a little bit of an Oppenheimer
[29:04] moment. This was a little bit of the
[29:05] splitting of the atom. Like, we've heard
[29:08] Dario and Sam talk about the exponential
[29:10] or the end of the exponential on
[29:12] intelligence. But, the big question was,
[29:15] are they going to be able to afford to
[29:18] continue to build the compute in order
[29:21] to keep up with this. I had this
[29:23] somewhat uncomfortable moment with Sam
[29:25] Altman on my podcast, the BG2 pod, that
[29:28] went a little viral
[29:30] when I asked Sam, "Hey Sam, you've made
[29:33] 1.4 trillion dollars of spending
[29:35] commitments, but you only have 13
[29:37] billion of revenue.
[29:39] So, explain to me how that works. Like,
[29:42] how can you commit to spending 1.4
[29:44] trillion and have 13 billion of
[29:45] revenue?" And I had hoped that Sam would
[29:48] make the case that his revenue was going
[29:50] to go up a lot and these were kind of
[29:51] call options and he could renegotiate
[29:53] them, but instead, he said to me, "Well,
[29:55] if you don't like If you don't like your
[29:57] investment, I'll buy back your shares."
[29:59] Which was was not exactly the response I
[30:02] was hoping for out of out of Sam in the
[30:03] moment. But, that was the question
[30:05] heading into 20 2026.
[30:10] My podcast partner Bill Gurley, a lot of
[30:12] other people highly skeptical saying
[30:14] this is an AI bubble. These guys are
[30:17] spending at rates they're never going to
[30:19] be able to pay the bills on because
[30:21] there aren't people on the other end
[30:23] willing to pay for the products to
[30:25] justify that level of spending. And what
[30:27] happened in January was Anthropic had a
[30:30] three three and a half billion dollar
[30:32] month. In February, they have an eight
[30:34] billion dollar month. And in in March,
[30:36] they have a 10 and a half billion dollar
[30:38] month.
[30:39] That to me said, "Oh, everything's
[30:42] changed. The product is now sufficiently
[30:44] good that you have revenue scaling on
[30:47] the same exponential as intelligence.
[30:49] So they can afford to pay for the $50
[30:52] billion per gigawatt to stand up all of
[30:56] these inference factories to produce all
[30:58] this, you know, kind of collective
[31:00] intelligence. Just react to that, Sunny,
[31:02] cuz our group talks a lot about this.
[31:04] There's a lot of debate in our group and
[31:06] on on the All-In pod and others as to
[31:08] whether or not this was a bubble. Yeah,
[31:11] I'd say there's like kind of a couple
[31:12] things that, you know, maybe the broader
[31:15] world doesn't see yet. One, the models
[31:17] that we see today haven't even been
[31:19] trained on the latest hardware. Whether
[31:20] you want it to be, you know,
[31:23] Blackwalls or Veras are just coming out,
[31:25] right? Or Rubins are just coming out.
[31:27] So we haven't even seen that yet. And so
[31:29] we haven't seen the capabilities that
[31:30] you get. And so we'll we'll start to see
[31:32] that. I think one of the the first ones
[31:34] we'll see is, you know, the stuff out of
[31:35] Elon's Grok, right? So that's A. So the
[31:38] capabilities you're seeing here are
[31:39] things that were done on older hardware.
[31:41] So that's A. So when you're inside the
[31:43] ecosystem, you know what's capable and
[31:45] what's coming next, right? I think B,
[31:47] um one of the things that is really
[31:49] starting to take off and I think
[31:51] Anthropic's done an incredible job here.
[31:53] And I think, you know, Codex has an
[31:55] equally incredible job on very hard kind
[31:57] of software problems is that there's not
[31:59] just a chat interface that majority of
[32:02] people are interacting with. It's not
[32:03] just an API, but they've created like a
[32:05] harness around the models. And those
[32:08] harnesses, Open Cloze is just another
[32:10] harness as well. Those harnesses have
[32:12] figured out how to extract more and
[32:14] continually extract. I think, you know,
[32:16] with Claude Code and Cowerk, you can
[32:19] have it just ping you whenever it's
[32:20] stuck on your phone, even if you started
[32:21] somewhere else. And so it can kind of be
[32:23] in this continuous loop and it's working
[32:24] for you all the time. We've never had
[32:26] anything like that. When it's doing
[32:27] that, you take that token consumption of
[32:30] like the, you know, you were doing a
[32:31] query before and it's doing some
[32:32] thinking and coming back. Now it's just
[32:34] working all night long and pinging you
[32:36] every time. Don't even bother me, keep
[32:37] coming back. So, we're seeing these
[32:38] harnesses really extract more and more
[32:41] tokens out of it as well. And the type
[32:43] of problems that people are solving, we
[32:45] gave the code problem, you put a bunch
[32:46] of other ones, but like, you know,
[32:48] inside big businesses and you know, I
[32:50] tweeted this, I think it's fair, but
[32:51] like, inside Nvidia now we have this
[32:53] thing called the Nvidia personal
[32:55] assistant and it's connected to Slack,
[32:57] it's connected to Teams, which you know,
[32:59] sucks, but
[33:01] uh it's connected to our our email and
[33:03] it's connected to all our, you know,
[33:04] files wherever wherever exist. And so,
[33:07] every morning it you it runs and it like
[33:09] figures out like all your task items for
[33:11] the day. You can have it answer those
[33:13] things and and it's really incredible
[33:15] and so, you start to the the way we work
[33:17] and we're talking about this earlier
[33:18] with someone that's like, you don't even
[33:20] write email now, like the someone else's
[33:22] agent is going in their email emailing
[33:23] you and your agent is looking at
[33:24] emailing them back, but a lot more work
[33:26] is getting done because my time is freed
[33:28] up from basically answering emails all
[33:30] day long and approving things out of all
[33:32] these traditional SaaS systems, the
[33:33] agents handle all that. So, the
[33:35] explosion to your point is just in the
[33:37] first or second inning. The amount of
[33:39] tokens is is really just going up. So,
[33:42] we don't we don't fear that. We don't
[33:44] look at that as like an overbuild in any
[33:45] way, shape, or form. I I I I I think
[33:48] the facts and evidence on the field is
[33:51] number one, the cost of both training
[33:54] and inference, uh but inference in
[33:56] particular is plummeting and continues
[33:58] to plummet. That shouldn't be altogether
[34:00] surprising. Technology ultimately is
[34:03] highly deflationary. I've never seen
[34:05] something this deflationary this
[34:08] quickly. I think it's a byproduct of
[34:10] extreme co-design. It's not a single
[34:12] chip, it's a factory. And across the
[34:14] factory, there are all sorts of Moore's
[34:16] laws playing out combinatorially across
[34:19] the factory. At the same time, when
[34:22] you're able to produce a lot more
[34:23] tokens, the unit of intelligence that
[34:26] you're delivering is much more valuable.
[34:28] So, the willingness to pay on the other
[34:30] end goes up a lot. And I'll tell you for
[34:33] an open AI or an Anthropic today, if
[34:35] those guys were at negative gross
[34:37] margins a year and a half or two years
[34:38] ago, they're now at very positive gross
[34:41] margins. Right? So all of a sudden this
[34:43] business that looked dis-economic looks
[34:46] highly economic today. So it's kind of
[34:49] it's resolved a little bit that
[34:51] question. Maybe just, you know, Sunny, I
[34:54] want to finish
[34:55] our our section with maybe just a little
[34:57] forecast and pre-wire you guys to we're
[35:00] going to open it up to questions. It can
[35:01] be about the economics of inference or
[35:04] any part of the stack or any other
[35:05] questions, you know, that you all have.
[35:08] But you mentioned Mythos. It's a model
[35:11] out of
[35:12] you know, that came out this week was
[35:13] not generally released, uh but was
[35:16] sandboxed by Anthropic. Tried to get out
[35:18] a few times. Yeah, tried it tried to
[35:20] escape the sandbox, you know, trained on
[35:23] kind of TPU 7. On the other hand, you
[35:26] have Spud or 5.5 coming out of Open AI
[35:29] probably this week or next, which is a
[35:32] first Blackwell trained model. Elon's
[35:34] going to have one. Meta's just out with
[35:36] a model, you know, yesterday. Google,
[35:38] etc.
[35:40] Talk us through you get to see into the
[35:42] product pipeline at Nvidia. Is there Do
[35:46] you think that
[35:48] the pace of the you know, the cost of
[35:52] inference curve continuing to come down,
[35:54] do you think that continues for the next
[35:56] several years? Do you think that the
[35:58] step function or the exponential, if you
[36:00] will, of both pre-training and inference
[36:03] time reasoning in terms of improving the
[36:04] algorithmic capabilities of intelligence
[36:07] continues? Well, I can tell you, you
[36:09] know, kind of having a chance to work
[36:10] with Jensen now, like he challenges us
[36:13] in everything we do to not show up
[36:15] unless it's a 100X. So whatever, you
[36:17] know, we bring to him and
[36:20] can't get into too many details, but
[36:21] like his first challenge back is is this
[36:24] a 100X than what you did before. So he
[36:26] is challenging the engineers to take a
[36:28] look at every part of the problem, you
[36:31] know, from all the way, you know, down
[36:32] into memory controllers or memory
[36:34] capacity or you know, circuits, whatever
[36:35] it happens to be, to make sure we 100x
[36:37] everything. So, on the first part of
[36:39] your question, yes, because he pushes us
[36:41] to do it. And he gives us the latitude
[36:43] to do and he gives us the resources to
[36:44] go do it. So, I can tell you like the
[36:46] types of things that we've been enabled
[36:48] in coming in as the Groq team, things we
[36:50] could never do as a startup, but Jensen
[36:52] has enabled us to do those things.
[36:54] >> Are you guys harnessing AI yourselves to
[36:57] design the next generation chips?
[36:59] >> A ton, right? We were doing that even
[37:00] before because we needed to we were a
[37:02] small team, but now we have access to,
[37:04] you know, sort of the entire ecosystem
[37:06] of things that are available. So, I
[37:07] think that's A, right? [clears throat]
[37:09] Is that we're we're being pushed to do
[37:10] it. On on the related side though, the
[37:13] the more we innovate, the more the model
[37:16] makers innovate and the bigger the
[37:17] models get. So, this is this and so,
[37:20] um which means the capabilities that are
[37:22] coming out are better. So, we continue
[37:24] to need that filled out. So, you know,
[37:26] we'll all look back and we'll thank, you
[37:28] know, there's a couple companies that
[37:29] changed the footprint of the internet
[37:31] for us and you could talk more about
[37:32] this than I can even Brad, but like the
[37:34] work that Google did to build the
[37:36] infrastructure they did for video, for
[37:39] search, it really paved the way for the
[37:41] rest of the internet, CDNs, all types of
[37:43] other things, right? And so, a lot of
[37:44] this work that's happening to build out
[37:46] this infrastructure will pay benefits
[37:48] and you need that to continue to happen
[37:50] because it can't just be in the
[37:51] innovation of the chips. Like you need
[37:52] more and more infrastructure to be
[37:54] built. I I I
[37:56] I'll wrap with this.
[37:58] You know, we have the great privilege of
[38:00] talking with Jensen or Elon or Sam or
[38:03] Dario.
[38:04] And you guys all can read about, you
[38:06] know, kind of the personal battles they
[38:08] have between that there's, you know,
[38:11] uh some days uh not a love not a lot of
[38:13] love lost between them
[38:16] uh in the race to, you know, to AGI.
[38:19] But right now, I see amazing uniformity.
[38:23] When I talk to them, they all, in a
[38:25] non-hyperbolic way, say we're there.
[38:28] And we got there faster than we thought.
[38:31] Like we're nearing the end of the
[38:33] exponential. And if you ask Dario,
[38:34] "Dario, what is the most surprising
[38:36] thing to you right now?" He says, "We're
[38:39] almost at the end of the exponential,
[38:41] and like people don't even seem to
[38:43] realize it." And if you ask Sam, he'll
[38:45] say the same thing. And if you ask Elon,
[38:47] he'll say the same thing. That shouldn't
[38:49] be scary to any of us. It just means
[38:52] that we're in this recursive place
[38:55] where, you know, we have AGI, and the
[38:58] job of everybody in this room, including
[39:01] the folks sitting up here, is going to
[39:03] be, "How do we harness this technology
[39:05] for the betterment of all of us?" Right?
[39:08] Which is going to require going back to
[39:10] what a poor of said about the Invest
[39:12] America Act.
[39:14] You know, Dario says the accumulation of
[39:16] wealth that's about to occur, people
[39:18] call it the age of abundance we're going
[39:20] to enter into, right? That's going to be
[39:23] easier than ever.
[39:24] But the distribution problems we're
[39:26] going to encounter are going to be
[39:28] harder than ever.
[39:29] Right? And so, that was really the
[39:32] inspiration behind the work I've done on
[39:34] Invest America and the work that I think
[39:36] we're all going to have to collectively
[39:37] do around the social contract, the
[39:39] intersection between public policy and
[39:41] technology. Because when the exponential
[39:43] looks like that, and all of a sudden,
[39:45] you have uh agents that are going to be
[39:48] able to have more capability than kind
[39:50] of collective human intelligence, and
[39:52] it's happening at an accelerating rate.
[39:54] Remember, all the stuff we've talked
[39:56] about has occurred with almost no
[39:58] compute.
[40:00] Anthropic and Open AI are going to add
[40:02] more compute this year than all the labs
[40:05] put together for the last decade.
[40:07] Okay? And the year after that, they're
[40:09] going to double it again.
[40:10] So, you know, uh I I think the rate of
[40:13] change is is is is parabolic. And that
[40:16] to me is both exciting. I'm an optimist
[40:19] about what's to come, but I'm
[40:22] you know, I'm not polyannic about the
[40:24] challenges that come with that rate of
[40:26] change, right? Like it's going to
[40:28] require active engagement like it has in
[40:31] other periods in history around the
[40:32] Industrial Revolution, the Digital
[40:34] Revolution, etc. Because it's going to
[40:36] exact a lot of a lot of change on the
[40:38] world, but with that I just want to say
[40:41] you know, it's been extraordinary
[40:42] watching Sunny orchestrate the work that
[40:45] he's done at Grok. He's incredible
[40:47] thought leader
[40:48] in this whole area.
[40:50] I appreciate you coming in, but but
[40:51] maybe just open it up some questions and
[40:53] hopefully we can cover a lot of
[40:54] territory.
[40:57] Right here.
[41:00] >> [applause]
[41:05] >> As the marginal benefit of
[41:07] increasingly is being discriminated
[41:10] how do you suggest everyone here
[41:11] positions themselves and to make sure
[41:14] we're not just sitting wasting our time
[41:15] studying something.
[41:19] Yeah, I mean listen, I I
[41:21] I have to answer this question for my
[41:23] son and
[41:25] and for so many others
[41:27] and
[41:29] humans have a unique way
[41:31] of finding a way to add value
[41:34] to society notwithstanding disruption.
[41:37] Right? In the Industrial Revolution like
[41:39] if you were a tradesperson or
[41:40] craftsperson, right? And you built a
[41:42] product beginning to end, almost all of
[41:44] them were displaced by
[41:46] you know, mass production.
[41:48] And for that person it didn't feel good.
[41:51] I was really good at making a wheel
[41:53] start to finish. But I was totally
[41:55] disintermediated by the means of
[41:57] production.
[41:58] Okay? So, but
[42:00] it's not like we
[42:02] you know, the world just stopped. Those
[42:04] people found other things to do and one
[42:07] of the observations I have is that we
[42:08] used to have you know, 80% of people
[42:10] that were in manufacturing and that were
[42:12] in, you know, farming and other things.
[42:14] Today we have 70% of people in the
[42:16] service economy. Right? We have the
[42:18] luxury of people, you know, we didn't
[42:21] used to hire coaches as an example,
[42:23] right? Like couldn't afford to hire a
[42:25] coach, right? Today you have coaches and
[42:28] yoga instructors and tons of things in
[42:30] the world that adds a lot of value to
[42:32] the world. And I think that we have
[42:34] higher order things that people do.
[42:37] Right? And so for me one of the things
[42:39] is,
[42:39] you know, if you were well-off enough
[42:41] that you could hire a tutor, a
[42:43] specialized tutor, that was great. But
[42:45] for 98% of the world who couldn't afford
[42:47] that, now they can get that. Or if you
[42:49] were part of the two or three percent
[42:51] that could have, you know, um, concierge
[42:54] medicine, it was really great.
[42:56] But for the other 97% wasn't great.
[42:58] Well, now they can get that same level
[43:00] of care,
[43:01] um, you know, and so I think this is
[43:03] about democratizing intelligence,
[43:05] democratizing access, etc. Um, but it
[43:09] it's not to say that there aren't going
[43:10] to be different challenges. My number
[43:12] one thing again is make yourself bionic,
[43:15] be a creator, figure out a way, you
[43:18] know, that you add value.
[43:20] Um, so if somebody comes and wants to
[43:21] interview at Altimeter,
[43:23] and
[43:25] you know, they say, "Oh, I don't use,
[43:26] you know, AI and I don't use Excel
[43:28] spreadsheets. I do everything by hand."
[43:31] That would be a problem.
[43:33] Right? Like I expect somebody to use all
[43:35] the greatest tools at their disposal to
[43:38] be the most effective they can to add
[43:40] value, to allow us to, you know,
[43:42] generate alpha in the world. And so
[43:44] starting in a place like this, another
[43:45] way of saying it,
[43:47] you know, reserving this for a tweet at
[43:50] some point in time, but I think that IQ
[43:53] gets commoditized
[43:55] and EQ becomes super valuable.
[43:59] Okay? What do I mean by EQ? I mean a
[44:01] network of people in this room. I mean
[44:03] the ability to persuade the person
[44:05] sitting next to you. The ability to form
[44:07] your team, the ability to lead people in
[44:10] different right like that is super
[44:11] valuable.
[44:13] Right? I think it becomes more valuable
[44:14] in the future. And but I think that you
[44:17] know, just being the smartest person
[44:19] in a room and and solving the problem at
[44:21] the board faster than all the other
[44:22] humans in the room like that I think is
[44:24] commoditized and you're not going to be
[44:26] able to beat the machine. Doesn't mean
[44:28] you don't need to learn those things,
[44:29] but I think it'll be hard to beat the
[44:30] machine.
[44:31] Brad, one thank you for doing this. We
[44:33] need BG2 back. Listen, we only get Brad
[44:36] once in a while in the All-In pod now.
[44:37] You only get your one slot, so let's
[44:40] let's get you back on the pod. Exactly.
[44:41] Yeah, let's get BG2 back. But can I can
[44:43] I add one thing to that? I think like
[44:46] there's this other moment that's
[44:47] occurring right now, and I think about
[44:49] this quite often. And you if you
[44:51] actually look at what's happening in
[44:52] mathematics right now, there's all kinds
[44:53] of new discoveries happening. And I use
[44:55] the following analogy like humanity had
[44:57] to wait for like an apple to fall on
[44:59] Newton's head for him to kind of then
[45:01] start theorizing about gravity and you
[45:03] know, start formulating that. But now if
[45:05] if we can have something else working
[45:06] and discovering new things like and it
[45:08] goes back to that chart that Brad
[45:10] showed, right? It's really until we
[45:11] started having more, you know,
[45:13] innovation, more intelligence that you
[45:14] know, those curves went up into the
[45:16] right. We're just about to make that go
[45:18] more vertical. So I I think I think the
[45:20] overall benefit to humanity has already
[45:22] been shown what happens you have more
[45:23] intelligence, right? And so and we don't
[45:25] have to wait for things to happen. We
[45:27] let the ages do it without us in the
[45:29] loop, which I think will be powerful.
[45:31] There's one up. Yeah. Okay, one thing
[45:33] about the appreciation of what they're
[45:35] doing, but particularly given that Apple
[45:36] is following the strategy of let
[45:38] everybody else and then and we're just
[45:40] going to ride on top of them and
[45:41] hopefully I ask how they're building a
[45:42] device which might actually rival and
[45:44] add to Apple. So I'm just curious if the
[45:45] power you can project in I think it's a
[45:47] high stake I mean, listen, I I'll tell
[45:49] you even the people at Apple are nervous
[45:50] with their strategy. Um and so part of
[45:54] it is their challenge around privacy.
[45:58] They have a real challenge with because
[46:01] we don't have the capability yet on the
[46:03] edge
[46:05] and they don't want to have you sharing
[46:07] information up to the cloud,
[46:09] right? Um given their, you know, like
[46:12] they view as one of their core consumer
[46:14] value propositions is con- consumer
[46:17] property or consumer privacy. But I
[46:19] think they put themselves at, you know,
[46:22] at at risk. The the the the bull case
[46:24] would simply be that we're so sticky to
[46:27] the device and the device is so good
[46:29] that they have time and ultimately
[46:32] the Gemini modeler that they're going to
[46:33] put on the phone is just going to be a
[46:35] much more capable Siri. We can all agree
[46:37] the old Siri is really bad. And, you
[46:39] know, it will be a more capable Siri.
[46:40] And for the vast majority of people that
[46:42] will be good enough,
[46:43] right? So that would be the bull case.
[46:45] You know, I think the bear case is, like
[46:48] you said, that other people come along
[46:49] and build more ambient devices that, you
[46:52] know, consumers,
[46:53] uh you know, really like. But for for
[46:55] me,
[46:56] I frankly wish that OpenAI wasn't
[46:58] working on a device. I wish they would
[46:59] just focus on building intelligence, you
[47:02] know, and I think Apple is going to be
[47:04] very formidable,
[47:05] uh you know, in the device world. So I
[47:07] think they're in a reasonably good spot.
[47:08] >> One stat, uh an 8 billion parameter
[47:11] model, which is quite small, right? Can
[47:13] burn out a phone in an iPhone in 30
[47:15] minutes.
[47:16] It goes back Battery life. Yeah, battery
[47:18] life, yeah. So I just say you got to go
[47:20] back and look at how compute-intensive
[47:22] AI is, right? And so that's the real
[47:24] challenge on the stuff that Brad said
[47:26] about pushing so much of that, um you
[47:28] know, frontier intelligence to the edge.
[47:31] It's over here. Yeah.
[47:34] I wanted to touch on something you were
[47:36] talking about with the massive AI hype.
[47:38] Mhm.
[47:39] And I think it was on the recent episode
[47:41] of
[47:41] All In, and Chamath was criticizing, um
[47:45] various tech CEOs for in creating
[47:48] hyper-reality product launches sort of
[47:50] fear-mongering. Yeah. What AI is going
[47:52] to do is to not compare it to using safe
[47:55] dogs. And I guess I would be curious,
[47:58] like what incentives do the tech CEOs
[48:00] need to revise and rethink their
[48:03] messaging?
[48:05] He and I had that argument on the pod
[48:06] again today. Again, call me naive, I
[48:09] think that Dario's speaking
[48:11] authentically what he believes. I think
[48:14] Sam's thinking speaking authentically
[48:16] what what he believes. They're staring
[48:17] at this exponential. They believe that
[48:19] they see AGI or ASI, and I think they do
[48:22] have legitimate concerns. Like listen,
[48:24] I'm glad that we sandbox Mythos. You
[48:27] know, they tested it internally.
[48:30] Um they found 26 vulnerabilities on the
[48:31] Safari browser. And like I said to
[48:33] Chamath today, do you want them to just
[48:35] throw it out there and then all your
[48:36] browser history is out in the public?
[48:38] Probably not. Right? So, at the same
[48:41] time, I don't think it helps, you know,
[48:43] going out and fear-mongering,
[48:45] particularly if your real intent is
[48:48] regulatory capture, you know, to prevent
[48:50] everybody else from climbing up the
[48:52] ladder now that you're on the top. Like
[48:53] I have a real problem with that. Um but
[48:56] I think we have to find that balance and
[48:58] those tradeoffs, right, between
[49:01] reminding people about
[49:03] you know, the optimistic side of things.
[49:07] Um and I encourage you to read both of
[49:09] Dario's
[49:11] um
[49:11] you know,
[49:13] uh essays.
[49:14] Um and his first essay on this is quite
[49:18] optimistic about, you know, what can
[49:20] happen. But I think he has the other
[49:22] side of it, which is but it it it
[49:24] doesn't happen without us being very
[49:25] thoughtful about the guardrails and
[49:27] things we need to put in place. I don't
[49:29] you know, one of the things I was really
[49:31] happy about it's called Project Glass
[49:34] Wing, which is this consortium that they
[49:36] put together this week to effectively
[49:38] sandbox Mythos before they release it
[49:40] publicly. You know, Amazon, Microsoft,
[49:42] etc. Like that team seemed to me to be a
[49:44] very pragmatic market-based solution to
[49:47] to solve the problem. And
[49:49] he and I were were just texting before I
[49:51] came over here. They they found and
[49:52] they've hardened a lot of things already
[49:54] very quickly. And within 100 days you
[49:56] can do a lot when you're having the AI
[49:57] fix the things that it finds. And so,
[50:01] um
[50:01] I you know, I I certainly I talk
[50:04] optimistic about it. I think a lot of
[50:06] other people do. Might encourage them to
[50:08] to find a little bit more balance in
[50:10] their commentary. Um
[50:12] but, you know, I I I also don't want us
[50:14] to ignore the realities
[50:17] that when you split the atom,
[50:19] okay? It can either provide unlimited
[50:22] free energy for the world and totally,
[50:24] you know, bring people out of darkness,
[50:27] right? Or it can be used to make a bomb
[50:28] to destroy cities and and nations.
[50:31] And so, like powerful technology is
[50:33] powerful technology. We can't just stick
[50:35] our head in the sand and act like it's a
[50:36] one-way street.
[50:38] Yeah.
[50:39] Well, thanks both of you for taking us.
[50:41] Sure.
[50:42] You're mentioning that you're seeing
[50:42] positive inference of growth drop
[50:43] dramatically. There's a lot I think
[50:45] Donald says that the cost of training is
[50:47] as 3X a year for every 100 billion
[50:49] instead of last year for every So, I'm
[50:50] just curious how you're you're balancing
[50:52] that out. But, it's getting cheaper
[50:53] certainly but that's what you're seeing.
[50:54] Can you tell me why that happens?
[50:56] I I think you you you see kind of a
[50:58] couple of phenomenons. One, the gear
[51:00] that's used for training turns into
[51:02] inference gear, right? So, those big
[51:04] clusters
[51:05] we're seeing that happen kind of all the
[51:07] time. So, um
[51:09] there there's a just a natural
[51:10] progression between kind of, you know,
[51:12] those two worlds. And then I I I really
[51:15] do think like the innovations
[51:16] [clears throat]
[51:17] that come, you know, from those larger
[51:19] models and those training clusters have
[51:21] such a large benefit and they tie back
[51:24] into, you know, what Brad said, right?
[51:26] You know, if if
[51:27] you know, I I I was just reading this
[51:28] thing today by Mustafa from from
[51:30] Microsoft, right? Saying, "Look, GPT-2,
[51:33] what what we have 50X kind of more
[51:36] powerful compute today than what we did
[51:37] when we did GPT-2.
[51:39] But, look at the capabilities, right?
[51:41] And so, I think you just have to kind of
[51:43] keep those two things in line with sort
[51:45] of the entire topic of the conversation.
[51:47] Um, you know, innovation's going to keep
[51:50] happening because, you know, Brad
[51:51] touched on a little bit, we're just now
[51:53] unleashing AI into designing these
[51:55] things, right? And there's things that
[51:56] we see and we learn in terms of
[51:58] optimizations and software and hardware
[52:00] optimization that we don't see. So, I
[52:02] continue to believe it'll come down. Um,
[52:04] but yeah, the the
[52:06] we're just working on a problem that's
[52:07] just very, very intensive from a compute
[52:10] standpoint. So, those numbers are going
[52:11] to be large.
[52:12] >> Maybe maybe one final question. You can
[52:16] pit dive with Sunny and I. We can answer
[52:18] some some questions after the fact and
[52:21] uh um but uh otherwise I just want to
[52:24] say it's a great privilege for us to get
[52:26] to spend some time with you guys. So,
[52:28] thanks for having us. But maybe right
[52:29] here?
[52:30] Yeah, hi. Thank you for sharing. Just
[52:32] one question about economics in your
[52:35] model.
[52:36] So, congrats on the uh major share with
[52:39] Nvidia and
[52:40] certainly you're part of that Nvidia
[52:42] block.
[52:43] So, what do you think is the long-term
[52:45] economic model for Nvidia because right
[52:48] now it's a giant giant business, right?
[52:50] And certainly buys and sells, you could
[52:52] say.
[52:53] GPU margin 60% net margin.
[52:56] And that makes it very difficult
[52:59] for the for the entire
[53:01] ecosystem because where do you think
[53:03] they can break through those walls?
[53:05] Um, what do you think what eventually
[53:07] would happen?
[53:08] Let's say a few scenarios. A,
[53:11] their value may be limited but they keep
[53:15] their market.
[53:16] And two, they somehow
[53:19] lower their market substantially
[53:21] and be able to keep
[53:23] a a much bigger market share. So, you
[53:26] know, I maybe just repeat it real quick
[53:28] and then I answer it since I'm an
[53:29] >> going to let I'm going to let you answer
[53:30] >> investor in Nvidia. I I work there, so I
[53:32] I shouldn't answer that question. I mean
[53:34] I mean, listen,
[53:35] Nvidia is a four and a half
[53:36] trillion-dollar company. It's trading at
[53:38] about 13 times earnings, very cheap,
[53:41] half the market multiple, growing at
[53:42] 70%, is obviously dominant you know, it
[53:46] it it in the market today. And um
[53:50] I think there's a wall of worry about
[53:52] Nvidia because everybody says what you
[53:53] does, you know, Trainium and you know,
[53:56] and TPUs and Cerebras and Groq and all
[53:59] these people can come up with inference
[54:00] solutions, they can steal your share,
[54:02] they can compete on price. That's the
[54:04] beautiful thing about capitalism. You
[54:06] know what Nvidia will have to do?
[54:08] Compete. They either deliver a product
[54:10] that people are willing to pay more for,
[54:11] they have to drop their price, their
[54:13] margins come down, and they'll have to
[54:14] compete in that market. I would tell you
[54:17] that when I look into the product
[54:19] roadmap um for what's going on at
[54:21] Nvidia, and the acquisition of Groq was
[54:23] part of it, I think they're going to be
[54:25] in an incredible position. They've
[54:26] already announced that they have a
[54:28] trillion dollars, a trillion dollars of
[54:30] sales over the course of the next eight
[54:32] quarters that are already booked. People
[54:34] have more demand than than than they can
[54:36] get memory and supply to build all of
[54:39] this compute out. So, I think we're so
[54:41] early in this. I was in Silicon Valley,
[54:44] you know, not so long ago, 16 years ago
[54:47] when they said there could never be a
[54:49] trillion-dollar company.
[54:51] There would never be a trillion-dollar
[54:52] company. I asked the question, well,
[54:53] why? They say, well, law of large
[54:55] numbers. I was like, well, what
[54:57] what stone tablet is that etched into?
[55:00] Okay? In nowhere. Okay, today we have a
[55:03] four and a half trillion-dollar company.
[55:04] I've already said publicly, Nvidia will
[55:06] be the first 10 trillion-dollar company.
[55:08] Okay? And I'm not It's not because I'm a
[55:10] cheerleader. I can sell my Nvidia and
[55:12] invest in anything that I want to invest
[55:13] in. But
[55:16] that company's leadership, that team,
[55:19] their lead the lead that they have on
[55:20] both training and inference, and the
[55:22] rate at which they're moving, right? I
[55:24] think puts them in a really great
[55:26] competitive position. And they're doing
[55:28] all this notwithstanding the fact that
[55:29] Trainium successful, TPU is successful,
[55:32] custom ASICs are being very successful,
[55:35] and they're still killing it. And I
[55:36] think that says a lot more about the
[55:38] size of the market for intelligence and
[55:40] the compute that's needed to get us
[55:42] there than it does about, you know, the
[55:44] individual company.
[55:46] With that said, we have to wrap. Apoorv,
[55:48] thanks for having us. Thank you. Thank
[55:50] [applause] you.