# 75th ECTC Wednesday Keynote

https://www.youtube.com/watch?v=cNP4bQhRsSE

[00:03] Good morning everyone.
[00:05] Good morning.
[00:07] A warm welcome to ECTC 2025.
[00:08] My name is Florian Herrault and I am your general chair.
[00:18] ECTC 2025 is a flagship conference of the IEEE Electronics Packaging Society.
[00:21] Today really marks the culmination of months of hard work and collaboration.
[00:29] We gather here today to express our gratitude to all of those who have made this event possible.
[00:35] First and foremost, I would like to extend a big thank you to our sponsors whose generous support has been instrumental in bringing this conference to fruition.
[00:44] Your commitment to our shared vision and your investment in our community have allowed us to create an exciting experience for all.
[00:56] And the monitors just went off.
[00:58] [Laughter]
[01:00] Thank you.
[01:05] And obviously I want to express my sincere appreciation to our exhibitors, who showcase cutting-edge technologies, innovative products, and groundbreaking solutions.
[01:16] Your presence adds tremendous value to this conference creating opportunities for engagement, networking and collaboration.
[01:24] I want to express my deepest gratitude to our esteemed speakers and panelists, who have traveled from far and wide to share their expertise and insights.
[01:32] Your wealth of knowledge, research and experience will inspire all of us, setting the stage for meaningful discussions and excellent learning.
[01:45] To the technical program committee volunteers:
[01:47] Your tireless efforts in reviewing our papers, organizing our sessions, and ensuring the highest quality of content have been instrumental in creating an outstanding program for attendees.
[02:01] And of course, a warm thank you to each and every one of you today.
[02:07] Your presence and active participation are the lifeblood of this conference.
[02:12] Your enthusiasm, engagement, and diverse perspectives will shape the discussions and interactions over the next few days.
[02:20] Please enjoy this excellent conference ahead.
[02:21] Let me give you a round of applause really quick.
[02:31] Before we start, a few logistical items.
[02:33] Please install the app.
[02:37] The entire program is up to date, obviously.
[02:39] We also have a nice messaging platform for networking.
[02:41] Finally, this is very important.
[02:44] After the keynote speech, we will have a Q&A session, and I will use the app and the live setting under the keynote session to be able to answer questions.
[02:59] Now, without further ado, it's my great honor to introduce our keynote speaker, Sam Naffziger from AMD.
[03:15] Really quick: Sam is SVP and corporate fellow at AMD, responsible for technical strategy and product architectures.
[03:26] He has been the lead innovator behind many of AMD's products.
[03:28] Thank you, Sam.
[03:33] Thank you, Florian.
[03:38] Well, I'm super pleased to be able to be here in front of you all.
[03:46] It was 2020, a very different time, and not in any place in particular, that I spoke last time. AI was barely on our radar, and now for many of us it consumes most of our waking hours. So that's what I have the privilege of talking to you about today: the challenges ahead of us in the AI era and,
[04:08] in particular, how we can efficiently meet the exploding demand for AI compute.
[04:16] It's the most exciting time ever in my 37-year career as a technologist.
[04:23] So the fundamentals that drive this demand for compute are that bigger models are smarter models, and the explosion in parameter counts and compute required to train these models has been going at an unprecedented rate.
[04:42] Now, you've seen graphs like this before, but it's such an extraordinary inflection that it's worth repeating:
[04:48] 4x to 5x per year.
[04:51] The much-vaunted, so-called Moore's law was about 1.4x per year, right?
[04:59] That's doubling every two years. Going at 4x to 5x per year is an extraordinary exponential.
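To put those rates side by side, here is a minimal sketch in Python. The function names are illustrative, and the five-year horizon is an arbitrary choice for the comparison, not a figure from the talk.

```python
def annual_factor_from_doubling(years_to_double: float) -> float:
    """Annual growth factor implied by doubling every `years_to_double` years."""
    return 2.0 ** (1.0 / years_to_double)


def cumulative_growth(annual_factor: float, years: float) -> float:
    """Total growth after `years` of compounding at `annual_factor` per year."""
    return annual_factor ** years


# Doubling every two years is sqrt(2), roughly 1.4x per year.
MOORE_ANNUAL = annual_factor_from_doubling(2.0)

if __name__ == "__main__":
    horizon_years = 5  # arbitrary illustration horizon (assumption)
    for label, factor in [
        ("doubling every two years", MOORE_ANNUAL),
        ("4x per year", 4.0),
        ("5x per year", 5.0),
    ]:
        total = cumulative_growth(factor, horizon_years)
        print(f"{label}: {factor:.2f}x per year, {total:,.0f}x over {horizon_years} years")
```

Over five years, doubling every two years compounds to a bit under 6x, while 4x per year compounds to roughly a thousand-fold.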
[05:06] And there's actually really no end in sight.
[05:11] And you can see that even this year, at the end of this chart, we are requiring an exaflop running for a year in order to train the most sophisticated frontier models.
[05:23] Now obviously, after a year's worth of training, that model has become obsolete by the time it's complete.
[05:28] So we need a lot more than an exaflop in a supercomputer to do that training.
[05:34] And this demand is continuing right through, so we can ask ourselves: is this a durable trend?
[05:47] We are all skeptical engineers.
[05:50] We look at this stuff; we've seen bubbles before.
[05:52] Perhaps we've gotten a little cynical.
[05:54] So that's the big question: are the economics that drive these trends sustainable and durable?
[06:03] So I want to run through just a few examples that have convinced me that that is the case.
[06:10] So I'll start with the medical field.
[06:14] And this is just barely starting to be tapped.
[06:17] It's in various headlines around diagnostics and patient consultations.
[06:23] But think, on the diagnostic front, of the potential for an AI that can ingest a patient's entire medical history, their medications, even their genome, compare that situation and their ailment to all the other patients with a similar profile and the outcomes of those patients' treatments, and provide a very specific diagnosis and medical recommendation.
[06:55] There is no doctor in the world that can perform that sort of analysis with that breadth of data and provide that specificity of a diagnosis.
[07:07] The power is immense, and so is the ability for robotics to impact surgical procedures.
[07:17] And then there is drug discovery, and this has already taken off.
[07:23] AI has tremendous capability for understanding the entire molecular search space of drugs, how they have impacted biological behaviors, their impact on diseases and pathogens, and which ones are benign and which have side effects, and then processing that to produce meaningful ideas on the particular type of drug and DNA that will treat an ailment.
[07:52] So medical applications are going to be immense.
[07:56] And then robotics, and we're really just getting started.
[08:00] You know, industrial robots have been a fixture for a long time, with very limited range of motion and very specific actions.
[08:10] But what's going to transform the field of robotics is combining the robotic mechanics with the ability to process sensor information in real time, and coupling that with a digital twin representation in the cloud where the robot is modeled and trained and can absorb the information and the training that's provided by other robots in other environments.
[08:43] And so to be able to respond in real time and make useful decisions based upon the sensor feedback is going to be completely transformative.
[08:51] Think of the economic impact of intelligent robots that can adapt and respond to their environment and perform tasks that humans perform, but do it 24/7.
[09:05] The opportunity is mind-boggling, and it all depends on AI.
[09:14] And the final field that I'll mention as a motivating, supporting statement for why AI is huge and durable is, of course, AI for science.
[09:23] You know, science is about making sense of the world of data with math and equations and applying that to physical problems.
[09:35] And that is a great description of what AI is great at.
[09:41] Think of absorbing climate patterns and understanding crop yields based upon soil types and the particular characteristics of the pests in that region.
[09:56] So in agriculture the opportunity is going to be huge.
[10:00] It's like having an omniscient farmer who understands a hundred years' worth of weather behavior and the response of his crops to particular fertilizers, and can come up with the exact right variant to plant for this year, the right pesticides to use, and the right genomic variant to use, and improve yields and durability of those crops.
[10:27] And in biomed and materials science the opportunities are amazing, and it's super exciting. At AMD we're involved on the leading edge in the supercomputing field.
[10:37] We sourced the hardware for the top supercomputers, and the way AI is transforming the high-performance compute space is amazing. So hopefully I've made a good case that the demand is durable and those exponentials are going to go as long as physically possible.
[11:00] So it's important then to understand how this AI field is evolving, and this is moving at a faster pace than I have ever experienced in my 37 years in the industry.
[11:14] I used to think I worked, you know, 10, 15 years ago, in an extremely fast, dynamic, high-speed environment.
[11:22] You know, technology constantly evolving, but it's nothing like we see today.
[11:25] So, I hope I can give you a couple of insights into trends that I see in the evolution of AI, and how they affect the way we plan our future technology development.
[11:36] So the first one to note is this large model training.
[11:39] The blue bar on the left is what's driven AI, the ChatGPTs and the Geminis, to achieve their amazing capabilities: ingesting the entire universe of internet data and training the model.
[11:59] But we've run out of data.
[12:02] The high quality textual data has been completely consumed.
[12:05] Now we can do multimodal models looking at photographs and videos but it's much lower quality.
[12:11] There's much less information per bit in that data set.
[12:13] So the way AI is getting smarter is through approaches called post-training and test-time compute, or chain-of-thought reasoning.
[12:29] And this is where models check each other, generate synthetic data, and iterate on a response and produce a much more thoughtful outcome.
[12:40] So, we've hit the data wall.
[12:42] Pre-training is reducing in importance.
[12:45] Does that mean that compute demands are going to diminish?
[12:49] The emphatic answer is no.
[12:52] The reason is the chart there on the right: the intelligence of these models, which is what makes them so valuable and indispensable to our future, improves with the log of compute invested.
[13:06] And of course that means exponentially more compute for a linear return in intelligence.
[13:12] And every increment of intelligence is hugely valuable.
[13:14] Going from grade school math to graduate-level understanding is an incredible gain.
[13:20] But it takes two or three orders of magnitude more compute.
[13:25] So the demand for compute is going to continue to escalate.
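The log relationship can be made concrete with a small sketch. It assumes a score of the form offset + slope × log10(compute); the function names and the slope and offset values are hypothetical placeholders, since the talk describes only the shape of the curve.

```python
import math


def score(compute: float, slope: float = 1.0, offset: float = 0.0) -> float:
    """Hypothetical benchmark score that grows with the log of compute."""
    return offset + slope * math.log10(compute)


def compute_multiplier(score_gain: float, slope: float = 1.0) -> float:
    """Factor by which compute must grow to raise the score by `score_gain`."""
    return 10.0 ** (score_gain / slope)


if __name__ == "__main__":
    # With slope = 1, each extra point of score costs one order of magnitude.
    for gain in (1, 2, 3):
        print(f"+{gain} score -> {compute_multiplier(gain):,.0f}x compute")
```

Under this assumed form, a fixed step in score always costs the same multiplicative factor in compute, which is why linear gains in capability translate into exponential growth in demand.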
[13:29] And what that's done is drive down costs.
[13:31] And this is what our industry is super great at, right?
[13:37] We refine our manufacturing processes.
[13:38] We generate higher and higher volumes, get the yields up, and costs go down.
[13:44] We use software and tuning and new revisions of the silicon.
[13:47] So each one of these lines is a constant intelligence level on a given benchmark.
[13:52] So the blue one, for example, is MMLU greater than 42%.
[13:55] And the cost has dropped by over 10x per year for the same level of intelligence which is amazing.
[14:07] And the paradox here is that being cheaper doesn't mean people consume less.
[14:12] Jevons paradox says that when a desirable quantity becomes cheaper, demand for it, in absolute dollar value of consumption, increases.
[14:26] It's sort of like when cars become a lot more fuel efficient: more gas is consumed, because people don't drive less; they actually take advantage of their higher-mileage cars and go further.
[14:39] And it's similar with this inference.
[14:39] So we have a huge and rapidly growing demand for AI.
[14:46] And the final thing I wanted to hit on here, which is quite fascinating to me, is the pursuit of AGI, or artificial general intelligence, which is kind of the pot of gold at the end of the rainbow, right?
[15:02] It's gotten a lot of airtime, and so we should ask ourselves: how do you know? People don't know how to measure this for sure, but we keep generating new benchmarks. LLMs are amazing at regurgitating knowledge, answering test questions, and identifying patterns.
[15:21] But we have found there are a lot of benchmarks that are easy for humans that LLMs are just stumped by.
[15:29] For instance, simply filling in a yellow square in the gap in the pink ones ends up being really hard for a lot of these models.
[15:37] Now the new test-time compute, chain-of-thought models can do better.
[15:42] And we get excited, but then we come up with a new benchmark.
[15:45] It's like, hey, you know, if it has three holes in the shape, color it green.
[15:50] If it has, you know, one, color it yellow.
[15:53] Humans are 100% on this.
[15:56] Easy to do.
[15:59] LLMs completely fail.
[16:02] The test-time compute models, like o1 from OpenAI, do a little bit better, but they cost a lot more.
[16:14] So this is the progression of intelligence capability.
[16:16] It's fascinating to watch, and we're going to continue to develop new models, and the LLMs get better and better as they solve these human-specific problems, as they become more humanlike and more useful to our world.
[16:32] But it all requires compute, and compute creates demand, and it creates economic activity.
[16:50] And as the final overview here of why we are in the midst of a very durable development in our industry, take these charts on the left. I mean, the dollar numbers on the right are huge, but it's really useful to look at the projections for capex spend for the top eight AI companies.
[17:10] And that's what's averaged out on those lines.
[17:13] And the orange line is the prediction over time from back in 2023.
[17:20] And it went up and to the right.
[17:23] It was big.
[17:26] But then in 2024, the average capex outlay for these companies was projected to be on the red line, and in 2025 it moved up to the blue line.
[17:36] So the projections for spend are actually increasing.
[17:40] Model usage, growth, inferencing, everything else: it isn't going down.
[17:45] It's only increasing.
[17:48] So, a super exciting place to be.
[17:50] So, now we got to get into the technology, right?
[17:52] What does it take?
[17:54] This demand is there.
[17:57] It's growing.
[17:57] Compelling applications.
[17:57] How do we source the demand?
[17:59] What are the challenges we as engineers have to solve to exploit this opportunity and all that AI can bring?
[18:07] The improvement here, year over year, is for just GPUs, and I focus on GPUs.
[18:13] It's something we develop at AMD and that I know well, and we've been doubling compute floating-point operation performance per year at a very consistent rate. That's quite an exponential, and it's been driven by better algorithms and implementations and, of course, the progression of silicon speeds and packaging technology, packing more and more silicon into a given package.
[18:43] Then we've been adding on to that lower-precision math, and this is critical, because AI can exploit fewer bits per number much more effectively than traditional scientific compute. Of course, an 8-bit number consumes half the bits of a 16-bit one, and so on down to 4 bits, and multipliers for these low-precision formats become super simple, just a few XOR gates rather than big arrays of devices.
[19:10] Right.
[19:13] So the software-hardware co-design to exploit these low-precision formats is a huge open search space that's produced tremendous value already.
[19:23] So the rates of improvement on flops are extraordinary, and we have to feed the compute with a consistent bytes per flop.
[19:37] So the HBM, which is the memory style of choice for most of these big-iron compute devices, has been feeding them, and we've been demanding, commensurate with the flop improvements, greater than 2x per two years. And actually, those low-precision math formats consume a little more than half the bandwidth per flop.
[19:59] So memory bandwidth demand is extremely steep.
[20:05] But despite the best efforts of our friends the DRAM vendors, the HBM stack itself has only been doubling about every four years.
[20:18] So of course that means, if we have twice the rate of doubling of bandwidth demand versus the memory itself, we have to put down a lot more stacks of memory.
[20:27] And so that's what you see happening in the industry.
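The stack-count consequence of those two doubling rates can be sketched in a few lines of Python. The doubling periods (two years for bandwidth demand, four years for per-stack bandwidth) are taken from the talk as round numbers; the function names and the chosen horizons are illustrative assumptions.

```python
def growth(years: float, doubling_period_years: float) -> float:
    """Growth factor after `years` when a quantity doubles every given period."""
    return 2.0 ** (years / doubling_period_years)


def relative_stack_count(
    years: float,
    demand_doubling_years: float = 2.0,  # bandwidth demand, per the talk
    stack_doubling_years: float = 4.0,   # per-stack HBM bandwidth, per the talk
) -> float:
    """How many times more HBM stacks are needed to keep bytes per flop constant."""
    return growth(years, demand_doubling_years) / growth(years, stack_doubling_years)


if __name__ == "__main__":
    for years in (4, 8, 12):  # illustrative horizons (assumption)
        print(f"after {years} years: {relative_stack_count(years):.1f}x the stacks")
```

With these inputs the required number of stacks itself doubles every four years, which is the pressure toward larger modules that the speaker goes on to describe.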
[20:29] And that means bigger and bigger modules to host these HBMs stacked up next to the silicon that consumes that bandwidth, creating ever bigger silicon packages that are more challenging to manufacture and yield.
And not surprisingly, the power increases commensurate with the flops and the bandwidth.
Despite the best efforts of our silicon technologist friends, the energy-per-operation improvement from each process node is diminishing.
And since we're doubling every two years, there's no way that could keep up, and our power has just taken a much steeper upward turn than in the past.
So we have to deliver power into these devices and we have to get the heat out, and the industry is moving towards direct liquid cooling and other cooling approaches because there's no other way to get the heat out.
So you may ask: why don't we just split these devices up? The memory isn't keeping up, so heck, let's just do a lot of small compute devices. They'll be a lot easier to cool, lower power, and we can distribute them out. That's a much smarter approach. Why are you guys going for just lumping huge modules together and packing everything in?
The answer is that the energy of communication increases exponentially with distance. That's really the fundamental.
So when we're on the left side of this chart and computing in the device, the energy to move a bit back and forth from the memory or from the cache to the floating-point unit and back, and to iterate on a result, is dramatically lower than iterating on a result across devices or in the rack.
So we have factors of tens and hundreds between different levels of communication.
So maximizing locality is the key to efficiency, and that factor is what is driving our focus on these big compute modules, tight integration, and the packaging technology to make that possible.
So the net takeaway is that power is limiting our performance. It's inexorably increasing as demand for AI and the compute goes up and as the technology scale factors cannot keep pace.
So you've seen all this in the news: gigawatt data centers, the renewed interest in nuclear power, and all aspects of power consumption. We're going to be up in multiple-gigawatt regimes in the future.
So let's dig into where that power is going and how we can bring it down. The data center power consumption is driven by, basically, compute, communication, and overhead.
We want the compute to consume the vast majority of the data center power if possible, but we always have to communicate between these devices.
And the reason we're building these giant data centers, with these big racks with tens to hundreds of huge GPUs connected together, is that it's the most efficient way to run these AI programs. They have to exchange data, and the more memory that can be shared between multiple devices, the more efficiently these models can iterate towards the final result.
That iteration requires communication, and that's the slice that you see there in the middle.
And then between racks is the scale-out network communication, which typically today is done with optical interconnect. And then we have the data center overhead on the top.
So our goal is to drive down each one of these elements: more efficient compute, more efficient communication, reducing overhead.
So the advanced packaging, and this is what I'll get into in more detail now, is just fundamental to how we can meet this demand in as efficient a way as possible.
2.5D is foundational. We've been doing that for many years. AMD pioneered this with HBM over 10 years ago, in the gaming space actually, before the AI era, but it has proven foundational to growing the AI compute.
3D is something I'll talk about more. Obviously, tighter integration of components with minimal distances and low communication overhead is going to be key for those huge modules I talked about.
Inevitably, we have to communicate between devices in that scale-up domain with the shared memory.
Electrical has gone far faster than I ever deemed possible. Ten years ago, if I had thought we would be discussing 200 or 400 gigabits per second over copper, I would have thought, yeah, you're crazy. It's just amazing how we've squeezed the copper and figured out how to signal over those lossy channels, but it is running out of steam, and optical is the future. So integrating this is completely essential.
And you saw that power curve taking a steep upward trend. Well, as power increases, current increases, and power delivery becomes a huge challenge. So there are a lot of exciting opportunities in that space, and I'll talk a little more about it.
And getting the heat out is one of the most exciting and fascinating challenges. With kilowatts being pumped into these modules, and the hotspots, the cooling technology is moving very rapidly.
So let me talk about each one of these areas in a little more detail. The first thing to note is the 3.5D and why it's so valuable to our industry.
The fundamentals boil down to the cost of communication, which is what I hit on before. It's why we're packing as much as we can into a module, right?
So on-package communication is pretty good. It's a lot better than going off package. 2.5D is even better: we can get a lot more bits per joule of energy.
But with 3D, we can dramatically reduce the energy cost of compute. And since power consumption is the currency of compute and the fundamental limiter to how far we can scale it, if we can exploit this capability, we're going to produce higher-performing, more efficient devices that the world can consume more of.
The other very attractive aspect of 3D is that we can pack more silicon in the module. Once again, locality is king, right? So if we can get more high-performance silicon co-located in a packaged device, we have a high-performing device, provided we can cool it and get the power in.
The ability to manufacture these large modules is constrained by stuff that you all know very well: warpage, yields, all of the factors that constrain how large we can produce these.
So if we can 3D stack our silicon, we can get more compute into that same footprint, with the same yield and the same constraints: about 2x the compute area.
So that's yet another motivator for going 3D: low communication overhead and higher density of compute.
So let me talk a little bit in detail. Now, this is last year's product. We're coming out with a new one to be announced in June, and it actually looks very similar to MI300 in construction. So this is very relevant to the latest and greatest AI GPUs from AMD.
So to review some of the capabilities we've packed in: we have partitioned the big compute device into multiple pieces of silicon. We did that because it improves yields and it improves modularity, so we can produce multiple product variants from the same tapeouts if we know how to connect them up efficiently.
So we can pack CPUs into here. We can pack compute devices in. Obviously, the eight HBM packages are critical to provide the memory bandwidth. We have cache and all of the interconnect components to communicate between multiple devices.
The stackup sandwich looks like this. It's remarkable how we've been able to pack so much into one device. And the cartoon on the lower left really can't do justice to what I consider just a beautiful cross-section there on the right of how all this comes together.
You can see the C4 bumps on the bottom, the microbumps above the CoWoS silicon interposer layer in the middle, and then you have the through-silicon vias for hybrid bond connecting the two die in the middle, with the silicon carrier on top.
The green layers in the left cartoon are the active silicon. Note how we have on top what we call the XCD. That's the compute device, which does all the math and generates the most heat. So we have the active devices for that XCD at the very top of the stack, right next to the heat spreader. Better for thermals.
So putting all this together, it's a very elegant construction that solves many problems. We have direct high-speed connections to the bottom die, which has the cache and the communication network on chip that then goes to the high-bandwidth memory, which is 2.5D connected.
So we can move a lot of data on this device in a very efficient way, and that's what I've plotted here. When you tabulate the connectivity and the bandwidth in the diagram on the right, we have terabytes per second communicating between the four IO die. And each of those eight XCDs can consume 2.1 terabytes per second vertically through the hybrid-bond through-silicon vias.
Communicating that much bandwidth through even an advanced-package 2.5D connection would cost almost 100 watts, which is what I've shown on the chart on the left, whereas it's less than 10 watts through hybrid bond stacking.
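Those power figures can be turned into a rough energy-per-bit estimate. The sketch below is only a back-of-the-envelope calculation: it assumes that "that much bandwidth" means the aggregate of all eight XCDs at 2.1 terabytes per second each, and it treats "almost 100 watts" and "less than 10 watts" as 100 W and 10 W. The names are illustrative.

```python
XCD_COUNT = 8                 # from the talk
BW_PER_XCD_TBYTES_S = 2.1     # from the talk, terabytes per second per XCD
POWER_2P5D_WATTS = 100.0      # "almost 100 watts", rounded up (assumption)
POWER_HYBRID_WATTS = 10.0     # "less than 10 watts", rounded up (assumption)


def energy_per_bit_pj(power_watts: float, bandwidth_tbytes_s: float) -> float:
    """Energy per transferred bit in picojoules for a given power and bandwidth."""
    bits_per_second = bandwidth_tbytes_s * 1e12 * 8
    joules_per_bit = power_watts / bits_per_second
    return joules_per_bit * 1e12


# Assumption: the quoted power covers the aggregate traffic of all XCDs.
AGGREGATE_BW_TBYTES_S = XCD_COUNT * BW_PER_XCD_TBYTES_S

if __name__ == "__main__":
    e_2p5d = energy_per_bit_pj(POWER_2P5D_WATTS, AGGREGATE_BW_TBYTES_S)
    e_hybrid = energy_per_bit_pj(POWER_HYBRID_WATTS, AGGREGATE_BW_TBYTES_S)
    print(f"2.5D: about {e_2p5d:.2f} pJ/bit, hybrid bond: about {e_hybrid:.3f} pJ/bit")
```

Under these assumptions the 2.5D link lands a little under one picojoule per bit and the hybrid-bonded path roughly ten times lower; the exact values depend on how the quoted wattages were measured, which the talk does not specify.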
This sort of power-efficient high bandwidth is what's going to continue to drive 3D stacking as a key enabler for AI compute.
Another thing I want to spend time on, because it's critical to understand, is that software has the ability to dramatically improve the power efficiency of our devices. We all work super hard designing the most efficient silicon, the most amazing packaging, and low-power interconnects, but if software abuses them, we're going to end up with power-inefficient compute.
And so what I wanted to show here is that we have the ability in this device to partition it into multiple segments, and those segments affect the cost of communication. If we can have data co-located on a fraction of this big module, so that, let's say, most of the communication occurs locally to the nearby HBMs and only necessary ancillary communication goes to the others, we have a lot more power-efficient compute.
These we call NUMA domains per socket, or NPS domains, and software has to be aware of the construction of the device in order for this to be effective.
And so to illustrate that a bit further, we have the pyramid of connectivity there on the right. You can see that compute devices are the top layer, in the gray boxes. Then they go to their first level of cache, second level of cache, last level of cache, and HBM. We do all of these caching levels to improve locality and amplify bandwidth as we move up. And that makes a lot of sense, because communication gets dramatically cheaper as we get closer to the compute devices, which is what I've shown there on the right.
So what we need to do is to inform software of this hierarchy and get it to map the matrix multiplies for given segments of the workloads to quadrants or smaller segments of this GPU device. And when we do that right, we can save over half the power.
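A toy model shows why the local share of memory traffic matters so much. Everything numeric here is hypothetical: the per-bit energies are in arbitrary units, the four-quadrant layout and the 90 percent locality figure are assumptions for illustration, and the model covers data-movement energy only, so it does not reproduce the talk's measured savings.

```python
LOCAL_ENERGY = 1.0    # hypothetical cost per bit to the nearby HBM (arbitrary units)
REMOTE_ENERGY = 3.0   # hypothetical cost per bit to HBM in another quadrant


def average_energy_per_bit(local_fraction: float,
                           local_energy: float = LOCAL_ENERGY,
                           remote_energy: float = REMOTE_ENERGY) -> float:
    """Traffic-weighted energy per bit for a given share of local accesses."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_energy + (1.0 - local_fraction) * remote_energy


def relative_savings(baseline: float, improved: float) -> float:
    """Fraction of data-movement energy saved relative to the baseline."""
    return 1.0 - improved / baseline


if __name__ == "__main__":
    # Data spread evenly over four quadrants: only 1 access in 4 is local.
    unaware = average_energy_per_bit(0.25)
    # Hypothetical NUMA-aware mapping keeping 90% of accesses local.
    aware = average_energy_per_bit(0.90)
    print(f"data-movement energy saved: {relative_savings(unaware, aware):.0%}")
```

The design point the sketch illustrates is that the savings come from the software's placement decisions alone; the hardware costs per bit are identical in both cases.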
And that's a huge gain. Think about what it takes to cut power by a factor of two through silicon engineering or package innovations: it's heroic efforts, hundreds of engineers, and exotic equipment, right?
So this highlights why the hardware-software co-design aspects of our effort are so critical. We have to work closely with the software folks so that they understand the hardware and can map the algorithms to it properly.
[34:57] So I'll look forward now a little bit.
[35:00] We have plenty of challenges as we see
[35:04] the growth that we have to feed in the
[35:06] future and power delivery is near and
[35:10] dear to my heart and I've illustrated
[35:12] here today's devices and you can see top
[35:15] right the topology on these big compute
[35:18] modules so these are you know kilowatt
[35:20] plus compute modules today and so we're
[35:24] talking multiple kilo of total current
[35:27] that has to go into the silicon device.
[35:30] The the uh stack of blocks on the left
[35:34] and right of those diagrams which are uh
[35:36] MI300, MI250
[35:39] um are the voltage regulators.
[35:42] And so of course, you know, if you look
[35:44] at the bottom, you know, those things
[35:46] are stacked left and right. Well, then
[35:47] they have to deliver the kils of current
[35:51] horizontally through the package. And
[35:54] there's resistance in the package. There
[35:55] is resistance in the bumps between the
[35:59] power blocks and the device and the BGA
[36:02] package and and all the other elements
[36:04] in the path. That costs I squared R
[36:07] power. So if current goes up by a factor of
[36:10] two, I squared R losses go up by a
[36:13] factor of four. And pretty soon we are
[36:16] in the untenable domain of of generating
[36:19] a ton of heat in the package and and
[36:22] wasting those precious watts which we
[36:24] really need to go to compute because
[36:27] power is what's limiting our growth in
[36:30] this industry. So we're going to have to
[36:33] move the power delivery underneath the
[36:35] compute devices. That's kind of the obvious
[36:38] and most direct route. If we can get
[36:40] those voltage regulators into the blue
[36:43] module or underneath the blue module, we
[36:46] have much lower resistance in the path
[36:48] to compute and we have better regulation
[36:52] of our voltage and and a more efficient
[36:54] compute device. And there's a lot of
[36:56] great and exciting work going on in this
[36:59] field. But it's uh it it's absolutely
[37:02] essential to continue our scaling.
[37:05] Thermals. So, I've already mentioned,
[37:07] you know, today's uh air cooling can um
[37:12] you know, we're managing to cool up to
[37:14] 1,000-watt devices with high-speed fans.
[37:18] And that's the on the on the left there.
[37:21] And I have to note that, you know, I I
[37:23] total up these rack level power
[37:25] consumption budgets and what we need to
[37:27] work on and fan power is a frighteningly
[37:30] large number, like uh 20% plus of the
[37:34] total rack power. So I talked about I
[37:36] squared R losses costing 10 to 20% of
[37:39] power. Fan power can do the same. So now
[37:43] we got 40% of our power just for
[37:46] delivering current and extracting heat.
[37:49] That's obviously not the way to build an
[37:52] efficient compute system. So that's
[37:55] what's driving the relentless move to
[37:58] direct liquid cooling, which has some
[38:00] power overhead from pumps and and
[38:03] condensers, but dramatically less than
[38:06] high-speed fans with uh huge heat sinks.
[38:11] and and so liquid cooling absolutely
[38:14] critical and it's it's a driver not only
[38:17] of enabling higher power modules but of
[38:21] improving the power efficiency of our
[38:23] data center because it it produces less
[38:26] waste heat.
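The arithmetic behind the power delivery and cooling points can be sketched in a few lines. The quadratic I squared R relationship and the roughly 20% plus 20% overhead figures come from the talk; the resistance and current values below are made-up placeholders, and the function names are hypothetical.

```python
def i2r_loss_watts(current_amps, resistance_ohms):
    """Resistive loss in the delivery path: P = I^2 * R."""
    return current_amps ** 2 * resistance_ohms


def compute_fraction(delivery_overhead, cooling_overhead):
    """Share of rack power left for compute after delivery and cooling overheads."""
    return 1.0 - delivery_overhead - cooling_overhead


# Placeholder path resistance (0.1 milliohm) and currents; not figures from the talk.
PATH_RESISTANCE_OHMS = 1e-4
loss_at_1ka = i2r_loss_watts(1000, PATH_RESISTANCE_OHMS)
loss_at_2ka = i2r_loss_watts(2000, PATH_RESISTANCE_OHMS)
loss_ratio = loss_at_2ka / loss_at_1ka  # doubling current quadruples the loss

# Upper-end figures quoted in the talk: about 20% for delivery, about 20% for fans.
left_for_compute = compute_fraction(0.20, 0.20)

print(f"loss ratio when current doubles: {loss_ratio:.1f}x")
print(f"fraction of power left for compute: {left_for_compute:.0%}")
```

Under these assumptions only about 60% of the rack power reaches compute, which is the motivation given for moving regulators under the device and for direct liquid cooling.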
[38:30] Then there's the optical interconnect
[38:31] requirement and I touched on this
[38:34] already but the the you know electrical
[38:37] the days of electrical are are indeed um
[38:40] in our rear view mirror at least by the
[38:43] end of this decade and and part of that
[38:46] is actually related to power. I mean we
[38:48] we can signal faster. It's just the
[38:52] distances we have to travel or we're
[38:54] able to travel with 400 gigabit per
[38:57] second are very limited. And that means
[39:00] we need to have retimer devices put
[39:02] down on the boards to boost the signal
[39:06] amplitude and and recover the signal.
[39:09] And retimers cost power. They cost as
[39:11] much power as the original driver on the
[39:13] device. And and so it becomes
[39:16] impractical to signal at those rates and
[39:18] very expensive as well.
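The retimer argument can be turned into a rough cost model. The only claim taken from the talk is that each retimer costs about as much power as the original driver; the reach and link lengths below are hypothetical, as are the helper names.

```python
import math


def retimers_needed(link_length_m, reach_m):
    """Retimers required when a signal can only travel reach_m before recovery."""
    segments = math.ceil(link_length_m / reach_m)
    return max(segments - 1, 0)


def link_power_watts(driver_power_w, link_length_m, reach_m):
    """Driver power plus one driver-equivalent of power per retimer (per the talk)."""
    return driver_power_w * (1 + retimers_needed(link_length_m, reach_m))


# Hypothetical numbers: a 1 W driver and an electrical reach of 0.5 m.
short_link = link_power_watts(1.0, 0.25, 0.5)  # fits within reach, no retimer
long_link = link_power_watts(1.0, 1.5, 0.5)    # three segments, two retimers
```

In this model the link power grows in steps with distance, so a link three times the electrical reach costs three times the driver power.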
[39:21] So optical is uh you know it's going to
[39:24] get cheaper over time but it's not cheap
[39:26] right now. It's it's you know fiber
[39:28] attached units are complicated and
[39:30] there's a lot of sophisticated devices.
[39:32] We need lasers in the system figure out
[39:34] where to put them. They have temperature
[39:36] limits. So there's thermal interactions.
[39:39] um a lot of exciting challenges to
[39:42] overcome with this technology and
[39:44] getting it into the volume where we can
[39:47] reliably produce millions of these and
[39:49] have them operate for years on the
[39:51] world's most complex problems. That's a
[39:53] it's a big challenge and it's something
[39:56] many of you in this room I'm sure are
[39:58] are engaged in. And I also view optical
[40:02] as a big opportunity for improving the
[40:04] efficiency of our compute modules by
[40:08] reducing the energy to communicate
[40:10] between devices even on the same large
[40:13] package.
[40:15] So lots of exciting work ahead in the
[40:17] optical domain.
[40:20] The other area that I see huge
[40:22] opportunity is continuing the move to
[40:25] reduce the memory bandwidth power
[40:27] consumption. So I I talked, you know,
[40:30] about how our demand for memory
[40:32] bandwidth is is continuing up and to the
[40:34] right, outstripping what the memory
[40:37] stacks themselves can produce.
[40:39] And the energy for memory is very high.
[40:42] If you remember that pyramid chart, you
[40:44] know, it's it's hundreds of times more
[40:47] than local connectivity through a 3D
[40:50] stacked hybrid bond interface.
[40:53] So we need to solve the problems around
[40:57] the 3D integration of high bandwidth
[40:59] memory and there are a lot of problems
[41:02] here. The thermals are one obvious
[41:05] challenge because not only are we
[41:08] increasing the absolute power of our
[41:09] module but now we're increasing the
[41:11] power density by putting more devices
[41:14] together and we are also of course
[41:16] creating power delivery challenges.
[41:19] Those currents I talked about are very
[41:22] high and all we have are through-silicon
[41:24] vias to deliver current, but the carrot
[41:28] at the end of the stick is that low
[41:31] energy per bit to communicate
[41:32] vertically. And so there's tremendous
[41:35] economic motivation to solve these
[41:38] problems and do these tighter and
[41:40] tighter levels of integration.
[41:45] So scaling up to meet the AI challenge
[41:48] is is the the the grand challenge of our
[41:52] day and the problems are are uh
[41:56] exciting. They are daunting and the
[41:59] amount of innovation in this field is
[42:02] escalating driven by the economic demand
[42:04] and the huge potential of this
[42:06] technology to transform our lives to
[42:08] improve the world in so many different
[42:12] ways.
[42:13] The growth of these compute modules is
[42:16] driving the need for much tighter 3D
[42:18] integration, higher bandwidth
[42:20] communication between them, the power
[42:22] delivery and the heat extraction. I
[42:25] personally am super excited about the
[42:27] opportunities ahead of us and the
[42:30] tremendous power of our industry to
[42:32] collaborate together and solve these
[42:34] problems for the betterment of our
[42:36] world. Thank you for the chance to speak
[42:38] today.
[42:44] [Applause]
[42:51] Thank you Sam for a very exciting talk
[42:53] and very inspiring. Thank you again.
[43:00] We are now going to take questions from
[43:02] the audience. There's a microphone in
[43:03] the middle and uh please if you want to
[43:06] use the app in the live section of the
[43:08] presentation I will be able to relay the
[43:10] questions to Sam. Come on. Thank you.
[43:19] >> Yeah. Thanks.
[43:22] >> Yeah. Thanks for the great presentation.
[43:24] Uh my name is Jun Yung from Corning. Um
[43:27] so we talking about the uh this bigger
[43:29] and bigger the packaging. Um so uh based
[43:34] on my understanding I don't think we can
[44:36] make package size bigger than rack size
[43:38] because we cannot accommodate in the
[43:40] system. So assuming the packaging size
[43:43] is getting bigger and bigger. Do you
[43:45] think that bigger package is still kind
[43:48] of the better way to improve the system
[43:51] performance or do you see any other the
[43:55] um kind of um the mass the way um to the
[44:00] improve the the uh the system
[44:01] performance better?
[44:04] >> Yeah, good good question. So I I tried
[44:08] to um communicate that hierarchy of the
[44:11] cost of communication and that's the
[44:15] fundamental factor driving the value of
[44:17] large packages
[44:19] and and of course the the bigger a
[44:21] package gets the higher the cost of
[44:24] communication from one side to the other
[44:27] and so there are diminishing returns
[44:30] right and and that's where 3D becomes
[44:33] particularly valuable because We can go
[44:36] vertical and we can get more silicon
[44:39] into the same size at lower cost of
[44:41] communication.
[44:43] And if we can get memory into that same
[44:45] vertical stack, we have an even more
[44:48] compact efficient compute module. And so
[44:51] to your question that you know is are
[44:53] there better ways than the size of the
[44:55] package? I yes because we we get the
[44:58] package beyond a certain level and it's
[45:01] no better than two packages with say an
[45:05] optical link between them. Right? So
[45:08] those are the sorts of problems we're
[45:09] going to have to grapple with because
[45:11] it's all about the efficiency of
[45:13] communication and how the software maps
[45:15] the problem to perform the compute task
[45:18] with the least energy possible. So we
[45:20] all need to be thinking about the energy
[45:23] of compute and the cost of communication
[45:25] in our choices around technology
[45:28] development because that's what's going
[45:29] to drive the value. So then can I
[45:32] interpret your the um answer as the
[45:36] future architecture more like uh the um
[45:39] the inter package um the communication
[45:43] with a 3D stack probably another kind of
[45:47] the future direction
[45:49] >> it's going to be a critical part of the
[45:50] future direction. Yeah. >> So at some point
[45:52] the this bigger packet the pack the the
[45:54] increasing package size will stop at
[45:57] some point because of the
[45:59] >> it will. Yes. Yeah. Yeah, we we will hit
[46:02] a point of diminishing returns where the
[46:04] the cost and the manufacturing
[46:06] challenges will not be worth whatever
[46:08] energy savings remain
[46:11] >> and that is based on the uh inter intra
[46:14] or inter package the efficiency should
[46:16] be better than on package energy
[46:18] efficiency or your direct
[46:20] >> correct correct yeah provided the
[46:23] software is smart enough to to map it
[46:27] >> thank you for the great question I'll
[46:30] ask one Really quick, Sam. Um, you
[46:32] talked about AI for medical
[46:34] applications, robotics, and science. Do
[46:38] we understand the target metrics and
[46:41] what the chip should look like five
[46:43] years from now to be able to, for
[46:44] example, have fully intelligent robots
[46:48] and where you need to go? What what is
[46:50] the energy per bit you need to achieve?
[46:53] What is the data rate?
[46:55] >> Yeah. No, that's that's a great one. I
[46:57] mean the field of robotics in particular
[46:59] humanoid robotics the technology
[47:02] challenges are are pretty immense right
[47:04] I mean there's the the actuators and you
[47:07] know the mechanics of these um these
[47:10] machines and you know their ability to
[47:12] sense their environment and then respond
[47:15] and interact and and tremendous
[47:18] developments going on there and there
[47:21] there's a division of the compute I
[47:23] think what you're getting at between
[47:24] what's colocated in the device and
[47:27] what is in the cloud. So, we can't
[47:30] possibly include in the device all the
[47:32] compute required for it to respond to
[47:36] all of the millions of sensory inputs
[47:38] it's receiving and and the history that
[47:41] it has to have about how to respond to
[47:44] those inputs. So, that's why we need
[47:46] that digital twin in the cloud in some
[47:49] gigawatt data center somewhere, right?
[47:51] that is um that has a much more uh
[47:54] comprehensive view of the robot's
[47:56] capabilities, environment, history and
[47:57] and how it should operate. So the um the
[48:01] the division you know it's going to be
[48:03] fluid the division between but there
[48:05] there are absolutely needs for low power
[48:09] um highly efficient edge computing
[48:11] capabilities in the devices flexible um
[48:14] FPGAs are a very attractive compute
[48:17] target for uh robotics
[48:20] and you know big kilowatt GPUs not so
[48:24] much uh so so that's going to be the
[48:28] realm of the cloud aspect of compute but
[48:30] uh yeah the the developments are are
[48:33] progressing really rapidly it's an
[48:34] exciting space
[48:36] >> thank you go ahead
[48:38] >> yeah uh James Lu, a professor from RPI
[48:43] um
[48:45] you know you talk about the 3D or
[48:48] 2.5D integration I have been working on this
[48:52] more than 25 years and um actually this
[48:57] year marks 20 years I coined the hybrid
[49:02] bonding concept
[49:04] and uh but you know we will solve all
[49:07] the problem for 3D integration and
[49:11] reduce the cost but probably some
[49:14] problems harder. Can you mention one of
[49:17] the probably most challenging problem
[49:20] with the 3D integration that may take
[49:23] time to solve?
[49:26] Yeah, that's that's fine. Uh,
[49:28] you know, we've been delivering this in
[49:30] high volume now for uh like about four
[49:34] years with the the V-Cache solution
[49:38] initially targeted for our desktop um
[49:40] CPUs.
[49:42] >> And that was a a real good initial step
[49:46] into 3D because it was an optional
[49:49] performance enhancement to the CPU,
[49:52] right? So we still had a great product
[49:54] without the 3D cache stacked on top, but
[49:58] we could charge more and provide higher
[50:00] performance for devices that used it.
[50:02] And so that allowed us to um gain a lot
[50:06] of experience in the manufacturing
[50:08] aspects. And some of them related to
[50:10] yield, some were, you know, device
[50:12] degradation effects because we have to
[50:14] thin the silicon significantly. Um, the
[50:17] thermals were pretty exciting because
[50:19] with the thin silicon, the the heat has
[50:23] much less opportunity to laterally
[50:25] diffuse than with a nice thick 800
[50:28] micron chunk of silicon, which is a
[50:29] great thermal conductor, but you cut
[50:31] that down to 10
[50:32] >> or less. So, thermals were definitely
[50:35] and will continue to be one of the big
[50:38] challenges there. and and we have
[50:41] technologies around fine grain thermal
[50:42] sensors and the ability to uh monitor
[50:46] and adapt to uh thermal excursions by
[50:49] reducing activity levels and such. The
[50:53] um the the other big challenge besides
[50:57] heat extraction is you know as I
[50:59] mentioned in the talk is the power
[51:01] delivery right so we have these very
[51:02] thin copper pillars our silicon devices
[51:06] love to have their own Each IP each
[51:11] wants its own voltage rail.
[51:13] >> That's right.
[51:13] >> If you force everybody to the same
[51:15] voltage, you end up with a lot of
[51:16] inefficiency. So now we want to chop up
[51:19] the power delivery into sub dozens of
[51:22] voltage domains and that means less
[51:24] copper for each one of them. Right? So
[51:26] how we generate the the power with
[51:30] minimal loss
[51:31] >> to the device with these high currents
[51:33] is is I believe one of one of the bigger
[51:36] uh challenges as well. So power delivery
[51:40] and thermal management.
[51:43] Thank you.
[51:43] >> Not not to mention yields and uh
[51:46] assembly and test and uh
[51:48] >> well you probably you guys can work out.
[51:53] I'm from university so
[51:56] >> thank you.
[51:56] >> Yeah thanks for the question then
[51:58] actually can we just uh talk about this
[52:00] cross-section for a second? What an
[52:02] amazing stackup right that we've seen.
[52:05] Um can you talk a little bit about um as
[52:08] a fabless company the strength of your
[52:11] relationship with semiconductor foundry
[52:13] >> and how how you both road map the future
[52:16] effectively.
[52:17] >> Yeah. Yeah. No, that that's a great
[52:19] story and I do believe it's you know our
[52:22] industry has been disaggregating and far
[52:26] fewer vertically integrated companies
[52:28] than um in the past when I started my
[52:30] career and and so we end up with
[52:33] extremely strong technology providers in
[52:36] the packaging and silicon space test and
[52:39] assembly
[52:41] and AMD's fabless, we're mostly uh design
[52:44] engineers but we have an extremely
[52:47] strong group of folks, some of them here
[52:49] today, who interface regularly with the
[52:53] um with our our semiconductor partners.
[52:57] And and as our uh manufacturing volumes
[53:01] have increased over the years, we've
[53:03] gotten a lot more attention from from
[53:05] these important suppliers and and so we
[53:08] can provide input about how they need to
[53:10] shape their next generation technology
[53:12] because they trust our voice to be um
[53:15] representative of what the industry will
[53:17] require in the future. So the um it ends
[53:21] up being a very close collaboration
[53:23] which is the only way these things work,
[53:25] right? you know, hard problems are
[53:27] solved by groups of engineers
[53:29] collaborating together, sharing data,
[53:32] bouncing ideas and solutions off. So,
[53:34] we've actually been able to develop that
[53:35] kind of a close collaborative
[53:37] relationship with these leading edge
[53:40] suppliers, you know, TSMC, ASE, Amkor,
[53:43] all the um the OSATs. It's been
[53:45] extremely productive.
[53:46] >> Yeah. Very nice to keep the ecosystem
[53:48] going.
[53:50] It's all about the ecosystem.
[53:52] >> Okay.
[53:54] from to university the you talking about
[53:57] the stacking of the HBM onto the compute
[54:00] chip. Okay. Uh the basically the uh DRAM
[54:05] performance is uh very strong rapidly
[54:08] improved by introducing the HBM due to the
[54:10] parallel data processing inside the HBM.
[54:14] Okay. But uh even use the HBM the output
[54:18] data of the HBM uh converted from the
[54:22] parallel to serial. So the eventual uh
[54:26] data and with limited this parallel uh
[54:30] to serial conversion.
[54:33] uh so the if we can uh use a parallel
[54:37] data from the HBM directly then we can
[54:42] uh improve the performance very much. So
[54:44] the uh when you stack the HBM onto the
[54:48] computer chip. Uh it is possible to uh
[54:52] use such kind of the parallel data as it
[54:56] is in the computer
[54:58] chip. In that case then you need some
[55:00] special computer architecture to uh
[55:04] effectively use the parallel data from the HBM.
[55:09] >> I I think um what you're getting at is
[55:12] compute in memory and the opportunity
[55:15] there
[55:16] >> blurring the boundary between the
[55:18] compute device and the memory device.
[55:20] >> Yeah. And the 3D integration is very
[55:23] very very effective to use the parallel data.
[55:26] Okay. So that if you can use this
[55:29] parallel data uh directly in the
[55:32] computer chip then we can reduce the uh
[55:37] power consumption.
[55:38] >> Yeah. Yeah. No that's um absolutely
[55:42] blurring the boundary between compute
[55:44] device and memory device is an active
[55:46] area of research and and a big
[55:48] opportunity.
[55:50] And you know, memory devices aren't very
[55:52] well suited to energy efficient compute,
[55:54] but that compute can be right next to
[55:56] the data bits that feed it. And so there
[55:59] is there is opportunity processing in
[56:01] memory. The probably the biggest barrier
[56:04] to exploiting that is is software. And
[56:08] one thing I have learned the hard way
[56:10] over many years as a as a hardware
[56:12] engineer and designer is software is
[56:15] actually harder than hardware.
[56:19] to get the software to change and adapt
[56:22] to a new compute paradigm is is a
[56:24] monumental task. Um but there there is
[56:28] sufficient motivation to drive these uh
[56:31] approaches to exploit more near-memory
[56:33] compute or in-memory compute. So that is
[56:36] is absolutely going to be a direction
[56:40] >> that is a good theme of the ECTC
[56:47] system. How do you co-design? That's
[56:49] very important.
[56:51] >> Yeah, co-design is critical. Thank
[56:53] you.
[56:53] >> Yeah, thanks.
[56:54] >> Um would you like to answer a question
[56:57] about the stock market?
[56:59] >> We have the inevitable.
[57:01] >> Very nice question. Fantastic talk. Um
[57:05] obviously investment is critical for
[57:08] companies like AMD and and others to
[57:10] create continue to create innovate and
[57:12] exist. Um with the power of AI uh how
[57:16] can uh we understand the stock market
[57:18] predictions uh volatility is I guess
[57:23] important there. Um any comments on the
[57:27] future of the stock market?
[57:29] >> Can can it uh improve stock trading and
[57:31] and balance the stock market? Or will it
[57:34] destroy it?
[57:35] >> Why is AMD stock price so low?
[57:40] >> Yeah, I don't think artificial
[57:42] intelligence is any better than humans
[57:44] at uh stabilizing the stock market. But
[57:47] um yeah, it's it's it's pretty
[57:50] interesting to see and that's why I
[57:51] included the the durability of the
[57:53] growth patterns in the talk because you
[57:56] know investors are trying to make sense
[57:58] of this. Is is it real? Who are the
[58:00] winners? Who are the losers?
[58:02] And no one really knows, right? Um I I
[58:06] think, you know, we know, you know, as
[58:08] as a company at AMD, it's my job to set
[58:11] the the strategy and and the investments
[58:14] that we make, those long-term bets for
[58:17] the future. And so understanding those
[58:19] trends, making the right bets, and then
[58:22] sticking with it. So stock market
[58:24] fluctuations are noise. Uh I, you know,
[58:28] we've made decisions well informed. we
[58:31] understand the trends and and you low
[58:33] pass filter all of that stuff. It'll
[58:35] work out in in the end and but it all
[58:38] does come down to to the economics and
[58:40] so it's fundamental.
[58:42] >> Thank you. The Quinn question from the
[58:44] audience.
[58:45] >> I'm Krishna Chaitri from Qorvo. So you
[58:48] mentioned multiple times about the
[58:50] criticality of thermal management to
[58:53] make this technology viable and my
[58:55] question is um what level of innovation
[58:59] is required to make this happen? Does it
[59:02] does it really require for us to learn
[59:05] how to manage the thermal conductivity
[59:08] or it requires some breakthrough
[59:10] innovation to come up with some new
[59:13] materials or super cool fluid something
[59:16] like that. So your thought on that?
[59:19] >> Yeah. No, good good question. So you
[59:21] know, yeah, what what technological
[59:22] innovations as with anything just like
[59:25] you know, pushing electrical signaling
[59:28] from what seemed, you know,
[59:30] extraordinarily fast at 10 gigabits per
[59:32] second all the way up to 100, 200, 400
[59:35] gigabit per second across copper wires.
[59:38] that is a process of continual
[59:42] improvement and evolving the technology
[59:45] learning from the uh iterations of
[59:48] development
[59:50] and and we're going to have the same
[59:51] thing with thermals now that you know
[59:53] we're way beyond I mean 20 years ago I
[59:57] was at the bleeding edge pushing the
[59:59] envelope of our CPU servers to 130 watts
[01:00:02] people thought that was ridiculous 130
[01:00:04] watts is like this
[01:00:07] cooling limit
[01:00:08] Well, now we're, you know, kilowatt plus
[01:00:11] and we're going to keep pushing that
[01:00:12] higher. I've I've seen tremendous um
[01:00:16] progress there. And some of it comes
[01:00:18] down, you know, liquid cold plates, you
[01:00:21] know, how fast do you pump the fluid
[01:00:22] through? What temperature is that inlet
[01:00:25] water or do we use refrigerants?
[01:00:28] uh local thermal sensors like I
[01:00:30] mentioned to get the device to be more
[01:00:32] resilient to thermal excursions because
[01:00:34] you know it's the downstream device that
[01:00:36] gets the worst of it right the first guy
[01:00:38] preheats the fluid and you know if you
[01:00:40] got multiple and then data center
[01:00:42] variation so so the more adaptive our
[01:00:44] silicon can be the more resilient it is
[01:00:47] to heat issues then there's the
[01:00:49] reliability of the device itself we we
[01:00:52] have to you know improve our tolerance
[01:00:54] to those things
[01:00:56] but for I you
[01:00:58] My my I think in this industry our
[01:01:00] horizon is no more than five years or
[01:01:02] we're kind of fooling ourselves, right?
[01:01:03] You know, I I see I see optical on the
[01:01:06] five-year horizon big time. I see
[01:01:08] continued 3D integration as critical.
[01:01:11] And then on on thermals, I I think we're
[01:01:13] at the point where we we continue to
[01:01:15] improve what we know. um microfluidics
[01:01:18] have proven very very difficult to
[01:01:21] deliver a uh a significant improvement
[01:01:24] over direct liquid because of pressure
[01:01:26] drop issues and and other you know flow
[01:01:30] challenges. So I I think pushing what we
[01:01:32] know today and improving all those
[01:01:34] dimensions through cycles of learning
[01:01:36] we're we're going to we're going to
[01:01:38] solve this for the next five years.
[01:01:40] Thank you.
[01:01:42] >> On a similar topic, in terms of
[01:01:44] technology advancements,
[01:01:46] um it seems like non-silicon
[01:01:49] technologies are not on your radar at
[01:01:51] all. Do you think the door is closed to
[01:01:54] new semiconductor materials and
[01:01:56] transistors at this point with silicon
[01:01:58] being so far ahead?
[01:02:01] >> Yeah. Yeah. you know, graphene and uh
[01:02:04] other device car. It it seems like those
[01:02:08] keep getting pushed out. You know, the
[01:02:11] just like gallium nitride was the
[01:02:14] technology of the future for most of my
[01:02:17] career and still still is out there and
[01:02:20] and we're integrating, you know,
[01:02:23] tremendous number of exotic
[01:02:25] materials and metals and such to extract
[01:02:29] more out of the baseline silicon device.
[01:02:32] the um you know metal oxide and you know
[01:02:35] the new interconnect materials. So but
[01:02:38] yeah there there is nothing that I have
[01:02:40] seen in my limited five-year horizon
[01:02:42] that says that silicon isn't going to be
[01:02:44] the thing over the next 10 years.
[01:02:46] Yeah,
[01:02:47] >> it's a it's a big challenge for the for
[01:02:49] the industry u beyond packaging as well.
[01:02:52] Uh we have some really nice pointed
[01:02:53] questions, technical questions uh on the
[01:02:56] app. So u I'm going to ask a couple but
[01:02:58] please come to the stage if you'd like
[01:03:00] to ask as well. Um, one of them is, uh,
[01:03:03] GPU-to-GPU stacking. Um, is this
[01:03:06] something that is on AMD's horizon to
[01:03:08] increase compute density?
[01:03:10] >> Yeah, that's a good one. We have
[01:03:12] certainly looked at it and, you know,
[01:03:16] it's obviously super attractive. So, you
[01:03:18] take a, you know, a GPU or compute
[01:03:20] device and stack it on top of another,
[01:03:22] um, the cost of communication drops. All
[01:03:25] the goodness of 3D, we get a higher
[01:03:27] density. Um, the the obvious problems
[01:03:31] are the ones that I just talked about,
[01:03:33] right? Power delivery and heat
[01:03:34] extraction. And you can run the math and
[01:03:39] if if you cut the power of each one of
[01:03:41] those devices in half, your power
[01:03:43] density and um didn't um didn't worsen.
[01:03:48] But in order to cut the power in half,
[01:03:50] we had to reduce the operating voltage.
[01:03:52] Right? So if your nominal voltage was 0.8
[01:03:53] and you drop it to 0.6 you you get you
[01:03:56] know substantial power reduction your
[01:03:58] current went up by you know 33%. So
[01:04:02] power delivery got worse but your heat
[01:04:04] extraction um is is largely unchanged
[01:04:07] although you have some more thermal
[01:04:09] resistance in the path. Um and and I
[01:04:12] could see someone exploiting that sort
[01:04:14] of a of a compute stack. Um, now the the
[01:04:18] barrier in my world is we've already
[01:04:22] played the voltage reduction card to get
[01:04:25] the power uh get the energy per
[01:04:27] operation as low as possible in today's
[01:04:30] devices. So um it's going to be really
[01:04:32] tough to do that again um in order to
[01:04:35] stack two devices but but it's something
[01:04:37] that you know perhaps new uh CFET
[01:04:40] technologies or gate-all-around would
[01:04:42] would enable in the future. So, it's
[01:04:44] worth evaluating.
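The stacking arithmetic in this answer can be checked with a short calculation. The 0.8 and 0.6 voltages and the halving of per-device power are from the talk; the normalization to a baseline power of 1.0 and the function names are assumptions, and the quadratic voltage scaling is the usual CV squared f approximation for dynamic power rather than anything the speaker stated.

```python
def stack_current_ratio(v_nominal, v_reduced, power_scale_per_device, devices=2):
    """Total current into the stack relative to one device at nominal voltage.

    Baseline power is normalized to 1.0, so baseline current is 1 / v_nominal.
    """
    baseline_current = 1.0 / v_nominal
    stacked_current = devices * power_scale_per_device / v_reduced
    return stacked_current / baseline_current


def stack_power_ratio(power_scale_per_device, devices=2):
    """Total stack power relative to one device at nominal voltage."""
    return devices * power_scale_per_device


def dynamic_power_scale(v_new, v_old):
    """Approximate dynamic power scaling with voltage at fixed frequency (V^2)."""
    return (v_new / v_old) ** 2


current_ratio = stack_current_ratio(0.8, 0.6, 0.5)  # two devices at half power each
power_ratio = stack_power_ratio(0.5)                # footprint power is unchanged
v2_scale = dynamic_power_scale(0.6, 0.8)            # voltage alone gets close to half

print(f"total current: {current_ratio:.2f}x, total power: {power_ratio:.2f}x")
```

With two devices each at half power, total power in the same footprint stays flat while total current rises by about a third, which matches the 33% figure quoted. Under the V squared approximation, dropping from 0.8 to 0.6 alone scales dynamic power to roughly 0.56, so a modest additional reduction would be needed to reach a full halving.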
[01:04:47] >> Great, great answer. Um, we've talked
[01:04:50] about power and thermal and and energy
[01:04:52] efficiency. Um, do you truly believe
[01:04:55] that the the scaling of energy
[01:04:57] efficiency will keep up with AI models
[01:04:59] or you'll always be racing against the
[01:05:02] newer and newer models? And
[01:05:04] >> yeah, that if you if you extrapolate
[01:05:07] that chart I showed with the scatter
[01:05:09] plot, um, it is it is kind of
[01:05:12] frightening. So we, you know, we set
[01:05:15] very aggressive energy efficiency
[01:05:17] improvement goals at AMD and and strive
[01:05:19] to meet those and and generally have.
[01:05:23] But I I did do some math to just look
[01:05:26] five years out on that chart and you
[01:05:29] know, we've got great efficiency
[01:05:31] improvements. We feel pretty good, but
[01:05:33] that trend is going to dramatically
[01:05:35] outstrip the efficiency gains by like
[01:05:37] orders of magnitude, right? So it's a um
[01:05:42] you know how does that balance out? Um I
[01:05:46] yeah no one knows for sure. I the the
[01:05:50] one thing that's certain is the more
[01:05:52] efficient we can make our devices you
[01:05:54] know reducing that cost of communication
[01:05:56] improving integration optical all this
[01:05:58] stuff um the the bigger the market will
[01:06:01] be because power is what's going to
[01:06:03] limit these big deployments. uh and all
[01:06:07] those all the projections are that yeah
[01:06:09] the uh our our efficiency gains will not
[01:06:12] be able to fully keep up. So we're going
[01:06:14] to be steering more of our worldwide
[01:06:17] grid power to compute which is good for
[01:06:19] us as an industry. Right. But uh I got
[01:06:23] it.
[01:06:23] >> That's right. I think we we'll stay busy
[01:06:25] for the next few years for sure.
[01:06:27] >> Yeah.
[01:06:27] >> Um let's take one more question from the
[01:06:29] audience.
[01:06:30] >> Yeah. Yeah. My name is Chin from Taiwan.
[01:06:34] Thanks for the exciting lecture about
[01:06:35] this amazing topic. My question is what
[01:06:38] from on one page you mention about the
[01:06:41] power consumption. AMD, your company,
[01:06:44] do a great job for the GPU HBM power
[01:06:47] consumption reduce. What's your idea and
[01:06:50] plan for the other portion to reduce the
[01:06:53] power consumption?
[01:06:55] >> Which other portion?
[01:06:57] >> Yeah, something like the system overhead
[01:06:59] or the Yeah. scale up.
[01:07:01] >> Yeah. No, that's that's great. the and
[01:07:05] the the other portion. So the in in like
[01:07:08] the the the rack the scaleup domain has
[01:07:13] um switch switches network switches that
[01:07:17] connect the multiple GPUs together. So
[01:07:19] we can't directly connect 100 GPUs you
[01:07:22] know with 100 wires. We we need to go
[01:07:25] through a switch to get the the
[01:07:27] bandwidth of communication. So those
[01:07:29] those switches uh consume a lot of power
[01:07:32] as well. So that's definitely a target
[01:07:35] area: improved switch design, and that
[01:07:37] goes hand in hand with interconnect
[01:07:39] technology. So making those switches
[01:07:41] optical is very attractive. And then there's
[01:07:44] the scale out switches which today are
[01:07:47] Ethernet. We're moving to Ultra Ethernet
[01:07:50] and improving the efficiency there, and
[01:07:51] power management approaches for this
[01:07:54] communication because these workloads go
[01:07:56] through phases. They go through compute
[01:07:58] phases for tens of milliseconds to
[01:08:01] seconds and then they exchange all the
[01:08:04] weights and activations to go to the
[01:08:06] next phase of compute. And so if we have
[01:08:08] power management, perhaps software
[01:08:10] informed, that can recognize those
[01:08:12] phases, power off the switches during
[01:08:14] compute phases, and then reduce power
[01:08:17] in compute while switching, then we end
[01:08:19] up with an average power reduction with
[01:08:22] no loss of performance. So that's
[01:08:25] one example of approaches balancing
[01:08:27] communication and compute. And then
[01:08:29] there's the power delivery overhead.
[01:08:32] It's like 20 plus percent. There's the
[01:08:34] cooling overhead where fans are super
[01:08:37] wasteful. We want to go to direct
[01:08:38] liquid. And then there's the
[01:08:41] networking, like I discussed. So
[01:08:43] those are the big focus areas for us.
[01:08:45] >> Thanks.
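
The phase-aware power management described in this answer can be illustrated with a small time-weighted average. This is only a rough sketch: the wattages, the phase durations, and the names `ACTIVE_W`, `IDLE_W`, and `average_power` are assumptions made up for illustration and do not come from the talk. It assumes the switch is throttled during compute phases and the compute is throttled during exchange phases, so neither phase takes longer than before.

```python
# Hypothetical power levels in watts; illustrative values, not from the talk.
ACTIVE_W = {"compute": 1000.0, "switch": 200.0}  # block is busy
IDLE_W = {"compute": 400.0, "switch": 20.0}      # block is throttled


def average_power(phases, phase_aware):
    """Return the time-weighted average power over (kind, seconds) phases.

    kind is "compute" or "exchange". Without phase awareness, both blocks
    stay at active power the whole time. With it, the switch is throttled
    during compute phases and the compute is throttled during exchanges.
    """
    energy = 0.0
    total_time = 0.0
    for kind, seconds in phases:
        if kind not in ("compute", "exchange"):
            raise ValueError(f"unknown phase kind: {kind}")
        if not phase_aware:
            power = ACTIVE_W["compute"] + ACTIVE_W["switch"]
        elif kind == "compute":
            power = ACTIVE_W["compute"] + IDLE_W["switch"]
        else:
            power = IDLE_W["compute"] + ACTIVE_W["switch"]
        energy += power * seconds
        total_time += seconds
    return energy / total_time


# Assumed workload: 0.8 s of compute followed by 0.2 s of exchange, repeated.
workload = [("compute", 0.8), ("exchange", 0.2)] * 10
baseline = average_power(workload, phase_aware=False)
managed = average_power(workload, phase_aware=True)
print(f"baseline {baseline:.0f} W, phase-aware {managed:.0f} W")
```

With these made-up numbers the average drops from about 1200 W to about 936 W while the phase durations stay the same, which is the "average power reduction with no loss of performance" idea in miniature. Real savings depend on how quickly switches and compute can change power state, which this sketch ignores.
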
[01:08:46] >> Yeah. Thank you. Yeah. I mean it's
[01:08:48] really fascinating, right? We talk about
[01:08:50] optics, we talk about thermal
[01:08:52] mechanicals, power, thermal management,
[01:08:55] electrical interconnects, truly
[01:08:57] multidisciplinary, which is really a
[01:08:59] testament to this audience here. Um,
[01:09:03] as we wrap up, I'm curious how
[01:09:05] you like to describe AI, AI
[01:09:08] innovation, and the future of AI to, um,
[01:09:13] you know, the general public. So not
[01:09:15] people who have advanced degrees but
[01:09:18] 99.9% of the folks out there. And, um,
[01:09:23] yeah, how do you make them gain
[01:09:24] confidence in models and in the
[01:09:27] answers and all of that. Can you comment
[01:09:28] on that?
[01:09:29] >> Yeah, that's a great question. I
[01:09:30] do get asked that a lot. Um, and it's
[01:09:33] not, you know, I view AI as just another
[01:09:36] tool like, you know, computers back in
[01:09:39] the 60s or, you know, the internet in
[01:09:42] the 80s and 90s. And, you know,
[01:09:47] AI is yet a more powerful way to extract
[01:09:51] information and to share information
[01:09:52] that humans have generated. So, it can
[01:09:55] be used for good or for ill is the way I
[01:09:58] describe it. Um, the potential for good
[01:10:01] is huge, just like I think we'd all say
[01:10:03] the internet has produced great
[01:10:05] value for humanity and that knowledge
[01:10:07] sharing. So it's a powerful tool,
[01:10:11] but we need to be careful with it, and
[01:10:12] I'm not going into the ethics
[01:10:14] and security aspects of AI, but they are
[01:10:16] real and significant. So yeah,
[01:10:20] it's like the yin and yang, right, of
[01:10:22] technology development, but, uh, overall I
[01:10:25] view it as a force for good. Well,
[01:10:28] thank you so much. Uh we are going to
[01:10:30] wrap up now. Thank you Sam for an
[01:10:32] excellent talk and a great Q&A.
[01:10:38] Thank you everybody. We will have the
[01:10:40] technical session starting in about uh
[01:10:42] 15 minutes.
