# Optica Executive Forum 2026: Complete Keynote Richard Ho

https://www.youtube.com/watch?v=GAE4AdwZQEA

[00:00] Highlights from the Optica Executive Forum at OFC.
[00:02] Thanks to these sponsors.
[00:20] We're releasing seven sessions from Optica's Executive Forum because the world is watching.
[00:27] Feedback encouraged.
[00:27] Now, enjoy.
[00:31] Thank you very much, Elizabeth.
[00:33] I hope you are happy and satisfied with having 600 executives in front of us today.
[00:38] It is time to start.
[00:39] And there is no better way to start than to bring to the stage the first member of the Optica Executive Forum Planning Committee for him to bring the keynote.
[00:50] Mark Feiler is a person that despite his age is an industry veteran.
[00:53] He worked for most of the GAFA companies and now he's bringing photonics to Oracle.
[00:58] Mark, thank
[01:01] you very much for being with us today.
[01:06] Good morning, everyone.
[01:06] Glad to be here today.
[01:08] I've had the pleasure of serving on the planning committee for the last few years here at the Executive Forum.
[01:15] And as we were kicking around ideas for keynote, I mentioned the possibility of Richard Ho from OpenAI.
[01:22] And the committee was extremely excited about that.
[01:23] So, I was pretty delighted when I reached out and he gladly accepted.
[01:27] So, I won't belabor it much longer.
[01:31] We have an expertly produced intro video for him that's coming up and then he'll join the stage.
[01:37] So, thanks, Richard.
[01:40] Richard Ho leads hardware at OpenAI, where he works at the intersection of AI models and the compute systems required to run them at scale.
[01:50] His focus is not just building bigger infrastructure, though, but co-designing hardware, memory, interconnect, and system architecture around the demands
[02:01] of frontier AI workloads.
[02:05] He helped build Google's TPU program across multiple generations,
[02:11] contributed to the Anton supercomputers at D.E. Shaw Research,
[02:16] and earlier co-founded Zero In Design Automation, a pioneer in chip verification tools, later acquired by Mentor Graphics, now Siemens.
[02:28] Richard brings a rare combination of AI infrastructure, high-performance computing, and chip design expertise at a moment when computer architecture has become central to being competitive.
[02:43] Please welcome Richard Ho.
[02:51] I want to thank the academy for... wrong award show.
[02:54] I think wrong award show.
[02:55] Okay.
[02:58] Let's go to the real one.
[02:58] First of all, thank you to Mark.
[03:00] Thank you to Elizabeth, to the Optica
[03:03] organizing committee, and thank you all for being here.
[03:06] I know you had a choice this morning.
[03:07] You could have been up a few hundred miles north listening to Jensen Huang, but you're here.
[03:11] So, thank you.
[03:14] I want to spend the next 30 minutes or so telling you a little bit about how we at OpenAI think about AI hardware, where it's going, and why, as Elizabeth said, optics, optical communication, and photonics technology in general are going to be critical for this infrastructure moving forward.
[03:36] It's interesting that we, I think, live in this very privileged time right now, where there is basically an exponential increase in the capabilities around us.
[03:46] Things are changing, I think, faster than the human mind can actually comprehend, right?
[03:53] ChatGPT was introduced just 3 years ago, and we're seeing a huge amount of change already.
[03:58] Just in 2025 alone, OpenAI introduced a number of models,
[04:04] going from a better base foundational model with greater foundational intelligence,
[04:09] going through longer context, coding capabilities, into reasoning, where we had long chains of thought, long inference paths, where the model can actually think of different solutions, pick among them, and come up with a better solution overall.
[04:26] We even moved into agents, right, which is the hot buzzword of today.
[04:30] But we moved into agents, where you have things that are able to use other tools to get the job done.
[04:34] So, the agents are able to call out, executing a computer program, going out through the web, and doing other things in order to get a workflow accomplished.
[04:48] And then towards the end of the year and into this year, Codex was introduced, which is the latest coding agent.
[04:55] And I don't know if you've used Codex, but if you haven't and your organization hasn't, I highly recommend you do.
[05:02] This is a competitive advantage, and it is
[05:04] something that is altering the entire engineering ecosystem, starting in software, but even in my team, we're seeing it affect hardware as well at this point, right?
[05:16] So, the capabilities are growing and they're getting a lot better there.
[05:19] The key message here is that AI growth is not just about larger models.
[05:24] For the last 2 years, we've been talking about scaling: larger models, trillions of parameters, a lot more compute.
[05:33] You've seen the build-out in data centers in the news a lot.
[05:35] But really, what's going on now is that we're getting to this capability stage, where the agents need longer context, they need to be able to reason a lot more, and you need to be agentic.
[05:45] And there's a lot that that implies for the infrastructure, which is what we'll go through today.
[05:50] So, what is the net effect of this?
[05:52] Well, even in 2025, we're already seeing user growth being very, very large, right?
[05:59] There are now over 900 million weekly users of the chatbot alone.
[06:04] What we're seeing in terms of enterprise is a huge growth, right?
[06:06] So, an eightfold increase in the number of enterprise-level messages going back and forth to these models.
[06:15] And if you look at the number of reasoning tokens being used by these enterprises, as I said, things like the agents and the reinforcement-learning-based rollout of chains of thought, that has grown over 320x.
[06:28] That's enormous, because we started from a pretty big base and we've grown that to a really, really large amount here.
[06:37] So, it is, I think, a real challenge for us in the infrastructure business, that includes all of you, to be able to grow and meet this incredible scale that we're seeing, which, as I said earlier, is exponential, which means this is only going to keep going up, right?
[06:50] So, what are we going to do about it?
[06:54] Let me talk a little bit in detail about what this actually means in terms of the infrastructure.
[06:59] So, we start off with longer context.
[07:02] And what does longer context mean, right?
[07:03] Context is the thing that the agents use.
[07:05] It's kind of memory. It's your prompt.
[07:08] It's everything that's remembered and is relevant to the question that's being asked to it.
[07:13] With longer context heading to 128k, 512k, a million tokens and more at this stage, it means that your KV cache, which is the key-value cache that they use to do most of the analysis, is growing.
[07:26] And the effect of this is to increase the amount of memory that you need.
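To make the link between context length and memory concrete, here is a minimal sketch of the usual back-of-the-envelope KV cache estimate: one key vector and one value vector per layer, per KV head, per token. The model shape (80 layers, 8 KV heads, head dimension 128, 2 bytes per value) is a hypothetical chosen only for illustration and is not from the talk.

```python
def kv_cache_bytes(context_tokens, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Bytes of KV cache for one sequence.

    The leading factor of 2 counts one key and one value vector
    per layer, per KV head, per token.
    """
    return 2 * num_layers * num_kv_heads * head_dim * context_tokens * bytes_per_value


# Hypothetical model shape, for illustration only (not from the talk).
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128

for tokens in (128_000, 512_000, 1_000_000):
    gib = kv_cache_bytes(tokens, LAYERS, KV_HEADS, HEAD_DIM) / 2**30
    print(f"{tokens:>9,} tokens -> {gib:7.1f} GiB of KV cache per sequence")
```

Under these assumptions the cache grows linearly with context length, so going from 128k to a million tokens multiplies the per-sequence memory by roughly eight.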
[07:31] With reasoning, there's a lot more of what they call runtime thinking.
[07:37] So, for every thought that it's thinking, it's basically doing a bunch of inference.
[07:42] And so, that bunch of inference is generating inference-time tokens, and it's causing a lot more work.
[07:47] It's causing a lot more flops, more operations, required per prompt that it's given.
[07:53] With tool use, we're getting a lot more in terms of round trips back and forth from the user, going out to a tool, coming back, getting some kind of reply, getting an adjustment to what it's actually thinking about, and then
[08:05] doing additional work there, which causes stress on the network,
[08:09] on all of the storage and all of the orchestration that traffic causes.
[08:13] And finally, with agents and with the kind of coding capabilities we're seeing, what we're seeing are sessions now.
[08:22] We're seeing users working on a problem through multiple iterations, working on an entire coding problem there.
[08:29] And the effect this has on the infrastructure is that it is causing our clusters to become large, requiring a lot of parallelism, and it's causing a lot of reliability concerns, right?
[08:42] So I think one of the main things is that as these clusters are growing larger, they go down a lot, because the failure rate for any one component is small, but when you multiply it by the number of components in that system, it becomes quite noticeable and quite worrying actually, right?
[08:58] So these are some of the infrastructure issues and concerns that we need to address in order to get useful intelligence over to the user.
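The reliability point about large clusters can be sketched with a simple series-system model. This assumes identical, independent components with constant failure rates, and a system that goes down when any single component fails; real clusters have redundancy and correlated failures, so treat it as an illustration only. The component MTBF of one million hours is a hypothetical value.

```python
def system_mtbf_hours(component_mtbf_hours, num_components):
    """MTBF of a system that fails when any one of N identical,
    independent components fails (constant failure rates assumed)."""
    system_failure_rate_per_hour = num_components / component_mtbf_hours
    return 1.0 / system_failure_rate_per_hour


# Hypothetical per-component MTBF: one million hours (roughly 114 years).
COMPONENT_MTBF_HOURS = 1_000_000

for n in (1_000, 10_000, 100_000):
    hours = system_mtbf_hours(COMPONENT_MTBF_HOURS, n)
    print(f"{n:>7,} components -> system MTBF of about {hours:,.0f} hours")
```

In this model the per-component failure rates add up, so the system MTBF shrinks in proportion to the component count even when each part is individually very reliable.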
[09:07] So scaling, right?
[09:10] So in the past, when we talked about scaling, there were these things called scaling laws.
[09:14] So, for those of you who are familiar, right?
[09:16] Back in, I think, 2017-ish or so, OpenAI published a paper that basically showed that the amount of intelligence that a model has scales proportionally with the amount of compute you give it.
[09:29] And the more compute you give it, the more intelligent capability you get.
[09:34] This graph is from Epoch AI.
[09:37] They've basically been charting this, and it shows that the amount of training compute that's been used has been growing roughly 5x a year.
[09:45] And if you think about it, 5x a year is much faster than Moore's Law, right?
[09:50] So, it's a lot, and we're having to deal with this gap between where Moore's Law is giving us compute scaling and what we need for training.
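A quick sketch of how fast that gap opens up. The 5x per year figure is the one quoted in the talk; the Moore's Law rate used here (2x every two years) is an assumption, taken as the classic rule of thumb, and the talk itself says the real rate has slowed.

```python
TRAINING_GROWTH_PER_YEAR = 5.0  # figure quoted in the talk (Epoch AI trend)
MOORE_DOUBLING_YEARS = 2.0      # assumption: classic "2x every two years"


def moore_factor(years):
    """Transistor-scaling factor after `years` under the assumed doubling period."""
    return 2.0 ** (years / MOORE_DOUBLING_YEARS)


def training_factor(years):
    """Training-compute growth factor after `years` at 5x per year."""
    return TRAINING_GROWTH_PER_YEAR ** years


def gap(years):
    """How much faster training compute has grown than Moore's Law alone."""
    return training_factor(years) / moore_factor(years)


for y in (1, 2, 4):
    print(f"after {y} year(s): training x{training_factor(y):,.0f}, "
          f"Moore x{moore_factor(y):.1f}, gap x{gap(y):,.1f}")
```

Whatever closes that gap has to come from somewhere other than transistor scaling: more chips, more efficient numerics, and the system and interconnect work the rest of the talk covers.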
[09:59] But the message I'm coming here with today is that it's beyond training now.
[10:04] As I've been indicating, training is still large, it's still 5x,
[10:08] but we need a lot more inference compute and we need a lot more runtime compute.
[10:12] And that actually causes an even greater stress throughout the entire infrastructure.
[10:18] So, what are the four bottlenecks that actually determine how much we can scale?
[10:24] So, the first and most obvious one that a lot of people talk about is really the flops or the operations and I've called it numerics here because I think a lot of it is really based on the numerics.
[10:34] So, we've gone from supercomputers that were kind of like float64 and we've gone down to smaller and smaller data formats, because ML and AI could, and so it's good.
[10:47] We're down to FP8, eight bits per number, and even down to FP4, which means that the amount of compute you can get per operation just blows up, right?
[10:57] Increases by that factor there.
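As a rough illustration of that factor, the sketch below counts how many values of each format fit in the bits of one FP64 value, and how much memory a model's weights would need in each format. The trillion-parameter count is hypothetical, and actual hardware throughput does not necessarily scale exactly with bit width.

```python
# Bits per value for the formats named in the talk.
FORMAT_BITS = {"FP64": 64, "FP8": 8, "FP4": 4}


def values_per_fp64_slot(fmt):
    """How many values of `fmt` fit in the bits of one FP64 value."""
    return FORMAT_BITS["FP64"] // FORMAT_BITS[fmt]


def weight_bytes(num_params, fmt):
    """Bytes needed to hold `num_params` values stored in `fmt`."""
    return num_params * FORMAT_BITS[fmt] // 8


NUM_PARAMS = 1_000_000_000_000  # hypothetical trillion-parameter model

for fmt in FORMAT_BITS:
    tb = weight_bytes(NUM_PARAMS, fmt) / 1e12
    print(f"{fmt}: {values_per_fp64_slot(fmt):>2}x values per 64 bits, "
          f"{tb:.1f} TB of weights")
```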
[10:59] Which is really good, but that's not the only thing that matters, and I think that if you look at some of the marketing from GPU makers and other people, they do focus a lot on that number
[11:09] because it's good for marketing, but what we actually see when we run these systems is that we don't get that number, right?
[11:16] So, they publish a peak flops number, but we get much, much less than that in reality when we run the models.
[11:22] Now, why is that?
[11:25] The reason is that the system is not fully efficient and can't utilize all of those flops available.
[11:31] It's because of these three other things here.
[11:34] We have the memory, all right?
[11:35] So, memory bandwidth is important.
[11:38] HBM is critical.
[11:40] And if you've been watching memory market prices, you'll know that it's becoming even more critical as everybody's demanding more and more HBM.
[11:47] We're currently about to introduce HBM4 into the systems.
[11:49] HBM4E is on the horizon.
[11:53] HBM5 is being defined.
[11:55] I think that will continue growing, all right?
[11:57] And we'll get tens of terabytes per second of memory bandwidth coming out of that, which is good.
[12:00] The energy usage there remains a first-order limit, meaning these memories are kind of power hungry.
[12:08] They're not really that efficient, and we
[12:10] got to work on that.
[12:12] It's for a different conference.
[12:13] We'll do that elsewhere.
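One common way to reason about the gap between peak flops and what a workload actually achieves when memory bandwidth is the limit is a roofline-style estimate. The sketch below is that standard model, not something presented in the talk, and the peak and bandwidth figures are hypothetical.

```python
def attainable_flops(peak_flops, mem_bandwidth_bytes_per_s, flops_per_byte):
    """Roofline estimate: throughput is capped either by peak compute or by
    how fast memory can feed the compute units at a given arithmetic intensity."""
    return min(peak_flops, mem_bandwidth_bytes_per_s * flops_per_byte)


# Hypothetical accelerator, for illustration only.
PEAK_FLOPS = 2e15      # 2 PFLOP/s peak at a low-precision format
MEM_BANDWIDTH = 8e12   # 8 TB/s of HBM bandwidth

for intensity in (1, 10, 100, 1000):  # flops performed per byte moved
    achieved = attainable_flops(PEAK_FLOPS, MEM_BANDWIDTH, intensity)
    print(f"{intensity:>5} flop/byte -> {achieved / PEAK_FLOPS:6.1%} of peak")
```

With these numbers, any kernel doing fewer than 250 flops per byte moved is bandwidth-bound, which is one reason the published peak number is rarely what a real model sees.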
[12:16] Power and cooling is super important,
[12:18] right?
[12:19] So, these large computers in the data centers are taking megawatts and up to gigawatts.
[12:21] And so, we got to work on that.
[12:23] But what I'm here to talk about today is the other part of the system efficiency, it's the network, right?
[12:28] Both our scale-up and our scale-out, and now our scale-across all matter.
[12:34] And every little bit of picojoule per bit savings that you can get for me and for us is going to matter, all right?
[12:41] So, those are the things that really matter, and you need to really work on all of these.
[12:44] You can't do one alone, which I think is what the industry has focused on a lot in the past.
[12:49] You really have to attack all of these.
[12:52] So, why is this happening?
[12:55] So, back to kind of logic,
[12:57] IRDS, the International Roadmap for Devices and Systems.
[13:00] So, what this graph shows you basically is four things.
[13:03] One, Moore's Law is not dead,
[13:07] but it has slowed down a lot.
[13:10] We've gone basically from planar
[13:12] structures to 3D structures.
[13:15] And now we're starting to go towards heterogeneous integration of structures, right?
[13:18] So, we're co-packaging chiplets, co-packaging IO chiplets and stuff like that.
[13:23] And so, what it means is that the amount of focus on what you get on a single chip or chiplet matters less and it's about the integration of the entire system.
[13:35] You can see that basically energy scaling has stopped.
[13:36] It basically flattens out.
[13:38] And there's not much to be had there.
[13:41] So, you need to get to systems, and then you need to get to the interconnects and the overall logic in order to continue scaling these systems here.
[13:54] In essence, right?
[13:54] AI infrastructure is turning into a very very large distributed computer, right?
[13:59] I'm sure most of the people here remember the supercomputer TOP500 lists that used to be published; national labs and research centers used to publish these.
[14:10] Well, today's computers, I think, extend
[14:15] far beyond that in terms of scope.
[14:18] If you actually compare these in terms of the amount of compute they do, I think these would put most of the top 50, top 10, maybe even the top computer to shame for the amount of compute that our AI systems are actually producing.
[14:29] And so, what do we have here, right?
[14:31] So, it's a distributed computer.
[14:32] We have scale-up that basically does the fine-grain synchronization within a small pod.
[14:40] Small can be relative, right?
[14:41] It can be actually pretty large.
[14:43] We have the scale-out that basically pulls these together within a data hall or a small campus.
[14:47] And then we need to scale across, across data campuses.
[14:49] All of these start bringing in communication reach, bandwidth, and power.
[14:59] So, the real change I think that we're seeing is this movement towards long-reach optics, right?
[15:08] Long reach in the past has been safer on the optics side.
[15:12] On the short-reach side, it's been more efficient and remains
[15:17] useful to use copper.
[15:19] We're starting to see some of that change.
[15:20] We'll talk about it in the next couple of slides, but when we build our systems right now, most of our short-reach, in-rack communication is through copper, where it's just simpler, cheaper, and very reliable.
[15:32] And the real question is, you know, once you try to get beyond the rack, what do you do, right?
[15:35] And I think pluggables, modules, eventually CPO might come into that, and we can talk a little bit about where and why and how that can happen, right?
[15:45] Okay.
[15:46] So, here's the way I see it, the way we kind of see it over at OpenAI.
[15:51] It's the reach ladder here, basically talking about where things make sense: within the rack, within the data center, within the campuses.
[15:59] We're seeing the areas where the optics come in, and we're seeing the copper.
[16:05] And we would like to see copper keep going, actually, to be honest, right?
[16:08] So, I think there's a lot to be said about keeping it simple, keeping it cheap, keeping it reliable.
[16:16] So, copper will continue.
[16:18] We are currently, I think, in that 224 gig per lane transition right now.
[16:25] It is, I think, going up pretty smoothly.
[16:28] We want to see the 448 gig chip path continue.
[16:32] There are challenges there, but I think the technology is going to get there, right?
[16:37] We're looking at things such as CPC, right?
[16:40] Co-packaged copper, and other things there.
[16:44] The reach is shorter on the 448, but it's still okay, right?
[16:47] Our main thing is that we want to keep the rack on passive cabling.
[16:52] We don't want to have retimers in the rack if we can avoid it because of the energy cost there.
[16:58] And so, we want to keep everything direct.
[16:59] And so, at 448 with co-packaged copper, you can probably still keep it within the rack if you design your rack well.
[17:07] Signal integrity is very important.
[17:10] So, I would say five or six years ago when I was doing TPU work, right?
[17:13] We were doing chip design with in essence just the computer architects and just
[17:20] the RTL people and the physical design people.
[17:22] These days when we do our chip design, we have the signal integrity people, we have the comms people, we have optics people involved, and that's really important here because signal integrity is a key indicator of whether that system is going to operate and whether that system is going to be reliable and stay up during operation.
[17:41] So here are all the tradeoffs.
[17:43] I won't go into all the details here.
[17:45] But the main message is that we hope that we can drive copper as far as it can drive because it is still cheaper and simpler.
[17:55] But optics is coming.
[17:55] Optics is almost here in fact, right?
[17:58] And it enables the scale that we need to get to those large computers I was talking about earlier, right?
[18:04] There's pluggables, there's NPO, near package optics, and co-packaged optics.
[18:10] They all have their little strengths and weaknesses, their pain points.
[18:15] And where are we now, right?
[18:17] So I think today we're definitely using the pluggables.
[18:18] I think near package optics
[18:21] is right on the horizon, right?
[18:24] So all of our system designers are kind of evaluating in that area now, and we're looking very hard at CPO and how that can integrate into our systems.
[18:33] The key message here is that it's not a one or zero,
[18:36] right?
[18:37] It's that there is a range of adoption, of bringing the optics closer and closer into the compute.
[18:42] And I think systems developers, systems designers are doing it in incremental phases, right?
[18:49] So we will do it to the level that we think is appropriate, that gives us the right cost benefits, the right system capability benefits, and the right kind of reliability benefits that we can keep up.
[19:04] Co-packaged optics is definitely on the way.
[19:08] And so that's, I think, the focus of our work and a lot of our attention right now, hence our presence here at Optica and OFC.
[19:13] So, when we think about CPO, about optics, we're really making a two-way trade-off.
[19:20] And I have been kind
[19:22] of stressing this in our last few slides, but let me really pound this in.
[19:27] For us, it is a question about the integration of the optics with the silicon and the operational part of it, right?
[19:34] So, the key thing for us is keeping things operational in the data center.
[19:39] So, we want to make sure that things are replaceable, that the MTTR, the mean time to repair, is very short, and that the reliability level is super high.
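Why a short MTTR matters can be shown with the standard steady-state availability formula, availability = MTBF / (MTBF + MTTR). The sketch below uses hypothetical numbers; neither the MTBF nor the repair times come from the talk.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state fraction of time a system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical: a large system that sees a failure every 10 hours on average.
SYSTEM_MTBF_HOURS = 10.0

for mttr in (0.1, 1.0, 4.0):  # hours needed to swap the failed part and resume
    print(f"MTTR {mttr:>4.1f} h -> availability {availability(SYSTEM_MTBF_HOURS, mttr):.1%}")
```

When failures are frequent, as they are in very large clusters, the repair time dominates how much of the hardware is actually doing useful work, which is why field replaceability carries so much weight in the trade-off.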
[19:51] The way we're seeing it today is that the pluggables are easy to operate, but the bandwidth density is not as high as we would like it.
[20:00] We would like to get to NPO and CPO where the bandwidth density is much higher, but today we see the operational serviceability of that as being, you know, a little bit hard.
[20:10] And that's an area that we want to see addressed.
[20:11] We want to basically bring the CPO and NPO graphs over to the left side here and make it easier to operate.
[20:20] That's the main thing.
[20:22] Right now, the way we see it is that the CPO becomes
[20:24] compelling when the front panel density
[20:26] gets too high, which it is getting there
[20:29] right now, and when the electrical reach
[20:31] stops, you know, penciling out. So, we
can't design with that electrical
[20:36] reach anymore. Sorry, I said optical, I
[20:37] meant electrical.
[20:39] So, the winning architecture for these
[20:41] system designs is going to be that
[20:42] trade-off, right? Between
[20:44] getting the capability you need and
making sure that when the systems are put into the
data centers, they actually
stay up, are field replaceable, and we
can keep the intelligence rolling
through the models.
[20:58] So,
[21:00] what are the problems, right? So, we
know the benefits. CPO's benefits are
reach, density, and power.
[21:05] What are the things that we're worried
[21:06] about? So, let's go through some of
[21:08] them, right?
One of the main things is what is the
laser strategy for the CPO? Because that's
the main light source. So, is it
[21:16] integrated? Is it external? I think
[21:18] there's a lot of problems with it being
[21:20] integrated. It's easier on design, for
[21:22] sure, but the thermals are very
[21:24] challenging. The failure rate, we think,
is pretty high. On the external side,
that's good, you know, your mean time to
replacement is pretty fast,
but we think it uses a lot more power
[21:34] and it's harder on the system design.
[21:37] Yield and test. You know, we want to
[21:39] basically be able to yield these
[21:40] devices. So, it has to be pretty high.
We want to be able to get it in there
with a pretty reliable final
test and be able to get it
operational.
[21:49] And then the serviceability, you know,
can you hot swap it? CPO could be
pretty hard to swap, right? If
[21:55] your CPO chiplet fails, are you going to
[21:57] lose the entire TPU with all of that
[21:59] very expensive HBM memory on it? That
[22:01] could be a bit of a problem, right? So,
[22:02] how do we deal with that?
We want to be able to isolate faults.
We want to be able to figure out if
there's a better serviceability story
there, and I think that's a real
challenge for this industry to
solve, actually.
[22:15] Thermals are important, as I said. You
[22:17] know, we spend a lot of time now
[22:18] designing thermals both for the chip and
[22:20] for the system. And of course, once you
[22:22] have an optical engine in there, the
[22:24] thermals have to be much more tightly
[22:25] controlled and so that becomes an issue.
So, does that mean that we can't
co-package, or does it mean that we have to
co-package and manage the thermals
somehow? It is a pretty hard challenge,
and so, you know, I'm not sure that's a
fully solved problem at this point.
[22:38] And then, when we're manufacturing our
[22:40] trays, it turns out that, you know, when
[22:42] you're trying to stack a lot of these
[22:44] servers, right? These XPU or GPU
[22:46] servers, along with the network
[22:48] switches, along with the CPU hosts,
[22:51] you want to squeeze it in. It's about a
[22:52] 44 OU kind of height there, roughly. You
[22:56] can go a little higher, but not too much
higher, because the data centers have
some limits in terms of the
elevators and the door heights and OSHA-type
restrictions on what you can and can't
have people move. And so you want to get
[23:10] as much into that single rack as you
can. So you want to try to fit in the
slimmest rack design you can for
your XPU. And if you have optics on it,
well, how do you route that, right? How
do you make it routable so that it is
compact but manufacturable quickly
and repairable? So those are the kind
of issues that we see as being
challenges
to making that happen.
And then of course, just
generally, I think it's
great to see this conference, to see all
these people here. The ecosystem is
[23:35] maturing. I definitely see that. There's
[23:37] a lot of startups, a lot of activity,
which is all good. But we would like to
see some of this mature,
have second sourcing on stuff, and
be able to rely on it. And we'll
talk a little bit about that in a few
slides as well.
[23:52] So, scale up, scale out. The way we see
it, there are basically two
fundamental paths to this, two
fundamental approaches. You can go fast
and narrow, which I think the
majority of the discussions
have been about. Or you can go slower
and wider. You can go even slower and even
wider, right? We're looking at all of
[24:07] these.
Obviously with the SerDes and the kind
of standard fast and narrow, it's
moving along nicely.
We're waiting to see some of the slower
and wider technologies come
on. For
fast and narrow, we think it is about
five picojoules per bit. Got a slide
coming up to show why.
And then we think
that if we go slower and wider,
you can get better
[24:33] energy efficiency. And energy
[24:35] efficiency, as I was saying earlier, is
[24:37] the most important thing. You have to
[24:38] scale this thing. It takes a lot of
power and you want to try to maximize
your usage per joule and per
picojoule of energy that you're given
there.
This is why we think
fast and narrow will hit an
energy wall, hit a power wall there. So
what we're seeing here is kind of the
module consumption for silicon
photonics,
Mach-Zehnder based, for FRO, LRO,
and LPO.
[25:07] We think that it's likely to hit a wall
around 5 pJ/bit at a 400G fast-and-narrow
solution. And so, you know, it'd be
[25:15] great if we could actually break through
[25:16] that wall cuz we want to get well under
[25:18] 5 pJ/bit end to end.
[25:21] So, where do we see that going? Well,
considerations, right? So, we think,
and we can talk about this
because this is what this forum is
[25:30] for, really. We think we want to get
[25:31] under 2 pJ/bit including the IO.
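The difference between the roughly 5 pJ/bit wall and the sub-2 pJ/bit target can be turned into watts with the identity power = energy per bit × bit rate. The per-XPU bandwidth and the fleet size below are hypothetical values picked only to show the arithmetic.

```python
def link_power_watts(picojoules_per_bit, bits_per_second):
    """Interconnect power: energy per bit times bit rate."""
    return picojoules_per_bit * 1e-12 * bits_per_second


# Hypothetical figures, for illustration only (not from the talk).
XPU_IO_BITS_PER_S = 10e12   # 10 Tb/s of optical IO per XPU
NUM_XPUS = 100_000          # XPUs in the fleet

for pj in (5.0, 2.0):
    per_xpu = link_power_watts(pj, XPU_IO_BITS_PER_S)
    fleet_mw = per_xpu * NUM_XPUS / 1e6
    print(f"{pj:.0f} pJ/bit -> {per_xpu:5.1f} W per XPU, {fleet_mw:.1f} MW across the fleet")
```

Under these assumptions the move from 5 to 2 pJ/bit saves megawatts at fleet scale, and the saving grows in step with any further increase in IO bandwidth.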
[25:35] We think we need to have a super
[25:37] reliable system, very simple system
design, so it can reduce all our
hardware failure
concerns. We
want to be very high volume, so that's
the other thing:
whenever people come and talk to me
[25:50] about their solutions, we do talk about
[25:52] what is your volume path, right? How do
[25:54] you get there? What does your
[25:55] manufacturing look like? You know, where
[25:57] are the weak points in that in order to
get to very high volume? And we'd like
to be able to see a pilot of some sort
in order to verify this in a production
environment so we can measure the FIT
[26:06] rate, we can measure the MTTR, and see
[26:09] what you're actually going to be able to
[26:10] get.
The serviceability, again, is super
important to us, all right? We want to
be able to unplug
things, take stuff away. And then time
to market, all right? So,
where possible, we'd like to see
things reused.
In this era right now, one of the things
I do work on
in another part of our
organization is making the silicon
design cycle shorter
because the ML cycles are so short. As I
[26:41] showed you right at the beginning,
[26:43] right? Even within 2025, the ML models
[26:46] moved so fast and so far that the
[26:48] hardware has to keep up. And the
[26:50] hardware is not just the chip design,
[26:51] it's also the system design. And in
[26:53] fact, today I would say the system
design is actually comparable to the chip design in terms
of complexity and in terms of time
needed, which, you
know, 15 years ago
I would have said no, a system design
should have been a lot easier. Today
it is as complicated as a chip design
and it's taking as much effort and as
much of the schedule as
chip design.
[27:15] So, time to market is important, being
able to keep things going and being
able to build on previous generations.
So, there's a nice road map. We look for
road maps. So, when people present
their solutions to us, we always
ask them, "What's the road map? Where is
the scaling? What does it look like in 5
years, 10 years, right?" That's a
[27:33] really important thing to us because
[27:34] we're not trying to design to a single
[27:36] point, we're designing for a scaling
[27:38] trend.
And then finally, this is the one:
the bandwidth density. Our target, and
this is almost a challenge to you all,
is, you know, what is the scaling
bandwidth we want? We want to see a 2x
improvement in the bandwidth
per millimeter of shoreline of the
XPU every couple of years, 2 to 3 years
if we can. You know, we
don't think we're there yet, but that's
what we would like to see because that's
what it's going to take to keep up with the
compute and the memory growth that we're
going to see coming in the
model growth. That's what we would like
[28:14] to see, that's what we're asking for.
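As a sketch of what that ask compounds to, the snippet below computes the shoreline bandwidth density multiplier over time for the two doubling periods mentioned, every 2 years and every 3 years. The time horizons are arbitrary illustration points.

```python
def density_multiplier(years, doubling_period_years):
    """Bandwidth-density growth factor after `years` if density doubles
    every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


for period in (2, 3):  # the "2 to 3 years" cadence from the talk
    row = ", ".join(
        f"{years} yr: x{density_multiplier(years, period):.1f}" for years in (2, 6, 10)
    )
    print(f"doubling every {period} years -> {row}")
```

Over a 6-year horizon the two cadences already differ by a factor of two (8x versus 4x), which shows how sensitive the roadmap is to whether the doubling takes 2 years or 3.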
So, pulling this together, right? So,
Moore's law is slowing down,
but still going,
and there is continued improvement
through chiplets and
through
basically disaggregating
some of the compute. I'm
going to preempt
and say I think Jensen up in Santa Clara
and San Jose is going to be talking
about disaggregated compute today.
I think he's going to be
talking about how you can do the decode
and prefill with two different bits of
silicon.
[28:53] I think that's the right path, by the
[28:55] way. Um, and that's going to happen.
[28:56] We're going to see a very heterogeneous
[28:58] data center. We're going to see
[28:59] disaggregated compute in many ways. So,
[29:02] digital scaling is probably going to be
[29:04] able to continue. We think we can pull
[29:06] the digital energy down by about 2x
[29:08] still.
[29:09] And we're going to go to lower VDDs,
[29:11] lower threshold
[29:12] voltages, which will get the energy down
[29:14] as well.
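The link between lower VDD and lower energy can be sketched with the standard first-order CMOS model, in which dynamic switching energy scales as C·V². This is a rough illustration, not the speaker's analysis: the voltages are hypothetical, capacitance is held constant, and leakage is ignored.

```python
import math


def dynamic_energy_ratio(v_old: float, v_new: float) -> float:
    """New-to-old switching energy ratio under E ~ C * V**2 with constant C."""
    if v_old <= 0 or v_new <= 0:
        raise ValueError("supply voltages must be positive")
    return (v_new / v_old) ** 2


def vdd_for_energy_reduction(v_old: float, reduction: float) -> float:
    """Supply voltage that lowers switching energy by `reduction`x (e.g. 2.0)."""
    if v_old <= 0 or reduction <= 0:
        raise ValueError("voltage and reduction factor must be positive")
    return v_old / math.sqrt(reduction)


if __name__ == "__main__":
    v_old = 0.75  # hypothetical starting VDD in volts, not a figure from the talk
    v_new = vdd_for_energy_reduction(v_old, 2.0)
    ratio = dynamic_energy_ratio(v_old, v_new)
    print(f"VDD {v_old:.3f} V -> {v_new:.3f} V gives {ratio:.2f}x the energy")
```

In this simplified model, halving switching energy through voltage alone would need VDD to drop by a factor of about 1.41 (the square root of 2), which is why voltage scaling is usually one contributor among several rather than the whole story.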
[29:16] So, what does that mean on the
[29:17] networking side? So, on the networking
[29:19] side, you know, we want to be able to
[29:21] see continuous scaling of the
[29:24] networking to keep up with that. And
[29:26] again, the bandwidth density becomes the
[29:28] bottleneck if we don't do something
[29:29] about it. It's going to be the most
[29:31] important thing, I think, that
[29:32] challenges us. And then, while
[29:34] that's happening, we have to scale the
[29:36] power way down, way down in order to
[29:39] keep up with this. So, the networking is
[29:41] becoming as important, if not more
[29:43] important, than the digital part, in my
[29:45] opinion.
[29:47] Side note here. So, this is a bit of a
[29:50] last-minute add. Last week,
[29:53] you know, an MSA, which
[29:56] is a multi-source agreement, was signed.
[29:59] OpenAI was part of that founding
[30:01] group. Why did we do this? This is
[30:03] basically
[30:05] a standard for the physical layer, the
[30:07] kind of optical communication,
[30:09] addressing mostly scale up, but it can
[30:11] also apply to scale out as well, for
[30:13] rack and, you know, multi-rack,
[30:15] multi-row. And this is a joint
[30:19] MSA between the hyperscalers and some
[30:22] suppliers. So, we're opening this up,
[30:23] basically. We're just going to say,
[30:24] "Here's what we think we need." You
[30:26] know, there's a few people who signed up
[30:27] to support it. But, basically, we
[30:29] need to interoperate. We need to interoperate
[30:30] between our CPO solutions that are in
[30:32] our chip with switch solutions, with the
[30:34] rest of the infrastructure. And so, we
[30:36] think that, you know, being able to
[30:37] standardize it really gives a way to
[30:39] be able to do that.
[30:41] You know, there are some specs in
[30:43] there.
[30:44] Binbin, my colleague, who
[30:47] is going to be here later today and at
[30:49] OFC, was part of structuring this.
[30:52] So, if you have technical questions
[30:54] about, you know, what were the decisions
[30:55] that were being made here and why, you
[30:57] can definitely ask him. Find me and I'll
[30:59] connect you with him. There are some
[31:01] reasons why, you know, we went with this
[31:03] path here.
[31:04] But basically, you know, we think
[31:06] that standardization, cooperation,
[31:08] interoperability
[31:10] is really important moving forward
[31:11] and so this is our effort for it.
[31:14] Okay, so what are my takeaways here? So,
[31:17] three big things.
[31:19] Interconnect is central to AI design,
[31:22] not just the system, but AI
[31:23] infrastructure in general.
[31:25] And the AI progress that we're
[31:27] seeing today is so fast and so
[31:30] exponential that the infrastructure
[31:33] intensity is going up very, very quickly
[31:36] and it's not just about model
[31:38] size, right? It's about capability, it's
[31:39] about agents, it's about inference,
[31:42] chains of thought, things like that,
[31:43] right? So, we have to keep up.
[31:46] Importantly, the reach, the energy per
[31:49] bit, and serviceability will decide where
[31:52] copper ends and where optics take over.
[31:56] In an ideal world, the optics takes over
[31:58] really fast, really early because the
[32:00] energy savings are high and so we want
[32:02] to get right in with
[32:05] the silicon.
[32:07] For that to happen, again, all
[32:08] the concerns I talked about today need
[32:10] to be addressed.
[32:12] And the third one, which is really the
[32:14] challenge:
[32:15] we do need to see this bandwidth density
[32:17] increasing in order to keep up with the
[32:18] scaling of the rest of the compute
[32:20] structure. So, you know, that's what we
[32:22] would like to see.
[32:23] The end
[32:24] message here is that it's not a
[32:27] question of if optics matters to AI,
[32:31] it's a question of where and how fast it
[32:34] moves into the critical path. It will be
[32:36] in the critical path. It's just a
[32:38] question of when and where, right? I
[32:40] would like to see it sooner. I would
[32:42] like to see it closer.
[32:43] But, you know, we have all these challenges
[32:46] to address in order to make that happen. And I
[32:48] look to this group to be able to lead
[32:50] our path towards that.
[32:53] With that, my time is up. So, I want to
[32:54] say thank you very much for your
[32:55] attention. I'll be around.
[33:13] Everybody talking about scaling AI,
[33:19] but the data center's choking deep
[33:21] within.
[33:25] Copper running hot.
[33:28] Yeah, the signal's getting thin.
[33:31] So, we flip the switch now.
[33:35] Optics is in.
[33:37] Bandwidth climbing fast.
[33:40] Racks are running red.
[33:43] Cloud demand exploding overhead.
[33:49] Pluggables fading as the limit's
[33:53] closing.
[33:56] Co-packaged light is how we win.
[34:01] It's photonics,
[34:03] baby, 2026.
[34:07] Riding that light wave,
[34:10] doing new tricks.
[34:13] From the fiber in the ground to the chip
[34:15] in my hand, we make that sunshine jump
[34:18] on command. Yeah, photonics, baby.
[34:24] 2026.
[34:40] >> Hey.
