# @HPCpodcast-104: Silicon Photonics, w Keren Bergman (2)

https://www.youtube.com/watch?v=hLjeY2_wuiY

[00:01] For over two decades, we've partnered with the world's leading processor makers to solve the toughest thermal challenges.
[00:07] From the largest AI clusters to the top supercomputers, CoolIT cold plates set the standard for liquid cooling.
[00:18] Combining unmatched reliability and performance.
[00:20] See them in action at OCP Global 2025 and learn more at coolitsystems.com.
[00:32] Especially this year, we're seeing the real leaders in HPC, the real leaders in AI systems and hardware.
[00:43] Almost every single one of them has been putting stakes in the ground and saying, "Okay, we're going to do this."
[00:47] Now, there's a really good example of photonics in the scale-up, which is the Huawei system.
[00:55] Early predictions, I would say we're talking about at least two orders of magnitude.
[00:59] Two orders of magnitude in
[01:01] performance.
[01:04] From OrionX in association with insideHPC.
[01:06] This is the @HPCpodcast.
[01:10] Join Shahin Khan and Doug Black as they discuss supercomputing technologies and the applications, markets, and policies that shape them.
[01:18] Thank you for being with us.
[01:20] Hi everyone, welcome to the @HPCpodcast.
[01:23] I'm Doug Black of insideHPC and with me is my co-host Shahin Khan of OrionX.net.
[01:30] And we're very happy to have with us today a special guest, Keren Bergman, a noted expert in the field of optical IO interconnect technology.
[01:38] Keren is the Charles Batchelor Professor of Electrical Engineering at Columbia, where she also serves as the faculty director of the Columbia Nano Initiative.
[01:49] She is also a co-founder, starting three years ago, of Xscape Photonics, which is in the optical IO arena, and she appears regularly at technology events, including the
[02:03] Supercomputing conference, to discuss optical IO.
[02:04] So Keren, welcome back.
[02:07] Thank you so much.
[02:10] It was great fun to have the previous chat and I'm really looking forward to today.
[02:13] Great.
[02:13] Such a pleasure to have you, Keren.
[02:15] Our last episode was one of our more popular episodes.
[02:17] I know there's a lot of interest in the market and great time for us to catch up.
[02:21] We were all reminded that it was two and a half years ago, just like that.
[02:27] So, it's amazing how quickly things move forward.
[02:30] Even two and a half years ago, a lot has accelerated since then.
[02:34] Yeah.
[02:34] You came on with us in April of 2023, which generated a lot of interest, and there's a lot of ongoing interest in optical IO as a potentially new foundational technology that will allow HPC AI class servers to run faster and cooler.
[02:51] So, why don't we just start with a very big picture perspective, Keren?
[02:57] What's changed with the technology over the past two plus years?
[02:59] I would say the most dominant change
[03:04] has really been the maturation of the ecosystem.
[03:06] Just two and a half years ago there was obviously already a lot of interest in this technology, with silicon photonics being the forefront technology for photonic interconnects and the possibility of using them in AI data center systems and HPC systems. Optics, of course, has been used for decades in long-haul fiber optic communications and even in data centers, in the longer-reach connections.
[03:37] So this is not what we're talking about here.
[03:39] This is really getting photonics into what I like to call the data path.
[03:46] So over the last couple of years there's been a continuing, I would say, explosion of both small and large companies that have made strategic commitments and really put stakes in the ground for bringing these to the commercial world.
[03:58] The most,
[04:04] I would say, in my view, the most pronounced announcement came from Nvidia back in March of this year, that they are going to be using co-packaged optics in their systems, initially through their switches.
[04:25] Okay. And just to clarify exactly what we're talking about I sort of needed this clarification.
[04:29] We're not talking about moving data within the chip, but it's chip to chip within the server, within motherboards, etc. Is that correct?
[04:39] Exactly. Within the chip over very small distances, you know, let's say sort of less than a millimeter or millimeters.
[04:47] Copper is fantastic. Copper can deliver very high bandwidth density. Copper can also deliver very low energy consumption per bit, in the single to tens of femtojoules per bit, and it's cost effective. It can be used in 3D
[05:05] integration and there's a huge manufacturing ecosystem around that.
[05:10] So really when we talk about co-packaged optics, or sometimes I refer to it as embedded photonics, it's really about bringing the photonic interface to the chip and then using the advantages of the optical domain, which basically are that you can send a lot of data over any distance you want.
[05:33] Whereas in electronics of course the longer the distance over which we send data you know we experience a lot of loss and we have to amplify the signal and spend more energy as a function of the frequency.
[05:48] In the optical domain the very simple fact is that the losses of the optical medium whether it's an optical waveguide or whether it's a fiber optic cable are extremely low.
[05:58] And so it's the same signal whether we send it over a few centimeters, a few meters or maybe even up to a few
[06:06] hundreds of meters across the entire HPC system, across the entire data center.
[06:12] So that's really one of the key advantages.
[06:14] And so when we're talking about bringing data movement in the optical domain to these HPC AI systems,
[06:23] the technology that we're referring to is the interface.
[06:25] Let's put the photonic transceivers somewhere on the chip, whether it's in the form of a photonic IO chiplet or some other co-packaged design. All those things are obviously up for discussion, and there will be different photonics in different places in the system as well. But that's essentially what we're talking about: transitioning to the optical domain at that distance point. And so you imagine having photonics within the blade, within the rack and also, of course, as we already have today, between racks and across the system.
[07:05] Keren, just quickly, I think the first
[07:07] thing on everyone's mind listening to our discussion is this notion that optical IO is potentially such an exciting technology, but it always seems to have a receding horizon. We're always two or three years away. We can get into the particulars, but what's your take on the commercial readiness for this technology?
[07:29] It's a great question and you're right.
[07:30] I've obviously been working in this field for decades, and we are always excited when we do research and we like to think about the insertion of those technologies in commercial systems. You're correct that in the past you could say it was the technology of the future and always will be.
[07:50] But this is different.
[07:52] This is really different, while still being cautious.
[07:55] There are a few reasons that photonics hasn't seen full-blown implementation.
[07:59] Number one is cost, as it always is, right? For example, the latest Nvidia system, the Blackwell, is still
[08:10] made out of copper, right?
[08:12] The interconnects are still copper, and the primary reason is cost, not performance.
[08:16] It's already known that photonics can outperform the equivalent electrical interconnects both in bandwidth density and in offering lower energy consumption, but for various market reasons, it's the cost.
[08:29] So why the cost? The cost is obviously a factor of the ecosystem: the commercialization, the manufacturing ecosystem. And that goes back to the first comment I made about what's been changing in the last two, two and a half years or so since our last conversation.
[08:46] It's really the further maturation of this manufacturing ecosystem.
[08:51] Are we all the way there yet?
[08:53] Of course not. But what we're seeing, especially this year, is the real leaders in HPC, the real leaders in AI systems and hardware.
[09:05] Almost every single one of them has been putting stakes in the ground and saying, "Okay,
[09:10] we're going to do this now."
[09:12] And that means that they have the suppliers, the fabrication, the manufacturing, the packaging, with real funding, with real orders, with real markets behind that.
[09:26] So, I can almost anticipate the next question: okay, when, right?
[09:28] When do you think we'll see it?
[09:30] And I would say that at this point we're really looking at 2028 as being the year.
[09:36] That's a little bit my opinion, based on various talks and conversations and putting together everything that I know of what's out there.
[09:48] I think about 2028 or so will be the first real deployment of co-packaged photonics in systems.
[09:54] We can come back in three years and check on that.
[09:57] I'm hoping that it will happen even sooner and of course we'll start to see some things sooner.
[10:04] But in my opinion, I think that will be when we can actually say, look, here it is.
[10:09] Keren, has the science been settled for
[10:12] the products that are being developed or are they still being pursued in various capacities?
[10:18] Is there like one direction that the industry and the manufacturing world is pursuing or are new ideas coming in from various directions?
[10:27] There are always new ideas, of course, but I would say that they are really about the next generations in terms of how things will be done.
[10:34] There's still quite a bit.
[10:36] There are multiple approaches that are out there and they will have to be shaken out in the marketplace with vendors and you know there's of course lots of questions about scalability, reliability, you know all the usual stuff that needs to be proven out for mass scale manufacturing and adoption.
[10:56] So those debates are very much ongoing.
[10:59] There are various technologies that are being, quote unquote, baked off at this point.
[11:03] But I think that if you step away and look at the picture, definitely we're going to be using some form of wavelength division
[11:13] multiplexing.
[11:14] The number of channels is probably going to be small initially, and then grow as we continue to scale into the future.
[11:22] Then there is the co-packaging, the assembly of the photonics together with the compute, the switch or the memory.
[11:31] For exactly how things are going to be packaged and assembled, there are a multitude of technologies and approaches being pursued right now by various players, and maybe several of them will succeed, because it's not one thing. I just want to make that clear.
[11:51] Co-packaged photonics in systems is not one thing.
[11:53] There will be photonics that is co-packaged perhaps with the switch inside of the interconnect network.
[12:03] There will be other photonics that's used potentially for longer distances in the architecture, and so forth.
[12:13] I expect that there will be different solutions for different points in the data center for sure, and within the system as well.
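
The wavelength-division multiplexing scaling mentioned in this answer comes down to simple arithmetic: the aggregate bandwidth of a link is the number of wavelength channels times the per-channel data rate. A minimal Python sketch follows. The channel counts and the 32 Gb/s per-channel rate are illustrative assumptions, not figures from the conversation.

```python
def wdm_link_gbps(num_wavelengths: int, gbps_per_wavelength: float) -> float:
    """Aggregate bandwidth of one WDM link (one fiber or waveguide), in Gb/s."""
    if num_wavelengths < 1:
        raise ValueError("need at least one wavelength channel")
    return num_wavelengths * gbps_per_wavelength


# Hypothetical roadmap: a few channels at first, more later (assumed values).
for channels in (4, 16, 64):
    total = wdm_link_gbps(channels, gbps_per_wavelength=32.0)
    print(f"{channels:>2} wavelengths x 32 Gb/s = {total:.0f} Gb/s per link")
```

The point of the sketch is only that bandwidth per fiber grows linearly with channel count, which is why the channel count is expected to rise over time.
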
[12:19] What is your perspective on the software protocols that run on these interconnects and is that any different because the fundamental technology is different?
[12:31] It will have to be the same.
[12:31] For a technology to succeed, there's already a pretty tall barrier to entry for photonics, as we discussed earlier, with the cost, right? If on top of that I ask the system providers to revamp their software, I think that puts us many steps backwards. I don't think that will be a successful path. Maybe sometime in the future, but at this point it's really about inserting this technology in the most seamless way possible.
[13:02] One of the big debates around co-packaged or co-integrated photonics is the replaceability.
[13:08] Today in the systems we have, of course, you
[13:14] know, the well-known pluggable photonics.
[13:16] Very easy, very useful, vendor interoperability, everybody can design to it.
[13:21] If something breaks, you replace it, right? It's a simple pluggable, as it's called, with many applications, of course.
[13:28] Longer reach, shorter reach, all of the above. Those things are not going to go away; of course they're going to be part of the future systems as well. But now we're talking about putting optics more inside the real data path of the applications and really gaining the benefits of the integration from the point of view of the bandwidth density.
[13:51] The pluggables don't even come close to the bandwidth density that we get from co-packaged optics.
[13:55] It's maybe two orders of magnitude difference.
[13:59] That's how big it is.
[14:01] And of course, there's the low energy consumption of the co-integrated optics. But it's co-packaged.
[14:06] So it's a module.
[14:10] So if some part of it breaks, the laser breaks or this or that fails, then what do we do?
[14:16] Do we need to replace the whole module?
[14:18] What's the strategy?
[14:20] So those are some of the questions.
[14:22] I think they're all solvable, 100% solvable issues,
[14:25] but those are really where the technology questions are.
[14:26] If someone has an approach that requires a redo of the software stack, it will not succeed in the near term, for sure.
[14:37] Well, you know, my motivation to ask was what we saw with rotating disks and solid-state disks.
[14:42] Initially it was all like, slide it in and you can't tell the difference.
[14:49] But then once it gelled, people said, "But this is not rotating.
[14:54] This has different wear-and-tear attributes.
[14:57] I can optimize it this way or that way."
[15:00] Okay.
[15:02] And then that over time sort of permeated through the software stack because the awareness that the medium was different had an impact.
[15:10] So for something like an interconnect, I'm not sure the same analogy holds, but that was anyway the
[15:16] motivation.
[15:19] Is there any difference that might provide opportunities for optimization that will prove irresistible at some point?
[15:25] 100%, absolutely, and that will come. The first step is about the interconnects.
[15:30] So, for example, putting a photonic IO on the socket that might include GPUs, memory and so forth, or, as Nvidia announced, a photonic IO on a switch module. But the next step, and this is still more in the research domain, is to also include photonic switching in the system.
[15:52] Of course, Google, for example, has optical circuit switches in their systems for reliability as well as topology engineering, things like that, more at the macro level. But I'm talking about photonic switches that would be, again, more in the data path. Imagine that we have this very large HPC AI system with, I
[16:19] don't know, 10,000 or even 100,000 endpoints.
[16:22] If we keep the data in the optical domain, we can use optical switches to reach much further sort of increase the diameter of the compute that's available to us.
[16:34] We actually have a paper on this at SC this year.
[16:36] So, I'm very excited about that using optical switches in these systems.
[16:38] When we get more mature with the interconnects and we're starting to look at photonic switches as well, absolutely we will need to think about the full hardware software stack.
[16:49] And that's a subject of research that I'm doing right now as well as of course other people are pursuing but it will certainly come to commercial at some point in the future.
[17:00] So Keren, I was really impressed by your statement that right now it's not performance, it's price, and I assume part of that means manufacturing price, you know, production at scale.
[17:10] These are issues that really need to be ironed out.
[17:15] But are you saying for example Nvidia's next GPU could use this
[17:20] technology as things stand now?
[17:23] Yes, absolutely.
[17:23] Absolutely.
[17:25] I mean, exactly where you put the optical interface is a design question.
[17:31] For example, the larger Nvidia GPU.
[17:34] I guess it's the B200.
[17:36] Maybe they're coming up with a new one.
[17:37] Of course, the connectivity inside of that, I believe, is still going to be electronic, because it's very advantageous to have it electronic. But at the socket level, where you typically have the GPUs and the HBMs, to include an optical IO, a photonic IO, what that will enable you to do is really scale that out.
[18:01] So as we know, in these data center systems we talk about scale out and scale up.
[18:07] So scale out is when you string together these thousands or tens of thousands of servers, but scale up is really your closely coupled, interconnected
[18:22] high-performance compute. And so, for example,
[18:26] we go back to the Nvidia system because,
[18:27] obviously, you know, they're
[18:29] dominant in the market. So like the NVL72,
[18:32] right, that's a scale up. That's an
[18:34] example of what I would consider a scale
[18:35] up, and it's all copper at this point.
[18:39] If you went to photonics, you take
[18:42] even the same GPUs and you go to
[18:43] photonics, it's mind-boggling.
[18:46] I mean, first of all, you can
[18:48] expand the domain.
[18:50] You're not limited to
[18:52] 72. You can imagine potentially
[18:54] thousands.
[18:54] And with equivalent or better, much
[18:58] better, bandwidth densities. At the same time,
[19:02] you know, very importantly,
[19:03] it's about keeping the power envelope
[19:05] from growing.
[19:08] That's one of the
[19:08] key limitations today to computing.
[19:11] It's
[19:11] about power and energy consumption.
[19:13] There's a really good example of
[19:15] photonics in the scale-up, which is the
[19:18] Huawei system, if you're familiar with
[19:20] it. They use inferior GPUs in
[19:25] their systems because of, you know, various export controls and things like that.
[19:29] So the GPUs in their system, at least in the one that was published, have about three times less performance than the Nvidia GPUs.
[19:41] But they just took conventional photonics, not this most advanced photonics that I'm talking about with bandwidth densities and all that performance, like linear pluggable optics, and connected the system in the scale-up domain
[19:58] and were able to create a system that's approximately twice as powerful computationally as the Nvidia NVL72.
[20:12] So they start with 3x less performant compute, right?
[20:16] Compute processor.
[20:19] But just using photonics expands the physical domain of your compute capability, because you can just connect
[20:25] more compute to it. You're not
[20:27] worried about, you know, the losses and
[20:30] the distances and all of that. So, just
[20:33] imagine what you could do if you take
[20:36] the most advanced GPUs that are
[20:39] available in the world and combine them
[20:43] with this new generation of embedded
[20:46] photonics, co-packaged photonics.
[20:49] Early predictions, I would say we're
[20:51] talking about at least two orders of
[20:53] magnitude, two orders of magnitude in
[20:55] performance. That's the reason that
[20:58] we're so excited about this.
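
The Huawei example above is a statement about scale: if each processor is about 3x slower but the scale-up domain is no longer capped at 72, a larger system can still come out ahead. The Python sketch below works through that arithmetic. It assumes ideal linear scaling by default (the `scaling_efficiency` parameter is a hypothetical knob, not something stated in the conversation), and it does not claim to reproduce the actual chip count of the published Huawei system.

```python
import math


def chips_for_target(baseline_chips: int, per_chip_slowdown: float,
                     target_speedup: float,
                     scaling_efficiency: float = 1.0) -> int:
    """Chips needed to reach target_speedup x the baseline system when each
    chip is per_chip_slowdown x slower than a baseline chip.

    scaling_efficiency is the assumed fraction of ideal linear scaling that
    the interconnect preserves (1.0 means perfect scaling).
    """
    if not 0 < scaling_efficiency <= 1:
        raise ValueError("scaling_efficiency must be in (0, 1]")
    ideal = baseline_chips * per_chip_slowdown * target_speedup
    return math.ceil(ideal / scaling_efficiency)


# Figures from the discussion: 72-GPU baseline, ~3x slower chips, ~2x system.
print(chips_for_target(72, per_chip_slowdown=3, target_speedup=2))  # 432
```

Under perfect scaling this gives 432 chips in one scale-up domain, six times the 72 of the copper-connected baseline, which is the kind of domain size the optical links are said to make practical.
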
[21:40] >> Are you saying, well, it makes sense, I
[21:42] think, that if you improve the
[21:44] interconnect efficiency, bandwidth,
[21:46] latency, coherency,
[21:49] that it allows you to use lesser GPUs
[21:52] but more of them to get to the same
[21:54] place, because the efficiency doesn't
[21:55] drop as fast, etc., etc. Is there such a
[21:59] thread there?
[22:00] >> It allows you to use as many GPUs as you
[22:02] want, whatever you have, but much, much
[22:05] more efficiently, and also to be able to
[22:07] scale up the numbers that you want to
[22:10] use, right? So the typical
[22:13] architecture today, due to bandwidth
[22:16] limitations, is hierarchical, right? Inside,
[22:20] if I look inside the socket, and
[22:22] what I mean by the socket is this
[22:26] typical package that we have with the
[22:28] compute, GPUs, HBMs, etc., inside there the
[22:33] bandwidths are fantastic, on the order of
[22:36] 10 terabytes per second of
[22:38] communication bandwidth, electronic, all
[22:40] electronic. As soon as I get outside of
[22:42] that socket and I connect some of
[22:45] these together via switch fabric,
[22:48] electronic switch fabric, the bandwidth
[22:50] takes about a 10x drop. I go one more
[22:53] and I build out a system, so for the
[22:56] scale out, and I take
[22:58] another 10x hit. So what the photonics
[23:02] will do is bring it into that 10
[23:05] terabyte range, right? Start there and now I
[23:08] can go anywhere I want and still have
[23:11] that incredibly high bandwidth that I
[23:13] today have only inside the socket.
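
The bandwidth hierarchy just described can be written down as a small model: start from the in-socket figure and divide by roughly 10 at each step outward. The Python sketch below does that, reading the quoted "10 terabytes" as 10 TB/s (an assumption) and using tier names chosen here for illustration. The 1 TB payload is likewise a hypothetical example.

```python
def tier_bandwidths(socket_tb_per_s: float = 10.0,
                    drop_per_tier: float = 10.0) -> dict:
    """Bandwidth per tier, in TB/s, for an electrical hierarchy where each
    step away from the socket costs about a drop_per_tier x reduction."""
    tiers = ["in-socket", "scale-up fabric", "scale-out network"]
    return {name: socket_tb_per_s / drop_per_tier ** level
            for level, name in enumerate(tiers)}


def transfer_seconds(terabytes: float, tb_per_s: float) -> float:
    """Time to move a payload at a given bandwidth, ignoring latency."""
    return terabytes / tb_per_s


for name, bw in tier_bandwidths().items():
    secs = transfer_seconds(1.0, bw)
    print(f"{name:<18} {bw:>5.1f} TB/s  1 TB moves in {secs:.1f} s")
```

The promise described in the conversation corresponds to setting `drop_per_tier` to 1, so every tier keeps the in-socket bandwidth.
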
[23:16] >> So it is eliminating distance
[23:17] essentially.
[23:19] >> That's right. It essentially
[23:21] creates a system that we can say is
[23:24] distance independent. Of course, and this is
[23:26] very important, we still have
[23:29] time of flight. We still have
[23:31] to deal with the speed of light.
[23:33] We can't get rid of that. There's going
[23:35] to be some limitations on how far we can
[23:37] scale things, but for AI systems, the
[23:40] sensitivity to this latency is not as
[23:42] severe.
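
The time-of-flight caveat is easy to quantify. The Python sketch below estimates one-way propagation delay through fiber from the distance and the speed of light. The group index of 1.47 is an assumed typical value for silica fiber, not a number from the conversation, and the distances are illustrative.

```python
C_VACUUM_M_PER_S = 299_792_458  # speed of light in vacuum (exact, by definition)


def one_way_latency_ns(distance_m: float, group_index: float = 1.47) -> float:
    """Propagation delay in nanoseconds over distance_m of optical medium.

    group_index is the assumed group index of the medium; light travels at
    c / group_index inside it.
    """
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return distance_m * group_index / C_VACUUM_M_PER_S * 1e9


for meters in (0.1, 10, 100, 500):
    print(f"{meters:>6} m -> {one_way_latency_ns(meters):8.1f} ns one way")
```

With these assumptions the delay is roughly 5 ns per meter, so bandwidth can be made distance independent while latency cannot.
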
[23:44] >> And so the potential gains for these
[23:46] systems in performance are just
[23:48] incredible. And especially, you know,
[23:50] now that the new measure is going to be,
[23:52] forget about flops or anything like
[23:54] that. It's going to be in units of
[23:56] power.
[23:57] >> You mentioned it, you know, I mean, even
[24:00] I saw a recent article where OpenAI,
[24:03] right, they just have some
[24:05] contracts with AMD and Nvidia, etc.
[24:09] And the contracts are in gigawatts.
[24:12] >> That's right.
[24:13] >> Right. So this is the number that
[24:16] matters now. And so what optics will do
[24:19] is imagine that we can give you 100x
[24:22] performance inside of that same
[24:23] gigawatt.
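
When power is the fixed budget, the work a system can do per second is the power divided by the energy spent per unit of work. The Python sketch below states that relationship and what a 100x gain inside the same gigawatt would imply. The energy-per-unit values are placeholders chosen for illustration. They are not measurements and not figures from the conversation.

```python
def throughput_at_power(power_watts: float, joules_per_unit: float) -> float:
    """Units of work per second achievable within a fixed power budget."""
    if power_watts <= 0 or joules_per_unit <= 0:
        raise ValueError("power and energy per unit must be positive")
    return power_watts / joules_per_unit


def gain_at_fixed_power(old_joules_per_unit: float,
                        new_joules_per_unit: float) -> float:
    """Throughput ratio new/old when the power budget stays the same."""
    return old_joules_per_unit / new_joules_per_unit


GIGAWATT = 1e9
# Placeholder energies: a 100x gain means 100x less end-to-end energy per
# unit of useful work, including the cost of moving the data.
before = throughput_at_power(GIGAWATT, joules_per_unit=1e-9)
after = throughput_at_power(GIGAWATT, joules_per_unit=1e-11)
print(f"gain inside the same gigawatt: {after / before:.0f}x")
```

The design point is that at a fixed power budget the only route to more performance is less energy per unit of work, which is where lower-energy data movement would contribute.
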
[24:24] >> Right? So this is a good segue into a
[24:27] question I had about materials. What is
[24:29] the state of science research production
[24:32] when it comes to novel materials that
[24:35] are optimized for speed, energy loss,
[24:39] etc. or is it all the same because it's
[24:41] all fiber optics?
[24:42] >> There have been a lot of advances and a lot
[24:45] more still needs to be done, for sure. We
[24:47] can always do more on the material side. So the
[24:50] big thing, right, the
[24:52] dominant thing, has been the advent of
[24:55] silicon photonics. So now we can
[24:56] design and
[24:59] fabricate fairly complex photonic
[25:01] circuits, photonic integrated circuits,
[25:03] you know, PICs as we call them, in
[25:06] conventional CMOS fabrication. So my
[25:10] lab, as well as many others, you know, we
[25:12] fabricate in 300 mm foundries. There are
[25:16] specific runs, you know, optimized for
[25:19] photonics, but basically
[25:21] they're using conventional tools that
[25:23] are used in CMOS fabrication, and there
[25:25] are several commercial ones, you know,
[25:27] Tower Jazz foundries and now TSMC, of
[25:30] course.
[25:31] >> So that's the main thing. Now the bad
[25:33] news is that, okay, so you know,
[25:36] we can't talk about photonics
[25:38] without talking about the laser. Sometimes
[25:40] people forget that you need the laser.
[25:42] >> Yes,
[25:43] >> Photonics is great, but we need the
[25:45] photons to make it work. And so with all
[25:48] these wonderful things that we can do in
[25:50] silicon, unfortunately silicon is not a
[25:53] great material for lasers.
[25:55] >> It has an indirect band gap, of course, and
[25:57] at least right now we cannot
[26:00] make lasers in silicon. So III-V
[26:02] materials are typically, you know, the
[26:05] materials that are used, and so they
[26:07] somehow need to be brought in. That's why the
[26:08] packaging problem in photonics is and
[26:12] continues to be a big issue: you know, how
[26:14] do you bring the laser in, how do you
[26:16] integrate the laser? I mentioned that we
[26:18] definitely will need to go to
[26:20] wavelength division multiplexing to get
[26:22] to these kinds of bandwidth densities
[26:24] that we're talking about how do you
[26:25] bring in multiple colors of light into
[26:28] the chip I can quickly mention that the
[26:31] company of which I'm a co-founder,
[26:33] Xscape Photonics, you know, one of our
[26:35] key novel technologies is that we're
[26:38] able to make things called comb lasers
[26:41] where using a single laser, we're able
[26:43] to generate many colors all at the same
[26:46] time. So that's one of our key
[26:48] technologies. But, you know, in general
[26:51] the issue of the laser is a major
[26:55] issue for getting this technology
[26:57] into the systems, and that of course
[27:00] translates into how you combine it.
[27:04] Do you have the III-V material
[27:07] outside of the silicon? Do you combine
[27:09] it with the silicon in some kind of a
[27:11] packaging platform? There are other
[27:14] companies and researchers that are
[27:16] working on growing III-V on top of silicon
[27:19] and other techniques. So that's a big
[27:22] one. That's a big topic and big issue.
[27:24] Of course, there are other materials as
[27:26] well. For example, one of the issues
[27:28] that we're working on in research is how
[27:31] to make the photonic circuit less
[27:34] sensitive or even completely
[27:36] athermal and not sensitive at all to
[27:38] thermal variations. Photonics, any
[27:41] photonics is naturally sensitive to
[27:45] temperature because if you change the
[27:46] temperature one way or the other, the
[27:48] index of refraction of the material is
[27:50] going to change and therefore your
[27:52] photonic circuit is going to change in
[27:54] some way. Especially when you're doing
[27:58] dense wavelength division multiplexing
[28:00] and using things like resonators, you
[28:02] know, then it changes pretty
[28:04] dramatically. And so, how do you
[28:05] navigate that? And are there materials
[28:08] that are less sensitive to temperature?
[28:11] Materials that can compensate for that.
[28:12] So again, the future of
[28:15] research is bright.
[28:18] >> Many important problems to solve.
[28:20] >> There are still things we don't know,
[28:21] you're saying?
[28:22] >> Absolutely.
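
The temperature sensitivity described above can be estimated with the standard first-order relation for a resonator, dλ/dT ≈ (λ / n_g) · (dn_eff/dT). The Python sketch below applies it. All default values are assumptions chosen as typical textbook numbers for silicon near 1550 nm (group index 4.2, thermo-optic coefficient 1.8e-4 per kelvin, 0.8 nm channel spacing), not figures from the conversation.

```python
def resonance_shift_nm_per_k(wavelength_nm: float = 1550.0,
                             group_index: float = 4.2,
                             dn_dt_per_k: float = 1.8e-4) -> float:
    """First-order resonance drift of a resonator, in nm per kelvin.

    Uses d(lambda)/dT ~= (lambda / n_g) * (dn_eff/dT); every default here is
    an assumed illustrative value.
    """
    return wavelength_nm * dn_dt_per_k / group_index


def kelvin_to_cross_one_channel(channel_spacing_nm: float = 0.8) -> float:
    """Temperature change that moves a resonance by one channel spacing."""
    return channel_spacing_nm / resonance_shift_nm_per_k()


print(f"drift: {resonance_shift_nm_per_k():.3f} nm/K")
print(f"one channel spacing crossed after ~{kelvin_to_cross_one_channel():.0f} K")
```

With these assumed numbers a resonance slides by a full dense-WDM channel after a temperature swing of roughly ten kelvin, which is why athermal materials and compensation are active research topics.
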
[28:23] >> This all sounds like echoes of what's
[28:25] going on in quantum, only I'd say less so
[28:28] in the sense that you're closer than
[28:30] quantum for readiness. But is there
[28:33] going to be an issue? Say we're three
[28:34] years out. These new chips are bursting
[28:37] on the scene. Everybody wants to use
[28:39] them with silicon photonics. What about
[28:43] integration of the new chips with the
[28:47] installed base of copper
[28:48] interconnect-based chips? Is that going
[28:50] to be an issue in data centers or in
[28:53] servers?
[28:53] >> You mean that there'll be like a mix of
[28:56] some things with just copper and some
[28:57] things with just photonics? Is that your
[28:59] question?
[29:00] >> Yeah,
[29:01] >> It's a good question. My thinking,
[29:05] definitely a good question. I don't know
[29:06] that I have a clear answer,
[29:09] because I would say, you know, the system
[29:12] owner, right, the system vendor or the
[29:14] operator that has maybe the more
[29:17] mature technology in their data center
[29:20] and now wants to add photonics, I don't
[29:23] know that they're going to keep the old
[29:26] stuff and, you know, somehow add the
[29:29] new stuff as well. The things that
[29:32] I hear about and work on are more replacement. So,
[29:36] you know, we're going to
[29:38] replace the racks with basically, you
[29:40] know, photonic-enabled racks. I don't
[29:43] know about the, you know, interoperability
[29:45] of older stuff with the
[29:47] new photonics. It's a fair
[29:49] question. It's definitely a fair
[29:50] question. Yeah.
[29:50] >> That's a unit anyway in the data center
[29:52] these days, especially with the way
[29:54] Nvidia has been architecting them.
[29:56] >> Yeah, I mean, definitely the data centers
[29:58] have fiber. The fiber installation is an
[30:00] expensive thing and they do love to
[30:03] upgrade, you know, without having to rip
[30:05] out the fiber and change it, and I think
[30:07] that's very doable. That's not going to
[30:08] be an issue so much
[30:10] >> as inside the rack. Plus, you know,
[30:13] everybody wants to sell new systems so
[30:15] that's good for business.
[30:16] >> That's right. I also had a question about
[30:18] manufacturing. One of the things
[30:21] that keeps coming up with anything that
[30:23] is analog or analog looking is that it
[30:27] is hard to consistently manufacture the
[30:30] same specification
[30:32] because it's like the AM radio that you
[30:35] have to tune or you know like a piano
[30:38] that falls out of tune. Is that an issue
[30:41] with optical interconnects or are
[30:44] systems sufficiently consistent or can
[30:47] be manufactured or are AI enabled to
[30:50] tune themselves back to the right zone?
[30:52] What does that situation look like?
[30:55] >> Yeah. So in the context of these
[30:58] photonic interconnects, that issue
[31:01] does exist, but it's a little bit different,
[31:03] perhaps, than what you had in
[31:06] mind with regard to analog systems. So
[31:08] a typical WDM photonic link that we are
[31:12] envisioning
[31:13] would have data modulators and receivers
[31:16] right that are designed to operate at a
[31:19] certain wavelength and we can combine
[31:21] them together so that we can have many
[31:24] wavelengths potentially you know running
[31:26] through running through the link. That's
[31:28] how we get the bandwidth density. We
[31:30] scale in the number of wavelengths and
[31:33] definitely when we manufacture these
[31:35] modulators and filters and other
[31:37] components in CMOS fabrication in sort
[31:40] of conventional foundries and they come
[31:42] out there'll be some variations right
[31:44] there'll be some we design them for you
[31:46] know this exact frequency but we measure
[31:50] and the resonance is a little bit off
[31:52] that frequency and that modulator needs
[31:55] to be tuned, and so that's absolutely
[31:58] being worked. That's really part of
[32:01] the package right now.
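As a rough illustration of the calibration step described above, the Python sketch below compares the wavelengths a set of modulators was designed for against the resonances measured after fabrication, and reports how far each one would need to be tuned. The wavelength values, the tolerance, and the function name are assumptions made up for illustration; they are not figures from the conversation.

```python
def tuning_offsets_nm(designed_nm, measured_nm):
    """Per-channel correction (designed minus measured), in nanometers."""
    if len(designed_nm) != len(measured_nm):
        raise ValueError("need one measured resonance per designed channel")
    return [round(d - m, 3) for d, m in zip(designed_nm, measured_nm)]


# Hypothetical four-channel WDM link: design targets vs. post-fabrication measurements.
designed = [1550.0, 1550.8, 1551.6, 1552.4]
measured = [1549.7, 1550.9, 1551.2, 1552.4]

offsets = tuning_offsets_nm(designed, measured)

# Channels whose resonance is off by more than an assumed 0.05 nm tolerance.
tolerance_nm = 0.05
needs_tuning = [i for i, off in enumerate(offsets) if abs(off) > tolerance_nm]

print(offsets)       # correction per channel in nm
print(needs_tuning)  # indices of channels that would need tuning
```

In a real system the correction would be applied by the control circuitry rather than computed offline; the sketch only shows the bookkeeping.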
[32:02] >> So the circuitry that goes along with
[32:05] interfacing to these optical chips
[32:09] typically has two parts. It has the
[32:13] high-speed digital data part, right? The
[32:15] data transmitter and the receiver side
[32:17] of it as well. And it also has a bit of
[32:21] an analog circuitry control circuitry
[32:23] that that is part of the chip, part of
[32:25] the system. It could be a separate chip
[32:27] sometimes, depends. But that is used for
[32:31] calibrating, tuning up, you know,
[32:33] getting the photonic chip lined up to
[32:36] where it needs to be. So that's
[32:39] certainly is an issue and it's all of
[32:41] these things can also change with
[32:43] temperature. So the control has to
[32:46] adjust for that as well. But you know,
[32:48] it's something that we're used to doing.
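One way to picture the control circuitry adjusting for temperature is a simple feedback loop that nudges a heater until the resonance sits back on its target wavelength. The Python sketch below is a toy model under stated assumptions: a linear resonance drift with temperature, a linear heater response, and an integral-style update. All coefficients and names are hypothetical and chosen only to make the loop easy to follow.

```python
def simulate_thermal_lock(target_nm, temps_c, base_nm=1549.0,
                          drift_nm_per_c=0.08, heater_nm_per_mw=0.25,
                          gain=0.5, steps_per_temp=20):
    """Track a target wavelength while the ambient temperature changes.

    Assumed model: resonance = base + drift * temperature + heater_shift.
    Returns the final heater power (mW) and the residual error (nm)
    left after the loop settles at each temperature.
    """
    heater_mw = 0.0
    residuals_nm = []
    for temp_c in temps_c:
        for _ in range(steps_per_temp):
            resonance_nm = (base_nm + drift_nm_per_c * temp_c
                            + heater_nm_per_mw * heater_mw)
            error_nm = target_nm - resonance_nm
            # Integral-style update; a heater cannot deliver negative power.
            heater_mw = max(0.0, heater_mw + gain * error_nm / heater_nm_per_mw)
        resonance_nm = (base_nm + drift_nm_per_c * temp_c
                        + heater_nm_per_mw * heater_mw)
        residuals_nm.append(abs(target_nm - resonance_nm))
    return heater_mw, residuals_nm


# Ambient temperature ramps from 25 C to 45 C; the loop backs the heater off.
final_heater_mw, residuals_nm = simulate_thermal_lock(1553.0, [25, 30, 35, 40, 45])
print(final_heater_mw, max(residuals_nm))
```

As the chip warms up, the resonance drifts toward the target on its own, so the loop reduces heater power to stay locked.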
[32:50] You know, we've had lasers, commercial
[32:51] lasers for many decades. All the lasers
[32:54] have, you know, circuitry that keeps
[32:57] them, you know, in tune and so
[32:59] forth. We've had different, but we've
[33:02] had transmitters for fiber optic
[33:04] systems, you know, for decades. And so,
[33:07] we know how to do it. The challenge for
[33:09] this application for inside the AI
[33:11] system and the data center is to be able
[33:14] to do it in much smaller footprint,
[33:17] right? To regain that bandwidth
[33:19] density and with a lot less energy.
[33:22] That's a it's energy is like the you
[33:24] know first second third class citizens
[33:27] in all the designs very important. So
[33:29] yeah definitely technical again not
[33:32] insurmountable challenges but absolutely
[33:34] part of the process.
[33:34] >> Keren, on power
[33:37] consumption, what is your sense of how
[33:40] things will play out when these chips
[33:42] come on the market? Will data centers
[33:45] run up demand for compute power up to
[33:47] the same level of power consumption that
[33:50] data centers are already consuming now
[33:53] or more so in the future. So that it
[33:56] nets out that we haven't really
[33:57] addressed the power problem or will this
[34:00] make a big dent into the power
[34:01] consumption?
[34:01] >> I think that, you know, if
[34:03] you have a data center that you built,
[34:05] you know, with whatever hundreds of
[34:07] megawatts, you're going to try to use all
[34:10] the power that you're paying for. You're
[34:12] not going to all of a sudden use half
[34:14] the power of whatever you built. What
[34:16] the photonics will do is enable you to
[34:20] continue to scale your compute and
[34:22] capabilities inside of that envelope.
[34:25] And so I believe the way that it might
[34:27] roll out is the initial photonics will
[34:30] not be the most energy efficient that we
[34:33] can make things. There's still more
[34:34] research that we can do. There's still
[34:36] more technologies that we can bring to
[34:38] the table make things even more
[34:40] efficient than they are than they will
[34:42] be in my magic year of 2028. And that
[34:45] will be one let's say mark in the
[34:47] ground, right? will bring let's imagine
[34:49] that we bring in this first generation
[34:51] of co-packaged photonics and it, you know,
[34:55] you're going to get definitely a gain in
[34:57] the compute you're going to get a gain
[34:59] in the efficiency of how you run the
[35:01] application so you can run the
[35:02] application imagine you have a certain
[35:04] application instead of running it
[35:06] certain amount of time now it takes 10x
[35:08] less right so you can run 10x more
[35:12] things under the same power supply
[35:14] envelope and as we continue to get
[35:17] better with the photonics and we have
[35:19] paths towards that right now. There may
[35:21] be more in the research but they will
[35:23] they are evolving and the photonics will
[35:26] get even more energy efficient and
[35:27] you'll be able to do more. Yes, I'm
[35:29] sure that assuming that this AI
[35:32] generation continues to grow as we
[35:35] expect more data centers will be built
[35:38] and more power will be drawn. But what
[35:40] we can do is make things a lot more
[35:42] efficient and maybe bend the curve. I
[35:45] wouldn't say we're going to turn
[35:46] down the power because I don't think the
[35:48] operators are going to turn it down, but
[35:50] we can bend the curve and make it scale
[35:52] much more reasonably.
[35:54] >> Right.
[35:56] >> Okay.
[35:56] >> What does the global scene look like
[35:58] when you look at research as well as
[36:00] production, we mentioned Huawei, their
[36:03] Matrix, CloudMatrix I think it's
[36:05] called, or Atlas, and their Ascend. We've
[36:07] covered them in our podcast in the past.
[36:09] But obviously the research on the global
[36:12] scene is also an interesting topic. Who's
[36:16] good at this stuff besides obviously the
[36:18] US and your own lab in the US?
[36:20] >> Yeah, the US is very good. We have
[36:22] certainly we have leadership in various
[36:24] things. I would say you know China is
[36:27] amazing. you know they have put just
[36:30] incredible resources into this by
[36:33] basically doing what we hope I hope we
[36:36] can do more of here in the US is by
[36:39] having combined you know very targeted
[36:42] efforts you know from the government
[36:45] industry and research you know the
[36:47] research enterprise including
[36:49] universities and other research
[36:51] institutions and just pouring resources
[36:55] into this are they ahead of us I
[36:57] honestly don't know. But if I look
[36:59] at what's happening in China, I can see
[37:01] from the papers that are being submitted
[37:03] to conferences and publications, and at
[37:05] least from what's in the public domain, it's
[37:07] amazing, really impressive. So they're a
[37:09] big player on this and I believe they
[37:11] will continue to be a growing player.
[37:13] The US of course and and obviously of
[37:16] course there's very strong very strong
[37:19] work in Europe and in Japan as well of
[37:21] course.
[37:21] >> Right. Is this sort of their
[37:24] strategy really focus on the
[37:26] interconnect aspect as opposed to the
[37:27] pure processing part to try to catch up
[37:30] with the West on chips?
[37:30] >> It certainly is
[37:33] part you know maybe because the
[37:34] processor part was more limited you know
[37:37] due to the geopolitical things that are
[37:40] way over my head.
[37:42] >> all of us right but one way or the other
[37:45] you know it's highly accelerated it's
[37:47] very impressive
[37:48] >> last time we talked I remember we also
[37:50] mentioned fabrication technology and I
[37:54] seem to recall that you mentioned for
[37:56] opto electronics you don't really need
[37:59] the leading edge of chip
[38:02] manufacturing and as you mentioned like
[38:04] 300 nanometers would do. I don't really
[38:07] need two nanometers out there. Has that
[38:09] changed or is that still the situation?
[38:12] >> It's still. 300 is a little high, but I
[38:15] would say you know 90 nanometers and
[38:18] below we use fabs that are 180. We use a
[38:22] fab that's 65 nanometer. So definitely
[38:24] much more mature technology than
[38:26] obviously the 2 nanometer or even
[38:29] smaller nodes. You don't need that. And
[38:31] the reason is, you know, basically
[38:34] because the optical wavelength is way
[38:36] bigger, right, than the electron. And
[38:38] and so what we really need from the fab
[38:41] is to be able to make the optical
[38:44] photonic structures, you know, the wave
[38:46] guides, the switches, the resonators,
[38:48] everything that we design smooth so that
[38:51] we don't have losses. One of the key
[38:54] design parameters is about minimizing
[38:56] loss. The less loss you have in the
[38:58] photonic circuit, the less energy you
[39:00] need from the laser, the less
[39:02] sensitivity you need from the receiver
[39:04] and therefore the power consumption goes
[39:06] down and the more bandwidth you can send
[39:09] through. So loss, it's all about the
[39:11] loss budget as well, you know, as a
[39:13] primary focus. So that's great and you
[39:16] can achieve that with, you know,
[39:17] processing technology nodes that are
[39:20] more mature. However, what's also very
[39:22] important, we touched on that a little
[39:24] bit before, is the variability.
[39:28] And if you use, let's say, you know, 65
[39:32] nanometer node, right? But you use tools
[39:35] in that fab line that are advanced tools
[39:39] that are tools that are used for more
[39:42] advanced nodes. Then you get much more
[39:45] consistent
[39:47] accuracy and repeatability and that can
[39:50] play a huge role in the ultimate
[39:52] performance of the PIC.
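The loss-budget reasoning in this answer (less loss in the photonic circuit means less power needed from the laser) can be put into a few lines of arithmetic. The Python sketch below adds up per-component losses in dB and works backward from the receiver sensitivity to the laser power required. Every number and component name is a placeholder assumption for illustration, not a figure from the conversation or from any specific device.

```python
def required_laser_power_dbm(losses_db, rx_sensitivity_dbm, margin_db=0.0):
    """Laser power needed so the signal still reaches the receiver's sensitivity."""
    return rx_sensitivity_dbm + sum(losses_db.values()) + margin_db


def dbm_to_mw(power_dbm):
    """Convert optical power from dBm to milliwatts."""
    return 10 ** (power_dbm / 10)


# Hypothetical per-channel link losses, in dB.
losses_db = {
    "coupling_in": 1.5,
    "modulator": 3.0,
    "waveguide": 1.0,
    "filter": 1.0,
    "coupling_out": 1.5,
}

laser_dbm = required_laser_power_dbm(losses_db, rx_sensitivity_dbm=-15.0, margin_db=2.0)
laser_mw = dbm_to_mw(laser_dbm)

# Saving 3 dB anywhere in the budget roughly halves the laser power needed.
improved_mw = dbm_to_mw(laser_dbm - 3.0)
power_ratio = laser_mw / improved_mw

print(laser_dbm, laser_mw, power_ratio)
```

Because the budget is in decibels, every dB saved in the waveguides or couplers comes straight off the laser power, which is why smooth, low-loss structures matter more here than the smallest process node.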
[39:54] >> I see. So it's almost like consistent
[39:56] quality of every data path, let's say.
[39:58] >> Exactly. Exactly. It's also good for
[40:00] packaging just to add to that a little
[40:02] bit. So, you know, the photonic PIC on
[40:04] its own, if I just give you a PIC,
[40:07] there's nothing you can do with it,
[40:08] >> right?
[40:09] >> Can't turn it on. You can't send any
[40:11] data. There's nothing you can do with
[40:12] it. It's like a it's like a brick of
[40:14] silicon. So, I need to connect that PIC
[40:17] to a laser of some type or
[40:20] co-integrate the laser. And I need to
[40:22] connect everything to the electronic
[40:24] domain to send data, to power the laser,
[40:27] you know, everything. And so the other
[40:31] really important aspect is the
[40:33] packaging. Talk about co-packaged optics.
[40:35] You know the package is the assembly and
[40:37] the packaging is very important and the
[40:41] advanced tools being able to do things
[40:44] you know 300 mm wafer scale with
[40:47] advanced tools also lets you do things
[40:50] like you know 3D packaging that enables
[40:52] you to co-assemble with electronics in a
[40:55] very precise way. You can have very
[40:57] small pitches. You can all these things
[41:00] enable you to reach better and better
[41:02] bandwidth density and performance.
[41:04] >> Right. Right. Right. And PIC, you
[41:06] mentioned, is photonic integrated
[41:08] circuit. That's what that means.
[41:09] >> That's right. That's right.
[41:11] >> Also I remember from our last
[41:12] conversation that photonics for
[41:15] communication, check; photonics for
[41:18] computing, I don't know, complexity may or
[41:21] may not. What is your latest take on
[41:24] optical computing?
[41:24] >> It's a very
[41:26] interesting topic. I think that at this
[41:29] point in time it's in the research
[41:31] domain and very interesting you know I
[41:33] myself am working on it. I think that
[41:35] there are some key functions that you
[41:38] can do in the optical domain that
[41:41] because of the parallelism the natural
[41:43] parallelism that you get in the optical
[41:45] domain that could be extremely
[41:49] interesting and accelerating you know
[41:51] certain compute functions. And my
[41:53] argument there goes in the past, right?
[41:55] There's been there's a long sometimes
[41:59] colored history of of optical computing
[42:01] like back to the 90s. You know what some
[42:04] people did in the past was said, okay,
[42:06] I'm going to build a computer and make
[42:08] optical all optical gates and do
[42:11] everything in the optical domain. And
[42:13] that turned out to be, you know, not a
[42:15] very promising path in the end and kind
[42:17] of crash and burn.
[42:18] >> Sounds really hard.
[42:19] >> Yeah, it was very hard. It was exciting
[42:21] to think about you know I remember you
[42:23] know we did things like optical logic
[42:26] gates and things like that and it was
[42:28] nice research but compared to what you
[42:30] can do in electronics, even back then,
[42:33] it's night and day.
[42:34] >> But let's say that we are getting to
[42:37] everything that we've talked about today
[42:39] which is we already have we we've
[42:42] already paid the price of converting to
[42:45] the optical domain. Let's say we're in a
[42:46] world where embedded photonics, co-
[42:49] packaged optics, is part of the system.
[42:50] Just like I said earlier about the
[42:52] switches, you know, we can keep going.
[42:54] So now, we already paid. The
[42:56] biggest cost is moving from one domain
[42:59] to the other from the electrical to the
[43:01] optical and back.
[43:02] >> So we've already paid the price, we paid
[43:03] the energy cost, everything. Maybe we
[43:05] can do something with that data in the
[43:07] optical domain. Now it makes a lot more
[43:09] sense to me.
[43:10] >> If it's already there, can we
[43:12] >> It's already there. Yes. And so that's
[43:15] the question. It's still in the research
[43:17] domain, but I think that there are some
[43:18] really exciting things on the horizon.
[43:20] Yeah.
[43:21] >> I was going to ask you about the cost of
[43:23] transceivers and now they're embedded.
[43:25] They're everywhere. Are they wire speed
[43:27] and no issues on energy or is that still
[43:31] like a lump that one needs to worry
[43:33] about? Sounds like it's the latter.
[43:35] >> the cost of the transceiver. You mean
[43:37] >> just the cost in terms of time and
[43:39] energy in terms of latency and
[43:41] >> Oh yeah. No, it's it's really minimal.
[43:43] It's really minimal. I see. Yeah, it's
[43:44] very minimal. It's almost instantaneous.
[43:47] It's like in the picoseconds domain.
[43:48] Yeah. So, got it.
[43:49] >> Unless let me just with a caveat
[43:52] >> there are certainly transceivers
[43:54] where the data is much more
[43:56] complex, like much higher-order
[43:58] modulation formats, coherent modulation,
[44:00] you know, all the stuff that's being used
[44:02] >> in telecom. No, those are not good for
[44:06] this application because you have to do
[44:07] a lot of signal processing and all kinds
[44:10] of stuff, which adds a lot of
[44:12] energy and latency. So that that's not
[44:14] what we're talking about here. Here when
[44:16] we think about these very stripped down,
[44:19] you know, very typical to like HPC
[44:21] systems where you have a proprietary
[44:23] interconnect. Now it's in the optical
[44:25] domain, the latencies at the interfaces
[44:27] are really minimal.
[44:29] >> Right. Right. Right. Excellent. You
[44:30] know, a couple of years ago, I thought I
[44:32] had this brilliant idea that if you take
[44:34] a number and you want to factorize it,
[44:37] why, you know, can there be like a prism
[44:39] and then you shine it and then whatever
[44:42] comes out on the other side are the
[44:43] factors. So then I go Google it and
[44:46] indeed somebody had already worked on
[44:48] it.
[44:49] >> It wasn't a new idea. So it seems like
[44:52] there
[44:53] >> yes that and things doing like optical
[44:56] you know that's what I meant by
[44:57] functions like
[44:58] >> things like optical FFTs other that
[45:01] naturally use the parallelism of the
[45:03] photonics
[45:04] >> right so I wanted to ask you about the
[45:07] company that you have co-founded and I
[45:09] think for anybody who wants to go look
[45:11] it up, it's pronounced xscapephotonics.com,
[45:15] if I'm correct
[45:16] >> yes
[45:17] >> what is it about what are you guys
[45:18] trying to do I was delighted to hear
[45:20] about it and you know as much info as
[45:23] you care to share with us.
[45:25] >> Oh, absolutely. And thank you. Thank you
[45:26] for that. We're very excited. So I and
[45:29] two other Columbia faculty, Michal Lipson,
[45:32] who is the pioneer of silicon photonics,
[45:36] and Alex Gaeta, also a faculty at
[45:39] Columbia, also a pioneer in nonlinear
[45:41] optics and especially comb laser
[45:44] technology and our other founders at the
[45:47] company, CEO Vivek. So the company is about
[45:51] really bringing a lot of what we talked
[45:52] about to the commercial world and the
[45:55] kind of the key differentiator that we
[45:57] have is this comb laser technology.
[46:01] So this is imagine that you have a
[46:03] single laser that generates
[46:06] simultaneously
[46:08] any number of wavelength channels that
[46:10] are exactly precisely spaced. You can
[46:13] design and space those channels exactly
[46:15] what you want. Each one of the channels
[46:18] can deliver relatively high power and
[46:21] it's all driven by a single laser,
[46:23] which is a very energy-efficient
[46:26] way to do it if we sort of trace it all
[46:28] the way back to the wall plug efficiency
[46:31] and so forth. So this work
[46:33] started when, you know, the three of us
[46:36] at Columbia have been working together
[46:38] for over 15 years
[46:41] >> and in the last number of years you know
[46:44] we combined this comb laser which
[46:46] generates many wavelengths together with
[46:49] the link architecture work that I do to
[46:52] deliver everything that we just talked
[46:54] about you know the high bandwidth
[46:55] densities we're talking about just to
[46:56] give some numbers you know reachable to
[46:59] 10 terabit per second per millimeter
[47:02] 240 or even more.
[47:04] >> Wow.
[47:05] >> As we keep going. And this is edge
[47:07] bandwidth density that you get out of
[47:09] the chip fiber coupled inside. If you're
[47:12] thinking about a panel scale computing
[47:14] system, you can get even higher
[47:15] bandwidth density. It's just incredible.
[47:17] So we decided that this is very exciting
[47:20] and a few years ago we launched the
[47:23] company and are hoping to you know
[47:25] obviously the vision of getting to these
[47:28] very large wavelength counts and
[47:30] bandwidth densities and so forth is on
[47:32] our road map we have the ability to get
[47:34] there and in the nearer term the more
[47:36] where I was saying where the market is
[47:38] going in the in the next let's say two
[47:41] three years is smaller number of
[47:43] channels so let's say eight channels or
[47:45] 16 channels so That's primarily what
[47:48] we're focused on right now in the
[47:49] company is bringing that to market.
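To make the idea of a comb of exactly spaced wavelength channels a bit more concrete, the Python sketch below lays out an evenly spaced frequency grid around a center frequency and multiplies the channel count by a per-channel data rate to get the aggregate bandwidth of one fiber. The 8 and 16 channel counts come from the conversation; the center frequency, the 100 GHz spacing, and the 32 Gb/s per-channel rate are illustrative assumptions only.

```python
def comb_channels_thz(center_thz, spacing_ghz, n_channels):
    """Evenly spaced channel frequencies (THz), centered on center_thz."""
    spacing_thz = spacing_ghz / 1000.0
    first_thz = center_thz - (n_channels - 1) / 2 * spacing_thz
    return [first_thz + i * spacing_thz for i in range(n_channels)]


def aggregate_gbps(n_channels, per_channel_gbps):
    """Total data rate on one fiber when every channel carries the same rate."""
    return n_channels * per_channel_gbps


# Hypothetical grid: 8 channels, 100 GHz apart, centered near 193.4 THz.
channels_8 = comb_channels_thz(193.4, 100.0, 8)

# Assumed 32 Gb/s per wavelength, for the 8- and 16-channel cases.
per_fiber_gbps = {n: aggregate_gbps(n, 32.0) for n in (8, 16)}

print([round(f, 2) for f in channels_8])
print(per_fiber_gbps)
```

Scaling bandwidth then becomes a matter of adding wavelengths to the same fiber rather than adding fibers, which is the bandwidth-density argument made earlier in the conversation.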
[47:51] >> Excellent. And the use case for this
[47:53] will be initially in data centers inside
[47:56] racks etc.
[47:57] >> Exactly. This is about the fabric
[47:59] getting a photonic fabric, you know, within
[48:02] the rack or blade to blade, you know,
[48:04] those kinds of distances. Yeah.
[48:06] >> I see. So really it just fills the
[48:07] spectrum of what is eligible to go
[48:10] optical.
[48:11] >> Exactly. Exactly right. Yeah.
[48:13] >> Excellent. Excellent. Well good luck
[48:15] with that. I noticed the announcement
[48:17] from Columbia University actually
[48:19] because they were bragging about the,
[48:21] you know, the funding that you raised, a
[48:23] very successful round, you know, was
[48:25] like $45 million or something. So that
[48:27] all sounds excellent. Best of luck for
[48:29] that. I think that's exciting and
[48:31] advances the state of the technology.
[48:33] >> Thank you so much. Yeah, we are. We're
[48:35] super excited about it and as you said,
[48:37] you know, there's a lot of interest in
[48:39] it and we hope now is the hard part,
[48:41] right? bringing this technology, doing
[48:44] all the engineering work and really
[48:45] bringing this technology to foreground
[48:47] and we're very excited about it.
[48:49] >> Well, you have the dream team and you've
[48:51] got the runway so it all bodes well.
[48:53] >> We hope so. Thank you.
[48:54] >> Excellent. Well, thanks for extending
[48:56] this thing. Doug, any other topics from
[48:58] you?
[48:58] >> No, I was just going to make the comment
[48:59] that, you know, it's hard to raise that
[49:01] kind of money, but then it's harder to
[49:03] do what you're supposed to do with the
[49:04] money. So, but best of luck with all
[49:08] that.
[49:08] >> Hardware, I like to say hardware is
[49:10] hard. Hardware is hard indeed. Yes.
[49:13] >> No, I think that's it. That was a great
[49:14] conversation and really appreciate you
[49:17] bringing us up to speed in this area.
[49:18] Keren.
[49:19] >> thank you both so much. You know, this
[49:21] is my favorite topic to talk about, so I
[49:23] can be here all day.
[49:24] >> Awesome. Thank you so much. What a
[49:26] treat. Appreciate it and look forward to
[49:28] catching up again later and running into
[49:30] you at SC.
[49:31] >> Absolutely.
[49:33] >> That's it for this episode of the
[49:35] @HPCpodcast. Every episode is featured on
[49:38] insideHPC.com
[49:39] and posted on OrionX.net. Use the
[49:42] comment section or tweet us with any
[49:44] questions or to propose topics of
[49:46] discussion. If you like the show, rate
[49:47] and review it on Apple Podcasts or
[49:49] wherever you listen. The @HPCpodcast is
[49:52] a production of OrionX in association
[49:55] with Inside HPC. Thank you for
[49:57] listening.