# Advanced photonic technologies for future systems: Optical computing or optics for computing?

https://www.youtube.com/watch?v=6lF9O7L0MbY

[00:00] Thank you very much, great to be here.
[00:04] I would like to address various aspects related to optics for computing, for interconnects as well as maybe even doing computing with optics, and give you a little bit of my perspective on this.
[00:16] So I would like to start with this system which is a supercomputing system that IBM built in 2008.
[00:23] And of course, there have already been various introductions about, say, the challenges of AI in this respect.
[00:30] I want to show this system because it is the first petaflop system, built in 2008.
[00:36] Is 2008 a long time ago or not? I think it's not such a long time ago.
[00:42] And today, such a system, which was the highest-performance supercomputer in 2008, would need to run 10,000 days in order to train one of the modern AI networks.
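To put the 10,000-day figure into numbers, here is a rough back-of-the-envelope sketch. It assumes an idealized sustained rate of exactly one petaflop per second, which the talk does not state; the constant and function names are illustrative only.

```python
# Rough estimate of the total work implied by "10,000 days on a petaflop system".
# Assumption: an idealized sustained rate of exactly 1e15 floating-point
# operations per second (the real sustained rate is not given in the talk).
PETAFLOP_PER_SECOND = 1e15
SECONDS_PER_DAY = 24 * 60 * 60


def total_operations(days: float, rate: float = PETAFLOP_PER_SECOND) -> float:
    """Return the number of operations executed in `days` at `rate` ops/s."""
    return days * SECONDS_PER_DAY * rate


training_ops = total_operations(10_000)
print(f"{training_ops:.2e} floating-point operations")
```

Under that assumption the training run corresponds to somewhat under 10^24 floating-point operations.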
[00:57] And that's based on these plots that we all know, and how this is increasing in time.
[01:04] Of course with all the impact on energy and so on.
[01:08] So let's just have a little bit of a closer look at the kind of calculation that needs to be done.
[01:18] We have here the neural network, and for every neuron what needs to be done is that you calculate the incoming signals from the previous layer of neurons, where they need to be weighted, which is a multiplication, and all these weighted signals have to be added together, which is an accumulation, so it's a multiply-accumulate.
[01:38] And for the full layer this turns into a vector matrix multiplication.
[01:43] And the scaling of the compute effort goes with the square of the number of neurons.
[01:51] And for the tremendously large neural networks that we see today, it's clear that this is scaling quadratically, and if you compare this with the compute effort for the neurons themselves, that scales of course with the number of neurons.
[02:06] Today really the vast, vast majority of the calculation required for doing the inference and training of neural networks is de facto matrix multiplication, 99.9% and more.
[02:17] So that's important to realize if you think about what to do in computing, and also going potentially to optical computing.
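The scaling argument can be made concrete with a small sketch. It assumes a fully connected layer with as many inputs as outputs and counts one activation evaluation per neuron; the layer width of 1,000 and all names are illustrative assumptions, not figures from the talk.

```python
# Count the work in one fully connected layer: multiply-accumulates (MACs)
# grow with n_inputs * n_outputs, activations only with n_outputs.
def layer_cost(n_inputs: int, n_outputs: int) -> tuple[int, int]:
    """Return (multiply-accumulate count, activation count) for one layer."""
    macs = n_inputs * n_outputs
    activations = n_outputs
    return macs, activations


def dense_layer(weights: list[list[float]], inputs: list[float]) -> list[float]:
    """Matrix-vector product: each output is one weighted sum of all inputs."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


# Hypothetical layer width, chosen only for illustration.
macs, activations = layer_cost(1000, 1000)
mac_fraction = macs / (macs + activations)
print(f"MACs: {macs}, activations: {activations}, MAC share: {mac_fraction:.4%}")
```

Doubling the layer width quadruples the MAC count but only doubles the activation count, which is why the matrix multiplication dominates for large networks.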
[02:25] So what kind of technology are we using today?
[02:29] We have of course the transistors in the front end of line, and in the system we then have the back end of line in order to connect multiple transistors to build the logic gates, to build the circuits in order to do the de facto matrix multiplications.
[02:42] You build it into a system, you combine it with memory.
[02:50] And there's something important here, in the sense that I will also talk about optical computing a little bit.
[02:56] One of the reasons why optical computing and also neuromorphic computing really made a big step roughly 10 years ago was that we were all thinking 10 years ago that CMOS scaling was ending.
[03:09] And what do we see now?
[03:11] What ended at that time was Dennard scaling, which was the continuous scaling down of a certain geometry of the silicon transistor.
[03:21] What's currently happening is that scaling is continuing, not just by scaling down but by bringing in new geometries and new materials in order to continue to advance the transistor density.
[03:36] The same holds for memory, and of course also for the way we combine memory and processing.
[03:43] In the past we had these DIMM slots, you certainly remember, vertically on the board.
[03:47] Nowadays we have high-bandwidth memory that is integrated with the processor on the same package, very close to the accelerator or the processor.
[03:57] And of course, the communication within the system, going from electrical to optical, as discussed already by various speakers.
[04:08] And this morning when Richard introduced me, he welcomed me back.
[04:11] So I added here a slide of work that we were doing, something I was showing, say, 15 years ago. This is from 2011.
[04:22] And this was our road map at that time on integrating optics in computing systems.
[04:29] And you see here at the top, 2008 is again the system that I was showing you before.
[04:35] The reason that this system is also interesting from an optical point of view is that it was the first system where pluggable optics was actively used.
[04:45] And then in 2011, IBM built a system where massive optical communication was introduced.
[04:52] It was again a supercomputing system, where co-packaged optics (we didn't call it co-packaged optics at that time) was actively introduced into the system as a VCSEL-based solution.
[05:09] And based on this, we had this road map, integrating optics deeper and deeper into the system, as you see indicated, onto the processor package, and especially also communication of optics into the board, and, as you see at the lower line, bringing optics and electronics closer and closer together, also within the chip, as we all see happening now today.
[05:37] But there was a big, big disruption.
[05:39] Let me talk a little bit more about this system from 2011.
[05:44] It was called PERCS, or the P775 supercomputing system.
[05:49] You see here the board with all the different modules. A lot of these were switch modules, where all the optics was introduced.
[06:00] You see here at the bottom the substrate on which the switch chip is integrated, as well as all the sites where, in this case, 56 VCSEL-based transceivers were attached.
[06:16] And now imagine we are going to put this together.
[06:20] And maybe first imagine how it is to put an electrical system together.
[06:25] You solder your chip on your substrate, you put your substrate with the chip on the board and that's it.
[06:32] In this case, what happened is that you solder your chip and then 56 modules have to be attached.
[06:42] As the next step, all the fiber cables need to be attached and routed on the board to find the right spot at the front of the board in order to make the connection to say other boards and other racks, right?
[06:57] So, what we see here is that optics brings in a tremendous amount of additional assembly overhead.
[07:03] So, what was the consequence at that time?
[07:08] This whole road map, where we were thinking, "Wow, now optics is coming," didn't come at that moment in time.
[07:17] And what are, in the end, the reasons behind this?
[07:22] It's what I indicated, the assembly overhead, but I think there's more than that.
[07:26] And that's just the fact that optics is complicated.
[07:31] I mean, just look at an electrical link.
[07:33] What is it?
[07:36] It's basically a copper wire.
[07:38] And maybe you have some electrical connectors in between.
[07:39] But when we go into optics, clearly the signals at the beginning and the end are electrical.
[07:44] So, we need a source, a laser, a modulator, a driver, and then we have all these relatively delicate, complex optical interfaces, and at the receiver again an amplifier and so on.
[07:55] So, there are many more components that come into it.
[07:59] So, it's not just the assembly overhead in itself, as I indicated, at system level, but already at the transceiver level itself there is much more that needs to come in.
[08:14] So, it's really a challenge to make optics cost effective.
[08:20] But now here we are again.
[08:24] And as discussed, if we look at the way computing has scaled, there is a tremendous scaling over the last 20 years in compute itself: a factor of 60,000.
[08:31] The communication between memory and processing scaled by only a factor of 200 in the last 20 years, whereas the overall communication in the system scaled by a factor of 60.
[08:34] So, this means a tremendous change in the overall balance in the system.
[08:37] The amount of communication per compute flop massively reduced over the last 20 years.
[08:38] But now we have AI.
[08:40] So, we need to correct this imbalance.
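The imbalance follows directly from the three scaling factors quoted above. The sketch below simply divides them; the variable and function names are illustrative, and the factors are taken as spoken in the talk.

```python
# Scaling factors over 20 years, as quoted in the talk.
COMPUTE_SCALING = 60_000      # compute throughput
MEMORY_LINK_SCALING = 200     # memory-to-processor communication
SYSTEM_LINK_SCALING = 60      # overall communication in the system


def bandwidth_per_flop_change(link_scaling: float,
                              compute_scaling: float = COMPUTE_SCALING) -> float:
    """Relative change in communication available per unit of compute."""
    return link_scaling / compute_scaling


memory_change = bandwidth_per_flop_change(MEMORY_LINK_SCALING)
system_change = bandwidth_per_flop_change(SYSTEM_LINK_SCALING)
print(f"memory bandwidth per flop fell by a factor of {1 / memory_change:.0f}")
print(f"system bandwidth per flop fell by a factor of {1 / system_change:.0f}")
```

With these numbers, the communication available per flop dropped by a factor of about 300 toward memory and about 1,000 across the system.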
[09:14] So, looking at this picture again, how are we now going to make sure that we actively overcome this overhead of optics?
[09:24] I think that's one of the big, big challenges that we need to address to really make this cost effective.
[09:32] So, I will show you a few things that we've been working on in the past and that we continue working on at IBM.
[09:39] There's this one concept called adiabatic coupling.
[09:42] If we have integrated optic chips, a big challenge is the linking of the integrated optic waveguide to the fiber.
[09:54] The integrated optic waveguide, silicon or silicon nitride, is very small, submicrometer, whereas the core of the fiber itself is roughly 10 micrometers.
[10:02] So, then how do we do this transition?
[10:05] There's a whole range of options: doing mode expansion, creating couplers and so on.
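To see why some form of mode transition is needed, one can estimate what happens if the two modes are simply butted together. The sketch below uses the standard power-overlap formula for two perfectly aligned Gaussian modes; treating a waveguide mode as Gaussian is a simplification, and both mode-field radii are assumed values chosen only to match the "submicrometer" and "roughly 10 micrometer" scales mentioned in the talk.

```python
import math


def gaussian_butt_coupling(radius_a_um: float, radius_b_um: float) -> float:
    """Power coupling between two aligned Gaussian modes with the given
    mode-field radii (no offset, no tilt, flat phase fronts)."""
    numerator = 2.0 * radius_a_um * radius_b_um
    denominator = radius_a_um ** 2 + radius_b_um ** 2
    return (numerator / denominator) ** 2


# Assumed, illustrative mode-field radii in micrometers.
WAVEGUIDE_MODE_RADIUS_UM = 0.25   # submicrometer integrated waveguide
FIBER_MODE_RADIUS_UM = 5.0        # roughly 10 micrometer mode-field diameter

efficiency = gaussian_butt_coupling(WAVEGUIDE_MODE_RADIUS_UM, FIBER_MODE_RADIUS_UM)
loss_db = -10.0 * math.log10(efficiency)
print(f"coupling efficiency: {efficiency:.4f}, loss: {loss_db:.1f} dB")
```

Under these assumptions only a small fraction of the power would couple across a direct interface, which is the motivation for mode expansion or a gradual, tapered transition.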
[10:09] Something we've been focusing on quite a bit is adiabatic coupling.
[10:19] What you do is you taper the silicon or the silicon nitride waveguide.
[10:21] In the vicinity, you have another waveguide.
[10:24] In our case, it's a polymer waveguide.
[10:25] And this way, by tapering, you adiabatically transition the mode from being in the silicon waveguide, if the silicon is broad enough, to the polymer.
[10:35] The advantage of this approach is that it is wavelength independent and polarization independent, if you design it the right way.
[10:37] But also that it can be super high density, and density was mentioned already a few times today, because you can put these waveguides very close together.
[10:43] So, what you can imagine is that you have a polymer waveguide structure that you fan out from the high density at the chip level to the density you can get where you connect to a fiber array.
[10:46] And that's one of the things that we've been doing, for example, in collaboration with our colleagues in Bromont, in Canada, which is an IBM assembly site.
[11:19] And what you see here is just an overall stack-up of a transceiver with the chip.
[11:27] You see the polymer waveguide attached to the photonic chip with the fan-out, and then also the MT ferrule, the connection where you can go to the fiber array.
[11:42] Now, just to say a few things also on scaling this.
[11:44] One thing we are doing currently is scaling this to higher density and a larger number of channels.
[11:54] But, looking back at the previous slides, you can also, for example, think about a substrate where these polymer waveguides are integrated on the substrate already, and you flip-chip your photonic chip onto this substrate and make electrical and optical connections at the same time.
[12:11] So, we believe this is a scalable approach where you can not just go to high density but also do simultaneous electrical and optical interfacing.
[12:23] We've demonstrated that this concept is solder reflow compatible, which is of course super important for bringing this together with electrical assembly.
[12:34] I see the time is going pretty fast.
[12:39] So, all these concepts are based, in the end, on the von Neumann computing architecture, where you have a separation between processing and memory.
[12:48] I want to say a few things about ways to overcome this bottleneck by going to what's called in-memory computing.
[12:57] And you do this by going, for example, to these kinds of crossbar arrays that you've maybe seen.
[13:05] Imagine again you have these neurons and these layers of neurons, and you have all these signals that need to be weighted.
[13:15] In the electrical domain, what you can do is apply a voltage over a resistor, and the current you get is the voltage times one over the resistance, the voltage times the conductance.
[13:24] That's immediately a multiply.
[13:26] If you take all these currents and you just put them together into one output wire, you accumulate all these individual currents.
[13:35] So, you do a multiply-accumulate directly in the analog domain.
[13:42] And coming back to the compute effort, this does not scale with order N squared, but with order one, independent of how large your matrix is.
[13:51] You immediately get your output, only limited by RC time constants.
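The crossbar principle can be sketched as an idealized digital model. It applies Ohm's law at every cross point and Kirchhoff's current law on every output wire; it ignores wire resistance, noise and device non-idealities, and the conductance and voltage values are made up for illustration.

```python
def crossbar_output_currents(conductances: list[list[float]],
                             voltages: list[float]) -> list[float]:
    """Idealized crossbar: conductances[i][j] (siemens) connects input row i
    to output column j; voltages[i] (volts) is applied to input row i.
    Returns the summed current (amperes) on each output column."""
    n_columns = len(conductances[0])
    currents = [0.0] * n_columns
    for row, voltage in zip(conductances, voltages):
        for j, conductance in enumerate(row):
            # Ohm's law gives the multiply; summing on the shared
            # output wire (Kirchhoff's current law) gives the accumulate.
            currents[j] += voltage * conductance
    return currents


# Hypothetical weights (as conductances) and input voltages.
weights_siemens = [[1e-6, 2e-6],
                   [3e-6, 4e-6]]
inputs_volts = [0.1, 0.2]
currents = crossbar_output_currents(weights_siemens, inputs_volts)
print(currents)
```

This software model still loops over every cross point, so it costs on the order of N squared steps; the point made in the talk is that the physical array produces all column currents at once, so its latency does not grow with the matrix size.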
[13:56] And in principle, you can do the same in the optical domain.
[14:00] The multiply is, for example, a reflection.
[14:04] And the accumulate is receiving all reflected powers optically onto one detector.
[14:11] The interesting thing is that in the electrical domain, the crossbar is set as the architecture.
[14:20] There are different kinds of concepts to realize the weights, because this is not trivial: in the end, what you need is a tunable memristor, a non-volatile resistor, where you change the resistance and then it stays.
[14:36] In the optical domain, you can build a crossbar, or you can use an interferometer or a diffractive element.
[14:41] There are all kinds of options.
[14:45] What I will just quickly say is that we have worked a little bit on these interferometric structures that others have also been working on, but we do it in a slightly different architecture.
[14:56] We have these cascaded Mach-Zehnder interferometers, where in the end you can do a convolution in the time domain.
[15:05] What happens is that you have, for example, a pixel value that you give in.
[15:10] In a first tunable coupler, you have a splitting.
[15:20] And then you have a longer delay line, so that in the end there's a time separation of this signal.
[15:27] Meanwhile, your second pixel value comes in, and it's also split.
[15:31] What you see in the next step is that the part of the light of the second pixel value that goes through the shorter arm will meet the light of the first pixel that went through the longer arm.
[15:45] So, this way you can compare these individual elements and build an in-memory computing optical convolutional processor.
[15:53] So, we did this and we've demonstrated that this actually works.
[15:56] You can do, for example, edge detection.
[16:03] I see that I did not add the graph of how this looks, but it works nicely.
[16:06] You can do edge detection and so on.
[16:08] You can also do more complex forms.
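At the signal level, the delay-line idea described above behaves like a two-tap filter: each output sample mixes the current input (short arm) with the previous input (long arm, delayed by one sample period). The sketch below is only that abstraction; it ignores optical fields, phases and losses, and the weights of +1 and -1 are an assumed kernel for edge detection, not values from the talk.

```python
def two_tap_time_domain_filter(samples: list[float],
                               weight_short_arm: float,
                               weight_long_arm: float) -> list[float]:
    """Signal-level model of one short arm plus one delayed arm:
    output[n] = weight_short_arm * x[n] + weight_long_arm * x[n - 1]."""
    outputs = []
    previous = 0.0  # nothing has entered the delay line before the first sample
    for current in samples:
        outputs.append(weight_short_arm * current + weight_long_arm * previous)
        previous = current
    return outputs


# A row of pixel values with one rising and one falling edge.
pixel_row = [0, 0, 0, 1, 1, 1, 0, 0]
edges = two_tap_time_domain_filter(pixel_row, 1.0, -1.0)
print(edges)
```

The output is nonzero only where neighboring pixels differ, which is the simplest form of edge detection; larger kernels would need more arms with longer delays.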
[16:12] But the interesting thing here is to show you whether it now really makes sense or not to do this in the optical domain.
[16:23] So, is this now more power efficient or not?
[16:26] So, we did an analysis of the full optical link.
[16:29] How much power do we need to detect in order to get the required accuracy in effective number of bits?
[16:36] What does this mean for the full system?
[16:38] What are the powers of all the components that you have?
[16:41] And if we do this, we come in the end to a value of, say, 2.5 tera-ops per watt, which is not too bad.
[16:48] But it's not massively better than what you get for a GPU.
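The efficiency figure can be converted into energy per operation, since operations per second per watt is the same as operations per joule. The sketch below does only that unit conversion; the function name is illustrative and no GPU figure is assumed.

```python
TERA = 1e12
PICOJOULES_PER_JOULE = 1e12


def energy_per_operation_joules(tera_ops_per_watt: float) -> float:
    """Convert an efficiency in tera-operations per second per watt
    into the energy spent per operation, in joules."""
    operations_per_joule = tera_ops_per_watt * TERA
    return 1.0 / operations_per_joule


energy_pj = energy_per_operation_joules(2.5) * PICOJOULES_PER_JOULE
print(f"{energy_pj:.2f} pJ per operation")
```

So 2.5 tera-ops per watt corresponds to 0.4 picojoules per operation for the full system.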
[16:52] So, what does this say?
[16:55] If you go to in-memory computing, things are not automatically better.
[16:57] You have to design the whole system in the right way.
[16:59] What happens here is that the number of calculations we can do between the input and the output is just not large enough.
[17:02] On the other hand, we do have something here that is clearly differentiating.
[17:04] We can do this at ultra-high speed.
[17:06] We've demonstrated it at 30 gigasamples per second.
[17:21] You can go even higher.
[17:23] So, this is maybe not more power efficient, but it's certainly faster.
[17:28] On the other hand, because integrated optics is relatively large, we cannot do too complex kernels.
[17:37] So, this is a way to do ultra-fast detection, for example, of certain dedicated features in a signal that's coming in.
[17:43] And then maybe in the electrical domain you can look in more detail at other options.
[17:48] So, because of the time, let me slowly stop.
[17:51] What I want to say is: optics for computing in the short term, clearly communication; potentially for dedicated applications in the long run, also for computing itself.
[18:02] And with that, I would like to end.
[18:04] Thank you very much for your attention.
[18:07] >> [applause]