# WHY CO-PACKAGED OPTICS FAILED IN 2011 - AND WHY IT WON'T THIS TIME : IBM Europe

https://www.youtube.com/watch?v=-Y6UQEGG2N4

[00:09] One, two, three. Welcome back everybody.
[00:11] I hope you all enjoyed a great lunch and
[00:14] you're all fed and watered ready for
[00:16] this next uh session where we'll discuss
[00:19] through a keynote talk followed by
[00:22] experts from three incredibly innovative
[00:25] startups on how photonics can enable the
[00:28] next generation of AI in both compute
[00:32] and the networking layer. Please allow
[00:34] me to first welcome and give the floor
[00:37] to Professor Bert Offrein. I hope you
[00:39] won't mind me, uh, saying industry
[00:42] veteran and manager of co-packaged
[00:44] optics for IBM research in Zurich,
[00:47] Switzerland. Bert, the floor and the
[00:49] attention of everyone is yours. Welcome.
[00:51] Thank you.
[00:52] >> Thank you. Thank you, John.
[00:56] Okay, a veteran. Let's see. I want to
[00:58] start with this picture. This is the IBM
[01:00] research lab in Zurich where I'm
[01:02] working. And it's the first lab for IBM
[01:05] outside the United States. And I show
[01:07] this picture because this year we
[01:09] celebrate our 70th anniversary just for
[01:11] your information. So we are very well
[01:14] embedded in the European research
[01:16] landscape connections with universities,
[01:18] companies but of course also very
[01:20] strongly integrated in IBM research and
[01:22] IBM business units uh in the US and in
[01:25] Canada out of my team for example.
[01:28] So I am embedded in the semiconductors
[01:32] um field um more specifically in chiplet
[01:36] and advanced packaging.
[01:40] So
[01:42] my presentation I want to start with
[01:43] this picture um talking about veterans.
[01:48] This is Roadrunner. It's uh the first
[01:51] supercomputing system um that reached
[01:54] one petaflop.
[01:56] And why do I show this picture? For
[01:58] several reasons. Um
[02:02] the first one is and we've seen this
[02:04] before. Um talking about say the scaling
[02:08] of AI inference and training and the
[02:10] compute effort. If you think back to the
[02:13] previous picture: 2008, the strongest
[02:16] supercomputer. Is that long ago, 2008, or
[02:19] not? Okay. As a veteran, I would say
[02:21] that's not long ago, right? But this
[02:24] system now needs 10,000 days to train um
[02:28] one of the modern neural networks. Just
[02:30] to put this in perspective, just not
[02:32] looking at a graph alone, right?
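To put the Roadrunner comparison in numbers, here is a minimal back-of-the-envelope sketch. It assumes the machine sustains exactly 1 petaflop (10^15 floating-point operations per second) for the full 10,000 days, which is an idealization; the talk does not say which network or which numeric precision is meant, and the variable names are illustrative only.

```python
# Rough arithmetic behind "10,000 days on a 1-petaflop system".
# Assumption: sustained 1e15 FLOP/s with perfect utilization (idealized).
SECONDS_PER_DAY = 24 * 60 * 60

roadrunner_flops = 1e15   # 1 petaflop, as quoted in the talk
training_days = 10_000    # figure quoted in the talk

# Total work implied by the two quoted numbers.
total_flop = roadrunner_flops * training_days * SECONDS_PER_DAY
training_years = training_days / 365.25

print(f"Implied training work: {total_flop:.2e} FLOP")
print(f"Wall-clock time: about {training_years:.1f} years")
```

Under these assumptions the quoted figures correspond to roughly 8.6e23 floating-point operations, or a bit more than 27 years of wall-clock time.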
[02:35] So what is it that it's all running on?
[02:39] It's the hardware that say is the
[02:43] transistors, the logic, the circuits and
[02:46] the memory, different technologies for
[02:49] memory. Um and all of this continue
[02:52] scaling and
[02:55] the transistors, whereas maybe say 10
[02:58] years ago we were saying CMOS scaling
[03:00] is over, now we see things continue, but
[03:03] especially what is important is scaling
[03:05] of packaging bringing memory closer and
[03:09] closer to the accelerator and the
[03:13] processing unit, going from for example
[03:15] the DIMMs that we had 10 or 15 years ago to
[03:17] high bandwidth memory packaged closely
[03:19] on the um on the carrier substrate and
[03:23] of course communication um bringing it
[03:25] close on the package as well.
[03:29] And IBM continues to be involved in all
[03:32] these developments through
[03:34] collaborations also out of research with
[03:36] semiconductor companies with um
[03:39] manufacturing and OSAT companies in
[03:42] order to advance this technology
[03:44] and out of say research and also out of
[03:47] my team in Zurich we are directly
[03:51] connected to this.
[03:53] So talking about optics, we've been
[03:58] talking a bit about the real challenges
[03:59] that we have in optics. I think the
[04:02] basic challenge of optics is that it is
[04:04] complicated.
[04:06] So if you just look at an electrical
[04:08] link,
[04:10] you have copper wires, maybe you have
[04:12] some connectors in between, but your
[04:14] signals start electrical, they end
[04:16] electrical, and all in between is in
[04:18] principle relatively simple. Now look at
[04:21] this. If we go to an optical system,
[04:24] um we need this tremendous amount of
[04:26] additional components, the lasers, um
[04:29] the modulators, special drivers to drive
[04:32] these modulators. And at once we have
[04:35] all these super complex optical
[04:36] interfaces that need to be clean, very
[04:38] accurately aligned, multiplexers,
[04:41] demultiplexers as well as amplifiers. So
[04:44] optics has a lot of advantages but it
[04:46] comes also with a tremendous amount of
[04:48] additional effort, additional components
[04:51] and additional assembly effort. Um and I
[04:54] think that is the basic challenge that
[04:55] we need to address. I was looking back
[04:58] through my slides that I was presenting
[05:01] talking about a veteran again. This was
[05:03] a slide I had back in 2011 and this was
[05:06] our road map and as you see it was still
[05:08] a little bit an old-fashioned computing
[05:09] system with these memory DIMMs but for
[05:13] the rest we had again this system from
[05:16] 2008.
[05:18] Why do I put it here? This was the first
[05:20] computing system where pluggable optics
[05:22] was applied, this Roadrunner system. And
[05:25] then in 2011
[05:27] IBM built a system called PERCS. I will
[05:30] show you a bit more about that on the
[05:31] next slide which was a supercomputing
[05:33] system which was the first system where
[05:36] in the end co-packaged optics was
[05:39] actually applied. We didn't call it
[05:41] co-packaged optics at that time. We said
[05:43] first level package optics or something
[05:46] like this but it was co-packaged
[05:49] optics. Um and then we had this cool
[05:52] road map um all the things we are
[05:54] talking about further integrating optics
[05:58] into the system into the board making
[06:01] massive connectivity. We did a
[06:03] demonstrator um at that point project
[06:05] called Terabus where we had polymer
[06:07] waveguides on the board and we could do
[06:09] massive connectivity between two kind of
[06:12] processor packages.
[06:14] um and then going to deeper integration
[06:16] electrical and optical also at chip
[06:18] level.
[06:20] So that was 2011
[06:23] and IBM then built this system as I
[06:26] mentioned
[06:27] and let's have a little bit of closer
[06:29] look what did this now mean. So you see
[06:33] here this package with the processor or
[06:37] in this case it was a switch chip
[06:39] and you see there 56 sites where the
[06:43] optical transceiver can be assembled
[06:46] and this was a VCSEL-based system. So
[06:49] each transceiver had I think 12 channels,
[06:51] first at 10, later at 25 Gbit per second.
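The figures just quoted (56 transceiver sites, about 12 channels each, 10 and later 25 Gbit/s per channel) can be combined into a rough aggregate number. This is a minimal sketch: the speaker hedges the channel count with "I think", the talk does not say whether the 12 channels are per direction, and the function name `aggregate_gbps` is made up for illustration.

```python
# Raw aggregate optical bandwidth of one module, from the numbers in the talk.
# Assumptions: every site is populated and all channels run at the same rate;
# direction (transmit vs. receive) is not distinguished.
def aggregate_gbps(sites: int, channels_per_site: int, gbps_per_channel: float) -> float:
    """Return the raw aggregate line rate in Gbit/s."""
    return sites * channels_per_site * gbps_per_channel

SITES = 56      # transceiver sites on the carrier substrate
CHANNELS = 12   # channels per transceiver ("I think 12" in the talk)

total_channels = SITES * CHANNELS
first_generation = aggregate_gbps(SITES, CHANNELS, 10)   # 10 Gbit/s channels
later_generation = aggregate_gbps(SITES, CHANNELS, 25)   # 25 Gbit/s channels

print(f"{total_channels} optical channels per module")
print(f"{first_generation / 1000:.2f} Tbit/s at 10 Gbit/s per channel")
print(f"{later_generation / 1000:.2f} Tbit/s at 25 Gbit/s per channel")
```

With these inputs a single module carries 672 optical channels, which is also the number of fiber paths that have to be routed to the front of the chassis.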
[06:55] And now just imagine you build an
[06:57] electrical system. What do you do? You
[07:00] assemble your chip on your carrier
[07:02] substrate. You put your carrier
[07:04] substrate onto the board and you're
[07:07] done. In electronics, you just make
[07:10] thousands of connections in one assembly
[07:12] step. Now, we have the optics in
[07:15] addition. So, at once, you need to build
[07:19] all these optical components with all
[07:21] the building blocks that we discussed.
[07:23] Uh, but you need to assemble them on
[07:25] this carrier substrate 56 times. And as
[07:29] a next step, you need to route the
[07:31] fibers and need to make sure that every
[07:33] fiber or fiber cable from every element
[07:36] finds the right spot at the front of
[07:39] your chassis. So it's a huge effort.
[07:44] So what happened
[07:46] at that time in 2011? We thought, wow,
[07:49] this is it. Now optics will be there
[07:51] massively.
[07:53] It didn't happen.
[07:55] Industry found ways around using optics.
[07:57] If you also look at the way um say the
[08:01] amount of communication that was in the
[08:03] system um over time it reduced
[08:07] tremendously. In the beginning it was
[08:09] roughly one byte of communication per
[08:11] flop. Now it's less than a hundredth of a byte.
[08:14] Right? So all this did not happen at
[08:18] that time.
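The byte-per-flop ratio mentioned above translates directly into a required communication bandwidth. The sketch below uses a 1-petaflop system purely as a reference point; that pairing is an assumption for illustration, since the talk gives only the ratios, and `required_bytes_per_second` is a hypothetical helper name.

```python
# Communication bandwidth implied by a bytes-per-flop ratio.
# Assumption: a 1-petaflop reference system, chosen only for illustration.
def required_bytes_per_second(flops: float, bytes_per_flop: float) -> float:
    """Bandwidth needed to sustain a given communication-to-compute ratio."""
    return flops * bytes_per_flop

REFERENCE_FLOPS = 1e15  # 1 petaflop

early_ratio = required_bytes_per_second(REFERENCE_FLOPS, 1.0)    # ~1 byte per flop
recent_ratio = required_bytes_per_second(REFERENCE_FLOPS, 0.01)  # < 1/100 byte per flop

print(f"1 byte/flop    -> {early_ratio / 1e15:.0f} PB/s")
print(f"0.01 byte/flop -> {recent_ratio / 1e12:.0f} TB/s")
```

A hundredfold drop in the ratio means a hundredfold drop in the bandwidth needed per unit of compute, which is one way to read the remark that industry found ways around using optics.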
[08:21] But here we are again and you've seen
[08:24] the chart on the lower
[08:26] right hand side before. Um what we've
[08:29] seen over time is this tremendous
[08:30] scaling of compute and the lack of the
[08:33] communication to the memory as well as
[08:35] the overall system communication
[08:38] and we are now in the AI era and things
[08:41] have changed and in fact we see a kind
[08:45] of reversal. Um in the past IBM was
[08:49] building systems with multi-chip modules
[08:52] where you had many processors on one
[08:54] carrier substrate. Um that was reduced
[08:59] because compute advanced so quickly
[09:02] um and now we are in the situation that
[09:05] we go back to multi-chip modules um and
[09:08] we need to advance the communication by
[09:10] bringing the optics closer. So here we
[09:13] are again.
[09:15] But what can we now do to overcome that
[09:18] the same issue will happen again? And I
[09:21] think that's based on what I stated
[09:24] before. We need in some way to overcome
[09:26] the overhead that optics brings in.
[09:30] So what I want to do is discuss a few of
[09:33] the concepts that we are working on. Um
[09:35] and let's see. So I'm in research but we
[09:41] have a direct collaboration with uh an
[09:44] IBM OSAT um business unit in Canada in
[09:48] Bromont. Um they are key for also doing a
[09:52] lot of the assembly for the IBM
[09:54] mainframe systems as well as for the
[09:55] data center systems from IBM. They do
[09:59] modules, subassemblies and full
[10:01] systems. Um and on the module side it's
[10:05] co-packaged optics direct fiber attach
[10:08] with V-grooves as well as um co-packaged
[10:11] optics um polymer waveguide attach. I
[10:13] will show you a little bit more about
[10:15] that. So this technology is available we
[10:19] use it for in-house as well as as a
[10:21] service for um external companies.
[10:24] roughly 20% of the whole um service uh
[10:28] done at Bromont is for IBM internal, 80% is
[10:32] for external clients.
[10:35] So just to show you a little bit of
[10:37] visualization on how we envision this
[10:40] polymer waveguide system. Um you see the
[10:44] carrier substrate and then you see here
[10:46] this fan out of the uh optical chip
[10:49] where what we do is we attach through
[10:51] adiabatic coupling a polymer waveguide
[10:54] to the optical chip
[10:57] um to have a low loss connectivity
[11:00] between the chip and the polymer
[11:02] waveguides.
[11:04] um we can do a very high density
[11:06] interconnect at the chip site and then
[11:08] do a fan out as you see indicated here
[11:11] in order to do a connectivity to um to
[11:14] um a fiber connector, an MT.
[11:19] What is very critical is that if we do
[11:22] these kind of assembly concepts um that
[11:25] we are able to integrate this whole
[11:27] process flow into um electrical
[11:30] assembly. So what we've done is make
[11:33] sure that um all the processes and also
[11:36] the say stability for example these
[11:38] polymer waveguides are solder reflow
[11:41] compatible which is what you see here.
[11:45] So
[11:47] this polymer waveguide approach um that
[11:50] we are using on the right hand side you
[11:52] see a little bit how this adiabatic
[11:54] coupling concept is working. You have
[11:56] the silicon waveguide or the silicon
[11:58] nitride waveguide on chip. We taper that
[12:00] down. Uh and through tapering it down,
[12:04] you force the light adiabatically to
[12:06] transition as a super mode that is for
[12:09] the wide silicon waveguide completely
[12:11] in the silicon to transition into the
[12:13] polymer. And the polymer waveguide in
[12:15] our case is made compatible um from a
[12:18] size mode size to the fiber.
[12:22] What you see on the left
[12:24] hand side is a way how we could for
[12:27] example extend this concept. So
[12:31] currently we are attaching a flex cable
[12:33] and um so it's a pure optical connect
[12:38] but if you think about
[12:40] putting these polymer waveguides onto
[12:43] the carrier substrate
[12:46] um and then do a flip chip attach as you
[12:49] see indicated there. Um you can imagine
[12:52] that we have a direct simultaneous
[12:55] electrical and optical connectivity of
[12:58] the uh chip to um to the system. Right?
[13:02] So in one assembly we do electrical as
[13:05] well as optical attach.
[13:07] So we did not yet show that but one of
[13:11] the things we showed is the optical
[13:14] connectivity. What you see here is a
[13:16] glass substrate with these polymer
[13:18] waveguides.
[13:19] Um and we have a silicon photonics chip
[13:23] with waveguides with these adiabatic
[13:25] tapers. We do a flip chip attach of this
[13:28] chip onto this substrate with these
[13:30] polymer waveguides. Um and transition
[13:33] then from one input waveguide to another
[13:35] in order to visualize and see
[13:37] that we indeed have the coupling. Um and
[13:40] what we do here is in one attach we make
[13:44] 100 optical connections just in one flip
[13:46] chip attach. Right? So that's one of the
[13:48] challenges that we have in optics. While
[13:51] in electronics we can do say many
[13:53] connections at the same time in optics
[13:55] it's just still a few and this is a path
[13:57] forward to really get to many
[14:00] connections simultaneously and then
[14:02] potentially even simultaneous with
[14:04] electronics.
[14:07] um
[14:08] one step further and this is a project
[14:11] that we do together with DARPA is if we
[14:14] think about electronics
[14:17] um and we have a connectivity between
[14:20] chips mounted on a substrate um we have
[14:23] the connectivity in the silicon chip in
[14:27] the back end of line um the C4 attached
[14:30] to the substrate and then the
[14:31] connectivity through the substrate all
[14:34] electrical
[14:36] the goal here is to build similar
[14:39] functionality
[14:40] in the optical domain. So super high
[14:43] density optical interfaces. Um the goal
[14:46] here is say to do 3D routing of optical
[14:50] waveguides um where the pitch between
[14:53] the waveguides is as small as 3
[14:55] micrometer um with integrated turning
[14:58] elements for say in plane as well as out
[15:01] of plane redirecting of the light. um
[15:04] and then connectivity with bonding of
[15:06] chip to chip um connectivity through the
[15:09] substrate as you see indicated here. So
[15:11] this is a project we are currently
[15:13] driving. I still view this as something
[15:16] that is say more clearly further out but
[15:19] basic elements already I think also make
[15:21] sense to the discussion we are having
[15:23] here today in optical connectivity for
[15:26] CPO.
[15:27] So here you see a few of the basic
[15:29] building blocks we need to make through
[15:31] optical vias, vertical vias, the turning
[15:35] elements in order to go from vertical to
[15:37] optical connectivity.
[15:39] Um and these are all processes we are
[15:42] establishing in um IBM Yorktown Heights
[15:46] in the US as well as in Zurich in our
[15:49] clean room.
[15:51] Um what you see here is an example of
[15:55] these um vertical waveguides that we
[15:57] have realized that we can measure from
[15:59] the top and then measure in the end the
[16:02] resonance um of the light coupled in in
[16:05] order to estimate what kind of losses we
[16:07] have just as a first visualization of
[16:10] this. Um similar um these are lateral
[16:14] mirrors um integrated for example in a
[16:16] ring resonator to estimate the losses.
[16:19] um we are currently at roughly 1 dB need
[16:22] to improve this further. We also have
[16:24] already similar losses for the
[16:27] vertical mirrors. So I view this kind of
[16:31] technology as a step also on the shorter
[16:35] term to get high density interfaces and
[16:38] say vertical redirecting of the light as
[16:40] is required also for example for
[16:42] detachable connectors. Um but overall uh
[16:45] what I want to say is we need more
[16:47] optical communication.
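The roughly 1 dB per-mirror loss mentioned a moment ago can be converted into a transmitted power fraction, and summed when several turning elements sit in one optical path. This is a minimal sketch: the 1 dB figure is from the talk, while the count of three elements in series is a made-up example and the function names are illustrative.

```python
# Convert an insertion loss in dB to the fraction of optical power transmitted,
# and add up losses for several elements in series (dB values are additive).
def db_to_transmission(loss_db: float) -> float:
    """Fraction of power that survives an element with the given loss in dB."""
    return 10 ** (-loss_db / 10)

def cascade_loss_db(per_element_db: float, count: int) -> float:
    """Total loss in dB for `count` identical elements in series."""
    return per_element_db * count

MIRROR_LOSS_DB = 1.0  # "roughly 1 dB" per mirror, as quoted in the talk

single = db_to_transmission(MIRROR_LOSS_DB)
# Hypothetical path with three turning elements, for illustration only.
three_in_series = db_to_transmission(cascade_loss_db(MIRROR_LOSS_DB, 3))

print(f"One mirror at 1 dB passes {single:.1%} of the light")
print(f"Three such elements pass {three_in_series:.1%}")
```

At 1 dB per element about a fifth of the light is lost at each turn, which gives a sense of why the speaker says the number needs to improve further.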
[16:49] Um we need tight integration of the
[16:51] optics as already presented by several
[16:54] people. Um
[16:56] uh of course several technologies are
[16:59] under evaluation.
[17:00] Um and the kind of technology that will
[17:04] say be applied for the various
[17:06] applications just depends on the local
[17:08] requirements. But a basic challenge and
[17:11] I think that will also drive a lot of
[17:13] say the choices that are made for these
[17:14] various concepts. The main aspect we
[17:16] need to take into account is how to
[17:19] handle this overhead that optics brings
[17:21] and I think that's on the one hand
[17:23] through integration wafer level
[17:25] assembly. Um but I think this should not
[17:29] be limited to say the first level
[17:31] package. We also need to think
[17:33] about as shown for example by and others
[17:36] as well Intel. We also need to think
[17:38] about the full assembly in the system
[17:42] onto the board and um say enable massive
[17:46] connectivity without the assembly
[17:48] overhead that we are seeing today. With
[17:51] that I would like to thank you very much
[17:53] for your attention.
[18:00] Thank you so much. Uh I think you know
[18:02] what's coming. Uh what can you do for
[18:03] others and what can others do for you?
[18:06] The kind of research that we've been
[18:08] doing has changed a bit over time. So in
[18:11] the past we were driving a lot of
[18:13] technology development ourselves in
[18:15] research in our clean room and then we
[18:18] did kind of licensing of this technology
[18:21] to potential partners. We've done a lot
[18:23] of that in my team. Right now it looks a
[18:25] little bit different. We have come say
[18:27] to a situation where we are at a
[18:30] um say higher level of maturity. We are
[18:33] doing more say system design, device
[18:36] design but collaborate with partners in
[18:38] order to deliver the technology. So I'm
[18:41] very open to discuss with say you to see
[18:45] what kind of innovation we can bring
[18:47] into this. That's for sure.
[18:50] >> Thank you Darren.
[18:52] >> Yeah. Hi Darren Burns from Idex. Um, we
[18:55] were having an interesting conversation
[18:57] at lunch about the differences between
[19:00] uh engineering challenges and material
[19:03] science challenges and how often times
[19:06] if it's a materials problem, it takes a
[19:09] lot longer to solve for sometimes than
[19:11] an engineering problem. Um, do you see
[19:14] we have any major roadblocks associated
[19:19] with materials as we move through this
[19:21] transition? um you know and I think
[19:23] about polymer versus glass wave guides
[19:26] and the trade space between that is that
[19:29] are there a set of material science
[19:30] problems just to deal with still or is
[19:33] this really in the realm of let's just
[19:36] go solve some engineering problems and
[19:37] be done with it
[19:38] >> oh for me it's clearly more than
[19:40] engineering problems I would say right I
[19:42] mean there's still a lot of
[19:45] exciting material related aspects that
[19:48] need to be solved I think also in the
[19:50] presentations that we've seen before I I
[19:52] mean to some extent I'm really amazed
[19:54] how new functionalities arise because of
[19:59] new material concepts or new processes
[20:01] that are being used. Um I think in the
[20:05] end this also goes into the engineering.
[20:07] I mean if you engineer something out um
[20:10] and you want to make a reliable solution
[20:12] you really have to understand your
[20:14] materials' ins and outs. So for me there's
[20:16] not a direct discrepancy between
[20:18] engineering and material science. It
[20:20] goes hand in hand.
[20:24] >> Please tell us your name and your
[20:25] company.
[20:25] >> Okay. So um uh my question is um has
[20:28] some overlap. So with the first question
[20:30] so you mentioned this um polymer
[20:33] waveguides. What is the technology behind
[20:36] um to create such structures? Is it based on
[20:39] 2PP or grayscale lithography or
[20:41] conventional lithography. The second
[20:44] question is have you um tested the
[20:46] reliability of the polymer material
[20:50] um for example to check if can pass the
[20:54] this so-called yet reliability standards
[20:59] or reflow um compatibility.
[21:02] >> Mhm.
[21:02] >> So
[21:03] >> So we do not realize these polymer
[21:05] waveguides ourselves. We purchase them.
[21:08] So this is a quasi commercial offering
[21:11] by another company. Uh but we did solder
[21:13] reflow testing and we did the Telcordia
[21:15] testing and it passed this.
[21:18] >> Cool. Thanks.
[21:20] >> Any further questions for Bert?
[21:25] >> If not, uh let's thank you so much.
[21:28] >> Thank you.
