# Scale Out Networks and Scale Up Architectures with CPO

https://www.youtube.com/watch?v=vmXzRsWW9S0

[00:03] All right.
[00:05] All right, my name is Rajie Pancholi.
[00:12] I'm part of the optical systems team at Broadcom.
[00:14] Let's get into it.
[00:14] Uh, do I use this to advance?
[00:18] Perfect.
[00:20] Okay, I'm going to steal some of Near's slides. He's presenting next week at PEC.
[00:24] So if you guys want more detail on this, he's really going to go into a lot of good stuff.
[00:29] So come and attend that.
[00:32] But I'm going to steal these from him because I think it frames where we are.
[00:35] So we started this uh difference between scale up and scale out.
[00:40] And we arbitrarily drew a line uh based on cost.
[00:46] And cost here is price per gig, power, latency.
[00:50] All right.
[00:54] And what Nvidia did, thanks Golad and team, right?
[00:57] They created this electrical backplane and now you have NVL72.
[01:04] So you can move the scale up to in the rack.
[01:07] Okay. That's great.
[01:11] What we see is that now, in order to move that even further, we need an optical backplane.
[01:18] So now with optics, CPO being one of them, you can move your scale up to now in the row.
[01:28] All right.
[01:28] Why is that important?
[01:31] Okay. So NVL72, NVL144, 64, 128: at some point you need to grow your scale up, the single-node scale up that you have for compute.
[01:44] And so in order to do that, the copper goes away, and you can't have the reach and the amount of interconnect that you need to connect, say, a hypothetical 1024 single-node XPU scale up, right?
[02:07] So let's look out in the future a little bit.
[02:09] So how would we do this?
[02:12] Okay, so first we need a 200T switch.
[02:17] Okay. We have 25 tera on each GPU.
[02:20] And now you can see, with optics, I can have 16 racks with 64 GPUs in each.
[02:27] I can have four racks of switches, 32 in each rack.
[02:31] And there you go.
[02:34] I have a 1024 single domain scale up.
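The rack arithmetic above can be checked with a short Python sketch. The figures (16 racks of 64 GPUs, 4 racks of 32 switches, roughly 25 Tb/s per GPU, 200 Tb/s per switch) come from the talk as spoken. The idea that every GPU spreads its bandwidth evenly across every switch is an assumption made here for illustration, not something the speaker states.

```python
# Sizing check for the hypothetical 1024-XPU scale-up domain.
# Numbers are the rounded figures quoted in the talk.
GPU_RACKS = 16
GPUS_PER_RACK = 64
SWITCH_RACKS = 4
SWITCHES_PER_RACK = 32
GPU_BW_TBPS = 25        # "25 tera on each GPU"
SWITCH_BW_TBPS = 200    # "a 200T switch"


def domain_summary() -> dict:
    """Return GPU/switch counts and the aggregate bandwidth on each side."""
    gpus = GPU_RACKS * GPUS_PER_RACK
    switches = SWITCH_RACKS * SWITCHES_PER_RACK
    gpu_side_tbps = gpus * GPU_BW_TBPS
    switch_side_tbps = switches * SWITCH_BW_TBPS
    return {
        "gpus": gpus,
        "switches": switches,
        "gpu_side_tbps": gpu_side_tbps,
        "switch_side_tbps": switch_side_tbps,
        "balanced": gpu_side_tbps == switch_side_tbps,
        # Assumption: each GPU connects to every switch with an equal share.
        "gbps_per_gpu_per_switch": GPU_BW_TBPS * 1000 / switches,
    }


if __name__ == "__main__":
    for key, value in domain_summary().items():
        print(f"{key}: {value}")
```

With these rounded numbers the GPU side and the switch side both come to 25,600 Tb/s, and under the even-spread assumption each GPU would send a little under 200 Gb/s to each switch.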
[02:37] Uh fiber count is important.
[02:41] So, we need to start looking at by um and a little bit more detail here on the rack.
[02:49] Okay.
[02:51] Okay. Great. Optical backplane.
[02:53] Well, what does it actually mean?
[02:54] What does it look like?
[02:57] How are we going to do it?
[02:59] You can see some of the ideas here that we've done with some of our partners.
[03:02] You need a fiber shuffle, you need kind of blind-mate connectors, you need to worry about cooling and power.
[03:12] So this is stuff that we're working closely with the industry to develop.
[03:17] Okay. So let's take that even further.
[03:19] All right. So now, if I have half a million GPUs, how do I connect those?
[03:24] Well, today, with an electrical backplane, I need three tiers of scale out to connect all those GPUs.
[03:33] Three tiers of scale out means extra switching, extra power, extra latency, extra cost.
[03:39] Okay, with the optical backplane, now you can take that 1024 and multiply by 500.
[03:46] Now you have a one-tier scale out that's connecting half a million GPUs, right?
[03:50] Hopefully you guys all were at ROM's keynote yesterday, but that's scale-up Ethernet.
[03:57] Now you can start to see, look, I have a single protocol that goes from scale up to scale out and it's Ethernet.
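One way to read the three-tier versus one-tier comparison is with the standard capacity formula for a non-blocking folded-Clos network, where `t` tiers of radix-`k` switches attach at most `2 * (k/2)**t` endpoints. The sketch below is an illustration under assumptions, not the speaker's own calculation: the radix of 512 is borrowed from the Tomahawk 6 figure quoted later in the talk, and treating each 1024-GPU scale-up domain as a single scale-out endpoint is a simplification made here.

```python
def max_endpoints(radix: int, tiers: int) -> int:
    """Endpoints a non-blocking folded-Clos with `tiers` levels can attach."""
    if tiers < 1 or radix < 2 or radix % 2:
        raise ValueError("need tiers >= 1 and an even radix >= 2")
    return 2 * (radix // 2) ** tiers


def tiers_needed(endpoints: int, radix: int) -> int:
    """Smallest number of tiers whose capacity covers `endpoints`."""
    tiers = 1
    while max_endpoints(radix, tiers) < endpoints:
        tiers += 1
    return tiers


RADIX = 512          # assumption: 512-port switch
DOMAIN_SIZE = 1024   # GPUs per scale-up domain (from the talk)
DOMAINS = 500        # "multiply by 500" (from the talk)

total_gpus = DOMAIN_SIZE * DOMAINS

if __name__ == "__main__":
    print("total GPUs:", total_gpus)
    # Every GPU is its own scale-out endpoint.
    print("tiers, GPU as endpoint:", tiers_needed(total_gpus, RADIX))
    # Each 1024-GPU domain is treated as one endpoint (assumption).
    print("tiers, domain as endpoint:", tiers_needed(DOMAINS, RADIX))
```

Under these assumptions, 512,000 individual endpoints exceed the two-tier limit of 131,072 and need three tiers, while 500 domain-level endpoints fit within a single 512-port tier.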
[04:03] Anyway, attend Near's talk next week.
[04:05] He'll have more detail.
[04:08] So, where are we now?
[04:11] Hopefully you guys saw CMAX's presentation in the morning.
[04:13] They were talking about our Tomahawk 5 Bailey switch, which has a Tomahawk 5 in the middle, as you can see, if I have a pointer.
[04:25] It's got eight 6.4T Bailey engines around it.
[04:27] Each of those engines is FR4.
[04:30] Okay.
[04:30] And you can kind of see the systems there.
[04:33] Last year I presented this slide. It was kind of like: look, CPO, with integration, you don't have module connectors, maybe the dust is a little less, and maybe the link flaps should go down. Okay, well, Meta's data from this morning, this is CMAX's slide: 1 million hours, no link flaps with CPO. Fantastic, right?
[05:01] Uh that's pretty significant.
[05:05] We also presented on power last year.
[05:08] We said, "Hey, the box should be about 5.5 watts."
[05:12] And okay, Meta's data: 5.5 watts.
[05:17] That's a 65% reduction over pluggable retimed optics, and 35% over LPO, at 100 gig per lane.
[05:26] So this is all 100 gig per lane.
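The two percentages imply baseline figures that the talk does not state directly. A minimal sketch of that back-calculation follows; the results are derived here from the quoted 5.5 W and the quoted reductions, in whatever unit the 5.5 W figure refers to, and are not numbers from the slide.

```python
CPO_WATTS = 5.5                  # quoted in the talk
REDUCTION_VS_RETIMED = 0.65      # "65% reduction over pluggable retimed optics"
REDUCTION_VS_LPO = 0.35          # "35% over LPO"


def implied_baseline(cpo_watts: float, reduction: float) -> float:
    """Baseline power such that cpo_watts is `reduction` lower than it."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return cpo_watts / (1 - reduction)


retimed_watts = implied_baseline(CPO_WATTS, REDUCTION_VS_RETIMED)
lpo_watts = implied_baseline(CPO_WATTS, REDUCTION_VS_LPO)

if __name__ == "__main__":
    print(f"implied retimed pluggable: {retimed_watts:.1f} W")
    print(f"implied LPO:               {lpo_watts:.1f} W")
```

This works out to roughly 15.7 W for the retimed pluggable baseline and roughly 8.5 W for LPO, if the percentages are taken at face value.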
[05:26] All right.
[05:27] What does all this mean?
[05:31] Okay, this is the money slide, right? This is showing a 90% training efficiency improvement on a cluster of 24,000 GPUs, based on the mean time between failures for CPO versus pluggables.
[05:49] Okay, so we've been talking about CPO, you know, is it lower power?
[05:52] Is it lower cost?
[05:54] Is it reliable?
[05:57] Right?
[05:57] So I think we're getting to that.
[05:59] But this is now something different.
[06:02] This is now saving.
[06:04] You have all of these clusters running jobs.
[06:06] They're running and all of a sudden you have a link flap.
[06:10] So now everyone has to go back to the last checkpoint.
[06:12] So what did you just do with CPO?
[06:14] Now you've enabled those checkpoints and those training jobs to continue for a lot longer, making your cluster more efficient.
[06:21] Now this is really one of the key values we see in CPO.
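The link between failure rate and cluster efficiency described above can be illustrated with a textbook checkpoint/restart model. This is not the model behind the Meta slide; it is Young's first-order approximation, swapped in here as an illustration, and the checkpoint cost, restart cost, and MTBF values are hypothetical.

```python
import math


def optimal_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's approximation for the checkpoint interval: sqrt(2 * C * MTBF)."""
    return math.sqrt(2 * checkpoint_cost_s * mtbf_s)


def efficiency(checkpoint_cost_s: float, mtbf_s: float,
               restart_cost_s: float = 0.0) -> float:
    """First-order fraction of wall-clock time spent on useful training.

    Waste = time writing checkpoints + expected rework (half an interval)
    and restart time per failure.
    """
    tau = optimal_interval(checkpoint_cost_s, mtbf_s)
    waste = checkpoint_cost_s / tau + (tau / 2 + restart_cost_s) / mtbf_s
    return max(0.0, 1.0 - waste)


# Hypothetical values, for illustration only.
CHECKPOINT_S = 300
RESTART_S = 600
MTBF_SHORT_S = 1 * 3600    # cluster-wide failure every hour
MTBF_LONG_S = 10 * 3600    # ten times fewer interruptions

if __name__ == "__main__":
    for label, mtbf in (("short MTBF", MTBF_SHORT_S), ("long MTBF", MTBF_LONG_S)):
        print(label, round(efficiency(CHECKPOINT_S, mtbf, RESTART_S), 3))
```

The point of the sketch is the direction of the effect: with a longer mean time between job-interrupting failures, less time goes to checkpoints and rework, so more of the cluster's time is spent training.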
[06:29] Okay, so last week we announced Tomahawk 6 Davisson.
[06:33] It's 200 gig per lane, 100T CPO.
[06:36] Hopefully you guys saw that announcement.
[06:39] We've been doing this for a while.
[06:41] Our gen one was actually also 100 gig per lane.
[06:43] That was with a Tomahawk 4, something we called Humboldt.
[06:45] We had a small deployment with Tencent.
[06:47] There were a lot of good learnings there.
[06:50] We learned operationally what's happening.
[06:54] How do you replace the laser modules?
[06:55] What happens when the fiber connector goes bad?
[06:57] We put a lot of that into our gen two with Bailey.
[06:59] That's the one Meta presented on this morning, that Semock and Verall did a great job on.
[07:00] And then Davisson, third generation: fewer link flaps, better traffic, lower power.
[07:23] 102T. You can see Tomahawk 6 is in the middle.
[07:25] It's got 16 6.4T DR Davisson engines around it to utilize all the 512 radix from that switch.
[07:38] Partners, many more. This is at OCP. So we're working closely with these that we've publicly announced, but also working together with many more partners.
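The engine counts quoted for Bailey and for the third-generation part can be cross-checked against the lane rates with a few lines of Python. Engine counts, engine bandwidth, and lane rates are from the talk; the helper names are made up for this sketch.

```python
ENGINE_GBPS = 6400  # each optical engine is 6.4T


def aggregate_gbps(engines: int, engine_gbps: int = ENGINE_GBPS) -> int:
    """Total optical bandwidth around the switch ASIC, in Gb/s."""
    return engines * engine_gbps


def lane_count(total_gbps: int, lane_gbps: int) -> int:
    """Number of lanes needed to carry total_gbps at lane_gbps each."""
    if total_gbps % lane_gbps:
        raise ValueError("total is not a whole number of lanes")
    return total_gbps // lane_gbps


# Gen two: eight engines at 100G per lane.
gen2_total = aggregate_gbps(8)
gen2_lanes = lane_count(gen2_total, 100)

# Gen three: sixteen engines at 200G per lane.
gen3_total = aggregate_gbps(16)
gen3_lanes = lane_count(gen3_total, 200)

if __name__ == "__main__":
    print(f"gen two:   {gen2_total / 1000} Tb/s over {gen2_lanes} lanes")
    print(f"gen three: {gen3_total / 1000} Tb/s over {gen3_lanes} lanes")
```

Sixteen 6.4T engines give 102.4 Tb/s, which at 200 gig per lane is 512 lanes, matching the "102T" and "512 radix" figures in the talk. The eight-engine generation comes to 51.2 Tb/s.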
[07:48] And again, we want a healthy ecosystem for CPO, right?
[07:53] This isn't going to happen with just Broadcom, right?
[07:54] It's going to take Nvidia, it's going to take others, right?
[07:59] Contributions. So with our first generation and our second gen with Bailey, we had a custom laser source.
[08:07] It was a double-high QSFP-DD, uncooled. It was actually really great engineering work from the team, and that enabled us to get to a point with shipments and getting systems out.
[08:23] But I think as we move forward and a lot more people are doing CPO, we believe that that should be very standardized.
[08:30] We are moving to an ELSFP.
[08:32] We're looking at liquid cooled, and so that can be something that even the module machines that we have, that are supporting all of the pluggables today, can easily start making ELSFPs.
[08:48] And then you guys saw the optical backplane; you guys saw in the chassis there's fiber routing. We're working with a lot of partners there to come up with solutions, and obviously we want to contribute that to the community.
[09:00] For additional information, check out our website, go to CPO, contact me.
[09:06] Thank you.
[09:14] Thank you, Rajie. Rajie was always comfortable on stage, but now he's so comfortable, thanks to all the hard work of Meta.
[09:23] Yeah, there's a lot of hard work from that.
[09:25] It's not me, it's the team.
[09:26] I'm just wondering how much more comfortable you can get. Like, five years from now, when you ship a million ports of CPO, you're just going to sit in a chair, smoke a cigar, and...
[09:37] Look, I think it's fun, and it's something that at least the team at Broadcom all believe in, and I think it's fun to be involved, right?
[09:46] Well deserved. Please go ahead.
[09:49] Hi, I have a question about one of the pushbacks to CPO, whether it's scale up or scale out.
[09:57] One pushback is the manufacturing capability.
[09:59] The manufacturing process is not so mature.
[10:02] There's a lot of manual process involved.
[10:05] So my question would be: where do you think we are for the manufacturing ecosystem, how mature is it, and how mature could it be in the next two years?
[10:24] When do you think the ecosystem could be mature enough for CPO to gain like 20 to 30% share?
[10:34] >> Yeah, great. Thanks for that question.
[10:36] So, you know, it's going to take a lot more data, a lot more reliability, a lot more shipments for scale out before you can trust it for scale up.
[10:46] Right? Once you go into scale up, that link flap is very expensive, right?
[10:52] And so moving it from the scale out to the scale up, I think, is going to take a lot more work, but that development has to start now. We have to start planning for it now because, as you know, there's a new GPU every couple of years, and in order to meet that cycle your timing has to be on. So yes, we are ramping capacity. We need to improve the amount of volume we can ship, and we're working to build all that, yeah.
[11:22] Okay, thank you.
[11:23] Go ahead.
[11:26] Oh, hi Rajie. Nice talk. You know, last year I asked you a question.
[11:27] You said that's an unfair question.
[11:32] The question was whether you can reduce the price, as you had promised, by about 45%.
[11:40] I understand many people are reluctant to be all tied into a single company for the total solution for optics.
[11:47] Are you putting that offer out? Because I saw you were willing to take purchase orders. Are you offering a price such that it would be at 45, let's say? Because most of the solutions are 50% copper and 50% pluggable optics. How do you price your product to compete with pluggable optics from a price and cost standpoint?
[12:16] Sure. So, first of all, we're not the only ones doing CPO.
[12:22] Okay. So there's going to be a healthy way that our customers can modulate pricing.
[12:29] And I think, to your second point, how do we price it?
[12:34] Well, you saw some of that data on 90% training efficiency.
[12:37] So how much did that just save a data center?
[12:41] So, you're not really willing to reduce that.
[12:46] Look, we're not going to comment on pricing, but I think the value of the system and the value of the technology is there, right?
[12:54] And now you know what the benefit is to the data centers and to the operators.
[12:59] I mean, I think that's significant.
[13:03] Okay, thank you.
[13:06] All right, next question.
[13:09] Question about the copper stuff.
[13:11] I've seen presentations in the past where it's like there's going to be this combination of CPO and CPC.
[13:15] Does your package support both?
[13:18] I mean, could I put a co-packaged copper wire on there just as much as I could put a CPO on there?
[13:23] Yeah, I mean, we work with our customers, right?
[13:26] On what systems they want.
[13:28] Hypothetically, you could put down an engine on half of it.
[13:30] Like our Humboldt, that first generation one, we actually did half electrical, half optical.
[13:32] So it had four engines and then the rest was copper, right?
[13:34] So traces to the front panel.
[13:36] So you could envision, if you need a half CPO, maybe half something that you configure.
[13:39] Yeah.
[13:40] I mean, if that's what the end customer really wants. In the architecture you've shown, do you see that happening, where you'll be selling a lot of switches that are just random assortments of copper and optics?
[13:42] I think Galad said it really well in the last talk.
[13:44] There's a use case for all-optical CPO, there's a use case for pluggables, there's a use case for maybe a half-half solution, so it's just going to depend on what the need is on the stack.
[14:22] Okay, thanks.
[14:24] Yeah, thanks.
[14:25] Go ahead.
[14:25] go ahead.
[14:28] You showed a chart with your offerings from 2024 all the way to 2028.
[14:32] At what point is the production transitioning completely to TSMC for the PIC and...
[14:39] Sorry, could you repeat the rest of that?
[14:41] What part is it transferring, where?
[14:44] To TSMC.
[14:49] Okay, so for Davisson we are using COUPE. That's a process from TSMC to attach our EIC to the PIC.
[14:56] Before that, it was non-TSMC-based solutions.
[15:03] Most of our CoWoS is done at TSMC, but we had other ways to do the PIC and the attach that I think are public.
[15:10] You can look those up.
[15:12] But yeah, with Davisson, because we see the large volume ramp that we need to hit, it's nice to have TSMC as one of our partners.
[15:21] Thank you.
[15:25] All right, with that, let us thank Rajie one more time.