# K8s Maxxing with AI-Native Platform Engineering Stack with OpenChoreo

https://www.youtube.com/watch?v=XZ-ya4rqRKA

[03:55] Welcome to the show.
[03:55] Another Thursday, another live show and some friends are with me today.
[03:59] I'm excited to get into it.
[04:00] So, we're going to skip all the news and formalities and just say: hey, if you didn't know, this is also a podcast.
[04:06] Been doing that podcast thing for many years, like eight years now.
[04:08] So, in case you didn't know, there are actually two podcasts now.
[04:11] So, we talk about Kubernetes, Docker, and cloud native things on the left.
[04:17] And if it's very specifically about AI in our DevOps and how we use agents to manage everything, that's a new podcast that came out last year.
[04:28] So, dealer's choice, you could do both.
[04:32] I recommend both.
[04:34] I think they're pretty great.
[04:36] The guy who runs them is, you know, he's all right.
[04:38] But the guests, they're the reason you listen to that show.
[04:41] So, this is another episode.
[04:43] We record it live, so you can be here and ask us questions and talk to my guests.
[04:48] And that's why we're here.
[04:50] Let's do it.
[04:52] Welcome to the show.
[04:54] In the middle there we've got Lakmal, and on the right, Sameera, both from WSO2.
[04:58] Lakmal, tell us about yourself.
[05:00] Who are you?
[05:02] Why are you here?
[05:04] How did you get this gig to be on the internet?
[05:07] Yeah.
[05:10] I'm working as a VP and Distinguished Engineer at WSO2, and also as the DMO for Choreo.
[05:16] I started my career as a systems engineer in 2001, and for 25-plus years I've been engaged in building many, many platforms.
[05:27] Yeah, here we are today.
[05:29] Nice.
[05:29] Sameera, how about you?
[05:32] Hey Bret, I'm good.
[05:36] I'm Sameera Jayasoma, one of the co-maintainers of OpenChoreo at WSO2.
[05:41] I started my career in middleware. You can think of programming language design, compiler engineering, platform architecture, and you'll hear more about abstractions and compilation throughout this talk.
[05:51] That's me.
[05:53] We'll get a gist by the end of what you really are into.
[05:55] I'll get it.
[05:58] Okay.
[05:58] Um all right.
[06:00] So, you all reached out.
[06:00] This is a new project.
[06:04] I'm always excited about new CNCF projects because once they've been accepted into the sandbox process, we kind of get an idea.
[06:11] I feel like it's reading the tea leaves a little bit.
[06:14] Not everything is a winner.
[06:14] Not every project makes it to graduation, but it's an interesting way, I think, to separate the hobby projects or the experiments from things that teams are taking seriously, because you don't really go for sandbox status unless you're taking things seriously.
[06:27] So that's where I sort of wake up and start paying attention to the new
[06:33] projects.
[06:34] And I'm excited to talk about this today because I'm obsessed with AI agents.
[06:37] I'm obsessed with like how we manage infrastructure with agents involved.
[06:41] How do we put MCP in our cluster safely?
[06:44] How do we sit on our harness as our primary mechanism for interacting with, I don't know, 20 to 40 years of complexity that we've all built up?
[06:56] Like the number of abstractions we all have to deal with today is insane.
[07:00] And it's amazing that anyone can actually go to a conference and have a common conversation because there's a thousand tools we're all using.
[07:06] So I'm excited to get into this.
[07:09] How did the project get started?
[07:11] Dealer's choice.
[07:13] Who's going to pick the first question: how did this project come out of the company?
[07:17] Where was that idea from?
[07:20] Let me go first.
[07:24] So we started this software offering called Choreo SaaS five years back.
[07:30] It started as an iPaaS initially,
[07:34] an integration platform as a service.
[07:37] Then it evolved so that you could run all your other services within the platform itself.
[07:41] So it evolved into an internal developer platform a couple of years later.
[07:46] The name also comes from choreography.
[07:51] We shortened it to Choreo.
[07:53] It's like choreographing your microservices in a certain way.
[07:59] Just as Kubernetes does the orchestration,
[08:01] we do the choreography for all your microservices.
[08:05] That's how the name came about.
[08:08] So we were offering this to our enterprise customers.
[08:12] It's fully managed by WSO2.
[08:16] Then, I think two years back, we were running Choreo version two, and we started the Choreo version three project.
[08:24] So the next version of our current offering eventually became OpenChoreo.
[08:29] And at WSO2 all our software is open source, so we thought,
[08:35] okay, we have this software running as our SaaS, why not open source it? That's how OpenChoreo started. Then in January this year we donated it to the CNCF, and now we are a sandbox project.
[08:52] Yeah, it's all happening so fast. That's a similar path, by the way, to how Docker was invented, because it was a platform company saying, "Hey, this piece of our platform is pretty interesting, but it's also at this point maybe a utility. Let's just give it away."
[09:08] Turned out that worked. So it's not a bad idea to do this.
[09:17] What was the unique problem we were trying to solve with this? I say "we" like I'm a part of the team now. How does the team look at it as a unique thing in a
[09:35] world where we've got dozens of Kubernetes distros?
[09:39] In fact, first off, I should ask: what do you even call this?
[09:40] Do you think of this as a distribution of Kubernetes?
[09:45] Not really.
[09:45] No? Do you see it as vanilla Kubernetes with some things bolted on, or how is this different from something like Rancher?
[09:59] Okay, I'll take that question.
[10:02] In my opinion, OpenChoreo is something we have built on top of Kubernetes.
[10:10] Mhm.
[10:10] So you can use any Kubernetes distro, Rancher, EKS, AKS, any Kubernetes, and then just install OpenChoreo on top of it.
[10:20] Okay.
[10:20] I think the unique problem that it solves...
[10:22] Yeah.
[10:22] Yeah.
[10:23] Go ahead.
[10:23] No go ahead.
[10:25] Keep going.
[10:25] So the idea behind it is that you have Kubernetes and all the other tools, and then you have a developer platform.
[10:32] I think what we have seen in the industry is that you have this
[10:36] developer platform with Kubernetes and Argo CD and all the things glued on, and then you expose that layer to developers and build the developer experience around it: portals, MCP servers.
[10:48] Then we figured out that this way you expose the platform complexity to your developers, so they'll have to learn Kubernetes and all the tools. What's unique here, and I don't know whether it'll work out, is this:
[11:02] We built a middle abstraction layer on top of Kubernetes and the other tools. As Lakmal said earlier, over about six years we figured out that abstraction layer, which took some time for us to build, and that abstraction layer helps us build a developer experience layer.
[11:20] So developers are kind of abstracted away from Kubernetes per se.
[11:25] They can still see what's going on, but they may not need to learn a lot about Kubernetes and the other tools.
[11:33] And so that's one of the unique layers
[11:37] in the platform.
[11:39] So when we talk specifically about OpenChoreo, is it installing all the parts of this?
[11:45] Okay.
[11:46] So, for the audio audience, we're looking at a visualization.
[11:47] I love diagrams because they really help me
[11:52] conceptualize what a thing is, sometimes better than a bullet list.
[11:55] I love these.
[11:58] So I brought that up, and to me it looks like a full-fledged, fully built-out Kubernetes cluster with all the extras, right?
[12:06] All the necessary things that an enterprise cluster would have.
[12:07] It's got these data plane components.
[12:12] You know, you're mentioning KEDA for scaling.
[12:13] You've got Cilium in the networking.
[12:15] You've got, you know, API gateways in there.
[12:19] And then you showed part of the diagram.
[12:20] There's the workflow part, where it's Argo Workflows and Buildpacks in there.
[12:25] And then you've got observability layers. I don't see Loki in there, but there's Prometheus and all the OpenTelemetry stuff, and then over to the side there's OpenChoreo as this control plane.
[12:35] So to me it's almost like the control plane on top of the...
[12:41] So is this really just focused on those things on the left, and you're showing how it interacts, or is this actually helping us implement the rest of this puzzle?
[12:49] I was a little confused by that.
[12:51] Yeah, I think the main idea is having a unified control plane across all these capabilities. Platform engineers, developers, or even now AI agents can interact with the control plane via different protocols, and then the control plane does all the orchestration across these other planes, like you mentioned: the data plane with Cilium network policies, and scale-to-zero with KEDA.
[13:21] All these things can be orchestrated using the control plane, with governance enforced through the control plane.
[13:29] That's the main idea behind it.
[13:29] What we have seen as a problem is that if we expose all these services directly to the end user, who can be
[13:38] a platform engineer or a developer, then
[13:40] you can't enforce governance. But
[13:42] by having this single control layer we can
[13:45] have guardrails and policy enforcement,
[13:47] and in a single pane of glass we can see
[13:51] everything within the control plane.
[13:53] So that helps to orchestrate across
[13:56] entirely different
[13:58] planes. Those planes are built using the
[14:02] modular architecture in OpenChoreo.
[14:04] So for example, if you take an API gateway, it
[14:07] can come from the WSO2 gateway, kgateway,
[14:11] or Kong gateway. It isn't tied
[14:14] to a single vendor, so it's
[14:17] building an entire ecosystem around it.
[14:19] It's not only the platform itself;
[14:22] users can pick and choose what they want to use in their data plane,
[14:28] so the abstraction layer helps to
[14:31] orchestrate all the different vendor tooling that they want to use in
[14:35] their planes.
[14:37] Okay.
[14:37] Exactly. Yeah.
[14:39] So if I'm a platform engineer,
[14:42] and I'm listening to the show, it probably means I have platforms today, right?
[14:45] I've already got Kubernetes in production.
[14:47] I've already got a lot of these components.
[14:50] So is the idea here that this particular project is for implementing the left side of my screen, the specific OpenChoreo control plane components, and then including those system components that would link or integrate me with the APIs of those other things?
[15:04] Is that sort of where you're drawing the boundary of where your project ends?
[15:10] Yeah.
[15:10] Yeah, I think that's a good way to put it, Bret.
[15:12] So our main component would be the control plane
[15:15] that you see on the left of the screen, and at the same time there are certain system components running in all the other planes as well: data planes, observability plane, workflow planes.
[15:31] So if you're already running these projects, then it's just a matter of installing our system components and organizing
[15:39] your platform architecture in this way.
[15:42] Yeah.
[15:43] Right.
[15:45] And then, on what Lakmal said about abstractions: when you think about these developer platforms, the primary users are platform engineers, as you said, and the primary consumers are developers.
[15:58] Right.
[15:58] Right. So they will perhaps use the experience plane: the UI, the MCP server, and things like that.
[16:06] They will say, I want a project, I want a component, I want the component deployed on the dev environment.
[16:10] Right. And the control plane understands that, and executes and compiles it into Kubernetes and the other projects,
[16:18] in a way that the other projects understand. That's the idea behind these arrows.
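The "compile" idea described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not OpenChoreo's actual API or resource model: the `Component` fields, the `compile_component` function, and the one-namespace-per-project-and-environment naming are all hypothetical. It shows a developer-facing request (project, component, environment) being expanded into plain Kubernetes Deployment and Service manifests.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical developer-facing abstraction (not OpenChoreo's real schema)."""
    project: str
    name: str
    image: str
    port: int
    environment: str = "dev"
    replicas: int = 1


def compile_component(c: Component) -> list[dict]:
    """Expand one abstract component into Kubernetes manifests (as dicts)."""
    # Assumed convention: one namespace per project/environment pair.
    namespace = f"{c.project}-{c.environment}"
    labels = {"app": c.name, "project": c.project, "env": c.environment}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": c.name, "namespace": namespace, "labels": labels},
        "spec": {
            "replicas": c.replicas,
            "selector": {"matchLabels": {"app": c.name}},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": c.name,
                            "image": c.image,
                            "ports": [{"containerPort": c.port}],
                        }
                    ]
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": c.name, "namespace": namespace, "labels": labels},
        "spec": {
            "selector": {"app": c.name},
            "ports": [{"port": 80, "targetPort": c.port}],
        },
    }
    return [deployment, service]


# The developer only states intent; the Kubernetes details are generated.
manifests = compile_component(
    Component(project="shop", name="cart", image="example.com/shop/cart:1.0", port=8080)
)
kinds = [m["kind"] for m in manifests]
print(kinds)
```

The point of the sketch is the direction of the arrows: the developer never writes a Deployment, and the platform team can change what gets generated without changing what developers ask for.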
[16:27] Yeah. And when I'm looking at this diagram from a platform team perspective, I recognize all the names, right? It's a lot of terminology from
[16:41] different CNCF projects. But are you thinking of AI as the main interaction point? I see here that we've got, I think, a web console or something.
[16:57] Oh, it's using the Backstage UI.
[17:00] Yeah, using Backstage.
[17:03] Okay. So, do you think about this from the perspective that AI is probably going to be increasingly the way we interact with Kubernetes?
[17:11] So, you know, we've got these arrows in here showing how things are interacting with the Kubernetes cluster.
[17:15] And it feels like it's all going through the OpenChoreo control plane to get there, because you've got this list: we could use the Backstage UI, which presumably would be for humans only,
[17:25] unless maybe there are some pretty graphs for the AI to look at.
[17:28] I'm not sure why it would use that.
[17:30] But then you've got MCP, CLI, and API as these other interaction endpoints.
[17:37] And when I see this diagram, I think, oh, this allows me to put another
[17:41] layer of abstraction in for my AI to work through.
[17:43] And so I'm not running my AI directly against kubectl or, you know, the native Kubernetes API.
[17:50] I've sort of got this enhanced layer with additional functionality. I see you have authorization and access control, which is a big thing for me lately: how do we minimize the blast radius of giving agents access to different tokens, and not give them root, you know, god rights to the cluster every day, which I see a lot of people doing by just giving them kubectl admin abilities? Is that kind of the goal of where you see agents being responsible for all these clusters?
[18:22] Yeah, exactly. You described this correctly, Bret.
[18:27] So with the developer platform, we give the golden paths, the guardrails, and the policies to our human developers.
[18:36] In the same way, we want to expose these golden paths to the AI agents that interact
[18:43] within the platform.
[18:46] Let me try to share a single slide from my machine.
[18:50] You can see this slide, right?
[18:54] Yeah.
[18:54] Uh yeah.
[18:58] So in OpenChoreo, agents are first-class citizens.
[19:00] At the moment, in the current release, we are exposing two MCP servers.
[19:05] One is the OpenChoreo control plane MCP server.
[19:07] The other one is the OpenChoreo observability MCP server.
[19:12] So external agents can interact with these MCP servers, and internal, built-in agents can interact with them as well.
[19:19] When they interact with the MCP servers, the permissions and all the guardrails apply the same way they do for a developer using the platform.
[19:29] So that's the main idea behind it: a unified experience, whether it's a human developer, a human SRE, or an agent working as an SRE or as a developer.
[19:44] Very cool.
[19:46] All right.
[19:46] So this shows up as a whole bunch of MCP tools that I could plug in to my local harness.
[19:54] So if I've got Claude Code running, I plug the MCP endpoints in.
[19:55] Do you know the estimated tool count? What are we dealing with here?
[20:03] 100 tools? 20 tools?
[20:06] Yeah, it depends on the user persona interacting with the MCP server.
[20:09] We have the developer persona and the SRE persona, and they get different tooling based on their role.
[20:20] Yeah.
[20:20] So based on that they can do different activities.
[20:26] Again, let me show you. Roughly 100,
[20:31] I would say 100 tools altogether.
[20:34] A not insignificant number of tools, but that's a good point.
[20:36] So what you're saying is, based on which role I'm authing as, I get a scoped set of MCP tools based on my permissions,
[20:45] essentially.
[20:46] >> Yes.
[20:47] >> Yeah. Exactly.
>> So here I'm looking as a super
admin. The entire set of tools is exposed
to a super admin. So I have
126 tools for the control plane and nine
tools for the observability plane.
>> But based on the role, the persona,
this count will vary
depending on what they want to do.
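Role-scoped tool exposure of this kind can be sketched as a simple filter over a tool catalog. Everything named here is an assumption made for illustration: the tool names, the `ROLE_RANK` ordering, and the idea that roles form a strict hierarchy are hypothetical, and the real MCP servers may model personas and permissions quite differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    plane: str          # "control" or "observability"
    required_role: str  # minimum role needed to see the tool


# Hypothetical role ordering; real persona and permission names may differ.
ROLE_RANK = {"developer": 0, "sre": 1, "super-admin": 2}

# Illustrative tool names only; these are not the real MCP tool identifiers.
CATALOG = [
    Tool("list_components", "control", "developer"),
    Tool("deploy_component", "control", "developer"),
    Tool("query_logs", "observability", "developer"),
    Tool("update_environment", "control", "sre"),
    Tool("manage_roles", "control", "super-admin"),
]


def tools_for(role: str) -> list[Tool]:
    """Return only the tools the given role is allowed to see."""
    rank = ROLE_RANK[role]
    return [t for t in CATALOG if ROLE_RANK[t.required_role] <= rank]


developer_tools = [t.name for t in tools_for("developer")]
admin_tools = [t.name for t in tools_for("super-admin")]
print(len(developer_tools), len(admin_tools))
```

Filtering the list the agent sees, rather than only rejecting calls later, also keeps the agent's context smaller, which matters when the full catalog runs past a hundred tools.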
>> Right. So, all right,
okay, so these agents,
I see on the diagram,
I'll show on my screen real quick, you're
listing AI agent modules here. You've got
SRE, FinOps, and architect. Are
these things that come
predefined? And what does that
agent mean to me? If I'm on my local
harness, does this mean that those are
long-running agents that are sitting in
the cluster? How do I interact with
those agents?
So,
>> Yeah, go ahead, Sam.
>> Yeah.
>> Okay. Sure.
So, as Lakmal
said earlier, we have internal agents and
external agents. They both use the same
experience plane MCP server tools.
Claude Code is an external agent. The
FinOps, SRE, and architect agents
are platform agents running in the
cluster.
>> Yeah. And they react to certain
events. For example,
if there's an alert that a lot of 500
errors are occurring, then our SRE agent
will react and give you a
report on what's going on and why it's
happening. You can
configure the agent with
certain types of alerts. Take
memory: if you are
consuming memory above a certain limit, the
SRE agent will again react. Sure.
>> The FinOps agent is similar in that way.
>> Yeah,
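The alert-driven behavior described above can be sketched as a small dispatcher. This is a sketch under assumptions, not how the platform agents are actually wired: the `Alert` and `Trigger` shapes, the alert kind names, the thresholds, and the `sre_agent` stand-in are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Alert:
    kind: str        # e.g. "http_5xx_rate" or "memory_usage"
    component: str
    value: float


@dataclass
class Trigger:
    kind: str
    threshold: float


def sre_agent(alert: Alert) -> str:
    """Stand-in for an agent run; a real agent would gather context and call a model."""
    return f"report: {alert.kind} on {alert.component} reached {alert.value}"


# Hypothetical trigger configuration: which alert kinds wake the agent, and when.
TRIGGERS = [Trigger("http_5xx_rate", 0.05), Trigger("memory_usage", 0.90)]


def dispatch(alert: Alert, agent: Callable[[Alert], str] = sre_agent) -> Optional[str]:
    """Run the agent only when the alert matches a configured trigger."""
    for trigger in TRIGGERS:
        if alert.kind == trigger.kind and alert.value >= trigger.threshold:
            return agent(alert)
    return None


fired = dispatch(Alert("http_5xx_rate", "cart", 0.12))   # above threshold
ignored = dispatch(Alert("memory_usage", "cart", 0.40))  # below threshold
print(fired, ignored)
```

An agent that only wakes on configured triggers is event-driven rather than always looping, which is the difference between paying for a model call per incident and paying for one per polling interval.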
>> I see. So each of those agents comes
sort of predefined. It sounds like
they have a scope to them, and then
presumably, somehow through
the control plane, I'm plugging those
into my messaging apps,
like Slack or whatever, and
that's how those agents reach
out to me. Do they go through
a middle layer, like an alerting
system like Prometheus has for alerts?
How does that path look for the
agents
letting us all know the problems? Yeah,
at the moment it goes through a middle layer,
an alerting system. We have log-based
alerting with OpenSearch
and other layers, and Prometheus for
the metrics. So agents also go through the
same layer. This alerting system can
be configured with PagerDuty, your
Slack channel, or other channels.
That's how the architecture currently
works. Yeah.
>> Yeah. And so this feels like it's
designed to be sort of, I mean, I
think the website or something said
batteries included. I think I included
that in the thumbnail. Because
a lot of the conversations that I
have, I mean, I'm lucky to be in this
Agentic DevOps guild that
I run, and we meet weekly, so I get to
talk with dozens of teams about
what they're trying to do right now in
terms of everything agents related to
infrastructure, and this exact stuff,
right? So people are sort of in that
middle mode right now where
the local agent harness has kind of
become common knowledge. We've all
experimented with GUIs and TUIs and
multi-agent management and sessions
and skills. You know, this
is the year of playing with skills.
But then the minute that I have to start
running agents, I just call these
things server agents to try to define
the difference, these long-running,
always-looping agents that are sitting
there checking something, or
continually polling for something, or
waiting for a webhook, maybe, if they're
event driven. These things are probably
going to be everywhere. We're going to
have them all over the place.
And you're providing a few of those
that are sort of defined out of the box
to do specific things. So I
don't have to go find an agent SDK and
make up my ideas of what an agent might
do in my cluster, because that kind of
feels like what a lot of us are doing
right now: we're
writing little agents
everywhere. We're making it up as we go
along because we don't really know what
parts we would automate or
which parts they would be good at. You
know, what parts of summarization
and human judgment do they do well, and
what context do they need?
But it feels like the rest of
this system is there
to help them with the context problem,
because that always feels like a really
hard problem right now: what
context do I need to give my agent to
make it useful and not hallucinate, and
how do I get at that stuff at the
time it needs it?
>> Yeah, exactly. So I think the main
idea is that we have this ecosystem, and
within the ecosystem we have the agents.
So the community or even the OpenChoreo
maintainers will release an agent, and then
our users can pick and choose which agents
run in their OpenChoreo system. So for
example, for the SRE agent, the intention
is that
when something happens, the SRE agent
gets triggered, and we feed all the
logs, metrics, config changes, and code
changes within that window to the SRE agent.
Then it can work out what happened and
generate the RCA report, the root cause
analysis report, and also alert the
human SRE. When they come into the system
they already have the root cause analysis report
there, provided by the
SRE agent. And we are not stopping
there. This agent can provide
remediation actions as well: okay, this is
a quick fix if you want to fix this
issue. At the moment it's human
in the loop, so the human SRE can apply these
fixes. But we have a permission
model where we can allow some of these
agents to automatically apply these fixes
themselves rather than waiting for the SRE. We can
say, okay, it's only a config change, I will
allow my SRE agent to apply the change
and fix the issue. So eventually it can
take on more activities, but
again it has human control
in the loop. So that's how we have
designed it. Like we said, the main
intention behind it is that when we run a
system agent, say the
SRE agent, we know what
context we should
feed into this agent to provide a
better result. So that's how these
built-in agents act.
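The remediation permission model described here, where config-only fixes may be applied by the agent and everything else waits for a human, can be sketched as a small policy check. The `Remediation` type, the `AUTO_APPLY` set, and the status strings are hypothetical names chosen for illustration, not the platform's actual permission model.

```python
from dataclasses import dataclass


@dataclass
class Remediation:
    description: str
    change_type: str  # e.g. "config" or "code"


# Hypothetical policy: change types the agent may apply without waiting for a human.
AUTO_APPLY = {"config"}


def decide(fix: Remediation, approved_by_human: bool = False) -> str:
    """Return what happens to a proposed fix under the policy above."""
    if fix.change_type in AUTO_APPLY:
        return "applied-by-agent"
    if approved_by_human:
        return "applied-after-approval"
    return "waiting-for-human"


config_fix = decide(Remediation("raise memory limit", "config"))
code_fix = decide(Remediation("patch retry loop", "code"))
approved = decide(Remediation("patch retry loop", "code"), approved_by_human=True)
print(config_fix, code_fix, approved)
```

Keeping the allow-list explicit and small means the default stays human in the loop, and widening the agent's autonomy is a deliberate, reviewable change to one set.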
>> Yeah. And I also think
these agents are a good
starting point for you to come up with
your own agents as well.
You can replicate them and come up
with your own agents, because the context
problem is kind of solved within the
platform. As Lakmal said,
these are all about not exposing
Kubernetes details to the agent but
exposing the OpenChoreo abstraction
layer.
>> So that's what we have
figured out. I mean, not sure whether it
will work.
>> That's the hard part. Yeah.
>> Yeah. What we figured out is that this abstraction
layer helped us to give the right
context also.
>> Yeah. All right. So,
I sent a blog post out
this morning, and I
was trying to relate it to this show.
Sometimes, as in the case of
OpenChoreo, I deliberately don't go too deep
into learning it before the show because
I want to act like that naive user. And
luckily I don't have to act,
because I'm really dumb on this. So
I didn't understand where the
boundaries were and what the goals were
and exactly what these agents are
doing. But I spent the last
week writing this
article around
what I think an AI
[31:01] Hey Bret, it seems people can't hear
you.
I'm seeing some comments on the
line.
>> Can you not hear me?
>> Yeah, we couldn't hear you for a few
minutes.
>> Now we can. Yeah.
>> Okay. Well, let me start my rant over.
Pretend that that was a coffee
break.
So I was writing this article over
the last week, and thanks for the chat.
People were chatting and I thought they
just had great questions, but I wasn't
looking at the chat yet.
But I wrote this article about what
an AI Kubernetes platform even means.
And what we're really talking about here
is something that I had to define in
this newsletter as what I'm calling type
three AI on Kubernetes. It sounds really
nerdy, but if you've been going to
KubeCons since the invention of
ChatGPT, for most of the history of what
we now call gen AI or whatever...
Thanks, everybody in chat. I
appreciate you all looking out. If
you ever want to know who's actually
watching in chat, you just have to do
something dumb like I do and then people
will say, "Hey."
>> And that works.
>> And yeah, that's what gets
people talking. I'm just
going to say hello. Hello, Cody
Bites. What's up, Papa? Williams
here. Kenna
Ron Ron, sorry if I mispronounced that.
TEA.
Hello, everybody, and thanks for being here. And
Jose at the top. All right. So back
to the point
here. The point here is,
we are not talking about AIOps and
MLOps. I mean, technically, you could
use this thing to manage that, too. But
for those of us that go to KubeCon,
I've been ranting a little
bit for a couple years now that there's
been this weird disconnect where, when a
lot of people in the industry talk about
Kubernetes and AI, they're actually
talking about running AI or making AI:
building models,
reinforcement learning, running
inference. And that is not my
world. I am not that person, right?
It's interesting from an architecture
and engineering perspective, but I don't
think that'll ever be me. To me, that's
a specialty, and it might get
easier to do and some of us might have
to have that new role, but I feel like
there's still a significant number, maybe even
a majority, of us that won't be doing that.
Our new job is to build out these
agents and understand where, you know,
the features and the edges of models can
be implemented so that we can automate
and build more. Because I see this whole
thing as just like VMs, the cloud, the
invention of the PC. Over my
30-year career we've had major pivot
points in technology that essentially
allowed us to scale ourselves as humans.
We went from managing 10 servers in the
90s to 100 servers with VMs to a
thousand servers with the cloud. Maybe
containers allowed us to run 10,000 pods
per admin. And now this is the
next level, and to me what you all are
building is exactly that layer of
abstraction, which is the agents and
their context management and permission
management, essentially. That layer is
now this new level of abstraction that
will hide some of the complexity.
We don't have to know every kubectl
command in the world anymore. In
fact, I don't even know how we all get
Kubernetes certified, because if we're
not going to be running kubectl and
have to know every option, what is a
Kubernetes admin test other than just
understanding architecture? But my rant
here is that
this post took me a week to write
because I was trying to theorize that
we're really talking about agentic
operations, and that's
the reason I have the new podcast.
That's the reason this project is
exciting: this is that new layer.
[35:04] This is the abstraction that we're all
[35:06] kind of searching for because I think
[35:08] we've we've understood the local harness
[35:10] now after having those for a little over
[35:12] a year. I think a lot of us are like,
[35:14] "Yeah, I get this. We have all sorts of
[35:16] neat ways of managing these little local
[35:17] agents, but this new nebulous layer
[35:21] that's sitting in front of our
[35:22] infrastructure is still not fully
[35:25] realized. So, I feel like you all are
[35:27] pretty early in the game in terms of
[35:28] saying we we we can help. We have this
[35:31] defined component and we can shove it in
[35:34] there. And uh I'm even trying to draw
[35:37] out a diagram of what
[35:41] if we're all on a maturity path to
[35:45] we started with the agent harness
[35:47] locally and the end goal is like we've
[35:50] maximized the agent
[35:54] assistance that we're all going to have
[35:55] like the the dozen agents we're going to
[35:57] have two dozen agents like we've
[35:59] maximized all the functions and features
[36:01] of what models can do for us as DevOps,
[36:04] platform engineers, SREs, right? Like
[36:07] all of us are managing this
[36:08] infrastructure and there's a dozen tools
[36:12] and a thousand features we all now need
[36:15] to learn and we're learning skills and
[36:16] agent files and MCPs, and all of
[36:19] that can go awry and go crazy. But
[36:22] eventually the idea is we're improving
[36:23] our productivity. We're improving our
[36:25] management. We're hopefully making
[36:27] things more secure and reducing reducing
[36:29] outages and like everything's getting
[36:31] better, but at the end of it, what are
[36:34] we doing? And I think a year ago, we
[36:36] were all kind of nervous like are our
[36:39] jobs going away? Like that's been a big
[36:41] conversation. Like are we is there only
[36:43] going to be one DevOps person in the
[36:44] company? But to me, this is the job.
[36:46] Like this is the new job is you all
[36:48] built this. Now we've got to implement
[36:50] it. Now we've we got to understand it.
[36:52] We got to understand the components.
[36:54] Where do the agents help? Where do where
[36:56] do they not help? Like where do we still
[36:57] need the human in the loop? Like you
[36:59] said, um I'm pitching this to you
[37:01] because this graph didn't exist a
[37:04] day ago. So I'm kind of saying, like,
[37:06] it seems like OpenChoreo is at the
[37:08] end of this chart somewhere in the
[37:10] production area on the green part. Um
[37:13] and I just wondered if that's if any of
[37:15] that is anything that you would agree
[37:16] with or or that I'm am I crazy? Does
[37:19] this all sound like a good idea to talk
[37:21] to talk about?
[37:22] >> No, I think this is great, Bret. I think
[37:24] you are correct. So yeah, it has
[37:27] evolved in the last year. Within one year we
[37:30] can run agents in production, helping
[37:33] the different personas engaging
[37:34] with the platform. So I
[37:38] believe the code generation problem is
[37:41] almost solved now with Claude Code or
[37:43] Cursor or Codex, right? So people can
[37:46] generate code, write an application,
[37:48] within minutes. But then, the
[37:51] moment, so we call it vibe coding, right,
[37:54] everyone calls it vibe coding, the
[37:56] moment they hit deployment, their vibe
[37:59] is ending, because now the vibe-coded
[38:03] application has to be promoted into
[38:06] production, but the platform still
[38:08] doesn't support that. They have to go away
[38:11] from their agent. They have to either
[38:13] create a ticket or go to the
[38:15] self-service portal and deploy their
[38:17] application. But what we have seen
[38:20] with our skills and the MCP
[38:24] servers is that they can use the same
[38:27] agent they are using to develop their
[38:29] code. They can say, okay, I
[38:32] want to deploy my application
[38:35] into the development environment, just
[38:37] figure out how to do it. So now
[38:40] we call it vibe deployment. So OpenChoreo
[38:42] tries to fill the gap
[38:46] on the deployment part, the operational
[38:49] side of it. So
[38:52] you can write your code using an agent. Now
[38:53] you can use the same agent
[38:55] to deploy your application into
[38:57] production and operate it, with the support
[38:59] of the OpenChoreo abstractions as well as
[39:02] the tools exposed through the MCP
[39:05] servers. That's where we want
[39:08] to play, around this platform
[39:10] engineering side of it.
[39:12] >> Yeah.
[39:13] >> And I think you mentioned something
[39:16] about how abstractions help agents,
[39:19] right? I think the way I think about
[39:21] that is: more abstractions, fewer tokens.
[39:25] >> So they don't have to learn Kubernetes,
[39:26] deep YAML, and kubectl.
[39:30] Um, I think the same applies to
[39:32] programming languages in a way: writing
[39:34] assembly versus writing in Java or C# or
[39:37] Go, right?
[39:39] Fewer tokens in that way. So they
[39:41] have more context with fewer tokens. I
[39:45] think that is also that's what I'm
[39:47] working on these days. See whether we
[39:49] can establish that.
[39:51] Yeah, that's a um I think we're all very
[39:54] quickly, especially these last three
[39:55] months when it comes to tokens, um we're
[39:58] very quickly understanding that this is
[40:00] finite and that
[40:03] >> um just kind of like Kubernetes itself
[40:07] and we've all had that moment where
[40:09] early in our Kubernetes journey, we
[40:11] would realize how much infrastructure we
[40:14] actually need to just run the apps like
[40:18] uh you know when you build out a full
[40:20] scale Kubernetes cluster, there is a
[40:22] not
[40:24] insignificant amount of
[40:26] infrastructure to manage the
[40:27] infrastructure and that um back when we
[40:31] first started the journey on Kubernetes
[40:33] I don't think we all really understood
[40:34] like the eventuality of dozens of
[40:36] different controllers and dedicated
[40:39] nodes in the control plane that we were
[40:41] going to have to have just to
[40:43] organize and herd the cattle, right? Uh,
[40:45] to herd the cats, as they might say.
[40:48] And um I feel like now the same thing is
[40:51] with tokens. It's like we got all
[40:53] excited and we're like we're going to
[40:54] put it everywhere and now we're going to
[40:58] very quickly end up in a world where
[40:59] we've got budgets and now we've got to
[41:01] optimize our agents and we have to care
[41:03] about the model and we have to we have
[41:05] to one of the conversations we're having
[41:06] in the Agentic DevOps guild is around
[41:08] evals and how can we use evals
[41:11] uh to sort of figure out which model uh
[41:14] if I can if I can apply a bunch of
[41:16] skills if my agents are really just
[41:18] behaving with a bunch of skills and
[41:19] context management stuff how can I uh
[41:22] sort of evaluate or programmatically
[41:24] determine or test that maybe I could use
[41:27] Kimi or MiniMax or Qwen or some sort
[41:32] of open-weight model that's super cheap
[41:34] for my very specific little you know
[41:36] FinOps niche agent whereas it maybe
[41:39] doesn't need Opus because maybe that's
[41:41] the one that I need because I have dumb
[41:42] prompts and I I type dumb things and
[41:44] maybe my agent in the cluster can be
[41:47] running really cheap because none of us
[41:49] want to put these in production and then
[41:50] find out a month later that it's too
[41:53] expensive to run and now I have to shut
[41:54] it down and and make the humans do it
[41:56] again because I don't think that's a
[41:57] world I want to go back to.
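
The eval idea described above (test whether a cheaper model is good enough for one narrow agent) can be sketched roughly as follows. This is only an illustration under assumptions: the `Candidate` type, the `run_agent` callable, the cost numbers, and the exact-match scoring are all hypothetical and not taken from any tool mentioned in the show.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Candidate:
    name: str             # hypothetical model label, not a real model ID
    cost_per_task: float  # illustrative cost units, not real pricing


def pass_rate(run_agent: Callable[[str, str], str],
              model: str,
              cases: list[tuple[str, str]]) -> float:
    """Fraction of (prompt, expected) cases the model answers exactly right."""
    passed = sum(1 for prompt, expected in cases
                 if run_agent(model, prompt) == expected)
    return passed / len(cases)


def cheapest_passing(candidates: list[Candidate],
                     run_agent: Callable[[str, str], str],
                     cases: list[tuple[str, str]],
                     threshold: float = 0.9) -> Optional[Candidate]:
    """Try models from cheapest to most expensive; return the first that
    meets the pass-rate threshold, or None if none of them do."""
    for candidate in sorted(candidates, key=lambda c: c.cost_per_task):
        if pass_rate(run_agent, candidate.name, cases) >= threshold:
            return candidate
    return None
```

In practice `run_agent` would wrap the real agent with its skills loaded, and exact-match scoring would likely be replaced by something more forgiving, but the shape (fixed cases, a threshold, cheapest model first) stays the same.
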
[42:00] >> Um
[42:01] >> that's so we have a demo. Um what are we
[42:07] going to demo? What do we want to what
[42:08] do we want to show off? Yeah, I think
[42:12] let's show how your
[42:15] external agent can do vibe deployment
[42:19] and interact with the platform
[42:22] with your code, maybe. Then
[42:25] I will also show how the built-in
[42:28] agent, the SRE agent, can help the SRE persona, the human
[42:32] persona, by creating the automated root
[42:35] cause analysis report and providing the
[42:37] remediation actions itself. So
[42:39] it's helping. It's not replacing
[42:42] the human. Like I said, it's helping
[42:45] the human to be more productive. So
[42:47] that's how we envision
[42:50] these agents playing in this
[42:52] platform.
[42:53] >> Yeah I know a couple of companies that
[42:55] are playing in this space. We've had
[42:57] Anyshift on the show. We've had um Menrol
[43:00] on the show and these are both AI
[43:02] startups that are building agents to
[43:05] help with infrastructure CI/CD um sort
[43:08] of helping um helping you with fault
[43:10] remediation, and the approach
[43:13] that I've been seeing everybody take is
[43:15] like it's just sort of the normal
[43:17] pattern of engineering approach. We walk
[43:20] before we run, you know, we we run
[43:22] before we sprint. and and that means on
[43:25] day one you might be read only like
[43:27] agents can't do things. They can just go
[43:29] look up things
[43:30] >> and and that they're just a little
[43:32] helper assistant and they're not going
[43:34] to they have no control over anything.
[43:36] They're just here to help you find
[43:37] information in this sea of infinite
[43:40] information that we have about our
[43:41] infrastructure. And then, like,
[43:44] with MLE, the team with MLER that's been
[43:45] on the show, like um they're now to the
[43:48] point where like very specific use
[43:50] cases, they're letting the agent have a
[43:53] tiny bit of control because it's
[43:54] predictable. They can test and make sure
[43:56] that it's that it works, that it's as
[43:58] declarative as possible so that it is
[44:00] predictable and they they start to to
[44:03] let that happen. Um is that something we
[44:06] can do with OpenChoreo? Like does this
[44:08] have sort of a default stance of I'm
[44:10] only going to be read only? How does how
[44:13] do the permissions work that way?
[44:15] >> Yeah. So the agent is not
[44:18] read-only at the moment, but you
[44:19] can control it. We have a
[44:21] permission model where you can give
[44:24] different tools to your agents. They
[44:26] can be read-only tools, or they can be write
[44:29] operation tools as well, based on
[44:31] what you want to do. You can allow these
[44:33] permissions because agents have their own
[44:35] identity, and within that identity they
[44:37] have permissions. So based on the
[44:39] permissions they can use different tools.
[44:41] If you are confident enough that your
[44:44] agent is performing well, you can
[44:47] give it write permission too,
[44:50] at some level. For example,
[44:52] you can say, okay,
[44:54] if there is a configuration issue, the agent
[44:56] can go and make the
[44:58] configuration fix itself. So you
[45:01] can fine-tune how your agent can
[45:04] interact with your platform.
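
The permission model described above (each agent has its own identity, the identity carries permissions, and the permissions decide which read or write tools it may call) might be modeled like this. It is a minimal sketch under assumptions: `Tool`, `AgentIdentity`, and `authorize` are hypothetical names for illustration and are not OpenChoreo's actual API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tool:
    name: str
    writes: bool  # True if calling the tool changes platform state


@dataclass
class AgentIdentity:
    name: str
    allowed_tools: set[str] = field(default_factory=set)
    can_write: bool = False  # assumed default stance: read-only


def authorize(agent: AgentIdentity, tool: Tool) -> bool:
    """Allow a call only if the tool was granted to this identity and,
    for write tools, the identity has also been given write permission."""
    if tool.name not in agent.allowed_tools:
        return False
    if tool.writes and not agent.can_write:
        return False
    return True
```

Starting every identity as read-only and flipping `can_write` for one narrow tool at a time matches the walk-before-you-run approach discussed in this part of the conversation.
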
[45:05] >> Yeah. All right. Let's do it.
[45:09] I like fine-tuning. Um, I've been
[45:12] wondering about a lot of this around
[45:15] like how how is per how are permissions
[45:19] going to work? How are like so much of
[45:22] our permissions models today feel like
[45:24] they're not designed for an agentic
[45:26] world? And um how how are we going to
[45:29] track the actions of these things? like
[45:32] every little step they take uh is is
[45:34] hopefully going to be put into some sort
[45:36] of observability platform. So I'm I am
[45:38] very curious to see how what what your
[45:40] all's vision is. So what are we looking
[45:41] at?
[45:42] >> Yeah. So this is the Backstage portal
[45:46] of OpenChoreo. I am looking at this as a
[45:49] super admin, so I can see everything in
[45:51] this case. So this is something
[45:53] like a site map kind
[45:56] of thing. It shows
[45:58] what the platform
[45:59] resources are at a glance. So I have the
[46:02] environments, I have the data plane and control
[46:05] plane, so I can see what the resources are.
[46:09] And this is a single project.
[46:11] In this project I have five
[46:13] components. There are resource
[46:15] components like Redis and Postgres, and
[46:19] there are an API service and front-end and
[46:21] back-end services. So this is like a
[46:23] composite application that you can
[46:25] deploy into OpenChoreo. And then
[46:29] this is the cell diagram we generate
[46:32] dynamically. When you deploy an application
[46:34] into OpenChoreo, it is protected by a
[46:38] boundary, like a bounded
[46:40] context. So you can't go outside of this
[46:43] cell boundary. You can't call the
[46:45] services by crossing the cell
[46:47] boundary unless you explicitly expose
[46:49] the services. This is very important
[46:51] for agents as well. You
[46:53] can protect your
[46:55] agents. You can give a sandboxed
[46:57] environment to agents for what they want
[46:59] to do within your system. So that
[47:01] is supported out of the box in OpenChoreo.
[47:05] And we have the logs and metrics,
[47:07] everything, within the
[47:10] system. So let me jump into the agentic
[47:13] experience. Uh let me do a load here
[47:16] also. So here I'm trying to
[47:21] demonstrate how an external agent can
[47:23] interact with OpenChoreo. So like
[47:26] I said, I have two MCP servers
[47:30] configured, and I also have a few skills
[47:34] added:
[47:37] the OpenChoreo developer and OpenChoreo
[47:39] platform engineer skills, to
[47:41] demonstrate everything in one
[47:43] Claude Code. But based on
[47:45] your persona and your permissions, you
[47:47] can load and unload your skills.
[47:52] >> Yeah.
[47:53] >> Yeah. So this is something so this is
[47:55] something sorry. So I'd have like MCP uh
[47:57] I'd have these skills and MCP configured
[47:59] in my local harness at all times and
[48:03] >> Yes.
[48:03] >> Yeah.
[48:05] >> Yeah. So I
[48:09] can give a prompt within Claude
[48:12] Code or the agent, say,
[48:23] something like this. So this
[48:25] is like a read-only action. So basically
[48:27] it interacts with the MCP servers and it will
[48:29] get all my components and where
[48:33] they are deployed within my platform,
[48:36] and and also let me show that uh after
[48:39] it's uh
[48:41] And we are seeing this now:
[48:44] more people are moving into the
[48:46] agentic environment, and the UI is, I
[48:49] would say, second class now. Because
[48:53] with a UI, we have to develop the UI to cater to
[48:56] the end user, but in the agentic world
[48:58] they can ask for their own view
[49:01] from the agent, and it will generate
[49:05] what they want to see.
[49:06] >> you can see it generate output saying
[49:09] okay I have multiple projects and where
[49:12] this projects are deployed kind of
[49:14] scenario so let's say one one one
[49:17] scenario this is my go service deployed
[49:19] into the development let me double check
[49:22] it is in the uh default there's a my go
[49:27] if I go to the pipeline it's in the
[49:31] deployment environment uh de development
[49:33] environment so let me give to my agent
[49:36] say
[49:46] something like this so this is like you
[49:48] typed promote yeah sorry you typed
[49:50] promote my go to uh staging
[49:52] >> Staging. Yeah, it's like a write
[49:55] action now. It's not read-only now.
[49:57] It can create a release
[50:01] binding, so it should be able to
[50:02] promote into the upper environment with
[50:04] all the configurations.
[50:06] >> Now does it how does it know what the
[50:08] word promote means?
[50:10] Is that a part of the skill?
[50:11] >> Because part of the skill
[50:15] >> From the developer skill it knows promote. You
[50:18] can see it created all the release
[50:20] bindings.
[50:21] >> and let me verify it here.
[50:25] >> Is that a good use case for UIs,
[50:28] to visualize what you have done?
[50:30] >> Yeah.
[50:30] >> To verify it. He want to verify it.
[50:34] >> Yeah. I'm I'm I more and more lately I
[50:36] just I ask it okay show me the results
[50:38] in a dashboard and it'll just build me
[50:39] an HTML dashboard. This is a little more
[50:42] interactive than that, but so much of
[50:45] the knowledge that you can get
[50:46] out of a model, I
[50:48] mean, a TUI is great, but when you're
[50:50] seeing everything in an ASCII
[50:52] character format, that's a pretty limited
[50:53] view. So I'm I'm always like use your
[50:56] front end design skills bro just make me
[50:58] a fancy dashboard. Um
[51:01] all right so I'm looking So we're
[51:02] looking at like the list of agents.
[51:03] Yeah.
[51:04] >> Yeah. So this is the ecosystem. So I have
[51:08] skills. I want to show that there
[51:10] are a bunch of skills developed by the
[51:13] community and the OpenChoreo
[51:16] maintainers. So let me show
[51:20] that vibe deployment thing. So this
[51:22] is the OpenChoreo developer skill, which I
[51:26] already loaded into my Claude
[51:29] Code. You can see you can give this kind
[51:32] of prompt. Okay, in this scenario, this
[51:35] is the famous Google microservices
[51:40] demo.
[51:41] >> I can give the same prompt here. Uh for
[51:43] example, let me give this in here. Uh
[51:46] and I will say, create a new
[51:52] project
[51:54] and use the default namespace.
[52:04] So now, because we
[52:08] are feeding the skills how to migrate
[52:11] from general Kubernetes, or how
[52:14] to use the local file system to read
[52:16] your code and deploy into
[52:19] OpenChoreo, it should be able to deploy all
[52:21] these 12 microservices into
[52:23] OpenChoreo in less than a couple of minutes.
[52:26] It gets all the OpenChoreo component
[52:30] abstractions, everything, by the agent
[52:33] itself. It's like a vibe deployment. The
[52:36] developer can start with the coding and
[52:38] they can use the same agent tool to do
[52:40] the vibe deployment. So does this, okay, so
[52:44] behind this is
[52:47] like what are the things that are
[52:48] actually happening? Is OpenChoreo creating
[52:51] like an Argo
[52:53] app set? Like, is it GitOps-ing into
[52:58] GitHub? I'm trying to figure
[53:00] out exactly what's happening
[53:02] >> Yeah, so in this case OpenChoreo
[53:06] will use the already built images,
[53:08] because the Google microservices demo has already
[53:10] built images.
[53:11] >> It will create a component with the
[53:14] bring-your-own Docker image model, and
[53:18] it will create the configurations for the
[53:20] microservices to interact with each other,
[53:24] and it will deploy into the development
[53:26] environment with the release and release
[53:27] binding. So it knows
[53:30] what the abstractions are in
[53:32] OpenChoreo, it knows what the
[53:33] Kubernetes abstractions are in the Google
[53:36] microservices, and it can convert the
[53:38] Kubernetes abstractions into the
[53:41] OpenChoreo abstractions and deploy into
[53:43] OpenChoreo.
[53:44] >> okay so
[53:47] >> It's talking to the API of OpenChoreo and
[53:49] it's essentially creating new resources
[53:51] in Kubernetes through that process, but
[53:55] are these new, like, OpenChoreo's
[53:57] own resource types? Is
[54:00] that what we're, I'm assuming you have
[54:02] some custom resource types.
[54:04] >> Exactly. Yes. I think
[54:07] if you have access to the Kubernetes cluster
[54:09] we can show the CRDs that
[54:12] we have built. So from a developer
[54:14] perspective, you have a concept called
[54:16] project, which is your whole application.
[54:18] That's what Lakmal is showing. And then
[54:20] there's another concept called
[54:22] component. A component represents the
[54:24] Kubernetes workload,
[54:27] right? And each component can be in
[54:29] multiple environments. So these are all
[54:32] the terms, the APIs, that we have
[54:34] built on top of Kubernetes and other
[54:36] projects.
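
The project, component, and environment concepts described here, together with the "promote" action from the demo, can be pictured with a small in-memory model. This is a hedged sketch only: the field names, the `ENV_ORDER` pipeline, and the `promote` function are assumptions made for illustration and do not reflect OpenChoreo's real CRD schema or its release binding resources.

```python
from dataclasses import dataclass, field

# Assumed promotion order; the demo mentions development, staging, production.
ENV_ORDER = ["development", "staging", "production"]


@dataclass
class Component:
    name: str
    image: str  # e.g. a prebuilt container image (bring-your-own-image model)
    environments: list[str] = field(default_factory=list)  # where it is bound


@dataclass
class Project:
    name: str  # the whole application
    components: dict[str, Component] = field(default_factory=dict)


def promote(component: Component) -> str:
    """Bind the component to the next environment after its highest one.

    Stands in for creating a release binding in the target environment.
    """
    if not component.environments:
        target = ENV_ORDER[0]
    else:
        highest = max(ENV_ORDER.index(env) for env in component.environments)
        if highest + 1 >= len(ENV_ORDER):
            raise ValueError(f"{component.name} is already in the last environment")
        target = ENV_ORDER[highest + 1]
    component.environments.append(target)
    return target
```

A prompt such as "promote my-go to staging" would then map to one `promote` call on the matching component, with the skill supplying the meaning of the word "promote".
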
[54:37] >> Okay. So it's creating an open.
[54:40] >> Oh, sorry. Go ahead.
[54:41] >> No, no. I was just going to say these
[54:44] skills and MCP tools they um these tools
[54:48] create the abstractions by looking at
[54:50] the GCP microservices uh manifests.
[54:53] >> Okay. Yeah. I'm trying to envision uh
[54:56] what what ends up like how to correlate
[55:00] this to the things we know today like
[55:02] GitOps repos with Argo CD definition
[55:05] files in them that get, you know,
[55:08] pulled in because Argo is watching or
[55:11] Flux is watching a different repo and
[55:13] it's looking for YAML changes. And I'm
[55:16] assuming, does this change that workflow?
[55:19] Like, I remember seeing
[55:21] GitOps on the diagram, so I'm kind of
[55:23] thinking, how does this correlate
[55:25] to sort of that GitOps loop that
[55:27] we've seen so many teams adopt.
[55:30] Yeah, I think these, the CRDs, right,
[55:33] you can put all of these in
[55:37] your GitOps repo and then you can
[55:38] configure Argo CD, configure Flux. The
[55:41] normal workflow will work as it is, right?
[55:45] >> okay
[55:46] >> And so there are two modes, right? Here I
[55:48] think we are using more of a ClickOps
[55:50] approach, but you can configure the
[55:53] same thing with your GitOps. So
[55:55] >> oh I see talk to yeah
[55:58] >> yeah so in this case What we're saying
[55:59] is like um I mean yeah we're looking at
[56:02] a web dashboard but we're not really
[56:04] >> we're just I mean for the for the audio
[56:05] listeners we're we're uh we're looking
[56:08] at a dashboard we're not really we're
[56:10] not, we're vibe-opsing, I guess, not
[56:13] ClickOps, because we're not clicking on
[56:14] the dashboard we're we're using the
[56:16] agent but the agent is causing just to
[56:18] be clear this is actually a question so
[56:19] the agent is causing open coro to create
[56:22] new resources inside the cluster for
[56:26] deploying this, the Google
[56:28] microservices demo, and that is an
[56:32] alternative to
[56:34] having an agent that I just say hey
[56:36] we're going to make a new PR in YAML and
[56:38] you're going to push that to GitHub and
[56:39] then Argo or Flux is going to pick that
[56:41] up later. Um because I guess I could I
[56:44] could tell these agents or I could give
[56:45] them skills or change the skills somehow
[56:47] >> so that that it knows that that's my
[56:50] workflow and that I'm just using it as
[56:52] sort of a read-only partner, but kind of
[56:55] just like we're not supposed to go into
[56:56] the AWS console and ClickOps away
[56:58] there. Like I don't necessarily want
[57:01] >> OpenChoreo controlling my
[57:05] changing my infrastructure
[57:07] outside of my GitOps loop. I I'm
[57:10] actually just curious. Do you think that
[57:11] the GitOps approach
[57:14] is less important if we have this sort
[57:16] of chain of events and or or do you like
[57:20] do you see enterprise teams possibly
[57:22] move shifting to more of this vibe
[57:23] approach since we have something in the
[57:24] middle or do you think that you think
[57:26] that GitOps is like here to stay and
[57:28] that's still this sort of more mature
[57:29] safe approach? Yeah, I would say
[57:32] when it comes to the lower environments,
[57:35] the agent directly creating these
[57:38] custom resources will play a big role.
[57:41] But maybe in production,
[57:44] when you promote into production,
[57:46] people will use GitOps or a
[57:49] declarative way of defining these
[57:51] things in production. But
[57:54] eventually, if the humans become
[57:57] confident enough with the agents and
[58:00] trust the agents, eventually, I would
[58:03] say, agents can directly call
[58:07] the MCP servers and create
[58:10] the custom resources directly within
[58:13] the cluster itself.
[58:15] >> Yeah, the way I think
[58:16] >> Not this year, but maybe in the
[58:18] future. I mean, the way I think
[58:21] about that is, I don't know, one advantage
[58:24] of GitOps is you have the tracing,
[58:26] right? You know what happened
[58:28] exactly over the years probably
[58:32] Yeah, in a dev environment you probably don't
[58:35] need GitOps. You can give
[58:37] developers full capability, do whatever
[58:40] you want. But for other environments,
[58:43] I don't know, in my view
[58:45] I would still use GitOps to control
[58:47] the workflow Right. Yeah.
[58:49] >> Yeah. I was trying to imagine um
[58:54] >> I'm in I'm in chat. I'm lost in chat. We
[58:56] got some good questions. We're gonna get
[58:57] to the questions in a second. Um uh and
[59:00] my friend Laura's here. Hi. Um what's up
[59:03] Conrad and uh Tjaz. Sorry if I
[59:06] mispronounced that. Um the I' I've
[59:10] started to wonder if like if the agents
[59:12] will eventually sort of like
[59:16] we're interacting with the local agent
[59:17] on our Claude Code and we're using these
[59:19] skills and like if possibly I mean I
[59:23] already see I think everyone's already
[59:24] writing the YAML with agents like we're
[59:26] already at the point where teams that
[59:28] are using Claude Code, they're not
[59:30] handwriting Kubernetes YAML anymore
[59:31] right like to have the agent do that why
[59:32] would why would I want to do that
[59:33] anymore and um I wondered if there Uh
[59:37] because there is a mode in in Argo where
[59:40] like you can sort of do things
[59:42] retroactively where maybe you make
[59:44] changes but then you document them after
[59:46] you're after you're uh you're done. And
[59:48] I wondered if agents possibly created a
[59:50] future where GitOps was more maybe not
[59:54] the first thing we did but maybe more of
[59:55] a a system of record but done after and
[59:58] and if agents sort of were pushing us
[01:00:01] into a a faster evolution of this. Do
[01:00:03] you see uh do you before we get to the
[01:00:05] questions I had one question from
[01:00:07] earlier um a constant conversation we're
[01:00:10] all having is around sandboxing and um
[01:00:15] specifically like our local agents um
[01:00:18] one of the one of the challenges is like
[01:00:22] I'm just doing stuff on my local machine
[01:00:24] all day and I don't necess and every AI
[01:00:26] that I open up by default even if I even
[01:00:28] if I use like Claude Code's built-in
[01:00:29] sandboxing, um, chances are it has access
[01:00:32] to my AWS, my Docker CLI, my kubectl
[01:00:35] CLI, like my Terraform, all
[01:00:37] these things. And those keys are all
[01:00:38] already authed and it could just deploy to
[01:00:42] things. And so one of the one of the
[01:00:45] things that I'm trying to work on with
[01:00:46] some some friends is like where can we
[01:00:50] draw a boundary around sandboxes? This
[01:00:51] almost feels like a very perfect
[01:00:53] scenario of these MCP tools and these
[01:00:56] skills are maybe all in a Docker sandbox
[01:00:59] which is more of like a VM with a with
[01:01:01] the harness running inside it and I only
[01:01:04] use that particular harness because it
[01:01:06] has these particular uh permissions and
[01:01:09] I only spin that one up when I want to.
[01:01:11] It's almost like per environment uh
[01:01:14] configuration and each Docker sandbox
[01:01:16] tends to have its own configuration. um
[01:01:18] that it's the way at least it works now
[01:01:20] is it's very things are very isolated
[01:01:22] including all your keys and everything.
[01:01:23] It it doesn't really bring a lot into
[01:01:25] it. So you're sort of isolating
[01:01:27] everything into different harnesses. So
[01:01:28] I'd have like my prod harness, my my
[01:01:31] staging harness and I can interact with
[01:01:33] those environments
[01:01:35] um and keep them safe away from like my
[01:01:38] normal day-to-day harness that kind of
[01:01:40] has maybe too many like a lot of small
[01:01:42] teams too many keys are on their local
[01:01:44] machines, right? Like they they know it.
[01:01:46] They're out there. Um, if you're a team
[01:01:48] of three, chances are you probably got
[01:01:50] the production Terraform key on your
[01:01:52] machine and you're just like you, you
[01:01:54] know, not to type the Terraform
[01:01:56] commands, like you know that that's the
[01:01:57] human thing. But now that we have the
[01:01:59] agents, I think people getting started
[01:02:00] getting a little more concerned. They're
[01:02:01] like,
[01:02:01] >> "What if it accidentally picks the wrong
[01:02:03] environment or if I accidentally have
[01:02:04] the wrong key and it just starts going?"
[01:02:06] Um so this feels like a very very
[01:02:08] interesting approach of each environment
[01:02:11] or, I'm assuming this thing can
[01:02:13] manage multiple clusters in one
[01:02:14] OpenChoreo, or is it a per-cluster thing?
[01:02:17] >> You can manage multiple clusters
[01:02:19] within one OpenChoreo.
[01:02:20] >> Yeah. Um, so yeah, so I almost treat it
[01:02:23] like an environmental gateway for my
[01:02:25] agents where I'm like I I call this
[01:02:28] production and I have this special
[01:02:30] Docker sandbox or whatever sandbox I
[01:02:32] want to apply and the keys to production
[01:02:35] even if it's read-only, are only accessible
[01:02:38] from that thing and maybe also I'm going
[01:02:40] to have some GitHub keys in there so
[01:02:42] that my agents in that sandbox can also
[01:02:44] write to the GitOps repo. Maybe they can
[01:02:46] make the PRs for me. Um, but I I'm kind
[01:02:49] of I'm now realizing how these things
[01:02:50] are coming together where I can have the
[01:02:52] one local agent that can see the
[01:02:54] infrastructure through OpenChoreo but can
[01:02:57] also implement GitOps changes based on
[01:02:59] the the information that it's getting
[01:03:00] out of the infrastructure. So if the
[01:03:02] pods are in, you know, a back-off loop,
[01:03:06] a pull back-off loop, I can see that. The
[01:03:10] agent can determine that. Write me the
[01:03:11] GitOps. In the same session it writes
[01:03:14] the GitOps diff. We're pushing that to
[01:03:16] a PR
[01:03:17] >> and that's how I'm going to be iterating
[01:03:19] on my cluster now. Or maybe I'm doing it
[01:03:20] in Slack while I'm at lunch and it's all
[01:03:22] just through a Slack bot and I'm not
[01:03:24] even
[01:03:24] >> Yeah, that's true.
[01:03:26] >> Um, anything uh sorry, I didn't mean to
[01:03:28] cut you off, but anything more in the
[01:03:29] demo?
[01:03:31] >> Uh, I can show the SRE agent,
[01:03:37] how the SRE agent will kick off
[01:03:39] with a failure scenario.
[01:03:42] >> Okay. But before you do that, um I I'll
[01:03:45] ask you these questions to make sure
[01:03:46] that we have them uh have them at top of
[01:03:48] mind here. William's asking uh I like
[01:03:50] it's Claude Code that can interpret or
[01:03:52] understand "promote to staging."
[01:03:54] Can we use a private LLM to interact
[01:03:56] with OpenChoreo instead of Claude Code?
[01:03:59] >> Uh, yeah. So here it's totally Claude
[01:04:02] Code. It's your model,
[01:04:04] right? So it's an Anthropic model. You can
[01:04:06] use Codex or any other agent. It's
[01:04:08] just MCP servers you are interacting with.
[01:04:11] You can use any LLM in your local agent,
[01:04:14] and in the built-in agent you
[01:04:16] can also configure your LLM. You can use either
[01:04:19] OpenAI or whatever model; you can
[01:04:22] configure it. It's a configuration
[01:04:25] for us.
[01:04:26] >> right? Yeah. Picking the model for the
[01:04:28] agents uh is is a good thing because
[01:04:31] we're we're going to we're I got a
[01:04:33] feeling we're going to need to get
[01:04:34] cheaper models for the for the cluster
[01:04:36] for the always on activity.
[01:04:38] >> Uh yeah. Uh and then um
[01:04:42] tjas I'm just going to I'm just I'm
[01:04:44] going to say that, uh, does the agent
[01:04:46] have visibility into inter-cell
[01:04:48] dependency traffic, or is its context
[01:04:50] window completely blind to anything
[01:04:52] outside its assigned project?
[01:04:58] Uh, it depends; the different
[01:05:00] agents interact with different contexts.
[01:05:03] So for example, uh, say in this case the SRE
[01:05:06] agent, uh, when it's doing that
[01:05:09] troubleshooting, it can uh go beyond the
[01:05:13] one project, because some components
[01:05:16] will interact with other components
[01:05:17] within other projects. So if there are
[01:05:20] interactions, this agent can look at what's
[01:05:23] happening in the other component
[01:05:25] in the other project. So it's based
[01:05:27] on the context that the agents are
[01:05:29] interacting with; they can go beyond the
[01:05:31] single project to multiple projects.
[01:05:34] >> right okay and just to be clear on like
[01:05:37] the scope of this thing if I go back um
[01:05:41] let me go back to my uh to the web page
[01:05:45] diagram
[01:05:46] um when we're looking at the
[01:05:50] we're looking at the design here um it
[01:05:53] doesn't appear like you're trying to
[01:05:54] boil the ocean here where OpenChoreo is
[01:05:58] the single control plane for AWS for GCP
[01:06:01] for for all the other things that we
[01:06:03] might have to manage it it's it's trying
[01:06:06] to remain Kubernetes focused right so
[01:06:08] like the lowest layer in our
[01:06:09] infrastructure stack is I guess m maybe
[01:06:13] it's could possibly get some of the
[01:06:14] technically OS level stuff underneath
[01:06:16] Kubernetes but that's that feels like
[01:06:18] that's the the dropping off point
[01:06:19] because I don't see like clouds listed
[01:06:21] in here.
[01:06:24] Yeah, we are running on
[01:06:26] top of Kubernetes. We are not even
[01:06:28] managing Kubernetes with
[01:06:30] OpenChoreo. We build on top of
[01:06:33] Kubernetes. So you can run OpenChoreo on
[01:06:36] EKS or AKS, but we are not managing the
[01:06:39] EKS or AKS. So that's how we
[01:06:42] architected it. But uh OpenChoreo has this
[01:06:45] resource abstraction where you
[01:06:48] can manage the cloud resources within
[01:06:51] OpenChoreo. So we have integrated
[01:06:53] with Crossplane at the moment. So the
[01:06:56] OpenChoreo control plane can talk with
[01:06:59] the Crossplane integration, and via
[01:07:01] Crossplane it can create, say, a bucket in
[01:07:04] AWS, and it can uh do the life cycle
[01:07:07] management of the S3 bucket. So it provides
[01:07:10] a single unified control plane to manage
[01:07:13] it, but uh it's up to the users
[01:07:17] if they want to do it in a different
[01:07:20] way; that's totally up to the users.
[01:07:22] Yeah,
[01:07:23] >> just to add to that and then
[01:07:26] on the screen you saw like multiple
[01:07:27] planes, right? Control plane, data
[01:07:29] plane, observability plane and you can
[01:07:31] run each plane in its own Kubernetes
[01:07:34] cluster or you can run all planes in one
[01:07:36] cluster and you can run your control
[01:07:38] plane locally, you can run data plane in
[01:07:41] AWS. As Lakmal said, it's the
[01:07:45] way it's architected: you can
[01:07:47] configure it the way you want. And then,
[01:07:50] like for example if the data plane is in
[01:07:52] AWS and control plane is somewhere else
[01:07:55] you don't have to expose the Kubernetes
[01:07:58] API server to the internet
[01:08:00] >> there's an agent running in the
[01:08:02] data plane that creates an
[01:08:04] outbound connection to the control plane
[01:08:07] >> right
[01:08:07] >> like that's that's sort of the standard
[01:08:09] these days
[01:08:10] >> right so yeah like so this thing this
[01:08:13] thing basically if it's going to help
[01:08:15] you if you're if If you're going to use
[01:08:16] this to help manage something, it needs
[01:08:18] to be something running in Kubernetes
[01:08:19] and it needs to be probably like a
[01:08:21] something around the CNCF ecosystem of
[01:08:24] tooling or something like that. So this
[01:08:25] isn't trying to like replace Terraform
[01:08:28] or, you know, control my CloudFormation,
[01:08:31] um you know stuff like that. Yeah. Yeah.
[01:08:33] Or or run or run my uh
[01:08:35] >> my uh
[01:08:37] >> my other infrastructure like Vercel or
[01:08:39] something like that. Yeah. It seems it
[01:08:41] so that's that gives me a nice boundary
[01:08:43] because like when we talk about the
[01:08:44] future of what it means to be an SRE,
[01:08:46] DevOps, platform engineer, whatever, like
[01:08:49] where where do the edges of these tools
[01:08:51] exist because um sure I can manage
[01:08:53] everything from Claude Code in theory at
[01:08:55] this point but it probably I probably
[01:08:58] need very scoped things in order to keep
[01:08:59] it in line and having this thing well
[01:09:02] defined as like it manages it it runs on
[01:09:04] top of Kubernetes so things that you can
[01:09:06] do in Kubernetes it can help with but
[01:09:08] don't you maybe can shoehorn in a bunch
[01:09:10] of things like uh uh for AWS or
[01:09:13] whatever, but maybe it's not the right
[01:09:15] tool to do everything all in one place
[01:09:17] because obviously there's a ton of
[01:09:18] context outside of Kubernetes that it
[01:09:20] would probably need. And
[01:09:21] >> um yeah. Yeah. So there's probably other
[01:09:23] tools for that. At the end of the day,
[01:09:24] we're going to have dozens of agents, I
[01:09:26] feel like, and we're just going to
[01:09:27] >> we're going to have an agent management
[01:09:28] plane where we're all just staring at at
[01:09:30] agent configs and skill files. That's
[01:09:32] like that's our new job. More markdown.
[01:09:35] Um all right. So sorry, back to the
[01:09:37] demo. I know we need to wrap it up. So,
[01:09:38] uh, was there anything else you wanted
[01:09:39] to show real quick? Sorry, I I kept
[01:09:41] distracting you.
[01:09:42] >> Yeah. Yeah. I don't know. We have time,
[01:09:45] Brett.
[01:09:46] >> Yeah, we do. We do. If you if you want
[01:09:47] to do it.
[01:09:47] >> Okay.
[01:09:48] >> For sure.
[01:09:48] >> Okay. Let me show
[01:09:50] >> the internet's always on. So, as long as
[01:09:51] we're we're still able to sit in our
[01:09:53] seats.
[01:09:54] >> Okay. So, in this demo, I will use this
[01:09:57] project. So, it's uh let me go to the
[01:09:59] front end. Uh, let me go the URL. So,
[01:10:03] this is the application. This is like a type of
[01:10:06] URL shortener. You can see it will use
[01:10:09] uh Postgres and Redis to uh retrieve
[01:10:12] this kind of uh shortened URL. Let me
[01:10:16] mimic a failure scenario and see how the
[01:10:19] agent will kick off. So let me do a like
[01:10:22] if we let me go to the uh different
[01:10:26] component. So we go to the API service
[01:10:28] component.
[01:10:30] Let me change some configuration. Let me
[01:10:32] go and say this is my Postgres URL. I will
[01:10:37] misspell it; like, I will add another
[01:10:40] double s here just to make a failure.
[01:10:44] Let me deploy that failing
[01:10:46] misconfiguration. Now my website would
[01:10:50] fail right? So you can't retrieve the uh
[01:10:54] the records. So now what happens: in
[01:10:57] OpenChoreo I have configured, uh, in the front
[01:11:01] end service I have configured an
[01:11:05] alert
[01:11:07] uh good guy here. So I configure alert
[01:11:10] saying this alert. Okay, if I see a 500
[01:11:15] in my logs for a five five for one
[01:11:18] minute, I will trigger an alert, and it can
[01:11:21] call PagerDuty for the SREs. Also it
[01:11:23] calls the SRE agent running within the
[01:11:26] OpenChoreo system. So the SRE agent will get
[01:11:29] context. Uh let me go here. Go to the uh
[01:11:34] incident. It created an incident. I can
[01:11:37] acknowledge it. Now the SRE agent will uh
[01:11:41] trigger and collect all the
[01:11:44] metrics, logs, and the config changes,
[01:11:47] everything, and try to generate a report.
[01:11:50] Uh it can generate the report within
[01:11:53] one to two minutes. Uh, and also
[01:11:56] it's not stopping there. It can come
[01:11:59] up with the fix as well. So let's wait uh
[01:12:03] for the agent to come up with the
[01:12:05] report. So it reduces the
[01:12:07] mean time to resolution from hours to
[01:12:10] minutes, kind of thing. Let's say it's
[01:12:12] helping SRE personas. Like, normally
[01:12:16] what happens is, when they come to the
[01:12:17] system they have to correlate all the
[01:12:19] different metrics, everything,
[01:12:21] to find the uh root cause, and then
[01:12:24] they have to find the uh remediation
[01:12:26] action as well. But now with the agents,
[01:12:29] with the context we're providing to the
[01:12:30] agent and all the tools we're providing,
[01:12:33] it's very powerful.
[01:12:37] Within minutes it can generate the
[01:12:39] >> uh resources. Yeah, Lakmal, if you could
[01:12:42] go to the cell diagram quickly. I think
[01:12:44] we can
[01:12:47] I think what you what you did was like
[01:12:49] so there's a Postgres connection to the
[01:12:51] API service.
[01:12:52] >> So you break that connection
[01:12:55] and the front end now fails. So
[01:12:57] basically the SRE agent gets the whole
[01:12:58] context, all the project
[01:13:00] >> logs and everything. It has access to the
[01:13:02] whole project. I think that's, yeah, that
[01:13:04] context is what we provide to the agent
[01:13:08] >> agent.
[01:13:09] >> Yeah.
[01:13:10] So, uh, and the key part there was that
[01:13:13] you have an alert defined for that
[01:13:17] error. Is it an error log or a metric? I
[01:13:19] guess that's I'm not sure which which
[01:13:21] one, but it's looking for something and
[01:13:22] there's an alert defined.
[01:13:24] So, it essentially kind of like web
[01:13:26] hooks. I don't know if it's technically
[01:13:27] a web hook, but like it web hooks the
[01:13:29] agent to like wake it up and say, "Hey,
[01:13:31] >> this event fired."
[01:13:33] >> Okay.
[01:13:34] >> Yeah. Yeah. So, basically, uh, it's not
[01:13:37] continuously monitoring. It's running,
[01:13:39] but whenever an event is triggered it acts
[01:13:42] upon that event and uh does that
[01:13:45] activity.
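
The event-driven wake-up described above can be sketched roughly as follows. This is a minimal illustration, not OpenChoreo's implementation: the `AlertRule` shape, the `handle_logs` function, and the `wake_agent` callback are all hypothetical names chosen for the example. The point is only that the agent (and its token spend) is invoked when a rule fires, not on every log line.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class LogLine:
    timestamp: float  # seconds since some epoch
    message: str


@dataclass
class AlertRule:
    # Hypothetical rule shape; not OpenChoreo's actual alert schema.
    name: str
    pattern: str         # substring to look for in a log message
    window_seconds: int  # how far back from "now" to look
    threshold: int       # number of matches needed to fire


def rule_fires(rule: AlertRule, logs: List[LogLine], now: float) -> bool:
    """Return True when enough matching lines fall inside the window."""
    start = now - rule.window_seconds
    matches = sum(
        1 for line in logs
        if start <= line.timestamp <= now and rule.pattern in line.message
    )
    return matches >= rule.threshold


def handle_logs(rule: AlertRule, logs: List[LogLine], now: float,
                wake_agent: Callable[[Dict], None]) -> bool:
    """Wake the agent only when the rule fires; otherwise do nothing."""
    if rule_fires(rule, logs, now):
        wake_agent({"alert": rule.name, "fired_at": now})
        return True
    return False
```

In this sketch the cheap rule check runs all the time, while the expensive agent call happens once per fired event.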
[01:13:46] >> right so we're not burning tokens on
[01:13:48] this thing all day
[01:13:49] >> right like it's not sitting there and
[01:13:51] constantly scanning logs and and and
[01:13:53] putting those into an agent that would
[01:13:55] be that would be
[01:13:56] >> Okay, you can see the report is
[01:13:58] already available. You can see even
[01:14:00] though the alert is triggered from
[01:14:03] the front end, it uh says with high confidence,
[01:14:07] okay, there's an issue in the API service,
[01:14:11] uh, the API service, and it can't resolve the
[01:14:15] Postgres URL. So that is the
[01:14:19] root cause, and also it provides the quick
[01:14:22] fix and says, okay, if you apply this
[01:14:25] configuration change, it should
[01:14:27] be fixed. Let me apply and see. Just
[01:14:30] apply it, and if I reload this one,
[01:14:33] yeah, it fixed the issue. So uh it's
[01:14:36] because we're providing all the
[01:14:38] context, changes, everything, it easily
[01:14:40] identified the fix as well. So
[01:14:43] but what, what the important
[01:14:44] thing is, now this is still a human in
[01:14:47] the loop. But if you go to the permission
[01:14:49] model in OpenChoreo, so here uh you
[01:14:54] can see the RCA agent; there are a few uh read-only
[01:14:57] tools we've uh given to this agent. So
[01:15:02] but if you want to make it
[01:15:04] writable, you can give, okay, uh, write
[01:15:07] permission; with the write
[01:15:09] tool it should be able to fix the issue
[01:15:11] as well. So that you can control uh
[01:15:14] within your permission model.
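
A rough sketch of the read-only-by-default idea described above, under stated assumptions: the tool names (`get_logs`, `get_metrics`, `get_config`, `apply_config`) and the `AgentPermissions` class are invented for illustration and are not OpenChoreo's actual permission model or tool list. It shows an agent that can read by default and can only write once a write tool is explicitly granted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set

# Hypothetical tool names, for illustration only.
READ_ONLY_TOOLS = frozenset({"get_logs", "get_metrics", "get_config"})


@dataclass
class AgentPermissions:
    # Every agent starts with the read-only set and nothing else.
    allowed: Set[str] = field(default_factory=lambda: set(READ_ONLY_TOOLS))

    def grant(self, tool: str) -> None:
        """Explicitly allow one more tool, e.g. a write tool."""
        self.allowed.add(tool)


def call_tool(perms: AgentPermissions, tool: str,
              registry: Dict[str, Callable[..., object]], **kwargs):
    """Run a tool on the agent's behalf, refusing anything not granted."""
    if tool not in perms.allowed:
        raise PermissionError(f"agent is not allowed to call {tool!r}")
    return registry[tool](**kwargs)
```

With this shape, moving from human-in-the-loop to auto-remediation is a single `grant("apply_config")` on the agent's permissions rather than a change to the agent itself.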
[01:15:16] >> Yeah. So okay. So if you're already to
[01:15:18] the point of it doing sort of auto
[01:15:19] remediation,
[01:15:21] >> does it does it when you when you were
[01:15:24] looking at that um um incident page and
[01:15:28] it was sort of giving you recommended
[01:15:30] >> uh it had it had ideas for how to fix
[01:15:32] it.
[01:15:33] >> Um does it have some sort of confidence
[01:15:35] score or a way to like know this is the
[01:15:38] most likely fix? This is the one that I
[01:15:40] would like. So yeah, I saw high
[01:15:41] confidence there that you're labeling it
[01:15:43] sort of high confidence. That's good.
[01:15:48] >> And unlikely causes. Okay. So it's it's
[01:15:51] sort of theorizing around things. It's
[01:15:53] it might be and things that it also
[01:15:54] might not be.
[01:15:56] >> Um
[01:15:56] >> yeah.
[01:16:01] >> So if it's not high confidence, would it
[01:16:03] like would it if you set it into auto
[01:16:05] mode, would it not try to fix it if it
[01:16:07] was like low confidence?
[01:16:09] Yeah, we can have that uh
[01:16:10] configuration, so it is only
[01:16:12] automatically fixing if there's high
[01:16:14] confidence. So we can give that
[01:16:16] configuration.
[01:16:17] >> What were you going to say, Samira?
[01:16:20] >> No, no, exactly. I was going to say we can
[01:16:22] come up with a score, like if its
[01:16:24] confidence level is 80%, 85%, then go
[01:16:28] ahead and fix it. If not, wait for the
[01:16:31] human, something like that.
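
The confidence gate described above could look something like this. It is a minimal sketch, assuming a 0 to 100 score attached to each proposed fix and a default threshold of 85; `ProposedFix`, `decide`, and the threshold value are hypothetical and not taken from the project.

```python
from dataclasses import dataclass


@dataclass
class ProposedFix:
    description: str
    confidence: int  # assumed 0-100 score attached to the proposal


def decide(fix: ProposedFix, auto_mode: bool, threshold: int = 85) -> str:
    """Apply automatically only in auto mode and at or above the threshold."""
    if auto_mode and fix.confidence >= threshold:
        return "apply"
    return "wait_for_human"
```

Anything below the threshold, or any fix proposed while auto mode is off, falls back to the human-in-the-loop path shown in the demo.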
[01:16:32] >> Is that I'm just curious. This is how
[01:16:34] dumb I am. Uh, is that score just asking
[01:16:37] the LLM what is your confidence on a
[01:16:40] percentage? Is that you're just
[01:16:41] prompting it?
[01:16:43] >> I think it's asking LLM.
[01:16:45] >> LM.
[01:16:46] >> Yeah, it's just asking what how how
[01:16:47] confident are you on a scale of one to
[01:16:50] 100?
[01:16:51] >> Okay. Okay. I just I wondered if there
[01:16:53] was some sort of magic uh magic like
[01:16:55] confidence on each message that comes
[01:16:58] out of the you know the uh the output of
[01:16:59] the tokens or something like that
[01:17:00] because I'm like I I don't know about
[01:17:02] any of this. But it's not the first time
[01:17:03] in a in a product I guess that I've seen
[01:17:05] sort of the confidence rating of things
[01:17:07] that it's reviewing. Um I think menrol
[01:17:09] does the same thing for their CI/CD
[01:17:11] agent where they have like a confidence
[01:17:13] uh ability and uh and they have sort of
[01:17:16] the same idea of like it's read only by
[01:17:17] default but in certain modes certain
[01:17:20] situations in certain agents if you want
[01:17:21] to give them limited write access. So in
[01:17:24] the background this is really like you
[01:17:25] went and edited a Kubernetes resource
[01:17:28] right? Or, I mean, through
[01:17:30] OpenChoreo you edited one, broke it, and
[01:17:32] then it went and re-edited
[01:17:34] uh in OpenChoreo. So in the
[01:17:37] background it's actually going and
[01:17:38] writing to the Kubernetes API on a
[01:17:40] particular resource definition and
[01:17:43] fixing the problem in the sort of normal
[01:17:44] Kubernetes way. I guess I'm I'm I'm
[01:17:46] asking and saying this all because like
[01:17:49] there's no magic here. This is really
[01:17:50] just behaving in the same patterns that
[01:17:52] humans would have done had we not had an
[01:17:54] agent in the loop.
[01:17:56] >> Exactly. Yeah.
[01:17:58] >> All right.
[01:17:59] >> This is pretty.
[01:18:01] >> Go ahead.
[01:18:01] >> Yeah. No, no, no. Sorry. I was going to
[01:18:04] say, yeah, the agent updated OpenChoreo
[01:18:08] >> and then that eventually gets
[01:18:10] >> updated into Kubernetes data plane
[01:18:13] >> stuff. Yeah.
[01:18:14] >> So, yeah. So, it's kind of like where
[01:18:15] Argo is like we're the resource that
[01:18:17] creates the resources. It's like kind of
[01:18:20] like that. Yeah, exactly. Okay.
[01:18:22] >> Um,
[01:18:23] >> this is really cool. I didn't I didn't
[01:18:24] expect to actually see the level of uh
[01:18:28] this level of prebuilt
[01:18:31] functionality so early in the project uh
[01:18:33] for for being a a sandbox project
[01:18:36] because this this view right here is
[01:18:38] something that we we don't really get in
[01:18:41] a lot of other tools. Uh and there
[01:18:43] certainly isn't like remediation
[01:18:45] abilities in a lot of other tools. So
[01:18:48] yeah, it is a pretty slick demo for for
[01:18:50] an open source project. So very cool
[01:18:53] guys. Thanks.
[01:18:56] All right. So how can people get
[01:18:58] started? They go to openchoreo.dev
[01:19:01] and
[01:19:03] is this a Helm chart? Like how do we
[01:19:06] how do we get started with implementing
[01:19:07] this? Well, step one, got to have
[01:19:09] Kubernetes running somewhere,
[01:19:11] right? So like
[01:19:14] >> So yeah, if you go to the openchoreo.dev
[01:19:16] documentation page, there are a
[01:19:19] couple of ways for you to get started.
[01:19:21] First thing, if you want to just try out
[01:19:23] OpenChoreo, um, we have this quick start
[01:19:26] guide.
[01:19:29] For the quick start guide you just need
[01:19:31] Docker. So Docker and
[01:19:33] Kubernetes, we will do that:
[01:19:34] there's one command that will install
[01:19:36] Kubernetes in Docker
[01:19:38] >> and then that will install OpenChoreo. So
[01:19:40] once it is done you get a
[01:19:43] UI, you get MCP access, all that. That's
[01:19:46] just 10 minutes. It won't uh pollute your
[01:19:50] local environment, in that you can just
[01:19:52] destroy it after that is done. The other
[01:19:54] option is, you need to have a Kubernetes
[01:19:57] cluster on your local machine and install
[01:19:59] OpenChoreo there. The third option is to
[01:20:02] install OpenChoreo in cloud environments.
[01:20:06] That's sort of the way how we
[01:20:07] have structured getting started.
[01:20:11] Yeah. Does this Okay. So, it's got some
[01:20:13] persistence to it. I'm assuming this
[01:20:14] thing does this thing have like
[01:20:16] databases as a part of its deployment.
[01:20:18] Does it have like a Redis or like what
[01:20:20] when we talk about the infrastructure
[01:20:22] inside of OpenChoreo? I mean, we've got a
[01:20:25] web portal. We've got some obviously
[01:20:27] some MCP endpoints. We've got some
[01:20:29] system components based on other things
[01:20:31] you're installing. Um, what's that
[01:20:33] persistence layer look like for us that
[01:20:34] are going to have to deploy this on our
[01:20:36] clusters?
[01:20:39] Um, I think for the whole OpenChoreo
[01:20:42] control plane, I would say right now
[01:20:44] the main persistence layer is etcd,
[01:20:47] whatever Kubernetes uses. For
[01:20:49] Backstage you can configure uh Postgres
[01:20:53] or any other database that Backstage uh
[01:20:56] recommends.
[01:20:56] >> right
[01:20:57] >> Right now it uses, yeah, I think apart
[01:21:00] from that
[01:21:01] >> there's no persistence.
[01:21:03] >> Yeah, also for, like, the SRE agent, you
[01:21:08] can configure it. It's coming with SQLite
[01:21:11] for saving the incidents and
[01:21:12] providing the context to future
[01:21:15] incidents, but you can configure
[01:21:17] Postgres or any other database.
[01:21:20] >> Yeah, I was actually wondering too like
[01:21:22] I was trying to imagine like this thing
[01:21:24] is this thing is doing things these
[01:21:26] agents are doing things and we probably
[01:21:28] want a history of that. Is this thing
[01:21:30] able to like log into your existing like
[01:21:32] is it logging through Kubernetes to your
[01:21:34] existing? So it's essentially creating
[01:21:35] more monitoring and logging data in your
[01:21:39] infrastructure by the act of it just
[01:21:41] being there and doing things. Is that is
[01:21:42] that where I would go to see hey what
[01:21:44] does my agent do in the last 24 hours?
[01:21:46] Is this something like I would just look
[01:21:47] at my normal monitoring and logging
[01:21:49] platform for that?
[01:21:50] >> Yes, it will send all the audit logs to
[01:21:53] the uh logs itself. Like, we are using,
[01:21:55] depending on the log module, it can be
[01:21:57] OpenSearch, with a different index
[01:21:59] for the audit logs. What the
[01:22:02] agent does, what the human does, everything is
[01:22:04] tracked uh within the logs itself.
[01:22:07] >> Okay. And the uh and all the data like
[01:22:10] about the infrastructure
[01:22:13] is it is are the agents just pulling
[01:22:15] that real time or do we do you have to
[01:22:17] like deal with caching you know
[01:22:20] infrastru infrastructure graphs or any
[01:22:22] do you have to do anything like that
[01:22:23] to optimize in the OpenChoreo layer?
[01:22:26] Yeah, at the moment it's real time. Uh,
[01:22:28] we are not using any uh vector DB or
[01:22:31] something like that uh for the caching
[01:22:33] or something like that. At the moment
[01:22:34] it's uh pulling the
[01:22:37] data in real time, but we give the time interval in
[01:22:39] the context, saying, okay, within this time
[01:22:41] interval you have to pull the logs and
[01:22:43] metrics. Okay,
[01:22:44] >> it will just look in that interval
[01:22:46] and uh get all the data to
[01:22:49] troubleshoot.
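
The time-interval scoping described above can be sketched as follows. This is an illustration under assumptions, not the project's code: the `query_window` and `records_in_window` helpers, the ten-minute look-back, and the one-minute look-ahead are all made up for the example. The idea is that the alert's firing time bounds what the agent pulls, so it never scans the full log history.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def query_window(fired_at: datetime,
                 before: timedelta = timedelta(minutes=10),
                 after: timedelta = timedelta(minutes=1)) -> Tuple[datetime, datetime]:
    """Return the (start, end) interval around the alert to investigate."""
    return fired_at - before, fired_at + after


def records_in_window(records: List[Dict], start: datetime, end: datetime) -> List[Dict]:
    """Keep only log or metric records whose 'ts' falls inside the interval."""
    return [r for r in records if start <= r["ts"] <= end]
```

Only the records returned by `records_in_window` would be handed to the agent as context, which keeps both query cost and token usage proportional to the incident rather than to the size of the log store.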
[01:22:50] >> Right. So it's so it once it sees an
[01:22:52] event, it already knows exactly the
[01:22:54] second that that event happened. So then
[01:22:56] it can limit itself to so that you're
[01:22:59] not burning a bunch of tokens by boiling
[01:23:00] the ocean and scanning every log on
[01:23:03] >> an infinity. Okay.
[01:23:05] >> Yeah. Perfect. Uh is it does it need to
[01:23:07] look at like Grafana graphs, like, or is
[01:23:10] it just simply doing prom queries and
[01:23:13] and just getting that data real time as
[01:23:15] well?
[01:23:16] >> It gets the data through the observability MCP
[01:23:19] server. It's uh just going to the MCP
[01:23:22] servers.
[01:23:23] >> Yeah.
[01:23:25] >> Search.
[01:23:25] >> Yeah.
[01:23:26] >> Yeah. Sorry. What would you say? I was
[01:23:28] talking over top of you.
[01:23:31] >> Directly from the observability MCP server, it
[01:23:34] talks to OpenSearch and gets the
[01:23:38] data.
[01:23:38] >> Yeah. Yeah. Yeah. Yeah. so that it's a
[01:23:40] little bit simpler than it having to
[01:23:42] constantly uh figure out everything and
[01:23:45] store it in some sort of cache which I
[01:23:47] mean in theory that would make the agent
[01:23:48] faster so that it didn't have to query
[01:23:50] everything itself when you start to work
[01:23:51] with it but it also has a lot of
[01:23:53] complexity and
[01:23:54] >> and resource utilization which gets back
[01:23:56] to the point of
[01:23:58] >> at some point everybody has had that
[01:24:00] infrastructure where they realize they
[01:24:01] they have a a cluster that has more
[01:24:03] resources used for the infrastructure
[01:24:05] than the actual apps themselves. uh
[01:24:07] which you know has happened you know
[01:24:09] sometimes when we have small clusters
[01:24:10] and we're getting started like we we're
[01:24:11] like yeah
[01:24:12] >> you know the the app only cost $100 a
[01:24:14] month to run but the infrastructure
[01:24:15] costs 10,000 a day.
[01:24:18] >> That's yeah well this has been very
[01:24:20] cool. Uh we could honestly I could talk
[01:24:22] to you for another hour about this
[01:24:23] because I'm really I'm really interested
[01:24:24] in like the patterns and the and the
[01:24:27] architecture design that you've learned
[01:24:28] because I think these these are things
[01:24:29] that we're going to have to do the same
[01:24:31] thing for our cloud infrastructure.
[01:24:33] We're going to have to do the same thing
[01:24:34] for our line of business apps. We're
[01:24:35] going to have to figure out how to give
[01:24:36] these agents a deeper understanding
[01:24:39] of that infrastructure so that we can
[01:24:40] depend on them more and and rely less on
[01:24:43] like the human the human tribal
[01:24:45] knowledge that we all have about how
[01:24:47] these systems are put together and where
[01:24:49] all the bodies are buried and where all
[01:24:51] the uh the friends my friends of mine
[01:24:53] they call it the sins of the data
[01:24:54] center. Where are the where are the sins
[01:24:56] of the data center at and this thing is
[01:24:58] going to have to know all that. this
[01:24:59] this this thing is really when I say
[01:25:02] that I mean like dozens of agents that
[01:25:03] all have certain context and certain
[01:25:05] control. Um, but it all kind of feels
[01:25:07] like it's coming back to this still this
[01:25:10] this uh Viktor Farcic, who runs the
[01:25:12] DevOps Toolkit channel, friend of mine, uh
[01:25:14] he talks about that, like, to him
[01:25:17] Claude Code is the center of the universe
[01:25:18] and he manages everything through it and
[01:25:20] that anybody who's creating a project
[01:25:22] that isn't expected to be local harness
[01:25:25] first as my interface is is creating an
[01:25:28] outdated project. So this kind of feels
[01:25:30] like this this uh is doing exactly what
[01:25:33] he he was predicting over the last year
[01:25:35] which is uh you know that the harness is
[01:25:38] is my gateway to all this infrastructure
[01:25:40] and everything else needs to be there to
[01:25:42] help me to help me be like local agent
[01:25:45] harness first. Um and and it's pretty
[01:25:47] cool to see that to see that take place.
[01:25:49] So all right are you going to be at
[01:25:51] KubeCon North America in the fall? I
[01:25:54] guess that's the next Well, we actually
[01:25:55] we got KubeCon India. Did that did that
[01:25:56] just happen or I can't remember if it
[01:25:59] just happened.
[01:25:59] >> I think that's a few weeks away.
[01:26:02] Yeah.
[01:26:02] >> Yeah.
[01:26:03] >> I keep I keep seeing traffic about it on
[01:26:04] the internet. I'm just not I'm not able
[01:26:06] to go. So I don't know when the actual
[01:26:07] date is. So we got
[01:26:08] >> So uh are there is there going to be
[01:26:10] anyone at at KubeCon India?
[01:26:13] >> Yes, a few maintainers will fly
[01:26:16] into KubeCon India and also uh KubeCon
[01:26:21] Japan as well. Uh it's happening in, I
[01:26:23] think, uh a couple of months, right? So
[01:26:26] we are flying to Japan
[01:26:28] as maintainers. Uh
[01:26:31] >> we'll be at all KubeCons going forward.
[01:26:35] >> Yeah, all of them. Uh I didn't realize I
[01:26:39] realized we might have not have answered
[01:26:40] William's question. Uh he was asking can
[01:26:42] we deploy with via DockerHub, Helm, and
[01:26:45] do we does it all have to be in the same
[01:26:47] cluster or can it be installed in
[01:26:49] another cluster?
[01:26:52] >> Well, since it's multicluster already.
[01:26:54] >> Yeah. It could tech it technically could
[01:26:56] be in its own cluster if you wanted to
[01:26:57] do that, but it does it's probably going
[01:26:59] to need certain components that are
[01:27:01] inside of each cluster, right? So that
[01:27:03] it can get the the API data and
[01:27:05] everything that it needs out of that
[01:27:06] cluster. Yeah,
[01:27:07] >> we are installing uh some agents
[01:27:09] within the different clusters to
[01:27:11] communicate with the control plane. That's how
[01:27:12] the communication happens. It can
[01:27:15] be multicluster. We are just
[01:27:16] installing the agent in that particular
[01:27:18] cluster and then it can communicate with
[01:27:21] the control plane.
[01:27:22] >> Nice. And there are a bunch of Helm
[01:27:24] charts available, I think, to answer his
[01:27:25] question.
[01:27:26] >> Yeah.
[01:27:28] >> Yeah. Uh, all right. So, people get
[01:27:29] started at the openchoreo.dev website and
[01:27:32] then I think you're on uh LinkedIn. I'm
[01:27:34] trying to think what the project is on
[01:27:36] LinkedIn and X. I'm trying to remember
[01:27:38] where where there's socials.
[01:27:41] >> We have the X and LinkedIn pages. Uh,
[01:27:45] and also we have the Slack channel, CNCF
[01:27:48] Slack channel. Uh, we are actively uh
[01:27:51] engaging there.
[01:27:53] >> Yes.
[01:27:53] >> And we use GitHub discussions a lot. If
[01:27:55] you want to engage with us, we'll be
[01:27:57] there.
[01:27:58] >> I I love I love GitHub discussions. I I
[01:28:00] wish they were used more. Um uh rather
[01:28:03] than issues like I feel like a lot of
[01:28:04] things just need to start in discussions
[01:28:05] before they go to issues. And maybe some
[01:28:07] projects do that now where they not they
[01:28:08] don't even let you post things in
[01:28:10] issues. You have to you have to start
[01:28:11] the discussions first.
[01:28:12] >> Got an uh yay AI ruining open source for
[01:28:15] us. Um
[01:28:16] >> at least for now we're figuring it out.
[01:28:18] Um I'm thank you both so much for being
[01:28:20] here. Uh you can follow the project like
[01:28:22] we said on LinkedIn or on X or join the
[01:28:25] Slack and the is it is it the Kubernetes
[01:28:28] Slack or the CNCF Slack?
[01:28:30] >> CNCF Slack.
[01:28:31] >> CNCF
[01:28:32] >> CNCF Slack.
[01:28:32] >> Slack, OpenChoreo channel, I would say.
[01:28:35] Yeah.
[01:28:35] >> Yeah. Okay. I'm in both and I can never
[01:28:38] remember which projects are in which or
[01:28:40] if the Kubernetes Yeah. So it's in the
[01:28:42] CNCF Slack. You can uh probably just
[01:28:43] Google that and find that. uh tell your
[01:28:45] agent to sign you up like don't you
[01:28:46] don't go do it yourself. Um thank you so
[01:28:49] much, Lakmal and Samira, for being here
[01:28:51] and we will hopefully I I'm looking
[01:28:53] forward to seeing what's next in the
[01:28:54] project. What real last last quick thing
[01:28:56] what's next? What's what big PRs are
[01:28:58] coming down the pike since we're talking
[01:29:00] open source here.
[01:29:03] >> More agents.
[01:29:04] >> More agents. Could I could have guessed
[01:29:07] it. Could have guessed it.
[01:29:09] >> All right.
[01:29:09] >> All about AI.
[01:29:11] >> Yeah. More more of the AI overlords. All
[01:29:13] right. Thank you so much for being here.
[01:29:15] Uh, thanks for everybody in chat for for
[01:29:17] being with us and we'll see you in the
[01:29:19] next one.
[01:29:20] >> Bye everybody.
[01:29:21] >> Thank you.
[01:29:21] >> Thanks.
[01:29:22] >> Bye.
[01:29:23] >> Bye.