Full Transcript
https://www.youtube.com/watch?v=D27gGn2kwzA
[00:17] Oh no.
[00:33] Oh my god.
[00:47] There's
[01:09] So why do you have to go?
[01:19] Should we try?
[01:21] Yep.
[01:24] Can you hear me?
[01:24] Yes.
[01:27] Can you hear me?
[01:27] Yes.
[01:27] Is the recording on?
[01:31] Is the recording on?
[01:33] recording on?
[01:33] Okay, good.
[01:34] Okay, good.
[01:34] Great.
[01:36] Great.
[01:36] Hi everybody.
[01:38] Hi everybody.
[01:38] My name is Lulan.
[01:40] I'm really happy to see so many people here because I thought that everybody's already so tired that nothing interests anymore.
[01:43] tired that nothing interests anymore.
[01:45] But but great to have you here.
[01:49] We are from MER
[01:51] the concert that was originally founded as MER but has been known as Wund for 10ish years but then this week we returned to be just because we wanted to remind ourselves of what we are.
[01:56] 10ish years but then this week we returned to be just because we wanted to remind ourselves of what we are.
[02:02] Um here's Tommy.
[02:06] here's Tommy.
[02:08] Hi nice to meet you.
[02:11] I'm one of the co-founders of Meara uh which was in between going back to Mea known as Vunder crowd and vunder
[02:13] uh which was in between going back to Mea known as Vunder crowd and vunder
[02:20] Mea known as Vunder crowd and vunder and whatever and we've always confused.
[02:22] and whatever and we've always confused with wunder bow and wand and wunder.
[02:25] with wunder bow and wand and wunder whatever so not anymore.
[02:29] Uh anyway today we are here to tell you a story from universities.
[02:34] Um let's go to the next slide.
[02:39] So we're in Athens and it does feel pretty appropriate because uh this is the place where the I would say education was born, right?
[02:47] So there was this man called Socrates wandering around the streets somewhere around here and he was asking questions.
[02:58] He was asking them relentlessly and even annoyingly some might say.
[03:04] Um but here's the thing about Socrates.
[03:08] Uh he wrote anything down.
[03:11] So if you wanted to get an answer, you needed to find him and he was the only interface basically.
[03:20] But that worked well because he was brilliant and because there were just a
[03:21] brilliant and because there were just a few dozen of people actually following.
[03:24] few dozen of people actually following him.
[03:27] But if you fast forward two and a half thousand years and we've got universities and this like the content.
[03:29] half thousand years and we've got universities and this like the content the amount of content is is endless.
[03:35] the amount of content is is endless.
[03:35] There's there's like policies and guides and FAQs and and whatnot and still the students don't find the answers and we know they don't.
[03:44] students don't find the answers and we know they don't.
[03:46] So what they do is that they they phone and then they eventually give up, right?
[03:53] phone and then they eventually give up, right?
[03:57] And and the problem therefore is not the content really.
[04:00] not the content really.
[04:00] The problem is always always in access and that's what we are talking about today.
[04:04] and that's what we are talking about today.
[04:06] today. How to really create accessible content.
[04:10] How to really create accessible content.
[04:16] So let's go.
[04:16] Every universities as much as I know have more than enough content.
[04:19] Every universities as much as I know.
[04:19] The best
[04:22] have more than enough content.
[04:24] The best universities are not the ones that have the most.
[04:27] They are the ones that can deliver the answers quickly.
[04:30] And here's the business case.
[04:32] Here's the real business case that we've been working on.
[04:37] Confusion is actually really expensive.
[04:40] It's really expensive.
[04:45] Every unanswered questions email.
[04:47] Every email becomes a queue and every cube means that there is a person who needs the answer.
[04:49] There's a like overloaded uh service desk or something like that.
[04:55] is really really expensive and that's where the real cost is really hiding.
[04:59] and that's the case that you can use in favor of your business.
[05:05] So clarity is not a nice to have for university.
[05:10] it is the real competitive advantage if you start thinking about it.
[05:15] and now will tell you something about that.
[05:23] will tell you something about that.
[05:23] Thanks Ola.
[05:25] Thanks Ola.
[05:29] we see this pattern everywhere we go.
[05:34] we see this pattern everywhere we go.
[05:37] it's just not only the universities,
[05:40] not only the universities, it's not even only the companies,
[05:43] it's not even only the companies, it's every organization
[05:46] every organization that has the same um problem.
[05:51] that has the same um problem.
[05:51] There might be multiple authorative
[05:54] knowledge bases,
[05:56] tens of them even.
[05:58] Uh in the case of universities,
[06:02] there might be internets,
[06:05] studentf facing knowledge bases for curriculum uh systems.
[06:09] curriculum uh systems.
[06:10] There might be IT help desks
[06:14] uh systems.
[06:14] There might be IT help desks um uh facility
[06:15] uh facility booking systems and so on.
[06:18] booking systems and so on.
[06:21] All of them are equally correct,
[06:25] All of them are equally correct, authorative.
[06:27] authorative and at the same time they also are owned.
[06:31] and at the same time they also are owned by different teams quite often.
[06:37] and this brings in fragmentation.
[06:42] and this brings in fragmentation in the ownership.
[06:44] in the ownership updating and managing the content.
[06:47] updating and managing the content.
[06:50] uh making sure that they all have the same tone of voice and.
[06:55] they all have the same tone of voice and so on.
[06:57] so on.
[07:00] In the end, the quality drifts because of.
[07:02] because of all of this.
[07:09] Uh and this is uh happening in every organization having lots of content.
[07:13] happening in every organization having lots of content.
[07:18] Uh the thing that needs to be solved as.
[07:21] Uh the thing that needs to be solved as Ula.
[07:22] Ula explained is to make it uh accessible.
[07:28] explained is to make it uh accessible, make it feel like you are in control.
[07:30] make it feel like you are in control when you try to use the content.
[07:35] when you try to use the content.
[07:35] It cannot be just a list of blue URL.
[07:39] It cannot be just a list of blue URL links.
[07:41] links on a search page on somewhere um some.
[07:46] on a search page on somewhere um some some uh Wikipedia.
[07:46] Oh, I'm sorry. Vicki um instructional um pages.
[07:57] um pages.
[07:57] The users such as the students of the university don't want to see the blue URLs.
[08:06] don't want to see the blue URLs.
[08:06] They want to have access to the content, sorry, the answer in a way that feels natural and at the same time authorative.
[08:22] If this sounds familiar, it might be that you have seen this happening in
[08:28] that you have seen this happening in your own organization or in your client's organizations.
[08:35] client's organizations which might have a lots of fragmented content equally correct.
[08:40] content equally correct.
[08:44] When we started this project, we thought the the hardest part would be the A. I think choosing the model, fixing the prompts, doing evolves, fixing the prompts again, choosing the embedding model, choosing the chunking for the embeddings.
[08:51] we thought the the hardest part would be the A. I think choosing the model, fixing the prompts, doing evolves, fixing the prompts again, choosing the embedding model, choosing the chunking for the embeddings.
[08:55] hardest part would be the A. I think choosing the model, fixing the prompts, doing evolves, fixing the prompts again, choosing the embedding model, choosing the chunking for the embeddings.
[08:58] choosing the model, fixing the prompts, doing evolves, fixing the prompts again, choosing the embedding model, choosing the chunking for the embeddings.
[09:00] fixing the prompts, doing evolves, fixing the prompts again, choosing the embedding model, choosing the chunking for the embeddings.
[09:04] doing evolves, fixing the prompts again, choosing the embedding model, choosing the chunking for the embeddings.
[09:08] choosing the embedding model, choosing the chunking for the embeddings.
[09:11] the chunking for the embeddings. But it wasn't that.
[09:14] wasn't that. Uh the the hardest part was the finding a solution that answers users need.
[09:19] Uh the the hardest part was the finding a solution that answers users need.
[09:24] finding a solution that answers users need.
[09:29] Back to easy peasy.
[09:35] Easy peasy.
[09:37] Yeah, that was something that most of you maybe are familiar with like cases familiar to any organization, not just universities, I guess.
[09:49] But here's what I want you to think.
[09:53] Um, everything a student or a staff member actually needs to know, it exists.
[10:01] It does already exist.
[10:04] So someone someone wrote it, someone someone proved it, someone published it, right?
[10:11] It's sitting on your Drupal instance or in your subsite or or um wherever PDF that was made in 2020 and nobody deleted it yet.
[10:21] So So it's all there.
[10:24] The knowledge you've invested in it and you you're instilling it.
[10:28] So
[10:31] Instilling it.
[10:33] So when a student then hits the wall?
[10:36] When a student then hits the wall doesn't find the answer, they don't go to your subsides, right?
[10:37] They don't they don't know what to do.
[10:42] They keep so that's not a content problem because the content is there.
[10:45] That's an access problem and it has a measurable cost like I said in stock hours and in support hours and everything.
[10:47] Uh the opportunity is not there to create more content.
[10:49] It's to make the content work for people.
[10:52] I think yesterday in his keynote speech that you should always start with us.
[10:55] It goes with this case as well.
[10:58] It's it's so well said you should always start with UX, not the technology.
[11:02] The technology part is actually relatively easy when you know where to go.
[11:06] So, this is my favorite part.
[11:10] That was a
[11:33] So, this is my favorite part.
[11:33] That was a revelation to me.
[11:36] Um, we've been we've been talking about CMS for like what 20 years I guess and better CMS, better workflows, better navigations, um, whatever.
[11:50] I think somewhere along the way we confuse the tool with the goal because right who actually gets really excited over managing content.
[12:00] I mean nobody really wants to manage content if they want people to get answers.
[12:06] And when you start thinking about that way um the what I'm proposing here is that stop think about your your your website as a content repository that is somewhere over there and people navigate it.
[12:25] It's by thinking about it as a like answering giving machine or something like that.
[12:29] That's where you actually find the right answers.
[12:31] So not better
[12:34] find the right answers.
[12:38] So not better menus, not better CMS answers and that's menus, not better CMS answers and that's a different design problem than what we a different design problem than what we started with.
[12:43] Um it's different conversation to have with your leadership to be honest and this is This is what we actually spent last three months trying to fix for the University of any organizations I would say.
[12:58] So in practice um this this case is about uh we'll be showing a demos um of this actual solution.
[13:09] It's about the Helsing University of Helsinki and it's um Uni help service.
[13:21] Uh a quick note about University of Helsinki.
[13:23] It's a one of the biggest universities in Aoro Europe.
[13:26] Um with uh 35,000 students, three official languages and lots of content.
[13:33] Um in this uh case
[13:38] And lots of content.
[13:42] Um, in this uh case, we have migrated content from seven different upstream systems.
[13:45] And it took us three months to implement this with three engineers, couple of service designers, one very great product owner focusing purely on this uh project.
[14:06] Well, the question was build an AI search.
[14:10] Um, the question was to make the answering machine that Ola referred.
[14:23] And keeping that in mind, we were able to focus and prioritize work the whole project implementation phase.
[14:35] But that was really intense three months.
[14:37] That was yes.
[14:37] Yes.
[14:37] Um.
[14:41] That was Yes. Yes.
[14:41] Um so how did we build it?
[14:44] So how did we build it?
[14:44] Naturally we used Drupal but not the traditional way of storing content in it.
[14:57] Drupal is not the content store for the main data.
[14:57] It has a supportive role though but the main data is kept in the upstream um systems which we consider the canal systems.
[15:14] We are just providing answers um to the users sort of a automated service desk self-service desk basically um so we didn't store content in Drupal as nodes.
[15:35] We used data pipelines module which is
[15:42] We used data pipelines module which is very good.
[15:48] It go I passes Drupal when uh reading the data from external systems.
[15:51] Reading the data from external systems into elastic search and in between.
[15:54] Into elastic search and in between there's a transformation happening where.
[15:58] There's a transformation happening where we do the chunking of the data before we.
[16:01] We do the chunking of the data before we send it to the embedding uh model for uh.
[16:06] Send it to the embedding uh model for uh the vector uh retrieval and then uh do.
[16:10] The vector uh retrieval and then uh do some manipulations when needed for uh.
[16:14] Some manipulations when needed for uh metadata.
[16:16] For instance, uh the database.
[16:19] Uh the database is elastic search for the for our.
[16:22] Is elastic search for the for our service.
[16:24] Service.
[16:24] Uh Drupal as I said is having this supportive role.
[16:28] It handles the configuration.
[16:31] Configuration.
[16:31] We use a lot of feature flags for ve very many things starting from Slack.
[16:37] Very many things starting from Slack notifications to.
[16:43] notifications to uh which
[16:45] uh which functionalities are available at given
[16:47] functionalities are available at given time in the UI for the end users.
[16:50] We have these kill switches for I don't know dozen of different functionalities
[16:55] know dozen of different functionalities related to the search
[16:58] related to the search itself.
[16:59] itself. Drupal also manages the um orchestration
[17:02] Drupal also manages the um orchestration of the data retrieval uh from those
[17:07] of the data retrieval uh from those upstream systems.
[17:07] We have uh roughly 45 crunch jobs running on daily basis or
[17:15] crunch jobs running on daily basis or hourly basis.
[17:19] Um we for instance track if there's a
[17:21] we for instance track if there's a sudden spike in the the usage of the
[17:24] sudden spike in the the usage of the search and have a alert.
[17:30] Um in such cases
[17:31] cases u the permissions are handled through
[17:34] u the permissions are handled through Drupal and so on.
[17:37] Drupal and so on. Elastic search itself has a W duty.
[17:42] Elastic search itself has a W duty. It has the de vector um uh database role.
[17:47] has the de vector um uh database role.
[17:48] We don't have a separate vector database.
[17:53] Um elastics works very well with with um vector um searches when there's less than million items less than million documents in the in the database.
[18:04] Um it also has this traditional lexical search term based search um role.
[18:13] We have this hybrid mode where each search um combines the lexical plus the vectorbased semantic searching in one cure.
[18:30] Basically usually we have this usually there is some special cases but most of the cases it's a one query um so we have a micros service next to Drupal uh built with the NodeJS uh using the
[18:47] uh built with the NodeJS uh using the fastify framework.
[18:50] fastify framework.
[18:53] um that f uh micros service has a couple of different endpoints that we um provide the data through.
[19:01] So the front end um search mini application which is react based is utilizing those um microservices.
[19:13] microservices running in NodeJS and uh very important part of the um workflow is the structured JSON usage in the uh LLM um API calls.
[19:33] So uh I think quite many nowadays rely on the structured JSON but and and those the major providers all provide nowadays the the option to
[19:48] provide nowadays the the option to choose the JSON schema uh mode.
[19:52] choose the JSON schema uh mode um which makes it easy to validate actually it's.
[19:58] it easy to validate actually it's validated on their side in in um in our case we were using Azure Open ID.
[20:02] validated on their side in in um in our case we were using Azure Open ID.
[20:06] So their API makes uh certain that there's this schema um implemented in the responses from the LLM.
[20:10] their API makes uh certain that there's this schema um implemented in the responses from the LLM.
[20:15] implemented in the responses from the LLM.
[20:21] Yeah.
[20:23] And then there's a that was the biggest layers but then there's also in between.
[20:26] that was the biggest layers but then there's also in between uh in this microservices level there's a cury rewriting intent detector uh lang language detector and so on.
[20:29] uh in this microservices level there's a cury rewriting intent detector uh lang language detector and so on.
[20:35] detector uh lang language detector and so on.
[20:39] so on. There's um many small non LLM non AI um parts uh built with
[20:43] so on. There's um many small non LLM non AI um parts uh built with
[20:49] non LLM non AI um parts uh built with traditional algorithms very cheap no
[20:53] traditional algorithms very cheap no machine learning there which will detect
[20:56] machine learning there which will detect if for instance you
[20:59] if for instance you uh use different language than the
[21:02] uh use different language than the current UI is um having
[21:07] And
[21:10] And >> can we say something?
[21:11] >> can we say something? >> Yes.
[21:11] >> Yes. >> Let's jump to the first
[21:14] >> Let's jump to the first uh video um which I recorded like an
[21:18] uh video um which I recorded like an half an hour ago.
[21:21] half an hour ago. Let's see if we
[21:23] Let's see if we >> just because he wasn't happy. He wasn't
[21:25] >> just because he wasn't happy. He wasn't happy with the original one and then he
[21:27] happy with the original one and then he scared me
[21:29] scared me >> like completely by saying that I'm going
[21:31] >> like completely by saying that I'm going to change it. I'm like half an hour
[21:33] to change it. I'm like half an hour before the show. Yes. Thank you.
[21:35] before the show. Yes. Thank you. >> Yes. So this clip is about
[21:40] >> Yes. So this clip is about uh very basic use case. A student lands
[21:45] uh very basic use case. A student lands on this service um executes a um search
[21:53] on this service um executes a um search and gets the response.
[22:00] And this is the UI the part that we are
[22:03] And this is the UI the part that we are now focusing on in this service.
[22:06] now focusing on in this service. asking about the thesis deadline and
[22:10] asking about the thesis deadline and retrieval augmented generated response
[22:13] retrieval augmented generated response comes from the L uh LLM
[22:16] comes from the L uh LLM giving the summarization of the whole
[22:21] giving the summarization of the whole data pool and we have this let's stop
[22:25] data pool and we have this let's stop for a while here
[22:28] for a while here no
[22:30] no sorry uh I wanted to highlight this
[22:34] sorry uh I wanted to highlight this references part.
[22:40] Um, and this is this was actually
[22:43] Um, and this is this was actually something we figured out was necessary
[22:45] something we figured out was necessary to have in this project and required
[22:49] to have in this project and required also the JSON schema, JSON structured
[22:53] also the JSON schema, JSON structured approach in the LLM responses.
[22:56] approach in the LLM responses. we needed to have a separate field from
[22:59] we needed to have a separate field from the LLM to respond which
[23:03] the LLM to respond which documents from the actual uh provided
[23:07] documents from the actual uh provided contexts where you using to generate the
[23:11] contexts where you using to generate the response the the summarizations the
[23:14] response the the summarizations the answer. So with each response from the
[23:18] answer. So with each response from the LLM
[23:19] LLM we get
[23:21] we get uh well multiple fields but three of
[23:24] uh well multiple fields but three of them are very important. The first one
[23:26] them are very important. The first one is the answer textual answer. We just
[23:29] is the answer textual answer. We just saw an example of it. The second is a
[23:34] saw an example of it. The second is a references field which is which is an
[23:36] references field which is which is an array of objects containing the title of
[23:40] array of objects containing the title of the referenced uh
[23:44] the referenced uh content plus the URL of that content. So
[23:48] content plus the URL of that content. So based on that information we get this
[23:51] based on that information we get this references uh section to the answer. And
[23:54] references uh section to the answer. And the third important one was
[23:57] the third important one was has information field. It's a boolean
[24:01] has information field. It's a boolean whether you have
[24:05] whether you have answer or sorry whether you have
[24:08] answer or sorry whether you have information or not. So the LM is
[24:11] information or not. So the LM is required to answer if it thinks that yes
[24:17] required to answer if it thinks that yes I have an answer for for um
[24:21] I have an answer for for um this question. So, we were utilizing
[24:24] this question. So, we were utilizing those
[24:26] those and this was a happy path. Let's go to
[24:30] and this was a happy path. Let's go to the harder ones.
[24:31] the harder ones. >> I'm specialized on the hard stuff.
[24:34] >> I'm specialized on the hard stuff. >> Yeah, there's a question.
[24:36] >> Yeah, there's a question. >> Sure.
[24:37] >> Sure. >> Yes. Hey, go ahead. I love questions.
[24:46] >> Sorry.
[24:50] >> Good question. Well um we actually went
[24:53] >> Good question. Well um we actually went back and forth with the internet and the
[24:56] back and forth with the internet and the well all the other except internet they
[24:59] well all the other except internet they are public
[25:01] are public >> sources. Yes the internet one is tricky
[25:03] >> sources. Yes the internet one is tricky one. uh we actually have a feature flag
[25:08] one. uh we actually have a feature flag how to how to use it. But right now the
[25:11] how to how to use it. But right now the the configuration says that if the user
[25:14] the configuration says that if the user is not um logged in
[25:18] is not um logged in there is SSO actually involved. If the
[25:21] there is SSO actually involved. If the user is not logged in, we skip the
[25:24] user is not logged in, we skip the internet.
[25:25] internet. But we have a option that the admin can
[25:29] But we have a option that the admin can change on change on the fly that has a
[25:34] change on change on the fly that has a uh uh so the other option would be that
[25:38] uh uh so the other option would be that yes
[25:39] yes take those internet contents also into
[25:43] take those internet contents also into consideration. But if the summary or
[25:48] consideration. But if the summary or well if the summary would include
[25:51] well if the summary would include something from those internet contents
[25:54] something from those internet contents display a prompt to log in. So there's a
[25:57] display a prompt to log in. So there's a sort of a log icon and so on.
[26:00] sort of a log icon and so on. >> But that's actually quite quite a nice
[26:03] >> But that's actually quite quite a nice bridge to to what I'm going to say next.
[26:06] bridge to to what I'm going to say next. Next.
[26:10] There we go. No. What did I do?
[26:14] There we go. No. What did I do? >> Yes.
[26:16] >> Yes. Um yeah, uh you know when Google gets it
[26:19] Um yeah, uh you know when Google gets it wrong, you search again. Um but then if
[26:22] wrong, you search again. Um but then if the university health portal or whatever
[26:25] the university health portal or whatever gets it wrong, then a student misses the
[26:27] gets it wrong, then a student misses the deadline or files the wrong form or
[26:31] deadline or files the wrong form or misunderstands his rights or something
[26:33] misunderstands his rights or something like that. So, so um so the stakes are
[26:36] like that. So, so um so the stakes are really different here and uh that was
[26:40] really different here and uh that was kind of a thing that shaped everything
[26:42] kind of a thing that shaped everything we did because um
[26:45] we did because um this I think the part where most of the
[26:47] this I think the part where most of the AI projects quietly failed. Um not on
[26:51] AI projects quietly failed. Um not on the technology like I said it's not the
[26:53] the technology like I said it's not the hardest part but trust they failed on
[26:55] hardest part but trust they failed on trust and uh that was that's that's in
[27:00] trust and uh that was that's that's in an institutional context this way. Um it
[27:04] an institutional context this way. Um it has a specific meaning and that question
[27:06] has a specific meaning and that question was actually all about it because we
[27:08] was actually all about it because we have to know whe this whether this
[27:10] have to know whe this whether this answer is correct right um then we want
[27:14] answer is correct right um then we want to see where it came from is it
[27:16] to see where it came from is it connected to what information and and
[27:19] connected to what information and and then one of the questions was that does
[27:22] then one of the questions was that does the system know when it shouldn't be
[27:24] the system know when it shouldn't be answering at all and that was the third
[27:27] answering at all and that was the third one. So uh these these were not after
[27:31] one. So uh these these were not after that was actually something that was
[27:33] that was actually something that was included in the original brief. So we
[27:36] included in the original brief. So we started with it and um
[27:40] started with it and um we weren't there to build an impressive
[27:42] we weren't there to build an impressive AI feature. We wanted to to build
[27:46] AI feature. We wanted to to build something that the University of
[27:47] something that the University of Helsinki can can really stand behind
[27:50] Helsinki can can really stand behind proud of and say that this really helps
[27:53] proud of and say that this really helps our students and whatever stakeholder it
[27:56] our students and whatever stakeholder it is.
[27:57] is. >> And now he's gonna again do these show
[28:00] >> And now he's gonna again do these show how
[28:02] how Uh a quick word about how did we solve
[28:04] Uh a quick word about how did we solve the trust issue in in practice.
[28:07] the trust issue in in practice. Uh we just saw uh this this the the
[28:11] Uh we just saw uh this this the the references. That's the first point. We
[28:13] references. That's the first point. We have to show where
[28:17] have to show where uh from which uh content and and context
[28:21] uh from which uh content and and context the the uh the the answer is based on.
[28:24] the the uh the the answer is based on. There's a link there's a references. If
[28:27] There's a link there's a references. If there's an um if there's some issues um
[28:31] there's an um if there's some issues um people will know who to contact. Hey,
[28:34] people will know who to contact. Hey, you haven't expired information on your
[28:37] you haven't expired information on your page. For instance,
[28:39] page. For instance, um there's the hybrid uh surface. So,
[28:42] um there's the hybrid uh surface. So, what we just saw was the AI generated um
[28:46] what we just saw was the AI generated um summarization,
[28:48] summarization, the answer, but below that there's also
[28:51] the answer, but below that there's also the traditional
[28:53] the traditional result cards.
[28:55] result cards. um that people can then review
[28:59] um that people can then review themselves and choose which one to
[29:02] themselves and choose which one to follow.
[29:03] follow. Uh and then there's this has information
[29:07] Uh and then there's this has information thing which means that we want to make
[29:11] thing which means that we want to make sure that we don't provide fake
[29:16] sure that we don't provide fake summaries.
[29:18] summaries. Um so the LLM is is um required
[29:25] Um so the LLM is is um required Um the system prompt says that you have
[29:29] Um the system prompt says that you have to be very certain that this um this
[29:32] to be very certain that this um this answer has proper information before you
[29:37] answer has proper information before you set this has information to true. Uh it
[29:41] set this has information to true. Uh it will reply something even if it says
[29:44] will reply something even if it says that I might not have a good information
[29:48] that I might not have a good information and then we hide the answer in that
[29:51] and then we hide the answer in that case.
[29:53] case. Um the last point is that of course
[29:57] Um the last point is that of course naturally this is a rack search. So it
[30:01] naturally this is a rack search. So it is based on only the contacts that we
[30:04] is based on only the contacts that we provide to the LLM. So uh I think the
[30:09] provide to the LLM. So uh I think the maximum amount of documents we send is
[30:12] maximum amount of documents we send is 20 at the moment. So we send 20 docs and
[30:16] 20 at the moment. So we send 20 docs and it then reads them considers them as the
[30:20] it then reads them considers them as the only contacts and then
[30:23] only contacts and then And which are those 20 items? They are
[30:25] And which are those 20 items? They are based on the vector search or well
[30:31] based on the vector search or well vector plus the lexical search made by
[30:34] vector plus the lexical search made by the elastic search
[30:36] the elastic search and we'll have another video I believe
[30:41] and we'll have another video I believe and this is about um the language. So,
[30:45] and this is about um the language. So, as I said, the microservices layer has a
[30:48] as I said, the microservices layer has a multiple different um tiny um levels.
[30:54] multiple different um tiny um levels. One of them being the language detector.
[30:58] One of them being the language detector. Let's have a quick look. This is a
[30:59] Let's have a quick look. This is a Finnish UI. As you can see, the user
[31:01] Finnish UI. As you can see, the user hasn't selected English or Swedish from
[31:04] hasn't selected English or Swedish from the uh drop down here. types in an
[31:08] the uh drop down here. types in an English question and the um algorithm
[31:12] English question and the um algorithm finds out that it's an English displays
[31:15] finds out that it's an English displays a chrome sort of um call to action
[31:19] a chrome sort of um call to action element and the user can click it and
[31:21] element and the user can click it and then taken to English UI.
[31:26] then taken to English UI. Cool.
[31:28] Cool. Um then
[31:31] Um then moving on uh about the uh
[31:35] moving on uh about the uh project itself. It was as said a tight
[31:38] project itself. It was as said a tight three months months. Uh couple of points
[31:43] three months months. Uh couple of points which are important. We didn't rebuild
[31:48] which are important. We didn't rebuild anything regarding the content
[31:49] anything regarding the content management systems. We crawl the
[31:52] management systems. We crawl the existing um systems in some cases. So
[31:56] existing um systems in some cases. So scraping out the the content from the
[31:59] scraping out the the content from the existing systems and in some cases using
[32:01] existing systems and in some cases using uh JSON API uh JSON API APIs uh in
[32:07] uh JSON API uh JSON API APIs uh in Drupal u sites that was um quite trivial
[32:11] Drupal u sites that was um quite trivial actually to implement because uh Drupal
[32:14] actually to implement because uh Drupal as you know is is providing a
[32:17] as you know is is providing a great way of structuring data. Uh we had
[32:20] great way of structuring data. Uh we had the data pipelines module
[32:23] the data pipelines module um and and chron jobs which uh are um
[32:28] um and and chron jobs which uh are um you utilizing trust commands which
[32:30] you utilizing trust commands which actually run the uh pipelines
[32:34] actually run the uh pipelines and and we kept that one question
[32:38] and and we kept that one question as the the the priority.
[32:42] as the the the priority. we need to answer the the student or
[32:45] we need to answer the the student or actually it would be better to say
[32:47] actually it would be better to say answer the user because student is only
[32:50] answer the user because student is only one user uh segment. There's a there's a
[32:55] one user uh segment. There's a there's a um teachers uh staff members,
[32:58] um teachers uh staff members, researchers and so on and and then um
[33:02] researchers and so on and and then um potential students also uh also
[33:05] potential students also uh also international students and so on
[33:08] international students and so on >> researchers whatever. Yeah,
[33:10] >> researchers whatever. Yeah, >> a lot.
[33:11] >> a lot. >> All right. But moving on um another
[33:14] >> All right. But moving on um another video
[33:16] video uh this is about
[33:19] uh this is about uh the lexical search. So we have a
[33:21] uh the lexical search. So we have a capability of the semantic search you
[33:24] capability of the semantic search you might know doesn't work well if you type
[33:27] might know doesn't work well if you type in something like that. A code or a name
[33:30] in something like that. A code or a name of a teacher for instance wouldn't
[33:34] of a teacher for instance wouldn't necessarily uh provide good um results
[33:37] necessarily uh provide good um results with uh semantical search. So
[33:40] with uh semantical search. So vector-based search wouldn't necessarily
[33:43] vector-based search wouldn't necessarily come up with good um results. So we have
[33:47] come up with good um results. So we have this lexical part. So uh the traditional
[33:50] this lexical part. So uh the traditional BM23
[33:52] BM23 uh five um search term based search in
[33:56] uh five um search term based search in other words which is looking for that
[33:59] other words which is looking for that exact term in um in the content.
[34:06] Yes.
[34:08] Yes. Um and so
[34:11] Um and so we wanted to
[34:14] we wanted to highlight what we learned
[34:18] highlight what we learned summarizing it that
[34:22] summarizing it that measure things.
[34:25] measure things. figure out what is the correct way u
[34:27] figure out what is the correct way u things you want to measure and
[34:32] things you want to measure and implement it it then
[34:36] implement it it then that's the only way to actually improve
[34:39] that's the only way to actually improve otherwise it's only based on who shouts
[34:42] otherwise it's only based on who shouts loudest and and and and not based on the
[34:45] loudest and and and and not based on the actual
[34:47] actual uh usage volumes and the usage data and
[34:51] uh usage volumes and the usage data and so on.
[34:54] so on. So
[34:55] So I want to show another video because I
[34:58] I want to show another video because I think these are much better than us
[35:01] think these are much better than us talking.
[35:03] talking. >> I didn't know that you had time to make
[35:05] >> I didn't know that you had time to make so many.
[35:06] so many. >> So we have this sort of a uh analytics.
[35:09] >> So we have this sort of a uh analytics. This is Drupal UI for the admins only.
[35:12] This is Drupal UI for the admins only. We have as you can see um multiple
[35:16] We have as you can see um multiple dashboards. This is one of them. Um it
[35:19] dashboards. This is one of them. Um it has a health score, health score trend
[35:22] has a health score, health score trend which is going down. There's the uh pie
[35:25] which is going down. There's the uh pie chart uh displaying what sort of a uh
[35:28] chart uh displaying what sort of a uh information people are looking from the
[35:30] information people are looking from the searches and there's um anomalies. So
[35:35] searches and there's um anomalies. So this is a uh this is utilizing AI but in
[35:39] this is a uh this is utilizing AI but in an algorithm way
[35:43] an algorithm way we have uh measurements we have a loads
[35:45] we have uh measurements we have a loads of statistical data
[35:48] of statistical data which we then um
[35:51] which we then um dig into and then feed the um
[35:56] dig into and then feed the um sort of the
[35:58] sort of the results to LLM to summarize them in a
[36:01] results to LLM to summarize them in a verbal way. And we also have a AI um
[36:07] verbal way. And we also have a AI um uh tool here in the dashboards
[36:11] uh tool here in the dashboards where you can actually by prompting
[36:15] where you can actually by prompting asking the same questions that you would
[36:17] asking the same questions that you would be able to look yourself using these
[36:20] be able to look yourself using these different um dashboards.
[36:24] different um dashboards. So there's an easy way if you want to
[36:26] So there's an easy way if you want to have a if you're not that um data driven
[36:30] have a if you're not that um data driven yet as an admin. So you are able to ask
[36:32] yet as an admin. So you are able to ask like okay what do you think why is the
[36:35] like okay what do you think why is the health score dropping down or um what's
[36:40] health score dropping down or um what's the the sort of the highest volume week
[36:43] the the sort of the highest volume week in the past two months and so on.
[36:48] All right. Um and I do have
[36:52] All right. Um and I do have another video
[36:54] another video but this is the last one I believe
[36:58] but this is the last one I believe um
[36:59] um about the trustness uh trustfulness
[37:05] um a case where it can be a student, it
[37:08] um a case where it can be a student, it can be a staff member, whoever
[37:11] can be a staff member, whoever executes a search and gets a result and
[37:15] executes a search and gets a result and then finds out clicks this report an
[37:17] then finds out clicks this report an issue in the AI summary
[37:22] issue in the AI summary explaining
[37:24] explaining why she thinks this this answer is
[37:27] why she thinks this this answer is somewhat uh wrong. There's some issue.
[37:30] somewhat uh wrong. There's some issue. For instance, in this case, um there was
[37:34] For instance, in this case, um there was something missing, some additional
[37:36] something missing, some additional information
[37:38] information the user in the user's perspective and
[37:41] the user in the user's perspective and and and then those informations are um
[37:47] and and then those informations are um stored in Drupal
[37:49] stored in Drupal as a flaggings
[37:52] as a flaggings uh where
[37:54] uh where uh the admin can go and and well getting
[37:58] uh the admin can go and and well getting the notification about someone reporting
[38:00] the notification about someone reporting an issue uh about this AI summary and
[38:03] an issue uh about this AI summary and then um reviewing it and then deleting
[38:07] then um reviewing it and then deleting it if that
[38:09] it if that was handled.
[38:12] was handled. Yep.
[38:14] Yep. >> Yeah. Um
[38:17] >> Yeah. Um now we're getting to what I think is the
[38:19] now we're getting to what I think is the coolest part of the project. Um this is
[38:23] coolest part of the project. Um this is on a surface a help portal. Yes, it is.
[38:26] on a surface a help portal. Yes, it is. Um but that's not the interesting part.
[38:28] Um but that's not the interesting part. The interesting part is that University
[38:30] The interesting part is that University of Helsinki
[38:32] of Helsinki uh decided that knowledge
[38:36] uh decided that knowledge should be accessible and wanted wanted
[38:38] should be accessible and wanted wanted to invest in it and um publishing
[38:42] to invest in it and um publishing content is easy. Making it accessible is
[38:45] content is easy. Making it accessible is not making it usable is not and uh we
[38:49] not making it usable is not and uh we actually have a very impressive road map
[38:52] actually have a very impressive road map that we're getting to um now that we've
[38:55] that we're getting to um now that we've done this and we know what we're going
[38:58] done this and we know what we're going to build on it. But Tommy, it's again
[39:01] to build on it. But Tommy, it's again your show.
[39:02] your show. >> Well, we can summarize it into three
[39:04] >> Well, we can summarize it into three different themes. Um,
[39:08] different themes. Um, becoming a proactive
[39:11] becoming a proactive um
[39:14] um personalization
[39:15] personalization that's in the road map. We have the SSO
[39:17] that's in the road map. We have the SSO in place. Personalization can be many
[39:21] in place. Personalization can be many things. For instance, something small
[39:24] things. For instance, something small like your past searches and so on or
[39:27] like your past searches and so on or then um um building a profile out of
[39:31] then um um building a profile out of your searches. Maybe the to be able to
[39:35] your searches. Maybe the to be able to promote something that you might be uh
[39:39] promote something that you might be uh interested about um or would be very
[39:43] interested about um or would be very important to you.
[39:45] important to you. Um
[39:48] Um the understanding
[39:51] the understanding is key thing.
[39:53] is key thing. That's why we have all those dashboards
[39:55] That's why we have all those dashboards in place uh for different use cases and
[40:00] in place uh for different use cases and also for testing out which one is the
[40:04] also for testing out which one is the most valuable for the for the admins. Uh
[40:08] most valuable for the for the admins. Uh is it the crafts if it are the the the
[40:11] is it the crafts if it are the the the pure um data
[40:15] pure um data uh tables the best one with numbers only
[40:17] uh tables the best one with numbers only or is it the the the AI assistant chat?
[40:22] or is it the the the AI assistant chat? Um
[40:24] Um and then those those AI summarized
[40:28] and then those those AI summarized um trends for instance we saw in the
[40:31] um trends for instance we saw in the dashboard
[40:32] dashboard uh
[40:34] uh explaining that which content if there's
[40:37] explaining that which content if there's a gaps in the content for instance
[40:41] a gaps in the content for instance um yeah that sort of things.
[40:44] um yeah that sort of things. >> Yeah and as you probably have noticed
[40:46] >> Yeah and as you probably have noticed I'm not the technical person he is. So
[40:48] I'm not the technical person he is. So I'm closing this with a commercial point
[40:51] I'm closing this with a commercial point of view. Um this is not just what you
[40:55] of view. Um this is not just what you saw is not just sort of feature road
[40:58] saw is not just sort of feature road map. Um
[41:01] map. Um it's it's it's it's description of a
[41:03] it's it's it's it's description of a system that reacts to questions. Yes. Um
[41:06] system that reacts to questions. Yes. Um but it's it's something that anticipates
[41:09] but it's it's something that anticipates your needs, right? And it it improves
[41:13] your needs, right? And it it improves over time. I find that really important.
[41:17] over time. I find that really important. And That's something that leads to the
[41:19] And That's something that leads to the takeaway, the revelation that I had
[41:21] takeaway, the revelation that I had while uh working on this project. Um the
[41:25] while uh working on this project. Um the organizations and universities that will
[41:27] organizations and universities that will win this race are not the ones with the
[41:29] win this race are not the ones with the most content,
[41:31] most content, they are the ones that actually use
[41:34] they are the ones that actually use clarity as infrastructure.
[41:37] clarity as infrastructure. And when you keep that in mind, you can
[41:40] And when you keep that in mind, you can build amazing stuff. I think this was
[41:42] build amazing stuff. I think this was pretty amazing. It's going to be even
[41:43] pretty amazing. It's going to be even more. So it's it's basically like
[41:47] more. So it's it's basically like Socrates but scales better I guess
[41:50] Socrates but scales better I guess something like that.
[41:52] something like that. >> A lot better than Socrates.
[41:54] >> A lot better than Socrates. >> I think he was really okay.
[41:57] >> I think he was really okay. >> Good.
[41:58] >> Good. >> Yeah. Great. Thank you.
[42:00] >> Yeah. Great. Thank you. >> Thank you.
[42:06] >> Any any questions?
[42:12] Before
[42:17] the project started, what was uh the
[42:19] the project started, what was uh the data research on what type of questions
[42:22] data research on what type of questions the students asked if they're brand new
[42:25] the students asked if they're brand new to the university, midway or ending the
[42:28] to the university, midway or ending the near the end of their their time there?
[42:30] near the end of their their time there? What type of questions they asked?
[42:33] What type of questions they asked? >> You mean the end users?
[42:35] >> You mean the end users? >> Yeah, the students. What what is your
[42:37] >> Yeah, the students. What what is your data on what type of questions they ask?
[42:40] data on what type of questions they ask? have data on that
[42:42] have data on that >> uh
[42:44] >> uh or did you make an assumption of
[42:45] or did you make an assumption of building a system that they would ask
[42:47] building a system that they would ask questions about the university?
[42:49] questions about the university? >> That's a good question. I don't actually
[42:50] >> That's a good question. I don't actually know.
[42:52] know. >> Well,
[42:54] >> Well, we we because we collect everything.
[42:59] we we because we collect everything. We provide answers to to any questions.
[43:05] We provide answers to to any questions. So, we don't
[43:08] So, we don't have an opinion. We don't Don't judge
[43:11] have an opinion. We don't Don't judge them. Whatever they ask is is possible
[43:14] them. Whatever they ask is is possible to ask if it's in the source material.
[43:20] to ask if it's in the source material. >> So what you basically say is that we
[43:23] >> So what you basically say is that we didn't build a machine that answers the
[43:27] didn't build a machine that answers the most questions. We have a list of those.
[43:30] most questions. We have a list of those. >> So we kind of approached with that in
[43:34] >> So we kind of approached with that in mind like any question.
[43:38] mind like any question. >> Yeah. I work with myself at where we are
[43:40] >> Yeah. I work with myself at where we are and my hypothesis assumption is that
[43:43] and my hypothesis assumption is that they have humanistic questions of their
[43:47] they have humanistic questions of their perspective of what they want to study
[43:49] perspective of what they want to study >> whereas your engine is based on the data
[43:51] >> whereas your engine is based on the data of the university. So if they ask like
[43:53] of the university. So if they ask like what would be a good job for me to get
[43:55] what would be a good job for me to get while studying here that will pay me a
[43:57] while studying here that will pay me a good salary.
[43:58] good salary. >> Ah
[43:58] >> Ah >> if you don't have that article you won't
[44:00] >> if you don't have that article you won't be able to apply it.
[44:01] be able to apply it. >> Yes. But that's the that's the thing to
[44:03] >> Yes. But that's the that's the thing to be proactive then uh if we see and and
[44:07] be proactive then uh if we see and and and and measure the um the searches so
[44:10] and and measure the um the searches so the utilization the uses of the uh the
[44:13] the utilization the uses of the uh the service. So based on that information,
[44:15] service. So based on that information, if we see that there's a gap in the the
[44:17] if we see that there's a gap in the the the
[44:19] the content that there's no articles about
[44:23] content that there's no articles about that sort of um discussion for instance,
[44:26] that sort of um discussion for instance, then that needs to be filled in with the
[44:29] then that needs to be filled in with the new content naturally.
[44:31] new content naturally. >> So has this system started that
[44:33] >> So has this system started that conversation within the university?
[44:34] conversation within the university? We're missing this kind of content to
[44:36] We're missing this kind of content to make the AI results better.
[44:38] make the AI results better. >> Yeah. Exactly. Yes. Yes. But I think
[44:41] >> Yeah. Exactly. Yes. Yes. But I think that's
[44:43] that's uh that's quite old school issue even
[44:46] uh that's quite old school issue even with the era before AI's involved or
[44:49] with the era before AI's involved or anything any rack searches were invented
[44:55] uh people used to track from the Google
[44:57] uh people used to track from the Google analytics like okay what searches people
[45:01] analytics like okay what searches people are using and and and then uh not
[45:05] are using and and and then uh not getting results from our site um
[45:10] getting results from our site um the our mentation just uh does it a bit
[45:15] the our mentation just uh does it a bit better because I I don't I I'm not sure
[45:18] better because I I don't I I'm not sure if you noticed but um the dashboard for
[45:22] if you noticed but um the dashboard for the admins it does summarize
[45:26] the admins it does summarize uh
[45:29] uh for each day what was uh the themes that
[45:33] for each day what was uh the themes that were um searched for it and and we also
[45:39] were um searched for it and and we also have in in in the Slack
[45:41] have in in in the Slack channel in each morning there's a
[45:46] channel in each morning there's a there's a automated message u doing the
[45:49] there's a automated message u doing the same thing like okay yesterday's
[45:52] same thing like okay yesterday's search was up uh 45%
[45:57] uh and the the search themes was uh
[46:00] uh and the the search themes was uh these um
[46:04] these um and so on
[46:09] >> we need to d it So if there are any more
[46:12] >> we need to d it So if there are any more questions maybe if the speakers are
[46:14] questions maybe if the speakers are available on the break.
[46:18] >> Yes. Thank you.
[46:20] >> Yes. Thank you. >> Thank you so much.