Full Transcript
https://www.youtube.com/watch?v=i0kd-gEPU4Y
[00:00] Broadly, there are two groups of artificial intelligence features.
[00:05] Cortex and machine learning.
[00:07] Cortex is essentially an umbrella term for the collection of features that use large language models or LLMs.
[00:13] These are a type of deep learning model that are trained on vast amounts of text, billions or trillions of words, which give them the ability to predict the next word or token of a given sentence.
[00:24] Machine learning in Snowflake comprises in-built ML functions for the now more traditional AI techniques, like classification and forecasting, and also includes a way to make your own ML functions.
[00:37] Cortex AISQL is basically about extending the SQL language with AI-related functions you can execute anywhere you'd normally use a function.
[00:46] There are many available but let's zoom in on one to help us get a better sense of what they're used for.
[00:53] AI_COMPLETE has two required input arguments.
[00:56] The first is a string we call the prompt.
[00:59] This is typically a question you'd like answered, like in this example, but can be used in more creative ways, as we'll see in a moment.
[01:08] The second is a string specifying a model name.
[01:10] This is an LLM you would like to use to generate the output.
[01:12] Snowflake have their own models, but also support LLMs from OpenAI, Anthropic, Meta, Mistral AI, and DeepSeek.
[01:20] Here's the comprehensive list.
[01:23] The choice of model will affect the output and potentially the execution price as well, with Snowflake's own models being the cheapest.
[01:31] Okay, let's say I've set the model to OpenAI o4-mini and I'm calling the function on its own with the question, what is Snowflake?
[01:40] The output will be a string creating what is technically called a completion and what we'd informally call a response to the question.
[01:47] There are also model parameters we can add to further control the output, like temperature, which controls the randomness of the model's output, and guardrails, a Snowflake feature which further filters model output for potentially harmful responses.
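As a rough sketch of what such a call could look like in SQL (the exact model identifier and parameter names here are assumptions, so check Snowflake's documentation for the current syntax):

```sql
-- Hypothetical sketch: model identifier and parameter names are assumptions.
SELECT AI_COMPLETE(
    model  => 'openai-o4-mini',       -- which LLM to use
    prompt => 'What is Snowflake?',   -- the question to answer
    model_parameters => {
        'temperature': 0.2,           -- lower values make the output less random
        'guardrails': true            -- filter potentially harmful responses
    }
) AS response;
```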
[02:04] In a more practical sense, we could call this model to answer questions about our data.
[02:08] This is an example from the Snowflake documentation where we're using OpenAI's GPT-4.1 model to summarize text for each value of a reviews column from a table called reviews.
[02:19] The bullet-pointed summary could be used downstream to populate a new table column, making it easier to parse or analyze long sections of free text like a review.
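A minimal sketch of that pattern, assuming a reviews table with a review_text column (the table, column, and model names here are illustrative, not the documentation's exact example):

```sql
-- Hypothetical sketch: table, column, and model names are assumptions.
SELECT
    review_text,
    AI_COMPLETE(
        model  => 'openai-gpt-4.1',
        prompt => 'Summarize this review as a short bulleted list: ' || review_text
    ) AS review_summary
FROM reviews;
```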
[02:30] Snowflake Copilot is an LLM powered SQL assistant hosted by Snowflake which you can chat with in a panel on the UI.
[02:37] There's also a preview feature allowing you to use it in line with SQL code, but we'll focus on the chat interface for now.
[02:45] It's primarily used for generating SQL code for analysis from questions in plain English.
[02:52] In this example, the query generated by Copilot is identifying the individual or organization that appears most frequently as a participant in US patent filings.
[03:03] And the interesting part is that, in theory, it understands the context of your data.
[03:07] So it knows the database structure, the tables, the columns, and other objects.
[03:11] This is good for writing useful queries, but also for anyone just wanting to explore the data in an account.
[03:18] However, it is worth noting that it currently can't generate cross database or cross schema queries.
[03:24] It can also suggest improvements to SQL queries you've already written, so making them more performant or concise.
[03:30] And more generally, it can help you discover Snowflake features from the documentation and give a description of how to implement them.
[03:37] It's comparable to similar features on other products like Databricks Assistant and Microsoft Fabric Copilot.
[03:44] If there is SQL generated, you can either click run, which will add the query to a worksheet, and it will automatically run from there, or add, which adds the query to a worksheet, so you can run it later.
[03:55] Another thing to note is that it relies on object comments to understand your data, like table column comments.
[04:01] And because this is not reliably filled in for many organizations, some developers have found Copilot can be a bit limited in understanding your data out of the box.
[04:11] Document AI is used to extract data such as names, dates, or even checkbox values from unstructured documents like PDFs.
[04:19] To understand how this is achieved in Snowflake, let's refer to their document AI overview flowchart.
[04:25] The first step is to create a new model build.
[04:27] This is done through the UI, under the AI and ML heading, on a tab called Document AI.
[04:32] A model build represents a single type of document you'd like to extract data from.
[04:35] It's here you'll use Snowflake's proprietary LLM, called Arctic-TILT, to extract data from that type of document.
[04:43] The next step in the process is to point your model build to some sample PDFs you previously uploaded to a stage.
[04:47] We then ask questions.
[04:51] This is a process of defining questions used to extract a specific type of data from that document type.
[04:59] Here's a screenshot from the Snowflake documentation where we're training on PDFs containing food truck inspection reports.
[05:10] On the right, you'll notice a series of questions.
[05:12] These are essentially prompting the model to go and find the answers in the PDF using optical character recognition.
[05:16] Each question defined corresponds to a bit of information we want to extract from that document type.
[05:22] The model returns an answer along with a confidence score indicating how likely it is that the answer is correct.
[05:29] 0.9 means 90% confidence.
[05:34] Once this is done, you can check if the answers it has produced on your sample PDFs are accurate.
[05:40] If not, you can review the questions and answers, manually correcting answers which are wrong, and then optionally fine-tuning the foundational model so it's better at extracting for your type of document.
[05:51] If you're happy with the zero-shot or first iteration, or happy after it's been fine-tuned, we can then publish the model build.
[05:56] This allows us to use the model in our SQL code using the exclamation point predict syntax.
[06:05] We provide the model build name, the exclamation point predict syntax, then a presigned URL of the stage object which stores our PDFs.
[06:14] We'll cover this file function in greater detail later in the course.
[06:19] And then an optional model build version.
[06:21] Here's a bit of example code showing how you can call a model build called inspection reviews on PDFs stored in an internal named stage.
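A sketch of what that call might look like (the model build name, stage name, and version number are assumptions for illustration):

```sql
-- Hypothetical sketch: model build, stage, and version number are assumptions.
SELECT inspection_reviews!PREDICT(
    GET_PRESIGNED_URL(@inspection_pdf_stage, RELATIVE_PATH),  -- presigned URL for each PDF in the stage
    1                                                          -- optional model build version
) AS extracted_data
FROM DIRECTORY(@inspection_pdf_stage);
```

This assumes the stage has a directory table enabled so each file's RELATIVE_PATH is available.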
[06:29] This predict function will return a JSON document detailing the confidence scores and the values extracted which you can then further process.
[06:38] Okay.
[06:41] Cortex fine-tuning.
[06:41] This is a way to take the output of a larger model and use it to train a smaller model which in theory will be quicker, cheaper, and better at doing a specific task.
[06:50] Let's say we have a table with millions of social media posts.
[06:54] We'd like to derive a new column with each post's sentiment, the emotional tone of a piece of text.
[07:01] We could call the Cortex function AI_COMPLETE on millions of rows using a very large, accurate model.
[07:07] However, this would be quite costly.
[07:11] So, the idea behind Cortex fine-tuning is that we first create a sample data set using a large model.
[07:15] In this case, labeling a subset of the posts with sentiment, and then using the output to create a fine-tuned, smaller model tailored to doing that specific task.
[07:23] We can then use that to score sentiment on additional posts with higher accuracy than the smaller base model would have achieved if not fine-tuned.
[07:31] Let's break down how this might be achieved in code.
[07:36] Step one would be to use the larger model to generate accurate sentiment labels for a sample of posts.
[07:40] This code reads the social posts table and, using the AI_COMPLETE function with a prompt to generate sentiment, generates a new table of 10,000 rows with sentiment calculated.
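Here's a rough sketch of that first step, assuming a social_posts table with a post_text column and an assumed identifier for the large model:

```sql
-- Hypothetical sketch: table, column, and model names are assumptions.
CREATE OR REPLACE TABLE labeled_posts AS
SELECT
    'Classify the sentiment of this post as positive, negative or neutral: ' || post_text AS prompt,
    AI_COMPLETE(
        model  => 'claude-4-opus',   -- large, accurate (and expensive) model
        prompt => 'Classify the sentiment of this post as positive, negative or neutral: ' || post_text
    ) AS completion
FROM social_posts
LIMIT 10000;
```

Naming the output columns prompt and completion keeps them ready to use as training data in the next step.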
[07:52] Step two would be to create a fine-tuning job.
[07:57] The FINETUNE function allows you to create and manage LLMs customized for a specific task.
[08:02] It can be used to create a fine-tuning job, like we can see here, but also to check the status of a tuning job, which can take a while to complete, with SHOW and DESCRIBE, and to stop a fine-tuning job with CANCEL.
[08:15] To create a job, we pass in the name of the output model we can use once complete, the base model to fine-tune, and a training data query.
[08:21] This is how we select the labeled posts we created in the previous step to fine-tune with.
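A sketch of that job creation, assuming the labeled_posts table from the previous step and illustrative model names:

```sql
-- Hypothetical sketch: model and table names are assumptions.
SELECT SNOWFLAKE.CORTEX.FINETUNE(
    'CREATE',
    'sentiment_mistral_7b',                          -- name of the fine-tuned model to create
    'mistral-7b',                                    -- base model to fine-tune
    'SELECT prompt, completion FROM labeled_posts'   -- training data query
);
```

The returned job ID can then be used with the DESCRIBE and CANCEL variants mentioned above.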
[08:27] Step three would be to use the fine-tuned model instead of the larger model to much more efficiently analyze the millions of posts.
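And a sketch of step three, assuming the fine-tuned model can be referenced by name just like any other model:

```sql
-- Hypothetical sketch: model, table, and column names are assumptions.
SELECT
    post_text,
    AI_COMPLETE(
        model  => 'sentiment_mistral_7b',
        prompt => 'Classify the sentiment of this post as positive, negative or neutral: ' || post_text
    ) AS sentiment
FROM social_posts;
```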
[08:33] To give you a sense of the savings possible, the Claude 4 Opus model we used initially to calculate sentiment, which is a huge model, currently costs 12 credits per 1 million tokens.
[08:43] And the fine-tuned smaller model was Mistral 7B which is currently 0.12 credits per 1 million tokens.
[08:50] That's 100 times less.
[08:55] Okay, let's say you have reviews for a product you've just released.
[08:59] You want to be able to search these for specific topics, like whether damage has occurred during shipping or how someone feels about the product's texture or usability.
[09:05] With traditional keyword search, you'd only find reviews that include the exact words you search for.
[09:14] So, if someone searches for shipping damage, they might miss reviews like these, which are clearly relevant because they talk about things being broken or torn, but don't contain the exact phrase shipping damage.
[09:23] This is where Cortex Search comes in.
[09:27] Not only does Cortex Search use keyword search, it also uses semantic search.
[09:32] This is about looking at the meaning of your search query, not just the specific words.
[09:35] To do this, it uses something called embeddings.
[09:41] Embeddings are numeric representations of text which can be compared and searched using maths rather than the exact words.
[09:46] Imagine turning each review into a set of numbers, a vector that captures its meaning.
[09:53] The same goes for your search query.
[09:55] These vectors are then compared using something called vector search.
[09:58] If two pieces of text have similar meanings, their vectors will be close together.
[10:04] So even if the words aren't the same, if the meaning is similar, Cortex Search can still return those results.
[10:13] This is often referred to as a type of fuzzy search, but it's more advanced because it's based on context and meaning, not just partial word matches or typos.
[10:21] To implement this in Snowflake, we create a Cortex search service.
[10:26] This command indexes the review text column for semantic, vector-based searching.
[10:31] Attributes allow you to filter results based on certain columns.
[10:33] Here we're using product ID, customer ID, and review date.
[10:37] Target lag specifies that we want the index refreshed every 30 minutes to stay up to date.
[10:45] And here we're setting Snowflake's default embedding model for turning text into vectors.
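Putting those options together, a sketch of the service definition might look like this (the service name, warehouse, and embedding model identifier are assumptions):

```sql
-- Hypothetical sketch: service, warehouse, and embedding model names are assumptions.
CREATE OR REPLACE CORTEX SEARCH SERVICE review_search_service
  ON review_text                                     -- column to index for semantic search
  ATTRIBUTES product_id, customer_id, review_date    -- columns available for filtering results
  WAREHOUSE = compute_wh
  TARGET_LAG = '30 minutes'                          -- refresh the index every 30 minutes
  EMBEDDING_MODEL = 'snowflake-arctic-embed-l-v2.0'  -- model used to turn text into vectors
AS (
  SELECT review_text, product_id, customer_id, review_date
  FROM reviews
);
```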
[10:50] Once the search service is up and running, Snowflake exposes a REST API endpoint where you can send post requests to run the semantic searches.
[11:00] This returns a JSON payload containing the results of your query.
[11:05] The second option is to query your service using the Snowflake Python SDK.
[11:12] This will produce a similar JSON response that you can further process downstream.
[11:18] At its core, the Cortex Analyst feature is a REST API, fully managed by Snowflake, that lets you convert plain-text questions about structured data in your account into SQL commands, which it returns the results of.
[11:31] We can use it to build a chat interface for our data stored in Snowflake.
[11:35] Especially useful for enabling non-technical business users to do analysis.
[11:39] Let's lay out the components at a high level.
[11:41] At the bottom is the structured data we'd like to ask questions about, stored in Snowflake as tables.
[11:48] Next, we have the Cortex Analyst REST API which sits between the table data and the front-end application.
[11:52] Although you could plug any application into the API via the publicly exposed endpoints, the example most often used by Snowflake is Streamlit.
[12:04] Streamlit is an open-source tool that lets you easily build interactive web apps using Python, which Snowflake natively supports for creating web apps on the Snowsight UI.
[12:15] In this Streamlit application code, you call the Cortex Analyst REST API via the get_analyst_response function and provide two bits of information.
[12:22] One is the semantic model, which bridges the gap between the data definition as it appears in Snowflake metadata and how a business user might phrase what they need.
[12:31] This is a YAML document detailing tables, columns, and other elements that Cortex Analyst can access to generate queries in response to natural language questions.
[12:41] The second bit of information is the prompt or chat history, what the user wants to know.
[12:48] The API will return some JSON containing the response and the SQL used to generate it, which can then be shown in the Streamlit app.
[12:57] Okay, that's it for a quick introduction to all the Cortex features.