# Microsoft MarkItDown + Hermes Agent: The Ultimate Local Content Engine

https://www.youtube.com/watch?v=xWu_6IFPwtM

[00:00] Have you ever struggled to feed complex PDFs or Excel files into your local AI agents?
[00:05] Microsoft has just dropped a lightweight tool called MarkItDown that converts almost any file into Markdown format for local LLMs.
[00:11] Today we are going to see exactly how it works, and using it I'm going to build a local content engine that runs completely free of cost on your own machine.
[00:18] Welcome back to the Build with AI series, where we cover local AI projects using free tools.
[00:21] In this video the main focus is a free tool called MarkItDown, a simple, lightweight Python utility released by the Microsoft team that converts different files into the Markdown format.
[00:32] If you don't know what Markdown is, it is just an .md file, which is the same as a text file but with formatting options as well.
[00:37] For example, this README file that you are seeing is also an .md file.
[00:41] It has general text, but some text is formatted as well: this is in purple, and some links are also color-coded.
[00:48] Some sections are marked with this vertical bar for important tips.
[00:49] So we can do all kinds of formatting in Markdown, and it is also easy for local AI models to understand.
[00:55] The final file you will be sending is also tiny, like these .md files, which are just
[01:00] a few KB, really small compared to large PDFs or document formats.
[01:06] So the final raw data that you feed to the AI will be clean and much more reliable.
[01:09] You might have used other extraction tools like Textract from Amazon AWS, but this one focuses heavily on keeping the actual structure of the document intact.
[01:16] So things like headings, lists, tables, and links stay intact, so the AI knows exactly how to read them.
[01:23] At the same time, the final formatted text, which is this .md file, looks decent enough for humans to read as well.
[01:28] Like here we are reading this README file, which is really well formatted with all the headings, subheadings, code formatting, color codes for the tips, and links as well.
[01:35] It is human-readable, but it is really built for the text analysis tools behind the scenes.
[01:39] So it is meant to be readily consumed by these tools and acts as a bridge between your messy files and your AI automation pipeline.
[01:46] The best part is how many formats it supports out of the box without any extra coding.
[01:49] You can throw in standard office documents like PDF, PowerPoint, Word, and Excel, or basic developer formats like images, audio, HTML, CSV, and JSON, and everything will be handled smoothly.
[01:57] It even works
[02:01] natively with image data like EXIF metadata and OCR, which is optical character recognition.
[02:06] So, it can pull out the hidden file details from the metadata and use OCR to read and extract the text inside your pictures.
[02:11] Then, if you have audio files, it can transcribe the speech directly for you, so you don't have to install another library just to get the text from your audio clips.
[02:18] It doesn't even stop there.
[02:19] You can pass in website URLs or even YouTube URLs, and it can also dig through compressed formats like ZIP files.
[02:25] So, you just need to give it your URL or a file in any supported format, and it will pull out all the text and format it nicely as Markdown.
[02:34] You might wonder why it uses a specific format instead of just plain text.
[02:36] Markdown is actually very close to plain text, but it provides a way to represent your entire document structure, keeping your whole layout intact without any heavy formatting getting in the way.
[02:46] So, the file sizes remain small while your entire layout, information, and text remain intact.
[02:51] Most modern AIs like ChatGPT also natively work with this format.
[02:55] So, when you feed them any Markdown file, they understand the layout without getting confused by any code formatting or weird
[03:02] color styling.
[03:04] This file format is also highly token-efficient, so it saves you money and processing power when you're feeding your data to AI models.
[03:10] Compared to sending raw HTML or massive text or file dumps, this format uses far fewer tokens, making your API calls cheaper.
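To make the token-efficiency point concrete, here is a minimal sketch comparing an HTML fragment with the equivalent Markdown. The sample strings are invented for illustration, and the "four characters per token" ratio is only a rule-of-thumb assumption; real tokenizers vary by model.

```python
# Rough size comparison between an HTML fragment and equivalent Markdown.
# Both samples are made up for illustration.

HTML = (
    '<div class="post"><h1 style="color:#333">Quarterly Report</h1>'
    '<ul><li><strong>Revenue</strong>: up</li>'
    '<li><a href="https://example.com">Details</a></li></ul></div>'
)

MARKDOWN = (
    "# Quarterly Report\n\n"
    "- **Revenue**: up\n"
    "- [Details](https://example.com)\n"
)


def approx_tokens(text: str) -> int:
    """Very rough estimate: about one token per four characters (assumption)."""
    return max(1, len(text) // 4)


print("HTML tokens (approx):", approx_tokens(HTML))
print("Markdown tokens (approx):", approx_tokens(MARKDOWN))
```

The same content carries far less markup overhead in Markdown, which is where the savings come from.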
[03:17] Using this tool is also straightforward.
[03:20] You just need to run a simple command in your terminal, give it your input file location, and it will spit out the complete Markdown document immediately.
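As a minimal sketch of that terminal usage driven from Python: the helper below builds the `markitdown <input> -o <output>` command shown in the MarkItDown README and runs it with `subprocess`. The function names `build_command` and `convert_file` and the file name `report.pdf` are hypothetical, chosen only for this example.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(input_path: str) -> list[str]:
    """Build `markitdown <input> -o <input-stem>.md` as an argument list."""
    output_path = Path(input_path).with_suffix(".md")
    return ["markitdown", input_path, "-o", str(output_path)]


def convert_file(input_path: str) -> None:
    """Run the markitdown CLI, assuming it is installed and on PATH."""
    if shutil.which("markitdown") is None:
        raise RuntimeError("markitdown CLI not found; install it first")
    subprocess.run(build_command(input_path), check=True)


# Show the command that would be executed for a hypothetical file.
print(" ".join(build_command("report.pdf")))
```

Passing the command as a list avoids shell quoting problems with file names that contain spaces.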
[03:26] By default, the package is lightweight, but you can install specific add-ons if you want.
[03:30] For example, if you only want to process PDF, Word, or PowerPoint files, you can install just the tools required for those file formats.
[03:37] Only the pieces required for your file formats get installed, so the tool stays even more lightweight.
[03:42] There are also specific dependencies for heavier tasks. For example, you can enable the audio transcription tools only if you're working with voice recordings, so your computer only loads what you are actually using.
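A small sketch of how those optional add-ons are selected: pip "extras" are listed in square brackets after the package name. The helper below just builds that requirement string. The extras names are the ones I recall from the MarkItDown README, so treat the list as an assumption and check it against your installed version; `requirement` is a hypothetical helper name.

```python
# Extras names as recalled from the MarkItDown README (assumption; verify
# against the version you install).
KNOWN_EXTRAS = {
    "all", "pdf", "docx", "pptx", "xlsx", "xls", "outlook",
    "az-doc-intel", "audio-transcription", "youtube-transcription",
}


def requirement(extras: list[str]) -> str:
    """Return a pip requirement such as 'markitdown[pdf,docx,pptx]'."""
    unknown = sorted(set(extras) - KNOWN_EXTRAS)
    if unknown:
        raise ValueError(f"Unknown extras: {unknown}")
    if not extras:
        return "markitdown"
    return "markitdown[" + ",".join(extras) + "]"


# Would be used in the terminal as: pip install "markitdown[pdf,docx,pptx]"
print(requirement(["pdf", "docx", "pptx"]))
```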
[03:51] The tool also supports third-party plugins, but they are disabled by default.
[03:54] If you want to install them, you can search for markitdown-plugin on GitHub and install any specific plugin, then use the list plugins command to see all the plugins that you have, enable them,
[04:03] and use them on any of your files.
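A hedged sketch of the plugin workflow: the README documents `markitdown --list-plugins` and `markitdown --use-plugins <file>` on the command line, and an `enable_plugins` argument on the `MarkItDown` class. The helper names `plugin_command` and `make_converter` are hypothetical.

```python
def plugin_command(input_path: str = "", list_only: bool = False) -> list[str]:
    """Build the CLI call that lists plugins or converts a file with them."""
    if list_only:
        return ["markitdown", "--list-plugins"]
    return ["markitdown", "--use-plugins", input_path]


def make_converter(use_plugins: bool = False):
    """Create a converter; plugins stay off unless explicitly enabled."""
    # Imported lazily so this sketch loads even without markitdown installed.
    from markitdown import MarkItDown
    return MarkItDown(enable_plugins=use_plugins)


print(" ".join(plugin_command(list_only=True)))
print(" ".join(plugin_command("report.pdf")))
```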
[04:05] Keeping these plugins independent and separate keeps the base tool secure and fast while letting you add extra powers.
[04:10] For example, if you want to add some vision tools, you can directly add those plugins.
[04:14] A great example is the markitdown-ocr plugin, which lets you pull out the text from embedded images in Word, PDF, PowerPoint, or Excel files using LLM vision. You can use any OpenAI-compatible client to send that image data to the API and get reliable text back.
[04:30] Instead of installing heavy machine learning libraries on your own computer, you can just send that image data to the API directly.
[04:35] If you're dealing with really messy or complex files, there are also dependencies that you can add for higher-quality cloud processing using Azure's Content Understanding service.
[04:43] It is a paid service, but it can handle tricky document layouts and extract specific fields for you automatically.
[04:50] This cloud option is for you if you want multimodal support that can do higher-quality conversions of documents, images, audio, and video through a single access point.
[04:57] So, it can route every request to the right processor without you doing anything.
[05:00] But, if you want to do it locally on your own
[05:03] system, you can just use a git clone command and install the MarkItDown packages locally.
[05:08] And once we use this and get our clean text, I'm going to feed that directly into an AI agent.
[05:12] For that, I'm going to be using Hermes Agent, which you can use directly in your Python code instead of only in the terminal like OpenClaw.
[05:20] To install Hermes, you just need to use this kind of pip install command.
[05:23] Then, to use the AI agent in Python, we just need to import it from the run_agent module.
[05:27] For our AI brain, you can use any model with it, but for my project I'm going to be using a free local model, Granite 4.1, which was recently released by IBM and has excellent instruction following and really fast processing speeds on your queries.
[05:41] I've already covered this tool in detail.
[05:42] You can check out the Build with AI series where you can check this video or click the link on the top AI icon.
[05:45] Now, to set up the project, I'm going to be using Google Antigravity as my Python IDE.
[05:51] If you don't have it on your system, just go to antigravity.google/download and download the installer for your platform.
[05:55] It was recently upgraded to version 2.0, which I have already covered in an earlier video.
[05:59] Then, for running the local LLM models, I'm going to be using Ollama.
[06:02] So, just go to
[06:03] ollama.com/download and either use the command directly in your terminal or download the software installer.
[06:09] So, now I've set up my project in Google Antigravity.
[06:10] If you want to build it from scratch, just go to the File menu, choose Open Folder, and select any folder as your working directory.
[06:16] Then, you can copy all the code and the README file that I've shared at the GitHub link in the description and pinned comment.
[06:20] Follow the installation steps in this README file, and you can directly run the same project.
[06:24] For the technical stack of my project, I'm going to be using Microsoft MarkItDown.
[06:28] Hermes Agent will be used for agentic orchestration, Ollama for running local LLM models, and Python as our core programming language.
[06:34] Using these tools, we are going to build a content engine that can help any content creator produce content for multiple platforms.
[06:42] You just need to pass in your YouTube URL, and we will use the MarkItDown tool to fetch the full transcript of the video.
[06:48] Hermes Agent will then use that text to write the content using a local LLM.
[06:52] And finally, we are going to produce a Markdown report, which will have a blog post, a five-part Twitter thread, and a ready-made LinkedIn post.
[06:59] To set up the project, you need to make sure that you are in the correct working directory.
[07:03] So, just open the terminal by
[07:04] expanding this pane and check that the working directory shown here matches the folder in your explorer.
[07:10] If it does not match, you can use the cd command, which is change directory, and point it to the correct folder.
[07:15] Then, we are going to create a Python virtual environment to install all of our required libraries in a separate environment, so they do not conflict with our earlier projects.
[07:25] So, I'm going to create a virtual environment using this venv command and give the new environment the same name.
[07:31] And with the second line, we are going to activate this virtual environment.
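A minimal sketch for checking that the virtual environment is really active before installing anything. The setup commands in the comments assume the environment folder is named `venv`, which is how I read the narration; adjust if your folder differs. `in_virtualenv` is a hypothetical helper name.

```python
import sys

# Typical setup commands (run in the terminal, not in Python):
#   python -m venv venv
#   source venv/bin/activate      (macOS/Linux)
#   venv\Scripts\activate         (Windows)


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv.

    Inside a venv, sys.prefix points at the environment while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Inside a virtual environment:", in_virtualenv())
```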
[07:35] Everything that you will be doing next will be done in this virtual environment only.
[07:39] So, I'll copy these and paste it in the terminal.
[07:41] So, now you see a new folder called venv has been created here, and Antigravity is asking me whether I want to select this environment for the workspace folder.
[07:48] So I'll just press Yes, and now I see that this venv is activated here.
[07:52] Now to install all the dependencies I just need to use this pip install command.
[07:58] I have listed all of my requirements in this requirements.txt file.
[08:00] First is the MarkItDown tool and second is our Hermes Agent library.
[08:02] So I'll just copy this command and paste
[08:06] it in my terminal to install both of those libraries.
[08:08] Now, while these are installing, we can also install the Granite 4.1 3B model.
[08:09] You just need to copy this command and paste it in the terminal.
[08:15] Once it is installed, you can verify it by typing ollama list, and you will see all the models that you currently have installed in Ollama.
[08:23] So here I can see that Granite 4.1 3B is listed as well.
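If you want to do that check from Python, here is a rough sketch that parses the table printed by `ollama list` (a header row, then one model per line with the name in the first column). The sample listing is invented placeholder data, and the tag `granite4.1:3b` is an assumption about how the model is named in the Ollama library; use whatever tag you actually pulled.

```python
import subprocess

# Invented sample of `ollama list` output, for illustration only.
SAMPLE_LISTING = """NAME            ID              SIZE      MODIFIED
granite4.1:3b   0123456789ab    2.1 GB    2 minutes ago
llama3.2:1b     ba9876543210    1.3 GB    3 days ago
"""


def installed_models(listing: str) -> list[str]:
    """Return the model names from the first column, skipping the header."""
    lines = listing.strip().splitlines()
    return [line.split()[0] for line in lines[1:] if line.strip()]


def current_listing() -> str:
    """Run `ollama list`; requires Ollama to be installed and running."""
    result = subprocess.run(
        ["ollama", "list"], capture_output=True, text=True, check=True
    )
    return result.stdout


print(installed_models(SAMPLE_LISTING))
```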
[08:26] So now, for the core Python logic, I have built this app.py file.
[08:28] First I'm importing all of our required libraries, which are MarkItDown and AIAgent from Hermes Agent.
[08:34] Then, when the file is called, I ask the user to pass the YouTube video they want analyzed as an argument.
[08:42] If they do not give any argument, I'll just say that the YouTube URL is missing.
[08:45] The usage of the Python file is python app.py followed by your YouTube URL.
[08:49] Then, once the user gives the URL, we just fetch it from the sys.argv list.
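A minimal sketch of that argument handling, assuming the script is invoked as `python app.py <youtube_url>`. The function name `get_url` and the exact message wording are hypothetical, not taken from the project's code.

```python
import sys

USAGE = "Usage: python app.py <youtube_url>"


def get_url(argv: list[str]) -> str:
    """Return the URL from the argument list, or exit with a usage message."""
    # argv[0] is the script name; the URL is expected at argv[1].
    if len(argv) < 2 or not argv[1].strip():
        raise SystemExit("Error: missing YouTube URL.\n" + USAGE)
    return argv[1].strip()


# In app.py this would typically be called as: url = get_url(sys.argv)
```

Taking the argument list as a parameter, rather than reading `sys.argv` inside the function, keeps the function easy to test.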
[08:55] Step two is to extract the transcript using the MarkItDown tool.
[08:57] I'm going to be using the MarkItDown class from the markitdown library.
[09:01] Then, using md.convert, you can convert your YouTube URL directly.
[09:05] Then from the resulting
[09:07] variable I'm going to extract the text content.
[09:10] If I do not find any transcript, or the transcript is just whitespace, which we can remove using .strip(), it will just say that the extracted transcript is empty.
[09:16] In case of any other error, we use try and except blocks and just print that MarkItDown failed to extract the transcript, along with the error.
[09:23] Then, once the transcript is successfully extracted, we just print that the transcript is ready with this number of characters, and then we pass this transcript on to Hermes Agent.
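The extraction step could look roughly like the sketch below. `MarkItDown().convert(...)` and the `text_content` attribute come from the MarkItDown README; the function name `extract_transcript`, the optional `converter` parameter, and the error messages are hypothetical. YouTube transcripts may also need the `youtube-transcription` extra installed, which is an assumption worth checking.

```python
def extract_transcript(url: str, converter=None) -> str:
    """Convert a URL with MarkItDown and return the non-empty transcript."""
    if converter is None:
        # Imported lazily so the sketch loads without markitdown installed.
        from markitdown import MarkItDown
        converter = MarkItDown()

    try:
        result = converter.convert(url)
    except Exception as exc:
        raise RuntimeError(
            f"MarkItDown failed to extract the transcript: {exc}"
        ) from exc

    # Treat a whitespace-only result the same as a missing transcript.
    transcript = (result.text_content or "").strip()
    if not transcript:
        raise ValueError("Extracted transcript is empty.")
    return transcript
```

A caller would then report the size with something like `print(f"Transcript ready: {len(transcript)} characters")`.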
[09:35] For any instruction to an AI model, you need a prompt.
[09:38] So, in the prompt I first say that here is a transcript of the video, then add some newline characters and a border, and then I specify all my instructions: you are an expert social media manager.
[09:49] Using the transcript above, you must output three things, separated by headers.
[09:53] First is the SEO blog post, second is a five-part Twitter thread, and third is a LinkedIn carousel script.
[09:58] Then I'm asking it to format the output with these exact headers.
[10:02] First is the SEO blog post, then the five-part Twitter thread, and then the LinkedIn carousel script.
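A sketch of how such a prompt could be assembled. The header strings and the instruction wording are my paraphrase of the narration, not the exact text in the project's app.py, so treat them as assumptions; `build_prompt` is a hypothetical name.

```python
# Header wording is an assumption based on the narration.
HEADERS = (
    "# SEO Blog Post",
    "# 5-Part Twitter Thread",
    "# LinkedIn Carousel Script",
)


def build_prompt(transcript: str) -> str:
    """Place the transcript first, then a border, then the instructions."""
    header_list = "\n".join(HEADERS)
    return (
        "Here is the transcript of a video:\n\n"
        f"{transcript}\n\n"
        "---\n\n"
        "You are an expert social media manager. Using the transcript above, "
        "output three things, separated by headers: an SEO blog post, "
        "a five-part Twitter thread, and a LinkedIn carousel script.\n\n"
        "Format the output with these exact headers:\n"
        f"{header_list}\n"
    )


print(build_prompt("(transcript text goes here)"))
```

Asking for exact headers makes the generated report easier to split into its three parts afterwards.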
[10:05] Then we are going to feed this prompt
[10:07] to our local LLM using Ollama.
[10:09] So, the provider is Ollama and the model is Granite 4.1 3B, and our AI agent is the AIAgent class defined in the Hermes Agent tool.
[10:16] Then, using agent.chat, you can just pass in your prompt and get the response in this response variable.
[10:21] Again, in case of any error, we enclose it in try and except blocks, and we exit from the program in case this step fails.
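A hedged sketch of the agent call. The `run_agent` module, the `AIAgent` class, and `agent.chat(prompt)` are as described in the video; the constructor keyword arguments (`provider`, `model`) and the tag `granite4.1:3b` are assumptions on my part, so check them against the Hermes Agent documentation. `generate_content` is a hypothetical helper name.

```python
def generate_content(prompt: str, agent=None) -> str:
    """Send the prompt to the agent and return its reply, exiting on failure."""
    try:
        if agent is None:
            # Imported lazily so the sketch loads without Hermes installed.
            from run_agent import AIAgent
            # Assumption: keyword names and model tag mirror the narration.
            agent = AIAgent(provider="ollama", model="granite4.1:3b")
        return agent.chat(prompt)
    except Exception as exc:
        # Mirror the video's behavior: report the error and stop the program.
        raise SystemExit(f"Agent call failed: {exc}")
```

Accepting an optional `agent` argument lets you swap in a stub while testing, without a running Ollama server.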
[10:29] The last step is to write the output to this generated_content.md file.
[10:33] We create this new file in write mode, and using f.write, you can write your response to this file directly.
[10:39] Once it is generated, we say success, the generated content is saved to this output file.
[10:44] And in case of any error, we say that we got an error saving the final output, along with the exact file error.
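The save step can be sketched as below. The file name `generated_content.md` is my reading of the narration, and `save_output` is a hypothetical helper name.

```python
from pathlib import Path


def save_output(content: str, path: str = "generated_content.md") -> Path:
    """Write the generated text to a Markdown file and return its path."""
    target = Path(path)
    try:
        # "w" mode creates the file or overwrites an existing one.
        with target.open("w", encoding="utf-8") as f:
            f.write(content)
    except OSError as exc:
        raise RuntimeError(f"Error saving the final output: {exc}") from exc
    return target
```

Setting `encoding="utf-8"` explicitly avoids platform-dependent defaults when the model output contains emoji or other non-ASCII characters.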
[10:49] Then once the Python file is called, we are just going to call this main function that we have defined above.
[10:54] Now everything is ready and set up.
[10:55] We just need to run this file with any YouTube URL.
[10:59] For me, I'm going to be using my own video, which is this HRM Text 1B video, where I covered a 1 billion parameter model and tested it on some math tasks against other
[11:08] open-source models like Google Gemma and IBM Granite using the Ollama library.
[11:12] So, I'm just going to copy the URL of this video, go to the Antigravity terminal, make sure that I'm in the virtual environment, type python app.py, which will call this Python file, and paste my URL here.
[11:23] So, this will call my main function on this YouTube URL.
[11:25] So, just press Enter.
[11:27] So, now it is converting the YouTube video using MarkItDown.
[11:30] The transcript is successfully extracted.
[11:32] Next, it is preparing the local Ollama model.
[11:34] Then the AI agent is initialized with this model.
[11:36] It is using the localhost URL, which is the local Ollama URL, to process any local AI requests.
[11:43] It is saying that the API key is invalid or missing because we are not using any API to call the AI model, which is fine.
[11:49] There are no tools selected or loaded.
[11:50] Then it shows the context window limit and starts the conversation.
[11:53] So now it is making all the calls to the AI model, and finally it has generated the content and saved it to the generated_content.md file.
[12:02] Now you can directly open this file and check the output.
[12:04] It has given a title asking whether a tiny AI beast, HRM
[12:08] Text 1B, can handle mathematical reasoning, which is correct.
[12:11] It has given the meta description of my video.
[12:13] Then the content outline is also given, covering the video introduction, the model basics, and how to use a prompt act technique with that model.
[12:21] Then I had set up a project which uses only the CPU to run that 1 billion parameter model.
[12:25] Then I had tested my project and shown the result, and there was a final conclusion and the resource links.
[12:31] So it has given the full content outline of my video.
[12:33] Then it has given the five-part tweet thread as well, starting with the introduction: I have just run a mathematical showdown testing the HRM Text 1B model against the IBM Granite model.
[12:45] Then it has given the deep dive, the key takeaway, the results, and how to try it yourself.
[12:49] Finally, there is the LinkedIn carousel script, where it has given slide-wise titles, image captions, content, and video clips, and finally the resources and next steps on the end slide as a call to action.
[12:58] So this is how you can build an entire AI agent pipeline that takes a YouTube URL, extracts the full transcript using the MarkItDown library, and passes it on to Hermes Agent, which will in
[13:09] turn pass your request to the local Ollama model to finally generate a full content report that you can use on your social media platforms.
[13:17] Everything runs locally: there is no API cost involved, your data does not leave your machine, and there is no risk of usage limits, because we are using a completely local AI model. So you can run this as many times as you like.
[13:29] We can also discuss the use cases where you can use this kind of tool and project setup.
[13:34] You can turn long-form videos directly into blog posts and social media snippets, as we have just done.
[13:40] Or you can create written transcripts and high-quality blog summaries of relevant competitor videos, so you can see how they are driving search traffic.
[13:47] From a single reference video, you can generate weeks of social media posts, so you can maintain an active social media presence.
[13:53] You can also convert educational videos and lectures into structured text summaries and review slides that you can then use in, let's say, your live classroom or your final notes.
[14:03] Finally, you can build a scalable pipeline for your entire social media copywriting and content workflow, with the input being your YouTube
[14:10] video.
[14:11] On top of this project, you can also add some more features.
[14:13] Let's say you want transcripts in multiple languages: you can add a tool to translate the transcript and the output posts into your target languages.
[14:20] Then, in your final report, you can also choose a different writing style, like technical, playful, or persuasive, according to your target audience.
[14:27] Then you can integrate APIs like Twitter and LinkedIn and directly post your generated content to the social media platforms.
[14:33] Now, let's say you have just uploaded a video: your YouTube captions can be missing.
[14:38] So, instead, you can add a local Whisper model that serves as a fallback in case the YouTube transcripts are not readily available.
[14:45] Then you can add a Streamlit UI dashboard as well, using Python libraries, where you can paste the URLs, edit your drafts, or view the final visual carousel slides in the dashboard directly.
[14:55] So, that is how you can extract data from complex files and feed it directly to your custom AI agents.
[15:01] All the code and the README file with step-by-step instructions on how to set up the project are at the GitHub link in the description and pinned comment, so you can grab them and get started easily.
[15:07] Let me know
[15:11] in the comments if you have any suggestions for more local AI projects, and follow the channel for more such AI builds, updates, and workflows.
