Turn any document into LLM-ready data!

Microsoft released MarkItDown, a lightweight Python library that converts many common file formats to Markdown for use with LLMs.

Key features:
• Converts PDF, Word, Excel, PowerPoint, images, and audio to Markdown
• Extracts EXIF metadata, OCR text, and audio transcripts automatically
• Available via CLI, Python API, or Docker
• Offers LLM-based image descriptions
• Supports batch conversions

It's 100% open source.

Link to the GitHub repo in the comments!

If you're into ML, LLMs, RAG, and AI Agents, I share AI apps and tutorials every week.

Subscribe to AI Engineering (it's free): https://lnkd.in/gfkzKZYk
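Here's a minimal sketch of how the Python API and batch conversion could fit together. It assumes MarkItDown's `MarkItDown().convert(path).text_content` call from the project README (installed with `pip install 'markitdown[all]'`). The `batch_convert` helper and its parameters are hypothetical names made up for this example, not part of the library. The converter can be swapped out, so the loop can be tried without MarkItDown installed.

```python
from pathlib import Path
from typing import Callable, List, Optional


def batch_convert(
    src_dir: str,
    out_dir: str,
    pattern: str = "*",
    convert: Optional[Callable[[Path], str]] = None,
) -> List[Path]:
    """Hypothetical helper: convert every file in src_dir matching `pattern`
    to Markdown and write one .md file per input into out_dir."""
    if convert is None:
        # Assumption: MarkItDown's Python API as shown in its README.
        # Imported lazily so a stand-in converter can be used without the library.
        from markitdown import MarkItDown

        md = MarkItDown()
        convert = lambda path: md.convert(str(path)).text_content

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for src in sorted(Path(src_dir).glob(pattern)):
        if not src.is_file():
            continue  # skip subdirectories
        # Keep the full file name ("report.pdf" -> "report.pdf.md") so that
        # report.pdf and report.docx don't overwrite each other.
        target = out / (src.name + ".md")
        target.write_text(convert(src), encoding="utf-8")
        written.append(target)
    return written
```

For example, `batch_convert("reports", "reports_md", pattern="*.pdf")` would write `reports_md/q1.pdf.md` for a file `reports/q1.pdf`, and it returns the list of files it wrote.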
Thanks for sharing! They also have the MarkItDown MCP server: https://github.com/microsoft/markitdown/tree/main/packages/markitdown-mcp
Thanks for sharing, Sumanth P. I've been looking for these kinds of open-source libraries.
Super. That's really interesting.
Tools like this make working with unstructured data so much more accessible. Appreciate the share, Sumanth.
I was looking for something like this, and I'll be using it soon. Thanks for sharing!
Good job posting this, Sumanth P. Keep it up!!
This tool looks like it can simplify workflows significantly. Exciting times for developers needing quick Markdown transformations. Open source is a big win too.
Nice! Thanks for sharing, Sumanth.
Machine Learning Developer Advocate | LLMs, AI Agents & RAG | Shipping Open Source AI Apps | X (70K+)
Link to the GitHub repo: https://github.com/microsoft/markitdown