diff --git a/packages/content/ref/pdf-to-images/resources/research.md b/packages/content/ref/pdf-to-images/resources/research.md new file mode 100644 index 00000000..e6c4d341 --- /dev/null +++ b/packages/content/ref/pdf-to-images/resources/research.md @@ -0,0 +1,136 @@ +# **Analysis of Open-Source Image to Markdown Conversion Tools with Table and AI Support** + +**1\. Introduction** + +The conversion of visual information into text-based formats like Markdown is an increasingly important capability for a variety of applications. These include the creation of accessible documentation, efficient knowledge management systems, and the integration of content with large language models for further processing and analysis. The user's specific requirements for tools in this domain are multifaceted, focusing on open-source availability, seamless integration with TypeScript projects, the accurate conversion of images to Markdown with particular attention to tabular data, and the utilization of artificial intelligence for enhanced accuracy, provided that the AI is either freely accessible or can be hosted locally. This report aims to analyze the provided research material to identify and evaluate open-source tools that align with these stringent criteria. The investigation will delve into the features, capabilities, and limitations of each potential solution, ultimately providing a comprehensive overview to aid in the selection of the most suitable tool. + +**2\. Detailed Analysis of Potential Open-Source Tools** + +* **2.1. LlamaOCR** + * **Description:** LlamaOCR is presented as an open-source Optical Character Recognition (OCR) library built upon the Llama 3.2 Vision model, with the primary function of transforming images into Markdown formatted text.1 This tool aims to simplify workflows by enabling the direct conversion of visual content into an easily editable and structured format. 
+ * **TypeScript Integration:** A significant advantage of LlamaOCR is its availability as an npm package, installable via the command npm install llama-ocr.1 This method of distribution directly facilitates its integration into JavaScript and TypeScript-based projects, aligning with the user's preference for TypeScript-friendly solutions. The ease of installation and usage within existing TypeScript environments lowers the barrier to entry for developers in this ecosystem. + * **AI Capabilities (Free with API Key):** LlamaOCR leverages the capabilities of the Llama 3.2 Vision model, which is accessible through a free endpoint provided by Together AI.1 To utilize this feature, users are required to register on the Together AI platform to obtain a free API key. While the core AI functionality is offered without direct cost, the dependency on an external service introduces a point of consideration. The "free" tier of Together AI, while allowing for initial use, may have limitations such as rate limits on the number of requests or the volume of data processed within a specific timeframe, as indicated by information regarding Together AI's pricing and rate limits.4 These restrictions could potentially impact the scalability and sustained use of LlamaOCR for users with high-volume processing needs. + * **Table Support:** LlamaOCR is described as being proficient at extracting text from images, even those with complex layouts such as tables.1 A key feature highlighted is its "Markdown-First Design," which suggests that the tool aims to directly output the recognized text in Markdown format, preserving the original formatting and structure of the image, including attempting to represent tabular data using Markdown table syntax.1 This implies that LlamaOCR is designed to identify rows and columns within an image and translate them into the corresponding Markdown table structure using pipes (|) and hyphens (-). 
However, the research material lacks specific examples demonstrating the accuracy and handling of various table complexities, such as merged cells or multi-line content. + * **Languages:** Given its distribution as an npm package, LlamaOCR primarily supports JavaScript and TypeScript environments.1 The underlying Llama 3.2 Vision model likely possesses multilingual OCR capabilities, but this is not explicitly detailed within the context of the tool's documentation in the provided snippets. + * **Getting Started:** The initial setup for LlamaOCR involves a straightforward two-step process: installation via npm install llama-ocr and obtaining a free API key from Together AI.1 Subsequently, within a project, the ocr function can be imported to initiate image processing.1 + * **Example Usage:** A provided code snippet illustrates the basic usage of the ocr function, which takes the path to the image file and the Together AI API key as arguments. The function then asynchronously processes the image and returns the extracted text in Markdown format, which can be logged to the console.1 + * **Advanced Use Cases:** The documentation suggests potential for more advanced applications, including automating OCR tasks for multiple files through batch processing and integrating LlamaOCR into web applications to allow users to upload images and receive instant Markdown conversions.1 + * **Roadmap:** Future development plans for LlamaOCR include adding support for OCR on local and remote images, as well as for single and multi-page PDF documents. Additionally, the tool aims to provide output in JSON format alongside the current Markdown output, offering greater flexibility for data processing and integration.2 The current limitation to image file formats might be a constraint for users needing to process PDFs, but the planned expansion indicates ongoing development. 
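The pipe-and-hyphen table syntax mentioned under Table Support can be illustrated with a small sketch. This is not LlamaOCR's implementation, which the research material does not show; `toMarkdownTable`, `escapeCell`, and the sample rows are hypothetical names and data chosen for illustration. The sketch only shows the target format that any image-to-Markdown tool would need to emit for tabular data, including the padding needed when a recognized row has fewer cells than the header.

```typescript
// Illustrative only: shows the target Markdown table syntax,
// not how LlamaOCR (or any other tool in this report) builds it.

/** Escape pipes and flatten line breaks so cell text cannot break the columns. */
function escapeCell(cell: string): string {
  return cell.replace(/\|/g, "\\|").replace(/\r?\n/g, " ").trim();
}

/**
 * Render rows as a Markdown table. The first row is treated as the header.
 * Short rows are padded with empty cells so every row has the same width.
 */
function toMarkdownTable(rows: string[][]): string {
  if (rows.length === 0) return "";
  const width = Math.max(...rows.map((r) => r.length));
  const renderRow = (row: string[]): string => {
    const cells = Array.from({ length: width }, (_, i) => escapeCell(row[i] ?? ""));
    return `| ${cells.join(" | ")} |`;
  };
  const separator = `| ${Array.from({ length: width }, () => "---").join(" | ")} |`;
  const header = rows[0] ?? [];
  const body = rows.slice(1);
  return [renderRow(header), separator, ...body.map(renderRow)].join("\n");
}

// Hypothetical sample data, e.g. cells recognized from a receipt image.
const table = toMarkdownTable([
  ["Item", "Qty", "Price"],
  ["Coffee", "2", "7.98"],
  ["Bagel | plain", "1"], // contains a pipe and is missing the last cell
]);
console.log(table);
```

Merged cells and multi-line cell content have no direct equivalent in this syntax, which is one reason such layouts are worth including in any accuracy test.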
+ * **Repository Link:** The primary GitHub repository for LlamaOCR appears to be [https://github.com/Nutlope/llama-ocr](https://github.com/Nutlope/llama-ocr).3 Other repositories also use the Llama vision model for OCR, such as [https://github.com/MinimalDevops/llama-ocr](https://github.com/MinimalDevops/llama-ocr) 10, which offers a Python-based OCR assistant using Streamlit and Ollama, and [https://github.com/yYorky/LlamaOCR](https://github.com/yYorky/LlamaOCR) 11, which focuses on invoice processing and outputs data in CSV format, potentially indicating table extraction capabilities. + * **Example Markdown Output:** While the documentation consistently mentions that LlamaOCR outputs in Markdown format 1, the provided research material does not include a specific example demonstrating the conversion of an image containing a table into a Markdown table. Snippets 1 and 2 offer general context about the tool, 42 shows a Markdown table from a different project (llama\_parse), and 43 presents an example generated by ChatGPT. The absence of a direct LlamaOCR example for table conversion makes it difficult to assess its effectiveness in this crucial aspect. + * **Free Tier Limits (Together AI):** The research material provides information on the free tier of Together AI, the service that powers LlamaOCR's AI capabilities.4 This tier includes rate limits on requests per minute (RPM) and tokens per minute (TPM) for various models, including image models. Specifically, the free tier for image models has a limit of 60 images per minute, with a lower limit of 10 images per minute for the FLUX.1 \[schnell\] model.4 Because FLUX.1 \[schnell\] is an image generation model, these per-image limits describe image generation rather than OCR; requests to Llama 3.2 Vision are more likely governed by the RPM and TPM limits, which should be confirmed against the current Together AI documentation. Users should be aware of these limitations, as exceeding them may require upgrading to a paid tier. +* **2.2. 
Marker** + * **Description:** Marker is presented as a Python-based tool designed for the rapid and accurate conversion of a wide range of document formats, including both PDF files and images, into Markdown, JSON, and HTML.12 It is highlighted for its ability to handle complex formatting, including tables, and offers the option to enhance accuracy through the use of Large Language Models (LLMs). + * **TypeScript Friendly:** Although Marker is primarily developed in Python, it can be integrated into TypeScript projects by executing it as a separate process and then consuming its output, which can be in Markdown or JSON format.12 The research material does not indicate the existence of direct TypeScript bindings or a dedicated TypeScript API for Marker. Therefore, while it can be used in conjunction with TypeScript projects, it does not offer the same level of direct integration as a native TypeScript library. + * **AI Capabilities (Free/Local with Ollama):** A significant feature of Marker is its optional integration with LLMs to improve conversion accuracy, particularly for tasks such as table merging and form extraction.12 Notably, Marker supports the use of locally hosted LLMs through its integration with Ollama.12 This directly addresses the user's requirement for AI that can be hosted locally, offering a free and private alternative to cloud-based AI services, provided the user has the necessary computational resources to run Ollama. 
Marker also supports cloud-based LLM services like Gemini, which requires an API key, as well as other options like Google Vertex, Claude, and OpenAI.12 + * **Table Support:** Marker is described as being highly proficient in formatting tables extracted from various document formats, including images within PDFs.12 It features a dedicated TableConverter class specifically designed for extracting and converting tabular data.12 This converter can output tables in HTML format and, with the output\_format=json setting, can also provide cell bounding boxes, offering detailed structural information about the extracted tables.12 This specialized focus on table handling suggests that Marker may offer a higher degree of accuracy and flexibility in converting image-based tables to Markdown compared to more general OCR tools. + * **Languages:** Marker is primarily written in Python.13 + * **Installation and Usage:** Marker can be easily installed using the Python package manager pip with the command pip install marker-pdf. For converting documents other than PDFs, additional dependencies may need to be installed.12 The tool offers a command-line interface for converting single files using the marker\_single command or multiple files within a folder using the marker command with options for specifying the number of parallel processes.12 + * **Configuration:** Marker provides a wide range of command-line flags to control the conversion process. 
These include options for specifying the output directory (--output\_dir), the output format (--output\_format), whether to use an LLM (--use\_llm), and settings for handling images and OCR (--disable\_image\_extraction, \--force\_ocr, \--strip\_existing\_ocr).12 Language settings can also be specified using the \--languages flag.12 For more advanced configuration, Marker supports the use of a JSON configuration file.12 When using the \--use\_llm flag with Ollama, users can configure the Ollama base URL (--ollama\_base\_url) and the specific model to be used (--ollama\_model).12 + * **LLM Services:** When the \--use\_llm flag is enabled, Marker supports a variety of LLM services, including Gemini (using the Gemini developer API), Google Vertex, Ollama (for local models), Claude (using the Anthropic API), and OpenAI (supporting any OpenAI-like endpoint).12 This provides users with a broad choice of AI models and services to enhance the accuracy of their document conversions. + * **GPU Support:** Marker is capable of running on GPU, CPU, or MPS (Metal Performance Shaders for Apple silicon), which can significantly improve the speed of the conversion process, especially when utilizing LLMs or processing large volumes of data.12 + * **Repository Link:** The GitHub repository for Marker is located at [https://github.com/VikParuchuri/marker](https://github.com/VikParuchuri/marker).12 + * **Example Markdown Output:** The research material includes examples of Markdown output generated by Marker, which demonstrate its ability to handle tables and images effectively.18 Snippet 18 shows a Markdown file with a figure, and 19 illustrates JSON and Markdown output from processing a research paper, including the extraction of sections and handling of images. Additionally, 44 suggests that Marker is helpful for preserving tables when converting PDFs to Markdown. These examples provide evidence of Marker's capability to convert image-based tables into Markdown format. 
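Since Marker would be driven from a TypeScript project as a separate process, the flags named under Configuration above can be assembled programmatically. The following is a minimal sketch, assuming the flags behave as described in this report; `MarkerOptions` and `buildMarkerArgs` are hypothetical helper names, and the input path, output directory, and Ollama settings are placeholder values. It only builds the argument list and does not launch Marker.

```typescript
/** Options mirroring the Marker command-line flags named in this report. */
interface MarkerOptions {
  outputDir: string;                          // --output_dir
  outputFormat?: "markdown" | "json" | "html"; // --output_format
  forceOcr?: boolean;                         // --force_ocr
  useLlm?: boolean;                           // --use_llm
  ollamaBaseUrl?: string;                     // --ollama_base_url (only with useLlm)
  ollamaModel?: string;                       // --ollama_model (only with useLlm)
}

/** Build the argument list for a single-file `marker_single` invocation. */
function buildMarkerArgs(inputPath: string, opts: MarkerOptions): string[] {
  const args: string[] = [inputPath, "--output_dir", opts.outputDir];
  if (opts.outputFormat) args.push("--output_format", opts.outputFormat);
  if (opts.forceOcr) args.push("--force_ocr");
  if (opts.useLlm) {
    args.push("--use_llm");
    // Ollama settings are only meaningful when the LLM step is enabled.
    if (opts.ollamaBaseUrl) args.push("--ollama_base_url", opts.ollamaBaseUrl);
    if (opts.ollamaModel) args.push("--ollama_model", opts.ollamaModel);
  }
  return args;
}

// Placeholder values for illustration; adjust to the local setup.
const markerArgs = buildMarkerArgs("./scan.png", {
  outputDir: "./out",
  outputFormat: "markdown",
  useLlm: true,
  ollamaBaseUrl: "http://localhost:11434",
  ollamaModel: "gemma3:27b",
});
console.log(["marker_single", ...markerArgs].join(" "));
```

The resulting array could be passed to Node's `child_process.spawn("marker_single", markerArgs)`, with the Markdown read back from the output directory. Depending on the Marker version, selecting Ollama as the LLM backend may need an additional service-selection option that this report does not cover, so `marker_single --help` is worth checking first.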
+ * **Ollama Integration Details:** Marker's integration with Ollama is well-documented in the research material.12 Users can configure the connection to their local Ollama server by specifying the base URL (typically http://localhost:11434) and the name of the desired model (e.g., gemma3:27b) through command-line arguments or within a configuration file.12 This integration allows Marker to leverage the power of locally run LLMs for tasks like improving table accuracy and formatting without requiring an internet connection or external API keys for the LLM itself. +* **2.3. MarkItDown** + * **Description:** MarkItDown is a lightweight Python utility developed by Microsoft for the purpose of converting various file formats to Markdown.13 Its primary focus is on preserving the important structure and content of documents, making it suitable for use with Large Language Models (LLMs) and related text analysis pipelines. + * **TypeScript Friendly:** MarkItDown is primarily a Python-based tool, and the provided research material does not mention any direct TypeScript bindings or a dedicated TypeScript API.17 Similar to Marker, integration with TypeScript projects would likely involve running MarkItDown as a separate process and consuming its Markdown output. + * **AI Capabilities (API-based):** MarkItDown supports the use of LLMs for tasks such as generating descriptions for images within the converted documents.17 The documentation provides examples of using OpenAI's API, specifically the gpt-4o model, by instantiating the MarkItDown class with an llm\_client and llm\_model.17 Additionally, it supports using Microsoft's Azure Document Intelligence service for document conversion, which requires providing an endpoint to an Azure Document Intelligence Resource.17 The research material does not indicate any direct integration with locally hosted LLM solutions like Ollama. 
+ * **Table Support:** MarkItDown is designed to preserve important document structure and content during the conversion to Markdown, explicitly including headings, lists, tables, and links.13 This suggests that the tool aims to accurately represent tabular data from various input formats, including images within documents, in Markdown table format. + * **Languages:** MarkItDown is written in Python.17 + * **Installation and Usage:** MarkItDown can be installed using pip with the command pip install 'markitdown\[all\]', which installs all optional dependencies required for handling various file formats.17 The tool can be used from the command line by specifying the path to the file to be converted, with an option to specify the output file using the \-o flag or by piping the output.17 It also offers a Python API for programmatic use, allowing developers to integrate its conversion capabilities into their Python applications.17 + * **Supported Formats:** MarkItDown supports a wide range of input file formats, including PDF, PowerPoint, Word, Excel, Images (with EXIF metadata and OCR), Audio (with EXIF metadata and speech transcription), HTML, text-based formats like CSV, JSON, and XML, as well as ZIP files by iterating over their contents.13 This broad format support makes it a versatile tool for handling various types of documents that may contain images and tables. + * **Plugins:** MarkItDown supports third-party plugins, which can be used to extend its functionality. 
Plugins are disabled by default but can be enabled using the \--use-plugins command-line option.17 + * **Repository Link:** The GitHub repository for MarkItDown is located at [https://github.com/microsoft/markitdown](https://github.com/microsoft/markitdown).17 + * **Example Markdown Output:** While the research material mentions MarkItDown's focus on preserving document structure, including tables 13, it does not provide a specific example of converting an image of a table directly to Markdown table syntax. + * **AI Cost Considerations:** Because MarkItDown's AI capabilities rely on external API services such as OpenAI and Azure Document Intelligence 17, AI-powered features like image descriptions require API keys for these services, and usage may incur costs depending on the volume of data processed. The provided snippets give no indication of support for free or locally hosted AI solutions like Ollama. +* **2.4. markdownify-mcp** + * **Description:** markdownify-mcp is described as a Model Context Protocol (MCP) server built using TypeScript, with the purpose of converting various file types and web content into Markdown format.23 + * **TypeScript Friendly:** Being built entirely with TypeScript, markdownify-mcp is inherently friendly to TypeScript developers.23 This provides a significant advantage for users who prefer to work within the JavaScript/TypeScript ecosystem, potentially allowing for easier integration, understanding of the codebase, and customization if needed. + * **AI Capabilities:** The features list for markdownify-mcp includes "Convert images to Markdown with metadata".23 However, the provided research material does not explicitly mention the use of artificial intelligence in this conversion process. It is possible that the tool relies on more traditional OCR techniques or that details about AI usage are not covered in the available snippets. 
+ * **Table Support:** While markdownify-mcp can convert images to Markdown, the research material does not provide specific information on how it handles tables within images.23 Snippet 23 suggests that more detailed information regarding the image-to-markdown tool's functionality, including table handling, might be available in the project's README file or by examining the source code in src/tools.ts. + * **Languages:** The primary programming language for markdownify-mcp is TypeScript.23 + * **Installation and Usage:** To get started with markdownify-mcp, users need to clone the project repository, install dependencies using the pnpm install command, build the project using pnpm run build, and then start the server with pnpm start.23 + * **Tools:** markdownify-mcp provides a set of tools for converting different types of content to Markdown, including pdf-to-markdown, bing-search-to-markdown, webpage-to-markdown, image-to-markdown, and audio-to-markdown.23 The image-to-markdown tool is the most relevant to the user's query. + * **Repository Link:** The GitHub repository for markdownify-mcp is located at [https://github.com/zcaceres/markdownify-mcp](https://github.com/zcaceres/markdownify-mcp).23 + * **Example Markdown Output:** The provided research material does not include an example of the Markdown output generated by markdownify-mcp, specifically for the image-to-markdown tool and its potential handling of tables. This lack of a concrete example makes it difficult to assess the quality and format of the output for the user's specific requirements. + +**3\. 
Comparison of Tools** + +The following table summarizes the key features of the analyzed open-source tools based on the research material: + +| Tool/Library | Link | TypeScript Friendly | Table Support | AI Capabilities | Primary Languages | +| :---- | :---- | :---- | :---- | :---- | :---- | +| LlamaOCR | [https://github.com/Nutlope/llama-ocr](https://github.com/Nutlope/llama-ocr) | Yes | Claims support for complex layouts like tables | Free (via Together AI API key) | JavaScript, TypeScript | +| Marker | [https://github.com/VikParuchuri/marker](https://github.com/VikParuchuri/marker) | No | Strong support with dedicated TableConverter | Optional (Ollama for local, others via API) | Python | +| MarkItDown | [https://github.com/microsoft/markitdown](https://github.com/microsoft/markitdown) | No | Aims to preserve tables during conversion | API-based (OpenAI, Azure) | Python | +| markdownify-mcp | [https://github.com/zcaceres/markdownify-mcp](https://github.com/zcaceres/markdownify-mcp) | Yes | Details unclear in snippets | Not explicitly mentioned in snippets | TypeScript | + +**4\. Other Potential Solutions and Considerations** + +* **Mathpix Snip:** Mathpix Snip is a powerful OCR tool that offers specific features for converting images and PDFs to Markdown tables using AI.24 It is available as a web and mobile application, as well as a desktop snipping tool.24 While it boasts strong table conversion capabilities and supports a wide range of languages 27, it is not strictly open-source and requires a paid subscription for unlimited use beyond the free tier.26 However, for users prioritizing accuracy, especially with STEM content, it remains a noteworthy option for comparison. 
+* **Konbert:** Konbert is an online converter that utilizes AI-powered OCR to transform JPG and PNG images into Markdown tables.32 It offers a free service for files up to 5 MB.32 While it directly addresses the image-to-table conversion requirement using AI, it is not an open-source library and relies on an external online service. Concerns about transparency and reliability have also been raised in reviews regarding a similar tool from the same domain.34 +* **Aspose:** Aspose is a commercial library for .NET and Java that provides a wide range of document conversion capabilities, including JPG to Markdown.36 While it is known for high-quality conversion and supports various file formats 37, it does not meet the user's open-source requirement. Reviews suggest generally positive experiences, but potential issues with image quality during conversion and the importance of image resolution for OCR accuracy have been noted.38 +* **General Challenges of OCR Accuracy:** It is important to acknowledge the inherent challenges in achieving perfect accuracy with OCR, especially when dealing with complex table layouts, low-resolution or distorted images, and handwritten text.1 Even AI-powered OCR is subject to limitations, and the quality of the output is heavily dependent on the quality of the input image. Users should set realistic expectations and be prepared for potential manual corrections regardless of the tool chosen. + +**5\. Recommendations** + +Based on the analysis, the following recommendations are provided: + +* **LlamaOCR:** Due to its direct TypeScript integration and the availability of a free AI model via Together AI, LlamaOCR appears to be a strong candidate for users primarily working within the TypeScript ecosystem. However, thorough testing of its table conversion accuracy with the user's specific image types is crucial. Users should also be aware of the potential limitations of Together AI's free API tier regarding usage and rate limits. 
+* **Marker:** For users who require robust table support and prefer the option of locally hosted AI, Marker presents a compelling solution. Its integration with Ollama directly addresses this need. While it is a Python-based tool, the potential benefits of its strong table handling capabilities and local AI option might outweigh the integration efforts required for a TypeScript workflow. Exploring methods for integrating Python processes into TypeScript applications could be beneficial. +* **MarkItDown:** MarkItDown is a viable option for users who need to convert images within various document formats to Markdown and are comfortable with a Python-based tool. Its focus on preserving document structure, including tables, is relevant. However, its AI capabilities rely on cloud-based APIs and do not meet the "free or locally hosted" criterion. +* **markdownify-mcp:** For users who strictly require a TypeScript-based solution, markdownify-mcp is an option. However, given the lack of detailed information about its AI capabilities and table conversion effectiveness in the provided snippets, further investigation of its documentation and source code is recommended before making a decision. + +Ultimately, the most suitable tool will depend on the user's specific priorities, technical environment, and the nature of the images and tables they need to convert. Practical testing of the most promising tools with representative samples is highly recommended to determine the best fit. + +**6\. Conclusion** + +The analysis of the provided research material reveals several open-source tools with the potential to convert images to Markdown, including handling tables and leveraging AI. LlamaOCR stands out for its direct TypeScript integration and free AI (via API), while Marker offers robust table support and the option for locally hosted AI through Ollama. MarkItDown provides broad format support and API-based AI, and markdownify-mcp offers a purely TypeScript-based solution. 
Each tool presents its own set of strengths and trade-offs concerning TypeScript compatibility, AI implementation, and table handling capabilities. The "perfect" solution will depend on the user's specific needs and priorities. Thorough testing and evaluation of the most promising options are crucial to ensure the selected tool meets the required accuracy and workflow integration demands. + +#### **Works cited** + +1. Transforming Images into Markdown: A Guide to LlamaOCR \- Cohorte Projects, accessed on April 23, 2025, [https://www.cohorte.co/blog/transforming-images-into-markdown-a-guide-to-llamaocr](https://www.cohorte.co/blog/transforming-images-into-markdown-a-guide-to-llamaocr) +2. Llama OCR: OCR library that converts images to Markdown in three lines of code using the free Llama 3.2 Vision interface \- Chief AI Sharing Circle \- 首席AI分享圈, accessed on April 23, 2025, [https://www.aisharenet.com/en/llama-ocr/](https://www.aisharenet.com/en/llama-ocr/) +3. Nutlope/llama-ocr: Document to Markdown OCR library with Llama 3.2 vision \- GitHub, accessed on April 23, 2025, [https://github.com/Nutlope/llama-ocr](https://github.com/Nutlope/llama-ocr) +4. Rate limits \- Introduction \- Together AI, accessed on April 23, 2025, [https://docs.together.ai/docs/rate-limits](https://docs.together.ai/docs/rate-limits) +5. Is the Together AI API key free? Exploring scalable AI solutions for developers and SMBs, accessed on April 23, 2025, [https://www.byteplus.com/en/topic/552569](https://www.byteplus.com/en/topic/552569) +6. How to use the Free Tier? \- AI/ML API Documentation, accessed on April 23, 2025, [https://docs.aimlapi.com/faq/free-tier](https://docs.aimlapi.com/faq/free-tier) +7. Is the Together API key free? Exploring scalable AI solutions for developers and SMBs, accessed on April 23, 2025, [https://www.byteplus.com/en/topic/554906](https://www.byteplus.com/en/topic/554906) +8. 
Together Pricing | The Most Powerful Tools at the Best Value, accessed on April 23, 2025, [https://www.together.ai/pricing](https://www.together.ai/pricing) +9. Try These Free, Unlimited AI API Keys for Cursor / Cline \- Hugging Face, accessed on April 23, 2025, [https://huggingface.co/blog/lynn-mikami/free-ai-apis](https://huggingface.co/blog/lynn-mikami/free-ai-apis) +10. MinimalDevops/llama-ocr: llama-ocr using python \- GitHub, accessed on April 23, 2025, [https://github.com/MinimalDevops/llama-ocr](https://github.com/MinimalDevops/llama-ocr) +11. yYorky/LlamaOCR: Effortlessly process invoices with AI\! This project uses the Llama3.2 Vision Model for OCR, converting invoice images into structured, machine-readable tables. Designed for accountants, it automates data extraction and outputs in tabular format ready for ERP integration, improving efficiency and accuracy in invoice management. \- GitHub, accessed on April 23, 2025, [https://github.com/yYorky/LlamaOCR](https://github.com/yYorky/LlamaOCR) +12. VikParuchuri/marker: Convert PDF to markdown \+ JSON quickly with high accuracy \- GitHub, accessed on April 23, 2025, [https://github.com/VikParuchuri/marker](https://github.com/VikParuchuri/marker) +13. Microsoft has released an open source Python tool for converting other document formats to markdown : r/ObsidianMD \- Reddit, accessed on April 23, 2025, [https://www.reddit.com/r/ObsidianMD/comments/1hioaov/microsoft\_has\_released\_an\_open\_source\_python\_tool/](https://www.reddit.com/r/ObsidianMD/comments/1hioaov/microsoft_has_released_an_open_source_python_tool/) +14. Optimal Hardware for Running Ollama Models with Marker for PDF to Markdown Conversion, accessed on April 23, 2025, [https://www.reddit.com/r/ollama/comments/1itbr79/optimal\_hardware\_for\_running\_ollama\_models\_with/](https://www.reddit.com/r/ollama/comments/1itbr79/optimal_hardware_for_running_ollama_models_with/) +15. 
Ollama Inference Failure and Broken Pipe Error · Issue \#621 · VikParuchuri/marker \- GitHub, accessed on April 23, 2025, [https://github.com/VikParuchuri/marker/issues/621](https://github.com/VikParuchuri/marker/issues/621) +16. Extract Table Info From PDF & Summarise It Using Llama3 via Ollama | LangChain, accessed on April 23, 2025, [https://www.youtube.com/watch?v=hQu8WN8NuVg](https://www.youtube.com/watch?v=hQu8WN8NuVg) +17. microsoft/markitdown: Python tool for converting files and office documents to Markdown. \- GitHub, accessed on April 23, 2025, [https://github.com/microsoft/markitdown](https://github.com/microsoft/markitdown) +18. Vision Parse with Ollama \- Parse PDF Documents into MarkDown Content \- YouTube, accessed on April 23, 2025, [https://www.youtube.com/watch?v=6ilFgwUyuWE](https://www.youtube.com/watch?v=6ilFgwUyuWE) +19. Marker: This Open-Source Tool will make your PDFs LLM Ready \- YouTube, accessed on April 23, 2025, [https://www.youtube.com/watch?v=mdLBr9IMmgI\&pp=0gcJCfcAhR29\_xXO](https://www.youtube.com/watch?v=mdLBr9IMmgI&pp=0gcJCfcAhR29_xXO) +20. Extract Table Info From SCANNED PDF & Summarise It Using Llama3.1 via Ollama, accessed on April 23, 2025, [https://www.youtube.com/watch?v=nkE65p42RgM](https://www.youtube.com/watch?v=nkE65p42RgM) +21. Microsoft open sources a markdown library to convert documents to markdown \- Community, accessed on April 23, 2025, [https://community.openai.com/t/microsoft-open-sources-a-markdown-library-to-convert-documents-to-markdown/1061731](https://community.openai.com/t/microsoft-open-sources-a-markdown-library-to-convert-documents-to-markdown/1061731) +22. MarkItDown: Python tool for converting files and office documents to Markdown | Hacker News, accessed on April 23, 2025, [https://news.ycombinator.com/item?id=42410803](https://news.ycombinator.com/item?id=42410803) +23. zcaceres/markdownify-mcp: A Model Context Protocol ... 
\- GitHub, accessed on April 23, 2025, [https://github.com/zcaceres/markdownify-mcp](https://github.com/zcaceres/markdownify-mcp) +24. OCR-powered Markdown Table Generator \- Mathpix, accessed on April 23, 2025, [https://mathpix.com/blog/ocr-powered-markdown-table-generator](https://mathpix.com/blog/ocr-powered-markdown-table-generator) +25. Mathpix now supports basic table OCR, accessed on April 23, 2025, [https://mathpix.com/blog/v1-table-recognition](https://mathpix.com/blog/v1-table-recognition) +26. Snip Apps \- Mathpix, accessed on April 23, 2025, [https://mathpix.com/snip](https://mathpix.com/snip) +27. All Supported Languages \- Mathpix, accessed on April 23, 2025, [https://mathpix.com/language-support](https://mathpix.com/language-support) +28. Convert API User Guide: Supported Languages \- Mathpix, accessed on April 23, 2025, [https://mathpix.com/docs/convert/supported\_languages](https://mathpix.com/docs/convert/supported_languages) +29. Snip web app now translated into 14 languages \- Mathpix, accessed on April 23, 2025, [https://mathpix.com/blog/multi-language-interface](https://mathpix.com/blog/multi-language-interface) +30. The best OCR for Chinese and math \- Mathpix, accessed on April 23, 2025, [https://mathpix.com/blog/ocr-chinese-characters](https://mathpix.com/blog/ocr-chinese-characters) +31. Mathpix Pricing, accessed on April 23, 2025, [https://mathpix.com/pricing](https://mathpix.com/pricing) +32. Convert JPG to Markdown Table \- Konbert, accessed on April 23, 2025, [https://konbert.com/convert/jpeg/to/markdown](https://konbert.com/convert/jpeg/to/markdown) +33. Convert PNG to Markdown Table, accessed on April 23, 2025, [https://konbert.com/convert/png/to/markdown](https://konbert.com/convert/png/to/markdown) +34. tool to convert pdf to markdown and keep all the formatting, tables, images etc. 
\- Reddit, accessed on April 23, 2025, [https://www.reddit.com/r/ObsidianMD/comments/1jkmdx9/tool\_to\_convert\_pdf\_to\_markdown\_and\_keep\_all\_the/](https://www.reddit.com/r/ObsidianMD/comments/1jkmdx9/tool_to_convert_pdf_to_markdown_and_keep_all_the/) +35. Convert JPG to Markdown table \- table.studio | The AI Spreadsheet, accessed on April 23, 2025, [https://table.studio/convert/jpeg/to/markdown](https://table.studio/convert/jpeg/to/markdown) +36. Convert JPG To Markdown Online \- Aspose Products, accessed on April 23, 2025, [https://products.aspose.app/words/conversion/jpg-to-md](https://products.aspose.app/words/conversion/jpg-to-md) +37. Aspose.Cells-Cloud 25.3.0 \- NuGet Gallery, accessed on April 23, 2025, [https://www.nuget.org/packages/Aspose.Cells-Cloud](https://www.nuget.org/packages/Aspose.Cells-Cloud) +38. Aspose.OCR fails to read simple JPEG files \- Stack Overflow, accessed on April 23, 2025, [https://stackoverflow.com/questions/45921387/aspose-ocr-fails-to-read-simple-jpeg-files](https://stackoverflow.com/questions/45921387/aspose-ocr-fails-to-read-simple-jpeg-files) +39. Low image quality when converting Word to Markdown \- Free Support Forum \- aspose.com, accessed on April 23, 2025, [https://forum.aspose.com/t/low-image-quality-when-converting-word-to-markdown/269407](https://forum.aspose.com/t/low-image-quality-when-converting-word-to-markdown/269407) +40. Llama-OCR: Document to Markdown | Hacker News, accessed on April 23, 2025, [https://news.ycombinator.com/item?id=42154410](https://news.ycombinator.com/item?id=42154410) +41. LlamaOCR.com – Document to markdown, accessed on April 23, 2025, [https://llamaocr.com/](https://llamaocr.com/) +42. markdown output for table in pdf is incorrect · Issue \#167 · run-llama/llama\_cloud\_services, accessed on April 23, 2025, [https://github.com/run-llama/llama\_parse/issues/167](https://github.com/run-llama/llama_parse/issues/167) +43. 
From Screenshots to Markdown Tables with LLMs \- Shekhar Gulati, accessed on April 23, 2025, [https://shekhargulati.com/2024/07/22/from-screenshots-to-markdown-tables-with-llms/](https://shekhargulati.com/2024/07/22/from-screenshots-to-markdown-tables-with-llms/) +44. What model would you use to extract full pdf? : r/ollama \- Reddit, accessed on April 23, 2025, [https://www.reddit.com/r/ollama/comments/1gc8je1/what\_model\_would\_you\_use\_to\_extract\_full\_pdf/](https://www.reddit.com/r/ollama/comments/1gc8je1/what_model_would_you_use_to_extract_full_pdf/) \ No newline at end of file