Show HN: OCR Benchmark Focusing on Automation

The OCR and document extraction field has seen a lot of action recently, with releases like Mistral OCR and Andrew Ng's agentic document processing. There are also several OCR benchmarks, but each tests for something slightly different, which makes a fair comparison of models very hard.

To give an example, some models, like Mistral OCR, only try to convert a document to Markdown. You have to use another LLM on top to get the final result. Some VLMs directly return structured information, such as key fields from invoices, but you then have to either add business rules on top or use an LLM-as-a-judge system to decide which outputs need manual review and which can be accepted as correct. No benchmark attempts to measure the actual rate of automation you can achieve.

We have tried to solve this problem with a benchmark that applies only to documents and use cases where the goal is automation, and that tries to measure the end-to-end automation level of different models or systems. We have collected a dataset of documents, such as invoices, that come from processes where automation is needed, as opposed to copilot-style use cases where you would chat with the document. We have also annotated these documents and published the dataset and repo so the benchmark can be extended.

Writeup: https://ift.tt/DlQiHCK
Dataset: https://ift.tt/TxS2oLG
GitHub: https://ift.tt/3lb5TvY

Looking for suggestions on how this benchmark can be improved further.

March 13, 2025 at 02:19AM
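To make the idea of an end-to-end automation rate concrete, here is a minimal Python sketch. It is not the benchmark's actual scoring code. It assumes each system reports a per-document confidence, that documents at or above a chosen threshold are accepted without review, and that correctness is judged against the annotations. The `Extraction` type, its fields, the `automation_metrics` function, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Extraction:
    """One document's result. Field names are hypothetical, for illustration only."""
    confidence: float  # system's self-reported confidence in [0, 1]
    correct: bool      # True if the extracted fields match the annotation


def automation_metrics(results: list[Extraction], threshold: float) -> dict[str, float]:
    """Compute the share of documents accepted without review and their accuracy."""
    # Documents at or above the threshold skip manual review.
    auto = [r for r in results if r.confidence >= threshold]
    automation_rate = len(auto) / len(results) if results else 0.0
    # Accuracy is measured only on the auto-accepted documents.
    accuracy = sum(r.correct for r in auto) / len(auto) if auto else 0.0
    return {"automation_rate": automation_rate, "auto_accept_accuracy": accuracy}


# Made-up sample data: five documents with confidence and correctness.
results = [
    Extraction(0.98, True),
    Extraction(0.95, True),
    Extraction(0.91, False),
    Extraction(0.60, True),
    Extraction(0.40, False),
]
metrics = automation_metrics(results, threshold=0.9)
print(metrics)
```

With this made-up data, three of the five documents clear the threshold and two of those three are correct. Raising the threshold trades automation rate for accuracy, which is the trade-off an automation-focused comparison has to expose.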
Show HN: OCR Benchmark Focusing on Automation https://ift.tt/x5YzwXc