Show HN: Benchmarking VLMs vs. Traditional OCR

Vision models have been gaining popularity as a replacement for traditional OCR, especially with Gemini 2.0 becoming cost-competitive with the cloud platforms.

We've been continuously evaluating different models since we released the Zerox package last year ( https://ift.tt/CnLzlBU ), and we wanted to put some numbers behind it. So we're open sourcing our internal OCR benchmark + evaluation datasets.

Full writeup + data explorer here: https://ift.tt/K5UWPkp
Github: https://ift.tt/2kWwsC0
Huggingface: https://ift.tt/wAbVgSo

A couple of notes on the methodology:

1. We are using JSON accuracy as our primary metric. The end goal is to evaluate how well each OCR provider can prepare the data for LLM ingestion.

2. This methodology differs from a lot of OCR benchmarks because it doesn't rely on text similarity. We believe text similarity measurements are heavily biased towards the exact layout of the ground truth text, and penalize correct OCR that has slight layout differences.

3. Every document goes Image => OCR => Predicted JSON, and we compare the predicted JSON against the annotated ground truth JSON. The VLMs are capable of going Image => JSON directly, but we are primarily trying to measure OCR accuracy here. We're planning to release a separate report on direct JSON accuracy next week.

This is a continuous work in progress! There are at least 10 additional providers we plan to add to the list.

The next big roadmap items are:

- Comparing OCR vs. direct extraction. Early results here show a slight accuracy improvement, but it varies heavily with page length.
- A multilingual comparison. Right now the evaluation data is English only.
- A breakdown of the data by type (best model for handwriting, tables, charts, photos, etc.)

https://ift.tt/K5UWPkp February 21, 2025 at 12:19AM
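As a rough illustration of the JSON accuracy comparison described in the methodology notes, here is a minimal Python sketch. It is not the benchmark's actual scoring code: the post does not give the exact formula, so this assumes accuracy is the fraction of ground truth leaf fields that the predicted JSON reproduces exactly. The names `flatten` and `json_accuracy` and the sample invoice data are hypothetical.

```python
from typing import Any, Dict


def flatten(value: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts/lists into a {dotted.path: leaf_value} mapping."""
    if isinstance(value, dict) and value:
        items = value.items()
    elif isinstance(value, list) and value:
        items = enumerate(value)
    else:
        # Scalars and empty containers are treated as leaves.
        return {prefix: value}
    flat: Dict[str, Any] = {}
    for key, child in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(child, path))
    return flat


def json_accuracy(predicted: Any, truth: Any) -> float:
    """Assumed metric: share of ground truth leaf fields matched exactly.

    Extra fields that appear only in the prediction are ignored here;
    a stricter metric could penalize them.
    """
    truth_flat = flatten(truth)
    pred_flat = flatten(predicted)
    missing = object()  # sentinel so a predicted None is not confused with "absent"
    correct = sum(
        1
        for path, expected in truth_flat.items()
        if pred_flat.get(path, missing) == expected
    )
    return correct / len(truth_flat)


# Hypothetical example data: one of four leaf fields is wrong.
truth = {
    "invoice": {"number": "INV-001", "total": 120.5},
    "items": [{"sku": "A1", "qty": 2}],
}
predicted = {
    "invoice": {"number": "INV-001", "total": 125.0},
    "items": [{"sku": "A1", "qty": 2}],
}

print(json_accuracy(predicted, truth))  # 0.75
```

Comparing field by field means a prediction is not penalized for whitespace or layout differences in the intermediate OCR text, only for values that end up wrong or missing in the extracted JSON.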
Show HN: Benchmarking VLMs vs. Traditional OCR https://ift.tt/NZz2kF0