Show HN: Open-source model and scorecard for measuring hallucinations in LLMs

Hi all! This morning, we released a new Apache 2.0 licensed model on HuggingFace for detecting hallucinations in retrieval augmented generation (RAG) systems. What we've found is that even when given a "simple" instruction like "summarize the following news article," every available LLM hallucinates to some extent, making up details that never existed in the source article, and some of them do so quite a bit.

As a RAG provider and proponents of ethical AI, we want to see LLMs get better at this. We've published an open source model, a blog post describing our methodology more thoroughly (with some specific examples of these summarization hallucinations), and a GitHub repository containing our evaluation of the most popular generative LLMs available today. Links to all of them are referenced in the blog, but for the technical audience here, the most interesting additional links might be:

- https://ift.tt/HJwyvdx...
- https://ift.tt/cPD9NZx

By releasing these under a truly open source license and detailing the methodology, we hope to make it more viable for anyone to quantitatively measure and improve the generative LLMs they're publishing.

https://ift.tt/VKUEhL6

November 7, 2023 at 12:41AM
Show HN: Open-source model and scorecard for measuring hallucinations in LLMs https://ift.tt/xHhOuyl