Show HN: Open-source model and scorecard for measuring hallucinations in LLMs

Hi all! This morning, we released a new Apache 2.0 licensed model on HuggingFace for detecting hallucinations in retrieval augmented generation (RAG) systems. We've found that even when given a "simple" instruction like "summarize the following news article," every available LLM hallucinates to some extent, making up details that never existed in the source article, and some of them do so quite a bit.

As a RAG provider and proponents of ethical AI, we want to see LLMs get better at this. We've published an open source model, a blog post describing our methodology more thoroughly (with some specific examples of these summarization hallucinations), and a GitHub repository containing our evaluation of the most popular generative LLMs available today. Links to all of them are referenced in the blog, but for the technical audience here, the most interesting additional links might be:

- https://ift.tt/HJwyvdx...
- https://ift.tt/cPD9NZx

By releasing these under a truly open source license and detailing the methodology, we hope to make it more viable for anyone to quantitatively measure and improve the generative LLMs they're publishing.

https://ift.tt/VKUEhL6
November 7, 2023 at 12:41AM
Show HN: Open-source model and scorecard for measuring hallucinations in LLMs https://ift.tt/xHhOuyl