Show HN: Open-source model and scorecard for measuring hallucinations in LLMs

Hi all! This morning, we released a new Apache 2.0-licensed model on HuggingFace for detecting hallucinations in retrieval-augmented generation (RAG) systems.

We've found that even when given a "simple" instruction like "summarize the following news article," every LLM available today hallucinates to some extent, making up details that never existed in the source article. Some of them hallucinate quite a bit. As a RAG provider and proponents of ethical AI, we want to see LLMs get better at this.

We've published an open-source model, a blog post that describes our methodology in more detail (along with some specific examples of these summarization hallucinations), and a GitHub repository containing our evaluation of the most popular generative LLMs available today. Links to all of them are in the blog post, but for the technical audience here, the most interesting additional links might be:

- https://ift.tt/HJwyvdx...
- https://ift.tt/cPD9NZx

By releasing these under a truly open-source license and detailing the methodology, we hope to make it practical for anyone to quantitatively measure and improve the generative LLMs they're publishing.

https://ift.tt/VKUEhL6
November 7, 2023 at 12:41AM
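As a rough illustration of how a per-model scorecard could be assembled, the sketch below assumes a detection model has already assigned each generated summary a factual-consistency score between 0 and 1, where a low score means the summary is likely unsupported by its source article. It counts a summary as a hallucination when its score falls below a threshold and reports the fraction of flagged summaries per LLM. The `ScoredSummary` type, the 0.5 threshold, the model names, and the scores are all hypothetical; this is not the authors' published methodology, which is described in their blog post.

```python
from dataclasses import dataclass

# Hypothetical threshold (an assumption, not the released model's setting):
# a summary scoring below it is counted as containing a hallucination.
THRESHOLD = 0.5


@dataclass
class ScoredSummary:
    model_name: str  # which LLM produced the summary
    score: float     # assumed: 1.0 = consistent with source, 0.0 = inconsistent


def hallucination_rates(results, threshold=THRESHOLD):
    """Return {model_name: fraction of that model's summaries scored below threshold}."""
    totals = {}
    flagged = {}
    for r in results:
        totals[r.model_name] = totals.get(r.model_name, 0) + 1
        if r.score < threshold:
            flagged[r.model_name] = flagged.get(r.model_name, 0) + 1
    return {name: flagged.get(name, 0) / n for name, n in totals.items()}


def leaderboard(rates):
    """Sort (model_name, rate) pairs from lowest to highest hallucination rate."""
    return sorted(rates.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    # Made-up scores for illustration only; not real evaluation data.
    demo = [
        ScoredSummary("model-a", 0.91),
        ScoredSummary("model-a", 0.42),
        ScoredSummary("model-b", 0.88),
        ScoredSummary("model-b", 0.97),
    ]
    for name, rate in leaderboard(hallucination_rates(demo)):
        print(f"{name}: {rate:.1%}")
```

A single fixed threshold keeps the scorecard easy to compare across models; a real evaluation might instead report the average score or calibrate the cutoff against human judgments.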