Show HN: Continuous-eval – Granular evaluation of GenAI pipelines

Hi HN, we are the creators of "continuous-eval", an open-source tool for testing and evaluating generative AI apps.

"continuous-eval" grew out of our efforts to measure, validate, and improve the reliability of a finance AI copilot we were developing for banks. End-to-end evaluation was not enough for us: we wanted granular evaluations that help pinpoint bottlenecks and identify what to improve and how.

We've since developed more metrics and made the framework more flexible, so it can evaluate components such as agent tool use, code changes, and retrieval steps.

Let us know what you think of our approach to GenAI app evaluation.

https://ift.tt/cFVeQny

February 26, 2024 at 12:11AM
Show HN: Continuous-eval – Granular evaluation of GenAI pipelines https://ift.tt/kSEhrdm