Show HN: Continuous-eval – Granular evaluation of GenAI pipelines

Hi HN, we are the creators of "continuous-eval", an open-source tool to test and evaluate generative AI apps. "Continuous-eval" came from our efforts to measure, validate, and improve the reliability of a finance AI copilot we were developing for banks. End-to-end evaluation was not enough for us: we wanted granular evaluations that help pinpoint the bottlenecks and identify what to improve and how. We've since developed more metrics and made the framework more flexible so it can evaluate components like agent tool use, code changes, retrieval steps, etc. Let us know what you think of our approach to GenAI app evaluation.

https://ift.tt/cFVeQny

February 26, 2024 at 12:11AM
Show HN: Continuous-eval – Granular evaluation of GenAI pipelines https://ift.tt/kSEhrdm