Show HN: Continuous-eval – Granular evaluation of GenAI pipelines

Hi HN - we are the creators of "continuous-eval", an open-source tool to test and evaluate generative AI apps.

"Continuous-eval" came from our efforts to measure, validate, and improve the reliability of a finance AI copilot we were developing for banks. End-to-end evaluation was not enough for us. We wanted granular evaluations that help pinpoint the bottlenecks and identify what to improve and how.

We've since developed more metrics and made the framework more flexible so it can evaluate components like agent tool use, code changes, retrieval steps, etc.

Let us know what you think of our approach to GenAI app evaluation.

https://ift.tt/cFVeQny

February 26, 2024 at 12:11AM