Show HN: Pi Labs – AI scoring and optimization tools for software engineers

Hey HN, after years building some of the core AI and NLU systems in Google Search, we decided to leave and build outside. Our goal was to put the advanced ML and data science techniques we’ve been using into the hands of all software engineers, so that everyone can build AI and search apps at the same level of performance and sophistication as the big labs.

This was a hard technical challenge, but we were inspired by the MVC architecture for web development. The intuition there is that when a data model changes, its view gets auto-updated. We built a similar architecture for AI. On one side is a scoring system, which encapsulates in a set of metrics what’s good about the AI application. On the other side is a set of optimizers that “compile” against this scorer: prompt optimization, data filtering, synthetic data generation, supervised learning, RL, etc. The scoring system can be calibrated using developer, user, or rater feedback, and once it’s updated, all the optimizers get recompiled against it.

The result is a setup that makes it easy to incrementally improve the quality of your AI in a tight feedback loop: you update your scorers, they auto-update your optimizers, your app gets better, you see that improvement in interpretable scores, and then you repeat, progressing from simpler to more advanced optimizers and from off-the-shelf to calibrated scorers.

We would love your feedback on this approach. https://build.withpi.ai has a set of playgrounds to help you quickly build a scorer and multiple optimizers, no sign-in required. https://code.withpi.ai has the API reference and notebook links. Finally, we have a Loom demo [1].

More technical details

Scorers: Our scoring system has three key differences from the common LLM-as-a-judge pattern.

First, rather than a single label or metric from an LLM judge, our scoring system is represented as a tunable tree of metrics, with 20+ dimensions that get combined into a final (non-linear) weighted score. The tree structure makes scores easily interpretable (just look at the breakdown by dimension), extensible (just add or remove a dimension), and adjustable (just re-tune the weights). Training the scoring system with labeled or preference data adjusts the weights, and you can automate this with user feedback signals, resulting in a tight feedback loop.

Second, our scoring system handles natural-language dimensions (great for free-form, qualitative questions requiring NLU) alongside quantitative dimensions (like computations over dates or document length, which can be provided in Python) in the same tree. When calibrating with your labeled or preference data, the scorer learns how to balance these.

Third, for natural-language scoring, we use specialized smaller encoder models rather than autoregressive models. Encoders are a natural fit for scoring: they are faster and cheaper to run, easier to fine-tune, and architecturally more suitable (bi-directional attention with a regression or classification head) than similarly sized decoder models. For example, we can score 20+ dimensions in under 100 ms, making it possible to use scoring everywhere from evaluation to agent orchestration to reward modeling.

Optimizers: We took the most salient ML techniques and reformulated them as optimizers against our scoring system: for DSPy, the scoring system acts as its validator; for GRPO, it acts as its reward model.
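To make the scorer side concrete, here is a rough Python sketch of the tree-of-metrics idea. This is purely illustrative and not the Pi API: the Dimension/total_score names, the example dimensions and weights, and the stubbed encoder call are all invented for the sketch.

    # Illustrative sketch only -- not the Pi API. Dimension names, weights, and the
    # stubbed encoder call are invented; Pi would use a fine-tuned encoder model here.
    from dataclasses import dataclass
    from typing import Callable, Dict

    @dataclass
    class Dimension:
        name: str
        weight: float
        score_fn: Callable[[str, str], float]  # (input, output) -> score in [0, 1]

    def nl_dimension(question: str) -> Callable[[str, str], float]:
        """Stand-in for an encoder-model judgment of a free-form, qualitative question."""
        def score(inp: str, out: str) -> float:
            # e.g. a ModernBERT-style encoder with a regression head, scoring
            # `question` against (inp, out); stubbed with a constant here.
            return 0.5
        return score

    def length_dimension(inp: str, out: str) -> float:
        """Quantitative dimension expressed as plain Python."""
        return min(len(out.split()) / 200.0, 1.0)  # saturates at ~200 words

    TREE = [
        Dimension("answers_the_question", 0.5, nl_dimension("Does the response answer the user's question?")),
        Dimension("cites_sources",        0.3, nl_dimension("Does the response cite its sources?")),
        Dimension("appropriate_length",   0.2, length_dimension),
    ]

    def score_breakdown(inp: str, out: str) -> Dict[str, float]:
        """Per-dimension scores: this is what makes the result interpretable."""
        return {d.name: d.score_fn(inp, out) for d in TREE}

    def total_score(inp: str, out: str) -> float:
        """Non-linear weighted combination; calibration would re-tune the weights."""
        per_dim = score_breakdown(inp, out)
        weighted = sum(d.weight * per_dim[d.name] for d in TREE)
        return weighted * min(per_dim.values())  # one bad dimension drags the total down

Calibration with labeled or preference data would then adjust the weights (and the combination) rather than the tree structure itself, which stays interpretable and editable.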
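On the optimizer side, the same scorer can be dropped in wherever a technique expects a quality signal. As one example, assuming TRL's GRPOTrainer interface (reward functions receive prompts and completions and return one float per completion) and reusing the toy total_score above, the hookup might look roughly like this; the model and dataset names are placeholders, and this is a sketch rather than our integration code:

    # Sketch: using the scoring tree as a GRPO reward model via TRL.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    def scorer_reward(prompts, completions, **kwargs):
        # One reward per completion, straight from the scoring tree.
        return [total_score(p, c) for p, c in zip(prompts, completions)]

    trainer = GRPOTrainer(
        model="meta-llama/Llama-3.2-1B-Instruct",   # placeholder base model
        reward_funcs=scorer_reward,
        args=GRPOConfig(output_dir="grpo-with-scorer"),
        train_dataset=load_dataset("trl-lib/tldr", split="train"),  # placeholder dataset
    )
    trainer.train()

The DSPy hookup is analogous: the same score becomes the metric its optimizers maximize. Recalibrating the scorer and re-running these loops is what "recompiling the optimizers" means in practice.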
We’re keen to hear the community’s feedback on which techniques to add next.

Overall stack: the playgrounds run on Next.js and Vercel. On the AI side: Runpod and GCP for training GPUs, TRL for training algorithms, ModernBERT and Llama as base models, and GCP and Azure for 4o and Anthropic calls.

We’d love your feedback and perspectives: our team will be around to answer questions and discuss. If there’s a lot of interest, we’re happy to host a live session!

- Achint, co-founder of Pi Labs

[1] https://ift.tt/o1A3MzJ