Show HN: Open-source study to measure end user satisfaction levels with LLMs

The LLM Challenge, an online study, aims to answer a simple question: what is the quality corridor that matters to end users when interacting with LLMs? At what point do users stop perceiving a quality difference, and at what point do they get frustrated by poor LLM quality?

The project is Apache 2.0-licensed open source, available on GitHub: https://ift.tt/GDCmJE6 . The challenge itself is hosted on AWS as a single-page web app: users see greeting text, followed by a randomly selected prompt and an LLM response, which they rate on a 1-5 Likert scale (or a yes/no rating) matched to the task in the prompt.

The study uses pre-generated prompts across popular real-world use cases: information extraction and summarization; creative tasks like writing a blog post or story; and problem-solving tasks like extracting the central ideas from a passage, writing business emails, or brainstorming ideas to solve a problem at work or school. To generate responses of varying quality, the study uses the following OSS LLMs: Qwen2-0.5B-Instruct, Qwen2-1.5B-Instruct, gemma-2-2b-it, Qwen2-7B-Instruct, Phi-3-small-128k-instruct, Qwen2-72B, and Meta-Llama-3.1-70B. For proprietary LLMs, we limited our choices to Claude 3 Haiku, Claude 3.5 Sonnet, OpenAI GPT-3.5-Turbo, and OpenAI GPT-4o.

Today, LLM vendors are in a race to one-up each other on benchmarks like MMLU, MT-Bench, and HellaSwag, which are designed and rated primarily by human experts. But as LLMs get deployed in the real world for end users and productivity workers, there hasn't been a study (as far as we know) that helps researchers and developers understand the impact of model selection as perceived by end users.
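The rating flow described above can be sketched as follows. This is a minimal illustration, not code from the repo: all class and function names here are hypothetical, and the real app's data model will differ.

```python
import random
from dataclasses import dataclass

# Hypothetical sketch of the challenge's rating flow:
# show a randomly selected pre-generated prompt/response pair,
# then record the end user's Likert score for it.

@dataclass
class Sample:
    prompt: str
    model: str     # e.g. "Qwen2-0.5B-Instruct" or "Claude 3.5 Sonnet"
    response: str  # pre-generated LLM output for the prompt

@dataclass
class Rating:
    sample: Sample
    score: int     # Likert 1-5 (yes/no tasks could map to 1/5)

def pick_sample(pool: list[Sample]) -> Sample:
    """Randomly select one pre-generated prompt/response pair to show."""
    return random.choice(pool)

def record_rating(sample: Sample, score: int) -> Rating:
    """Validate and store an end user's Likert rating."""
    if not 1 <= score <= 5:
        raise ValueError("Likert score must be between 1 and 5")
    return Rating(sample=sample, score=score)

pool = [
    Sample("Summarize this passage: ...", "Qwen2-0.5B-Instruct", "..."),
    Sample("Write a short blog post about ...", "Claude 3.5 Sonnet", "..."),
]
rating = record_rating(pick_sample(pool), 4)
print(rating.sample.model, rating.score)
```

Because ratings stay tied to the (hidden) model that produced each response, aggregating them per model is what would reveal the quality corridor the study is after.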
This study aims to produce insights that help incorporate human-centric benchmarks into building generative AI applications and LLMs. If you want to contribute to the AI community in an open-source way, we'd love for you to take the challenge. We'll publish the study results on GitHub in 30 days. https://ift.tt/HR86ucx

August 27, 2024 at 11:18PM