Show HN: Voice bots with 500ms response times

Last year, when GPT-4 was released, I started making lots of little voice + LLM experiments. Voice interfaces are fun; there are several interesting new problem spaces to explore.

I'm convinced that voice is going to be a bigger and bigger part of how we all interact with generative AI. But one thing that's hard, today, is building voice bots that respond as quickly as humans do in conversation. A 500ms voice-to-voice response time is just barely possible with today's AI models.

You can get down to 500ms if you: host transcription, LLM inference, and voice generation all together in one place; are careful about how you route and pipeline all the data; and the gods of both Wi-Fi and VRAM caching smile on you.

Here's a demo of a 500ms-capable voice bot, plus a container you can deploy to run it yourself on an A10/A100/H100 if you want to:

https://ift.tt/2nZARW8

We've been collecting lots of metrics. Here are typical numbers (in milliseconds) for all the easily measurable parts of the voice-to-voice response cycle (TTFB = time to first byte):

  macOS mic input                  40
  Opus encoding                    30
  network stack and transit        10
  packet handling                   2
  jitter buffer                    40
  Opus decoding                    30
  transcription and endpointing   200
  LLM TTFB                        100
  sentence aggregation            100
  TTS TTFB                         80
  Opus encoding                    30
  packet handling                   2
  network stack and transit        10
  jitter buffer                    40
  Opus decoding                    30
  macOS speaker output             15
  -----------------------------------
  total ms                        759

Everything in AI is changing all the time. LLMs with native audio input and output capabilities will likely make it easier to build fast-responding voice bots soon. But for the moment, I think this is the fastest possible approach and tech stack.

June 27, 2024 at 03:21AM
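As a quick sanity check on the budget, here is a small Python sketch (the post itself contains no code) that sums the typical per-stage numbers listed above and subtotals them. The group labels ("device", "codec", "network", "ai") are a hypothetical grouping added for illustration, not categories from the post. Like the post's own total, it treats the stages as sequential steps on the path to the first audible response, so their latencies simply add.

```python
from collections import defaultdict

# (stage, typical ms, group). Stage names and ms values are the post's typical
# numbers, in pipeline order; the group labels are a hypothetical grouping.
STAGES = [
    ("macOS mic input", 40, "device"),
    ("Opus encoding", 30, "codec"),
    ("network stack and transit", 10, "network"),
    ("packet handling", 2, "network"),
    ("jitter buffer", 40, "network"),
    ("Opus decoding", 30, "codec"),
    ("transcription and endpointing", 200, "ai"),
    ("LLM TTFB", 100, "ai"),
    ("sentence aggregation", 100, "ai"),
    ("TTS TTFB", 80, "ai"),
    ("Opus encoding", 30, "codec"),
    ("packet handling", 2, "network"),
    ("network stack and transit", 10, "network"),
    ("jitter buffer", 40, "network"),
    ("Opus decoding", 30, "codec"),
    ("macOS speaker output", 15, "device"),
]


def total_ms(stages):
    """Sum per-stage latencies, treating the stages as strictly sequential."""
    return sum(ms for _, ms, _ in stages)


def by_group(stages):
    """Subtotal the latencies for each (hypothetical) group label."""
    totals = defaultdict(int)
    for _, ms, group in stages:
        totals[group] += ms
    return dict(totals)


if __name__ == "__main__":
    print(f"total: {total_ms(STAGES)} ms")
    # Largest contributors first.
    for group, ms in sorted(by_group(STAGES).items(), key=lambda kv: kv[1], reverse=True):
        print(f"{group:>8}: {ms:4d} ms")
```

Run as a script, it prints the 759 ms total followed by the group subtotals. With these typical numbers, the four model-side stages (transcription and endpointing, LLM TTFB, sentence aggregation, TTS TTFB) come to 480 ms, and the audio capture/playback, codec, and network stages account for the other 279 ms.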