Show HN: open source framework OpenAI uses for Advanced Voice

Hey HN, we've been working with OpenAI for the past few months on the new Realtime API. The goal is to give everyone access to the same stack that underpins Advanced Voice in the ChatGPT app.

Under the hood it works like this:

- A user's speech is captured by a LiveKit client SDK in the ChatGPT app
- Their speech is streamed using WebRTC to OpenAI’s voice agent
- The agent relays the speech prompt over websocket to GPT-4o
- GPT-4o runs inference and streams speech packets (over websocket) back to the agent
- The agent relays generated speech using WebRTC back to the user’s device

The Realtime API that OpenAI launched is the websocket interface to GPT-4o. This backend framework covers the voice agent portion. Besides having additional logic like function calling, the agent fundamentally proxies WebRTC to websocket.

The reason for this is that websocket isn’t the best choice for client-server communication. The vast majority of packet loss occurs between a server and a client device, and websocket doesn’t provide programmatic control or intervention in lossy network environments like WiFi or cellular. Packet loss leads to higher latency and choppy or garbled audio.

https://ift.tt/9tyHjUc

October 4, 2024 at 10:31PM
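To make the packet-loss point concrete, here is a toy model in Python. It is only a sketch, not LiveKit's or OpenAI's code. It assumes a websocket-style stream is reliable and ordered (a lost frame is resent and holds back everything behind it), while a WebRTC-style media stream simply lets the lost frame go. The function name `ready_times`, the 20 ms frame size, the 100 ms round trip, and the "one extra round trip per retransmit" rule are all illustrative assumptions.

```python
from typing import List, Optional, Set


def ready_times(n_frames: int, lost: Set[int], ordered: bool,
                frame_ms: int = 20, rtt_ms: int = 100) -> List[Optional[int]]:
    """Return the time (ms) at which each audio frame becomes playable.

    ordered=True  : reliable, in-order delivery (websocket-like, assumed model).
    ordered=False : lossy delivery where a lost frame is skipped (None).
    """
    one_way = rtt_ms // 2
    ready: List[Optional[int]] = []
    blocked_until = 0  # ordered mode: nothing is released before this time
    for i in range(n_frames):
        sent = i * frame_ms
        if i in lost:
            if not ordered:
                ready.append(None)  # frame never arrives; receiver hears a gap
                continue
            # Assumption: the resend costs one extra round trip.
            arrival = sent + one_way + rtt_ms
        else:
            arrival = sent + one_way
        if ordered:
            # In-order delivery: later frames wait behind the resent one.
            blocked_until = max(blocked_until, arrival)
            ready.append(blocked_until)
        else:
            ready.append(arrival)
    return ready


if __name__ == "__main__":
    print("ordered:", ready_times(5, {1}, ordered=True))
    print("lossy:  ", ready_times(5, {1}, ordered=False))
```

With frame 1 lost, the ordered case releases frames 1 through 4 together at 170 ms, so one drop turns into extra latency for everything queued behind it. In the lossy case only frame 1 is missing and the rest stay on schedule, which leaves the receiver free to decide how to cover the gap.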
Show HN: open source framework OpenAI uses for Advanced Voice https://ift.tt/Zj5ITgO