Show HN: open source framework OpenAI uses for Advanced Voice

Hey HN, we've been working with OpenAI for the past few months on the new Realtime API. The goal is to give everyone access to the same stack that underpins Advanced Voice in the ChatGPT app.

Under the hood it works like this:

- A user's speech is captured by a LiveKit client SDK in the ChatGPT app
- Their speech is streamed using WebRTC to OpenAI’s voice agent
- The agent relays the speech prompt over websocket to GPT-4o
- GPT-4o runs inference and streams speech packets (over websocket) back to the agent
- The agent relays the generated speech using WebRTC back to the user’s device

The Realtime API that OpenAI launched is the websocket interface to GPT-4o. This backend framework covers the voice agent portion. Besides adding logic like function calling, the agent fundamentally proxies WebRTC to websocket.

The reason is that websocket isn’t the best choice for client-server communication. The vast majority of packet loss occurs between a server and a client device, and websocket doesn’t provide programmatic control or intervention in lossy network environments like WiFi or cellular. Packet loss leads to higher latency and choppy or garbled audio.

https://ift.tt/9tyHjUc

October 4, 2024 at 10:31PM
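To illustrate the packet-loss point above, here is a minimal sketch (not the framework's code) of why one lost packet hurts more on an in-order, reliable stream, which is what websocket gets from running over TCP, than on a lossy datagram transport like the RTP-over-UDP media path WebRTC typically uses. The frame interval, network delay, and retransmission delay are made-up illustrative numbers, and the model ignores jitter buffers, FEC, and loss concealment.

```python
# Toy model: audio frames are sent every FRAME_MS and normally arrive
# ONE_WAY_MS later. All three constants are assumptions for illustration.
FRAME_MS = 20         # assumed audio frame interval
ONE_WAY_MS = 50       # assumed one-way network delay
RETRANSMIT_MS = 200   # assumed extra delay for a lost frame to be resent


def on_time(i):
    """Time (ms) at which frame i would arrive with no loss."""
    return i * FRAME_MS + ONE_WAY_MS


def ordered_reliable_delivery(num_frames, lost):
    """In-order reliable stream: a lost frame is retransmitted, and every
    later frame waits behind it before the application can read it."""
    delivered = []
    previous = 0
    for i in range(num_frames):
        arrival = on_time(i) + (RETRANSMIT_MS if i in lost else 0)
        previous = max(previous, arrival)  # cannot overtake earlier frames
        delivered.append(previous)
    return delivered


def lossy_datagram_delivery(num_frames, lost):
    """Datagram transport: a lost frame is simply missing (None) and
    later frames are delivered as soon as they arrive."""
    return [None if i in lost else on_time(i) for i in range(num_frames)]


def added_latency(delivered):
    """Extra delay per frame relative to the no-loss arrival time."""
    return [None if t is None else t - on_time(i)
            for i, t in enumerate(delivered)]


if __name__ == "__main__":
    lost = {1}  # pretend the second frame is lost once
    print("ordered/reliable:", added_latency(ordered_reliable_delivery(6, lost)))
    print("lossy datagrams: ", added_latency(lossy_datagram_delivery(6, lost)))
```

In this toy model the reliable stream delays every frame behind the lost one, while the datagram path loses a single 20 ms frame and keeps the rest on time. The application (or the media stack) can then decide how to cover the gap, which is the kind of control the post says websocket lacks.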
Show HN: open source framework OpenAI uses for Advanced Voice https://ift.tt/Zj5ITgO