Show HN: I built an OSS alternative to Azure OpenAI services

Hey HN, I am proud to show you guys that I have built an open source alternative to Azure OpenAI services. Azure OpenAI services was born out of companies needing enhanced security and access control when using different GPT models. I wanted to build an OSS version of Azure OpenAI services that people can self-host in their own infrastructure.

"How can I track LLM spend per API key?"

"Can I create a development OpenAI API key with limited access for Bob?"

"Can I see my LLM spend breakdown by models and endpoints?"

"Can I create 100 OpenAI API keys that my students could use in a classroom setting?"

These are questions that BricksLLM helps you answer. BricksLLM is an API gateway that lets you create API keys with rate limits, cost controls, and TTLs, which can then be used to access all OpenAI and Anthropic endpoints, with out-of-the-box analytics.

When I first started building with the OpenAI APIs, I was constantly worried about API keys being compromised, since a vanilla OpenAI API key grants unlimited access to all of their models. There are stories of people losing thousands of dollars, and a black market for stolen OpenAI API keys exists. This is why I started building a proxy for ourselves that allows for the creation of API keys with rate limits and cost controls.

I built BricksLLM in Go since that was the language I used to build performant ad exchanges that scaled to thousands of requests per second at my previous job. A lot of developer tools in LLM ops are built with Python, which I believe can be suboptimal in terms of performance and compute resource efficiency.

One of the challenges in building this platform is getting accurate token counts for different OpenAI and Anthropic models. LLM providers are not exactly transparent about how they count prompt and completion tokens. In addition to user input, OpenAI and Anthropic pad prompt inputs with additional instructions or phrases that contribute to the final token counts. For example, Anthropic's actual completion token consumption is consistently 4 more than the token count of the completion output.

The latency of the gateway hovers around 50ms. Half of that comes from the tokenizer. If I start utilizing goroutines, I might be able to lower the gateway's latency to 30ms.

BricksLLM is not an observability platform, but we do provide a Datadog integration so you can get more insight into what is going on inside the proxy.

Compared to other tools in the LLMOps space, I believe BricksLLM has the most comprehensive feature set when it comes to access control.

Let me know what you guys think.

https://ift.tt/7mKEhnf
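
To make the key-scoping idea concrete, here is a rough Go sketch of provisioning a capped key through a BricksLLM-style admin endpoint and then calling the OpenAI-compatible proxy with it. The ports, endpoint paths, and JSON field names below are illustrative assumptions, not the project's documented API.

```go
// Illustrative sketch: create a scoped key with a cost limit, rate limit,
// and TTL via a hypothetical admin endpoint, then use that key against the
// gateway's OpenAI-compatible proxy instead of api.openai.com.
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Hypothetical admin call: a key capped at $10, 5 requests/second, 30-day TTL.
	keyReq := []byte(`{
		"name": "bob-dev-key",
		"key": "bob-secret-key",
		"costLimitInUsd": 10,
		"rateLimitOverTime": 5,
		"rateLimitUnit": "s",
		"ttl": "720h"
	}`)
	req, _ := http.NewRequest(http.MethodPut,
		"http://localhost:8001/api/key-management/keys", bytes.NewReader(keyReq))
	req.Header.Set("Content-Type", "application/json")
	if _, err := http.DefaultClient.Do(req); err != nil {
		panic(err)
	}

	// The generated key replaces a vanilla OpenAI key; only the base URL changes.
	chatReq := []byte(`{"model":"gpt-3.5-turbo","messages":[{"role":"user","content":"hello"}]}`)
	proxyReq, _ := http.NewRequest(http.MethodPost,
		"http://localhost:8002/api/providers/openai/v1/chat/completions", bytes.NewReader(chatReq))
	proxyReq.Header.Set("Authorization", "Bearer bob-secret-key")
	proxyReq.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(proxyReq)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(body))
}
```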
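On the token-counting point, a minimal sketch of the adjustment described above: when metering Anthropic completions, tokenize the output text and add the constant padding observed empirically (4 tokens). The tokenizer here is a placeholder, not a real Claude tokenizer.

```go
// Sketch of the empirical completion-token adjustment for Anthropic.
package main

import "fmt"

// anthropicPaddingTokens is the observed gap between the tokenized
// completion output and what Anthropic actually counts.
const anthropicPaddingTokens = 4

// countTokens stands in for a real tokenizer implementation.
func countTokens(text string) int {
	return len(text) / 4 // crude placeholder heuristic
}

// billedCompletionTokens estimates what the provider will meter.
func billedCompletionTokens(completion string) int {
	return countTokens(completion) + anthropicPaddingTokens
}

func main() {
	out := "Hello! How can I help you today?"
	fmt.Printf("tokenized: %d, metered (with padding): %d\n",
		countTokens(out), billedCompletionTokens(out))
}
```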
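And for the latency idea, a sketch of what the goroutine change might look like: tokenize the prompt and the completion concurrently instead of sequentially, so the tokenizer's share of the gateway latency is bounded by the slower of the two rather than their sum. This is an assumption about where the parallelism would go, with a placeholder tokenizer.

```go
// Sketch: concurrent prompt/completion tokenization with goroutines.
package main

import (
	"fmt"
	"sync"
)

func countTokens(text string) int {
	return len(text) / 4 // placeholder for a real BPE tokenizer
}

// countPromptAndCompletion runs both token counts in parallel.
func countPromptAndCompletion(prompt, completion string) (int, int) {
	var wg sync.WaitGroup
	var promptTokens, completionTokens int

	wg.Add(2)
	go func() {
		defer wg.Done()
		promptTokens = countTokens(prompt)
	}()
	go func() {
		defer wg.Done()
		completionTokens = countTokens(completion)
	}()
	wg.Wait()

	return promptTokens, completionTokens
}

func main() {
	p, c := countPromptAndCompletion("What is Go?", "Go is a programming language.")
	fmt.Println("prompt tokens:", p, "completion tokens:", c)
}
```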