Show HN: QwQ-32B APIs – o1 like reasoning at 1% the cost

Ubicloud is an open source alternative to AWS. Today, we launched our inference APIs, built with open source AI models. QwQ-32B-Preview is one of those models, and it can provide o1-like reasoning at 1% the cost.

QwQ is licensed under Apache 2.0 [1] and Ubicloud under AGPL v3. We deploy open models on a cloud stack that can run anywhere. This allows us to offer great price / performance.

From an accuracy standpoint, QwQ does well in math and coding domains. For example, in the MMLU-Pro Computer Science LLM Benchmark, the accuracy rankings are as follows: Claude-3.5 Sonnet (82.5), QwQ-32B-Preview (79.1), and GPT 4o 2024-11-20 (73.1). [2]

You can start evaluating QwQ (and Llama 3B / 70B) by logging into the Ubicloud console: https://ift.tt/QIvkqes We also provide an AI chat box for convenience.

We price the API endpoints at $0.60 per M tokens, or 100x lower than o1’s output token price. Also, when using open models, your first million tokens each month are free. This way, you can start evaluating these models today.

## OpenAI o1 or QwQ-32B

In math and coding benchmarks, QwQ-32B ties with o1 and outperforms Claude 3.5 Sonnet. In our qualitative tests, we found o1 to perform better.

For example, we asked both models to “add a pair of parentheses to the incorrect equation: 1 + 2 * 3 + 4 * 5 + 6 * 7 + 8 * 9 = 479, to make the equation true.” [3] QwQ’s answer shows iterative reasoning steps, where the model enumerates over answers using light heuristics. o1’s answer to the same question feels like an iterative deepen-and-test (though not purely depth-first).

When we asked the models harder questions, it felt that o1 could understand the question better and employ more complex strategies. [3][4]

Finally, we found that o1’s advantage in reasoning compounded with other ones. For example, we asked both models to write example Python programs. Looking at the answers, it became clear that o1 was trained on a larger data set and that it was aware of Python libraries that QwQ-32B didn’t know about. Further, QwQ-32B at times flip-flopped between English and Chinese, making it harder for us to understand the model. [3]

Now, if we think that o1 has these advantages, why the heck are we doing a Show HN on QwQ-32B (and other open-weight models)? Two reasons.

First, QwQ is still comparable to o1 and Ubicloud offers it for 100x less. You can employ a dozen QwQ-32Bs, prompt them with different search strategies, use VMs to verify their results, and still come in under what o1 costs. In the short term, combining these classic AI search strategies with AI models feels much more efficient than trying to “teach” an uber AI model.

Second, we think open source fosters collaboration and trust, and that is its superpower that compounds over time. We foresee a future where open source AI not only delivers top-quality results, but also surpasses proprietary models in some areas. If you believe in that future and are looking for someone to partner with on the infrastructure side, please hit us up at info@ubicloud.com!

[1] https://ift.tt/Z78A1kH
[2] https://ift.tt/C4pn8cL...
[3] https://ift.tt/fbOzagW
[4] https://ift.tt/DM9ylm5

January 15, 2025 at 08:59PM
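
The parentheses puzzle quoted in the post is small enough to check by exhaustive search, which gives a baseline for judging either model's answer. The sketch below is not taken from the post or from either model's output; the function name `parenthesize_to_target` is made up for illustration, and it assumes "a pair of parentheses" means wrapping one contiguous run of two or more operands.

```python
def parenthesize_to_target(tokens, target):
    """Return every expression, built by wrapping one contiguous run of
    operands in a single pair of parentheses, that evaluates to target.

    tokens alternates operands and operators: ["1", "+", "2", "*", ...].
    """
    operand_positions = range(0, len(tokens), 2)  # operands sit at even indices
    hits = []
    for start in operand_positions:
        for end in operand_positions:
            if end <= start:
                continue  # parentheses must enclose at least two operands
            parts = list(tokens)
            parts[start] = "(" + parts[start]
            parts[end] = parts[end] + ")"
            expr = " ".join(parts)
            # eval is acceptable here only because the input is a fixed,
            # trusted arithmetic string; do not use it on untrusted text.
            if eval(expr) == target:
                hits.append(expr)
    return hits


TOKENS = "1 + 2 * 3 + 4 * 5 + 6 * 7 + 8 * 9".split()
solutions = parenthesize_to_target(TOKENS, 479)
for expr in solutions:
    print(expr)
```

One placement the search finds is `1 + 2 * (3 + 4 * 5 + 6) * 7 + 8 * 9`, since 2 * 29 * 7 = 406 and 1 + 406 + 72 = 479. With nine operands there are only 36 candidate placements, so a check like this is the kind of cheap verification step that could run alongside several model calls.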
Show HN: QwQ-32B APIs – o1 like reasoning at 1% the cost https://ift.tt/my0DGUf