Show HN: The fastest way to run Mixtral 8x7B on Apple Silicon Macs

I originally launched my app, Private LLM[1][2], on HN around 10 months ago with a single RedPajama Chat 3B model. The app has come a long way since then. About a month ago, I added support for a 4-bit OmniQuant quantized Mixtral 8x7B Instruct model. It seems to outperform Q4 models at inference speed and Q8 models at text generation quality, while consuming only about 24GB of RAM[3] at 8k context length. The trick is to a) use a better quantization algorithm and b) leave the embeddings and the MoE gates unquantized (the overhead is quite small).

Other notable features include many more downloadable models, support for App Intents (Siri, Apple Shortcuts), on-device grammar correction, summarization, etc. via macOS services, and an iOS version (universal app), also with many smaller downloadable models and support for App Intents. There's a small community of users building and sharing LLM-based shortcuts on the app's Discord.

Last week, I also shipped support for the bilingual Yi-34B Chat model, which consumes ~18GB of RAM. iOS users and users with low-memory Macs can download the related Yi-6B Chat model.

Unlike most popular offline LLM apps out there, this app uses mlc-llm for inference and not llama.cpp. Also, all models in the app are quantized with OmniQuant[4] and not RTN quantization.

[1]: https://privatellm.app/
[2]: https://ift.tt/YunixZm
[3]: https://www.youtube.com/watch?v=4AE8yXIWSAA
[4]: https://ift.tt/7iyemCF

April 8, 2024 at 09:37PM
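As a rough check on the claim that leaving the embeddings and MoE gates unquantized costs little, here is a back-of-the-envelope sketch in Python. It is not the app's actual accounting. It assumes the published Mixtral 8x7B shapes (hidden size 4096, FFN size 14336, 32 layers, 8 experts, 32000-token vocabulary, 8 KV heads of dimension 128), assumes the output head is kept in fp16 along with the input embeddings (the post does not say), and ignores quantization scales and zero points, normalization weights, the KV cache, and runtime buffers.

```python
# Back-of-the-envelope weight-memory estimate for Mixtral 8x7B.
# Shapes below are the published model config; the choice of which tensors
# stay in fp16 is an assumption made for illustration only.

HIDDEN = 4096      # model (hidden) dimension
FFN = 14336        # per-expert feed-forward dimension
LAYERS = 32
EXPERTS = 8
VOCAB = 32000
KV_DIM = 8 * 128   # 8 KV heads * head dimension 128 (grouped-query attention)


def param_counts():
    """Return parameter counts per tensor group (norm weights omitted)."""
    return {
        # input embeddings + output head (assumed: both left unquantized)
        "embed": 2 * VOCAB * HIDDEN,
        # one HIDDEN x EXPERTS router matrix per layer
        "gates": LAYERS * HIDDEN * EXPERTS,
        # q and o projections are HIDDEN x HIDDEN; k and v are HIDDEN x KV_DIM
        "attn": LAYERS * (2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM),
        # each expert has three HIDDEN x FFN matrices
        "experts": LAYERS * EXPERTS * 3 * HIDDEN * FFN,
    }


def weight_bytes(counts, bits=4, keep_fp16=()):
    """Total weight bytes when groups in keep_fp16 use 2 bytes per parameter
    and every other group uses bits/8 bytes per parameter."""
    total = 0.0
    for name, n in counts.items():
        total += n * (2.0 if name in keep_fp16 else bits / 8)
    return total


if __name__ == "__main__":
    counts = param_counts()
    all_4bit = weight_bytes(counts)
    mixed = weight_bytes(counts, keep_fp16=("embed", "gates"))
    print(f"total params:        {sum(counts.values()) / 1e9:.2f} B")
    print(f"all 4-bit weights:   {all_4bit / 1e9:.2f} GB")
    print(f"fp16 embed + gates:  {mixed / 1e9:.2f} GB")
    print(f"relative overhead:   {(mixed - all_4bit) / all_4bit:.2%}")
```

Under these assumptions the fp16 embeddings and gates add roughly 0.4 GB on top of about 23.4 GB of 4-bit weights, i.e. under 2%, because nearly all of the parameters sit in the expert feed-forward matrices.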
Show HN: The fastest way to run Mixtral 8x7B on Apple Silicon Macs https://ift.tt/8j52txs