Show HN: The fastest way to run Mixtral 8x7B on Apple Silicon Macs

I originally launched my app, Private LLM[1][2], on HN around 10 months ago with a single RedPajama Chat 3B model. The app has come a long way since then. About a month ago, I added support for a 4-bit OmniQuant quantized Mixtral 8x7B Instruct model. It seems to outperform Q4 models at inference speed and Q8 models at text generation quality, while consuming only about 24GB of RAM[3] at 8k context length. The trick is a) to use a better quantization algorithm and b) to leave the embeddings and the MoE gates unquantized (the overhead is quite small).

Other notable features include many more downloadable models, support for App Intents (Siri, Apple Shortcuts), on-device grammar correction, summarization, etc. via macOS services, and an iOS version (universal app), also with many smaller downloadable models and support for App Intents. There's a small community of users building and sharing LLM-based shortcuts on the app's Discord.

Last week, I also shipped support for the bilingual Yi-34B Chat model, which consumes ~18GB of RAM. iOS users and users with low-memory Macs can download the related Yi-6B Chat model.

Unlike most popular offline LLM apps out there, this app uses mlc-llm for inference and not llama.cpp. Also, all models in the app are quantized with OmniQuant[4] and not RTN quantization.

[1]: https://privatellm.app/
[2]: https://ift.tt/YunixZm
[3]: https://www.youtube.com/watch?v=4AE8yXIWSAA
[4]: https://ift.tt/7iyemCF

April 8, 2024 at 09:37PM
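The post's remark that leaving the embeddings and MoE gates unquantized adds little overhead can be sanity-checked with rough arithmetic. The sketch below is not taken from the app. It assumes the publicly documented Mixtral 8x7B shapes (hidden size 4096, 32 layers, 8 experts, vocabulary of 32000), assumes "unquantized" means 16-bit weights, counts only the input embedding table and the per-layer router matrices, and ignores the scale and zero-point storage that a real 4-bit format also needs.

```python
# Rough estimate only. Shapes are assumed from the public Mixtral 8x7B config,
# and "unquantized" is assumed to mean 16-bit weights.
HIDDEN = 4096    # model hidden size
LAYERS = 32      # transformer layers
EXPERTS = 8      # experts per MoE layer
VOCAB = 32000    # vocabulary size


def extra_bytes(params, full_bits=16, quant_bits=4):
    """Extra memory from storing `params` weights at full_bits instead of quant_bits."""
    return params * (full_bits - quant_bits) // 8


embed_params = VOCAB * HIDDEN             # input token embedding table
gate_params = LAYERS * HIDDEN * EXPERTS   # one router (gate) matrix per layer

overhead = extra_bytes(embed_params) + extra_bytes(gate_params)
print(f"extra memory: {overhead / 2**20:.1f} MiB")
```

Under these assumptions the extra cost is on the order of a couple of hundred MiB, which is small next to the roughly 24GB total quoted in the post. The gates in particular account for only about a million parameters.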
Show HN: The fastest way to run Mixtral 8x7B on Apple Silicon Macs https://ift.tt/8j52txs