Show HN: Made a batching LLM API for a project. Mistral 200 tk/s on RTX 3090

I was running into a vLLM bug that affected multiple GPUs, and I needed a stand-in while that bug was getting fixed: something that used the same API format but performed better than the API in text-generation-webui. It's very rough, and I'm not a coder by trade, but it's very fast once you have many simultaneous connections.

https://ift.tt/I02aRKr

December 27, 2023 at 01:22AM
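To illustrate why throughput climbs with many simultaneous connections, here is a minimal sketch of request batching in Python. It is not the code from the linked project. The MicroBatcher class, its generate_batch callback, and the max_batch_size and max_wait_s parameters are all hypothetical names chosen for this example, and the "model" is a stand-in that upper-cases its prompts. The idea is that concurrent requests are queued, grouped for a short window, and passed to the model in one call instead of one call per request.

```python
import asyncio


class MicroBatcher:
    """Groups concurrent requests into one batched model call (illustrative only)."""

    def __init__(self, generate_batch, max_batch_size=8, max_wait_s=0.01):
        # generate_batch is a hypothetical callable: list[str] -> list[str]
        self.generate_batch = generate_batch
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.queue = asyncio.Queue()

    async def submit(self, prompt):
        """Called once per client request; resolves when its batch has run."""
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((prompt, fut))
        return await fut

    async def run(self):
        """Worker loop: wait for one request, then collect more until full or timed out."""
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]
            deadline = loop.time() + self.max_wait_s
            while len(batch) < self.max_batch_size:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            prompts = [prompt for prompt, _ in batch]
            try:
                # A real GPU call would block; it would normally run in an executor.
                outputs = self.generate_batch(prompts)
            except Exception as exc:
                for _, fut in batch:
                    fut.set_exception(exc)
                continue
            for (_, fut), output in zip(batch, outputs):
                fut.set_result(output)


async def demo():
    calls = []  # records the size of each batch the stand-in model receives

    def fake_generate_batch(prompts):
        calls.append(len(prompts))
        return [p.upper() for p in prompts]

    batcher = MicroBatcher(fake_generate_batch, max_batch_size=4, max_wait_s=0.05)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit(f"prompt {i}") for i in range(4)))
    worker.cancel()
    return results, calls


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With a single client the batch size stays at one and nothing is gained. The benefit only shows up when several requests arrive within the wait window, which matches the behaviour described in the post.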