Show HN: FP32 matmul of large matrices up to 24% faster than cuBLAS on a 4090

I decided to share a CUDA kernel I wrote over 5 months ago. Nvidia's hardware and software may surprise you. https://ift.tt/3Ram24O

August 1, 2024 at 12:09AM
Show HN: FP32 matmul of large matrices up to 24% faster than cuBLAS on a 4090 https://ift.tt/eYpMOTt