Show HN: FP32 matmul of large matrices up to 24% faster than cuBLAS on a 4090
I decided to share a CUDA kernel I wrote over 5 months ago. Nvidia's hardware and software may surprise you. https://ift.tt/3Ram24O
August 1, 2024 at 12:09AM
Show HN: FP32 matmul of large matrices up to 24% faster than cuBLAS on a 4090 https://ift.tt/eYpMOTt