Archives
- 22 Oct From Confusion to Colors: My Journey Learning Parallel Image Magic with CUDA Threads
- 14 Aug From Matrix Multiplication to Warp Optimizations — My Journey and Insights
- 05 Aug My Journey Optimizing Attention: Why My First CUDA Optimization Barely Worked
- 27 Jul How HyLoRA Squeezed TinyLlama by 51% Without Killing Performance
- 27 Jul GPUs Are Lazy: Why Your Matrix Multiplication Is Wasting 37% Memory
- 15 Jun AI Action Item Extractor: Meeting Dialogue to JSON