CUDA 4
- From Confusion to Colors: My Journey Learning Parallel Image Magic with CUDA Threads
- From Matrix Multiplication to Warp Optimizations — My Journey and Insights.
- My Journey Optimizing Attention: Why My First CUDA Optimization Barely Worked
- GPUs Are Lazy: Why Your Matrix Multiplication Is Wasting 37% Memory.