Deep Learning (2 posts)

My Journey Optimizing Attention: Why My First CUDA Optimization Barely Worked
Aug 5, 2025

GPUs Are Lazy: Why Your Matrix Multiplication Is Wasting 37% Memory.
Jul 27, 2025