About

My journey started with a simple question: why are the most elegant deep learning models so computationally inefficient?

To answer this, I first went deep into the fundamentals of Deep Learning and the Linear Algebra that underpins modern architectures like the Transformer. But understanding the ‘why’ wasn’t enough; I needed to understand the ‘how’. That drove me down the stack, to the bare metal of GPU architecture.

To bridge this gap, I taught myself CUDA C++ from first principles, focusing on how fundamental operations are actually executed on the hardware. My work now centers on analyzing these core mechanisms and writing custom, I/O-aware kernels for operations like attention and convolution to solve concrete performance puzzles.
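
To give a concrete flavor of what "I/O-aware" means here, the sketch below shows the textbook form of the idea: a tiled matrix multiply that stages blocks of its inputs in shared memory so each value is fetched from global memory far fewer times. It is a minimal illustration of the technique, not one of the attention or convolution kernels described above; the tile size and matrix dimensions are arbitrary choices for the example.

```cuda
// Minimal illustrative sketch of an I/O-aware kernel: a shared-memory tiled
// matrix multiply. Tile size and problem sizes are arbitrary example values.
#include <cuda_runtime.h>
#include <cstdio>

constexpr int TILE = 32;  // tile width; in practice tuned per GPU

// C = A * B, with A of shape (M, K), B of shape (K, N), row-major floats.
__global__ void tiled_matmul(const float* A, const float* B, float* C,
                             int M, int N, int K) {
    __shared__ float As[TILE][TILE];  // tile of A staged on-chip
    __shared__ float Bs[TILE][TILE];  // tile of B staged on-chip

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    // Walk across the K dimension one tile at a time.
    for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
        int a_col = t * TILE + threadIdx.x;
        int b_row = t * TILE + threadIdx.y;
        As[threadIdx.y][threadIdx.x] = (row < M && a_col < K) ? A[row * K + a_col] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (b_row < K && col < N) ? B[b_row * N + col] : 0.0f;
        __syncthreads();  // both tiles fully loaded before use

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // finish using the tiles before overwriting them
    }

    if (row < M && col < N)
        C[row * N + col] = acc;
}

int main() {
    int M = 256, N = 256, K = 256;  // small example sizes
    float *A, *B, *C;
    cudaMallocManaged(&A, M * K * sizeof(float));
    cudaMallocManaged(&B, K * N * sizeof(float));
    cudaMallocManaged(&C, M * N * sizeof(float));
    for (int i = 0; i < M * K; ++i) A[i] = 1.0f;
    for (int i = 0; i < K * N; ++i) B[i] = 1.0f;

    dim3 block(TILE, TILE);
    dim3 grid((N + TILE - 1) / TILE, (M + TILE - 1) / TILE);
    tiled_matmul<<<grid, block>>>(A, B, C, M, N, K);
    cudaDeviceSynchronize();

    printf("C[0] = %f (expected %d)\n", C[0], K);  // sanity check: all-ones inputs
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```

The same stage-and-reuse pattern, applied to blocks of queries, keys, and values, is the core idea behind I/O-aware attention kernels.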


My Core Interests

My focus lies at the intersection of deep learning theory, systems programming, and high-performance computing.

  • 🧠 Deep Learning & Model Optimization: Applying first-principles knowledge to analyze, compress, and prune complex models like Transformers and CNNs.

  • ⚡ High-Performance Kernel Development: Writing custom C++ and CUDA kernels from scratch to accelerate core computational bottlenecks (e.g., attention, convolution, sparse matrix operations).

  • ⚙️ AI Systems Research/Engineering: Bridging the gap between algorithmic theory and hardware-efficient implementation to build powerful and accessible AI.