Tags attention1 bottleneck1 Cuda1 cuda1 fine-tuning1 Fine‑Tuning1 gpu4 HyLoRA1 json1 learning-journey2 LLaMA1 llm1 LoRA1 matrix multiplication1 matrix-multiplication2 mistral1 nlp1 optimization4 parallel-computing2 peft1 performance4 Performance Trade‑off1 Perplexity1 phi-41 profiling3 qlora1 shared memory1 shared-memory2 structured-data-extraction1 SVD1 tiling2 transformer1 transformers1 warp-divergence2