Context in LLMs: What Determines It, What It Costs, and What Actually Works
A comprehensive deep dive into how LLMs reason over long-horizon tasks, the mechanics behind context length (positional encodings, attention cost, KV cache), and why smart agents with surgical retrieval beat brute-force long-context windows.
GPU Fundamentals & LLM Inference Mental Models
Build intuition for LLM inference from first principles: GPU architecture (A100), the roofline model, memory estimation, arithmetic intensity, and latency.
Serving LLMs with vLLM on RunPod: A Complete Guide
A deep dive into self-hosting LLMs using vLLM on RunPod. Covers the full architecture from GPU containers to API endpoints, explains PagedAttention and continuous batching, and includes benchmark results with cost analysis.
From Scratch Implementation of ResShift Paper for Image Super-Resolution
A deep dive into implementing the ResShift paper from scratch for efficient diffusion-based image super-resolution. Learn about U-Net architecture with Swin Transformer blocks, residual shifting mechanisms, and building state-of-the-art image enhancement models.
From Scratch Implementation of RNN, LSTM, and BiLSTM: What I Learned
Exploring the inner workings of Recurrent Neural Networks by implementing RNN, LSTM, and Bidirectional LSTM from scratch. Covers forward propagation, backpropagation through time (BPTT), and key insights from building these architectures.
ML Training Optimization: FLOPs, Profiling, and Learning Strategies
A comprehensive guide to optimizing machine learning training, covering computational constraints, performance profiling, and learning strategies that can save significant costs and time.
From Quantization to Inference: A Beginner's Guide to Practical Fine-tuning
A beginner-friendly guide that bridges the gap between quantization and inference, with practical insights into fine-tuning techniques.
Building GPT from First Principles: Code and Intuition
An intuitive and code-driven exploration of building GPT models from scratch, unraveling the principles behind their architecture.
Understanding Quantization in Deep Learning
A comprehensive guide to memory optimization in deep learning, focusing on quantization techniques and their practical implementation in modern neural networks.
A Guide to Fine-tuning Methods in LLMs (Part 1)
A deep dive into modern fine-tuning techniques for Large Language Models, exploring methods like LoRA, QLoRA, and their practical implementations.