Context in LLMs: What Determines It, What It Costs, and What Actually Works
A comprehensive deep dive into how LLMs reason over long-horizon tasks, the mechanics behind context length (positional encodings, attention cost, KV cache), and why smart agents with surgical retrieval beat brute-force long-context windows.
GPU Fundamentals & LLM Inference Mental Models
Build intuition for LLM inference from first principles: GPU architecture (A100), the roofline model, memory estimation, arithmetic intensity, and latency.
Serving LLMs with vLLM on RunPod: A Complete Guide
A deep dive into self-hosting LLMs using vLLM on RunPod. Covers the full architecture from GPU containers to API endpoints, explains PagedAttention and continuous batching, and includes benchmark results with cost analysis.
From Scratch Implementation of ResShift Paper for Image Super-Resolution
A deep dive into implementing the ResShift paper from scratch for efficient diffusion-based image super-resolution. Learn about U-Net architecture with Swin Transformer blocks, residual shifting mechanisms, and building state-of-the-art image enhancement models.
From Scratch Implementation of RNN, LSTM, and BiLSTM: What I Learned
Exploring the inner workings of Recurrent Neural Networks by implementing RNN, LSTM, and Bidirectional LSTM from scratch. Covers forward propagation, backpropagation through time (BPTT), and key insights from building these architectures.
ML Training Optimization: FLOPs, Profiling, and Learning Strategies
A comprehensive guide to optimizing machine learning training, covering computational constraints, performance profiling, and learning strategies that can save significant costs and time.
From Quantization to Inference: A Beginner's Guide to Practical Fine-tuning
A beginner-friendly guide that bridges the gap between quantization and inference, with practical insights into fine-tuning techniques.
Building GPT from First Principles: Code and Intuition
An intuitive and code-driven exploration of building GPT models from scratch, unraveling the principles behind their architecture.
Understanding Quantization in Deep Learning
A comprehensive guide to memory optimization in deep learning, focusing on quantization techniques and their practical implementation in modern neural networks.
A Guide to Fine-tuning Methods in LLMs (Part 1)
A deep dive into modern fine-tuning techniques for Large Language Models, exploring methods like LoRA, QLoRA, and their practical implementations.