What I built, how I built it, and what it achieved.
Complex real-world tasks need multi-step reasoning that single LLM calls can't handle — they need tool use, memory, and the ability to decompose problems.
Designed a multi-step reasoning loop with OpenAI function calling, MCP integration for external tool servers, Pydantic schemas for structured outputs, and sliding-window memory with compaction for long conversations.
Evaluated on GAIA benchmark · Web chat interface · Extensible tool registration
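The sliding-window memory with compaction can be sketched in a few lines. `SlidingWindowMemory` is an illustrative stand-in, not the project's actual class, and the truncation-based compactor below is a placeholder for what would really be an LLM summarization call:

```python
from dataclasses import dataclass, field

@dataclass
class SlidingWindowMemory:
    window: int = 6                      # max recent messages kept verbatim
    summary: str = ""                    # compacted older history
    messages: list = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        if len(self.messages) > self.window:
            evicted = self.messages[: -self.window]
            self.messages = self.messages[-self.window:]
            # Placeholder compaction: a real system would summarize with an LLM
            self.summary += " ".join(m["content"][:40] for m in evicted) + " "

    def context(self) -> list:
        """Messages to send to the model: compacted summary + recent window."""
        out = []
        if self.summary:
            out.append({"role": "system", "content": "Summary: " + self.summary.strip()})
        return out + self.messages
```

Keeping the compacted summary as a system message keeps the prompt bounded while preserving long-range context.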
Most engineers use FSDP or DeepSpeed as black boxes. Understanding distributed training requires implementing the communication primitives and scheduling algorithms from scratch.
Implemented GPipe and 1F1B scheduling for a multi-GPU transformer pipeline with stage-wise model partitioning and micro-batch execution. Built the distributed training infrastructure on core NCCL primitives (send/recv, broadcast, all-reduce) to synchronize activations and gradients across pipeline stages.
1F1B improved pipeline utilization by ~3% · Reduced communication overhead by ~4% · Extending to tensor parallelism for hybrid 3D parallel training
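The per-stage 1F1B schedule can be generated directly from the stage index: a warm-up of forwards, a steady state that interleaves one forward with one backward, then a cool-down that drains the remaining backwards. `one_f1b_schedule` is a hypothetical helper for illustration, not the project's actual scheduler:

```python
def one_f1b_schedule(num_stages: int, stage: int, num_microbatches: int):
    """Return the op sequence [('F', mb), ('B', mb), ...] for one pipeline
    stage under 1F1B scheduling."""
    warmup = min(num_stages - 1 - stage, num_microbatches)
    sched, f, b = [], 0, 0
    for _ in range(warmup):            # warm-up: forwards only
        sched.append(("F", f)); f += 1
    while f < num_microbatches:        # steady state: one forward, one backward
        sched.append(("F", f)); f += 1
        sched.append(("B", b)); b += 1
    while b < num_microbatches:        # cool-down: drain remaining backwards
        sched.append(("B", b)); b += 1
    return sched
```

Unlike GPipe, which runs all forwards before any backward, the steady state here releases each micro-batch's activations as soon as its backward completes, which is where the memory and utilization gains come from.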
Single-trajectory RL limits exploration and tool-use diversity. Multi-agent architectures with group-relative policy optimization unlock richer reasoning strategies.
Designed and built a modular multi-agent GRPO system (Planner, Executor, Verifier, Memory) with tool-grounded reasoning, supporting up to 20 decisions/query (4 rollouts × 5 turns). Engineered a data pipeline merging DeepMath-103K + FlashRAG-NQ into a unified 182,190-sample training corpus. Implemented 4-bit QLoRA fine-tuning for a 1.5B policy model, training only 1.10% of parameters.
+300% exploration gain over single-trajectory RL · Reward: 29% → 87% · Loss decreased by 97.7%
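The "group-relative" part of GRPO reduces to a small computation: each rollout's reward is normalized against the mean and standard deviation of its own sampling group, so no learned critic is needed. A minimal sketch (function name illustrative):

```python
import statistics

def grpo_advantages(group_rewards):
    """Group-relative advantages for one group of rollouts sampled from the
    same prompt: (reward - group mean) / group std."""
    mu = statistics.fmean(group_rewards)
    sd = statistics.pstdev(group_rewards) or 1.0   # guard: identical rewards
    return [(r - mu) / sd for r in group_rewards]
```

With 4 rollouts per query, each rollout's advantage is relative to its three siblings, which is what makes diverse tool-use trajectories within a group informative.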
Commercial inference engines like vLLM and TGI are powerful but opaque — understanding how LLM serving actually works requires building one from the ground up.
Built a modular inference engine with Triton-fused FlashAttention kernels and a paged KV-cache, achieving a 2.3x speedup on 4K–16K-token workloads. Implemented speculative decoding with a Qwen3-0.6B draft model verified by Qwen3-4B, reducing decoding latency by ~28%. Designed a dynamic batching and request scheduler that improved GPU utilization by 22%.
2.3x speedup · ~28% latency reduction via speculative decoding · ~78% peak HBM bandwidth on A100
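The acceptance step of speculative decoding can be sketched with greedy verification; this is a simplification (real engines typically use sampling-based rejection), and `verify_draft` is an illustrative name, not the engine's API:

```python
def verify_draft(draft_tokens, verifier_argmax):
    """Greedy speculative-decoding acceptance: accept draft tokens while they
    match the verifier model's argmax at each position; on the first mismatch,
    emit the verifier's token instead. Every verification pass therefore
    yields at least one committed token."""
    accepted = []
    for d, v in zip(draft_tokens, verifier_argmax):
        if d == v:
            accepted.append(d)          # draft token confirmed
        else:
            accepted.append(v)          # verifier's correction, then stop
            break
    return accepted
```

The latency win comes from the verifier scoring all draft positions in one forward pass, so accepted runs of tokens cost one large-model step instead of several.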
A fun weekend project: upload a resume and a job description, and get an embedding-based match score plus LLM-generated feedback on what to improve.
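The embedding-based score reduces to cosine similarity between the two embeddings; the 0–100 rescaling below is illustrative, and `match_score` is a hypothetical name rather than the app's actual function:

```python
import math

def match_score(resume_vec, jd_vec):
    """Cosine similarity between resume and job-description embeddings,
    mapped onto a 0-100 match score."""
    dot = sum(a * b for a, b in zip(resume_vec, jd_vec))
    norm = (math.sqrt(sum(a * a for a in resume_vec))
            * math.sqrt(sum(b * b for b in jd_vec)))
    cos = dot / norm
    return round(50 * (cos + 1))        # map [-1, 1] onto [0, 100]
```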
Base language models can generate text but lack the ability to hold conversations or call external tools — both critical for production AI applications.
Took the base Qwen 3B model and fine-tuned it with LoRA in two stages: first on conversational data to give it chat capabilities, then on function-calling datasets to teach it structured tool use. Used QLoRA for memory-efficient training on consumer GPUs.
Base → Chat → Function Calling pipeline · LoRA adapters for each stage · Runs on single GPU
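The LoRA idea behind both stages is a low-rank update added to a frozen weight: y = xW + (α/r)·xAB. A plain-list sketch for one linear layer, purely for illustration (a real implementation operates on tensors via a library such as PEFT):

```python
def lora_forward(x, W, A, B, alpha=16, r=4):
    """LoRA forward pass for one linear layer: y = xW + (alpha/r) * xAB.
    W (d x k) stays frozen; only the low-rank factors A (d x r) and
    B (r x k) are trained, which is why so few parameters update."""
    def vecmat(v, M):                    # row-vector times matrix
        return [sum(v[i] * M[i][j] for i in range(len(v)))
                for j in range(len(M[0]))]
    base = vecmat(x, W)                  # frozen pretrained path
    delta = vecmat(vecmat(x, A), B)      # trainable low-rank path
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]
```

Swapping the (A, B) adapter pair per stage is what makes the Base → Chat → Function Calling pipeline cheap: the frozen base weights are shared across both stages.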
Manual code review is slow and inconsistent. Large models can review code but are too expensive and heavy for production deployment.
Fine-tuned a 220M-parameter Microsoft model with LoRA on 150k code samples, then distilled it to 80M parameters. Containerized with Docker and tracked experiments with ClearML.
60% cost reduction via distillation · Production-ready containerized APIs · ClearML versioning
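A common distillation objective, sketched below, is Hinton-style soft-target KD: a KL divergence between temperature-scaled teacher and student distributions. This is an assumption about the training setup, not the project's confirmed loss:

```python
import math

def distill_loss(student_logits, teacher_logits, T=2.0):
    """Soft-target distillation loss: KL(teacher || student) over
    temperature-scaled softmaxes, multiplied by T^2 to keep gradient
    magnitudes comparable across temperatures."""
    def softmax_T(z):
        m = max(z)                       # subtract max for numerical stability
        e = [math.exp((v - m) / T) for v in z]
        s = sum(e)
        return [v / s for v in e]
    p = softmax_T(teacher_logits)        # soft targets from the 220M teacher
    q = softmax_T(student_logits)        # 80M student predictions
    return T * T * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

The temperature softens the teacher's distribution so the student also learns the relative rankings of incorrect classes, not just the argmax.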
Standard diffusion-based super-resolution requires hundreds of denoising steps, making it too slow for practical use. The ResShift paper offers an efficient alternative.
Implemented from scratch: a U-Net with a 4-stage encoder-decoder and a Swin Transformer bottleneck, a residual-shifting mechanism that cuts denoising to just 15 steps, and sinusoidal time conditioning; trained on the DIV2K dataset.
Competitive PSNR/SSIM/LPIPS · 15 diffusion steps · Full from-scratch implementation
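The residual-shifting forward process can be sketched as follows. Instead of diffusing the HR image to pure noise, ResShift shifts it toward the LR image through the residual e0 = y_lr − x0, which is why so few steps suffice. The geometric η schedule here is a simplification of the paper's:

```python
import math
import random

def resshift_forward(x0, y_lr, t, T=15, kappa=2.0):
    """Sketch of ResShift's forward (noising) step:
        x_t = x0 + eta_t * e0 + kappa * sqrt(eta_t) * noise,
    where e0 = y_lr - x0 is the LR-HR residual. At t = T, x_t sits near the
    LR image plus moderate noise rather than at pure Gaussian noise."""
    eta_1, eta_T = 1e-3, 0.99
    # geometric interpolation between eta_1 and eta_T (illustrative schedule)
    eta_t = eta_1 * (eta_T / eta_1) ** ((t - 1) / (T - 1))
    e0 = [y - x for x, y in zip(x0, y_lr)]
    return [x + eta_t * e + kappa * math.sqrt(eta_t) * random.gauss(0.0, 1.0)
            for x, e in zip(x0, e0)]
```

Because the reverse process only has to undo this short shift (plus bounded noise) rather than a full noising trajectory, 15 steps replace the hundreds needed by standard diffusion super-resolution.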