Akhil Shekkari

AI Engineer — Agents, Large Scale Training, Inference

Excited about building agents that plan, reason, and act across complex multi-step tasks, developing large-scale training techniques that unlock new capabilities, and making inference faster at scale.

Currently: MS in Applied Machine Learning at University of Maryland, College Park
Previously: AI Engineer at Atrium (client: Pfizer) · 3 years building ML systems at Tezo

Peer-Reviewed Journal Publication
Leveraging Generative AI to Transform Statistical Analysis Plan Authoring in Clinical Trials
Clinical Trials (SAGE Publications) · 2025 · Co-authored with Pfizer R&D
Read Paper

My Best Past Work

Atrium · Pfizer · United States
AI Engineer
Jun 2025 – Aug 2025

Co-authored a peer-reviewed publication in Clinical Trials (SAGE). Automated statistical analysis plan (SAP) generation for Pfizer, cutting drafting time by 60%, and built an LLM-as-a-Judge system that detects hallucinations with 82% precision.

60% faster drafting · 82% hallucination-detection precision
Tezo · 3 years · India
Machine Learning Engineer
Jul 2021 – Jul 2024

Built a RAG-powered chatbot that let employees search across 1,000+ internal documents. Reduced document lookup time by 61%. Also trained fraud detection models that improved recall by 12%.

1k+ docs indexed · 61% faster lookups · 12% fraud-recall improvement
View Full Timeline

Projects

AI Agents from Scratch

Featured

Designed and implemented a modular AI agent framework in Python, without relying on LangChain, CrewAI, or any existing agent library. Features a think-act reasoning loop that enables LLMs to autonomously chain tool calls across multiple steps to solve complex tasks.

  • Agent core: Async-first agent loop with Pydantic data models, structured output support, and execution tracing for full observability
  • Tool system + MCP: Extensible @tool decorator with auto function-to-JSON-schema conversion, MCP client for dynamically loading external tool servers, and human-in-the-loop confirmation for dangerous operations
  • Memory & context: Hierarchical optimization combining sliding window truncation, tool result compaction, tiktoken counting, and LLM-based summarization to stay within context limits
  • RAG pipeline: Text chunking, OpenAI embeddings, and cosine-similarity search to compress long search results before feeding them back to the agent
  • Multi-format files: Unified tool for text, CSV, Excel, PDF, images, and audio with multimodal LLM vision/audio analysis
Python
OpenAI
MCP
Pydantic
LiteLLM
FastAPI
Pipeline: Message → Reason → Tool Call → Callbacks → Memory → Loop
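The think-act loop above can be sketched in a few lines of plain Python. This is a minimal illustration, not the framework's code: `fake_llm` is a hypothetical offline stand-in for the real LiteLLM-backed model call, and the `@tool` decorator is reduced to a bare registry.

```python
# Minimal think-act loop sketch. The real framework uses LiteLLM/OpenAI and
# Pydantic models; `fake_llm` here is a hypothetical offline stand-in.
import json

TOOLS = {}

def tool(fn):
    """Register a plain function as an agent tool (simplified @tool)."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def add(a: int, b: int) -> int:
    return a + b

def fake_llm(messages):
    """Stand-in policy: request one `add` call, then answer with its result."""
    last = messages[-1]
    if last["role"] == "tool":
        return {"content": f"The sum is {last['content']}", "tool_call": None}
    return {"content": None, "tool_call": {"name": "add", "args": {"a": 2, "b": 3}}}

def run_agent(user_msg, max_steps=5):
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_steps):              # think-act loop
        reply = fake_llm(messages)          # think: model decides next action
        if reply["tool_call"] is None:      # final answer, stop looping
            return reply["content"]
        call = reply["tool_call"]
        result = TOOLS[call["name"]](**call["args"])  # act: run the tool
        messages.append({"role": "tool", "content": json.dumps(result)})
    raise RuntimeError("max steps exceeded")

print(run_agent("What is 2 + 3?"))  # prints: The sum is 5
```

The key design point the sketch captures: the model never executes anything itself; it only emits structured tool requests, and the loop feeds results back as messages until the model produces a final answer or hits the step budget.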

LLM Inference Engine from Scratch

New

Commercial inference engines like vLLM and TGI are powerful but opaque. Built a full inference engine from scratch to understand every layer of the serving stack — from kernel-level attention to request scheduling.

  • FlashAttention: Triton-fused attention kernels with paged KV-cache, achieving 2.3x speedup over Python-level implementations on 4K–16K token workloads
  • Speculative decoding: Qwen3-0.6B draft model with Qwen3-4B as verifier, reducing autoregressive decoding latency by ~28% and increasing throughput by ~1.4x
  • Dynamic batching: Request scheduler for concurrent inference, improving GPU utilization by 22% and sustaining 1.6x higher tokens/sec vs sequential decoding
  • GPU profiling: Nsight Compute to identify memory-bound bottlenecks; optimized tiling and memory access to reach ~78% of peak HBM bandwidth on A100
Python
PyTorch
Triton
CUDA
KV Cache
Nsight
Pipeline: Request In → Dynamic Batch → FlashAttention + KV-Cache → Speculative Decode → Stream Tokens
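The speculative-decoding stage can be sketched with toy next-token functions standing in for the Qwen3 models. Both `draft_next` and `target_next` below are illustrative assumptions, not the project's code; they exist only so the accept/reject logic runs standalone.

```python
# Greedy speculative-decoding sketch. The project uses Qwen3-0.6B as the
# draft model and Qwen3-4B as the verifier; `draft_next` and `target_next`
# are toy stand-ins so the accept/reject logic runs without GPUs.

def target_next(seq):
    """Toy 'large' model: next token is (last + 1) mod 10."""
    return (seq[-1] + 1) % 10

def draft_next(seq):
    """Toy 'small' model: agrees with the target except after token 4."""
    return 0 if seq[-1] == 4 else (seq[-1] + 1) % 10

def speculative_decode(prompt, n_tokens, k=5):
    seq, rounds = list(prompt), 0
    while len(seq) < n_tokens:
        # Draft proposes k tokens autoregressively (cheap).
        proposal, ctx = [], list(seq)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # Target verifies all k positions in one batched pass (one round).
        rounds += 1
        ctx = list(seq)
        for t in proposal:
            expected = target_next(ctx)
            if t == expected:
                ctx.append(t)             # accept the draft token
            else:
                ctx.append(expected)      # reject: take target's token, stop
                break
        else:
            ctx.append(target_next(ctx))  # bonus token when all k accepted
        seq = ctx[:n_tokens]
    return seq, rounds

out, rounds = speculative_decode([0], 12)
print(out, rounds)  # same tokens as plain greedy decoding, in 2 rounds not 11
```

Because every accepted prefix matches the target's greedy choice exactly, the output is identical to decoding with the large model alone; the speedup comes from replacing sequential target-model passes with a few batched verification rounds.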
View All Projects

Latest Blog Posts

Improving Context: Reasoning & Long Context in LLMs

Feb 6, 2026 · 25 min read

A deep dive into how LLMs reason over long-horizon tasks, the mechanics behind context length, and why smart agents with surgical retrieval beat brute-force long context windows.

GPU Fundamentals & LLM Inference Mental Models

Summary · 20 min read

Build intuition for LLM inference from first principles: GPU architecture, the roofline model, memory estimation, and latency.

Serving LLMs with vLLM on RunPod: A Complete Guide

Feb 3, 2026 · 12 min read

A deep dive into self-hosting LLMs using vLLM on RunPod. Covers PagedAttention, continuous batching, and cost analysis.

Read All Posts

Get in Touch