Work | Prateek Kumar Goel

Currently building

Most AI coding agents fail the same way: they hallucinate project conventions, ignore existing abstractions, and produce code that passes tests but breaks the codebase's internal contract. Setu sits in front of the agent and prevents this before generation.

It uses a JIT Context Engine to surface the most relevant files, patterns, and constraints at inference time, and a DAG Swarm to parallelise multi-file reasoning across the project graph.
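To make the parallelisation idea concrete, here is a minimal sketch of one way to schedule work over a project graph: group files into levels so that each level depends only on earlier ones, which means everything within a level could be reasoned about concurrently. This is an illustration, not Setu's actual implementation; `topoLevels`, the `Graph` shape, and the file names are all hypothetical, and the sketch assumes every dependency also appears as a key in the graph.

```typescript
// Hypothetical sketch: split a dependency DAG into parallelisable levels.
// Each key maps a file to the files it depends on.
type Graph = Record<string, string[]>;

function topoLevels(graph: Graph): string[][] {
  // Copy the dependency lists so the input is not mutated.
  const pending: Record<string, string[]> = {};
  for (const node of Object.keys(graph)) {
    pending[node] = graph[node].slice();
  }

  const levels: string[][] = [];
  while (Object.keys(pending).length > 0) {
    // A node is ready once all of its dependencies have been scheduled.
    const ready = Object.keys(pending)
      .filter((n) => pending[n].length === 0)
      .sort();
    if (ready.length === 0) {
      throw new Error("no schedulable node: cycle or unknown dependency");
    }
    for (const n of ready) delete pending[n];
    // Remove the newly scheduled nodes from everyone else's dependency list.
    for (const n of Object.keys(pending)) {
      pending[n] = pending[n].filter((d) => ready.indexOf(d) === -1);
    }
    levels.push(ready);
  }
  return levels;
}

// Illustrative file names only.
const exampleGraph: Graph = {
  "types.ts": [],
  "db.ts": ["types.ts"],
  "api.ts": ["types.ts"],
  "app.ts": ["db.ts", "api.ts"],
};
const exampleLevels = topoLevels(exampleGraph);
// exampleLevels: [["types.ts"], ["api.ts", "db.ts"], ["app.ts"]]
```

In this example `db.ts` and `api.ts` land in the same level, so a scheduler could hand them to separate workers at the same time and only then move on to `app.ts`.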

TypeScript · JIT Context Engine · DAG Swarm · LLM Agents

github.com/pkgprateek/setu-opencode

RAG benchmarks are broken. Most evaluate one configuration in isolation, obscuring what actually drives quality.

RAG Arena runs configurations head-to-head on the same queries using an LLM-as-judge panel for faithfulness, relevance, and groundedness. Pluggable retrievers, pluggable corpora, live leaderboard.
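As a rough illustration of head-to-head scoring, the sketch below tallies panel votes for two configurations over a shared query set. It is not taken from RAG Arena itself (which is written in Python): the `Verdict` type, the rule that a query goes to whichever configuration gets more judge votes, and the sample votes are all assumptions made for the example.

```typescript
// Hypothetical sketch of head-to-head scoring with a judge panel.
type Verdict = "A" | "B" | "tie";

// Decide one query: compare votes for A against votes for B.
// "tie" votes are ignored; equal counts yield a tie.
function panelVerdict(votes: Verdict[]): Verdict {
  let a = 0;
  let b = 0;
  for (const v of votes) {
    if (v === "A") a++;
    else if (v === "B") b++;
  }
  if (a > b) return "A";
  if (b > a) return "B";
  return "tie";
}

interface Tally {
  winsA: number;
  winsB: number;
  ties: number;
}

// Aggregate per-query panel verdicts into a win/loss/tie record.
function headToHead(votesPerQuery: Verdict[][]): Tally {
  const tally: Tally = { winsA: 0, winsB: 0, ties: 0 };
  for (const votes of votesPerQuery) {
    const verdict = panelVerdict(votes);
    if (verdict === "A") tally.winsA++;
    else if (verdict === "B") tally.winsB++;
    else tally.ties++;
  }
  return tally;
}

// Made-up votes: three judges, four queries.
const exampleVotes: Verdict[][] = [
  ["A", "A", "B"],
  ["B", "B", "tie"],
  ["A", "B", "tie"],
  ["A", "tie", "tie"],
];
const exampleTally = headToHead(exampleVotes);
// exampleTally: { winsA: 2, winsB: 1, ties: 1 }
```

Because both configurations answer the same queries, each row is a paired comparison, which is what lets a difference in the tally be attributed to the configuration rather than to the query mix.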

Python · FastAPI · ChromaDB · LLM-as-judge

github.com/pkgprateek/rag-arena-2026

Past work

Led the redesign of the distributed training stack for multi-node GPU clusters. The bottleneck was inter-node communication during backward passes.

The fix combined FSDP sharding, NCCL tuning, and FP16 → FP8 mixed precision. Wall-clock training time fell by a factor of 3.2 on the same hardware.

↗ 3.2× training speedup ↗ Multi-node A100 clusters

PyTorch · FSDP · NCCL · Slurm · FP8 · CUDA

The checkout flow was timing out at ~3k RPS. Rewrote the core API in Hono, moved product reads to a Redis-backed edge cache, and decoupled the recommendation service into an async fire-and-forget call.
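The two patterns named above, cache-aside reads and fire-and-forget side calls, can be sketched in a few lines. This is a simplified illustration and not the production code: an in-memory `Map` stands in for Redis, the loader is synchronous for brevity, and `TtlCache` and `fireAndForget` are hypothetical names. No real Hono or Redis client API is used.

```typescript
// Hypothetical sketch: cache-aside reads with a TTL, plus a fire-and-forget
// helper. An in-memory Map stands in for Redis.
interface Entry<T> {
  value: T;
  expiresAt: number;
}

class TtlCache<T> {
  private entries = new Map<string, Entry<T>>();

  // `now` is injectable so expiry can be exercised without real waiting.
  constructor(
    private ttlMs: number,
    private now: () => number = () => Date.now()
  ) {}

  // Return the cached value while it is fresh; otherwise call load(),
  // store the result with a new expiry, and return it.
  getOrLoad(key: string, load: () => T): T {
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > this.now()) {
      return hit.value;
    }
    const value = load();
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}

// Start a background task without awaiting it. Failures go to onError and
// never reach the caller, so a slow or failing dependency cannot block
// or break the request that triggered it.
function fireAndForget(
  task: () => Promise<unknown>,
  onError: (err: unknown) => void
): void {
  Promise.resolve()
    .then(task)
    .catch(onError);
}
```

In a checkout handler, the product lookup would go through something like `getOrLoad`, while the recommendation call would be wrapped in `fireAndForget` so the response is sent without waiting on it. The trade-off is that the caller never learns whether the background call succeeded, so this only suits work whose result the response does not need.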

Peak throughput reached 12k RPS. Cart abandonment dropped by 65%.

↗ 12k RPS peak ↗ 65% abandonment reduction

Hono · TypeScript · Redis · PostgreSQL