AI/ML Engineer — Generative AI
Bitwise
Remote, India
- Built a ticket-processing simulation environment to train an RL agent over 6 actions (pick, wait, escalate, submit, call tool A, call tool B), using preference-based reward modeling from pairwise trajectory judgments by a local LLM judge (Llama 3.2 1B via Ollama); monitored policy learning, reward trends, and episode behavior with TensorBoard and Weights & Biases to validate training stability for a prototype internal tool-use platform.
- Reduced per-workflow LLM spend by 27% and improved pass@k by 12% across 7 recurring tasks by redesigning prompt pipelines using preprocessing, structured context packing, and gated prompt variants to increase information density and reduce input tokens.
- Distilled teacher LLMs (DeepSeek V3.2, Sonnet 4.5) into smaller 64K-context student models (Llama 3.2, GLM 4.7, Qwen 3.5) for converting large XML-defined enterprise workflows into distributed processing code and job specifications; generated SFT data from teacher outputs and trained with Unsloth on 96 GB VRAM, reducing inference cost by 53% while improving successful conversion from 45% to 75% at near-teacher quality.
- Benchmarked 4-bit/8-bit weight quantization and KV-cache quantization with bitsandbytes on student models, reducing VRAM requirements from 78 GB to 36 GB/51 GB for weight-only runs and to 28 GB/42 GB with quantized cache; documented quality regressions and ruled out lossy configurations before deployment.
Ola
Bangalore, India
- Automated reconciliation for 20M+ INR/month transaction flows, eliminating 99.4% of manual checks and recovering 10M INR in leakages within 2 quarters.
- Re-architected the Credit Line data pipeline to reduce SLA from 26h to 11h (–58%), improving on-time disbursal rate from 72% to 94%.
- Built discrepancy detection dashboards that cut monthly audit time by 43% and surfaced 7 high-severity issues in the first 60 days.
Bachelor of Technology
- Generated 500+ 3D lattice variants and parallelized runs to cut end-to-end runtime by 90% (from 10h to 1h) on standard lab hardware.
- Eliminated data loss incidents (from 3/month to 0) by adding state checkpointing and resumable execution, increasing long-run completion rate to 99%+.
- Completed over 450 eval tasks with 99% acceptance and less than 2% revisions over a rolling 60 days.
- Raised rubric adherence from ~88% to ~96% via label normalization and edge-case notes (–35% reviewer comments).
- Reduced task turnaround from ~14 min to ~9 min (–36%) by introducing reusable rationale templates, standardized checklists, and keyboard macros, increasing weekly output from ~80 to ~125 items.