Hi, I'm Balaji Koneti - Machine Learning Engineer
ML Engineer · GenAI/RAG · Production Systems
6+ years shipping production retrieval & evaluation systems on AWS. Measurable gains in relevance, latency, and cost through evaluation-driven ML.

Shipping production ML
with measurable impact.
I'm Balaji Koneti, a Machine Learning Engineer (GenAI/RAG) with 6+ years in software engineering, shipping production retrieval and evaluation systems on AWS.
I've delivered measurable gains in retrieval relevance (+22% P@5 on 450 enterprise queries), latency (P95 1.3s to 640ms), and LLM cost (-31%) through evaluation-driven iteration, scalable inference microservices, and reliability guardrails.
I hold a Master's in Computer Science from Northern Arizona University and served as a Graduate TA for Machine Learning, leading model explainability reviews using SHAP & LIME across 12 projects and 30+ models.
+22%
Retrieval Relevance (P@5)
0% → +22%
640ms
P95 End-to-End Latency
1.3s → 640ms
-31%
LLM Cost per Request
Saved 31% per request
6+
Years Engineering
And counting…
The stack behind
production ML systems.
From retrieval pipelines to evaluation frameworks - the tools I use to build, ship, and measure ML at scale.
Programming Languages
GenAI / RAG
LLMs
Machine Learning
Evaluation & Observability
ML Serving & APIs
Cloud & Infrastructure
Data Engineering
Built to solve
real-world problems.
End-to-end ML projects with quantified impact - from NLP-powered assistants to security threat detection pipelines.
Academic foundations
that shaped my craft.
M.S. in Computer Science with ML specialization, backed by strong CS fundamentals from India.
Master of Science in Computer Science
Northern Arizona University
- Graduate Teaching Assistant for Machine Learning - led explainability reviews across 12 projects and 30+ models
- Deep focus on ML evaluation, model interpretability, and NLP/LLM systems
- Built reusable SHAP/LIME notebooks adopted as the course-wide standard
Bachelor of Technology in Computer Science
Jawaharlal Nehru Technological University
- Strong foundation in data structures, algorithms, and systems programming
- Active in hackathons and technical competitions
- Executive Body Member of the Computer Society of India (CSI)
6+ years shipping
production systems.
From production RAG services to fraud detection pipelines - every role measured by real impact.
Machine Learning Engineer
Nordstrom
Leading design and rollout of production RAG services, cutting P95 latency from 1.3s to 640ms, improving retrieval relevance by 22% (P@5), and reducing LLM spend by 31% per request.
- Led design and rollout of production RAG services (LangChain & pgvector), improving retrieval relevance by 22% (Precision@5) on ~450 real enterprise queries via semantic/recursive chunking and hybrid retrieval tuning.
- Cut P95 end-to-end latency from 1.3s to 640ms by separating embedding + generation services, adding Redis caching keyed by (query, tenant, filters), and batching embedding calls.
- Reduced LLM spend by 31% per request through token budgets, prompt compression, and dynamic routing of retrieval-only and low-risk flows to smaller models, without degrading answer quality.
- Built evaluation & regression pipeline combining Label Studio human review with LLM-as-judge (RAGAS & custom GPT graders) to catch faithfulness/relevance regressions; operationalized as a release gate.
- Productionized inference service (FastAPI & AWS) with health checks, circuit breakers, and explicit abstain/empty-retrieval handling, improving reliability and reducing hallucinations in low-confidence scenarios.
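A minimal sketch of two ideas named in the bullets above: the Precision@5 metric used to track retrieval relevance, and a deterministic cache key built from (query, tenant, filters). This is illustrative only and not the production code. The function names, the query normalization (lowercasing and whitespace collapsing), and the `rag:` key prefix are all assumptions made for the example.

```python
import hashlib
import json
from typing import Iterable, Mapping, Sequence


def precision_at_k(retrieved: Sequence[str], relevant: Iterable[str], k: int = 5) -> float:
    """Fraction of the top-k retrieved document IDs that are labeled relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant_set = set(relevant)
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant_set)
    # Dividing by k (not by the number returned) penalizes short result lists.
    return hits / k


def cache_key(query: str, tenant: str, filters: Mapping[str, object]) -> str:
    """Deterministic cache key over (query, tenant, filters).

    Assumption: queries that differ only in case or whitespace may share
    a cache entry. The "rag:" prefix is a hypothetical namespace.
    """
    payload = json.dumps(
        {"q": " ".join(query.lower().split()), "t": tenant, "f": filters},
        sort_keys=True,            # filter order must not change the key
        separators=(",", ":"),
        default=str,               # stringify values JSON cannot encode
    )
    return "rag:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Including the tenant in the key keeps one tenant's cached answers from being served to another, and sorting the filter keys means equivalent requests map to the same entry regardless of argument order.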
Graduate Teaching Assistant - Machine Learning
Northern Arizona University
Led model explainability reviews using SHAP & LIME across 12 projects / 45 students / 30+ models, building reusable interpretability notebooks that reduced time-to-diagnose by 40%.
- Led model explainability reviews using SHAP and LIME across 12 projects / 45 students / 30+ models, identifying leakage and bias patterns and converting findings into concrete feature redesign and retraining recommendations.
- Built reusable interpretability notebooks and visualization templates adopted in course workflows, reducing time-to-diagnose model issues by 40% and improving clarity for non-ML stakeholders.
- Coached teams on evaluation hygiene (proper splits, leakage checks, error analysis), improving project report quality and cutting rework by 2 iterations.
Software Engineer - Machine Learning Systems
Infosys Ltd
Built fraud-detection pipelines over 1M+ transactional records, improved ETL performance by 43%, and integrated ML outputs into backend services through resilient API patterns.
- Built and operationalized fraud-detection pipelines over 1M+ transactional records, delivering model features and real-time inference via REST API for XGBoost models, enabling faster detection and reducing fraud losses.
- Improved ETL performance by 43% through SQL tuning (query refactors, indexing, partitioning) and automated data-quality validation for null spikes and schema drift.
- Integrated ML outputs into backend services through resilient API patterns (timeouts, retries, structured errors), improving reliability and reducing failed inference runs by 15%.
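The resilient API pattern in the last bullet (timeouts, retries, structured errors) could look roughly like the sketch below. It is a generic illustration, not the code used at Infosys. `call_with_retries`, `InferenceError`, the error code `upstream_unavailable`, and the backoff values are hypothetical names and defaults chosen for the example. The wrapped callable is assumed to enforce its own timeout and raise `TimeoutError` or `ConnectionError` on failure.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class InferenceError(Exception):
    """Structured error raised once all retry attempts are exhausted."""

    def __init__(self, code: str, message: str, attempts: int) -> None:
        super().__init__(f"{code}: {message} (after {attempts} attempts)")
        self.code = code
        self.message = message
        self.attempts = attempts

    def to_dict(self) -> dict:
        # Shape a backend service could return to callers as a JSON body.
        return {"error": {"code": self.code, "message": self.message,
                          "attempts": self.attempts}}


def call_with_retries(
    fn: Callable[[], T],
    *,
    attempts: int = 3,
    base_delay: float = 0.2,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_exc = None
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on as exc:
            last_exc = exc
            if attempt < attempts:
                # Delay doubles each time: base, 2*base, 4*base, ...
                sleep(base_delay * 2 ** (attempt - 1))
    raise InferenceError("upstream_unavailable", str(last_exc), attempts) from last_exc
```

Only exceptions listed in `retry_on` are retried; anything else (for example a validation error) propagates immediately, so non-transient bugs are not hidden behind retries.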
Software Engineer
Nerds and Geeks Pvt Ltd
Owned backend API development, cutting P95 query latency from 450ms to 120ms and automating reporting workflows from 4 hours to 20 minutes.
- Owned backend API development for data-driven applications; improved system reliability by adding input validation, structured logging, and consistent error semantics, reducing production defects by 20%.
- Optimized database schema and SQL queries (indexes, query rewrites), cutting P95 query latency from 450ms to 120ms and enabling 35% higher throughput under load.
- Automated reporting/data workflows using Python and SQL, reducing manual reporting from 4 hours to 20 minutes and improving data consistency via repeatable pipelines and sanity checks.
Validated expertise,
industry recognized.
Professional certifications backing hands-on ML engineering experience.
Let's build something
extraordinary together.
I'm always open to exciting collaborations, new roles, and conversations about AI, machine learning, and the future of technology.