Balaji.dev

Available for opportunities

Hi, I'm Balaji Koneti - Machine Learning Engineer

ML Engineer · GenAI/RAG · Production Systems

6+ years shipping production retrieval & evaluation systems on AWS. Measurable gains in relevance, latency, and cost through evaluation-driven ML.

About Me

Shipping production ML
with measurable impact.

I'm Balaji Koneti, a Machine Learning Engineer (GenAI/RAG) with 6+ years in software engineering, shipping production retrieval and evaluation systems on AWS.

I've delivered measurable gains in retrieval relevance (+22% P@5 on 450 enterprise queries), latency (P95 1.3s to 640ms), and LLM cost (-31%) through evaluation-driven iteration, scalable inference microservices, and reliability guardrails.

I hold a Master's in Computer Science from Northern Arizona University and served as a Graduate TA for Machine Learning, leading model explainability reviews using SHAP & LIME across 12 projects and 30+ models.

ML Engineer (GenAI/RAG) · 6+ Years Software Engineering · Production RAG Systems · AWS Certified ML Specialty · Evaluation-Driven Development · LangChain & pgvector · RAGAS & LLM-as-Judge · FastAPI Microservices

Retrieval Relevance (P@5)
+22%

P95 End-to-End Latency
1.3s → 640ms

LLM Cost per Request
Saved 31% per request

Years Engineering
6+, and counting…

Technical Skills

The stack behind
production ML systems.

From retrieval pipelines to evaluation frameworks - the tools I use to build, ship, and measure ML at scale.

Programming Languages

Python · Java · SQL

GenAI / RAG

LangChain · LlamaIndex · Semantic Chunking · Recursive Chunking · Embeddings · Vector Search · pgvector · Hybrid Retrieval

LLMs

OpenAI GPT-4/4o · Anthropic Claude · Prompt Engineering · Token Optimization · Dynamic Model Routing

Machine Learning

NLP · Supervised Learning · Feature Engineering · Model Evaluation · Error Analysis · XGBoost · scikit-learn

Evaluation & Observability

RAGAS · LLM-as-Judge · Human-in-the-Loop (Label Studio) · MLflow · Weights & Biases · SHAP · LIME

ML Serving & APIs

FastAPI · Inference Microservices · REST APIs · Circuit Breakers · Redis Caching · Health Checks

Cloud & Infrastructure

AWS (EC2, S3, Lambda, SageMaker) · Docker · Terraform · CI/CD

Data Engineering

PostgreSQL · ETL Pipelines · Data Validation · SQL Tuning · Schema Design · Indexing & Partitioning
Education

Academic foundations
that shaped my craft.

M.S. in Computer Science with ML specialization, backed by strong CS fundamentals from India.

Master of Science in Computer Science

Northern Arizona University

Jan 2024 – May 2025 · Flagstaff, AZ
  • Graduate Teaching Assistant for Machine Learning - led explainability reviews across 12 projects and 30+ models
  • Deep focus on ML evaluation, model interpretability, and NLP/LLM systems
  • Built reusable SHAP/LIME notebooks adopted as a course-wide standard
AI/ML Focus · Graduate TA · Model Explainability · SHAP & LIME

Bachelor of Technology in Computer Science

Jawaharlal Nehru Technological University

Jun 2016 – Nov 2020 · Tirupati, AP, India
  • Strong foundation in data structures, algorithms, and systems programming
  • Active in hackathons and technical competitions
  • Executive Body Member of Computer Society of India (CSI)
CS Fundamentals · Data Structures · Algorithms · CSI Member
Experience

6+ years shipping
production systems.

From production RAG services to fraud detection pipelines - every role measured by real impact.

Current

Machine Learning Engineer

Nordstrom

Jun 2025 – Present · Plano, TX

Leading design and rollout of production RAG services, cutting P95 latency from 1.3s to 640ms, improving retrieval relevance by 22% (P@5), and reducing LLM spend by 31% per request.

  • Led design and rollout of production RAG services (LangChain & pgvector), improving retrieval relevance by 22% (Precision@5) on ~450 real enterprise queries via semantic/recursive chunking and hybrid retrieval tuning.
  • Cut P95 end-to-end latency from 1.3s to 640ms by separating embedding + generation services, adding Redis caching keyed by (query, tenant, filters), and batching embedding calls.
  • Reduced LLM spend by 31% per request by enforcing token budgets, compressing prompts, and dynamically routing retrieval-only and low-risk flows to smaller models without degrading answer quality.
  • Built evaluation & regression pipeline combining Label Studio human review with LLM-as-judge (RAGAS & custom GPT graders) to catch faithfulness/relevance regressions; operationalized as a release gate.
  • Productionized the inference service (FastAPI & AWS) with health checks, circuit breakers, and explicit abstain/empty-retrieval handling, improving reliability and lowering the hallucination rate in low-confidence scenarios.
Python · FastAPI · LangChain · pgvector · AWS · Docker · Terraform · RAGAS
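
As an illustration of the caching bullet above, a cache key over (query, tenant, filters) could be built roughly as follows. This is a minimal sketch and not the production code: the function name cache_key, the "rag:" prefix, and the normalization rules (lowercasing, whitespace collapsing) are assumptions made for the example.

```python
import hashlib
import json


def cache_key(query: str, tenant: str, filters: dict) -> str:
    """Build a deterministic cache key from (query, tenant, filters).

    Hypothetical helper: the name, prefix, and normalization are
    illustrative assumptions, not the deployed implementation.
    """
    normalized = {
        # Collapse case and whitespace so trivially different queries share a key.
        "q": " ".join(query.lower().split()),
        "t": tenant,
        "f": filters,
    }
    # sort_keys makes the serialized form independent of filter ordering.
    payload = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    # Keeping the tenant in the readable prefix makes per-tenant invalidation easier.
    return f"rag:{tenant}:{digest}"
```

Including the tenant and filters in the key keeps one tenant's cached answers from being served to another, and sorting the filter keys avoids cache misses caused only by argument order.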

Graduate Teaching Assistant - Machine Learning

Northern Arizona University

Jun 2024 – May 2025 · Flagstaff, AZ

Led model explainability reviews using SHAP & LIME across 12 projects / 45 students / 30+ models, building reusable interpretability notebooks that reduced time-to-diagnose by 40%.

  • Led model explainability reviews using SHAP and LIME across 12 projects / 45 students / 30+ models, identifying leakage and bias patterns and converting findings into concrete feature redesign and retraining recommendations.
  • Built reusable interpretability notebooks and visualization templates adopted in course workflows, reducing time-to-diagnose model issues by 40% and improving clarity for non-ML stakeholders.
  • Coached teams on evaluation hygiene (proper splits, leakage checks, error analysis) and improved project report quality, reducing rework cycles by 2 iterations.
Python · SHAP · LIME · Jupyter · scikit-learn · Mentoring

Software Engineer - Machine Learning Systems

Infosys Ltd

Mar 2022 – Dec 2023 · Bangalore, India

Built fraud-detection pipelines over 1M+ transactional records, improved ETL performance by 43%, and integrated ML outputs into backend services through resilient API patterns.

  • Built and operationalized fraud-detection pipelines over 1M+ transactional records, delivering model features and real-time inference via REST API for XGBoost models, enabling faster detection and reducing fraud losses.
  • Improved ETL performance by 43% through SQL tuning (query refactors, indexing, partitioning) and automated data-quality validation for null spikes and schema drift.
  • Integrated ML outputs into backend services through resilient API patterns (timeouts, retries, structured errors), improving reliability and reducing failed inference runs by 15%.
Python · SQL · XGBoost · REST APIs · ETL · PostgreSQL · AWS S3
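
The data-quality validation mentioned above (null spikes and schema drift) can be sketched in a few lines of plain Python. This is an illustrative example only: the function name validate_batch, the 10-point threshold, and the dict-per-row batch format are assumptions, not the checks that ran in the original pipeline.

```python
from typing import Dict, List, Optional, Sequence


def validate_batch(
    rows: Sequence[dict],
    expected_columns: Sequence[str],
    baseline_null_rate: Optional[Dict[str, float]] = None,
    max_increase: float = 0.10,
) -> List[str]:
    """Return human-readable issues for one batch of dict rows.

    Hypothetical helper: names and thresholds are illustrative assumptions.
    """
    baseline = baseline_null_rate or {}
    if not rows:
        return ["empty batch"]

    issues: List[str] = []

    # Schema drift: compare the columns actually seen with the expected set.
    seen = set()
    for row in rows:
        seen.update(row)
    expected = set(expected_columns)
    missing = sorted(expected - seen)
    extra = sorted(seen - expected)
    if missing:
        issues.append(f"schema drift: missing columns {missing}")
    if extra:
        issues.append(f"schema drift: unexpected columns {extra}")

    # Null spikes: flag columns whose null rate rose well above the baseline.
    for col in expected_columns:
        if col in missing:
            continue  # already reported as schema drift
        null_rate = sum(1 for r in rows if r.get(col) is None) / len(rows)
        base = baseline.get(col, 0.0)
        if null_rate - base > max_increase:
            issues.append(
                f"null spike in {col}: {null_rate:.0%} null vs {base:.0%} baseline"
            )
    return issues
```

A pipeline step could call this on each incoming batch and stop or quarantine the load whenever the returned list is non-empty.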

Software Engineer

Nerds and Geeks Pvt Ltd

Dec 2019 – Feb 2022 · Bangalore, India

Owned backend API development, cutting P95 query latency from 450ms to 120ms and automating reporting workflows from 4 hours to 20 minutes.

  • Owned backend API development for data-driven applications; improved system reliability by adding input validation, structured logging, and consistent error semantics, reducing production defects by 20%.
  • Optimized database schema and SQL queries (indexes, query rewrites), cutting P95 query latency from 450ms to 120ms and enabling 35% higher throughput under load.
  • Automated reporting/data workflows using Python and SQL, reducing manual reporting from 4 hours to 20 minutes and improving data consistency via repeatable pipelines and sanity checks.
Python · SQL · REST APIs · PostgreSQL · FastAPI · Backend
Certifications

Validated expertise,
industry recognized.

Professional certifications backing hands-on ML engineering experience.

Get in Touch

Let's build something
extraordinary together.

I'm always open to exciting collaborations, new roles, and conversations about AI, machine learning, and the future of technology.