Available for opportunities

Hi, I'm Balaji Koneti - Machine Learning Engineer

ML Engineer · GenAI/RAG · Production Systems

6+ years shipping production retrieval & evaluation systems on AWS. Measurable gains in relevance, latency, and cost through evaluation-driven ML.

About Me

Shipping production ML
with measurable impact.

I'm Balaji Koneti, a Machine Learning Engineer (GenAI/RAG) with 6+ years in software engineering, shipping production retrieval and evaluation systems on AWS.

I've delivered measurable gains in retrieval relevance (+22% P@5 on 450 enterprise queries), latency (P95 1.3s to 640ms), and LLM cost (-31%) through evaluation-driven iteration, scalable inference microservices, and reliability guardrails.

I hold a Master's in Computer Science from Northern Arizona University and served as a Graduate TA for Machine Learning, leading model explainability reviews using SHAP & LIME across 12 projects and 30+ models.

ML Engineer (GenAI/RAG) · 6+ Years Software Engineering · Production RAG Systems · AWS Certified ML Specialty · Evaluation-Driven Development · LangChain & pgvector · RAGAS & LLM-as-Judge · FastAPI Microservices

Retrieval Relevance (P@5): +22%

P95 End-to-End Latency: 1.3s → 640ms

LLM Cost per Request: -31%

Years Engineering: 6+ and counting

Technical Skills

The stack behind
production ML systems.

From retrieval pipelines to evaluation frameworks - the tools I use to build, ship, and measure ML at scale.

Programming Languages

Python · Java · SQL

GenAI / RAG

LangChain · LlamaIndex · Semantic Chunking · Recursive Chunking · Embeddings · Vector Search · pgvector · Hybrid Retrieval

LLMs

OpenAI GPT-4/4o · Anthropic Claude · Prompt Engineering · Token Optimization · Dynamic Model Routing

Machine Learning

NLP · Supervised Learning · Feature Engineering · Model Evaluation · Error Analysis · XGBoost · scikit-learn

Evaluation & Observability

RAGAS · LLM-as-Judge · Human-in-the-Loop (Label Studio) · MLflow · Weights & Biases · SHAP · LIME

ML Serving & APIs

FastAPI · Inference Microservices · REST APIs · Circuit Breakers · Redis Caching · Health Checks

Cloud & Infrastructure

AWS (EC2, S3, Lambda, SageMaker) · Docker · Terraform · CI/CD

Data Engineering

PostgreSQL · ETL Pipelines · Data Validation · SQL Tuning · Schema Design · Indexing & Partitioning

Featured Projects

Built to solve
real-world problems.

End-to-end ML projects with quantified impact - from NLP-powered assistants to security threat detection pipelines.

Intelligent Learning Assistant

Restricting AI Guidance via Prompt Injection Detection

Built an NLP-based learning assistant using BERT + Transformers to guide 200+ students through stepwise learning workflows, improving task accuracy by 35% via structured prompting and intent-aware routing.

+35% task accuracy via structured prompting & intent-aware routing
-30% successful prompt-injection attempts on adversarial test set
+40% improvement in learning outcome evaluation

Python · Hugging Face · BERT · PyTorch · FastAPI · Transformers

LLM for Security Log Detection

Hybrid LLM + BERT Pipeline for Threat Analysis

Designed a hybrid security log analysis pipeline combining LLMs + BERT + regex rules to detect anomalies and threats early in system logs, generating structured triage outputs with severity, suspected cause, and recommended actions.

Hybrid pipeline: LLM + BERT + regex for multi-layered detection
Structured triage: severity, cause, recommended actions
Web + CLI interfaces for investigation & operational use
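
As a rough illustration of the regex layer and the structured triage output described above, a minimal sketch might look like the following. It is not the project's code: the `Triage` record, the two rules in `RULES`, and the `triage_line` helper are hypothetical, and the LLM and BERT layers are omitted entirely.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Pattern, Tuple


@dataclass(frozen=True)
class Triage:
    """Structured triage output: severity, suspected cause, recommended action."""
    severity: str
    suspected_cause: str
    recommended_action: str


# Hypothetical rules for illustration only; a real rule set would be far larger.
RULES: List[Tuple[Pattern[str], Triage]] = [
    (
        re.compile(r"Failed password for \S+ from \d{1,3}(?:\.\d{1,3}){3}"),
        Triage("high", "possible brute-force login", "rate-limit or block the source IP"),
    ),
    (
        re.compile(r"segfault|out of memory", re.IGNORECASE),
        Triage("medium", "process crash or resource exhaustion", "inspect service health and memory limits"),
    ),
]


def triage_line(line: str) -> Optional[Triage]:
    """Return the first matching rule's triage record, or None if no rule fires."""
    for pattern, result in RULES:
        if pattern.search(line):
            return result
    return None  # Assumption: unmatched lines fall through to the model-based layers.


hit = triage_line("sshd[812]: Failed password for root from 10.0.0.5 port 22 ssh2")
miss = triage_line("systemd[1]: Started Daily apt download activities.")
```

Cheap deterministic rules run first in this sketch so that only unmatched lines would need a model call; that ordering is an assumption, not a description of the actual pipeline.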

Python · LLM APIs · BERT · Transformers · Regex · FastAPI · Streamlit

Education

Academic foundations
that shaped my craft.

M.S. in Computer Science with ML specialization, backed by strong CS fundamentals from India.

Master of Science in Computer Science

Northern Arizona University

Jan 2024 – May 2025 · Flagstaff, AZ

Graduate Teaching Assistant for Machine Learning - led explainability reviews across 12 projects and 30+ models
Deep focus on ML evaluation, model interpretability, and NLP/LLM systems
Built reusable SHAP/LIME notebooks adopted as course-wide standard

AI/ML Focus · Graduate TA · Model Explainability · SHAP & LIME

Bachelor of Technology in Computer Science

Jawaharlal Nehru Technological University

Jun 2016 – Nov 2020 · Tirupati, AP, India

Strong foundation in data structures, algorithms, and systems programming
Active in hackathons and technical competitions
Executive Body Member of Computer Society of India (CSI)

CS Fundamentals · Data Structures · Algorithms · CSI Member

Experience

6+ years shipping
production systems.

From production RAG services to fraud detection pipelines - every role measured by real impact.

Current

Machine Learning Engineer

Nordstrom

Jun 2025 – Present · Plano, TX

Leading design and rollout of production RAG services, cutting P95 latency from 1.3s to 640ms, improving retrieval relevance by 22% (P@5), and reducing LLM spend by 31% per request.

Led design and rollout of production RAG services (LangChain & pgvector), improving retrieval relevance by 22% (Precision@5) on ~450 real enterprise queries via semantic/recursive chunking and hybrid retrieval tuning.
Cut P95 end-to-end latency from 1.3s to 640ms by separating embedding + generation services, adding Redis caching keyed by (query, tenant, filters), and batching embedding calls.
Reduced LLM spend by 31% per request by enforcing token budgets, prompt compression, and dynamic routing (retrieval-only / low-risk flows) to smaller models without degrading answer quality.
Built evaluation & regression pipeline combining Label Studio human review with LLM-as-judge (RAGAS & custom GPT graders) to catch faithfulness/relevance regressions; operationalized as a release gate.
Productionized inference service (FastAPI & AWS) with health checks, circuit breakers, and explicit abstain/empty-retrieval handling - improved reliability and lowered hallucination in low-confidence scenarios.
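
For readers unfamiliar with the relevance metric cited above, Precision@5 can be sketched as follows. This is a minimal illustration rather than the production evaluation code: the function names, document IDs, and relevance labels are hypothetical, and the sketch assumes the denominator is always k.

```python
from typing import Iterable, Sequence, Set, Tuple


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the top-k retrieved IDs that are labeled relevant.

    Assumption: the denominator is always k, so returning fewer than k
    results is penalized for the empty slots.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def mean_precision_at_k(
    runs: Iterable[Tuple[Sequence[str], Set[str]]], k: int = 5
) -> float:
    """Average Precision@k over a set of labeled queries."""
    scores = [precision_at_k(retrieved, relevant, k) for retrieved, relevant in runs]
    if not scores:
        raise ValueError("no queries to evaluate")
    return sum(scores) / len(scores)


# Hypothetical labeled queries: (ranked retrieved IDs, set of relevant IDs).
example_runs = [
    (["d1", "d2", "d3", "d4", "d5"], {"d1", "d3"}),        # 2 of 5 relevant
    (["d9", "d8", "d7", "d6", "d5"], {"d9", "d8", "d7"}),  # 3 of 5 relevant
]
score = mean_precision_at_k(example_runs)
```

Running the same labeled query set against a baseline and a tuned retriever and comparing the two means is one way a relevance change could be tracked between releases.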

Python · FastAPI · LangChain · pgvector · AWS · Docker · Terraform · RAGAS

Graduate Teaching Assistant - Machine Learning

Northern Arizona University

Jun 2024 – May 2025 · Flagstaff, AZ

Led model explainability reviews using SHAP & LIME across 12 projects / 45 students / 30+ models, building reusable interpretability notebooks that reduced time-to-diagnose by 40%.

Led model explainability reviews using SHAP and LIME across 12 projects / 45 students / 30+ models, identifying leakage and bias patterns and converting findings into concrete feature redesign and retraining recommendations.
Built reusable interpretability notebooks and visualization templates adopted in course workflows, reducing time-to-diagnose model issues by 40% and improving clarity for non-ML stakeholders.
Coached teams on evaluation hygiene (proper splits, leakage checks, error analysis) and improved project report quality, reducing rework cycles by 2 iterations.
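
One of the simplest leakage checks mentioned above is looking for identical rows that appear in both the training and test splits. The sketch below is illustrative only: `find_split_overlap` and the sample rows are hypothetical, and it catches exact duplicates, not near-duplicates or target leakage through features.

```python
from typing import Hashable, Iterable, Set, Tuple

Row = Tuple[Hashable, ...]


def find_split_overlap(train_rows: Iterable[Row], test_rows: Iterable[Row]) -> Set[Row]:
    """Return rows that occur in both splits (exact matches only)."""
    return set(train_rows) & set(test_rows)


# Hypothetical data: each row is (feature_1, feature_2, label).
train = [(1.0, "a", 0), (2.0, "b", 1), (3.0, "c", 0)]
test = [(2.0, "b", 1), (4.0, "d", 1)]

overlap = find_split_overlap(train, test)
leak_rate = len(overlap) / len(test)  # share of test rows also seen in training
```

A non-zero `leak_rate` would suggest the split was made after duplication or augmentation, which tends to inflate test scores.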

Python · SHAP · LIME · Jupyter · scikit-learn · Mentoring

Software Engineer - Machine Learning Systems

Infosys Ltd

Mar 2022 – Dec 2023 · Bangalore, India

Built fraud-detection pipelines over 1M+ transactional records, improved ETL performance by 43%, and integrated ML outputs into backend services through resilient API patterns.

Built and operationalized fraud-detection pipelines over 1M+ transactional records, delivering model features and real-time XGBoost inference via a REST API, which enabled faster detection and reduced fraud losses.
Improved ETL performance by 43% through SQL tuning (query refactors, indexing, partitioning) and automated data-quality validation for null spikes and schema drift.
Integrated ML outputs into backend services through resilient API patterns (timeouts, retries, structured errors), improving reliability and reducing failed inference runs by 15%.
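
The resilient API pattern named above (timeouts, retries, structured errors) can be sketched in a few lines of standard-library Python. This is an assumption-laden illustration, not the original service code: `InferenceError`, `call_with_retries`, and the error code string are hypothetical, and the wrapped call is assumed to enforce its own timeout and raise `TimeoutError` when it expires.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class InferenceError(Exception):
    """Structured error returned to callers once retries are exhausted."""

    def __init__(self, code: str, message: str, attempts: int) -> None:
        super().__init__(message)
        self.code = code
        self.attempts = attempts

    def to_dict(self) -> dict:
        return {"code": self.code, "message": str(self), "attempts": self.attempts}


def call_with_retries(
    fn: Callable[[], T],
    *,
    max_attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient failures with exponential backoff."""
    last_exc: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except (TimeoutError, ConnectionError) as exc:  # transient failures only
            last_exc = exc
            if attempt < max_attempts:
                sleep(base_delay * 2 ** (attempt - 1))  # 0.1s, 0.2s, 0.4s, ...
    raise InferenceError(
        "upstream_unavailable", f"gave up after retries: {last_exc}", max_attempts
    ) from last_exc


# Hypothetical flaky model call: times out twice, then succeeds.
calls = {"n": 0}


def flaky_predict() -> float:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("model endpoint timed out")
    return 0.87


result = call_with_retries(flaky_predict, sleep=lambda _: None)
```

Only transient exception types are retried here; anything else propagates immediately so that genuine bugs are not hidden behind repeated attempts.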

Python · SQL · XGBoost · REST APIs · ETL · PostgreSQL · AWS S3

Software Engineer

Nerds and Geeks Pvt Ltd

Dec 2019 – Feb 2022 · Bangalore, India

Owned backend API development, cutting P95 query latency from 450ms to 120ms and automating reporting workflows from 4 hours to 20 minutes.

Owned backend API development for data-driven applications; improved system reliability by adding input validation, structured logging, and consistent error semantics - reduced production defects by 20%.
Optimized database schema and SQL queries (indexes, query rewrites), cutting P95 query latency from 450ms to 120ms and enabling 35% higher throughput under load.
Automated reporting/data workflows using Python and SQL, reducing manual reporting from 4 hours to 20 minutes and improving data consistency via repeatable pipelines and sanity checks.

Python · SQL · REST APIs · PostgreSQL · FastAPI · Backend

Certifications

Validated expertise,
industry recognized.

Professional certifications backing hands-on ML engineering experience.

AWS Certified Machine Learning - Specialty

Amazon Web Services

May 2025

IBM Machine Learning Specialist - Advanced

IBM

Apr 2025

Deep Learning using TensorFlow

IBM / Coursera

Apr 2025

Get in Touch

Let's build something
extraordinary together.

I'm always open to exciting collaborations, new roles, and conversations about AI, machine learning, and the future of technology.

Send me an email

Drop me a line anytime

Let's connect professionally

GitHub

Check out my code
