Looking for full-time roles · Starting 2026

Yachi Darji.

Data Scientist · ML Engineer · Chicago, IL

I build ML systems that make it to production. Not "works on my machine": real pipelines, real users, real results. I always measure whether a change worked.



Open to work

Role: Data Scientist · ML Engineer

Location: Chicago, IL · Remote OK

Visa: OPT STEM authorized

Available: Right now

Actively interviewing


01 About

About Me

Hi, I'm Yachi —
nice to meet you.

I'm a data scientist based in Chicago. I spent the last three years figuring out that building a good model is honestly the easier part — getting it into production, getting people to trust it, and proving it actually changed something is where most projects fall apart. That's the part I care about most.

I just graduated with my MS in Data Science from Illinois Tech. Before that I built demand forecasting systems as a co-op at Labelmaster, worked on churn prediction at August Infotech and B2B lead scoring at Orion Technolabs, and won a track at HackIllinois with Caterpillar. In every role I stayed involved from the first meeting to the final dashboard; I don't like handing things off mid-flight.

Outside of work I've tutored 50+ students in Python through Superprof, which turns out to be one of the best ways to actually understand something. I'm also a McKinsey Forward alumna and have three cloud certifications I actually use.

I'm currently looking for full-time data science or ML engineering roles. I'm on OPT STEM so I can start right away.

📍 Chicago, IL 🎓 Illinois Tech MS 2026 🛂 OPT STEM authorized ☁️ 3 Cloud Certs 🏆 McKinsey Forward

31.5% less forecasting error at Labelmaster

9.4% more conversions from a single feature

0.92 RAG faithfulness score (out of 1.0)

25M+ records I've run through live pipelines

My Process

How I build

STEP 01

Frame

From vague asks to measurable bets.

I don't start modeling until I can write the decision this will change in one sentence. Business question first, dataset second. If the success metric isn't defined before I open a notebook, I define it — even if that means pushing back on the brief.

Rule: No model without a defined success metric.

STEP 02

Prove

From model to rigorous evidence.

37-fold rolling-origin backtesting. 8-week A/B tests with proper holdouts. RAGAS evaluation at 0.92 faithfulness. I don't call it working until numbers say so — and when XGBoost outperformed LSTM across all departments, I documented it and changed the model. That's engineering, not ego.

↓ 31.5% WAPE · 0.92 RAGAS · 1,000-run A/A validation
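
A minimal sketch of what rolling-origin backtesting scored with WAPE can look like, assuming an expanding training window that advances one step per fold. The function names, the naive last-value forecaster, and the fold parameters are illustrative assumptions, not the production framework described here.

```python
from typing import Callable, List, Sequence, Tuple


def wape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Weighted absolute percentage error: sum|a - f| / sum|a|."""
    denom = sum(abs(a) for a in actual)
    if denom == 0:
        raise ValueError("WAPE is undefined when all actuals are zero")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / denom


def rolling_origin_folds(
    n_obs: int, min_train: int, horizon: int
) -> List[Tuple[range, range]]:
    """Expanding-window folds: train on [0, origin), test on [origin, origin + horizon)."""
    folds = []
    origin = min_train
    while origin + horizon <= n_obs:
        folds.append((range(0, origin), range(origin, origin + horizon)))
        origin += 1  # assumption: the origin moves forward one period per fold
    return folds


def backtest(
    series: Sequence[float],
    forecaster: Callable[[Sequence[float], int], List[float]],
    min_train: int,
    horizon: int,
) -> List[float]:
    """Return one WAPE score per fold; the model only ever sees past data."""
    scores = []
    for train_idx, test_idx in rolling_origin_folds(len(series), min_train, horizon):
        train = [series[i] for i in train_idx]
        actual = [series[i] for i in test_idx]
        scores.append(wape(actual, forecaster(train, horizon)))
    return scores


def naive_last_value(train: Sequence[float], horizon: int) -> List[float]:
    """Hypothetical baseline: repeat the last observed value."""
    return [train[-1]] * horizon


if __name__ == "__main__":
    monthly_sales = [10.0, 12.0, 11.0, 13.0, 14.0, 15.0]  # toy data
    fold_scores = backtest(monthly_sales, naive_last_value, min_train=3, horizon=1)
    print(f"{len(fold_scores)} folds, mean WAPE = {sum(fold_scores) / len(fold_scores):.3f}")
```

Swapping `naive_last_value` for any callable with the same signature lets two candidate models be compared fold by fold on identical splits.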

STEP 03

Ship

From notebook to production trust.

MLflow tracking, FastAPI endpoints, CloudWatch monitoring, schema-validated outputs. The model is the last mile — the infrastructure is the real product. I don't call it done until it's measured in production and someone outside the data team is using it to make decisions.

↑ 9.4% conversion lift · Deployed & measured · Zero silent failures.

02 Experience

Career

Where I've Shipped


Data Science Co-op

Labelmaster · Chicago, IL

Jan 2026 — May 2026

Architected an end-to-end monthly sales forecasting system spanning 8+ departments. Transitioned from recursive to direct multi-output LSTM, eliminating error compounding from sequential predictions. Designed bias-correction layers and built a rigorous rolling-origin backtesting framework (37 folds) that revealed XGBoost outperforming LSTM across all departments — a finding that informed final model selection and saved weeks of tuning the wrong architecture.

↓ 31.5% WAPE · 8+ Departments · 37-Fold Backtesting · MLflow Tracking · XGBoost · LSTM

Graduate Teaching Assistant

Illinois Institute of Technology · Chicago, IL · CS487 Software Engineering

Aug 2025 — May 2026

Supporting 50+ students through weekly office hours, live sessions, and rubric design for research framework papers. Designed assessment criteria that improved student clarity on software design principles by standardizing evaluation across multiple project types.

50+ Students · Curriculum Design · CS487

Data Scientist Intern

August Infotech · Surat, India

Jan 2024 — Jul 2024

Built a production churn prediction and expansion-likelihood platform for a B2B SaaS client on AWS. Engineered RFM-style features from 500K+ event logs, trained XGBoost achieving 0.81 AUC-ROC (vs 0.68 logistic baseline), and deployed batch + real-time scoring via Lambda. The key deliverable wasn't the model — it was the 8-week A/B test that proved model-driven onboarding actually moved revenue metrics at the company level.

↑ 9.4% Conversion · 0.81 AUC-ROC · ↓ 14.7% Churn · AWS Lambda · Real-time Scoring
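
As a rough illustration of RFM-style feature engineering from event logs, the sketch below derives recency, frequency, and monetary values per customer as of a snapshot date. The `Event` fields, the `rfm_features` helper, and the sample data are hypothetical; the actual feature set and log schema are not described in detail here.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, NamedTuple


class Event(NamedTuple):
    customer_id: str
    day: date
    amount: float


class RFM(NamedTuple):
    recency_days: int   # days since the customer's latest event
    frequency: int      # number of events up to the snapshot
    monetary: float     # total amount up to the snapshot


def rfm_features(events: Iterable[Event], as_of: date) -> Dict[str, RFM]:
    """Aggregate events per customer, ignoring anything after the snapshot date."""
    last_seen: Dict[str, date] = {}
    counts: Dict[str, int] = defaultdict(int)
    totals: Dict[str, float] = defaultdict(float)
    for e in events:
        if e.day > as_of:
            continue  # skip future events so the features do not leak the label period
        counts[e.customer_id] += 1
        totals[e.customer_id] += e.amount
        if e.customer_id not in last_seen or e.day > last_seen[e.customer_id]:
            last_seen[e.customer_id] = e.day
    return {
        cid: RFM((as_of - last_seen[cid]).days, counts[cid], totals[cid])
        for cid in last_seen
    }


if __name__ == "__main__":
    log = [
        Event("a", date(2024, 1, 1), 10.0),
        Event("a", date(2024, 1, 10), 5.0),
        Event("b", date(2024, 1, 5), 20.0),
    ]
    for cid, feats in rfm_features(log, as_of=date(2024, 1, 15)).items():
        print(cid, feats)
```

Fixing the snapshot date before aggregating keeps training features and the churn label window separate, which matters when the same logic feeds both batch and real-time scoring.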

Machine Learning Intern

Orion Technolabs · Ahmedabad, India

Jun 2022 — May 2023

Built a B2B lead scoring system that improved top-decile conversion by 11–15% over the existing rule-based approach (AUC-ROC: 0.77–0.80). Compared logistic regression, random forest, XGBoost, and MLP before settling on XGBoost for its balance of performance and explainability. Deployed via batch scoring on EC2 and a Flask REST API, integrating hot/warm/cold priority bands directly into the CRM so sales teams saw the output without touching the model.

↑ 15% Conversion · 0.80 AUC-ROC · CRM Integration · Flask REST API · EC2 Deployment

03 Education

Academic Background

Where I Learned to Build

🎓

Master of Science, Data Science

Illinois Institute of Technology

Aug 2024 — May 2026 · Chicago, IL

Focus: Applied ML · NLP · Agentic AI · Federated Learning · Statistical Experimentation

GPA 3.72 / 4.0

Google Developer Student Club

📘

Bachelor of Engineering, Information Technology

Gujarat Technological University

2020 — 2024 · India

Foundation in computer science, algorithms, data structures, and software engineering principles.

GPA 3.8 / 4.0

Coding Club

Credentials

Certifications & Awards

ML Foundations Training

Amazon Web Services · Mar 2025

✓ Verified Credly

Data Analysis Using Python

IBM · Jun 2023

✓ Verified Credly

Data Visualization Using Python

IBM · Jun 2023

✓ Verified Credly

ML with Python: Foundations

LinkedIn Learning · Mar 2023

✓ Certificate

McKinsey Forward Program

McKinsey & Company

✓ Alumna

Google Cloud Skills

Google Qwiklabs · Since 2021

✓ 13 Badges

🏆 SSIP Hackathon 2022 · Winner
🎖 AICTE National Scholarship (3×)
🏆 HackIllinois 2026 · Caterpillar Track
🎓 Python Tutor · Superprof · 4.9★ · 50+ Students

04 Skills

Tools & Technologies

What I Work With


🐍

Languages & Data

6 core tools


Languages & Data

Python · SQL · R · Pandas · NumPy · PySpark

🤖

Machine Learning

13 frameworks


ML & Statistics

PyTorch · TensorFlow · Scikit-learn · XGBoost · LightGBM · LSTM · Prophet · SHAP · A/B Testing · CUPED · Bayesian Methods · Causal Inference · Feature Eng.

⚡

GenAI & Agents

14 tools


GenAI & Agentic AI

LangGraph · LangChain · LlamaIndex · RAG · CrewAI · LoRA/QLoRA · RAGAS · Prompt Eng. · OpenAI API · Gemini API · Anthropic API · Ollama · ReAct · Memory Systems

🗄️

Vector & Graph DB

7 databases


Databases & Search

Pinecone · ChromaDB · Weaviate · Neo4j · PostgreSQL · MongoDB · MySQL

☁️

Cloud & MLOps

11 services


Cloud & Engineering

AWS SageMaker · EC2 / Lambda · S3 / RDS · Docker · Kubernetes · MLflow · FastAPI · Flask · Airflow · CI/CD · CloudWatch

📊

Visualization & BI

6 tools


Visualization

Tableau · Power BI · Plotly/Dash · Streamlit · Matplotlib · Seaborn

05 Portfolio

Selected Work

Projects

Flagship AI systems and ML projects built end-to-end

10 Technologies · 0.92 RAGAS

WanderMind AI

Multi-Agent RAG Platform · LangGraph · Neo4j · 3 Vector DBs

Adaptive RAG with fine-tuned Mistral 7B query router (94% accuracy, 60ms latency). Triple-layer constitutional validation. BERT personality classifier (91% accuracy) across 7 dimensions. Faithfulness: 0.76 → 0.92.

LangGraph · LlamaIndex · Neo4j · ChromaDB · Mistral 7B · FastAPI · Docker

7 Technologies · 🏆 Hackathon

CatSense

Multimodal AI Inspection · HackIllinois 2026 · Caterpillar Track

Full-stack multimodal assistant with photo + voice input, RAG-grounded reasoning via Actian VectorAI, triple-layer hallucination control, schema-validated JSON (Zod) for deterministic UI. Deployed to Cloudflare Workers.

React · TypeScript · Gemini 2.5 Flash · FastAPI · Zod · Cloudflare

6 Technologies · ↓ 38% SE

PulseBoard AI

A/B Testing & Causal Inference Platform · Bayesian + Frequentist

CUPED variance reduction (38% SE decrease), auto-diagnostics for 8 pathologies, sequential testing. Causal toolkit: DiD, Synthetic Controls, RDD, IV. Analysis time: ~2 hours → under 10 minutes.

PyMC · Plotly/Dash · CUPED · Bayesian Stats · Causal Inference
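
For readers unfamiliar with CUPED, here is a minimal sketch of the adjustment, assuming a single pre-experiment covariate per user. It applies the standard formula Y_adj = Y - theta * (X - mean(X)) with theta = cov(X, Y) / var(X). The function names and the toy numbers are illustrative assumptions and are not taken from PulseBoard AI.

```python
from typing import List, Sequence


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def sample_variance(xs: Sequence[float]) -> float:
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def cuped_adjust(metric: Sequence[float], covariate: Sequence[float]) -> List[float]:
    """Remove the part of `metric` that is linearly explained by `covariate`.

    The adjusted values keep the same mean as the original metric, while
    their variance shrinks by roughly the squared correlation with the covariate.
    """
    if len(metric) != len(covariate):
        raise ValueError("metric and covariate must have the same length")
    if len(metric) < 2:
        raise ValueError("need at least two observations")
    x_bar, y_bar = mean(covariate), mean(metric)
    var_x = sample_variance(covariate)
    if var_x == 0:
        return list(metric)  # a constant covariate carries no information
    cov_xy = sum(
        (x - x_bar) * (y - y_bar) for x, y in zip(covariate, metric)
    ) / (len(metric) - 1)
    theta = cov_xy / var_x
    return [y - theta * (x - x_bar) for x, y in zip(covariate, metric)]


if __name__ == "__main__":
    pre_period = [1.0, 2.0, 3.0, 4.0, 5.0]    # toy pre-experiment spend
    in_period = [2.1, 3.9, 6.2, 7.8, 10.1]    # toy in-experiment spend
    adjusted = cuped_adjust(in_period, pre_period)
    print("variance before:", round(sample_variance(in_period), 4))
    print("variance after: ", round(sample_variance(adjusted), 4))
```

In an actual A/B analysis, theta is usually estimated once on the pooled control and treatment data and then applied to both arms, so the adjustment does not bias the treatment effect.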

4 Technologies · 52% vs 28%

TrustWeight

Asynchronous Federated Learning · PyTorch · ResNet-18

Momentum-based gradient projection that filters out harmful stale gradients. Quality-aware aggregation with freshness functions. 52–53% accuracy on CIFAR-10 vs 11–28% for FedAsync/FedBuff baselines with 20–50% of clients delayed.

PyTorch · Federated Learning · ResNet-18 · Distributed Systems

4 Technologies · Agentic AI

Crypto Market Analyst Agent

LangGraph · GPT-4 · Autonomous Tool Selection

Autonomous LangGraph agent with GPT-4 reasoning and graph-based workflow. 3 custom tools for live price, volume, and sentiment data. Thread-based memory for cross-query context. Production safeguards and graceful fallbacks.

LangGraph · OpenAI GPT-4 · Streamlit · Agentic AI

6 Technologies · R² 0.87

UK Real Estate Intelligence

25M+ Records · XGBoost · SHAP · K-means

End-to-end ML pipeline over 25M+ transactions. Ensemble models (RF, XGBoost) with SHAP analysis, K-means clustering producing 5 buyer segments (silhouette 0.68). RMSE £42K, R² 0.87. Spark-compatible for scale.

XGBoost · SHAP · K-means · Streamlit · SQL · PySpark

