Illinois Institute of Technology · Class of 2026

Yachi Darji.

MS Data Science, Illinois Tech  ·  Chicago  ·  Looking for what's next

Looking for full-time roles · Starting 2026

Data Scientist · ML Engineer · Chicago, IL

I build ML systems that actually make it to production. Not "works on my machine" — real pipelines, real users, real results. I always track whether something actually worked.

3 years in ML
25M+ records processed
3 cloud certs

Open to work

Role: Data Scientist · ML Engineer
Location: Chicago, IL · Remote OK
Visa: OPT STEM authorized
Available: Right now
Actively interviewing
01 About
About Me

Hi, I'm Yachi —
nice to meet you.

I'm a data scientist based in Chicago. I spent the last three years figuring out that building a good model is honestly the easier part — getting it into production, getting people to trust it, and proving it actually changed something is where most projects fall apart. That's the part I care about most.

I just graduated with my MS in Data Science from Illinois Tech. During the program I was a co-op at Labelmaster building demand forecasting systems and won a track at HackIllinois with Caterpillar; before that I worked on churn prediction at August Infotech and B2B lead scoring at Orion Technolabs. In every role I stayed involved from the first meeting to the final dashboard, because I don't like handing things off mid-flight.

Outside of work I've tutored 50+ students in Python through Superprof, which turns out to be one of the best ways to actually understand something. I'm also a McKinsey Forward alumna and have three cloud certifications I actually use.

I'm currently looking for full-time data science or ML engineering roles. I'm on OPT STEM so I can start right away.

📍 Chicago, IL 🎓 Illinois Tech MS 2026 🛂 OPT STEM authorized ☁️ 3 Cloud Certs 🏆 McKinsey Forward
31.5% less forecasting error at Labelmaster
9.4% more conversions from a single feature
0.92 RAG faithfulness score (out of 1.0)
25M+ records I've run through live pipelines

My Process

How I build

STEP 01

Frame

From vague asks to measurable bets.

I don't start modeling until I can write the decision this will change in one sentence. Business question first, dataset second. If the success metric isn't defined before I open a notebook, I define it — even if that means pushing back on the brief.

Rule: No model without a defined success metric.

STEP 02

Prove

From model to rigorous evidence.

37-fold rolling-origin backtesting. 8-week A/B tests with proper holdouts. RAGAS evaluation at 0.92 faithfulness. I don't call it working until the numbers say so. When XGBoost outperformed LSTM across all departments, I documented it and changed the model. That's engineering, not ego.

↓ 31.5% WAPE  ·  0.92 RAGAS  ·  1,000-run A/A validation
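The idea behind an A/A validation can be shown with a small simulation: when both arms share the same true conversion rate, a well-calibrated test should flag roughly 5% of runs as significant at α = 0.05. This is a minimal sketch assuming a pooled two-proportion z-test and made-up parameters (sample size, base rate, seed); it is not the exact procedure behind the numbers on this page.

```python
import math
import random


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for a difference in conversion rates (pooled z-test)."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation in either arm: nothing to detect
    z = (conv_a / n_a - conv_b / n_b) / se
    # Standard normal CDF written with math.erf, doubled for a two-sided test.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))


def aa_false_positive_rate(runs: int = 1000, n: int = 1000, rate: float = 0.10,
                           alpha: float = 0.05, seed: int = 0) -> float:
    """Simulate A/A tests (both arms share one true rate); return the share flagged significant."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(runs):
        conv_a = sum(rng.random() < rate for _ in range(n))
        conv_b = sum(rng.random() < rate for _ in range(n))
        if two_proportion_z_test(conv_a, n, conv_b, n) < alpha:
            false_positives += 1
    return false_positives / runs
```

A rate far above α would point to a broken randomizer, a leaky metric, or an invalid test before any real experiment is trusted.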

STEP 03

Ship

From notebook to production trust.

MLflow tracking, FastAPI endpoints, CloudWatch monitoring, schema-validated outputs. The model is only one piece; the infrastructure around it is the real product. I don't call it done until it's measured in production and someone outside the data team is using it to make decisions.

↑ 9.4% conversion lift · Deployed & measured · Zero silent failures.
02 Experience
Career

Where I've Shipped

Data Science Co-op

Labelmaster · Chicago, IL

Jan 2026 — May 2026

Architected an end-to-end monthly sales forecasting system spanning 8+ departments. Moved from a recursive to a direct multi-output LSTM, eliminating the error compounding of sequential predictions. Designed bias-correction layers and built a rolling-origin backtesting framework (37 folds) that showed XGBoost outperforming LSTM across all departments, a finding that drove final model selection and saved weeks of tuning the wrong architecture.

↓ 31.5% WAPE · 8+ Departments · 37-Fold Backtesting · MLflow Tracking · XGBoost · LSTM
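WAPE and rolling-origin backtesting fit in a few lines of plain Python. The fold generator below moves the forecast origin forward one step at a time, refits on everything before it, and scores each fold with WAPE (total absolute error divided by total actual volume). This is a minimal sketch: the seasonal-naive forecaster and every parameter value are illustrative assumptions, not the Labelmaster XGBoost or LSTM models.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

# A forecaster takes the training history and a horizon, and returns `horizon` forecasts.
Forecaster = Callable[[List[float], int], List[float]]


def wape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Weighted absolute percentage error: sum(|actual - forecast|) / sum(|actual|)."""
    total = sum(abs(a) for a in actual)
    if total == 0:
        raise ValueError("WAPE is undefined when every actual value is zero")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / total


def rolling_origin_folds(n_obs: int, min_train: int, horizon: int) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges, moving the forecast origin forward one step per fold.

    Each training window ends right before its test window, so no future data leaks in.
    """
    for origin in range(min_train, n_obs - horizon + 1):
        yield range(0, origin), range(origin, origin + horizon)


def backtest(series: List[float], min_train: int, horizon: int, fit_predict: Forecaster) -> List[float]:
    """Refit on every fold and return one WAPE score per fold."""
    scores = []
    for train_idx, test_idx in rolling_origin_folds(len(series), min_train, horizon):
        train = [series[i] for i in train_idx]
        actual = [series[i] for i in test_idx]
        scores.append(wape(actual, fit_predict(train, horizon)))
    return scores


def seasonal_naive(train: List[float], horizon: int, season: int = 12) -> List[float]:
    """Toy baseline: forecast each month with the value observed one season earlier."""
    return [train[len(train) - season + (h % season)] for h in range(horizon)]
```

Comparing candidate models then means calling `backtest` once per forecaster and comparing per-fold WAPE distributions, so a model that wins consistently across folds stands out from one that wins on a single lucky split.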

Graduate Teaching Assistant

Illinois Institute of Technology · Chicago, IL · CS487 Software Engineering

Aug 2025 — May 2026

Supported 50+ students through weekly office hours and live sessions, and designed grading rubrics for research framework papers. The rubrics standardized evaluation across multiple project types, which made software design principles clearer to students.

50+ Students · Curriculum Design · CS487

Data Scientist Intern

August Infotech · Surat, India

Jan 2024 — Jul 2024

Built a production churn prediction and expansion-likelihood platform for a B2B SaaS client on AWS. Engineered RFM-style features from 500K+ event logs, trained an XGBoost model reaching 0.81 AUC-ROC (vs. 0.68 for a logistic regression baseline), and deployed batch and real-time scoring via Lambda. The key deliverable wasn't the model; it was the 8-week A/B test showing that model-driven onboarding moved revenue metrics at the company level.

↑ 9.4% Conversion · 0.81 AUC-ROC · ↓ 14.7% Churn · AWS Lambda · Real-time Scoring
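RFM-style features (recency, frequency, monetary value) come straight out of raw event logs. The sketch below assumes a hypothetical event schema of `(account_id, event_date, revenue_amount)` and a fixed snapshot date; the actual August Infotech feature set isn't described beyond "RFM-style", so the field names are placeholders.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple

# Hypothetical event record: (account_id, event_date, revenue_amount).
Event = Tuple[str, date, float]


def rfm_features(events: Iterable[Event], as_of: date) -> Dict[str, Dict[str, float]]:
    """Per account: recency (days since last event), frequency (event count) and
    monetary (total revenue), using only events on or before `as_of`."""
    last_seen: Dict[str, date] = {}
    counts: Dict[str, int] = defaultdict(int)
    revenue: Dict[str, float] = defaultdict(float)
    for account, day, amount in events:
        if day > as_of:
            continue  # drop events after the snapshot so labels don't leak into features
        counts[account] += 1
        revenue[account] += amount
        if account not in last_seen or day > last_seen[account]:
            last_seen[account] = day
    return {
        account: {
            "recency_days": float((as_of - last_seen[account]).days),
            "frequency": float(counts[account]),
            "monetary": revenue[account],
        }
        for account in last_seen
    }
```

Fixing `as_of` per training snapshot is the design choice that matters: churn labels come from the period after it, and features may only use what came before.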

Machine Learning Intern

Orion Technolabs · Ahmedabad, India

Jun 2022 — May 2023

Built a B2B lead scoring system that improved top-decile conversion by 11–15% over the existing rule-based approach (AUC-ROC 0.77–0.80). Compared logistic regression, random forest, XGBoost, and an MLP before choosing XGBoost for its balance of accuracy and explainability. Deployed via batch scoring on EC2 and a Flask REST API, and integrated hot/warm/cold priority bands directly into the CRM so sales teams saw the output without touching the model.

↑ 11–15% Conversion · 0.77–0.80 AUC-ROC · CRM Integration · Flask REST API · EC2 Deployment
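Turning model scores into hot/warm/cold bands and measuring top-decile conversion can both be done by ranking leads. A minimal sketch follows; the 10% hot / 30% warm cutoffs are assumptions for illustration, since the original doesn't say how the bands were drawn.

```python
from typing import List, Sequence


def priority_bands(scores: Sequence[float], hot_share: float = 0.1, warm_share: float = 0.3) -> List[str]:
    """Label leads by rank: the top `hot_share` are hot, the next `warm_share` are warm,
    the rest are cold. Both shares are illustrative defaults, not production values."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_hot = round(len(scores) * hot_share)
    n_warm = round(len(scores) * warm_share)
    bands = ["cold"] * len(scores)
    for rank, i in enumerate(order):
        if rank < n_hot:
            bands[i] = "hot"
        elif rank < n_hot + n_warm:
            bands[i] = "warm"
    return bands


def top_decile_conversion(scores: Sequence[float], converted: Sequence[bool]) -> float:
    """Conversion rate among the 10% of leads with the highest scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    k = max(1, len(scores) // 10)
    return sum(converted[i] for i in order[:k]) / k
```

Rank-based bands keep the hot list a fixed size for the sales team even if the score distribution drifts between batch runs.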
03 Education
Academic Background

Where I Learned to Build

🎓

Master of Science, Data Science

Illinois Institute of Technology

Aug 2024 — May 2026 · Chicago, IL

Focus: Applied ML · NLP · Agentic AI · Federated Learning · Statistical Experimentation

GPA 3.72 / 4.0

Google Developer Student Club

📘

Bachelor of Engineering, Information Technology

Gujarat Technological University

2020 — 2024 · India

Foundation in computer science, algorithms, data structures, and software engineering principles.

GPA 3.8 / 4.0

Coding Club

Credentials

Certifications & Awards

🏆 SSIP Hackathon 2022 · Winner
🎖 AICTE National Scholarship (3×)
🏆 HackIllinois 2026 · Caterpillar Track
🎓 Python Tutor · Superprof · 4.9★ · 50+ Students
04 Skills
Tools & Technologies

What I Work With

🐍 Languages & Data · 6 core tools
Python · SQL · R · Pandas · NumPy · PySpark
🤖 ML & Statistics · 13 frameworks
PyTorch · TensorFlow · Scikit-learn · XGBoost · LightGBM · LSTM · Prophet · SHAP · A/B Testing · CUPED · Bayesian Methods · Causal Inference · Feature Eng.
GenAI & Agentic AI · 14 tools
LangGraph · LangChain · LlamaIndex · RAG · CrewAI · LoRA/QLoRA · RAGAS · Prompt Eng. · OpenAI API · Gemini API · Anthropic API · Ollama · ReAct · Memory Systems
🗄️ Databases & Search · 7 databases
Pinecone · ChromaDB · Weaviate · Neo4j · PostgreSQL · MongoDB · MySQL
☁️ Cloud & Engineering · 11 services
AWS SageMaker · EC2 / Lambda · S3 / RDS · Docker · Kubernetes · MLflow · FastAPI · Flask · Airflow · CI/CD · CloudWatch
📊 Visualization · 6 tools
Tableau · Power BI · Plotly/Dash · Streamlit · Matplotlib · Seaborn
05 Portfolio
Selected Work

Projects

Flagship AI systems and ML projects built end-to-end

10 Technologies · 0.92 RAGAS

WanderMind AI

Multi-Agent RAG Platform · LangGraph · Neo4j · 3 Vector DBs

Adaptive RAG with fine-tuned Mistral 7B query router (94% accuracy, 60ms latency). Triple-layer constitutional validation. BERT personality classifier (91% accuracy) across 7 dimensions. Faithfulness: 0.76 → 0.92.

LangGraph · LlamaIndex · Neo4j · ChromaDB · Mistral 7B · FastAPI · Docker
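For context on the 0.76 → 0.92 figures: RAGAS defines faithfulness as the share of claims in a generated answer that the retrieved context supports. In RAGAS an LLM splits the answer into claims and judges each one; the sketch below assumes those verdicts are already available and only shows how the score is computed from them.

```python
from typing import Sequence


def faithfulness(claim_supported: Sequence[bool]) -> float:
    """Supported claims / total claims for one answer (the RAGAS faithfulness ratio).

    Verdicts are passed in directly; producing them is the LLM-judge step in RAGAS.
    """
    if not claim_supported:
        raise ValueError("answer produced no claims to check")
    return sum(claim_supported) / len(claim_supported)


def mean_faithfulness(per_answer: Sequence[Sequence[bool]]) -> float:
    """Average per-answer faithfulness over an evaluation set."""
    return sum(faithfulness(v) for v in per_answer) / len(per_answer)
```

A score of 0.92 therefore means that, on average, about 92% of the statements in an answer can be traced back to the retrieved context.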
7 Technologies · 🏆 Hackathon

CatSense

Multimodal AI Inspection · HackIllinois 2026 · Caterpillar Track

Full-stack multimodal assistant with photo + voice input, RAG-grounded reasoning via Actian VectorAI, triple-layer hallucination control, schema-validated JSON (Zod) for deterministic UI. Deployed to Cloudflare Workers.

React · TypeScript · Gemini 2.5 Flash · FastAPI · Zod · Cloudflare
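CatSense used Zod in TypeScript for schema validation. Since the examples on this page are in Python, the sketch below shows the same fail-loudly idea with only the standard library: parse the model's JSON, reject anything off-schema, and hand the UI a known shape. The field names and the allowed status values are hypothetical, not CatSense's actual schema.

```python
import json
from typing import Any, Dict

# Hypothetical schema for one inspection finding: field name -> expected Python type.
SCHEMA: Dict[str, type] = {"component": str, "status": str, "confidence": float, "evidence": list}
ALLOWED_STATUS = {"pass", "monitor", "fail"}  # hypothetical status vocabulary


def parse_finding(raw: str) -> Dict[str, Any]:
    """Parse model output and reject anything off-schema instead of passing it to the UI."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing, extra = SCHEMA.keys() - data.keys(), data.keys() - SCHEMA.keys()
    if missing or extra:
        raise ValueError(f"missing fields {sorted(missing)}, unexpected fields {sorted(extra)}")
    for field, expected in SCHEMA.items():
        value = data[field]
        if isinstance(value, bool):  # bool is a subclass of int in Python; never valid here
            raise ValueError(f"{field!r} must not be a boolean")
        if expected is float and isinstance(value, int):
            value = data[field] = float(value)  # JSON has one number type, so accept 1 as 1.0
        if not isinstance(value, expected):
            raise ValueError(f"{field!r} should be {expected.__name__}")
    if data["status"] not in ALLOWED_STATUS:
        raise ValueError(f"unknown status {data['status']!r}")
    if not 0.0 <= data["confidence"] <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return data
```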
6 Technologies · ↓ 38% SE

PulseBoard AI

A/B Testing & Causal Inference Platform · Bayesian + Frequentist

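CUPED variance reduction (38% lower standard error), automatic diagnostics for 8 experiment pathologies, and sequential testing. Causal toolkit: difference-in-differences (DiD), synthetic controls, regression discontinuity (RDD), and instrumental variables (IV). Analysis time: ~2 hours → under 10 minutes.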

PyMC · Plotly/Dash · CUPED · Bayesian Stats · Causal Inference
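CUPED reduces variance by subtracting the part of each outcome that a pre-experiment covariate already predicts: y_adj = y - θ(x_pre - mean(x_pre)), with θ = cov(y, x_pre) / var(x_pre). The minimal sketch below uses plain Python and synthetic data; it illustrates the textbook adjustment, not PulseBoard's implementation.

```python
import random
from typing import List, Sequence, Tuple


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def cov(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sample covariance (n - 1 denominator)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def cuped_adjust(y: Sequence[float], x_pre: Sequence[float]) -> Tuple[List[float], float]:
    """Return CUPED-adjusted outcomes y - theta * (x_pre - mean(x_pre)) and theta.

    x_pre must be measured before the experiment starts (e.g. the same metric in a
    pre-period) so treatment cannot affect it. In practice theta is estimated once
    on the pooled control + treatment data and applied to both arms.
    """
    theta = cov(y, x_pre) / cov(x_pre, x_pre)
    mx = mean(x_pre)
    return [yi - theta * (xi - mx) for yi, xi in zip(y, x_pre)], theta


def simulated_variance_ratio(n: int = 5000, seed: int = 42) -> float:
    """Variance after CUPED divided by variance before, on synthetic data where the
    outcome is a pre-period metric plus noise. Theory predicts about 1 - rho**2."""
    rng = random.Random(seed)
    x_pre = [rng.gauss(100, 20) for _ in range(n)]
    y = [x + rng.gauss(0, 10) for x in x_pre]
    y_adj, _ = cuped_adjust(y, x_pre)
    return cov(y_adj, y_adj) / cov(y, y)
```

Because the standard error scales with the square root of the variance, a 38% SE reduction corresponds to a variance ratio of about 0.62² ≈ 0.38. The mean of the adjusted metric is unchanged, so treatment-effect estimates stay unbiased.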
4 Technologies · 52% vs 28%

TrustWeight

Asynchronous Federated Learning · PyTorch · ResNet-18

Momentum-based gradient projection filters out harmful stale gradients, and quality-aware aggregation weights updates with freshness functions. Reaches 52–53% accuracy on CIFAR-10 vs. 11–28% for FedAsync/FedBuff baselines when 20–50% of clients are delayed.

PyTorch · Federated Learning · ResNet-18 · Distributed Systems
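TrustWeight's exact update rule isn't spelled out here, so the sketch below only illustrates the two ingredients the description names, using simple stand-ins. The first is a projection that removes the part of a stale client update that points against the server's momentum direction. The second is a polynomial staleness discount (1 + s)^-a of the kind used in FedAsync. Vectors are plain lists, and the exponent a = 0.5 is an arbitrary choice.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def project_out_conflict(update: Vector, momentum: Vector) -> Vector:
    """If the update points against the server momentum (negative dot product),
    remove its component along the momentum direction; otherwise keep it as is."""
    d = dot(update, momentum)
    norm_sq = dot(momentum, momentum)
    if d >= 0 or norm_sq == 0:
        return list(update)
    return [u - (d / norm_sq) * m for u, m in zip(update, momentum)]


def freshness(staleness: int, a: float = 0.5) -> float:
    """Polynomial staleness discount (1 + s)^-a: fresh updates get weight 1, stale ones less."""
    return (1 + staleness) ** (-a)


def apply_async_update(weights: Vector, update: Vector, momentum: Vector,
                       staleness: int, lr: float = 1.0) -> Vector:
    """Apply one client update (a model delta) that arrives `staleness` server versions late."""
    filtered = project_out_conflict(update, momentum)
    scale = lr * freshness(staleness)
    return [w + scale * u for w, u in zip(weights, filtered)]
```

The projection keeps whatever part of a late update is still useful instead of discarding it outright, while the discount limits how far any single stale client can pull the global model.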
4 Technologies · Agentic AI

Crypto Market Analyst Agent

LangGraph · GPT-4 · Autonomous Tool Selection

Autonomous LangGraph agent with GPT-4 reasoning and graph-based workflow. 3 custom tools for live price, volume, and sentiment data. Thread-based memory for cross-query context. Production safeguards and graceful fallbacks.

LangGraph · OpenAI GPT-4 · Streamlit · Agentic AI
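The control flow of tool selection, per-thread memory, and graceful fallbacks can be sketched without any framework. In the sketch below, a keyword router stands in for GPT-4's tool choice, and the class, tool names, and routing rule are all hypothetical. This is not the LangGraph API and not the project's code; it only shows how a failing tool degrades to a message instead of crashing the whole answer.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Tool = Callable[[str], str]


class ToolAgent:
    """Framework-free sketch: pick tools per query, record each thread's turns,
    and report tool failures instead of raising them."""

    def __init__(self, tools: Dict[str, Tool], keywords: Dict[str, List[str]]):
        self.tools = tools
        self.keywords = keywords  # tool name -> trigger words (stand-in for LLM routing)
        self.memory: Dict[str, List[str]] = defaultdict(list)  # thread_id -> past turns

    def route(self, query: str) -> List[str]:
        q = query.lower()
        return [name for name, words in self.keywords.items() if any(w in q for w in words)]

    def ask(self, thread_id: str, query: str) -> str:
        chosen = self.route(query)
        if not chosen:
            answer = "No matching tool; please ask about price, volume, or sentiment."
        else:
            parts = []
            for name in chosen:
                try:
                    parts.append(f"{name}: {self.tools[name](query)}")
                except Exception as exc:  # graceful fallback: note the failure, keep going
                    parts.append(f"{name}: unavailable ({type(exc).__name__})")
            answer = "; ".join(parts)
        # A real agent would feed this history back into the model for cross-query context.
        self.memory[thread_id].append(f"Q: {query} | A: {answer}")
        return answer
```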
6 Technologies · R² 0.87

UK Real Estate Intelligence

25M+ Records · XGBoost · SHAP · K-means

End-to-end ML pipeline over 25M+ transactions. Ensemble models (RF, XGBoost) with SHAP analysis, K-means clustering producing 5 buyer segments (silhouette 0.68). RMSE £42K, R² 0.87. Spark-compatible for scale.

XGBoost · SHAP · K-means · Streamlit · SQL · PySpark
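RMSE £42K and R² 0.87 are two views of the same residuals. RMSE is in pounds, while R² is the share of price variance the model explains. A minimal sketch of both, checked on toy numbers rather than the project's data:

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error, in the same units as the target (e.g. GBP)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot
```

Reporting both is useful because RMSE alone hides how hard the target is, while R² alone hides how large the typical error is in money terms.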
06 Contact
Get in Touch

Let's Connect

Open to full-time Data Scientist, ML Engineer, and AI/GenAI Engineer roles starting 2026. Work-authorized under the OPT STEM extension. If you're solving hard problems with AI, and you care whether the solution actually works in production, I want to hear about it.

Available · Full-time · 2026 · OPT STEM authorized