Data Talent Hiring,
Rebuilt for the AI era.

Watch candidates solve real business cases in a notebook with an AI assistant.
See every action and every prompt — learn how they think, and how they work with AI.

Get early access →

Are you a candidate? Practice here →

01 — The problem

Your interview process
wasn't built for AI.

Interview before AI

The old data interview is broken

Live SQL coding test

Pasted into ChatGPT in a second tab.

Python & algorithm screen

LeetCode is solved instantly. The job isn't for-loops anymore.

Verbal / whiteboard case

Rehearsed nightly with AI. You hear a script, not reasoning.

Resume & experience dig

Every bullet was rewritten by an LLM.

Interview after AI

Test the actual job

Frame the right question

Open-ended brief, real dataset. Can they scope it and push back?

Focus on judgment & domain expertise

Do they catch the leaky join and the wrong baseline, or ship it anyway?

Collaborate with the AI assistant

Same notebook and copilots they'll use on day one.

02 — How it works

The same notebook
they'll use on the job.

We capture every action, every line of code, every prompt.

Warehouse Fulfillment Delay

Background

ParcelHaus is a mid-sized direct-to-consumer fulfillment company operating four regional warehouses across the US. They fulfill orders for roughly 60 apparel and home-goods brands, with SLAs promising same-day or next-day pick-and-pack for inbound orders.

Situation

Our Q3 fulfillment scorecard came in and average order fulfillment time (order placed → package handed to carrier) is up 18% versus Q2. That's a material miss against our SLA commitments and I'm already getting questions from two of our larger brand partners. I need to understand what's driving this before the QBR next week so we can decide where to invest — more headcount, process changes, or capex on a specific facility. You have the Q3 order-level export from our WMS. Please dig in and come back with a point of view on what's driving the increase and where we should focus the fix.

Your Task

Average order fulfillment time is 18% higher than last quarter. What is driving it, and where should we focus the fix?

Stakeholder

VP of Operations, presenting at the QBR next week.

Data Dictionary

order_id (id): Unique identifier for each customer order

order_placed_ts (date): Timestamp when the customer order was placed

ship_ts (date): Timestamp when the package was handed to the carrier

warehouse_id (categorical): The fulfillment warehouse that handled the order (4 sites)

brand (categorical): Apparel or home-goods brand the order was placed under

product_category (categorical): Primary product category for the order

units_in_order (numeric): Number of individual units in the order

shift (categorical): Pick-and-pack shift that fulfilled the order (AM / PM / NGT)

carrier (categorical): Shipping carrier assigned to the outbound package

fulfillment_hours (numeric): Hours elapsed between order placement and the package being handed to the carrier (pick-and-pack completion)
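
One sanity check a candidate might run against this dictionary is whether fulfillment_hours agrees with the two timestamps. The following is a minimal sketch in Python with pandas, assuming the column names listed above. The helper name check_fulfillment_hours, the tolerance, and the two sample rows are made up for illustration and are not part of the case data.

```python
import pandas as pd

def check_fulfillment_hours(df: pd.DataFrame, tol_hours: float = 0.01) -> pd.DataFrame:
    """Return rows where fulfillment_hours disagrees with ship_ts - order_placed_ts."""
    placed = pd.to_datetime(df["order_placed_ts"])
    shipped = pd.to_datetime(df["ship_ts"])
    # Hours between order placement and carrier handoff, derived from the timestamps.
    derived = (shipped - placed).dt.total_seconds() / 3600.0
    mismatch = (derived - df["fulfillment_hours"]).abs() > tol_hours
    return df.loc[mismatch].assign(derived_hours=derived[mismatch])

# Illustrative rows only (hypothetical values, not real case data).
sample = pd.DataFrame({
    "order_id": [1, 2],
    "order_placed_ts": ["2025-07-01 08:00:00", "2025-07-01 09:00:00"],
    "ship_ts": ["2025-07-01 20:00:00", "2025-07-02 09:00:00"],
    "fulfillment_hours": [12.0, 30.0],  # second row is deliberately inconsistent
})
bad_rows = check_fulfillment_hours(sample)
```

Any rows that come back are worth a look before trusting the headline metric, since a definitional mismatch could look like a slowdown.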

Getting Started

Your data is in a database. Use SQL or Python to explore it:

SQL: SELECT * FROM case_data LIMIT 10

Python: import pandas as pd; df = pd.read_csv('case_data.csv')

See data_dictionary.md for column descriptions.
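
A common first pass is to break the metric down by the dimensions in the data dictionary. The sketch below assumes the export has been loaded into a pandas DataFrame as in the Python line above. A tiny made-up frame stands in for it here so the snippet runs on its own, and the helper name mean_hours_by is hypothetical.

```python
import pandas as pd

def mean_hours_by(df: pd.DataFrame, dims: list) -> pd.DataFrame:
    """Order count and mean fulfillment_hours per combination of dims, slowest first."""
    return (
        df.groupby(dims)["fulfillment_hours"]
          .agg(orders="count", mean_hours="mean")
          .sort_values("mean_hours", ascending=False)
          .reset_index()
    )

# In the notebook this would be: df = pd.read_csv("case_data.csv")
# The rows below are invented stand-ins, not real case data.
df = pd.DataFrame({
    "warehouse_id": ["W1", "W1", "W2", "W2"],
    "shift": ["AM", "PM", "AM", "PM"],
    "fulfillment_hours": [10.0, 14.0, 20.0, 28.0],
})
by_warehouse = mean_hours_by(df, ["warehouse_id"])
by_warehouse_shift = mean_hours_by(df, ["warehouse_id", "shift"])
```

Keeping the order count next to the mean helps separate a segment that is genuinely slower from one that only has a handful of orders.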

03 — What we measure

We measure what other
interviews can't see.

[Illustration: a data scientist thinking through charts and questions]

What they bring beyond AI

Judgment, framing, and taste.

How they frame an ambiguous problem, what tradeoffs they prioritize, and whether they can tell when an answer is actually useful — not just technically correct.

[Illustration: a data scientist collaborating with an LLM across SQL, notebook, and insights]

How they work with AI

Verification, pushback, restraint.

Did they verify the AI's output? Did they catch its mistakes? Did they over-trust it? We watch the full session — every prompt, every accepted suggestion, every override.

A complete evaluation report.

Research-backed evaluation rubrics. Every score traces back to the exact evidence.

Sample Report Template

Review: Sample Candidate

Sample interview · Assessment submitted Apr 30, 2026, 6:08 PM

Duration: 25 min

AI prompts: 1

Status: Submitted

Overall fit score

7.8 / 10

Exceptional

Summary

Role fit: Exceptional

Strong fit for the analyst role's core ask — the candidate framed the business question before reaching for tooling and used AI as a focused executor. Worth probing in interview: recommendations stayed at the diagnostic level and didn't push into the operational cutoffs this role owns.

Final Deliverable: Above bar

Reaches a defensible, evidence-backed recommendation but stops short of an operational cutoff.

AI Collaboration: Exceptional

Delegated one well-scoped task to AI and read the output critically instead of paraphrasing it.

Role-level Expertise: Above bar

Candidate-authored work shows solid domain framing; independent statistical depth is good, not standout.

Your decision

Good Fit

Hold

No Fit

Notes

Strong framing — already had the right lens before opening the data. Want to probe whether they can land an operational recommendation before the next round.

Dimensions & evidence

Final Deliverable

Above bar

The notebook reaches a defensible recommendation — a basic lift check kills the initial hypothesis and a controlled model isolates the real driver. The work stops at the diagnostic level; an operational cutoff is left for a follow-up.

Evidence · 3

+ Recommendation is tied to specific model output: it names the significant variable and its direction, not a vague summary (Section 2 Q1)
+ Killed the original hypothesis with a basic lift check before modeling; disciplined sequencing (Cell #1)
− No operational threshold proposed; the recommendation stays diagnostic and doesn't reach the cutoff this role owns (Section 2 Q1)

AI Collaboration

Exceptional

Used AI as a precise executor: one well-scoped prompt with named variables and the exact output wanted, then read the result table critically rather than paraphrasing it.

Evidence · 3

+ A single prompt specified the model, the four named controls, and the output columns wanted; no back-and-forth needed (msg #1)
+ Cited specific coefficient values from the AI-generated cell as the basis for the conclusion (Section 2 Q1)
+ Cross-checked the AI output against an earlier hand-written cell before trusting it, giving convergent evidence (Cell #3)

Role-level Expertise

Above bar

Candidate-authored work shows solid domain framing — quiz answers name industry factors not visible in the data, and the written findings reflect stakeholder constraints. Independent statistical reasoning is good, not standout.

Evidence · 3

+ Quiz Q1 invokes domain-specific risk factors the dataset alone wouldn't surface (Quiz Q1)
+ Section 2 frames the finding around operational load and downstream impact, showing stakeholder awareness (Section 2 Q1)
− Did not independently interrogate a borderline non-significant control; accepted the model spec as generated (Cell #4)

Action Timeline

Every AI prompt, cell run, and edit — in order.