Getting Started
Your data is in a database. Use SQL or Python to explore it:
SQL: SELECT * FROM case_data LIMIT 10
Python: df = pd.read_csv('case_data.csv')
See data_dictionary.md for column descriptions.
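As a rough illustration of how the SQL and Python routes relate, the sketch below loads CSV rows into an in-memory SQLite table named case_data and runs the same preview query. The toy rows are made up, and the column names `target` and `segment_flag` are borrowed from the sample analysis further down this page; the real schema is in data_dictionary.md.

```python
import csv
import io
import sqlite3

# Toy rows standing in for case_data.csv (hypothetical values, illustration only).
TOY_CSV = "target,segment_flag\n1,1\n0,0\n0,1\n1,0\n"

# Python route: parse the CSV into a list of dict rows.
rows = list(csv.DictReader(io.StringIO(TOY_CSV)))

# SQL route: load the same rows into an in-memory table and preview it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE case_data (target INTEGER, segment_flag INTEGER)")
conn.executemany("INSERT INTO case_data VALUES (:target, :segment_flag)", rows)

preview = conn.execute("SELECT * FROM case_data LIMIT 10").fetchall()
print(preview)
```

Either route sees the same rows, so the choice comes down to which tool fits the question being asked.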
Watch candidates solve real business cases in a notebook with an AI assistant.
See every action and every prompt — learn how they think, and how they work with AI.
Take-homes get gamed. Live coding doesn't reflect how the job is actually done.
Phone screens never predicted performance — and now every shortcut is a liability.
SQL, Python, real data, and an AI assistant in the panel.
Candidates work the way they actually work — and we capture every move.
Anyone can get the right answer with AI now. The signal is in how they got there
— what they questioned, what they trusted, where they pushed back.
How they frame an ambiguous problem, what tradeoffs they prioritize, and whether they can tell when an answer is actually useful — not just technically correct.
Did they verify the AI's output? Did they catch its mistakes? Did they over-trust it? We watch the full session — every prompt, every accepted suggestion, every override.
We analyze every prompt, every cell, and every decision, and we show you all the evidence behind every score.
Sample interview · Assessment submitted Apr 30, 2026, 6:08 PM
Strong fit for the analyst role's core ask: candidate consistently led with a hypothesis before reaching for tooling, mirroring how the team scopes ambiguous business questions day-to-day. Worth probing in interview — recommendations stayed at the diagnostic level and didn't push into the operational tradeoffs this role owns once a quarter.
Candidate brought the right analytical lens to the problem before touching the data. Quiz answers already named the core issue; the IDE work mostly confirmed it.
Strong fluency with the relevant industry context: cited domain-specific risk factors and operational tradeoffs that would not be visible from the data alone. Recommendation reflects awareness of stakeholder constraints, not just statistics.
Candidate used AI as a focused executor after framing the problem on their own. AI was an accelerator for one specific task, not a thinking partner — exactly the pattern we want to see.
Candidate verified the AI's output against their own prior analysis (both pointed the same direction), and read the result table critically — citing specific numbers as the basis for their conclusion rather than paraphrasing the assistant.
Case
A short business framing for the case appears here — who the team is, what they're trying to decide, and the constraints they're operating under. Two or three sentences set the stage without prescribing the answer.
[Framing question 1 goes here — sets up the business hypothesis the candidate needs to push back on.]
[Sample candidate response — typically 2–4 sentences naming the candidate's mental model, the variables they'd reach for, and the assumption they want to test. Long-form free text, no character cap.]
[Framing question 2 goes here — asks the candidate where they'd start the analysis and why.]
[Sample candidate response — describes the first cut of the data they'd run, what they'd compare it against, and which secondary check would either confirm or kill the hypothesis.]
[Framing question 3 goes here — drops a partial statistic on the candidate and asks what they'd interpret and check next.]
[Sample candidate response — names the missing baseline, describes the lift calculation they'd want to do, and adds a sanity-check on a likely confounder.]
import pandas as pd

df = pd.read_csv('case_data.csv')

# Headline numbers
outcome_rate = df['target'].mean()
segment_share = (df['segment_flag'] == 1).mean()
print(f'overall outcome rate: {outcome_rate:.4f}')
print(f'segment share overall: {segment_share:.4f}')

# Lift: P(segment | event) / P(segment)
events = df[df['target'] == 1]
seg_among = (events['segment_flag'] == 1).mean()
print(f'lift: {seg_among / segment_share:.3f}x')
overall outcome rate: 0.0xxx
segment share overall: 0.xxxx
lift: 1.0xx (>1 means over-represented)
# Outcome rate stratified by quartile of candidate variable
df['var_q'] = pd.qcut(df['candidate_var'], 4, labels=['Q1 low', 'Q2', 'Q3', 'Q4 high'])
print(df.groupby('var_q')['target'].mean().round(4))
var_q
Q1 low     0.0xxx
Q2         0.0xxx
Q3         0.0xxx
Q4 high    0.xxxx
Name: target, dtype: float64
import statsmodels.api as sm

features = ['segment_flag', 'candidate_var', 'control_a', 'control_b']
X = sm.add_constant(df[features])
y = df['target']

model = sm.Logit(y, X).fit(disp=False)
print(model.summary())
                  coef      OR  ci_low  ci_high  p_value
const          -x.xxxx   0.0xx   0.0xx    0.0xx    0.000
segment_flag    0.0xxx   1.0xx   0.8xx    1.4xx    0.6xx
candidate_var   x.xxxx   xx.xx   xx.xx   xxx.xx    0.000
control_a      -0.0000   1.000   1.000    1.000    0.3xx
control_b       0.0000   1.000   1.000    1.000    0.2xx
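The table above reports odds ratios and interval bounds, which `model.summary()` does not print directly. One common way to get them is to exponentiate each coefficient and its Wald interval. The sketch below assumes that is how the OR, ci_low, and ci_high columns were produced; `odds_ratio_row` is a hypothetical helper, and the inputs are made up for illustration, not taken from the case data.

```python
import math


def odds_ratio_row(coef, std_err, z=1.96):
    """Turn a logit coefficient and its standard error into an odds ratio
    with an approximate 95% Wald confidence interval."""
    return {
        "coef": coef,
        "OR": math.exp(coef),
        "ci_low": math.exp(coef - z * std_err),
        "ci_high": math.exp(coef + z * std_err),
    }


# Hypothetical inputs: a coefficient of zero means no association.
row = odds_ratio_row(coef=0.0, std_err=0.1)
print(row)
```

An interval that straddles 1.0, as it does for segment_flag in the table, is what "non-significant once controls are added" looks like on the odds-ratio scale.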
[Final-answer prompt goes here — asks the candidate to summarize findings and make a recommendation to a named stakeholder.]
[Sample candidate final answer: leads with the recommendation, then 2–3 supporting bullets, then a confidence note and a proposed validation step.]
What I found:
- First supporting bullet: the original hypothesis didn't survive the basic lift check.
- Second bullet: the real signal sits on a different variable, with a clean monotonic gradient across quartiles.
- Third bullet: a multivariate model confirms the direction. The focal variable is non-significant once controls are added; the alternative is strongly significant.
Recommendation: ship the alternative routing rule; hold off on the original.
Confidence: high on direction, medium on cutoff. Next step is a holdout validation on a recent vintage.
[Resume-claim verification question: references a specific result the candidate cited and asks them to walk through how they validated it.]
Resume · prior role
[Process-probe question: references a project on the resume and asks how the candidate handled the operational constraints around it.]
Resume · prior project
[Critical-eval question: surfaces a specific output the candidate accepted at face value and asks them to interpret it more carefully.]
Cell #4 at min 6
[Depth-check question: acknowledges the candidate stopped at a coarse cut and asks how they'd land a specific operational threshold with more time.]
Section 2 Q1
[Hold-out-concern question: surfaces a model-quality metric the candidate didn't flag and asks whether it would change their ship/no-ship call.]
Cell #4 at min 6
Not LeetCode. Not toy datasets. Every case starts from a real business question and tests judgment, framing, and AI collaboration in one sitting.
Talk through a problem. No hands on the data, no AI.
Not just talk. Not another coding test.
A real data project
LeetCode-style algorithm tasks without AI.
AI allowed, but the goal is still to get the code right.
LitMetrics is the only platform that measures what AI can't replace
— analytical framing, AI collaboration, and defensible judgment.
| | | | | |
|---|---|---|---|---|
| Case content | SQL + Python algorithm tasks | Standardized DS task batteries | Interviewer-brought notebooks or take-homes | Real-world DS cases — messy data, business framing |
| AI policy | Banned or flagged | Limited, discouraged | Up to the interviewer | AI required — full notebook + assistant, same as the actual job |
| What's measured | Code correctness + speed | Benchmarked task performance | Code quality + communication (interviewer-scored) | AI Collaboration + Edge Beyond AI (8 sub-metrics based on research) |
| Workflow realism | Single coding window | Guided single-task workspace | Live notebook + chat | Framing quiz → AI-native IDE → written findings (full loop) |
| Report evidence | Score + a few snippets | Rubric score, percentile | Interviewer notes | Every sub-score cites a specific cell, prompt, or quiz answer |
| Hiring-fit read | Generic percentile | Standardized benchmark | Interviewer judgment | Summary anchored against your JD + hiring priorities |
| Interview prep | None | None | Live-session notes | 5 case + 5 resume follow-up questions, evidence-tagged |
The data job has been quietly rebuilt around AI. Writing code isn't the work anymore — it's framing the right question, judging what the model gives back, and knowing when to push back. That's not on a résumé. You only see it in the work.
We're working closely with hiring managers in our early phase. You can use our cases, use your own case, or we can build customized cases just for you. Free.