Skip to content

Examples gallery

Two ways to start: example specs (hand-written, in the repo) and built-in templates (shipped with the package, one command to instantiate).

Example specs

Runnable specs in examples/. Each is part of the determinism gate, so every one regenerates byte-identically.

Spec Demonstrates
tabular-basic Plain distribution-only tabular data — the smallest useful spec.
causal-fraud A causal DAG (age → income → is_fraud, education → income) with a logistic target.
failure-fraud The fraud DAG plus six stacked failure modes; ships clean + injected variants.
difficulty-credit A credit-default label calibrated to the advanced difficulty band, with a latent score.
people-realistic Realistic-but-seeded text fields (names, emails, addresses) via mimesis providers.
timeseries-sensor An additive time-series feature: trend + seasonality + AR(p) + noise.

Run any of them:

datadoom run examples/causal-fraud.datadoom.yaml --seed 42 --out ./out
datadoom verify examples/causal-fraud.datadoom.yaml --seed 42 --against ./out

Built-in templates

Start from a curated domain spec with one command (or one click in the web Templates gallery):

datadoom template list                # all templates
datadoom template list --level hackathon
datadoom template use fraud-detection --out my.datadoom.yaml

Hackathon challenges

Enterprise-grade ML challenges built by composition — deep causal DAGs, latent confounders, mixed feature types, stacked data-quality failures, and calibrated difficulty bands. Each carries a meta.challenge brief.

Template Domain Highlights
credit-default-challenge Finance Deep DAG → latent risk score → default; advanced band; MNAR/MAR/drift/leakage/label-noise.
clinical-deterioration Healthcare Hidden-confounder design — latent severity drives vitals and outcome; advanced band.
predictive-maintenance Industrial IoT Three additive time-series sensors → latent wear → label; drift/noise/MCAR/leakage.
telecom-churn-challenge Telecom Realistic-text identity + latent dissatisfaction → churn; hard kaggle band; MNAR.

Starter templates

Template Domain Highlights
fraud-detection Finance Causal structure + failure injection.
customer-churn SaaS Difficulty targeting + a latent feature.
hospital-readmission Healthcare Causal structure + a latent feature.
ecommerce-orders E-commerce Quick distribution-only table.
iot-sensors IoT Sensor readings.
people-directory People Realistic text via mimesis providers.
marketing-ab-test Marketing A/B test design.
insurance-claims Insurance A Pareto heavy-tailed claim amount.