fake data. done right.

because your real data is a disaster anyway

your real data
is lying to you

You've seen it. A column called email containing "test@test.com", "N/A", and a phone number. A date field with 1900-01-01. An age column that thinks someone is 847 years old. A country column that contains 73 different spellings of "Italy".

Real data is not "raw" — it's rotten. And the worst part? You can't share it. It's sensitive, it's GDPR-locked, it's "confidential", it lives in a database only accessible from the office VPN on Tuesdays.

☠ real data

  • null values where there shouldn't be any
  • emails like "aaa@bbb" or just "no"
  • ages of 0, 999, or -3
  • duplicate IDs that are "definitely unique"
  • dates from 1900 and 2099 mixed together
  • GDPR-protected — can't share with the team
  • requires a VPN, two approvals and a sacrifice
  • breaks your pipeline in a new way every Monday

✓ fauxdata

  • zero nulls — unless you asked for them
  • emails that look like actual emails
  • ages between 18 and 90, as specified
  • IDs that are actually unique
  • dates strictly within your range
  • shareable, reproducible, seedable
  • runs in milliseconds, from any machine
  • validated before it even reaches your pipeline

define once.
generate forever.

Write a YAML schema. Run one command. Get a perfect, validated, reproducible dataset — in any format, any size, any locale.

schema-first

One YAML file defines everything: column types, ranges, presets, and validation rules. The schema is both the blueprint and the contract.

locale-aware

Set locale: IT and get Italian names, cities, email domains, IBANs, and phone formats — all coherent within each row. Works for 100+ countries.

validated by design

The same schema that defines generation also drives validation. Run --validate and know your data is correct before it touches your pipeline.

reproducible

Set seed: 42 and generate the exact same dataset every time. Share the schema, share the seed, share the data.
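Curious why a seed makes data repeatable? Here is a minimal sketch of the general idea using Python's standard `random` module. It is not fauxdata's internals; `generate_ages` is a hypothetical helper made up for this illustration.

```python
import random


def generate_ages(rows: int, seed: int, low: int = 18, high: int = 90) -> list[int]:
    """Return `rows` pseudo-random ages in [low, high] for a given seed."""
    # A private Random instance keeps the sequence independent of global state.
    rng = random.Random(seed)
    return [rng.randint(low, high) for _ in range(rows)]


first = generate_ages(rows=5, seed=42)
second = generate_ages(rows=5, seed=42)

# Same seed, same sequence: the two runs are identical.
print(first == second)  # prints: True
```

The same principle is what lets a schema plus a seed stand in for the dataset itself.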

pipeline-friendly

Use --out - to pipe data directly to stdout. No files, no noise — just clean data flowing through your tools.
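A minimal sketch of the consuming side of such a pipe, assuming the CSV written to stdout starts with a header row and includes a `status` column like the one in the people schema. `count_statuses` is a hypothetical helper, and the sample text stands in for real output.

```python
import csv
import io
from collections import Counter


def count_statuses(stream) -> Counter:
    """Count the values of the `status` column in a CSV text stream."""
    reader = csv.DictReader(stream)  # assumes the first line is a header
    return Counter(row["status"] for row in reader)


# Made-up sample standing in for piped output.
sample = io.StringIO("id,status\n1,active\n2,pending\n3,active\n")
counts = count_statuses(sample)
print(counts["active"], counts["pending"])  # prints: 2 1
```

In a real pipeline you would pass `sys.stdin` instead of the sample, for example after `fauxdata generate schemas/people.yml --out - |`.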

multi-format

CSV, Parquet, JSON, JSONL: choose the format per dataset in the schema, or switch it with the --format flag.

pointblank 0.22+ · polars · python 3.11+ · 100+ locales · CI-friendly

YAML so readable
your PM could write it

A schema is a plain YAML file. It describes the structure of your dataset, the constraints for each column, and the validation rules to apply. One file. Everything in it.

# schemas/people.yml
name: people
rows: 1000
seed: 42            # reproducible: same seed, same data
locale: IT          # Italian names, cities, emails, IBANs

output:
  format: csv       # csv | parquet | json | jsonl
  path: people.csv

columns:
  id:
    type: int
    unique: true
    min: 1
    max: 99999
  name:
    type: string
    preset: name    # → "Giulia Ferretti", "Marco Rossi"
  email:
    type: string
    preset: email   # → "g.ferretti@virgilio.it"
  age:
    type: int
    min: 18
    max: 90
  status:
    type: string
    values: [active, inactive, pending]
  signup_date:
    type: date
    min: "2020-01-01"
    max: "2024-12-31"

validation:
  - rule: col_vals_not_null
    columns: [id, name, email]
  - rule: col_vals_between
    column: age
    min: 18
    max: 90
  - rule: rows_distinct
    columns: [id]

Available presets: name · email · phone_number · city · country_code_2 · company · job · address · postcode · ipv4 · uuid4 · iban · url · user_name · sentence · word and more.
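To make the three rules in the schema's validation block concrete, here is a plain-Python sketch of what each one asserts. These `check_*` functions are hypothetical stand-ins written for this example, not pointblank's or fauxdata's implementation, and the sample rows are made up.

```python
# Sample rows are invented for illustration; they are not fauxdata output.
rows = [
    {"id": 1, "name": "Giulia Ferretti", "email": "g.ferretti@virgilio.it", "age": 34},
    {"id": 2, "name": "Marco Rossi", "email": "m.rossi@example.com", "age": 61},
]


def check_not_null(rows, columns):
    """col_vals_not_null: no listed column holds None or an empty string."""
    return all(row.get(col) not in (None, "") for row in rows for col in columns)


def check_between(rows, column, low, high):
    """col_vals_between: every value lies in the inclusive range [low, high]."""
    return all(low <= row[column] <= high for row in rows)


def check_distinct(rows, columns):
    """rows_distinct: no two rows share the same values in the listed columns."""
    keys = [tuple(row[col] for col in columns) for row in rows]
    return len(keys) == len(set(keys))


results = {
    "col_vals_not_null": check_not_null(rows, ["id", "name", "email"]),
    "col_vals_between": check_between(rows, "age", 18, 90),
    "rows_distinct": check_distinct(rows, ["id"]),
}
print(all(results.values()))  # prints: True
```

An age of 847 or a repeated id would flip the matching check to False, which is exactly the kind of row the schema is meant to rule out.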

four commands.
one tool.

command                            what it does
fauxdata generate SCHEMA           Generate a dataset from a YAML schema. Options: --rows, --format, --seed, --out - (stdout), --validate.
fauxdata validate DATASET SCHEMA   Validate an existing file against a schema. Exits with code 1 on failure, so it is CI-ready.
fauxdata preview DATASET           Show the first N rows and column statistics (type, nulls, unique, min/max).
fauxdata init [--name]             Interactive wizard to create a new schema template.
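Since fauxdata validate exits with code 1 on failure, a CI step only needs to look at the exit code. A minimal sketch, assuming a successful run exits with code 0 (the usual convention, not stated above); `validation_passed` is a hypothetical helper, and stand-in commands are used so the example runs without fauxdata installed.

```python
import subprocess
import sys


def validation_passed(command: list[str]) -> bool:
    """Run a command and report whether it exited with code 0."""
    result = subprocess.run(command, check=False)
    return result.returncode == 0


# Hypothetical CI usage (file names are illustrative):
#   ok = validation_passed(["fauxdata", "validate", "people.csv", "schemas/people.yml"])
#   sys.exit(0 if ok else 1)

# Stand-in commands that mimic a failing and a passing run.
failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
passing = [sys.executable, "-c", "pass"]
print(validation_passed(failing), validation_passed(passing))  # prints: False True
```

Most CI systems already fail a step on a non-zero exit code, so a wrapper like this is only useful when you want to react to the failure in code.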

up and running
in 30 seconds.

uv tool install fauxdata-cli
fauxdata --help

uv installs fauxdata as an isolated tool available from any directory — no virtualenv activation needed.

Or, with pip:

pip install fauxdata-cli
fauxdata --help