csvnorm
Normalize first, explore faster
A CLI built for messy CSVs: uncertain encodings, uneven columns, broken rows. In one command you get a coherent, validated output ready for quick analysis or the next step in your pipeline.
Output: Clean UTF-8
Checks: CSV validation
Report: Clear errors
Quick example
csvnorm data.csv \
| head -20
csvnorm data.csv -o output.csv
csvnorm "https://.../dataset.csv" -o output.csv
✓ Normalize delimiters to comma
✓ Snake_case headers
✓ Convert encoding to UTF-8
✓ Rejected-row report
csvnorm is a terminal command that takes a raw CSV and brings it to a standard form. It is built for the initial exploration stage: understand the dataset, find issues, and get a consistent file.
It is not a full ETL or a heavy cleaning tool. It is step zero to make a CSV readable, reliable, and ready for analysis with other tools.
See the transformation in action. Raw CSV on the left, normalized on the right.
Input file
Messy
City Name;Total Population;Year
London;8900000;2023
Berlin;3700000;2023
Zürich;421000;2023
Madrid;3300000;2023
✗ Delimiter: ;
✗ Encoding: Latin-1 (mojibake)
✗ Headers: mixed case, spaces
Output file
Clean
city_name,total_population,year
London,8900000,2023
Berlin,3700000,2023
Zürich,421000,2023
Madrid,3300000,2023
✓ Delimiter: ,
✓ Encoding: UTF-8 (readable)
✓ Headers: snake_case
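The before/after pair above can be approximated in a few lines of Python. This is only an illustrative sketch of the kind of rewrite involved, not csvnorm's implementation (which relies on DuckDB). The function names `to_snake_case` and `normalize_csv` are hypothetical, and the input encoding and delimiter are passed in explicitly rather than detected.

```python
import csv
import io
import re


def to_snake_case(header: str) -> str:
    """Lowercase a header and collapse runs of non-word characters into '_'."""
    return re.sub(r"[\W_]+", "_", header.strip().lower()).strip("_")


def normalize_csv(raw: bytes, encoding: str = "latin-1", delimiter: str = ";") -> str:
    """Decode raw bytes, snake_case the header row, and re-emit with commas.

    Assumption: the caller already knows the source encoding and delimiter.
    """
    text = raw.decode(encoding)
    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
    if not rows:
        return ""
    rows[0] = [to_snake_case(name) for name in rows[0]]
    out = io.StringIO(newline="")
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# A Latin-1, semicolon-delimited input similar to the "Messy" file above.
raw = "City Name;Total Population;Year\nZürich;421000;2023\n".encode("latin-1")
normalized = normalize_csv(raw)
print(normalized, end="")

# Encoding the returned text as UTF-8 gives the bytes a "Clean" file would hold.
utf8_bytes = normalized.encode("utf-8")
```

The sketch keeps every data value untouched and only changes the header names, the separator, and the byte encoding, which matches the three check marks listed for the output file.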
When csvnorm finds malformed rows, it captures them in a detailed reject file using DuckDB's validation engine. Each error is documented with file, line, column, and error type.
Example: output_reject_errors.csv
Auto-generated
| line | column_name | value | error_type |
|---|---|---|---|
| 45 | total_population | N/A | MISSING_COLUMNS |
| 67 | city_name | Paris"extra | UNQUOTED_VALUE |
| 103 | city_name | Rome;Milan | TOO_MANY_COLUMNS |
✓ Each malformed row documented with line number
✓ Error type classified by DuckDB validation
✓ Original problematic value preserved for review
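Once a reject file exists, a quick tally by error type is often the first thing to look at. The sketch below assumes the reject file is a comma-delimited CSV with the four columns shown in the example (`line`, `column_name`, `value`, `error_type`); the real layout may differ, and `summarize_rejects` is a hypothetical helper, not part of csvnorm.

```python
import csv
import io
from collections import Counter

# Inline sample mirroring the example table; column names are assumed from it.
SAMPLE = '''line,column_name,value,error_type
45,total_population,N/A,MISSING_COLUMNS
67,city_name,"Paris""extra",UNQUOTED_VALUE
103,city_name,Rome;Milan,TOO_MANY_COLUMNS
'''


def summarize_rejects(handle):
    """Return (count per error_type, sorted line numbers of rejected rows)."""
    counts = Counter()
    line_numbers = []
    for row in csv.DictReader(handle):
        counts[row["error_type"]] += 1
        line_numbers.append(int(row["line"]))
    return counts, sorted(line_numbers)


counts, line_numbers = summarize_rejects(io.StringIO(SAMPLE))
for error_type, n in counts.most_common():
    print(f"{error_type}: {n}")
print("rejected lines:", line_numbers)
```

For a file on disk, the same function could be called with `open("output_reject_errors.csv", newline="", encoding="utf-8")` in place of the `StringIO` object, assuming the file is UTF-8.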
CSVs from different sources waste time: mixed delimiters, wrong encodings, broken rows, uneven headers. csvnorm removes friction and lets you start with a clean file.
Uses DuckDB to check structure and bad rows.
Standard delimiter and normalized snake_case headers.
Stats on rows, columns, sizes, and errors.
Process a CSV over HTTP without downloading it first.
Promise
One command to move from "uncertain file" to "ready file".
csvnorm is non-destructive: it never overwrites the input file and keeps you in control of the output.
1. Detects encoding, delimiter, headers, and structural inconsistencies.
2. Converts to UTF-8, standardizes separators, and cleans column names.
3. Shows a summary with rows, columns, sizes, and rejected-row files.
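To make the first step more concrete, here is a rough Python sketch of what an "analyze" pass can look like. It is not how csvnorm does it (csvnorm uses DuckDB); the heuristics are deliberately naive and the names `guess_encoding`, `guess_delimiter`, and `analyze` are hypothetical. The encoding guess only distinguishes UTF-8 from a Latin-1 fallback, and the delimiter guess ignores quoting.

```python
import csv
import io


def guess_encoding(raw: bytes) -> str:
    """Return 'utf-8' if the bytes decode cleanly, else fall back to 'latin-1'."""
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "latin-1"


def guess_delimiter(text: str, candidates: str = ",;\t|") -> str:
    """Pick the candidate that occurs most often in the first line (naive)."""
    first_line = text.splitlines()[0] if text else ""
    return max(candidates, key=first_line.count)


def analyze(raw: bytes) -> dict:
    """Collect encoding, delimiter, column count, and lines with a wrong width."""
    encoding = guess_encoding(raw)
    text = raw.decode(encoding)
    delimiter = guess_delimiter(text)
    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
    width = len(rows[0]) if rows else 0
    uneven = [n for n, row in enumerate(rows, start=1) if len(row) != width]
    return {
        "encoding": encoding,
        "delimiter": delimiter,
        "columns": width,
        "rows": max(len(rows) - 1, 0),  # data rows, header excluded
        "uneven_lines": uneven,
    }


raw = (
    "City Name;Total Population;Year\n"
    "Zürich;421000;2023\n"
    "Rome;Milan;3300000;2023\n"  # one field too many
).encode("latin-1")
report = analyze(raw)
print(report)
```

The standard library's `csv.Sniffer` is an alternative for delimiter detection; the first-line count is used here only because it is easy to reason about on very small samples.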
csvnorm writes to stdout by default, so it pipes cleanly into other tools. To write a file instead, pass an output path with -o.
csvnorm input.csv
csvnorm input.csv -o output.csv
csvnorm input.csv | csvstat
csvnorm "https://.../dataset.csv" -o output.csv
Recommended with uv:
uv tool install csvnorm
Or with pip:
pip install csvnorm
pip install csvnorm
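Because csvnorm writes to stdout by default, it can also be called from a script once installed. The sketch below is a minimal example under stated assumptions: it uses only the positional input and the `-o` flag shown in the commands above, assumes the normalized CSV is what arrives on stdout, and assumes a non-zero exit code signals failure. `build_command` and `normalize_to_text` are hypothetical helper names.

```python
import shutil
import subprocess
from typing import List, Optional


def build_command(source: str, output: Optional[str] = None) -> List[str]:
    """Build the argument list for a csvnorm call (input path or URL, optional -o)."""
    cmd = ["csvnorm", source]
    if output is not None:
        cmd += ["-o", output]
    return cmd


def normalize_to_text(source: str) -> str:
    """Run csvnorm and return whatever it writes to stdout, decoded as UTF-8."""
    if shutil.which("csvnorm") is None:
        raise FileNotFoundError("csvnorm is not on PATH; install it first")
    result = subprocess.run(
        build_command(source),
        capture_output=True,
        encoding="utf-8",
        check=True,  # assumption: csvnorm exits non-zero on failure
    )
    return result.stdout


print(build_command("input.csv"))
print(build_command("input.csv", "output.csv"))
```

Passing the arguments as a list avoids shell quoting issues when the source is a URL or a path containing spaces.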