csvnorm

Normalize first, explore faster

CSV | Clean | Ready

csvnorm prepares CSV files for a first-pass exploration without surprises.

A CLI built for messy CSVs: uncertain encodings, uneven columns, broken rows. In one command you get a coherent, validated output ready for quick analysis or the next step in your pipeline.

Output: Clean UTF-8
Checks: CSV validation
Report: Clear errors

Quick example

csvnorm data.csv \
  | head -20

csvnorm data.csv -o output.csv
csvnorm "https://.../dataset.csv" -o output.csv

✓ Normalizes delimiters to comma

✓ Converts headers to snake_case

✓ Converts encoding to UTF-8

✓ Writes a rejected-row report

What is csvnorm

csvnorm is a terminal command that takes a raw CSV and brings it to a standard form. It is built for the initial exploration stage: understand the dataset, find issues, and get a consistent file.

It is not a full ETL pipeline or a heavy cleaning tool. It is step zero: it makes a CSV readable, reliable, and ready for analysis with other tools.
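
For example, as step zero in a small pipeline (the downstream duckdb call and the file names are illustrative, not part of csvnorm):

csvnorm raw_export.csv -o clean.csv
duckdb -c "SELECT * FROM read_csv_auto('clean.csv') LIMIT 5"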

Before and after

See the transformation in action: the raw, messy CSV followed by the normalized output.

Input file (messy)
City Name;Total Population;Year
London;8900000;2023
Berlin;3700000;2023
Z�rich;421000;2023
Madrid;3300000;2023

✗ Delimiter: ;

✗ Encoding: Latin-1 (mojibake)

✗ Headers: mixed case, spaces

Output file (clean)
city_name,total_population,year
London,8900000,2023
Berlin,3700000,2023
Zürich,421000,2023
Madrid,3300000,2023

✓ Delimiter: ,

✓ Encoding: UTF-8 (readable)

✓ Headers: snake_case
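
A single run produces this transformation; the file names below are only illustrative:

csvnorm cities.csv -o cities_clean.csv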

Error tracking

When csvnorm finds malformed rows, it captures them in a detailed reject file using DuckDB's validation engine. Each error is documented with file, line, column, and error type.

Example: output_reject_errors.csv (auto-generated)

line  column_name       value        error_type
45    total_population  N/A          MISSING_COLUMNS
67    city_name         Paris"extra  UNQUOTED_VALUE
103   city_name         Rome;Milan   TOO_MANY_COLUMNS

✓ Each malformed row documented with line number

✓ Error type classified by DuckDB validation

✓ Original problematic value preserved for review
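
To review rejected rows after a run, any standard CSV tool works on the reject file. The file name below follows the example above; column and grep are only illustrative:

csvnorm data.csv -o output.csv
column -t -s, output_reject_errors.csv
grep -c TOO_MANY_COLUMNS output_reject_errors.csv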

Why use it

CSVs from different sources waste time: mixed delimiters, wrong encodings, broken rows, inconsistent headers. csvnorm removes that friction and lets you start from a clean file.

Real validation

Uses DuckDB to check structure and catch bad rows; a rough sketch of that mechanism follows below.

Consistent output

Standard delimiter and normalized snake_case headers.

Detailed report

Stats on rows, columns, sizes, and errors.

Remote URLs

Process a CSV over HTTP without downloading it first.
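
The exact query csvnorm runs is not shown here, but a rough sketch of the DuckDB rejects mechanism it builds on looks like this (file name illustrative):

duckdb <<'SQL'
-- with store_rejects, malformed rows are skipped and recorded instead of failing the scan
SELECT count(*) FROM read_csv('data.csv', store_rejects = true);
-- the recorded errors: the same kind of information csvnorm writes to its reject file
SELECT line, column_name, error_type FROM reject_errors;
SQL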

Promise

One command to move from "uncertain file" to "ready file".

csvnorm is non-destructive: it never overwrites the input file and keeps you in control of the output.
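
A quick way to confirm this for yourself (md5sum is just one checksum tool; any will do):

md5sum data.csv
csvnorm data.csv -o clean.csv
md5sum data.csv   # same hash: the input was not modified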

How it works

1. Analyze

Detects encoding, delimiter, headers, and structural inconsistencies.

2. Normalize

Converts to UTF-8, standardizes separators, and cleans column names.

3. Report

Shows a summary with rows, columns, sizes, and rejected-row files.

Use it now

csvnorm writes to stdout by default, so it plugs into other tools. If you want a file, just set the output path.

csvnorm input.csv
csvnorm input.csv -o output.csv
csvnorm input.csv | csvstat
csvnorm "https://.../dataset.csv" -o output.csv

Quick install

Recommended with uv:

uv tool install csvnorm

Or with pip:

pip install csvnorm
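
Then confirm the command is on your PATH (assuming the usual --help flag):

csvnorm --help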