scrape-cli
Extract HTML elements from the command line using CSS or XPath
Pipe-friendly. Simple. Powerful.
Why scrape-cli?
Built for the terminal. Designed for pipelines.
Simple
One flag to extract, one to wrap. No boilerplate. No config files.
CSS & XPath
Use the selector language you already know. Switch anytime, same result.
Pipeline-friendly
Reads stdin, writes stdout. Composes naturally with curl, jq, xq.
LLM-ready
The -t flag extracts clean plain text, perfect for AI pipelines.
How it works
CSS selectors and XPath — same result, your choice
curl -L 'https://en.wikipedia.org/wiki/List_of_sovereign_states' -s \
  | scrape -be 'table.wikitable > tbody > tr > td > b > a'

curl -L 'https://en.wikipedia.org/wiki/List_of_sovereign_states' -s \
  | scrape -be "//table[contains(@class, 'wikitable')]/tbody/tr/td/b/a"
scrape -e "table.data-table td" resources/test.html
scrape -e "//table[contains(@class, 'data-table')]//td" resources/test.html
scrape -e "a.external-link" -a href resources/test.html
scrape -e "//a[contains(@class, 'external-link')]/@href" resources/test.html
Key flags
-e
CSS selector or XPath expression
-b
Wrap output in html/head/body
-t
Extract plain text only
-a ATTR
Extract attribute value
--check-existence
Exit 0 if found, 1 if not
Installation
Get started in seconds
$ pipx install scrape-cli
$ uv tool install scrape-cli
$ pip install scrape-cli
Python ≥ 3.6 · requires: requests, lxml, cssselect
Practical examples
Real-world use cases straight from the terminal
Extract & convert to JSON
Pipe to xq for structured output
scrape -be "a.external-link" resources/test.html | xq .
Requires xq (kislyuk/yq) for XML/HTML-to-JSON conversion.
{
  "html": {
    "body": {
      "a": {
        "@href": "https://example.com",
        "@class": "external-link",
        "#text": "Example Link"
      }
    }
  }
}
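If you would rather stay in Python than install xq, the same mapping (attributes to @-keys, text to #text, the xmltodict convention that xq follows) can be sketched with the standard library. The fragment below is an assumed stand-in for what scrape -be emits:

```python
import json
import xml.etree.ElementTree as ET

# assumed sample of the wrapped fragment produced by `scrape -be`
fragment = (
    '<html><body>'
    '<a href="https://example.com" class="external-link">Example Link</a>'
    '</body></html>'
)

a = ET.fromstring(fragment).find("./body/a")
# attributes become "@..." keys, element text becomes "#text"
record = {"html": {"body": {"a": {
    "@href": a.get("href"),
    "@class": a.get("class"),
    "#text": a.text,
}}}}
print(json.dumps(record, indent=2))
```

This is only a sketch for a single known element; xq handles arbitrary trees, repeated tags, and mixed content for you.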
Extract text for LLMs
Clean plain text, no HTML tags
curl -L 'https://en.wikipedia.org/wiki/List_of_sovereign_states' -s \
  | scrape -te 'table.wikitable td'
The -t flag strips HTML tags, excludes <script> and <style>, and cleans up whitespace — ideal for feeding content into an LLM or text pipeline.
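scrape-cli's exact cleanup rules aren't reproduced here, but the same idea is easy to sketch with lxml (one of its listed dependencies); the HTML snippet is made up for illustration:

```python
from lxml import html, etree

page = html.fromstring(
    "<html><body><style>p { color: red }</style>"
    "<script>var x = 1;</script>"
    "<p>Hello,\n   <b>sovereign</b>   states!</p></body></html>"
)

# drop <script> and <style> subtrees, as -t does
etree.strip_elements(page, "script", "style")

# grab the remaining text and collapse runs of whitespace
text = " ".join(page.text_content().split())
print(text)  # Hello, sovereign states!
```

The result is a single clean line of prose with no markup, which is what makes the output safe to pipe straight into a prompt or text pipeline.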
Check element existence
Scriptable exit codes for automation
scrape -e "#main-title" --check-existence resources/test.html
Pipeline with curl
Scrape live web content instantly
curl -L 'https://en.wikipedia.org/wiki/List_of_sovereign_states' -s \
  | scrape -be 'table.wikitable > tbody > tr > td > b > a'
<a href="/wiki/Afghanistan">Afghanistan</a>
<a href="/wiki/Albania">Albania</a>
<a href="/wiki/Algeria">Algeria</a>
...