Manual & Examples
Ten real examples using CSS selectors and XPath — all tested against a live HTML page.
Each example shows the objective, the exact command, and the expected output.
Input page used in all examples
resources/test.html
Also available at:
https://aborruso.github.io/scrape-cli/resources/test.html
Objective
Get the main heading text using the h1 tag selector and -t for plain text output.
Command
scrape -e "h1" -t \
resources/test.html
Output
Welcome to the Test Page
Objective
List all h2 elements with a specific class, using an XPath attribute predicate.
Command
scrape \
-e "//h2[@class='section-title']" \
-t resources/test.html
Output
European Countries
Installation Steps
Topics
External Resources
Data Attributes
International Text
Objective
Get the href of every link pointing to an external HTTPS URL, using the -a flag to extract attributes.
Command
scrape \
-e "a[href^='https']" \
-a href \
resources/test.html
Output
https://example.com
https://github.com/aborruso/scrape-cli
https://pypi.org/project/scrape-cli/
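The extracted attribute values can be post-processed with ordinary text tools. The sketch below assumes scrape prints one value per line, and defines a hypothetical helper, unique_hosts (not part of scrape), that reduces a list of URLs to its distinct host names.

```shell
# Hypothetical helper: read URLs on stdin (one per line, an assumption)
# and print each distinct host name once, sorted.
unique_hosts() {
    # With "/" as the field separator, "https://example.com/x" splits into
    # "https:", "", "example.com", "x", so the host is field 3.
    awk -F/ 'NF >= 3 { print $3 }' | sort -u
}

# Usage (needs scrape and the test page from the examples):
#   scrape -e "a[href^='https']" -a href resources/test.html | unique_hosts
```

Keeping the selection in scrape and the text processing in awk and sort means each step can be checked on its own.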
Objective
Get the second <li> from a list using an XPath positional predicate wrapped in parentheses.
Command
scrape \
-e "(//ul[@class='items-list']/li)[2]" \
-t resources/test.html
Output
Second item
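The parentheses matter: in XPath, //ul/li[2] selects the second li within each matching ul, while (//ul/li)[2] selects the second node of the whole result set. As a minimal sketch, the hypothetical helper nth_match below (not part of scrape) wraps any XPath expression this way.

```shell
# Hypothetical helper: print the text of the N-th node matched by an XPath.
# $1 = XPath expression, $2 = 1-based position, $3 = HTML file.
nth_match() {
    # Wrapping the expression in parentheses applies [N] to the complete
    # node-set rather than to each parent's children separately.
    scrape -e "($1)[$2]" -t "$3"
}

# Usage (needs scrape and the test page from the examples):
#   nth_match "//ul[@class='items-list']/li" 2 resources/test.html
```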
Objective
Pull all values from the first column of a table by combining an XPath class predicate with a column index.
Command
scrape \
-e "//table[@class='data-table']/tbody/tr/td[1]" \
-t resources/test.html
Output
Italy
France
Germany
Spain
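Two column extractions can be joined row by row with paste. This is a sketch only: it assumes scrape prints one cell per line, and the helper name country_capital_pairs is hypothetical. Column 2 of the test table holds the capitals.

```shell
# Hypothetical helper: print "country<TAB>capital" rows for an HTML file.
# Assumes scrape emits one table cell per line.
country_capital_pairs() {
    page=$1
    col1=$(mktemp) || return 1
    col2=$(mktemp) || { rm -f "$col1"; return 1; }
    # Extract each column into its own temporary file.
    scrape -e "//table[@class='data-table']/tbody/tr/td[1]" -t "$page" > "$col1"
    scrape -e "//table[@class='data-table']/tbody/tr/td[2]" -t "$page" > "$col2"
    # paste joins line N of the first file with line N of the second.
    paste "$col1" "$col2"
    rm -f "$col1" "$col2"
}

# Usage (needs scrape and the test page from the examples):
#   country_capital_pairs resources/test.html
```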
Objective
Read arbitrary data-* attributes using -a. The -e selects the elements; -a names the attribute to extract.
Command
scrape \
-e "//span[@class='data-item']" \
-a data-value \
resources/test.html
Output
123
456
789
Objective
Extract all body text — no tags, no <script>, no <style> — in one command. Ideal for feeding pages into an LLM.
Command
scrape -t \
resources/test.html
Output (excerpt)
Home
Docs
Contact
Welcome to the Test Page
European Countries
Italy Rome 59
...
(script content excluded)
Objective
-x returns exit code 0 if the element is found, 1 if not — perfect for shell conditionals and CI checks.
Command
scrape -e "#main-title" -x \
resources/test.html \
&& echo "found" \
|| echo "not found"
Output
found
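In a longer script or CI job, the same exit code can drive an if statement. The sketch below is one possible shape; has_element and require_element are hypothetical wrappers, not scrape features, and any output from -x is discarded because only the exit code is relied on.

```shell
# Hypothetical wrapper: succeed only if the selector matches in the file.
# $1 = selector, $2 = HTML file.
has_element() {
    # -x reports the result through the exit code (0 found, 1 not found).
    scrape -e "$1" -x "$2" >/dev/null 2>&1
}

# Hypothetical check: report a missing element on stderr and return 1.
require_element() {
    if has_element "$1" "$2"; then
        echo "ok: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Usage (needs scrape and the test page from the examples):
#   require_element "#main-title" resources/test.html || exit 1
```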
Objective
Extract an element and wrap it in a valid <html><body> document using -b. Useful for piping into other HTML tools.
Command
scrape \
-e "//section[@id='countries']//table" \
-b resources/test.html
Output
<!DOCTYPE html>
<html>
<body>
<table class="data-table"> ... </table>
</body>
</html>
Objective
Pipe curl output directly into scrape to extract table column 2 (capitals) from the live test page on GitHub Pages.
Command
curl -s https://aborruso.github.io/scrape-cli/resources/test.html \
| scrape \
-e "//table[@class='data-table']/tbody/tr/td[2]" -t
Output
Rome
Paris
Berlin
Madrid
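The curl pipeline can be wrapped in a small function when the same extraction is needed for several columns or pages. This is a sketch under assumptions: remote_column is a hypothetical name, and the curl flags -f and -S are added here so that HTTP errors fail visibly instead of feeding an error page to scrape.

```shell
# Hypothetical helper: download a page and print the text of one table column.
# $1 = URL, $2 = 1-based column index.
remote_column() {
    # -f: fail on HTTP errors; -s: no progress meter; -S: still show errors.
    curl -fsS "$1" | scrape -e "//table[@class='data-table']/tbody/tr/td[$2]" -t
}

# Usage (needs curl, scrape and network access):
#   remote_column https://aborruso.github.io/scrape-cli/resources/test.html 2
```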