Manual & Examples

Ten real examples using CSS selectors and XPath — all tested against a live HTML page.

Each example shows the objective, the exact command, and the expected output.

Test page: resources/test.html

Also available at: https://aborruso.github.io/scrape-cli/resources/test.html

Each example is labeled by type: CSS selector, XPath, or pipeline.
01 CSS Extract the page title

Get the main heading text using the h1 tag selector and -t for plain text output.

scrape -e "h1" -t \
  resources/test.html
Welcome to the Test Page
02 XPath Extract all section headings

List all h2 elements with a specific class, using an XPath attribute predicate.

scrape \
  -e "//h2[@class='section-title']" \
  -t resources/test.html
European Countries
Installation Steps
Topics
External Resources
Data Attributes
International Text
03 CSS Extract all external link URLs

Get the href of every link whose URL begins with https (the external links on this page), using the -a flag to extract the attribute value.

scrape \
  -e "a[href^='https']" \
  -a href \
  resources/test.html
https://example.com
https://github.com/aborruso/scrape-cli
https://pypi.org/project/scrape-cli/
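Since -a href prints one URL per line, the output can feed a read loop. The following is a minimal sketch of a link check built on that idea. The function name check_links is hypothetical (it is not part of scrape-cli), and the sketch assumes both scrape and curl are on PATH.

```shell
# check_links FILE
# Hypothetical helper, not part of scrape-cli: prints "<status> <url>"
# for every link in FILE whose href starts with https.
check_links() {
  scrape -e "a[href^='https']" -a href "$1" |
  while IFS= read -r url; do
    # Skip blank lines, if any
    [ -n "$url" ] || continue
    # -o /dev/null discards the body; -w prints only the HTTP status code
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url")
    printf '%s %s\n' "$code" "$url"
  done
}
```

Calling check_links resources/test.html would print one status line per external link. The actual codes depend on the network at the time of the run.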
04 XPath Extract a specific list item by position

Get the second <li> from a list. Wrapping the path in parentheses makes the positional predicate [2] index the whole result set rather than each parent's children.

scrape \
  -e "(//ul[@class='items-list']/li)[2]" \
  -t resources/test.html
Second item
05 XPath Extract a table column

Pull all values from the first column of a table by combining an XPath class predicate with a column index.

scrape \
  -e "//table[@class='data-table']
     /tbody/tr/td[1]" \
  -t resources/test.html
Italy
France
Germany
Spain
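The same query with td[2] returns the second column, so two runs can be joined line by line into tab-separated rows. This is a rough sketch under several assumptions: the name table_pairs is hypothetical, mktemp is available, and every cell yields exactly one line of text so the two columns stay aligned.

```shell
# table_pairs FILE
# Hypothetical helper, not part of scrape-cli: prints the first two
# columns of the data table as "col1<TAB>col2" rows.
table_pairs() {
  file=$1
  rows="//table[@class='data-table']/tbody/tr"
  tmp=$(mktemp) || return 1
  # First column goes to a temporary file
  scrape -e "$rows/td[1]" -t "$file" > "$tmp"
  # Second column arrives on stdin; paste joins the two line by line
  scrape -e "$rows/td[2]" -t "$file" | paste "$tmp" -
  rm -f "$tmp"
}
```

Running table_pairs resources/test.html should pair each value from the first column with the value on the same row of the second column.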
06 XPath Extract custom data attributes

Read arbitrary data-* attributes using -a: -e selects the elements, and -a names the attribute to extract.

scrape \
  -e "//span[@class='data-item']" \
  -a data-value \
  resources/test.html
123
456
789
07 CSS Extract all visible text (LLM-ready)

Extract all body text — no tags, no <script>, no <style> — in one command. Ideal for feeding pages into an LLM.

scrape -t \
  resources/test.html
Home
Docs
Contact
Welcome to the Test Page
European Countries
Italy Rome 59
...
(script content excluded)
08 CSS Check element existence (for scripting)

-x returns exit code 0 if the element is found, 1 if not — perfect for shell conditionals and CI checks.

scrape -e "#main-title" -x \
  resources/test.html \
  && echo "found" \
  || echo "not found"
found
Exit codes: 0 = element found, 1 = not found
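Because -x reports its result through the exit status, it can also drive an if statement directly. The sketch below wraps the check in a small helper. The name require_element and its error message are illustrative, not part of scrape-cli, and the sketch assumes scrape is on PATH.

```shell
# require_element SELECTOR FILE
# Hypothetical helper, not part of scrape-cli: succeeds when SELECTOR
# matches in FILE; otherwise prints a message to stderr and returns 1.
require_element() {
  # Any output from scrape is discarded; only the exit status matters
  if scrape -e "$1" -x "$2" >/dev/null 2>&1; then
    return 0
  fi
  echo "missing element: $1 in $2" >&2
  return 1
}
```

In a CI step, require_element "#main-title" resources/test.html || exit 1 would stop the job as soon as the heading is no longer present.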
09 XPath Wrap output in a full HTML document

Extract an element and wrap it in a valid <html><body> document using -b. Useful for piping into other HTML tools.

scrape \
  -e "//section[@id='countries']//table" \
  -b resources/test.html
<!DOCTYPE html>
<html>
<body>
<table class="data-table">
  ...
</table>
</body>
</html>
10 pipeline Fetch a live URL and extract data

Pipe curl output directly into scrape to extract table column 2 (capitals) from the live test page on GitHub Pages.

curl -s https://aborruso.github.io/scrape-cli/resources/test.html \
| scrape \
  -e "//table[@class='data-table']
     /tbody/tr/td[2]" -t
Rome
Paris
Berlin
Madrid