Manual & Examples

Ten real examples using CSS selectors and XPath — all tested against a live HTML page.

Each example shows the objective, the exact command, and the expected output.

Input page used in all examples

resources/test.html

Also available at: https://aborruso.github.io/scrape-cli/resources/test.html


01 CSS Extract the page title

Objective

Get the main heading text using the h1 tag selector and -t for plain text output.

Command

scrape -e "h1" -t \
  resources/test.html

Output

Welcome to the Test Page

02 XPath Extract all section headings

Objective

List all h2 elements with a specific class, using an XPath attribute predicate.

Command

scrape \
  -e "//h2[@class='section-title']" \
  -t resources/test.html

Output

European Countries
Installation Steps
Topics
External Resources
Data Attributes
International Text

03 CSS Extract all external link URLs

Objective

Get the href of every link whose URL starts with https (the external links on the test page), using the -a flag to extract the attribute.

Command

scrape \
  -e "a[href^='https']" \
  -a href \
  resources/test.html

Output

https://example.com
https://github.com/aborruso/scrape-cli
https://pypi.org/project/scrape-cli/
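Because the URLs arrive one per line, they can be post-processed with ordinary shell tools. The following is a minimal sketch, not part of scrape-cli itself: the function name link_hosts is hypothetical, and it assumes the output has the one-URL-per-line shape shown above.

```shell
# link_hosts FILE  (hypothetical helper, not shipped with scrape-cli)
# Lists the unique host names behind the https links found in FILE.
link_hosts() {
  scrape -e "a[href^='https']" -a href "$1" |
    while IFS= read -r url; do
      host=${url#https://}   # drop the scheme
      host=${host%%/*}       # drop everything from the first slash on
      printf '%s\n' "$host"
    done | sort -u           # sort and de-duplicate the hosts
}
```

Called as link_hosts resources/test.html, it reduces the three URLs above to their host names.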

04 XPath Extract a specific list item by position

Objective

Get the second <li> of a list by wrapping the XPath in parentheses and applying a positional predicate to the whole result.

Command

scrape \
  -e "(//ul[@class='items-list']/li)[2]" \
  -t resources/test.html

Output

Second item

05 XPath Extract a table column

Objective

Pull all values from the first column of a table by combining an XPath class predicate with a column index.

Command

scrape \
  -e "//table[@class='data-table']/tbody/tr/td[1]" \
  -t resources/test.html

Output

Italy
France
Germany
Spain

06 XPath Extract custom data attributes

Objective

Read arbitrary data-* attributes using -a. The -e selects the elements; -a names the attribute to extract.

Command

scrape \
  -e "//span[@class='data-item']" \
  -a data-value \
  resources/test.html

Output

123
456
789

07 CSS Extract all visible text (LLM-ready)

Objective

Extract all body text — no tags, no <script>, no <style> — in one command. Ideal for feeding pages into an LLM.

Command

scrape -t \
  resources/test.html

Output (excerpt)

Home
Docs
Contact
Welcome to the Test Page
European Countries
Italy Rome 59
...
(script content excluded)

08 CSS Check element existence (for scripting)

Objective

-x returns exit code 0 if the element is found, 1 if not — perfect for shell conditionals and CI checks.

Command

scrape -e "#main-title" -x \
  resources/test.html \
  && echo "found" \
  || echo "not found"

Output

found

Exit code 0: element found
Exit code 1: not found
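In a longer script or CI job, the same exit code can drive an if statement instead of a one-line && / || chain. This is a sketch under assumptions: the function name require_element and its messages are hypothetical, and only the -e and -x flags described above are used.

```shell
# require_element FILE SELECTOR  (hypothetical helper)
# Succeeds when SELECTOR matches something in FILE; otherwise prints
# a message on stderr and returns a non-zero status.
require_element() {
  file=$1
  selector=$2
  if scrape -e "$selector" -x "$file"; then
    echo "ok: $selector"
  else
    echo "missing: $selector" >&2
    return 1
  fi
}
```

A CI step could then run require_element resources/test.html "#main-title" || exit 1 to stop the job when the element is absent.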

09 XPath Wrap output in a full HTML document

Objective

Extract an element and wrap it in a valid <html><body> document using -b. Useful for piping into other HTML tools.

Command

scrape \
  -e "//section[@id='countries']//table" \
  -b resources/test.html

Output

<!DOCTYPE html>
<html>
<body>
<table class="data-table">
  ...
</table>
</body>
</html>

10 pipeline Fetch a live URL and extract data

Objective

Pipe curl output directly into scrape to extract table column 2 (capitals) from the live test page on GitHub Pages.

Command

curl -s https://aborruso.github.io/scrape-cli/resources/test.html \
| scrape \
  -e "//table[@class='data-table']/tbody/tr/td[2]" -t

Output

Rome
Paris
Berlin
Madrid
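Single-column extractions like this one can be recombined into rows with standard tools. The sketch below is an illustration, not a scrape-cli feature: the function name country_capitals is hypothetical, and it assumes each table cell produces exactly one line of output so that the two columns stay aligned.

```shell
# country_capitals FILE  (hypothetical helper)
# Extracts columns 1 and 2 of the data table separately, then joins
# them line by line into tab-separated "country<TAB>capital" rows.
country_capitals() {
  file=$1
  tmp=$(mktemp)   # holds column 1 while column 2 is streamed
  scrape -e "//table[@class='data-table']/tbody/tr/td[1]" -t "$file" > "$tmp"
  scrape -e "//table[@class='data-table']/tbody/tr/td[2]" -t "$file" |
    paste "$tmp" -   # "-" makes paste read column 2 from stdin
  rm -f "$tmp"
}
```

Run against resources/test.html, this would pair each country from example 05 with the capital on the same row here, for instance Italy with Rome.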