It’s a command-line tool to extract HTML elements using an XPath query or CSS3 selector.
It’s based on the great and simple scraping tool written by Jeroen Janssens.
You can install scrape-cli using pipx:
pipx install scrape-cli
or using pip:
pip install scrape-cli
Or install from source:
git clone https://github.com/aborruso/scrape-cli
cd scrape-cli
pip install -e .
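After installing, a quick sanity check is to ask the tool for its usage text; the --help flag is assumed here (it is standard for Python command-line tools but not documented above):
scrape --help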
A CSS selector query like this
curl -L 'https://en.wikipedia.org/wiki/List_of_sovereign_states' -s \
| scrape -be 'table.wikitable > tbody > tr > td > b > a'
or an XPath query like this one:
curl -L 'https://en.wikipedia.org/wiki/List_of_sovereign_states' -s \
| scrape -be '//table[contains(@class, "wikitable")]/tbody/tr/td/b/a'
gives you back:
<html>
<head>
</head>
<body>
<a href="/wiki/Afghanistan" title="Afghanistan">
Afghanistan
</a>
<a href="/wiki/Albania" title="Albania">
Albania
</a>
<a href="/wiki/Algeria" title="Algeria">
Algeria
</a>
<a href="/wiki/Andorra" title="Andorra">
Andorra
</a>
<a href="/wiki/Angola" title="Angola">
Angola
</a>
<a href="/wiki/Antigua_and_Barbuda" title="Antigua and Barbuda">
Antigua and Barbuda
</a>
<a href="/wiki/Argentina" title="Argentina">
Argentina
</a>
<a href="/wiki/Armenia" title="Armenia">
Armenia
</a>
...
...
</body>
</html>
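Since the -b flag makes the result a complete HTML document, it can be piped straight into other HTML/XML tools. As a sketch, this pulls out only the href values with xmllint (xmllint ships with libxml2 and is not part of scrape-cli; 2>/dev/null just silences HTML parser warnings):
curl -L 'https://en.wikipedia.org/wiki/List_of_sovereign_states' -s \
| scrape -be 'table.wikitable > tbody > tr > td > b > a' \
| xmllint --html --xpath '//a/@href' - 2>/dev/null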
Some notes on the commands:
-e to set the query
-b to add the <html>, <head> and <body> tags to the HTML output.
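For comparison, the same query without -b should return only the matched elements, with no surrounding <html>, <head> and <body> tags (a sketch inferred from the description of -b above):
curl -L 'https://en.wikipedia.org/wiki/List_of_sovereign_states' -s \
| scrape -e 'table.wikitable > tbody > tr > td > b > a'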
If you are looking for precompiled executables for Linux, please refer to the Releases page on GitHub, where you can find the latest precompiled binary file.
I have built the scrape-linux-x86_64 precompiled binary using PyInstaller, with this command: pyinstaller --onefile scrape.py
Once built, it is a standalone executable that can be run in any 64-bit Linux environment.
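If you want to reproduce the build, the steps look roughly like this (a sketch: it assumes the entry point is scrape.py in the repository root, as in the command above, and relies on PyInstaller's standard dist/ output directory):
# inside the cloned scrape-cli repository
pip install pyinstaller
pyinstaller --onefile scrape.py
ls dist/   # the standalone executable is written here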