How to scrape Keepa with WebScraping.ai

Hello there.
I am trying to extract (scrape) information from a web page and put it into Google Sheets.

The web page is similar to this one:

From this page I would like to retrieve the title and the 6 values (circled in red in the attached picture) from the table with the CSS selector #statsTable.

How can I set up the code in Pipedream to do this?

It could be in Node.js or Python.
I would need the country to be set to Italy, and I don’t think I need to render JS!

PS I’ve tried WebScraping.ai, but if you have other ideas/methods, they are welcome!

Thanks in advance

Ricky

Hi Ricky

First off, welcome to the Pipedream community. Happy to have you!

That’s an interesting idea. I have a few suggestions.

1. Use the Amazon API

Instead of trying to scrape a webpage, it might be possible to retrieve this data directly from the Amazon API. It would save you the step of crafting CSS selectors and setting up an HTML scraper.

2. BeautifulSoup (Python), or jsdom or Cheerio (Node.js)

If this table is server-side rendered on Keepa, then you can fetch the page with any HTTP client such as requests (Python), axios or fetch (Node.js), or just use the built-in HTTP request builder step with no code.

Once the HTML has been retrieved, you can use a data extraction tool like BeautifulSoup, jsdom or Cheerio in a code step to search the HTML and extract the data.
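To make the extraction step concrete, here is a minimal sketch in Python. It uses only the standard library's html.parser as a dependency-free stand-in for BeautifulSoup, so it is not the BeautifulSoup API itself. The sample HTML, the StatsTableParser class and the extract_stats helper are all made up for illustration; the real Keepa markup will almost certainly differ, and in a Pipedream code step the HTML would come from your earlier HTTP request step instead of a hard-coded string.

```python
from html.parser import HTMLParser

# Invented sample markup for illustration only; the real page will differ.
SAMPLE_HTML = """
<html><head><title>Example product</title></head>
<body>
<table id="statsTable">
  <tr><th>Type</th><th>Current</th><th>Average</th></tr>
  <tr><td>Amazon</td><td>19.99</td><td>21.50</td></tr>
  <tr><td>New</td><td>18.49</td><td>20.10</td></tr>
</table>
<table id="other"><tr><td>ignored</td></tr></table>
</body></html>
"""


class StatsTableParser(HTMLParser):
    """Collects the page <title> and the cell text of one table, found by id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.title = ""
        self.rows = []            # one list of cell strings per <tr>
        self._in_title = False
        self._table_depth = 0     # > 0 while inside the target table
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "table":
            if self._table_depth > 0:
                self._table_depth += 1          # nested table inside the target
            elif dict(attrs).get("id") == self.table_id:
                self._table_depth = 1           # entering the target table
        elif self._table_depth > 0:
            if tag == "tr":
                self.rows.append([])
            elif tag in ("td", "th") and self.rows:
                self.rows[-1].append("")
                self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "table" and self._table_depth > 0:
            self._table_depth -= 1
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_cell and self._table_depth > 0:
            self.rows[-1][-1] += data.strip()


def extract_stats(html, table_id="statsTable"):
    """Return (title, rows) for the table with the given id (hypothetical helper)."""
    parser = StatsTableParser(table_id)
    parser.feed(html)
    return parser.title.strip(), parser.rows


title, rows = extract_stats(SAMPLE_HTML)
print(title)
for row in rows:
    print(row)
```

If rows comes back empty for the real page, the table is probably not present in the raw HTML, which points to the JavaScript-rendered case covered in the next suggestion.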

3. Browserless.io

However, if that table is rendered by JavaScript in the browser, an HTTP client will not suffice, because the table will not be present in the HTML the server returns.

If that’s the case, you can use a tool like Browserless.io, which runs a headless browser and actually executes the JavaScript on the page.

Here’s an example:

Here is a YouTube video you might find useful: scraping website data to Google Sheets via Pipedream and ScrapeNinja

ScrapeNinja has a Cheerio selector sandbox for quickly testing your JS extractors against the HTML output of the website: ScrapeNinja Cheerio Live Sandbox