Hello there.
I am trying to extract (scrape) information from a web page and put it into Google Sheets.
The web page is similar to this one:
From this page, I would like to retrieve the title and the 6 values (circled in red in the attached picture) from the table with the CSS selector #statsTable.
How can I set up the code in Pipedream to do this?
It could be in Node.js or Python.
I would need the country to be Italy, and I don't think I need to render JS.
PS I’ve tried WebScraping.ai, but if you have other ideas/methods, they are welcome!
First off, welcome to the Pipedream community. Happy to have you!
That’s an interesting idea, and I have a few suggestions.
1. Use the Amazon API
Instead of trying to scrape a webpage, it might be possible to get this data directly from the Amazon API. That would save you the step of crafting CSS selectors and setting up an HTML scraper.
2. BeautifulSoup (Python), or jsdom or Cheerio (Node.js)
If this table is server-side rendered on Keepa, you can fetch the page with any HTTP client, such as requests (Python) or axios or fetch (Node.js), or just use the built-in HTTP request builder step with no code.
Then, once the HTML has been retrieved, you can use a data extraction tool like BeautifulSoup, jsdom or Cheerio in a code step to search the HTML and extract the data.
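To make the second option more concrete, here is a minimal Python sketch of the fetch-then-parse approach. It assumes the table really is present in the raw HTML (no JS rendering) and that the values sit in ordinary th/td cells under #statsTable. The function names extract_stats and fetch_html are made up for illustration, and the sample HTML is invented, so the real page layout may well differ.

```python
from bs4 import BeautifulSoup


def extract_stats(html: str, table_selector: str = "#statsTable") -> dict:
    """Return the page title and the cell text of each row in the table.

    Hypothetical helper: the selector comes from the question, but the
    row/cell layout of the real page is an assumption.
    """
    soup = BeautifulSoup(html, "html.parser")

    # The <title> tag may be missing, so fall back to None instead of raising.
    title = soup.title.get_text(strip=True) if soup.title else None

    rows = []
    table = soup.select_one(table_selector)
    if table is not None:
        for tr in table.select("tr"):
            cells = [cell.get_text(strip=True) for cell in tr.select("th, td")]
            if cells:  # skip rows that contain no th/td cells
                rows.append(cells)

    return {"title": title, "rows": rows}


def fetch_html(url: str) -> str:
    """Download the raw HTML of a page (hypothetical helper)."""
    import requests  # imported here so the parser can be tried offline

    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


if __name__ == "__main__":
    # Invented sample markup, only to show the shape of the result.
    sample_html = """
    <html><head><title>Example product</title></head>
    <body><table id="statsTable">
      <tr><th>Current</th><td>10.00</td></tr>
      <tr><th>Average</th><td>12.50</td></tr>
    </table></body></html>
    """
    print(extract_stats(sample_html))
```

In a Pipedream Python code step you would call fetch_html with your page URL and return the result of extract_stats so a later Google Sheets step can use it. Picking out exactly the 6 values you circled would then be a matter of indexing into rows once you can see how the real table is laid out.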