Hello there.
I am trying to extract (scrape) information from a web page and put that information into a Google Sheet.
The web page is similar to this one:
From this page I would like to retrieve the title and the 6 values (circled in red in the attached picture) from the table with the CSS selector #statsTable.
How can I set up the code in Pipedream to do this?
It could be in Node.js or Python.
I would need the country to be Italy, and I don’t think I need to render JS.
PS I’ve tried WebScraping.ai, but if you have other ideas/methods, they are welcome!
First off, welcome to the Pipedream community. Happy to have you!
That’s an interesting idea, and I have a few suggestions.
1. Use the Amazon API
Instead of trying to scrape a webpage, it might be possible to pull this data directly from the Amazon API. That would save you the step of crafting CSS selectors and setting up an HTML scraper.
2. BeautifulSoup (Python) or JSdom or Cheerio (Node.js)
If this table is server-side rendered on Keepa, then you can use any HTTP client, such as requests (Python) or axios or fetch (Node.js), or just use the built-in HTTP request builder step with no code.
Once the HTML has been retrieved, you can use a data extraction tool like BeautifulSoup, JSdom or Cheerio in a code step to search the HTML and extract the data.
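To make the extraction step more concrete, here is a minimal sketch in Python. It is only an illustration under some assumptions: it swaps in Python's standard-library html.parser instead of BeautifulSoup so that nothing has to be installed, the names StatsTableParser, extract_stats and SAMPLE_HTML are made up for this example, and the sample HTML is invented, since I don't know what the real page's table looks like. With BeautifulSoup, the equivalent lookup would be a CSS query such as soup.select("#statsTable td").

```python
from html.parser import HTMLParser


class StatsTableParser(HTMLParser):
    """Collects the page <title> and the cell text of one table, found by id."""

    def __init__(self, table_id="statsTable"):
        super().__init__()
        self.table_id = table_id
        self.title = ""
        self.rows = []          # one list of cell strings per <tr>
        self._in_title = False
        self._table_depth = 0   # > 0 while inside the target table
        self._cell = None       # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "table":
            # Enter the target table, or track a table nested inside it.
            if self._table_depth or dict(attrs).get("id") == self.table_id:
                self._table_depth += 1
        elif self._table_depth and tag == "tr":
            self.rows.append([])
        elif self._table_depth and tag in ("td", "th"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "table" and self._table_depth:
            self._table_depth -= 1
        elif tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            text = " ".join("".join(self._cell).split())
            if not self.rows:   # cell that appeared without a <tr>
                self.rows.append([])
            self.rows[-1].append(text)
            self._cell = None

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._cell is not None:
            self._cell.append(data)


def extract_stats(html, table_id="statsTable"):
    """Return the page title and the rows of the table with the given id."""
    parser = StatsTableParser(table_id)
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(),
            "rows": [row for row in parser.rows if row]}


# Invented sample page, standing in for the HTML fetched by the HTTP step.
SAMPLE_HTML = """
<html><head><title>Example product</title></head><body>
<table id="other"><tr><td>ignore me</td></tr></table>
<table id="statsTable">
  <tr><th>Type</th><th>Lowest</th><th>Current</th></tr>
  <tr><td>Amazon</td><td>10.00</td><td>12.50</td></tr>
</table>
</body></html>
"""

result = extract_stats(SAMPLE_HTML)
print(result)
```

In a Pipedream Python code step, this logic would sit inside the step's handler function, with the HTML string coming from whatever the previous HTTP step exported (the exact path depends on how your workflow names its steps). The returned title and rows could then be mapped to columns in the Google Sheets step. Picking out the 6 specific values would still require looking at the real table to see which rows and columns they are in.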
I still do not have access to the Amazon API due to a lack of suitable sales.
As for BeautifulSoup (Python) or JSdom or Cheerio (Node.js), I have no idea how to set them up in Pipedream.
I tried Browserless, but you can only take a screenshot or export HTML to PDF. And even then, I can’t figure out how to filter the information in Pipedream to extract only the numbers I want.
@restyler
Saving to Google Sheets with Pipedream is no problem. The difficulty for me is configuring tools such as ScrapeNinja (or similar) so that they extract only the desired information.
I know nothing about Python, Node.js or other languages. I would just like to know the lines of code I have to put into Pipedream to get the desired information.