Run a cron job that scrapes the HTML from a website, extracts data from it with the Cheerio library, and stores the full HTML of the page in an Amazon S3 bucket.
This workflow scrapes a website on a schedule. You can extract any content you'd like from the returned HTML using the Cheerio library.
The workflow also stores the full HTML of the site in an Amazon S3 bucket. You'll need to create an S3 bucket (or use an existing one) and configure its bucket policy to allow Pipedream to send data to the bucket.
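A bucket policy granting write access generally takes the shape below. This is a sketch only: the principal and bucket name are placeholders, and you should copy the exact principal Pipedream requires from the Pipedream docs rather than this example.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowPipedreamPutObject",
      "Effect": "Allow",
      "Principal": { "AWS": "<Pipedream principal — see Pipedream docs>" },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::your-bucket-name/*"
    }
  ]
}
```

Note that `s3:PutObject` on `your-bucket-name/*` permits writing objects but not listing or reading them, which is the minimal grant this workflow needs.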
Please reach out if you have any questions!