Run a cron job that scrapes the HTML from a website, extracts data from it with the Cheerio library, and stores the full HTML of the page in an Amazon S3 bucket.
This workflow scrapes a website on a schedule. You can extract any content you'd like from the returned HTML using the Cheerio library.
The workflow also stores the full HTML of the site in an Amazon S3 bucket. You'll need to create an S3 bucket (or use an existing one) and configure its bucket policy to allow Pipedream to send data to the bucket.
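A bucket policy granting write access generally takes the shape below. This is a sketch only: the principal and bucket name are placeholders, and you should copy the exact principal Pipedream requires from the Pipedream docs rather than this example.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowPipedreamPutObject",
      "Effect": "Allow",
      "Principal": { "AWS": "<Pipedream principal — see Pipedream docs>" },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::your-bucket-name/*"
    }
  ]
}
```

Note that `s3:PutObject` on `your-bucket-name/*` permits writing objects but not listing or reading them, which is the minimal grant this workflow needs.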
Please reach out if you have any questions!