Scrape a website on a schedule, store to S3
@dylburger
Data: private · Last updated: 3 years ago
steps.trigger
Cron Scheduler
Deploy to configure a custom schedule
This workflow runs on Pipedream's servers and is triggered on a custom schedule.
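When you deploy and configure the schedule, the Cron Scheduler accepts a standard five-field cron expression. As an illustrative example (not part of this template), an expression that runs the workflow once a day at 09:00 UTC looks like:

```
0 9 * * *
```

The five fields are minute, hour, day of month, month, and day of week.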
steps.nodejs
auth: use OAuth tokens and API keys in code via the auths object
code
Write any Node.js code and use any npm package. You can also export data for use in later steps via return or this.key = 'value', pass input data to your code via params, and maintain state across executions with $checkpoint.
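The export pattern described above can be sketched outside of Pipedream as follows. This is a minimal, hypothetical simulation of how a value assigned to `this` in one step becomes readable by later steps; the `steps` object and the step name `nodejs` stand in for Pipedream's runtime and are assumptions, not part of the template:

```javascript
// Minimal sketch (assumption: Pipedream exposes each step's exports
// on a shared `steps` object keyed by step name).
const steps = {};

// Simulate running the nodejs step: `self` plays the role of `this`
// inside the step's code.
function runNodejsStep() {
  const self = {};
  self.html = "<html><body>example</body></html>"; // like `this.html = $.html()`
  steps.nodejs = self;
}

runNodejsStep();

// A later step can now read the exported value:
console.log(steps.nodejs.html);
```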
async (event, steps) => {
  const axios = require("axios")
  const cheerio = require("cheerio")

  async function fetchHTML(url) {
    const { data } = await axios.get(url)
    return cheerio.load(data)
  }

  const $ = await fetchHTML("https://example.com")

  // Save the HTML in the $event object, which allows you to
  // store data in one step and use it in another. See
  // https://docs.pipedream.com/notebook/dollar-event/#modifying-event
  this.html = $.html()
}
steps.send_to_s3
Send data to Amazon S3 using Pipedream's destination integration. See https://docs.pipedream.com/destinations/s3/
params

S3 Bucket (string · params.bucket)
The name of the S3 bucket you'd like to send data to, e.g. my-s3-bucket

Payload (object · params.payload)
An object: either a reference to a variable from a previous step (for example, event.body), or a set of hardcoded key-value pairs.

Optional
code
async params => {
  $send.s3({
    bucket: params.bucket,
    prefix: params.prefix,
    payload: params.payload,
  })
}
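To connect the two steps, the Payload param would typically reference the HTML exported by the nodejs step. Here is a hedged sketch of the object one might pass as params.payload; the field names html and scrapedAt are illustrative assumptions, not defined by the template:

```javascript
// Stand-in for the real step export (in the workflow this would be
// the value set via `this.html = $.html()` in steps.nodejs).
const steps = { nodejs: { html: "<html><body>example</body></html>" } };

// Illustrative payload for params.payload: the field names here
// are assumptions, chosen for this sketch.
const payload = {
  html: steps.nodejs.html,
  scrapedAt: new Date().toISOString(),
};

console.log(Object.keys(payload));
```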