with Google Cloud and WebScraper.IO?
Creates a Pub/Sub topic in your GCP account. Messages published to this topic are emitted from the Pipedream source.
Emit new events with the results of an arbitrary query
Emit new event when a page scraping job has completed. See the docs here
Creates a scraping job (scrapes a sitemap). See the docs here
Creates a sitemap for the selected website. See the docs here
Retrieves a list of scraping jobs for a sitemap. See the docs here
Inserts rows into a BigQuery table. See the docs for details and an example.
The Google Cloud API lets you manage, scale, and configure services within the Google Cloud Platform (GCP) programmatically. With Pipedream, you can use it to build workflows that trigger cloud functions based on events from other apps, manage resources, and analyze data, all in a serverless environment. Connecting GCP services to other apps makes it easier to synchronize data, streamline development workflows, and deploy applications.
module.exports = defineComponent({
  props: {
    google_cloud: {
      type: "app",
      app: "google_cloud",
    }
  },
  async run({steps, $}) {
    // Required workaround to get the @google-cloud/storage package
    // working correctly on Pipedream
    require("@dylburger/umask")()
    const { Storage } = require('@google-cloud/storage')
    const key = JSON.parse(this.google_cloud.$auth.key_json)
    const storage = new Storage({
      projectId: key.project_id,
      credentials: {
        client_email: key.client_email,
        private_key: key.private_key,
      }
    })
    await storage.authClient.getCredentials()
    return {
      status: "success",
      authenticated: true,
      projectId: key.project_id,
      serviceAccount: key.client_email
    }
  },
})
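The "Inserts rows into a BigQuery table" action is a natural destination for scraped data. As a minimal sketch, assuming scraped records arrive as flat objects whose field names may contain hyphens (for example `web-scraper-order`), a helper could rewrite keys into conservative BigQuery column names before inserting. The function names `toColumnName` and `toBigQueryRows`, the `scraping_job_id` column, and the dataset and table names in the comment are all hypothetical.

```javascript
// Convert a scraped field name into a conservative BigQuery column name:
// only letters, digits, and underscores, and never starting with a digit.
function toColumnName(key) {
  const cleaned = String(key).replace(/[^A-Za-z0-9_]/g, "_");
  return /^[0-9]/.test(cleaned) ? `_${cleaned}` : cleaned;
}

// Map scraped records to plain row objects suitable for a streaming insert.
// `scraping_job_id` is a hypothetical column for tracing rows back to a job.
function toBigQueryRows(records, scrapingJobId) {
  return records.map((record) => {
    const row = { scraping_job_id: scrapingJobId };
    for (const [key, value] of Object.entries(record)) {
      row[toColumnName(key)] = value;
    }
    return row;
  });
}

// Inside a Pipedream step, with `key` parsed as in the Storage example,
// the rows could then be inserted (dataset and table names are placeholders):
//   const { BigQuery } = require("@google-cloud/bigquery")
//   const bigquery = new BigQuery({
//     projectId: key.project_id,
//     credentials: { client_email: key.client_email, private_key: key.private_key },
//   })
//   await bigquery.dataset("my_dataset").table("scraped_pages").insert(rows)
```

Keeping the key rewriting in a pure function makes it easy to check the resulting column names against the table schema before any rows are sent.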
The WebScraper.IO API lets you run web scraping tasks programmatically and extract structured data from websites. With the API, you can automate the gathering of web content for analysis, monitoring, and integration with other data sources. In Pipedream, you can use this API to build workflows that process, analyze, and act on the data you scrape without managing backend infrastructure.
import { axios } from "@pipedream/platform"
export default defineComponent({
  props: {
    webscraper_io: {
      type: "app",
      app: "webscraper_io",
    }
  },
  async run({steps, $}) {
    return await axios($, {
      url: `https://api.webscraper.io/api/v1/sitemaps`,
      params: {
        api_token: `${this.webscraper_io.$auth.api_key}`,
      },
    })
  },
})
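The same request pattern could be extended to the "Creates a scraping job" action. The sketch below only builds the request configuration, so it can be inspected before anything is sent. The helper name `buildScrapingJobRequest` is hypothetical, and the `/scraping-job` path and the body fields (`sitemap_id`, `driver`, `page_load_delay`, `request_interval`) are assumptions that should be checked against the WebScraper.IO API docs.

```javascript
const BASE_URL = "https://api.webscraper.io/api/v1";

// Build an axios-style request config for creating a scraping job.
// The endpoint path and body field names are assumptions, not confirmed
// by the sample above; the default values are placeholders.
function buildScrapingJobRequest(apiToken, sitemapId, options = {}) {
  if (!apiToken) throw new Error("apiToken is required");
  if (!Number.isInteger(sitemapId)) throw new Error("sitemapId must be an integer");
  return {
    method: "POST",
    url: `${BASE_URL}/scraping-job`,
    // The token is passed as a query parameter, as in the sitemaps example.
    params: { api_token: apiToken },
    data: {
      sitemap_id: sitemapId,
      driver: options.driver ?? "fast",
      page_load_delay: options.pageLoadDelay ?? 2000,
      request_interval: options.requestInterval ?? 2000,
    },
  };
}
```

In a Pipedream step, the result could be passed straight to the platform helper, for example `return await axios($, buildScrapingJobRequest(this.webscraper_io.$auth.api_key, sitemapId))`, where `sitemapId` would come from an earlier step or a prop.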