with ScrapingBee and Playwright?
Generates a PDF of the page and stores it in the /tmp directory. See the documentation
Stores a new screenshot file in the /tmp directory. See the documentation
The ScrapingBee API lets you extract data from web pages programmatically. It handles headless browsers and rotating proxies, so you can focus on data extraction without worrying about common web scraping issues like getting blocked. On Pipedream, you can tap into ScrapingBee's capability to create serverless workflows that scrape web data and connect them with numerous services for processing, storing, or triggering actions.
import { axios } from "@pipedream/platform"
export default defineComponent({
  props: {
    scrapingbee: {
      type: "app",
      app: "scrapingbee",
    }
  },
  async run({steps, $}) {
    return await axios($, {
      url: `https://app.scrapingbee.com/api/v1/`,
      params: {
        api_key: `${this.scrapingbee.$auth.api_key}`,
        json_response: `True`,
        url: `https://pipedream.com`,
      },
    })
  },
})
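The request parameters above can be assembled and validated before the call is made. The following is a minimal sketch, not Pipedream's or ScrapingBee's own code: `buildScrapingBeeParams` and its `renderJs` option are hypothetical names chosen for illustration, and `render_js` is assumed to be the ScrapingBee query parameter that toggles JavaScript rendering. The `json_response` value simply mirrors the example above.

```javascript
// Hypothetical helper (not a Pipedream or ScrapingBee API): builds the query
// parameters for a ScrapingBee request and rejects malformed target URLs early.
function buildScrapingBeeParams(apiKey, targetUrl, options = {}) {
  if (!apiKey) throw new Error("A ScrapingBee API key is required");
  // The URL constructor throws a TypeError if targetUrl is not a valid absolute URL.
  const parsed = new URL(targetUrl);
  const params = {
    api_key: apiKey,
    url: parsed.href,
    json_response: "True", // same value as in the example above
  };
  // Assumption: render_js accepts "true"/"false" to switch JavaScript rendering on or off.
  if (options.renderJs !== undefined) {
    params.render_js = String(Boolean(options.renderJs));
  }
  return params;
}

// "YOUR_API_KEY" is a placeholder; in a Pipedream step the key would come
// from this.scrapingbee.$auth.api_key.
const params = buildScrapingBeeParams("YOUR_API_KEY", "https://pipedream.com", { renderJs: false });

// URLSearchParams percent-encodes the target URL so it survives as a single query value.
const requestUrl = `https://app.scrapingbee.com/api/v1/?${new URLSearchParams(params)}`;
console.log(requestUrl);
```

In a workflow step, the returned object could be passed as the `params` option of the `axios` call shown above, which handles the encoding itself.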
Playwright is a Node.js library that provides a high-level API to control Chromium, Firefox, and WebKit browsers. On Pipedream, Playwright runs Chromium in headless mode.
Using Playwright you can perform tasks including:
import { playwright } from '@pipedream/browsers';
export default defineComponent({
  async run({steps, $}) {
    const browser = await playwright.launch();
    // Interact with the web page programmatically
    // See Playwright's Page documentation for available methods:
    // https://playwright.dev/docs/api/class-page
    const page = await browser.newPage();
    await page.goto('https://pipedream.com/');
    const title = await page.title();
    const content = await page.content();
    // Close context and browser otherwise the step will hang
    await page.context().close()
    await browser.close();
    return { title, content }
  },
})
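The screenshot and PDF actions described earlier both write to the /tmp directory. As a rough sketch of how that might look with a Playwright page like the one opened above, the helper below calls Playwright's `page.screenshot()` and `page.pdf()` with a `path` option. `captureToTmp` is a hypothetical name, and the stub page exists only so the snippet runs on its own; in a real step you would pass the page returned by `browser.newPage()`. Note that Playwright supports PDF generation only in headless Chromium.

```javascript
// Hypothetical helper (not part of Playwright or Pipedream): saves a screenshot
// and a PDF of an already-open Playwright page under /tmp.
async function captureToTmp(page, baseName) {
  // Replace anything that is not filename-safe so the name cannot escape /tmp.
  const safeName = baseName.replace(/[^a-zA-Z0-9_-]/g, "_");
  const screenshotPath = `/tmp/${safeName}.png`;
  const pdfPath = `/tmp/${safeName}.pdf`;
  await page.screenshot({ path: screenshotPath, fullPage: true });
  await page.pdf({ path: pdfPath, format: "A4" });
  return { screenshotPath, pdfPath };
}

// Stand-in for a real Playwright page: it records the calls instead of
// writing files, so the sketch can run without a browser.
const calls = [];
const stubPage = {
  screenshot: async (options) => { calls.push(["screenshot", options.path]); },
  pdf: async (options) => { calls.push(["pdf", options.path]); },
};

const captured = captureToTmp(stubPage, "pipedream home");
captured.then((paths) => console.log(paths));
```

Inside the step above, the call would go after `page.goto(...)` and before the context and browser are closed.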