Gets an answer to a question about a given webpage.
Gets the HTML of a webpage using Puppeteer.
Returns the full HTML content of a webpage specified by its URL.
Gets the title of a webpage using Puppeteer.
Returns the visible text content of a webpage specified by its URL.
The WebScraping.AI API extracts data from websites, letting you retrieve structured information without building and maintaining a custom scraper. It handles proxy rotation, browsers, and CAPTCHAs, so you can focus on collecting the data. With Pipedream, you can run these requests in automated workflows that trigger on events, process web content, and pass the results to other apps to feed data pipelines, monitor pages for changes, or populate databases. For example, the following Pipedream step calls the WebScraping.AI account endpoint using the API key from your connected account:
import { axios } from "@pipedream/platform"
export default defineComponent({
  props: {
    webscraping_ai: {
      type: "app",
      app: "webscraping_ai",
    },
  },
  async run({ steps, $ }) {
    // Query the WebScraping.AI account endpoint, authenticating with the
    // API key stored in the connected webscraping_ai account
    return await axios($, {
      url: "https://api.webscraping.ai/account",
      params: {
        api_key: this.webscraping_ai.$auth.api_key,
      },
    });
  },
});
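The same kind of request can also be made with the built-in `fetch` available in Node.js 18 and later. The sketch below defines a hypothetical helper, `fetchFromWebScrapingAi`, that requests either a page's HTML or its visible text. It assumes WebScraping.AI's `/html` and `/text` endpoints and their `url` query parameter; only the base URL `https://api.webscraping.ai` and the `api_key` parameter come from the example above. The `fetchImpl` argument is an illustrative addition so the function can be exercised without network access.

```javascript
// Hypothetical helper: fetch a page's HTML or visible text through WebScraping.AI.
// ASSUMPTION: the /html and /text endpoints accept `api_key` and `url` query parameters.
const WEBSCRAPING_AI_BASE = "https://api.webscraping.ai";

function buildRequestUrl(endpoint, apiKey, targetUrl) {
  // Only the two endpoints this sketch is written for are allowed
  if (!["html", "text"].includes(endpoint)) {
    throw new Error(`Unsupported endpoint: ${endpoint}`);
  }
  const requestUrl = new URL(`/${endpoint}`, WEBSCRAPING_AI_BASE);
  // searchParams percent-encodes the target URL so it survives as a query value
  requestUrl.searchParams.set("api_key", apiKey);
  requestUrl.searchParams.set("url", targetUrl);
  return requestUrl.toString();
}

async function fetchFromWebScrapingAi(endpoint, apiKey, targetUrl, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(buildRequestUrl(endpoint, apiKey, targetUrl));
  if (!response.ok) {
    // Surface the HTTP status so the workflow step fails loudly
    throw new Error(`WebScraping.AI request failed with status ${response.status}`);
  }
  // Read the body as a string; the exact format may depend on other query parameters
  return response.text();
}
```

Inside a Pipedream Node.js step you might call `await fetchFromWebScrapingAi("html", this.webscraping_ai.$auth.api_key, "https://pipedream.com/")`. Validating the endpoint name up front keeps a typo from turning into a confusing HTTP error.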
Puppeteer is a Node.js library that provides a high-level API to control Chrome/Chromium over the DevTools Protocol. On Pipedream, Puppeteer runs Chromium in headless mode.
The following Pipedream step launches a browser, opens a page, and returns the page's title and HTML:
import { puppeteer } from '@pipedream/browsers';
export default defineComponent({
  async run({ steps, $ }) {
    const browser = await puppeteer.browser();
    try {
      // Interact with the web page programmatically.
      // See Puppeteer's Page documentation for available methods:
      // https://pptr.dev/api/puppeteer.page
      const page = await browser.newPage();
      await page.goto('https://pipedream.com/');
      const title = await page.title();
      const content = await page.content();
      return { title, content };
    } finally {
      // The browser must be closed, otherwise the step will hang;
      // closing it in `finally` also covers navigation errors
      await browser.close();
    }
  },
});
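Puppeteer can also return a page's visible text, similar to the WebScraping.AI text action listed earlier. The sketch below is a hypothetical `getVisibleText` helper that takes an already-launched browser, such as the one returned by `puppeteer.browser()` from `@pipedream/browsers`, and reads `document.body.innerText` with `page.evaluate`. Passing the browser in is a design choice for this sketch: the caller stays responsible for closing it, and the helper can be exercised with a stub object.

```javascript
// Hypothetical helper: return the visible text of a page using a Puppeteer browser.
// `browser` is any object exposing Puppeteer's newPage(), e.g. from puppeteer.browser().
async function getVisibleText(browser, url) {
  const page = await browser.newPage();
  try {
    // Wait until network activity settles so client-rendered text is present
    await page.goto(url, { waitUntil: "networkidle2" });
    // innerText is rendering-aware, so content hidden with display: none is excluded
    const text = await page.evaluate(() => document.body.innerText);
    return text.trim();
  } finally {
    // Close the page even if navigation or evaluation fails
    await page.close();
  }
}
```

In a Pipedream step you could call it as `const browser = await puppeteer.browser();` followed by `try { return await getVisibleText(browser, 'https://pipedream.com/'); } finally { await browser.close(); }`, which keeps the rule from the example above that the browser must always be closed.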