with Apify and Automatic Data Extraction?
Extract data from a specified URL. See the docs here
Performs an execution of a selected Actor in Apify. See the documentation
Run a specific task and return its dataset items. See the documentation
Executes a scraper on a specific website and returns its content as text. This action is well suited to extracting content from a single page.
The Apify API lets you automate web scraping, process data, and orchestrate web automation workflows. With Apify on Pipedream, you can build serverless workflows that extract data from websites, run browser automation, and schedule these jobs to run unattended. Workflows can also trigger actions in other apps, store the results, and manage multi-step data flows with minimal setup.
import { axios } from "@pipedream/platform"

export default defineComponent({
  props: {
    // Connected Apify account; credentials are exposed via this.apify.$auth
    apify: {
      type: "app",
      app: "apify",
    }
  },
  async run({steps, $}) {
    // Fetch the authenticated user's profile to confirm the API token works
    return await axios($, {
      url: `https://api.apify.com/v2/users/me`,
      headers: {
        Authorization: `Bearer ${this.apify.$auth.api_token}`,
      },
    })
  },
})
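The component above only checks authentication. As a rough sketch of the next step, the helper below builds a request that runs an Actor and waits for its dataset items, in the spirit of the run-Actor action listed earlier. It assumes Apify's `run-sync-get-dataset-items` endpoint; the helper name `buildRunActorRequest`, the Actor ID, and the input object are hypothetical placeholders, not part of the original page.

```javascript
// Hypothetical helper: builds an axios-style request config that runs an
// Apify Actor synchronously and returns the items of its default dataset.
function buildRunActorRequest(apiToken, actorId, input = {}) {
  // Apify expects "username~actor-name" in the URL path, so convert the
  // more familiar "username/actor-name" form before encoding it.
  const id = encodeURIComponent(actorId.replace("/", "~"));
  return {
    method: "post",
    url: `https://api.apify.com/v2/acts/${id}/run-sync-get-dataset-items`,
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    // The Actor input is sent as the JSON request body; its shape depends
    // entirely on the Actor being run.
    data: input,
  };
}
```

Inside a component's `run` method, the result could be passed straight to the platform's axios wrapper, for example `axios($, buildRunActorRequest(this.apify.$auth.api_token, "username/my-actor", { url: "https://example.com" }))`. Keeping the request construction in a plain function makes it possible to check the URL and headers without calling the API.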
The Automatic Data Extraction API by Zyte extracts structured data from web pages. In Pipedream workflows, it lets you automate web data collection for tasks such as market research, price monitoring, or lead generation. A workflow can be triggered by new input data, process and store the extracted data, and pass it on to other apps.
import { axios } from "@pipedream/platform"

export default defineComponent({
  props: {
    // Connected Automatic Data Extraction account; the API key is in $auth
    automatic_data_extraction: {
      type: "app",
      app: "automatic_data_extraction",
    }
  },
  async run({steps, $}) {
    // The request body is a JSON array of queries, each with a URL and a page type
    const data = JSON.stringify([{
      url: "http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
      pageType: "product",
    }]);
    return await axios($, {
      method: "post",
      url: `https://autoextract.scrapinghub.com/v1/extract`,
      headers: {
        "Content-Type": "application/json",
      },
      // HTTP Basic auth: the API key is the username, the password stays empty
      auth: {
        username: `${this.automatic_data_extraction.$auth.api_key}`,
        password: "",
      },
      data,
    })
  },
})
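A later workflow step will usually need to separate successful extractions from failed ones. The sketch below shows one way to do that. It assumes the API responds with an array in which each element carries a `query` object (with the submitted URL under `query.userQuery.url`) plus either a field named after the requested page type, such as `product`, or an `error` string; verify that shape against the API documentation before relying on it. The helper name `splitExtractionResults` is hypothetical.

```javascript
// Hypothetical helper: separates extraction results into successes and
// failures. The response shape is an assumption, not confirmed by this page.
function splitExtractionResults(results, pageType = "product") {
  const extracted = [];
  const failed = [];
  for (const result of results) {
    // URL that was submitted for this query, if the response echoes it back
    const queryUrl = result.query?.userQuery?.url;
    if (result.error || !result[pageType]) {
      failed.push({
        queryUrl,
        error: result.error ?? `no "${pageType}" field in result`,
      });
    } else {
      extracted.push({ queryUrl, data: result[pageType] });
    }
  }
  return { extracted, failed };
}
```

In a Pipedream workflow, this function could sit in a code step after the extraction request, so that downstream steps store only the `extracted` entries and route the `failed` ones to a retry or alert.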