with OCR Web Service and Automatic Data Extraction?
Extract data from a specified URL.
The OCR Web Service API on Pipedream converts scanned documents, images, and PDFs into editable, searchable text using OCR (Optical Character Recognition). On Pipedream, you can connect OCR Web Service to other apps to automate workflows such as document management, data entry, and content archiving, cutting down on manual work and transcription errors.
import { axios } from "@pipedream/platform"

export default defineComponent({
  props: {
    // Connected OCR Web Service account; exposes credentials via $auth
    ocr_web_service: {
      type: "app",
      app: "ocr_web_service",
    }
  },
  async run({steps, $}) {
    // Fetch account information to confirm that the credentials work
    return await axios($, {
      url: `https://www.ocrwebservice.com/restservices/getAccountInformation`,
      // HTTP Basic auth: account username plus license API password
      auth: {
        username: `${this.ocr_web_service.$auth.username}`,
        password: `${this.ocr_web_service.$auth.license_api_password}`,
      },
    })
  },
})
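The `auth` option in the request above passes the credentials using HTTP Basic authentication. As a rough sketch of what that amounts to on the wire, assuming the option behaves as it does in standard axios, the header value can be built by hand in Node.js. The helper name `basicAuthHeader` and the credentials are hypothetical, chosen only for illustration:

```javascript
// Hypothetical helper (not part of Pipedream or OCR Web Service):
// builds the value of an HTTP Basic "Authorization" header.
function basicAuthHeader(username, password = "") {
  // Basic auth base64-encodes "username:password" and prefixes it with "Basic ".
  const token = Buffer.from(`${username}:${password}`, "utf8").toString("base64");
  return `Basic ${token}`;
}

// Placeholder credentials, not real ones.
const header = basicAuthHeader("user", "pass");
console.log(header); // prints "Basic dXNlcjpwYXNz"
```

When a service expects only an API key, the key is typically sent as the username with an empty password, which is why the default for `password` is an empty string.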
The Automatic Data Extraction API by Zyte extracts structured data from web pages. In a Pipedream workflow, it lets you automate web data collection for tasks such as market research, price monitoring, or lead generation. A workflow can be triggered by new inputs, process and store the extracted data, and pass it on to other connected apps.
import { axios } from "@pipedream/platform"

export default defineComponent({
  props: {
    // Connected Automatic Data Extraction account; exposes the API key via $auth
    automatic_data_extraction: {
      type: "app",
      app: "automatic_data_extraction",
    }
  },
  async run({steps, $}) {
    // The request body is a JSON array with one query object per page to extract
    const data = JSON.stringify([{
      'url': 'http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html',
      'pageType': 'product',
    }]);
    return await axios($, {
      method: "post",
      url: `https://autoextract.scrapinghub.com/v1/extract`,
      headers: {
        "Content-Type": `application/json`,
      },
      // HTTP Basic auth: API key as the username, empty password
      auth: {
        username: `${this.automatic_data_extraction.$auth.api_key}`,
        password: ``,
      },
      data,
    })
  },
})
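The request above sends a single query, but the body is an array, so several pages can be queued in one call. The following sketch shows one way to build that array and to separate failed items from successful ones afterwards. Both helper names (`buildQueries`, `partitionResults`) are hypothetical, and the response handling assumes that a failed item carries an `error` field; check the API documentation before relying on that shape.

```javascript
// Hypothetical helper: one query object per URL, in the same shape
// as the single-item array used in the request above.
function buildQueries(urls, pageType = "product") {
  return urls.map((url) => ({ url, pageType }));
}

// Hypothetical helper. ASSUMPTION: the response is an array of result
// objects, and an item that failed carries a truthy `error` field.
function partitionResults(results) {
  const ok = [];
  const failed = [];
  for (const item of results) {
    (item.error ? failed : ok).push(item);
  }
  return { ok, failed };
}

// Build the JSON body for two pages (the second URL is a placeholder).
const body = JSON.stringify(buildQueries([
  "http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
  "http://books.toscrape.com/",
]));
console.log(body);
```

In a Pipedream step, `body` would take the place of `data` in the request, and `partitionResults` could be applied to the returned array before passing results to later steps.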