with Automatic Data Extraction and LLMWhisperer?
Extract data from a specified URL. See the documentation.
Convert your PDFs or scanned documents to a text format that LLMs can use. See the documentation.
Get the status of the whisper process. Use this to check on a conversion that runs in async mode. See the documentation.
Generate highlight locations for a search term in the document. See the documentation.
Retrieve the text extracted through the whisper API. Use this to fetch the result of a conversion that ran in async mode. See the documentation.
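In async mode, the status and retrieve actions above are used together: check the status until the conversion has finished, then fetch the text. The sketch below shows one way to structure that loop. The names `waitForWhisperText`, `getStatus`, and `retrieveText` are hypothetical, and the `"processed"` status string is an assumption that should be checked against the LLMWhisperer documentation.

```javascript
// Sketch only. `getStatus` and `retrieveText` are hypothetical async callbacks
// that wrap the "get status" and "retrieve text" actions described above.
// Assumption: the status response is an object with a `status` string, and
// `doneStatus` ("processed" by default) marks a finished conversion.
async function waitForWhisperText(
  getStatus,
  retrieveText,
  { maxAttempts = 10, delayMs = 1000, doneStatus = "processed" } = {}
) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const { status } = await getStatus();
    if (status === doneStatus) {
      // Conversion finished: fetch and return the extracted text.
      return retrieveText();
    }
    if (attempt < maxAttempts) {
      // Wait before the next status check.
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`Conversion not finished after ${maxAttempts} status checks`);
}
```

Passing the two calls in as callbacks keeps the loop independent of any HTTP client, so the same function can be reused inside a Pipedream step or exercised with stubs.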
The Automatic Data Extraction API by Zyte specializes in extracting structured data from web pages. When incorporated into Pipedream workflows, this API allows you to automate the process of gathering web data, which can feed into various tasks such as market research, price monitoring, or even lead generation. By triggering workflows with new data inputs, processing and storing the extracted data, and connecting to other apps, Pipedream amplifies the API's utility.
// Automatic Data Extraction (Zyte): request structured product data for one URL.
import { axios } from "@pipedream/platform"
export default defineComponent({
  props: {
    automatic_data_extraction: {
      type: "app",
      app: "automatic_data_extraction",
    }
  },
  async run({steps, $}) {
    // The request body is a JSON array with one query object per page.
    const data = JSON.stringify([{
      'url': 'http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html',
      'pageType': 'product',
    }]);
    return await axios($, {
      method: "post",
      url: "https://autoextract.scrapinghub.com/v1/extract",
      headers: {
        "Content-Type": "application/json",
      },
      // HTTP Basic auth: the API key is the username and the password is empty.
      auth: {
        username: `${this.automatic_data_extraction.$auth.api_key}`,
        password: "",
      },
      data,
    })
  },
})
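The component above sends a single query. For tasks such as price monitoring, a workflow usually needs to build the same body for several pages. The helper below is a minimal sketch of that step; `buildExtractionBody` is a hypothetical name, and it assumes the endpoint accepts more than one `{ url, pageType }` object in the array, mirroring the single-element array shown above.

```javascript
// Sketch only: build the JSON body for a batch of extraction queries.
// Assumption: the extract endpoint accepts several query objects in one array.
function buildExtractionBody(urls, pageType = "product") {
  // Drop duplicate URLs so the same page is not requested twice.
  const uniqueUrls = [...new Set(urls)];
  const queries = uniqueUrls.map((url) => {
    // `new URL` throws a TypeError on malformed input, which fails fast
    // before any request is sent.
    new URL(url);
    return { url, pageType };
  });
  return JSON.stringify(queries);
}
```

The returned string can be passed as `data` in place of the hard-coded `JSON.stringify` call in the component.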
// LLMWhisperer: fetch usage information for the connected account.
import { axios } from "@pipedream/platform"
export default defineComponent({
  props: {
    llmwhisperer: {
      type: "app",
      app: "llmwhisperer",
    }
  },
  async run({steps, $}) {
    return await axios($, {
      url: "https://llmwhisperer-api.unstract.com/v1/get-usage-info",
      // The API key is sent in the "unstract-key" request header.
      headers: {
        "unstract-key": `${this.llmwhisperer.$auth.api_key}`,
      },
    })
  },
})