Does anyone have example code that uses pdf.js-extract (the npm package)? Am I using `import` incorrectly, or is something else wrong?

Error: SyntaxError: 'import' and 'export' may appear only with 'sourceType: module' (4:0)

I get that error with this code:
```javascript
import PDFExtract from "pdf.js-extract"
export default defineComponent({
  async run({ steps, $ }) {
    const pdfExtract = new PDFExtract();
    const fs = require('fs');
    const buffer = fs.readFileSync("https://www.zscaler.com/resources/data-sheets/zscaler-data-protection-benefits.pdf");
    const options = {}; /* see below */
    pdfExtract.extractBuffer(buffer, options, (err, data) => {
      if (err) return console.log(err);
      console.log(data);
    });
    // Reference previous step data using the steps object and return data to use it in future steps
    return steps.trigger.event
  },
})
```
Hello @mrodgers.junk,

I think the error is caused by `const fs = require('fs')` in your action code. Would you mind changing it to `import fs from 'fs'` and moving it to the top of the file? For example:
```javascript
import PDFExtract from "pdf.js-extract"
import fs from "fs"

export default defineComponent({
  async run({ steps, $ }) {
    // action code
  }
})
```
Thank you very much @vunguyenhung! I experimented with the code a bit and got this working:
```javascript
// working code
import { PDFExtract } from "pdf.js-extract"; // named export, not a default export
import fetch from "node-fetch";

export default defineComponent({
  async run({ steps, $ }) {
    // const url = "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf";
    // Read the PDF URL from the trigger event (a `text` field in the request body)
    const url = steps.trigger.event.body.text;
    const response = await fetch(url);
    const buffer = await response.buffer(); // node-fetch extension that returns a Node.js Buffer
    const options = {};
    const pdfExtract = new PDFExtract();
    let data;
    try {
      // Called without a callback, extractBuffer returns a Promise
      data = await pdfExtract.extractBuffer(buffer, options);
      console.log(data);
    } catch (err) {
      console.log("Error extracting PDF data:", err);
    } finally {
      return data; // undefined if extraction failed
    }
  },
});
```
This code defines a function that extracts data from a PDF file available at a given URL. It uses the `pdf.js-extract` library to extract the data and the `node-fetch` library to download the PDF file from the URL.
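The function is the async `run` method of the component object that is passed to `defineComponent` and exported as the module's default export. It receives a single object argument, destructured into `steps` and `$`, and returns a Promise that resolves to the extracted data (or to `undefined` if extraction fails).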
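To see that call shape in isolation, here is a minimal local sketch, assuming Node 18+. Everything in it is a hypothetical stand-in: `defineComponent` is a stub that just returns the object it receives (not Pipedream's real helper), `fakeExtract` replaces the fetch-and-extract work, and `steps` is a hand-built object mirroring `steps.trigger.event.body.text`.

```javascript
// Hypothetical stand-in for Pipedream's defineComponent:
// it simply returns the component object it is given.
const defineComponent = (component) => component;

// Hypothetical placeholder for the fetch + extractBuffer work.
async function fakeExtract(url) {
  return { source: url, pages: [] };
}

const component = defineComponent({
  // run receives ONE object argument, destructured into steps and $.
  async run({ steps, $ }) {
    const url = steps.trigger.event.body.text;
    return fakeExtract(url);
  },
});

// Hand-built stub of the steps object an HTTP trigger would supply.
const steps = { trigger: { event: { body: { text: "https://example.com/file.pdf" } } } };

// Because run is async, calling it yields a Promise.
const pending = component.run({ steps, $: {} });
pending.then((data) => console.log(data.source)); // prints https://example.com/file.pdf
```

Whatever `run` returns (here, whatever `fakeExtract` resolves to) becomes the Promise's resolved value.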
The function first reads the PDF's URL from the trigger event (`steps.trigger.event.body.text`) into a `url` variable. It then uses `node-fetch` to download the file and stores its content in a `buffer` variable.
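`response.buffer()` is a node-fetch extension. The standard Fetch API `Response` (a global in Node 18+) has no `buffer()` method, only `arrayBuffer()` and similar, so a portable variant converts the `ArrayBuffer` itself. The sketch below does that and adds two checks the posted code skips: an HTTP status check and a test for the `%PDF-` header. `responseToPdfBuffer` is a hypothetical helper name, and the header check is deliberately strict (some real-world files have junk before the header).

```javascript
// Sketch: turn a standard Fetch API Response into a Node.js Buffer and
// check that it looks like a PDF. Assumes Node 18+ (global Response/Buffer).
async function responseToPdfBuffer(response) {
  if (!response.ok) {
    throw new Error(`Download failed with HTTP ${response.status}`);
  }
  // arrayBuffer() is part of the Fetch standard; buffer() is not.
  const buffer = Buffer.from(await response.arrayBuffer());
  // Strict check: a well-formed PDF begins with the ASCII bytes "%PDF-".
  if (buffer.subarray(0, 5).toString("latin1") !== "%PDF-") {
    throw new Error("Response body does not look like a PDF");
  }
  return buffer;
}

// Offline usage example with a hand-made Response instead of a real download.
const fake = new Response("%PDF-1.4\n% minimal stand-in body");
responseToPdfBuffer(fake).then((buf) => console.log(buf.length));
```

Inside the action, the same helper would be called as `await responseToPdfBuffer(response)` in place of `response.buffer()`.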
It then defines an empty `options` object, creates an instance of the `PDFExtract` class from `pdf.js-extract`, and declares a `data` variable, which starts out as `undefined`.
Next, it tries to extract the data from the PDF with the `extractBuffer` method of the `pdfExtract` object. If extraction succeeds, the result is assigned to `data` and logged to the console. If it fails, the `catch` block logs `"Error extracting PDF data:"` followed by the error.
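The extracted `data` is a plain object. As far as I can tell from the pdf.js-extract README, each entry in `data.pages` has a `content` array of text items carrying a `str` field plus coordinates such as `x` and `y`; treat that shape as an assumption and confirm it against your own `console.log(data)` output. The sketch below collapses each page into one string, using a hypothetical `pagesToText` helper and a hand-made mock rather than real library output. Joining with spaces ignores line and column layout, so it is only a rough first pass.

```javascript
// Sketch: join the text items of each page into one string.
// ASSUMPTION: data.pages[i].content is an array of items with a `str` field,
// as described in the pdf.js-extract README; verify against real output.
function pagesToText(data) {
  return (data.pages ?? []).map((page) =>
    (page.content ?? [])
      .map((item) => item.str)
      .filter((str) => str && str.trim() !== "") // drop empty/whitespace items
      .join(" ")
  );
}

// Hand-made mock shaped like an extraction result (not real library output).
const mock = {
  pages: [
    { content: [{ x: 70, y: 60, str: "Data" }, { x: 110, y: 60, str: "Protection" }] },
    { content: [{ x: 70, y: 60, str: "Page two" }, { x: 70, y: 80, str: " " }] },
  ],
};

console.log(pagesToText(mock)); // [ 'Data Protection', 'Page two' ]
```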
Finally, the function returns `data` from inside a `finally` block, so it returns whether or not extraction failed. After a failure, the returned value is `undefined`, and the step itself does not fail.
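If you would rather have a failed extraction fail the step, note that rethrowing inside `catch` is not enough while `finally` contains a `return`: that `return` overrides the pending exception (ESLint's `no-unsafe-finally` rule flags this pattern). The sketch below contrasts the two structures using hypothetical stub extractors, `succeeding` and `failing`, in place of `pdfExtract.extractBuffer`.

```javascript
// Hypothetical stubs standing in for pdfExtract.extractBuffer.
const succeeding = async () => ({ pages: [] });
const failing = async () => { throw new Error("corrupt PDF"); };

// Same shape as the action: return inside finally.
async function withFinallyReturn(extract) {
  let data;
  try {
    data = await extract();
  } catch (err) {
    console.log("Error extracting PDF data:", err.message);
    throw err; // rethrown, but the return in finally discards it
  } finally {
    return data;
  }
}

// Alternative: return after try/catch, so a rethrown error propagates.
async function withPlainReturn(extract) {
  let data;
  try {
    data = await extract();
  } catch (err) {
    console.log("Error extracting PDF data:", err.message);
    throw err; // reaches the caller
  }
  return data;
}

withFinallyReturn(failing).then((d) => console.log("finally version resolved to", d));
withPlainReturn(failing).catch((e) => console.log("plain version rejected:", e.message));
```

The first call resolves to `undefined`; the second rejects with the original error, which is what lets the caller see the failure.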