What is the issue with the provided code related to the undefined 'steps'?

user-1 · April 5, 2024, 12:10pm

This topic was automatically generated from Slack. You can find the original thread here.

I have the following code. What is wrong?
Code Error
steps is not defined

const PDFDocument = require('pdfkit');
const fs = require('fs').promises;

// Function to extract data from the PDF file
async function extractData(filePath) {
  // Read the PDF content
  const pdfContent = await fs.readFile(filePath);

  // Create a PDFKit document to parse the content
  const doc = new PDFDocument();
  doc.parse(pdfContent);

  // Extract the date from the header (assuming format YYYY-MM-DD)
  const date = doc.text().match(/(\d{4}-\d{2}-\d{2})/)[1];

  // Extract names and kolli-IDs using a regular expression
  const namesAndKolliIds = doc.text().match(/Inlämning Kolli-ID\/inl.nr: (.*?)(?:\n|$)/g);

  // Create an object to store the extracted data
  const data = {
    date,
    namesAndKolliIds,
  };

  // Return the extracted data object
  return data;
}

// Accessing the downloaded file path from Pipedream context
const filePath = steps.download_file_to_tmp.$return_value[1]; // Assuming second element holds path

(async () => {
  try {
    const extractedData = await extractData(filePath);
    console.log("Extracted data:", extractedData); // Log the extracted data for testing

    // Use the extractedData object (date, namesAndKolliIds) in your Pipedream workflow
    // ... your Pipedream workflow steps using extractedData ...

  } catch (error) {
    console.error("Error extracting data:", error);
  }
})();

user-1 · April 5, 2024, 12:10pm

It seems you’re trying to use the steps object outside of a Pipedream component’s run method, which is why you’re encountering the “steps is not defined” error. In Pipedream workflows, the steps object is only available within the run method of a code step or component. Additionally, your script uses Node.js require syntax and attempts to use the pdfkit library and file system operations, which need to be adapted for use within a Pipedream component.

Let’s refactor your code into a Pipedream component that can be used within a workflow. This example assumes you have a trigger or previous step that downloads a PDF file to the /tmp directory, and you want to extract data from this PDF.

```
import { axios } from "@pipedream/platform";
import PDFDocument from "pdfkit";
import fs from "fs";

export default defineComponent({
  async run({ steps, $ }) {
    // Assuming you have a step named `download_file` that downloads a PDF to /tmp
    const filePath = `/tmp/your-pdf-file.pdf`; // Replace with your actual file path

    // Function to extract data from the PDF file
    async function extractData(filePath) {
      // Read the PDF content
      const pdfContent = await fs.promises.readFile(filePath);

      // Create a PDFKit document to parse the content
      const doc = new PDFDocument();
      doc.parse(pdfContent);

      // Extract the date from the header (assuming format YYYY-MM-DD)
      const date = doc.text().match(/(\d{4}-\d{2}-\d{2})/)[1];

      // Extract names and kolli-IDs using a regular expression
      const namesAndKolliIds = doc.text().match(/Inlämning Kolli-ID\/inl.nr: (.*?)(?:\n|$)/g);

      // Create an object to store the extracted data
      const data = {
        date,
        namesAndKolliIds,
      };

      // Return the extracted data object
      return data;
    }

    try {
      const extractedData = await extractData(filePath);
      console.log("Extracted data:", extractedData); // Log the extracted data for testing

      // Use the extractedData object (date, namesAndKolliIds) in your Pipedream workflow
    } catch (error) {
      console.error("Error extracting data:", error);
    }
  },
});
```

user-1 · April 5, 2024, 12:10pm

I’m running the code below and get this error:
Code Error
Unterminated regular expression (23:95)

import { axios } from "@pipedream/platform";
import PDFDocument from "pdfkit";
import fs from "fs";

export default defineComponent({
  async run({ steps, $ }) {
    // Assuming you have a step named `download_file` that downloads a PDF to /tmp
    const filePath = `/tmp/your-pdf-file.pdf`; // Replace with your actual file path

    // Function to extract data from the PDF file
    async function extractData(filePath) {
      // Read the PDF content
      const pdfContent = await fs.promises.readFile(filePath);

      // Create a PDFKit document to parse the content
      const doc = new PDFDocument();
      doc.parse(pdfContent);

      // Extract the date from the header (assuming format YYYY-MM-DD)
      const date = doc.text().match(/(\d{4}-\d{2}-\d{2})/)[1];

      // Extract names and kolli-IDs using a regular expression
      const namesAndKolliIds = doc.text().match(/Inlämning Kolli-ID\/inl.nr: (.*?)(?:\n|$)/**g*/);

      // Create an object to store the extracted data
      const data = {
        date,
        namesAndKolliIds,
      };

      // Return the extracted data object
      return data;
    }

    try {
      const extractedData = await extractData(filePath);
      console.log("Extracted data:", extractedData); // Log the extracted data for testing

      // Use the extractedData object (date, namesAndKolliIds) in your Pipedream workflow

user-1 · April 5, 2024, 12:10pm

The error you’re encountering, “Unterminated regular expression (23:95)”, comes from the regular expression that matches names and Kolli-IDs. Specifically, the issue is with the /**g*/ part at the end of the expression. It appears you were aiming to use the global flag g but formatted it incorrectly. Here’s the corrected part of your code:

// Extract names and kolli-IDs using a regular expression
const namesAndKolliIds = doc.text().match(/Inlämning Kolli-ID\/inl.nr: (.*?)(?:\n|$)/g);

However, there’s another significant issue in your code. The PDFDocument from pdfkit is used for creating PDFs, not parsing them. This means the doc.parse(pdfContent) and doc.text() methods you’re attempting to use do not exist in pdfkit. To extract text from a PDF in Node.js, you’ll need a different library that supports reading and parsing PDF files, such as pdf-parse.

Here’s an example of how you could adjust your code to use pdf-parse for extracting text from a PDF. First, you’ll need to install pdf-parse by adding the following import statement:

import pdfParse from "pdf-parse";

Then, you can modify your extractData function to use pdf-parse:

async function extractData(filePath) {
  // Read the PDF content
  const pdfContent = await fs.promises.readFile(filePath);

  // Use pdf-parse to extract text from the PDF content
  const data = await pdfParse(pdfContent);

  // Extract the date from the text (assuming format YYYY-MM-DD)
  const date = data.text.match(/(\d{4}-\d{2}-\d{2})/)[1];

  // Extract names and kolli-IDs using a regular expression
  const namesAndKolliIds = data.text.match(/Inlämning Kolli-ID\/inl.nr: (.*?)(?:\n|$)/g);

  // Create an object to store the extracted data
  const extractedData = {
    date,
    namesAndKolliIds,
  };

  // Return the extracted data object
  return extractedData;
}

Please note that `
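One detail of the corrected expression is worth illustrating: with the g flag, `match` returns the full matched strings (label included), not the contents of the capture group. The following is a minimal sketch, assuming the PDF text contains lines shaped like the made-up sample below; the helper name `extractKolliEntries` is hypothetical, and the dot in `inl.nr` is escaped here so that it only matches a literal dot.

```javascript
// Hypothetical helper: collects the text captured after the
// "Inlämning Kolli-ID/inl.nr: " label on each line.
function extractKolliEntries(text) {
  const pattern = /Inlämning Kolli-ID\/inl\.nr: (.*?)(?:\n|$)/g;
  // matchAll yields one match object per hit; index 1 is the capture group.
  return Array.from(text.matchAll(pattern), (m) => m[1]);
}

// Made-up sample text, not taken from the real PDF.
const sampleText =
  "Inlämning Kolli-ID/inl.nr: Anna 123\nInlämning Kolli-ID/inl.nr: Bo 456";

console.log(extractKolliEntries(sampleText));
```

If the full matched lines are what you want, `match` with the g flag is enough; `matchAll` is only needed when you want the captured part on its own.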

user-1 · April 5, 2024, 12:10pm

I’m getting a path to a PDF from steps.download_file_to_tmp.$return_value[1]

How do I add that to the code below?

async function extractData(filePath) {
  // Read the PDF content
  const pdfContent = await fs.promises.readFile(filePath);

  // Use pdf-parse to extract text from the PDF content
  const data = await pdfParse(pdfContent);

  // Extract the date from the text (assuming format YYYY-MM-DD)
  const date = data.text.match(/(\d{4}-\d{2}-\d{2})/)[1];

  // Extract names and kolli-IDs using a regular expression
  const namesAndKolliIds = data.text.match(/Inlämning Kolli-ID\/inl.nr: (.*?)(?:\n|$)/g);

  // Create an object to store the extracted data
  const extractedData = {
    date,
    namesAndKolliIds,
  };

  // Return the extracted data object
  return extractedData;
}

user-1 · April 5, 2024, 12:10pm

To incorporate the path to the PDF you’re getting from steps.download_file_to_tmp.$return_value[1] into the code you’ve provided, you’ll need to call the extractData function and pass the path as the argument. Assuming the rest of your workflow is set up correctly, and you have the pdf-parse library available for use, here’s how you could modify your Pipedream component to include this step:

import { axios } from "@pipedream/platform";
import fs from "fs";
import pdfParse from "pdf-parse";

export default defineComponent({
  async run({ steps, $ }) {
    // Assuming steps.download_file_to_tmp.$return_value[1] holds the path to the PDF
    const filePath = steps.download_file_to_tmp.$return_value[1];

    // Call the extractData function with the filePath
    const extractedData = await extractData(filePath);

    // Return the extracted data
    return extractedData;
  },
});

async function extractData(filePath) {
  // Read the PDF content
  const pdfContent = await fs.promises.readFile(filePath);

  // Use pdf-parse to extract text from the PDF content
  const data = await pdfParse(pdfContent);

  // Extract the date from the text (assuming format YYYY-MM-DD)
  const date = data.text.match(/(\d{4}-\d{2}-\d{2})/)[1];

  // Extract names and kolli-IDs using a regular expression
  const namesAndKolliIds = data.text.match(/Inlämning Kolli-ID\/inl.nr: (.*?)(?:\n|$)/g);

  // Create an object to store the extracted data
  const extractedData = {
    date,
    namesAndKolliIds,
  };

  // Return the extracted data object
  return extractedData;
}

Please ensure you have the pdf-parse library available in your environment. Pipedream’s Node.js runtime includes a wide range of npm packages by default, but specific version requirements or less common packages might not be included. If pdf-parse is not available, you might need to find an alternative method or library for parsing PDF content in your workflow.
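Reading the path from `steps` can also be guarded, so that a renamed step or an unexpected return value fails with a clear message rather than a confusing file error later. This is a minimal sketch under stated assumptions: the step name and the `[1]` index come from this thread, while the helper name `getDownloadedPath`, the `/tmp/` check, and the `fakeSteps` object are made up for illustration.

```javascript
// Hypothetical helper: reads the file path exported by an earlier step.
// The step name and the [1] index mirror the thread; adjust them to your workflow.
function getDownloadedPath(steps) {
  const value = steps?.download_file_to_tmp?.$return_value;
  const filePath = Array.isArray(value) ? value[1] : undefined;
  if (typeof filePath !== "string" || !filePath.startsWith("/tmp/")) {
    throw new Error(
      "Expected a /tmp path in steps.download_file_to_tmp.$return_value[1]"
    );
  }
  return filePath;
}

// Stand-in for the steps object that Pipedream passes to run({ steps, $ }).
// The first array element is invented; only index 1 matters here.
const fakeSteps = {
  download_file_to_tmp: { $return_value: ["data.pdf", "/tmp/data.pdf"] },
};

console.log(getDownloadedPath(fakeSteps));
```

Inside a component, the call would sit at the top of `run`, for example `const filePath = getDownloadedPath(steps);`.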

user-1 · April 5, 2024, 12:10pm

I’m using the code below and get this error. What’s wrong?

Please print the full code, ready to copy, and remember that I get the file path from steps.download_file_to_tmp.$return_value[1]

Code Error
The requested module ‘@pipedream/platform’ does not provide an export named ‘$’

import { axios, $ } from "@pipedream/platform";
import fs from "fs";
import pdfParse from "pdf-parse";

export default defineComponent({
  async run({ steps, $ }) {
    // Assuming steps.download_file_to_tmp.$return_value[1] holds the path to the PDF
    const filePath = steps.download_file_to_tmp.$return_value[1];

    // Call the extractData function with the filePath
    const extractedData = await extractData(filePath);

    // Return the extracted data
    return extractedData;
  },
});

async function extractData(filePath) {
  // Read the PDF content
  const pdfContent = await fs.promises.readFile(filePath);

  // Use pdf-parse to extract text from the PDF content
  const data = await pdfParse(pdfContent);

  // Extract the date from the text (assuming format YYYY-MM-DD)
  const date = data.text.match(/(\d{4}-\d{2}-\d{2})/)[1];

  // Extract names and kolli-IDs using a regular expression
  const namesAndKolliIds = data.text.match(/Inlämning Kolli-ID\/inl.nr: (.*?)(?:\n|$)/g);

  // Create an object to store the extracted data
  const extractedData = {
    date,
    namesAndKolliIds,
  };

  // Return the extracted data object
  return extractedData;
}

user-1 · April 5, 2024, 12:10pm

```
import { axios } from "@pipedream/platform";
import fs from "fs";
import pdfParse from "pdf-parse";

export default defineComponent({
  async run({ steps, $ }) {
    // Assuming steps.download_file_to_tmp.$return_value[1] holds the path to the PDF
    const filePath = steps.download_file_to_tmp.$return_value[1];

    // Call the extractData function with the filePath
    const extractedData = await extractData(filePath);

    // Return the extracted data
    return extractedData;
  },
});

async function extractData(filePath) {
  // Read the PDF content
  const pdfContent = await fs.promises.readFile(filePath);

  // Use pdf-parse to extract text from the PDF content
  const data = await pdfParse(pdfContent);

  // Extract the date from the text (assuming format YYYY-MM-DD)
  const date = data.text.match(/(\d{4}-\d{2}-\d{2})/)[0]; // Changed [1] to [0] to correctly capture the matched date

  // Extract names and kolli-IDs using a regular expression
  const namesAndKolliIds = data.text.match(/Inlämning Kolli-ID\/inl.nr: (.*?)(?:\n|$)/g); // Removed the asterisk (*) after the slash to correct the regex

  // Create an object to store the extracted data
  const extractedData = {
    date,
    namesAndKolliIds,
  };

  // Return the extracted data object
  return extractedData;
}
```

I've made the following corrections in your code:

1. Removed `$` from the `@pipedream/platform` import. The package does not export `$`, which is what the error message reports; `$` is already passed to `run` as an argument.
2. In the `extractData` function, when extracting the date, I changed `[1]` to `[0]` to capture the matched date from the regex match result. Both indices return the same string here, because the capture group spans the whole pattern.
3. Corrected the regular expression for extracting names and kolli-IDs by removing the asterisk (*) after the slash to correct the regex pattern.

This code should now work as expected, assuming the PDF content and structure match your assumptions.
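The difference between `[0]` and `[1]` matters less than what happens when the text contains no date at all: `match` then returns null, and indexing into it throws. A minimal sketch of a guarded version follows; the helper name `extractDate` and the sample strings are hypothetical.

```javascript
// Hypothetical helper: returns the first YYYY-MM-DD date in the text, or null.
function extractDate(text) {
  const match = text.match(/(\d{4}-\d{2}-\d{2})/);
  // match is null when nothing matches, so guard before indexing.
  if (match === null) return null;
  // match[0] is the whole match; match[1] is the first capture group.
  // They are identical here because the group wraps the entire pattern.
  return match[0];
}

console.log(extractDate("Datum: 2024-04-05"));
console.log(extractDate("no date here"));
```

Returning null leaves it to the caller to decide whether a missing date should stop the workflow or just be logged.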

user-1 · April 5, 2024, 12:10pm

I’m running the code below and get an error. Remember that I get the PDF file from the path /tmp/data.pdf

Code Error
Invalid regular expression flag (8:22)

import { axios } from "@pipedream/platform";
import fs from "fs";
import pdfParse from "pdf-parse";

export default defineComponent({
  async run({ steps, $ }) {
    // Assuming steps.download_file_to_tmp.$return_value[1] holds the path to the PDF
    const filePath = /tmp/data.pdf;

    // Call the extractData function with the filePath
    const extractedData = await extractData(filePath);

    // Return the extracted data
    return extractedData;
  },
});

async function extractData(filePath) {
  // Read the PDF content
  const pdfContent = await fs.promises.readFile(filePath);

  // Use pdf-parse to extract text from the PDF content
  const data = await pdfParse(pdfContent);

  // Extract the date from the text (assuming format YYYY-MM-DD)
  const date = data.text.match(/(\d{4}-\d{2}-\d{2})/)[0]; // Changed [1] to [0] to correctly capture the matched date

  // Extract names and kolli-IDs using a regular expression
  const namesAndKolliIds = data.text.match(/Inlämning Kolli-ID\/inl.nr: (.*?)(?:\n|$)/g); // Removed the asterisk (*) after the slash to correct the regex

  // Create an object to store the extracted data
  const extractedData = {
    date,
    namesAndKolliIds,
  };

  // Return the extracted data object
  return extractedData;
}

user-1 · April 5, 2024, 12:10pm

The error you’re encountering, “Invalid regular expression flag (8:22)”, is caused by the file path being written as a regular expression. In your code, the line const filePath = /tmp/data.pdf; is interpreted as a regular expression because of the forward slashes, which is not what you intend. You should use a string to specify the file path. Here’s the corrected version of your code:

import { axios } from "@pipedream/platform";
import fs from "fs";
import pdfParse from "pdf-parse";

export default defineComponent({
  async run({ steps, $ }) {
    // Correctly specify the file path as a string
    const filePath = "/tmp/data.pdf";

    // Call the extractData function with the filePath
    const extractedData = await extractData(filePath);

    // Return the extracted data
    return extractedData;
  },
});

async function extractData(filePath) {
  // Read the PDF content
  const pdfContent = await fs.promises.readFile(filePath);

  // Use pdf-parse to extract text from the PDF content
  const data = await pdfParse(pdfContent);

  // Extract the date from the text (assuming format YYYY-MM-DD)
  const date = data.text.match(/(\d{4}-\d{2}-\d{2})/)[0]; // Changed [1] to [0] to correctly capture the matched date

  // Extract names and kolli-IDs using a regular expression
  const namesAndKolliIds = data.text.match(/Inlämning Kolli-ID\/inl.nr: (.*?)(?:\n|$)/g); // Removed the asterisk (*) after the slash to correct the regex

  // Create an object to store the extracted data
  const extractedData = {
    date,
    namesAndKolliIds,
  };

  // Return the extracted data object
  return extractedData;
}

In this corrected version, I’ve changed the filePath to be a string by enclosing it in quotes. This should resolve the “Invalid regular expression flag” error you were encountering.
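To see why the unquoted path fails, it helps to read it the way the parser does: `/tmp/` is a regular expression literal and `data` is taken as its flags. The sketch below reproduces the same failure through the RegExp constructor so that the script itself still parses; the helper name `flagsAreValid` is hypothetical.

```javascript
// A path in quotes is a string.
const asString = "/tmp/data.pdf";
console.log(typeof asString);

// Without quotes, /tmp/ would be read as a regex literal and "data" as its flags.
// The RegExp constructor rejects the same flags at run time.
function flagsAreValid(flags) {
  try {
    new RegExp("tmp", flags);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

console.log(flagsAreValid("g"));
console.log(flagsAreValid("data"));
```

The first check reports a valid flag and the second an invalid one, which matches the “Invalid regular expression flag” message from the code step.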

user-1 · April 5, 2024, 12:10pm

import { axios } from "@pipedream/platform";
import fs from "fs";
import pdfParse from "pdf-parse";

export default defineComponent({
  async run({ steps, $ }) {
    // Correctly specify the file path as a string
    const filePath = "/tmp/data.pdf";

    // Call the extractData function with the filePath
    const extractedData = await extractData(filePath);

    // Return the extracted data
    return extractedData;
  },
});

async function extractData(filePath) {
  // Read the PDF content
  const pdfContent = await fs.promises.readFile(filePath);

  // Use pdf-parse to extract text from the PDF content
  const data = await pdfParse(pdfContent);

  // Extract the date from the text (assuming format YYYY-MM-DD)
  const date = data.text.match(/(\d{4}-\d{2}-\d{2})/)[0]; // Changed [1] to [0] to correctly capture the matched date

  // Extract names and kolli-IDs using a regular expression
  const namesAndKolliIds = data.text.match(/Inlämning Kolli-ID\/inl.nr: (.*?)(?:\n|$)/g); // Removed the asterisk (*) after the slash to correct the regex

  // Create an object to store the extracted data
  const extractedData = {
    date,
    namesAndKolliIds,
  };

  // Return the extracted data object
  return extractedData;
}

user-1 · April 5, 2024, 12:10pm

```
import { axios } from "@pipedream/platform";
import fs from "fs";
import pdfParse from "pdf-parse";

export default defineComponent({
  async run({ steps, $ }) {
    // Correctly specify the file path as a string
    const filePath = "/tmp/data.pdf";

    // Call the extractData function with the filePath
    const extractedData = await extractData(filePath);

    // Return the extracted data
    return extractedData;
  },
});

async function extractData(filePath) {
  // Read the PDF content
  const pdfContent = await fs.promises.readFile(filePath);

  // Use pdf-parse to extract text from the PDF content
  const data = await pdfParse(pdfContent);

  // Extract the date from the text (assuming format YYYY-MM-DD)
  const date = data.text.match(/(\d{4}-\d{2}-\d{2})/)[0]; // Changed [1] to [0] to correctly capture the matched date

  // Extract names and kolli-IDs using a regular expression
  const namesAndKolliIds = data.text.match(/Inlämning Kolli-ID\/inl.nr: (.*?)(?:\n|$)/g); // Removed the asterisk (*) after the slash to correct the regex

  // Create an object to store the extracted data
  const extractedData = {
    date,
    namesAndKolliIds,
  };

  // Return the extracted data object
  return extractedData;
}
```

This code snippet reads a PDF file from the `/tmp` directory, extracts its text using the `pdf-parse` library, and then uses regular expressions to pull specific pieces of information out of the text. It specifies the file path as a string, reads the PDF content, extracts the date and the names along with kolli-IDs from the text, and returns this data in an object.

user-1 · April 5, 2024, 12:10pm

I’m getting the error:
Code Error
ENOENT: no such file or directory, open ‘./test/data/05-versions-space.pdf’

I don’t understand where the path ./test/data/05-versions-space.pdf comes from. What should I edit?

import { axios } from "@pipedream/platform";
import fs from "fs";
import pdfParse from "pdf-parse";

export default defineComponent({
  async run({ steps, $ }) {
    // Correctly specify the file path as a string
    const filePath = "/tmp/data.pdf";

    // Call the extractData function with the filePath
    const extractedData = await extractData(filePath);

    // Return the extracted data
    return extractedData;
  },
});

async function extractData(filePath) {
  // Read the PDF content
  const pdfContent = await fs.promises.readFile(filePath);

  // Use pdf-parse to extract text from the PDF content
  const data = await pdfParse(pdfContent);

  // Extract the date from the text (assuming format YYYY-MM-DD)
  const date = data.text.match(/(\d{4}-\d{2}-\d{2})/)[0]; // Changed [1] to [0] to correctly capture the matched date

  // Extract names and kolli-IDs using a regular expression
  const namesAndKolliIds = data.text.match(/Inlämning Kolli-ID\/inl.nr: (.*?)(?:\n|$)/g); // Removed the asterisk (*) after the slash to correct the regex

  // Create an object to store the extracted data
  const extractedData = {
    date,
    namesAndKolliIds,
  };

  // Return the extracted data object
  return extractedData;
}

user-1 · April 5, 2024, 12:10pm

The error you’re encountering, ENOENT: no such file or directory, open './test/data/05-versions-space.pdf', indicates that a file path being opened does not exist in the expected location. In the code snippet you provided, however, the file path is specified as "/tmp/data.pdf", which does not match the path in the error message. This discrepancy suggests that the error originates somewhere other than the line that reads your file, such as a different part of your workflow or a previous version of your code.

To resolve this issue, please check the following:

1. The file you’re trying to access, data.pdf, is indeed located in the /tmp directory. Files can be written to or read from the /tmp directory in Pipedream workflows (Working with files in /tmp directory). If the file is not present in /tmp, you’ll need to adjust your workflow to ensure it’s saved there before attempting to read it.
2. If you’ve recently changed the file path in your code, ensure that no other parts of your workflow (such as previous steps) still reference the old file path (./test/data/05-versions-space.pdf). This could cause the error if the workflow attempts to access a file at a path that no longer exists.
3. Double-check that the code snippet you’re working with is the latest version and accurately reflects the file paths you intend to use. If you’re working in a team or using version control, ensure that your changes have been properly merged and deployed.
4. If the error persists and you’re unable to locate the source within your workflow, consider adding logging statements before the file access operation to confirm the file path and the existence of the file. You can use console.log() to print the current working directory and list the files in /tmp to debug the issue further.

Given the information provided, it’s challenging to pinpoint the exact cause without more context about the rest of your workflow and how the file data.pdf is expected to be placed in /tmp. If you continue to experience difficulties, please provide additional details about how the file is generated or uploaded to /tmp, and any other relevant parts of your workflow.
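The logging suggested in the last step can be sketched as a small helper that reports the working directory, whether the target file exists, and what else is in its directory. This is an illustration under assumptions: the helper name `describePath` is made up, and the dynamic imports are used only to keep the sketch self-contained.

```javascript
// Hypothetical debugging helper: reports the working directory, whether the
// target file exists, and what else is in its directory.
async function describePath(filePath) {
  // Dynamic imports keep the sketch self-contained; a top-level
  // `import fs from "fs"` works just as well inside a Pipedream step.
  const fs = await import("node:fs/promises");
  const path = await import("node:path");

  const dir = path.dirname(filePath);

  let exists = true;
  try {
    await fs.access(filePath);
  } catch {
    exists = false;
  }

  let siblings = [];
  try {
    siblings = await fs.readdir(dir);
  } catch {
    // Directory is missing or unreadable; leave the list empty.
  }

  return { cwd: process.cwd(), filePath, exists, siblings };
}

describePath("/tmp/data.pdf").then((info) => console.log(info));
```

Calling it just before the read shows whether /tmp/data.pdf is present in the same execution that tries to open it.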

user-1 · April 5, 2024, 12:10pm

I have this code and would like to add logging through console.log()

import { axios } from "@pipedream/platform";
import fs from "fs";
import pdfParse from "pdf-parse";
console.log()

export default defineComponent({
  async run({ steps, $ }) {
    // Correctly specify the file path as a string
    const filePath = "/tmp/data.pdf";

    // Call the extractData function with the filePath
    const extractedData = await extractData(filePath);

    // Return the extracted data
    return extractedData;
  },
});

async function extractData(filePath) {
  // Read the PDF content
  const pdfContent = await fs.promises.readFile(filePath);

  // Use pdf-parse to extract text from the PDF content
  const data = await pdfParse(pdfContent);

  // Extract the date from the text (assuming format YYYY-MM-DD)
  const date = data.text.match(/(\d{4}-\d{2}-\d{2})/)[0]; // Changed [1] to [0] to correctly capture the matched date

  // Extract names and kolli-IDs using a regular expression
  const namesAndKolliIds = data.text.match(/Inlämning Kolli-ID\/inl.nr: (.*?)(?:\n|$)/g); // Removed the asterisk (*) after the slash to correct the regex

  // Create an object to store the extracted data
  const extractedData = {
    date,
    namesAndKolliIds,
  };

  // Return the extracted data object
  return extractedData;
}

user-1 · April 5, 2024, 12:10pm

To add logging through console.log(), you can insert console.log() statements at the points in your code where you want to record information. For example, you might want to log the extracted data or any errors that occur during the extraction process. Here’s how you could modify your code to include logging:

import { axios } from "@pipedream/platform";
import fs from "fs";
import pdfParse from "pdf-parse";

export default defineComponent({
  async run({ steps, $ }) {
    // Correctly specify the file path as a string
    const filePath = "/tmp/data.pdf";

    try {
      // Call the extractData function with the filePath
      const extractedData = await extractData(filePath);

      // Log the extracted data
      console.log("Extracted Data:", extractedData);

      // Return the extracted data
      return extractedData;
    } catch (error) {
      // Log any errors that occur
      console.error("Error extracting data:", error);
    }
  },
});

async function extractData(filePath) {
  // Read the PDF content
  const pdfContent = await fs.promises.readFile(filePath);

  // Use pdf-parse to extract text from the PDF content
  const data = await pdfParse(pdfContent);

  // Extract the date from the text (assuming format YYYY-MM-DD)
  const date = data.text.match(/(\d{4}-\d{2}-\d{2})/)[0]; // Changed [1] to [0] to correctly capture the matched date

  // Extract names and kolli-IDs using a regular expression
  const namesAndKolliIds = data.text.match(/Inlämning Kolli-ID\/inl.nr: (.*?)(?:\n|$)/g); // Removed the asterisk (*) after the slash to correct the regex

  // Create an object to store the extracted data
  const extractedData = {
    date,
    namesAndKolliIds,
  };

  // Log the date and namesAndKolliIds for debugging
  console.log("Date:", date);
  console.log("Names and Kolli-IDs:", namesAndKolliIds);

  // Return the extracted data object
  return extractedData;
}

In this modified version of your code, console.log() is used to log the extracted data right before it’s returned by the `

user-1 · April 5, 2024, 12:10pm

I’m trying to load
import pdfParse from "pdf-parse";

But I’m getting the error below. Does Pipedream really support the pdf-parse library?

ENOENT: no such file or directory, open ‘./test/data/05-versions-space.pdf’

user-1 · April 5, 2024, 12:10pm

It looks like you’re encountering an issue with the pdf-parse library in a Pipedream workflow. Based on the error message you provided, the issue is related to accessing a file ('./test/data/05-versions-space.pdf') that doesn’t exist in the Pipedream execution environment.

Pipedream’s environment supports a wide range of NPM packages, including pdf-parse, as long as they are compatible with Node.js and don’t require access to a web browser or specific system binaries not available in the environment. However, when working with files, it’s important to remember that the Pipedream execution environment has certain limitations:

1. The only writable directory available is /tmp. If your code involves reading or writing files, you should ensure that you’re using this directory. For example, if you’re trying to read a PDF file, you’ll need to make sure the file is accessible within the Pipedream environment and referenced correctly.
2. If the library or your code attempts to access files that are not included in your workflow or are located in directories that are not accessible, you will encounter errors like the one you’re seeing.

To resolve this issue, you might consider the following steps:

- Ensure that the PDF file you’re trying to parse is uploaded to a location accessible by your Pipedream workflow. You could upload the file to a cloud storage service and download it to the /tmp directory in your workflow before parsing it.
- Modify your code to reference the correct path where the PDF file is stored. If you’re downloading it to the /tmp directory, make sure your code reflects that path.

Without more specific details about your workflow and how you’re handling files, I can only provide general advice. If you continue to experience issues or have specific questions about working with files or the pdf-parse library in Pipedream, please don’t hesitate to reach out for more help at Support - Pipedream.
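Since the path in the error message appears nowhere in the workflow code, it may come from the library itself. One explanation reported by other pdf-parse users, treated here as an assumption rather than a confirmed diagnosis for this workflow, is that the package’s entry file contains a debug block that tries to read a bundled test PDF when it concludes it was not loaded by a parent module. A commonly suggested workaround is to import the inner implementation file directly. The sketch below assumes the installed version ships `lib/pdf-parse.js`; the helper name `loadPdfParse` and its `importModule` parameter are hypothetical.

```javascript
// Hypothetical loader. The module path "pdf-parse/lib/pdf-parse.js" is assumed
// to be the library's inner implementation file; importing it skips the package
// entry file and therefore any debug code that the entry file runs on load.
// The importModule parameter exists only so the sketch can be exercised
// without the package installed.
async function loadPdfParse(importModule = (name) => import(name)) {
  const mod = await importModule("pdf-parse/lib/pdf-parse.js");
  // CommonJS modules loaded through import() expose module.exports as `default`.
  return mod.default ?? mod;
}
```

In a Pipedream code step the equivalent static form would be `import pdfParse from "pdf-parse/lib/pdf-parse.js";`, again assuming that file exists in the version that gets installed.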

user-1 · April 5, 2024, 12:10pm

I have a PDF in the tmp folder “/tmp/data.pdf”

I have been trying pdf-parse, but it’s throwing an error when loading. How can I extract the text from the PDF without pdf-parse?

user-1 · April 5, 2024, 12:10pm

To extract text from a PDF stored in the /tmp directory without using pdf-parse, you can use the following approach with Node.js and the pdf2json package. This package allows you to parse PDF files and extract their text content. Please note that pdf2json is one of many available packages for handling PDFs in Node.js, and it’s chosen here as an alternative to pdf-parse.

First, import the pdf2json package at the top of your Node.js code:

import PDFParser from "pdf2json";

Then, you can use the following code snippet to read and extract text from the PDF file located at /tmp/data.pdf:

export default defineComponent({
  async run({ steps, $ }) {
    const pdfParser = new PDFParser();

    // Wrap the event-based parser in a Promise so the step waits for the result
    const text = await new Promise((resolve, reject) => {
      pdfParser.on("pdfParser_dataError", errData => {
        console.error(errData.parserError);
        reject(errData.parserError);
      });

      pdfParser.on("pdfParser_dataReady", pdfData => {
        const extracted = pdfData.formImage.Pages.reduce((acc, page) => {
          return acc + page.Texts.reduce((accText, textItem) => {
            // Decode URI component and replace any additional encoding specific to pdf2json
            const decodedText = decodeURIComponent(textItem.R[0].T).replace(/\\'/g, "'");
            return accText + decodedText + " ";
          }, "");
        }, "");
        resolve(extracted);
      });

      pdfParser.loadPDF("/tmp/data.pdf");
    });

    console.log(text); // This will log the extracted text to the Pipedream logs
    return text; // This will return the extracted text as the step output
  },
});

This code initializes a PDFParser instance, sets up event listeners for both the error and data-ready events, and starts the PDF parsing process with pdfParser.loadPDF("/tmp/data.pdf"). When the parsing is complete, the pdfParser_dataReady event is triggered, and the extracted text is constructed from the PDF data. The listeners are wrapped in a Promise because a value returned inside an event handler would not become the step’s return value. The extracted text is logged to the Pipedream logs and also returned as the step output, which can be used in subsequent steps of your workflow.

Please note that the pdf2json library may not handle all PDFs in the same way, especially those with complex layouts or encrypted content. Testing with your specific PDF files is recommended to ensure it meets your needs.
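Since pdf2json delivers its result through events, it can help to separate the two concerns: bridging the events to a Promise, and flattening the parsed pages into text. The following is a minimal sketch, not the library’s own API: the helper names `pagesToText` and `parsePdf` are hypothetical, and the assumption that pages may sit either at `pdfData.Pages` or at `pdfData.formImage.Pages` depending on the pdf2json version should be checked against the output of the version you install.

```javascript
// Hypothetical helper: flattens pdf2json-style output into plain text.
// Depending on the pdf2json version, pages may sit at pdfData.Pages or
// pdfData.formImage.Pages (assumption; check the output of your version).
function pagesToText(pdfData) {
  const pages = pdfData?.Pages ?? pdfData?.formImage?.Pages ?? [];
  return pages
    .map((page) =>
      (page.Texts ?? [])
        .map((item) => decodeURIComponent(item.R[0].T))
        .join(" ")
    )
    .join("\n");
}

// Wraps an event-based parser in a Promise so the caller can await the text.
// `parser` only needs on(event, handler) and loadPDF(path), as in the answer above.
function parsePdf(parser, filePath) {
  return new Promise((resolve, reject) => {
    parser.on("pdfParser_dataError", (errData) =>
      reject(errData.parserError ?? errData)
    );
    parser.on("pdfParser_dataReady", (pdfData) => resolve(pagesToText(pdfData)));
    parser.loadPDF(filePath);
  });
}
```

Inside `run`, this would be used as `return await parsePdf(new PDFParser(), "/tmp/data.pdf");`. Keeping `pagesToText` separate means the text assembly can be checked against a small hand-written object before a real PDF is involved.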