Can I Update My Node Version to 22+ as Required by "pdfjs-dist"?

This topic was automatically generated from Slack. You can find the original thread here.

@U05A7KLNALC Can I change my node version to 22+?

“pdfjs-dist” needs this version

Currently, Pipedream supports Node.js v18 as the default runtime. Unfortunately, you cannot change the Node.js version to 22+ at this time. If you have specific requirements for a package like pdfjs-dist, you may need to explore alternative solutions or wait for future updates from Pipedream that might support newer Node.js versions. If you have further questions or need assistance, feel free to ask!
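To confirm which Node.js version a step actually runs on before picking a package build, a quick check along these lines may help. This is a minimal sketch: `nodeMajorVersion` is a hypothetical helper name, and the threshold of 22 is taken from the question above.

```javascript
// Return the major version number from a string such as "v18.17.1".
// Defaults to the version of the Node.js process running this code.
function nodeMajorVersion(version = process.version) {
  return Number.parseInt(version.replace(/^v/, "").split(".")[0], 10);
}

console.log(`Running on Node.js ${process.version}`);
if (nodeMajorVersion() < 22) {
  // Packages that require Node.js 22+ may fail to load on this runtime.
  console.log("This runtime is older than Node.js 22.");
}
```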

How am I able to use pdfjs-dist to parse pdfs?

To use pdfjs-dist to parse PDFs in a Pipedream component, you would typically import the library and use it to load and parse a PDF file. Here’s a basic example of how you might set this up in a Pipedream component:

```javascript
import { defineComponent } from '@pipedream/types';
import * as pdfjsLib from 'pdfjs-dist';

export default defineComponent({
  async run({ steps, $ }) {
    // Load the PDF file from a URL or a local path
    const loadingTask = pdfjsLib.getDocument('path_to_your_pdf_file.pdf');

    // Wait for the PDF to be loaded
    const pdf = await loadingTask.promise;

    // Get the first page (pages are numbered from 1)
    const page = await pdf.getPage(1);

    // Extract text content from the page
    const textContent = await page.getTextContent();

    // Join the text of each item into a single string
    const textItems = textContent.items.map(item => item.str);
    const text = textItems.join(' ');

    return text;
  },
});
```

### Key Points:

- Importing pdfjs-dist: You need to import pdfjs-dist to use its functionality.
- Loading a PDF: Use pdfjsLib.getDocument() to load a PDF document, then await its promise property.
- Extracting Text: Once the PDF is loaded, you can extract text from a specific page using getTextContent().

### Considerations:

- PDF Source: You need to specify the source of the PDF file, whether it’s a URL or a local file path.
- Page Number: The example extracts text from the first page only. You can pass a different number to getPage() to extract text from other pages.

If you have specific requirements or encounter issues, feel free to ask for further assistance!

@U03CXTHK4RF Assistance would be appreciated, maybe I am doing something wrong but I am unable to get pdfjs-dist working.

Do you know if the “legacy” version will work?

Yep, I tried `import * as pdfjsLib from 'pdfjs-dist/legacy/build/pdf.js'`

UserError
Error: Cannot find module '/tmp/pdg/dist/code/9027e5fe3edc16641541600050143219df9cb846b4d82e84e1fe988ea371ad02/node_modules/pdfjs-dist/legacy/build/pdf.js' imported from /tmp/pdg/dist/code/9027e5fe3edc16641541600050143219df9cb846b4d82e84e1fe988ea371ad02/component.mjs

Am I doing something wrong?

I also tried importing from `pdfjs-dist/legacy` and that didn’t work either

Ahh, I found the issue: the import path has to end in pdf.mjs, not pdf.js. Thanks!

Nice, yes, I got that working too, based on examples/node/getinfo.mjs in the mozilla/pdf.js repository on GitHub (master branch)

glad that worked
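For anyone landing here later, the fix from this thread can be sketched roughly as follows. It assumes a pdfjs-dist version that ships `legacy/build/pdf.mjs`. The helper names `extractText` and `parsePdfFromUrl` are hypothetical, not part of pdfjs-dist or Pipedream. The dynamic `import()` is used only to keep the sketch self-contained. In a Pipedream step you would more likely put a static `import * as pdfjsLib from 'pdfjs-dist/legacy/build/pdf.mjs'` at the top of the file.

```javascript
// Collect the text of every page from a loaded PDF document.
// `pdf` is the object resolved by `getDocument(...).promise`.
async function extractText(pdf) {
  const pages = [];
  for (let n = 1; n <= pdf.numPages; n++) {
    const page = await pdf.getPage(n);
    const content = await page.getTextContent();
    // Text items carry their string in `str`; skip anything without one.
    const line = content.items
      .filter((item) => typeof item.str === "string")
      .map((item) => item.str)
      .join(" ");
    pages.push(line);
  }
  return pages.join("\n");
}

// Hypothetical helper: download a PDF and parse it with the legacy build.
async function parsePdfFromUrl(url) {
  // Note the .mjs extension; the .js path raised "Cannot find module" above.
  const pdfjsLib = await import("pdfjs-dist/legacy/build/pdf.mjs");
  const response = await fetch(url);
  if (!response.ok) throw new Error(`Download failed: ${response.status}`);
  const data = new Uint8Array(await response.arrayBuffer());
  const pdf = await pdfjsLib.getDocument({ data }).promise;
  return extractText(pdf);
}
```

Inside a step's `run` method this could be called as `return await parsePdfFromUrl(url)`, where `url` is whatever PDF location your workflow provides. Passing the file as a `Uint8Array` via `data` avoids relying on a local file path in the step's environment.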