This topic was automatically generated from Slack. You can find the original thread here.
The beta Pipedream AI wrote the following code for me, which is awesome. However, it throws an error:

```
No "exports" main defined in /tmp/__pdg__/dist/code/0f6dad7d0a08503de7a52c70bfb9ab23d9f503ebeaa8220a3c0bdf838e68ffee/node_modules/langchain/package.json imported from /tmp/__pdg__/dist/code/0f6dad7d0a08503de7a52c70bfb9ab23d9f503ebeaa8220a3c0bdf838e68ffee/component.mjs
```

Can you please help me resolve the error?
The code is:

```
import { axios } from "@pipedream/platform";
import { PDFDocumentLoader, WordDocumentLoader, TextDocumentLoader, CharacterTextSplitter } from "langchain";
```
I apologize for the issue you're facing. It seems like the langchain package might not be properly configured for use within Pipedream. Unfortunately, I cannot fix the package issue directly. However, I can suggest an alternative approach to handle different file types. For PDF files, you can use the pdf-parse package, and for Word files, you can use the mammoth package. Here's an updated version of your code using these packages:

```
import { axios } from "@pipedream/platform";
import pdfParse from "pdf-parse";
import mammoth from "mammoth";

export default defineComponent({
  props: {
    filePath: {
      type: "string",
      label: "File Path",
    },
  },
  async run({ steps, $ }) {
    const filePath = steps.download_file_to_tmp.$return_value.tmpPath;
    const extension = this.filePath.split('.').pop();
    let text;
    switch (extension) {
      case 'pdf':
        const pdfData = await axios($, { url: this.filePath, responseType: "arraybuffer" });
        text = (await pdfParse(pdfData.data)).text;
        break;
      case 'doc':
      case 'docx':
        const wordData = await axios($, { url: this.filePath, responseType: "arraybuffer" });
        const wordBuffer = Buffer.from(wordData.data);
        text = (await mammoth.extractRawText({ buffer: wordBuffer })).value;
        break;
      case 'txt':
        const textData = await axios($, { url: this.filePath });
        text = textData.data;
        break;
      default:
        throw new Error(`Unsupported file type: ${extension}`);
    }

    const textSplitter = (text, maxLength, overlap) => {
      const chunks = [];
      for (let i = 0; i < text.length; i += maxLength - overlap) {
        chunks.push(text.slice(i, i + maxLength));
      }
      return chunks;
    };

    const chunks = textSplitter(text, 1000, 200);
    return chunks;
  },
});
```

This code uses the pdf-parse package for PDF files and the mammoth package for Word files. For text files, it directly reads the content using axios. Please try this updated code and let me know if you face any issues.
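The inline `textSplitter` in this suggestion advances by `maxLength - overlap` characters per iteration, so an `overlap` equal to or larger than `maxLength` never moves forward and the loop does not terminate. It also emits a final chunk made up only of characters already covered by the previous chunk. A minimal standalone sketch with both guards follows; the name `splitText` and its defaults are hypothetical, and it is a plain character sliding window, not LangChain's `CharacterTextSplitter`, which splits on a separator first.

```javascript
// Sketch: split text into fixed-size, overlapping character windows.
// `splitText` and its defaults are hypothetical; this is a plain sliding window,
// not LangChain's CharacterTextSplitter (which splits on a separator first).
function splitText(text, maxLength = 1000, overlap = 200) {
  if (!Number.isInteger(maxLength) || maxLength <= 0) {
    throw new RangeError("maxLength must be a positive integer");
  }
  // The window advances by maxLength - overlap, so that step must be positive.
  if (!Number.isInteger(overlap) || overlap < 0 || overlap >= maxLength) {
    throw new RangeError("overlap must be an integer in [0, maxLength)");
  }
  const step = maxLength - overlap;
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + maxLength));
    // Stop once a window reaches the end of the text, so the last chunk is
    // never made up only of characters already in the previous chunk.
    if (start + maxLength >= text.length) break;
  }
  return chunks;
}
```

Called as `splitText(text, 1000, 200)`, it produces the same windows as the original `textSplitter(text, 1000, 200)`, minus the trailing overlap-only chunk.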
OK. Now I get the following error:

```
Error: ENOENT: no such file or directory, open './test/data/05-versions-space.pdf'
    at Object.openSync (node:fs:603:3)
    at Object.readFileSync (node:fs:471:35)
    at Object.<anonymous> (/tmp/__pdg__/dist/code/3a7ff250bdd2932f5f9e65895d548fb98f590156da052781192baa79a5cd2a09/node_modules/.pnpm/pdf-parse@1.1.1/node_modules/pdf-parse/index.js:15:25)
    at Module._compile (node:internal/modules/cjs/loader:1256:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1310:10)
    at Module.load (node:internal/modules/cjs/loader:1119:32)
    at Module._load (node:internal/modules/cjs/loader:960:12)
    at ModuleWrap.<anonymous> (node:internal/modules/esm/translators:169:29)
    at ModuleJob.run (node:internal/modules/esm/module_job:194:25)
```
It looks like the error is caused by trying to open a file that doesn’t exist at the specified path (./test/data/05-versions-space.pdf). To fix this issue, please ensure that the file exists at the correct path and that the path is accessible within your Pipedream workflow. If you’re trying to read a file from an external source, you might need to download the file first using an HTTP request with the axios package from @pipedream/platform. Then, you can save the file to the /tmp directory before processing it. Remember, you should always use the /tmp directory for temporary files in Pipedream workflows.
We have set the file path to the file that was downloaded into the /tmp directory. The issue is that ./test/data/05-versions-space.pdf is not specified by us. It appears nowhere in anything we are doing; it seems to be a test document that is part of the
It seems that the issue is related to the pdf-parse package, which might be trying to access a test file that doesn’t exist in your workflow. To resolve this issue, I recommend double-checking your usage of the pdf-parse package and ensuring that you’re passing the correct file path from the /tmp directory when using the package. If you’re still facing issues, please share the relevant code snippet where you’re using the pdf-parse package, so I can help you better.
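The user's reading matches the stack trace: it shows pdf-parse@1.1.1, and the `readFileSync` call comes from the package's own `index.js` (line 15), not from the workflow. In that version, `index.js` runs a small debug self-test that reads `./test/data/05-versions-space.pdf` whenever `module.parent` is not set, which is the case when the package is loaded through an ESM `import` from a `.mjs` step (note the `esm/translators` frame). A commonly used workaround is to import `pdf-parse/lib/pdf-parse.js` directly, skipping `index.js`. The sketch below also reads the downloaded file with `fs` instead of `axios`, because the path from `download_file_to_tmp` is a local file, not a URL; in addition, the `axios` wrapper from `@pipedream/platform` returns the response body by default, so `pdfData.data` would be `undefined` even for a real URL. The function name `extractPdfText` is hypothetical.

```javascript
// Sketch: read a PDF that an earlier step saved under /tmp and return its text.
// Assumptions: pdf-parse@1.1.1 is installed; `extractPdfText` is a hypothetical name.
async function extractPdfText(tmpPath) {
  // The path is a local file, so read it with fs instead of fetching it with axios.
  const { readFile } = await import("node:fs/promises");
  const buffer = await readFile(tmpPath);

  // Load the library file, not the package entry point. pdf-parse's index.js
  // runs a debug self-test (reading ./test/data/05-versions-space.pdf) when
  // module.parent is unset, which is the case when it is loaded via ESM import.
  const { default: pdfParse } = await import("pdf-parse/lib/pdf-parse.js");

  // pdfParse resolves to an object whose `text` property holds the extracted text.
  const result = await pdfParse(buffer);
  return result.text;
}
```

Inside the step's `run` method this would be called as `const text = await extractPdfText(steps.download_file_to_tmp.$return_value.tmpPath);`. The dynamic `import()` calls are only there so the sketch loads in plain Node; in a Pipedream step the usual form is a static `import pdfParse from "pdf-parse/lib/pdf-parse.js";` at the top of the file.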
This is the code snippet causing the error:

```
async run({ steps, $ }) {
  const filePath = steps.download_file_to_tmp.$return_value.tmpPath;
  const extension = this.filePath.split('.').pop();
  let text;
```
I see the issue in your code snippet. You're using this.filePath instead of the filePath variable you defined earlier. Please update your code as follows:

```
async run({ steps, $ }) {
  const filePath = steps.download_file_to_tmp.$return_value.tmpPath;
  const extension = filePath.split('.').pop();
  let text;
  switch (extension) {
    case 'pdf':
      const pdfData = await axios($, { url: filePath, responseType: "arraybuffer" });
      text = (await pdfParse(pdfData.data)).text;
      break;
  }
}
```
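Separately from the `this.filePath` mix-up, the `switch` in these snippets compares `filePath.split('.').pop()` against lowercase literals such as `'pdf'`. That expression keeps the original case, so `REPORT.PDF` yields `PDF` and matches none of the cases, and when the file name itself has no dot it returns whatever follows the last dot anywhere in the path (`/tmp/v1.2/README` yields `2/README`). A sketch of a hypothetical `getExtension` helper that looks only at the file name and normalizes case:

```javascript
// Sketch: return the lowercase extension of the last path segment, or "" if none.
// `getExtension` is a hypothetical helper, similar in spirit to Node's path.extname.
function getExtension(filePath) {
  // Look only at the file name, so dots in directory names are ignored.
  const fileName = filePath.split(/[\\/]/).pop();
  const dot = fileName.lastIndexOf(".");
  // dot <= 0 means no extension, or a dotfile such as ".env".
  return dot > 0 ? fileName.slice(dot + 1).toLowerCase() : "";
}
```

With `switch (getExtension(filePath))`, `report.pdf` and `REPORT.PDF` take the same branch, and a path without an extension yields `""`, which a `default` branch can report.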
This is the updated code, which still generates the same error:

```
async run({ steps, $ }) {
  const filePath = steps.download_file_to_tmp.$return_value.tmpPath;
  const extension = this.filePath.split('.').pop();
  let text;
```
I apologize for the confusion earlier. I see that the code still uses this.filePath instead of the filePath variable. Please update the code as follows:

```
async run({ steps, $ }) {
  const filePath = steps.download_file_to_tmp.$return_value.tmpPath;
  const extension = filePath.split('.').pop();
  let text;
  switch (extension) {
    case 'pdf':
      const pdfData = await axios($, { url: filePath, responseType: "arraybuffer" });
      text = (await pdfParse(pdfData.data)).text;
      break;
  }
}
```
This is the updated code, which still generates the same error:

```
async run({ steps, $ }) {
  const filePath = steps.download_file_to_tmp.$return_value.tmpPath;
  const extension = filePath.split('.').pop();
  let text;
```
I apologize for the confusion earlier. It seems I missed importing the required pdf-parse package. Please update your code as follows:

```
import { axios } from "@pipedream/platform";
import pdfParse from "pdf-parse";

export default defineComponent({
  async run({ steps, $ }) {
    const filePath = steps.download_file_to_tmp.$return_value.tmpPath;
    const extension = filePath.split('.').pop();
    let text;
    switch (extension) {
      case 'pdf':
        const pdfData = await axios($, { url: filePath, responseType: "arraybuffer" });
        text = (await pdfParse(pdfData.data)).text;
        break;
    }
    // You can return the extracted text or use it for further processing
    return text;
  },
});
```
This is the full updated code, which still generates the same error:

```
import { axios } from "@pipedream/platform";
import pdfParse from "pdf-parse";
import mammoth from "mammoth";
```
I apologize for the confusion earlier. I see that the issue might be related to the pdf-parse and mammoth imports. Since these are external packages, you'll need to import them using their respective URLs. Please update your imports as follows:

```
import { axios } from "@pipedream/platform";
import pdfParse from "https://cdn.skypack.dev/pdf-parse";
import mammoth from "https://cdn.skypack.dev/mammoth";
```
```
UserError: cannot require: https://cdn.skypack.dev/pdf-parse
    at handleCode (file:///var/task/gud_builder.mjs:493:19)
    at handleCodePath (file:///var/task/gud_builder.mjs:511:13)
    at GudBuilder.parseNpmDependencies (file:///var/task/gud_builder.mjs:514:11)
    at GudBuilder.packageCode (file:///var/task/gud_builder.mjs:281:24)
    at GudBuilder.buildComponent (file:///var/task/gud_builder.mjs:138:18)
    at async ComponentMaker.run (file:///var/task/component_maker.mjs:51:14)
    at async Runtime.handler (file:///var/task/lambda_handler.mjs:921:14)
```
I apologize for the confusion earlier. It seems that using Skypack for importing pdf-parse is causing issues. Instead, let's try using the pdfjs-dist package, which is compatible with Pipedream. Please update your imports and code as follows:

```
import { axios } from "@pipedream/platform";
import { getDocument } from "pdfjs-dist/es5/build/pdf";

export default defineComponent({
  props: {
    filePath: {
      type: "string",
      label: "File Path",
    },
  },
  async run({ steps, $ }) {
    const filePath = steps.download_file_to_tmp.$return_value.tmpPath;
    const extension = filePath.split('.').pop();
    let text;
    switch (extension) {
      case 'pdf':
        const pdfData = await axios($, { url: filePath, responseType: "arraybuffer" });
        const pdfDoc = await getDocument({ data: pdfData.data }).promise;
        const numPages = pdfDoc.numPages;
        text = "";
        for (let i = 1; i <= numPages; i++) {
          const page = await pdfDoc.getPage(i);
          const content = await page.getTextContent();
          text += content.items.map(item => item.str).join(" ") + "\n";
        }
        break;
      // … handle other file types
    }
    // You can return the extracted text or use it for further processing
    return text;
  },
});
```
```
UserError: cannot require: https://cdn.skypack.dev/mammoth
    at handleCode (file:///var/task/gud_builder.mjs:493:19)
    at handleCodePath (file:///var/task/gud_builder.mjs:511:13)
    at GudBuilder.parseNpmDependencies (file:///var/task/gud_builder.mjs:514:11)
    at GudBuilder.packageCode (file:///var/task/gud_builder.mjs:281:24)
    at GudBuilder.buildComponent (file:///var/task/gud_builder.mjs:138:18)
    at async ComponentMaker.run (file:///var/task/component_maker.mjs:51:14)
    at async Runtime.handler (file:///var/task/lambda_handler.mjs:921:14)
```
I've commented out the mammoth sections to focus on getting the PDF case to work. It generates a new error:

```
Error [ERR_MODULE_NOT_FOUND]: Cannot find module '/tmp/__pdg__/dist/code/218495f6bfe5b580c43e54429fb3bd55355e7853f92fc9c982099e795fc885d5/node_modules/pdfjs-dist/es5/build/pdf' imported from /tmp/__pdg__/dist/code/218495f6bfe5b580c43e54429fb3bd55355e7853f92fc9c982099e795fc885d5/component.mjs
    at __node_internal_captureLargerStackTrace (node:internal/errors:496:5)
    at new NodeError (node:internal/errors:405:5)
    at finalizeResolution (node:internal/modules/esm/resolve:332:11)
    at moduleResolve (node:internal/modules/esm/resolve:995:10)
    at moduleResolveWithNodePath (node:internal/modules/esm/resolve:939:12)
    at defaultResolve (node:internal/modules/esm/resolve:1181:79)
    at nextResolve (node:internal/modules/esm/loader:163:28)
    at resolve (file:///var/task/import-hooks.mjs:7:10)
    at nextResolve (node:internal/modules/esm/loader:163:28)
    at ESMLoader.resolve (node:internal/modules/esm/loader:835:30)
    at ESMLoader.getModuleJob (node:internal/modules/esm/loader:424:18)
    at ModuleWrap.<anonymous> (node:internal/modules/esm/module_job:77:40)
    at link (node:internal/modules/esm/module_job:76:36)
```
The updated code is:

```
import { axios } from "@pipedream/platform";
import { getDocument } from "pdfjs-dist/es5/build/pdf";
//import mammoth from "mammoth";
```
It looks like you're having trouble importing the pdfjs-dist package. Please make sure you're using the correct import statement for the package. Try changing your import statement to:

```
import { getDocument } from "pdfjs-dist/esm/pdf";
```

Also, ensure that the pdfjs-dist package is available in your Pipedream environment. If you still encounter issues, please let me know and I'll help you further.
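Whichever import path ends up resolving, the page loop from the earlier pdfjs-dist suggestion can be kept separate from the import. The `es5/` directory that suggestion imports from appears to have been renamed to `legacy/` in later pdfjs-dist releases, and for pdfjs-dist 4.x the Node-oriented build is, as far as can be told, `pdfjs-dist/legacy/build/pdf.mjs`; treat that path as an assumption and check the `build/` and `legacy/build/` folders of the installed version. The sketch below (function name hypothetical) uses only `numPages`, `getPage()`, `getTextContent()` and each item's `str`, the same calls as the original loop, so it can be exercised with a stand-in object.

```javascript
// Sketch: concatenate the text of every page of a pdf.js document.
// Assumption: `pdfDoc` is the object that `await getDocument({ data }).promise`
// resolves to, where `data` is a Uint8Array of the file contents, e.g. in a
// Pipedream step (pdfjs-dist 4.x assumed; verify the path for your version):
//   import { getDocument } from "pdfjs-dist/legacy/build/pdf.mjs";
//   const data = new Uint8Array(await readFile(tmpPath));
//   const pdfDoc = await getDocument({ data }).promise;
async function extractTextFromPdfDocument(pdfDoc) {
  const pages = [];
  // pdf.js page numbers start at 1.
  for (let pageNumber = 1; pageNumber <= pdfDoc.numPages; pageNumber++) {
    const page = await pdfDoc.getPage(pageNumber);
    const content = await page.getTextContent();
    // content.items can also hold marked-content entries that have no `str`
    // (for example when includeMarkedContent is enabled), so skip those.
    const pageText = content.items
      .filter((item) => typeof item.str === "string")
      .map((item) => item.str)
      .join(" ");
    pages.push(pageText);
  }
  // Separate pages with a newline, without the trailing newline the original loop added.
  return pages.join("\n");
}
```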