How to resolve "No 'exports' main defined" error in the code generated by Pipedream AI?

user-1 · October 31, 2023, 1:20pm

This topic was automatically generated from Slack. You can find the original thread here.

The beta Pipedream AI wrote the following code for me, which is awesome. However, it throws an error: No "exports" main defined in /tmp/__pdg__/dist/code/0f6dad7d0a08503de7a52c70bfb9ab23d9f503ebeaa8220a3c0bdf838e68ffee/node_modules/langchain/package.json imported from /tmp/__pdg__/dist/code/0f6dad7d0a08503de7a52c70bfb9ab23d9f503ebeaa8220a3c0bdf838e68ffee/component.mjs
Can you please help resolve the error?

The code is:
import { axios } from "@pipedream/platform";
import { PDFDocumentLoader, WordDocumentLoader, TextDocumentLoader, CharacterTextSplitter } from "langchain";

export default defineComponent({
  props: {
    filePath: {
      type: "string",
      label: "File Path",
    },
  },
  async run({ steps, $ }) {
    let documentLoader;
    const filePath = steps.download_file_to_tmp.$return_value.tmpPath;
    const extension = this.filePath.split('.').pop();

    switch(extension) {
      case 'pdf':
        documentLoader = new PDFDocumentLoader();
        break;
      case 'doc':
      case 'docx':
        documentLoader = new WordDocumentLoader();
        break;
      case 'txt':
        documentLoader = new TextDocumentLoader();
        break;
      default:
        throw new Error(`Unsupported file type: ${extension}`);
    }

    const document = await documentLoader.load(this.filePath);
    const textSplitter = new CharacterTextSplitter(1000, 200);
    const chunks = textSplitter.split(document.getText());

    return chunks;
  },
});
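
For context, Node raises this error when a package's package.json declares an `exports` map that has no entry for the package root (`"."`). A bare `import ... from "langchain"` then has nothing to resolve to, while subpaths listed in the map still work. The sketch below reproduces the mechanism with a throwaway package. The names `demo-pkg`, `./sub`, and `reproduceExportsError` are made up for illustration, and the sketch makes no claim about which subpaths a given langchain version exposes.

```javascript
// Reproduces Node's 'No "exports" main defined' failure with a temporary package.
// "demo-pkg" and its "./sub" entry point are hypothetical names for this sketch.
async function reproduceExportsError() {
  // Dynamic imports keep the snippet runnable as either CommonJS or an ES module.
  const fs = await import("node:fs");
  const os = await import("node:os");
  const path = await import("node:path");
  const { createRequire } = await import("node:module");

  const root = fs.mkdtempSync(path.join(os.tmpdir(), "exports-demo-"));
  try {
    const pkgDir = path.join(root, "node_modules", "demo-pkg");
    fs.mkdirSync(pkgDir, { recursive: true });

    // The exports map lists one subpath but has no "." (package root) entry.
    fs.writeFileSync(
      path.join(pkgDir, "package.json"),
      JSON.stringify({ name: "demo-pkg", version: "1.0.0", exports: { "./sub": "./sub.js" } })
    );
    fs.writeFileSync(path.join(pkgDir, "sub.js"), "module.exports = 'sub';\n");

    // Resolve as if from a file that sits next to node_modules.
    const requireFromRoot = createRequire(path.join(root, "index.js"));

    let rootError = null;
    try {
      requireFromRoot.resolve("demo-pkg"); // bare package name: root is not exported
    } catch (err) {
      rootError = err;
    }
    const subpath = requireFromRoot.resolve("demo-pkg/sub"); // listed subpath resolves

    return {
      rootErrorCode: rootError ? rootError.code : null,
      subpathResolved: subpath.endsWith("sub.js"),
    };
  } finally {
    // Remove the temporary package whether or not resolution succeeded.
    fs.rmSync(root, { recursive: true, force: true });
  }
}
```

On a recent Node version the bare name should fail with the code `ERR_PACKAGE_PATH_NOT_EXPORTED` while the subpath resolves. In practice this means checking the package's documentation for the subpath that exports each class, rather than importing everything from the package root.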

user-1 · October 31, 2023, 1:20pm

I apologize for the issue you’re facing. It seems like the langchain package might not be properly configured for use within Pipedream. Unfortunately, I cannot fix the package issue directly. However, I can suggest an alternative approach to handle different file types. For PDF files, you can use the pdf-parse package, and for Word files, you can use the mammoth package. Here’s an updated version of your code using these packages:

import { axios } from "@pipedream/platform";
import pdfParse from "pdf-parse";
import mammoth from "mammoth";

export default defineComponent({
  props: {
    filePath: {
      type: "string",
      label: "File Path",
    },
  },
  async run({ steps, $ }) {
    const filePath = steps.download_file_to_tmp.$return_value.tmpPath;
    const extension = this.filePath.split('.').pop();
    let text;

    switch(extension) {
      case 'pdf':
        const pdfData = await axios($, { url: this.filePath, responseType: "arraybuffer" });
        text = (await pdfParse(pdfData.data)).text;
        break;
      case 'doc':
      case 'docx':
        const wordData = await axios($, { url: this.filePath, responseType: "arraybuffer" });
        const wordBuffer = Buffer.from(wordData.data);
        text = (await mammoth.extractRawText({ buffer: wordBuffer })).value;
        break;
      case 'txt':
        const textData = await axios($, { url: this.filePath });
        text = textData.data;
        break;
      default:
        throw new Error(`Unsupported file type: ${extension}`);
    }

    const textSplitter = (text, maxLength, overlap) => {
      const chunks = [];
      for (let i = 0; i < text.length; i += maxLength - overlap) {
        chunks.push(text.slice(i, i + maxLength));
      }
      return chunks;
    };

    const chunks = textSplitter(text, 1000, 200);
    return chunks;
  },
});

This code uses the pdf-parse package for PDF files and the mammoth package for Word files. For text files, it directly reads the content using axios. Please try this updated code and let me know if you face any issues.
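
The hand-rolled splitter in the reply above advances by `maxLength - overlap` characters on each iteration. A standalone sketch of the same idea follows, with one guard that is not in the original: if `overlap` is not smaller than `maxLength`, the step is zero or negative and the loop never finishes. The function name `splitWithOverlap` is an assumption for illustration.

```javascript
// Splits text into chunks of at most maxLength characters, where each chunk
// starts (maxLength - overlap) characters after the previous one.
function splitWithOverlap(text, maxLength, overlap) {
  if (!Number.isInteger(maxLength) || maxLength <= 0) {
    throw new RangeError("maxLength must be a positive integer");
  }
  // Added guard (not in the original): a non-positive step would loop forever.
  if (!Number.isInteger(overlap) || overlap < 0 || overlap >= maxLength) {
    throw new RangeError("overlap must be an integer in [0, maxLength)");
  }

  const step = maxLength - overlap;
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + maxLength));
  }
  return chunks;
}

// Example: step is 3, so chunks start at offsets 0, 3, 6 and 9.
console.log(splitWithOverlap("abcdefghij", 4, 1)); // [ 'abcd', 'defg', 'ghij', 'j' ]
```

Note that the last chunk can be shorter than `overlap`, as in the example; whether to drop such a tail depends on how the chunks are used downstream.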

user-1 · October 31, 2023, 1:20pm

OK. Now I get the following error.
ENOENT: NO SUCH FILE OR DIRECTORY, OPEN ‘./TEST/DATA/05-VERSIONS-SPACE.PDF’

DETAILS

Error: ENOENT: no such file or directory, open './test/data/05-versions-space.pdf'
    at Object.openSync (node:fs:603:3)
    at Object.readFileSync (node:fs:471:35)
    at Object.<anonymous> (/tmp/__pdg__/dist/code/3a7ff250bdd2932f5f9e65895d548fb98f590156da052781192baa79a5cd2a09/node_modules/.pnpm/pdf-parse@1.1.1/node_modules/pdf-parse/index.js:15:25)
    at Module._compile (node:internal/modules/cjs/loader:1256:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1310:10)
    at Module.load (node:internal/modules/cjs/loader:1119:32)
    at Module._load (node:internal/modules/cjs/loader:960:12)
    at ModuleWrap.<anonymous> (node:internal/modules/esm/translators:169:29)
    at ModuleJob.run (node:internal/modules/esm/module_job:194:25)

user-1 · October 31, 2023, 1:20pm

It looks like the error is caused by trying to open a file that doesn’t exist at the specified path (./test/data/05-versions-space.pdf). To fix this issue, please ensure that the file exists at the correct path and that the path is accessible within your Pipedream workflow. If you’re trying to read a file from an external source, you might need to download the file first using an HTTP request with the axios package from @pipedream/platform. Then, you can save the file to the /tmp directory before processing it. Remember, you should always use the /tmp directory for temporary files in Pipedream workflows.
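
Once a file has been saved under /tmp it is a local path rather than a URL, so it can be read with Node's `fs` module instead of an HTTP client. A minimal sketch, assuming the earlier step exposes the path as a string; `readTmpFile` is a hypothetical helper, not part of Pipedream's API.

```javascript
// Reads a local file and reports its lower-cased extension without the dot.
// Rejects with an ENOENT error if the path does not exist.
async function readTmpFile(filePath) {
  // Dynamic imports keep the snippet runnable as either CommonJS or an ES module.
  const { readFile } = await import("node:fs/promises");
  const path = await import("node:path");

  // path.extname returns ".pdf" for "/tmp/report.pdf" and "" when there is no extension.
  const extension = path.extname(filePath).slice(1).toLowerCase();
  const buffer = await readFile(filePath); // a Buffer with the raw file bytes
  return { extension, buffer };
}
```

The returned `buffer` can then be handed to whichever parser matches `extension`, and `buffer.toString("utf8")` is enough for plain text files.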

user-1 · October 31, 2023, 1:20pm

We have set the file path to the file in the /tmp directory that was downloaded. The issue is that ./test/data/05-versions-space.pdf is not specified by us. It appears nowhere in anything we are doing. It seems to be a test document that is part of the package loaded at:

/tmp/__pdg__/dist/code/001ae3339751a4fbebb3895a0ecce5f88e4d1748573041b306bbe2980d113aa0/node_modules/.pnpm/pdf-parse@1.1.1/node_modules/pdf-parse/index.js:15:25

user-1 · October 31, 2023, 1:20pm

It seems that the issue is related to the pdf-parse package, which might be trying to access a test file that doesn’t exist in your workflow. To resolve this issue, I recommend double-checking your usage of the pdf-parse package and ensuring that you’re passing the correct file path from the /tmp directory when using the package. If you’re still facing issues, please share the relevant code snippet where you’re using the pdf-parse package, so I can help you better.
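
One likely explanation, offered as an assumption rather than a confirmed diagnosis: the stack trace points at `pdf-parse/index.js`, and in pdf-parse 1.1.1 that entry file runs a self-test against `./test/data/05-versions-space.pdf` when it believes it is being executed directly. It decides this by checking `module.parent`, which is not set when the package is loaded from an ES module such as `component.mjs`. A commonly used workaround is to import the implementation file `pdf-parse/lib/pdf-parse.js` instead of the package root. The sketch below takes the parser as an argument so that the import path is a one-line choice; `extractPdfText` is a hypothetical helper.

```javascript
// Assumed import for a Node.js code step (not executed in this sketch):
//   import pdfParse from "pdf-parse/lib/pdf-parse.js";
// Importing the implementation file skips the self-test in the package's index.js.

// Extracts text from a PDF buffer using the supplied parser function.
// parsePdf is expected to resolve to an object with a string `text` property,
// which is the shape pdf-parse returns.
async function extractPdfText(pdfBuffer, parsePdf) {
  if (typeof parsePdf !== "function") {
    throw new TypeError("parsePdf must be a function");
  }
  const result = await parsePdf(pdfBuffer);
  // Fall back to an empty string if the parser returned no text field.
  return result && typeof result.text === "string" ? result.text : "";
}
```

In a step, this might be called with a Buffer read from the /tmp path and the `pdfParse` function from the import shown in the comment.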

user-1 · October 31, 2023, 1:20pm

this is the code snippet causing the error:
async run({ steps, $ }) {
  const filePath = steps.download_file_to_tmp.$return_value.tmpPath;
  const extension = this.filePath.split('.').pop();
  let text;

  switch(extension) {
    case 'pdf':
      const pdfData = await axios($, { url: this.filePath, responseType: "arraybuffer" });
      text = (await pdfParse(pdfData.data)).text;
      break;

user-1 · October 31, 2023, 1:20pm

I see the issue in your code snippet. You’re using this.filePath instead of the filePath variable you defined earlier. Please update your code as follows:
```
async run({ steps, $ }) {
  const filePath = steps.download_file_to_tmp.$return_value.tmpPath;
  const extension = filePath.split('.').pop();
  let text;

  switch (extension) {
    case 'pdf':
      const pdfData = await axios($, { url: filePath, responseType: "arraybuffer" });
      text = (await pdfParse(pdfData.data)).text;
      break;
  }
}
```

user-1 · October 31, 2023, 1:20pm

This is the updated code, that still generates the same error:
async run({ steps, $ }) {
  const filePath = steps.download_file_to_tmp.$return_value.tmpPath;
  const extension = this.filePath.split('.').pop();
  let text;

  switch(extension) {
    case 'pdf':
      const pdfData = await axios($, { url: this.filePath, responseType: "arraybuffer" });
      text = (await pdfParse(pdfData.data)).text;
      break;

user-1 · October 31, 2023, 1:20pm

I apologize for the confusion earlier. I see that the code still uses this.filePath instead of the filePath variable. Please update the code as follows:
```
async run({ steps, $ }) {
  const filePath = steps.download_file_to_tmp.$return_value.tmpPath;
  const extension = filePath.split('.').pop();
  let text;

  switch (extension) {
    case 'pdf':
      const pdfData = await axios($, { url: filePath, responseType: "arraybuffer" });
      text = (await pdfParse(pdfData.data)).text;
      break;
  }
}
```

user-1 · October 31, 2023, 1:20pm

this is the updated code that still generates the same error:
async run({ steps, $ }) {
  const filePath = steps.download_file_to_tmp.$return_value.tmpPath;
  const extension = filePath.split('.').pop();
  let text;

  switch(extension) {
    case 'pdf':
      const pdfData = await axios($, { url: filePath, responseType: "arraybuffer" });
      text = (await pdfParse(pdfData.data)).text;
      break;

user-1 · October 31, 2023, 1:20pm

I apologize for the confusion earlier. It seems I missed importing the required pdf-parse package. Please update your code as follows:
```
import { axios } from "@pipedream/platform";
import pdfParse from "pdf-parse";

export default defineComponent({
  async run({ steps, $ }) {
    const filePath = steps.download_file_to_tmp.$return_value.tmpPath;
    const extension = filePath.split('.').pop();
    let text;

    switch (extension) {
      case 'pdf':
        const pdfData = await axios($, { url: filePath, responseType: "arraybuffer" });
        text = (await pdfParse(pdfData.data)).text;
        break;
    }

    // You can return the extracted text or use it for further processing
    return text;
  },
});
```

user-1 · October 31, 2023, 1:20pm

this is the full updated code, which still generates the same error:
import { axios } from "@pipedream/platform";
import pdfParse from "pdf-parse";
import mammoth from "mammoth";

export default defineComponent({
  props: {
    filePath: {
      type: "string",
      label: "File Path",
    },
  },
  async run({ steps, $ }) {
    const filePath = steps.download_file_to_tmp.$return_value.tmpPath;
    const extension = filePath.split('.').pop();
    let text;

    switch(extension) {
      case 'pdf':
        const pdfData = await axios($, { url: filePath, responseType: "arraybuffer" });
        text = (await pdfParse(pdfData.data)).text;
        break;
      case 'doc':
      case 'docx':
        const wordData = await axios($, { url: filePath, responseType: "arraybuffer" });
        const wordBuffer = Buffer.from(wordData.data);
        text = (await mammoth.extractRawText({ buffer: wordBuffer })).value;
        break;
      case 'txt':
        const textData = await axios($, { url: filePath });
        text = textData.data;
        break;
      default:
        throw new Error(`Unsupported file type: ${extension}`);
    }

    const textSplitter = (text, maxLength, overlap) => {
      const chunks = [];
      for (let i = 0; i < text.length; i += maxLength - overlap) {
        chunks.push(text.slice(i, i + maxLength));
      }
      return chunks;
    };

    const chunks = textSplitter(text, 1000, 200);
    return chunks;
  },
});

user-1 · October 31, 2023, 1:20pm

I apologize for the confusion earlier. I see that the issue might be related to the pdf-parse and mammoth imports. Since these are external packages, you’ll need to import them using their respective URLs. Please update your imports as follows:
```
import { axios } from "@pipedream/platform";
import pdfParse from "https://cdn.skypack.dev/pdf-parse";
import mammoth from "https://cdn.skypack.dev/mammoth";
```

user-1 · October 31, 2023, 1:20pm

Now it generates a different error:
CODE ERROR
CANNOT REQUIRE: HTTPS://CDN.SKYPACK.DEV/PDF-PARSE

DETAILS

UserError: cannot require: https://cdn.skypack.dev/pdf-parse
    at handleCode (file:///var/task/gud_builder.mjs:493:19)
    at handleCodePath (file:///var/task/gud_builder.mjs:511:13)
    at GudBuilder.parseNpmDependencies (file:///var/task/gud_builder.mjs:514:11)
    at GudBuilder.packageCode (file:///var/task/gud_builder.mjs:281:24)
    at GudBuilder.buildComponent (file:///var/task/gud_builder.mjs:138:18)
    at async ComponentMaker.run (file:///var/task/component_maker.mjs:51:14)
    at async Runtime.handler (file:///var/task/lambda_handler.mjs:921:14)

user-1 · October 31, 2023, 1:20pm

I apologize for the confusion earlier. It seems that using Skypack for importing pdf-parse is causing issues. Instead, let’s try using the pdfjs-dist package, which is compatible with Pipedream. Please update your imports and code as follows:
```
import { axios } from "@pipedream/platform";
import { getDocument } from "pdfjs-dist/es5/build/pdf";

export default defineComponent({
  props: {
    filePath: {
      type: "string",
      label: "File Path",
    },
  },
  async run({ steps, $ }) {
    const filePath = steps.download_file_to_tmp.$return_value.tmpPath;
    const extension = filePath.split('.').pop();
    let text;

    switch (extension) {
      case 'pdf':
        const pdfData = await axios($, { url: filePath, responseType: "arraybuffer" });
        const pdfDoc = await getDocument({ data: pdfData.data }).promise;
        const numPages = pdfDoc.numPages;
        text = "";
        for (let i = 1; i <= numPages; i++) {
          const page = await pdfDoc.getPage(i);
          const content = await page.getTextContent();
          text += content.items.map(item => item.str).join(" ") + "\n";
        }
        break;
      // … handle other file types
    }

    // You can return the extracted text or use it for further processing
    return text;
  },
});
```

user-1 · October 31, 2023, 1:20pm

now I get the following error:
CODE ERROR
CANNOT REQUIRE: HTTPS://CDN.SKYPACK.DEV/MAMMOTH

DETAILS

UserError: cannot require: https://cdn.skypack.dev/mammoth
    at handleCode (file:///var/task/gud_builder.mjs:493:19)
    at handleCodePath (file:///var/task/gud_builder.mjs:511:13)
    at GudBuilder.parseNpmDependencies (file:///var/task/gud_builder.mjs:514:11)
    at GudBuilder.packageCode (file:///var/task/gud_builder.mjs:281:24)
    at GudBuilder.buildComponent (file:///var/task/gud_builder.mjs:138:18)
    at async ComponentMaker.run (file:///var/task/component_maker.mjs:51:14)
    at async Runtime.handler (file:///var/task/lambda_handler.mjs:921:14)

user-1 · October 31, 2023, 1:20pm

I apologize for the confusion earlier. It seems that using Skypack for importing mammoth is causing issues. Instead, let’s try using the js-xlsx package to handle .docx files, which is compatible with Pipedream. Please update your imports and code as follows:
```
import { axios } from "@pipedream/platform";
import { getDocument } from "pdfjs-dist/es5/build/pdf";
import XLSX from "xlsx";

export default defineComponent({
  props: {
    filePath: {
      type: "string",
      label: "File Path",
    },
  },
  async run({ steps, $ }) {
    const filePath = steps.download_file_to_tmp.$return_value.tmpPath;
    const extension = filePath.split('.').pop();
    let text;

    switch (extension) {
      case 'pdf':
        const pdfData = await axios($, { url: filePath, responseType: "arraybuffer" });
        const pdfDoc = await getDocument({ data: pdfData.data }).promise;
        const numPages = pdfDoc.numPages;
        text = "";
        for (let i = 1; i <= numPages; i++) {
          const page = await pdfDoc.getPage(i);
          const content = await page.getTextContent();
          text += content.items.map(item => item.str).join(" ") + "\n";
        }
        break;
      case 'docx':
        const docxData = await axios($, { url: filePath, responseType: "arraybuffer" });
        const workbook = XLSX.read(docxData.data, { type: "buffer" });
        const sheet = workbook.Sheets[workbook.SheetNames[0]];
        text = XLSX.utils.sheet_to_txt(sheet);
        break;
      // … handle other file types
    }

    // You can return the extracted text or use it for further processing
    return text;
  },
});
```

user-1 · October 31, 2023, 1:20pm

I’ve commented out the mammoth sections to focus on getting the PDF case to work - it generates a new error:
CODE ERROR
CANNOT FIND MODULE ‘/TMP/PDG/DIST/CODE/218495F6BFE5B580C43E54429FB3BD55355E7853F92FC9C982099E795FC885D5/NODE_MODULES/PDFJS-DIST/ES5/BUILD/PDF’ IMPORTED FROM /TMP/PDG/DIST/CODE/218495F6BFE5B580C43E54429FB3BD55355E7853F92FC9C982099E795FC885D5/COMPONENT.MJS

DETAILS

Error [ERR_MODULE_NOT_FOUND]: Cannot find module '/tmp/__pdg__/dist/code/218495f6bfe5b580c43e54429fb3bd55355e7853f92fc9c982099e795fc885d5/node_modules/pdfjs-dist/es5/build/pdf' imported from /tmp/__pdg__/dist/code/218495f6bfe5b580c43e54429fb3bd55355e7853f92fc9c982099e795fc885d5/component.mjs
    at __node_internal_captureLargerStackTrace (node:internal/errors:496:5)
    at new NodeError (node:internal/errors:405:5)
    at finalizeResolution (node:internal/modules/esm/resolve:332:11)
    at moduleResolve (node:internal/modules/esm/resolve:995:10)
    at moduleResolveWithNodePath (node:internal/modules/esm/resolve:939:12)
    at defaultResolve (node:internal/modules/esm/resolve:1181:79)
    at nextResolve (node:internal/modules/esm/loader:163:28)
    at resolve (file:///var/task/import-hooks.mjs:7:10)
    at nextResolve (node:internal/modules/esm/loader:163:28)
    at ESMLoader.resolve (node:internal/modules/esm/loader:835:30)
    at ESMLoader.getModuleJob (node:internal/modules/esm/loader:424:18)
    at ModuleWrap.<anonymous> (node:internal/modules/esm/module_job:77:40)
    at link (node:internal/modules/esm/module_job:76:36)

the updated code is:
import { axios } from "@pipedream/platform";
import { getDocument } from "pdfjs-dist/es5/build/pdf";
//import mammoth from "mammoth";

export default defineComponent({
  props: {
    filePath: {
      type: "string",
      label: "File Path",
    },
  },
  async run({ steps, $ }) {
    const filePath = steps.download_file_to_tmp.$return_value.tmpPath;
    const extension = filePath.split('.').pop();
    let text;

    switch(extension) {
      case 'pdf':
        const pdfData = await axios($, { url: filePath, responseType: "arraybuffer" });
        const pdfDoc = await getDocument({ data: pdfData.data }).promise;
        const numPages = pdfDoc.numPages;
        text = "";

        for (let i = 1; i <= numPages; i++) {
          const page = await pdfDoc.getPage(i);
          const content = await page.getTextContent();
          text += content.items.map(item => item.str).join(" ") + "\n";
        }
        break;

      // case 'doc':
      // case 'docx':
      //   const wordData = await axios($, { url: filePath, responseType: "arraybuffer" });
      //   const wordBuffer = Buffer.from(wordData.data);
      //   text = (await mammoth.extractRawText({ buffer: wordBuffer })).value;
      //   break;
      case 'txt':
        const textData = await axios($, { url: filePath });
        text = textData.data;
        break;
      default:
        throw new Error(`Unsupported file type: ${extension}`);
    }

    const textSplitter = (text, maxLength, overlap) => {
      const chunks = [];
      for (let i = 0; i < text.length; i += maxLength - overlap) {
        chunks.push(text.slice(i, i + maxLength));
      }
      return chunks;
    };

    const chunks = textSplitter(text, 1000, 200);
    return chunks;
  },
});

user-1 · October 31, 2023, 1:20pm

It looks like you’re having trouble importing the pdfjs-dist package. Please make sure you’re using the correct import statement for the package. Try changing your import statement to: import { getDocument } from "pdfjs-dist/esm/pdf"; Also, ensure that the pdfjs-dist package is available in your Pipedream environment. If you still encounter issues, please let me know and I’ll help you further.
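
Because pdfjs-dist has moved its entry points between releases, it may be worth checking which build files the installed version actually ships before settling on an import path. Independently of that path, the page loop used earlier in this thread can be sketched against any object that offers `numPages` and `getPage(n)`. The name `collectPageText` is hypothetical, and unlike the code above this version joins pages with a newline instead of appending one after every page.

```javascript
// Collects the text of every page of a pdf.js-style document object.
// Assumes pdfDoc has a numeric `numPages` and an async `getPage(n)` that is
// 1-indexed, and that each page has an async `getTextContent()` returning
// `{ items: [...] }`, as in the code shown earlier in this thread.
async function collectPageText(pdfDoc) {
  const pages = [];
  for (let pageNumber = 1; pageNumber <= pdfDoc.numPages; pageNumber++) {
    const page = await pdfDoc.getPage(pageNumber);
    const content = await page.getTextContent();
    const pageText = content.items
      .filter((item) => typeof item.str === "string") // skip items without text
      .map((item) => item.str)
      .join(" ");
    pages.push(pageText);
  }
  return pages.join("\n");
}
```

Keeping the loop separate from the import means the only line that has to change when the package layout changes is the one that obtains `pdfDoc`.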