“Workflow payload limit exceeded” and “out of memory” errors on a workflow

This topic was automatically generated from Slack. You can find the original thread here.

Luke : Hi! I’m working on a workflow with a large payload and keep getting hit with “Workflow payload limit exceeded” and “out of memory”. I’ve just started building it, but the final version will look like:

• Fetch ~60k documents from FaunaDB
• For each document, pass it to the GitHub API and get back data from GitHub
• For each API response from Github, update the corresponding document in Fauna.
Right now, I’m getting the errors on step 1. I’ve read about using /tmp, but I’m still trying to get my head around it all. Are there any examples of similarly large workflows?

Dylan Sather (Pipedream) : you might be getting the “Workflow payload limit exceeded” error because of large step exports or console.log statements. The total size of this data can’t currently exceed 8MB. Does that sound like it might be the case here?

Luke : Yep, that sounds about right! I need to pass the 60k documents to step #2, and I’m not sure how to do that without hitting the limit.

Dylan Sather (Pipedream) : Have you seen https://docs.pipedream.com/workflows/steps/code/nodejs/working-with-files/ ?

Dylan Sather (Pipedream) : in general here, like you mentioned, I’d recommend serializing the documents to JSON and writing that data to file(s) in /tmp

Dylan Sather (Pipedream) : that way you don’t need to pass the documents via step exports, and can write them in one step and read them in the next

Dylan Sather (Pipedream) : you may encounter other limits here, though (e.g. the max workflow timeout is 5 minutes), and may want to consider a mechanism to chunk the input documents so that you can run the workflow multiple times to process all 60k. Let’s see if we can get it working first
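
A minimal sketch of what chunking could look like here: split the document IDs into fixed-size batches, each handled by its own workflow run (the helper and placeholder IDs below are illustrative):

```javascript
// Split an array into fixed-size batches
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// e.g. 60,000 document IDs in batches of 5,000 -> 12 separate workflow runs
const ids = Array.from({ length: 60000 }, (_, i) => `doc-${i}`); // placeholder IDs
console.log(chunk(ids, 5000).length); // 12
```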

Luke : Thank you! That makes sense. I’ll implement tomorrow and let you know how that goes.

Luke : I have it reading and writing a JSON file, but I’m not sure where to go next / how to go about writing a mechanism to “chunk” my files. Should that mechanism read the JSON and then perform an action? If you have any color here, that would be really helpful!

Dylan Sather (Pipedream) : Re: chunking above, I was mainly saying that processing 60k documents at once, in a single run of the workflow, may yield some other errors. For example, I don’t know if you’ll be able to make requests to the GitHub API for 60k documents and successfully process those responses within 5 minutes (the max execution time for a Pipedream workflow).

In this case I’d recommend using an HTTP trigger for your workflow, which accepts a list / range of Fauna document IDs as input, or some identifier that allows you to run the workflow on a subset of documents. The workflow will receive that input from the HTTP payload, for example, and fetch the correct documents from Fauna. That way you can run the workflow on e.g. 5000 documents at a time.

Was that your question re: chunking or were you asking something else?
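
A sketch of what the first code step of that HTTP-triggered workflow might look like (the payload shape, the `ids` field name, the collection name, and the `FAUNA_SECRET` environment variable are all assumptions):

```javascript
// Expected HTTP trigger payload, e.g.: { "ids": ["265349...", "265350..."] }
const fs = require("fs");
const faunadb = require("faunadb");
const q = faunadb.query;

// The HTTP trigger's parsed JSON body is available on the trigger event;
// the "ids" field name here is just an assumption
const { ids } = steps.trigger.event.body;

const client = new faunadb.Client({ secret: process.env.FAUNA_SECRET });

// Fetch only this batch of documents from Fauna
const documents = await client.query(
  q.Map(
    ids,
    q.Lambda("id", q.Get(q.Ref(q.Collection("documents"), q.Var("id"))))
  )
);

// Stash the batch in /tmp for the GitHub and update steps, export a summary
fs.writeFileSync("/tmp/batch.json", JSON.stringify(documents));
return { count: documents.length, path: "/tmp/batch.json" };
```

A separate script (or another workflow) could then POST one batch of IDs at a time to the trigger URL, e.g. 5,000 per request, until all 60k are processed.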

Luke : Thank you! That makes sense. I’ll try to implement and let you know if I run into any issues.