This topic was automatically generated from Slack. You can find the original thread here.
Luke : Hi! I’m working on a workflow with a large payload and keep getting hit with “Workflow payload limit exceeded” and “out of memory” errors. I’ve just started building it, but the final version will look like this (rough sketch below):
• Fetch ~60k documents from FaunaDB
• For each document, pass it to the Github API and get back data from Github
• For each API response from Github, update the corresponding document in Fauna.
Right now, I’m getting the errors on step 1. I’ve read about using /tmp, but I’m still trying to get my head around it all. Are there any examples of similarly large workflows?
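For reference, once steps 2 and 3 exist I’m picturing each document going through something roughly like this; the GitHub call, the collection name, and the field names below are just placeholders:

```javascript
import faunadb from "faunadb";
import { Octokit } from "@octokit/rest";

const q = faunadb.query;

// Pipedream Node.js code step
export default defineComponent({
  async run({ steps, $ }) {
    const fauna = new faunadb.Client({ secret: process.env.FAUNA_SECRET });
    const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

    // Placeholder: documents fetched in step 1, assumed to look like
    // [{ id: "1234", owner: "some-org", repo: "some-repo" }, ...]
    const documents = steps.fetch_fauna_docs.$return_value;

    for (const doc of documents) {
      // Step 2 (placeholder GitHub call): fetch data about the repo the doc references
      const { data: repo } = await octokit.rest.repos.get({
        owner: doc.owner,
        repo: doc.repo,
      });

      // Step 3: write the GitHub response (or just the fields needed)
      // back onto the corresponding Fauna document
      await fauna.query(
        q.Update(q.Ref(q.Collection("documents"), doc.id), {
          data: { github: { stars: repo.stargazers_count } },
        })
      );
    }
  },
});
```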
Dylan Sather (Pipedream) : you might be getting the “Workflow payload limit exceeded” error because of large step exports or console.log statements. The total size of this data can’t currently exceed 8MB. Does that sound like it might be the case here?
Dylan Sather (Pipedream) : in general here, like you mentioned, I’d recommend serializing the documents to JSON and writing that data to file(s) in /tmp
Dylan Sather (Pipedream) : you may encounter other limits here, though (e.g. the max workflow timeout is 5 minutes), and may consider a mechanism to chunk the input documents so that you can run the workflow multiple times to process all 60k. Let’s see if we can get it working, first
Luke : I have it reading and writing a JSON file, but I’m not sure where to go next or how to go about writing a mechanism to “chunk” my files. Should that mechanism read the JSON and then perform an action? If you have any color here, that would be really helpful!
Dylan Sather (Pipedream) : Re: chunking above, I was mainly saying that processing 60k documents at once, in a single run of the workflow, may yield some other errors. For example, I don’t know if you’ll be able to make requests to the Github API for 60k documents and successfully process those responses within 5 minutes (the max execution time for a Pipedream workflow).
In this case I’d recommend using an HTTP trigger for your workflow, which accepts a list / range of Fauna document IDs as input, or some identifier that allows you to run the workflow on a subset of documents. The workflow will receive that input from the HTTP payload, for example, and fetch the correct documents from Fauna. That way you can run the workflow on e.g. 5000 documents at a time.
Was that your question re: chunking or were you asking something else?
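To make the chunking idea concrete, here’s a rough sketch of what the first step of such a workflow could look like, assuming the classic faunadb Node.js driver, a placeholder “documents” collection, a Fauna secret stored in an environment variable, and an HTTP payload shaped like { "ids": ["101", "102", ...] }:

```javascript
import fs from "fs";
import faunadb from "faunadb";

const q = faunadb.query;

export default defineComponent({
  async run({ steps, $ }) {
    // Placeholder payload shape: { "ids": ["101", "102", ...] }
    const { ids = [] } = steps.trigger.event.body || {};

    const client = new faunadb.Client({ secret: process.env.FAUNA_SECRET });

    // Fetch only the requested documents from the (placeholder) "documents" collection
    const docs = await client.query(
      q.Map(
        ids,
        q.Lambda("id", q.Get(q.Ref(q.Collection("documents"), q.Var("id"))))
      )
    );

    // Stash the chunk in /tmp for the GitHub + Fauna-update steps,
    // and export only a small summary
    fs.writeFileSync("/tmp/fauna-chunk.json", JSON.stringify(docs));
    return { count: docs.length };
  },
});
```

You’d then hit the workflow’s endpoint once per batch (say, ~5,000 IDs per request) until all 60k documents are processed, and the later GitHub / Fauna-update steps would read each chunk back from /tmp.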