Why does workflow execution time increase after deployment despite individual steps taking less time, and how can it be mitigated?

This topic was automatically generated from Slack. You can find the original thread here.

We have a fairly large workflow that needs to maintain pretty strict run timing. The workflow consistently runs in around 30 seconds or less, but often, not long after we deploy a change to the workflow, its execution time balloons to 2 or 3 times the normal length (up to 90 seconds), even though the execution times of all the steps added together come to less than 30 seconds.

Is there an explanation for this, and/or a way to mitigate it? Because of this, deploying changes is causing instability in our system.

It really depends on the workflow. If you’re talking to APIs with highly variable response times, for example, that can increase the variance in the total workflow time.

Re: the issue you’re seeing when testing: when you test a workflow, we give you 3GB of memory (and proportional CPU) so you can test quickly. When you deploy the workflow, we set the memory to the default 256MB. If you need better performance, you can increase the workflow’s memory in the workflow’s Settings; that may help if you’re constrained on memory or CPU.
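
If you want to verify what the deployed worker actually has available, a throwaway diagnostic step can log the memory visible to the container and the process’s current usage. Below is a minimal sketch assuming a Node.js code step (Pipedream’s defineComponent wrapper); depending on the runtime, os.totalmem() may report the VM’s memory rather than the exact configured limit, so treat it as a rough check.

```javascript
// Minimal diagnostic sketch for a Node.js code step.
// Logs the memory visible to the container and this process's current usage,
// so you can sanity-check whether the workflow's memory setting is in effect.
import os from "os";

export default defineComponent({
  async run({ steps, $ }) {
    const mb = (bytes) => (bytes / 1024 ** 2).toFixed(1);
    const usage = process.memoryUsage();

    console.log(`Memory visible to the container: ${mb(os.totalmem())} MB`);
    console.log(`Process RSS: ${mb(usage.rss)} MB`);
    console.log(`Heap used: ${mb(usage.heapUsed)} MB`);

    // Depending on the runtime, os.totalmem() may reflect the VM rather than
    // the exact configured limit, so treat this as a rough check.
    return { totalMemMb: mb(os.totalmem()), rssMb: mb(usage.rss) };
  },
});
```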

Workflows with many steps can also encounter longer “cold starts”, where it takes some time for us to spin up the container to run the environment. We’re about to ship a feature to let you run N workers in a “warm” state so you can make sure N containers are always available for requests, which should eliminate that specific delta.
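
One way to see how much of a run is spin-up rather than step work is to compare the trigger event’s timestamp with the wall-clock time when the first step starts executing. A rough sketch follows, assuming a Node.js code step and that the trigger exposes an ISO timestamp at steps.trigger.context.ts (check your trigger’s actual exports, since that field is an assumption here).

```javascript
// Rough sketch for the first step of the workflow: estimate pre-step overhead
// (container spin-up + queueing) by diffing the trigger timestamp against now.
// Assumes steps.trigger.context.ts holds the trigger's ISO timestamp; verify
// against your trigger's actual exports before relying on it.
export default defineComponent({
  async run({ steps, $ }) {
    const triggeredAt = new Date(steps.trigger.context.ts).getTime();
    const firstStepStartedAt = Date.now();
    const overheadMs = firstStepStartedAt - triggeredAt;

    console.log(`Pre-step overhead: ~${(overheadMs / 1000).toFixed(1)}s`);
    // If this figure jumps right after a deploy or on overlapping runs, the
    // extra time is going to spin-up, not to the steps themselves.
    return { overheadMs };
  },
});
```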

We have the memory of the workflow set to 4GB, so that shouldn’t be the issue, and since each step of the workflow reports its individual execution time, we can see that most of the time the workflow spends executing isn’t actually in any of the steps. I suppose this could be the cold start issue, but the workflow is triggered 50+ times per day, so it is usually “warm”. If that warm container is reset when a new change is deployed, though, that could explain it.

yes, and a new worker can also be spun up if two requests arrive concurrently. Is that happening as well?

yes we have unlimited concurrency and throttling off

got it. So yes, if we receive one request and the current “warm” worker handles it, and a second request arrives while that first run is processing, we’d spin up another container to handle the 2nd request

that could also explain the more-frequent cold starts
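
If you want to confirm that overlapping runs are landing on freshly spun-up containers, one common trick is to stamp each container with an ID generated at module load time. Module-level state typically survives across warm invocations of the same container in Lambda-style runtimes, so a run that logs a previously unseen ID (or an invocation count of 1) paid the cold-start cost. A hedged sketch, assuming Pipedream’s workers behave this way:

```javascript
// Sketch: tag each container with an ID generated at module load time.
// Module-level state generally persists across warm invocations of the same
// container (typical Lambda-style behavior), so runs that log a previously
// unseen ID were handled by a freshly spun-up (cold) container.
import { randomUUID } from "crypto";

const containerId = randomUUID(); // set once per container, at cold start
let invocationCount = 0;          // increments only while the container stays warm

export default defineComponent({
  async run({ steps, $ }) {
    invocationCount += 1;
    console.log(`containerId=${containerId} invocation=${invocationCount}`);
    // invocation=1 means this run paid the cold-start cost on this container.
    return { containerId, invocationCount };
  },
});
```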

Oh that is interesting. So the new feature will allow us to keep multiple workers warm to run concurrent requests?

that’s right!

you’d be able to specify 1, 2… N workers. You’d be charged credits based on the total uptime of the workers.