This topic was automatically generated from Slack. You can find the original thread here.
Is there in-depth documentation on retry mechanisms & fault tolerance? In particular, do PD sources have a retry mechanism to retrigger workflows when they fail? (more details in the thread)
I’ve been using a specific pattern with good results for webhook-triggered workflows. I guess most people aren’t aware of it, since it’s apparently neither documented nor explicitly suggested by the builder UI. The pattern is to use an HTTP trigger with the option “return custom response from workflow” enabled, then respond with HTTP success (200) only after the steps have completed successfully (using await $.respond({ status: 200, body: 'ok' });). Whenever the workflow fails, it responds with a 5xx HTTP error instead. This is great because it leverages the retry mechanism of the service sending the webhook (assuming that service has one; please check that, along with its retry and discard policy).
My question is whether this retry mechanism also works for workflows triggered by PD sources (for example, the official Typeform new-submission source). Specifically, will the PD source retrigger its downstream workflows when they fail? If not, I would have to refactor by replacing the PD source with a raw HTTP trigger and then following the above pattern, right?
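For readers who want to see the shape of that pattern, here is a minimal sketch of the final code step. It assumes the HTTP trigger has “return custom response from workflow” enabled. The function is written as a plain function so it can run outside Pipedream; `processEvent` is a hypothetical stand-in for the real work, and the explicit 500 response in the catch branch is an assumption added for illustration rather than something the thread prescribes.

```javascript
// Inside Pipedream this body would sit in the `run({ steps, $ })` method of a
// Node.js code step. `processEvent` is a hypothetical placeholder.
async function run({ steps, $ }, processEvent) {
  try {
    await processEvent(steps.trigger.event);
  } catch (err) {
    // Assumption: answer with a 5xx so the webhook sender's retry policy applies.
    await $.respond({ status: 500, body: "processing failed" });
    throw err; // rethrow so the run still counts as failed
  }
  // Acknowledge the webhook only after the work above succeeded.
  await $.respond({ status: 200, body: "ok" });
}
```

Whether the sender actually retries on a 5xx, and for how long, depends entirely on that service.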
We’ve completed Auto-retry functionality for workflows, but not yet for sources. We’re planning to release the workflow setting soon!
When you enable the setting on a specific workflow, we’ll retry failed runs with backoff (and you can throw your own error to trigger a retry, if necessary). We haven’t exposed the backoff settings or delay to the user, so you can’t customize the retry behavior quite yet, but we’re experimenting with that.
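As a rough illustration of “throw your own error to trigger a retry”, the sketch below fails the step when a downstream call comes back with a server error. The helper name `assertDelivered` and the shape of `response` are hypothetical; the only behavior taken from the thread is that a failed run is what gets retried once the setting is enabled.

```javascript
// Hypothetical helper: turn a downstream server error into a thrown error,
// so the run fails instead of finishing "successfully" with bad data.
function assertDelivered(response) {
  if (response.status >= 500) {
    throw new Error(`Downstream service returned ${response.status}`);
  }
  return response;
}
```

A step would call this on the result of its own API request and let the error propagate.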
Auto-retry doesn’t yet apply to sources, but I passed that feedback to the team.
We’ll email you soon when we release this, and you can follow this issue for more updates.
Cool! Can you clarify “it doesn’t yet apply to sources”? I’m assuming that the source itself won’t retry at this point, but what about a source-triggered workflow: will you auto-retry failed workflow executions when the setting is enabled?
That’s correct. Once the source emits the event to the workflow, if the workflow fails, it will retry.
But if the event source execution itself fails (e.g. we fetch all events since the last run, but the API endpoint is down), that execution won’t retry.
Timer-based event sources are built to be resilient to this, and will pull events only since the last successful run. But some webhook-based sources make API calls within the source, and if those fail, they would not retry (unless the source returns a 5XX error to the service that sent the webhook).
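The resilience of timer-based sources comes down to where the checkpoint is written. The following is a simplified sketch of that idea, not the actual source code of any Pipedream source: `db` is any store with `get` and `set` (a `Map` works for trying it out), and `fetchSince` and `emit` are hypothetical callbacks.

```javascript
// Poll once: read the checkpoint, fetch newer events, emit them, and advance
// the checkpoint only if everything before it succeeded.
async function pollOnce(db, fetchSince, emit, now = Date.now()) {
  const since = db.get("lastSuccess") ?? 0;
  const events = await fetchSince(since); // may throw if the API is down
  for (const event of events) {
    emit(event);
  }
  db.set("lastSuccess", now); // not reached when the fetch above throws
  return events.length;
}
```

Because a failed fetch leaves the checkpoint untouched, the next scheduled run asks for the same window again, which is why no explicit retry is needed.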
If an event source throws, we should return a 400 error to the caller with the message “Error in workflow”. I’m looking into it more just to confirm that happens for the Typeform source.
On a regular HTTP-triggered workflow, a throwing step is not enough on its own for an error response to reach the HTTP caller. The key setting is to enable the option “return custom response from workflow” in the HTTP trigger. I’m just wondering how this translates to source definitions.
Yes, looking at the Typeform source, it will not return the error. The HTTP interface has to set customResponse: true for the error to be returned to the client. If that’s false (the default), we’ll issue a 200 OK immediately.
There are some challenges with making this the default behavior for all sources, but we could enable it for Typeform and address the question more generally for other sources.
I’ll open an issue for Typeform and put it on the board. I believe you’re probably familiar enough with components to make the change (we just need to set customResponse: true in the HTTP interface prop, and issue a 200 OK at the appropriate place in the run method). So if you want to give it a try on your own and contribute back, feel free! Otherwise we’ll get to it soon.
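For anyone attempting that change, the sketch below shows roughly what it could look like. It is not the actual Typeform source: the key, name, and the `enrich` method are hypothetical, and in a real component the object would be the module’s default export rather than a local constant.

```javascript
// Hypothetical webhook source that acknowledges the request only after its
// own work and the emit have succeeded.
const component = {
  key: "example-webhook-source",
  name: "Example Webhook Source",
  version: "0.0.1",
  type: "source",
  props: {
    http: {
      type: "$.interface.http",
      customResponse: true, // no immediate 200 OK; run() decides the response
    },
  },
  methods: {
    // Placeholder for API calls the source makes before emitting.
    async enrich(body) {
      return body;
    },
  },
  async run(event) {
    const payload = await this.enrich(event.body); // a throw here skips the 200
    this.$emit(payload, { summary: "New submission" });
    this.http.respond({ status: 200, body: "ok" });
  },
};
```

If `enrich` throws, the 200 is never sent, so per the discussion above the caller should see an error response instead of a success.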