Is a Pipedream webhook listener Highly Available (HA)?

The only source for information essential to our financial reporting is a webhook, so we lose that information if we fail to receive and process one. This is probably common across the webhook ecosystem: a webhook is sent but not received, and the sender makes no effort to retry later.

In our case, the data in the webhook is only accessible via the webhook event – the app doesn’t provide another means of querying the data we need nor does it provide an API on the history of events sent via a webhook.

Because of this, I need to know how Pipedream is implementing the Webhook/HTTP triggering tools.

  • Does Pipedream have a high availability task queue which holds requests sent to the URL provided by an HTTP trigger before that event source processes it?
  • When two workflows subscribe to the same custom event source, is the webhook-handling availability profile defined by the custom event source?

The Webhooks doc page says “Webhooks are managed at an account-level, and you send data to these webhooks using subscriptions.” That makes me think a webhook is a standalone entity, rather than a “macro”-like way to reuse a standard HTTP trigger with the same parameters (URL, etc.) across workflows.

The Pipedream Status page says all backend services have 100% uptime over the last three months, which leads me to believe that the Pipedream team is striving for High Availability. The relevant service on that page is “Event Sources - HTTP”, right?

@chexxor Happy to help:

Does Pipedream have a high availability task queue which holds requests sent to the URL provided by an HTTP trigger before that event source processes it?

There’s not a user-facing queue that stores the HTTP request data prior to invoking the event source. As soon as the HTTP request arrives, it triggers the event source and the corresponding event data is emitted. By default, you’ve probably seen that you have access to two distinct HTTP sources: one that emits the full HTTP request, and another that emits just the payload. I’d recommend using the source that emits the full request in your case, since it sounds like you want to hold on to all of the details sent by the provider.

The last 100 events emitted by the event source are made available via API. As you noted, you can also create a subscription that delivers those events to a destination service you own via webhook, or simply invoke a workflow.
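Since only the last 100 events are retained, one option is to poll for them regularly and copy anything new into a store you control. The sketch below shows only the de-duplication step. The function name `archive_new_events` and the assumption that each event carries a unique `"id"` field are hypothetical, and fetching the events from the API is deliberately left out.

```python
from typing import Dict, Iterable, List


def archive_new_events(archive: Dict[str, dict], fetched: Iterable[dict]) -> List[str]:
    """Copy events not yet in `archive` into it, keyed by their "id" field.

    Assumption: every event dict has a unique "id" key.
    Returns the IDs that were newly stored, in the order they were seen.
    """
    added = []
    for event in fetched:
        event_id = event["id"]
        if event_id not in archive:
            archive[event_id] = event
            added.append(event_id)
    return added
```

Running this more often than 100 events can arrive between polls keeps the archive complete, because overlapping batches are harmless.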

As you saw on our status page, this system is highly available, but we currently make no uptime guarantees and offer no SLA for that service. Occasionally, we may return a 5XX HTTP error to the client making the webhook request. In that case, we haven’t stored the details of the incoming request, and we expect the client to retry the request at a later time.
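For reference, the retry behavior we expect from a sender looks roughly like the sketch below. It is a generic illustration, not any particular provider's implementation: `send` is a hypothetical callable that performs the HTTP POST and returns the status code, and the attempt count and delays are arbitrary.

```python
import time
from typing import Callable


def send_with_retry(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call send() until it returns a non-5XX status or attempts run out.

    Returns True only if a 2XX status was received.
    """
    for attempt in range(max_attempts):
        status = send()
        if status < 500:
            # A 2XX means delivered; other non-5XX codes will not
            # improve on retry, so stop in either case.
            return 200 <= status < 300
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, then 2x, 4x, ...
            sleep(base_delay * (2 ** attempt))
    return False
```

If the provider you depend on does not do something like this, a 5XX from the receiving side means that event is lost.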

When two workflows subscribe to the same custom event source, is the webhook-handling availability profile defined by the custom event source?

Can you clarify what you mean by “webhook-handling availability” in this context?

I am referring to the likelihood of an incoming HTTP request being dropped. I was trying to reverse-engineer your implementation: for example, maybe an event source includes a queue to guarantee processing? If so, would that also apply to a standard webhook event source?

I realize this question may not make much sense and may be too hypothetical to answer, so don’t worry if a precise response isn’t possible.

In general, if you aren’t using custom responses, then once you get the 200 reply from our systems, your request is in our internal queues. The request may still get dropped if you are using concurrency controls and you fill up your queue. I have written a high-level blog post, Scalable Concurrency Controls for Heterogeneous Workloads, that may help.
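To make the "fill up your queue" case concrete, here is a minimal sketch of a bounded queue that rejects new items once it is full. It uses Python's standard `queue` module purely as an illustration and says nothing about how our internal queues are built; `enqueue_or_drop` is a hypothetical helper name.

```python
import queue


def enqueue_or_drop(q: queue.Queue, event: dict) -> bool:
    """Try to add an event without blocking.

    Returns True if the event was queued, False if the queue was
    already at capacity and the event was dropped.
    """
    try:
        q.put_nowait(event)
        return True
    except queue.Full:
        return False
```

With `queue.Queue(maxsize=2)`, the first two events are accepted and a third is dropped until a consumer takes something off the queue.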

If you are using custom responses, there is no out-of-band way to know whether your request has been enqueued, but once you get the custom response back, you know your workflow has run.

I recently discovered this doc page, which helps me understand that there is, indeed, a platform separation between “Event Sources” and “Triggers”, the latter being the built-in, default options for a workflow trigger.

What’s the difference between an event source and a trigger?
~ Triggers (pipedream.com)

So it makes sense to believe that these might have different performance or availability behavior.

It’s not important to me now, but it’s good to know.