How to Prevent Duplicate Data in Webhook Source using Pipedream?

This topic was automatically generated from Slack. You can find the original thread here.

Hi there - long-time happy Pipedream user here! I’ve been using Pipedream in our Event Notifications tutorial since we launched the first private preview a few months ago - feedback has been very positive - it’s a great way to show how this kind of integration works.

I just came across one issue… Sometimes, Pipedream takes a few seconds to respond. Our platform can interpret that as an issue and retry the request, leading to duplicate data. I see the docs on deduplication in a custom source… Is there any way to do the same for a Webhook source?

Never mind - I found the answer at Using Data Stores - Pipedream

I was searching for ‘deduplication’, which doesn’t appear on this doc page.

thanks for the kind words!

We’ve also considered adding a deduper directly on the HTTP trigger. For example, imagine you could specify a key on the incoming event data (like event.headers["x-event-id"]) that represents the unique event ID from a third-party API webhook. Pipedream would dedupe on that key and emit only the first event with that ID.

That would essentially wrap what that example you linked to does, behind the scenes. We’d also ideally not charge you credits for duplicate events.
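To make the idea concrete, here is a minimal sketch of key-based deduping in plain Python. It is not Pipedream's implementation or API: the `first_seen` helper, the in-memory `seen_ids` set, and the shape of the event dicts are all assumptions made for illustration. In a real workflow the set of seen IDs would need to live somewhere persistent, such as a data store.

```python
from typing import Any, Dict, Optional, Set


def first_seen(event: Dict[str, Any], seen: Set[str], header: str = "x-event-id") -> bool:
    """Return True only the first time a given event ID is seen.

    `event` is assumed to look like {"headers": {...}, "body": {...}}.
    """
    event_id: Optional[str] = event.get("headers", {}).get(header)
    if event_id is None:
        # Assumption: events without an ID cannot be deduped, so let them through.
        return True
    if event_id in seen:
        return False
    seen.add(event_id)
    return True


# Hypothetical deliveries: the second one is a retry of the first.
seen_ids: Set[str] = set()
deliveries = [
    {"headers": {"x-event-id": "abc"}, "body": {"n": 1}},
    {"headers": {"x-event-id": "abc"}, "body": {"n": 1}},
    {"headers": {"x-event-id": "def"}, "body": {"n": 2}},
]
emitted = [d for d in deliveries if first_seen(d, seen_ids)]
```

Only the first delivery for each ID ends up in `emitted`; the retry is dropped.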

Would something like that work?

Hi - nice to ‘meet’ you! I think I’ve been using Pipedream pretty much since its inception, when [requestbin.com](http://requestbin.com) took over from requestb.in :slightly_smiling_face:

My webhook receives a batch of events in the POST payload, each with its own eventId:

{
  "events": [
    {
      "eventId": "1234",
      ...
    },
    {
      "eventId": "2345",
      ...
    },
    ...
  ]
}

so a built-in deduper would need to be a bit more elaborate.

I think this is a common pattern, though (you probably know better than me!), so it might be possible to just let the user specify the array key (events in the example above) and the event ID key (eventId) within the array elements.
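As a rough sketch of that suggestion, the function below takes the array key and the event ID key as parameters and returns only the array elements whose IDs have not been seen before. This is plain Python, not a Pipedream feature: `dedupe_batch` is a hypothetical name, and the `seen` dict is just a stand-in for whatever persistent key-value storage is available (for example, a data store).

```python
import time
from typing import Any, Dict, List, MutableMapping, Optional


def dedupe_batch(
    payload: Dict[str, Any],
    seen: MutableMapping[str, float],
    array_key: str = "events",
    id_key: str = "eventId",
    now: Optional[float] = None,
) -> List[Dict[str, Any]]:
    """Return the elements of payload[array_key] whose IDs are not yet in `seen`.

    Each new ID is recorded in `seen` with the time it was first observed.
    """
    now = time.time() if now is None else now
    fresh: List[Dict[str, Any]] = []
    for item in payload.get(array_key, []):
        event_id = item.get(id_key)
        if event_id is None:
            # Assumption: elements without an ID cannot be deduped, so keep them.
            fresh.append(item)
            continue
        key = str(event_id)
        if key in seen:
            continue  # duplicate from a retried request
        seen[key] = now
        fresh.append(item)
    return fresh


# A retried request overlaps with the first batch on eventId "2345".
store: Dict[str, float] = {}
first = dedupe_batch({"events": [{"eventId": "1234"}, {"eventId": "2345"}]}, store)
retry = dedupe_batch({"events": [{"eventId": "2345"}, {"eventId": "3456"}]}, store)
```

Here `first` contains both events, while `retry` contains only the event with ID "3456".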

makes a lot of sense. Like you noted, with a custom source it’s trivial. With the built-in deduper we’d have to figure out the right API to expose to developers, but the array use case is so standard I’d hope we’d be able to develop a solution

this isn’t on our immediate backlog but it’s helpful to know. For now the data store option is good if that works for you

The data store works great. The only downsides are the 50-key limit for free accounts and the fact that it’s kind of overkill for simple deduping. I could imagine a similar feature that was simply a set of keys (no values), with just set, has, and delete operations. It might even have a configurable ‘ttl’ so that keys could be deleted as they age out.
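For illustration, a key set like that could look roughly like the sketch below. This is an in-memory Python toy, not an existing Pipedream feature; the class name `ExpiringKeySet`, the `ttl_seconds` parameter, and the injectable `clock` are all assumptions. Expired keys are dropped lazily, the next time they are looked up.

```python
import time
from typing import Callable, Dict, Optional


class ExpiringKeySet:
    """A set of keys (no values) with set/has/delete and an optional TTL."""

    def __init__(
        self,
        ttl_seconds: Optional[float] = None,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._added_at: Dict[str, float] = {}

    def set(self, key: str) -> None:
        """Add the key, or refresh its age if it is already present."""
        self._added_at[key] = self._clock()

    def has(self, key: str) -> bool:
        """Return True if the key is present and has not aged out."""
        added = self._added_at.get(key)
        if added is None:
            return False
        if self._ttl is not None and self._clock() - added >= self._ttl:
            del self._added_at[key]  # lazily drop the expired key
            return False
        return True

    def delete(self, key: str) -> bool:
        """Remove the key; return True if it was present."""
        return self._added_at.pop(key, None) is not None


# Usage with a fake clock so that expiry is easy to see.
fake_now = [0.0]
keys = ExpiringKeySet(ttl_seconds=60, clock=lambda: fake_now[0])
keys.set("1234")
present_before = keys.has("1234")
fake_now[0] = 61.0
present_after = keys.has("1234")
```

With a 60-second TTL, the key is present right after it is set and gone once the fake clock passes 60 seconds.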

Having said that, I just realized that, since I’m using a timestamp as the value for my data store entries, I could easily create a second workflow to delete data store entries based on that timestamp, or any condition at all :slightly_smiling_face:
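A cleanup step along those lines might look like the following sketch. It assumes each entry's value is a Unix timestamp in seconds and uses a plain dict in place of the data store; `prune_older_than` and `max_age_seconds` are hypothetical names, and the real data store calls would differ.

```python
import time
from typing import List, MutableMapping, Optional


def prune_older_than(
    store: MutableMapping[str, float],
    max_age_seconds: float,
    now: Optional[float] = None,
) -> List[str]:
    """Delete entries whose timestamp value is older than max_age_seconds.

    Returns the list of keys that were removed.
    """
    now = time.time() if now is None else now
    # Collect the keys first so the mapping is not modified while iterating.
    expired = [key for key, ts in store.items() if now - ts > max_age_seconds]
    for key in expired:
        del store[key]
    return expired


# Example: at time 2000, with a 600-second cutoff, only "1234" is old enough.
entries = {"1234": 1000.0, "2345": 1900.0}
removed = prune_older_than(entries, max_age_seconds=600, now=2000.0)
```

After the call, `removed` is `["1234"]` and `entries` still holds "2345". Any other condition could replace the age check in the list comprehension.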

TTL on those keys is a common ask and def makes sense