How does the last dedupe strategy work?

This topic was automatically generated from Slack. You can find the original thread here.

Jay Vercellone : Good evening :slightly_smiling_face:
Does anyone have experience with the `last` dedupe strategy? I’m giving it a try, but it doesn’t seem to work the way I understand it.

calpa : https://docs.pipedream.com/workflows/steps/code/state/#example-workflow-dedupe-incoming-data

Jay Vercellone : Actually I was referring to this:

https://github.com/PipedreamHQ/pipedream/blob/master/COMPONENT-API.md#dedupe-strategies

Giao Phan : What are you seeing that seems wrong?

Jay Vercellone : Let me give you an example.
Let’s say an event source receives the following webhook calls/events:

Event 1:

```json
{
  "id": "foo",
  "updated_on": "2020-01-01"
}
```

Event 2:

```json
{
  "id": "foo",
  "updated_on": "2020-01-01"
}
```

Event 3:

```json
{
  "id": "foo",
  "updated_on": "2020-01-02"
}
```

What I expect from the `last` dedupe strategy is for events #1 and #3 to be processed, and event #2 to be ignored.

That is, when event #2 arrives, an event with the same ID and an earlier-or-equal timestamp has already been seen, so we should skip it.

Then, when event #3 arrives, it finds a match for the ID as well, but in this case the timestamp is greater, so we’d process this new event.

Giao Phan : The `id` field is the dedupe key, so if you pass the same one for each event, they will always be deduped. You can pass `updated_on` as the `id` instead. In that case, the `greatest` strategy is probably a better fit as well.
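To see why this works for the events above, here is a minimal sketch of the `greatest` strategy's behavior (illustrative only, not Pipedream's actual implementation): the source tracks the largest numeric id it has emitted, and skips any event whose id is less than or equal to that high-water mark. Parsing `updated_on` into a timestamp and using it as the dedupe id gives exactly the process/skip/process outcome Jay expects:

```javascript
// Sketch of the "greatest" dedupe strategy: keep the largest id seen so far,
// and only process events whose id exceeds it.
function makeGreatestDeduper() {
  let maxSeen = -Infinity;
  return (event) => {
    // Use the updated_on timestamp as the dedupe id, per the advice above.
    const id = Date.parse(event.updated_on);
    if (id <= maxSeen) return false; // duplicate or stale: skip
    maxSeen = id;                    // new high-water mark: process
    return true;
  };
}

const shouldProcess = makeGreatestDeduper();
const events = [
  { id: "foo", updated_on: "2020-01-01" }, // processed (new high-water mark)
  { id: "foo", updated_on: "2020-01-01" }, // skipped (equal timestamp)
  { id: "foo", updated_on: "2020-01-02" }, // processed (greater timestamp)
];
const processed = events.filter(shouldProcess);
// processed contains events #1 and #3 only
```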

Giao Phan : `last` is intended for when you are emitting batches of items sorted in ascending order; in those cases it will drop items that it has seen before.
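A rough sketch of that batch behavior (again illustrative, not Pipedream's actual code): remember the last id emitted from the previous batch, and on the next ascending-sorted batch, skip the leading items up to and including that id, emitting only what comes after:

```javascript
// Sketch of the "last" dedupe strategy for ascending-sorted batches:
// remember the final id emitted, and drop the already-seen prefix of
// each subsequent batch.
function makeLastDeduper() {
  let lastId = null;
  return (batch) => {
    // Find where the previous batch left off; -1 + 1 === 0 if not found,
    // so an unseen id means the whole batch is fresh.
    const startIdx =
      lastId === null ? 0 : batch.findIndex((e) => e.id === lastId) + 1;
    const fresh = batch.slice(startIdx);
    if (fresh.length > 0) lastId = fresh[fresh.length - 1].id;
    return fresh;
  };
}

const dedupe = makeLastDeduper();
const first = dedupe([{ id: 1 }, { id: 2 }, { id: 3 }]);  // all three are new
const second = dedupe([{ id: 2 }, { id: 3 }, { id: 4 }]); // only id 4 is new
```

This is why `last` does not help with Jay's example: every event shares the id `foo`, so each new batch looks entirely already-seen regardless of `updated_on`.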