How to Implement Transactional Workflows with Single-Worker Access to Data Using DynamoDB and Pipedream?

This topic was automatically generated from Slack. You can find the original thread here.

How can I implement transactional workflows, i.e., where only one worker across all of my different workflows can access a piece of data at a time? I’m trying to use DynamoDB, but that only works at the action level, not the workflow level. I have multiple operations that I want to perform on a row at a time, with various actions in between. This doesn’t seem to be supported by Pipedream’s concurrency settings, which are specific to a single workflow.

You might consider having a parent workflow that handles all the concurrency, then fans out to sub-workflows, depending on the conditions?

Under the hood, that’s exactly how if/else branching works: it uses $.flow.trigger(), a new API within Pipedream. Actually, it’s so new that it’s not yet documented :scream:

But there are some initial notes in the channel canvas here: Slack

An alternative would be to use Data Stores: write a record that each workflow checks before continuing (Data Stores are accessible to all workflows within a workspace). There’s no guarantee you’d avoid race conditions that way, though.
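As a rough illustration of that Data Store check, here’s a minimal sketch of a Node.js code step. The key format, the payload shape, and the `row_id` field on the trigger event are assumptions, and the `get`/`set` pair is not atomic, which is exactly the race condition mentioned above:

```javascript
// Sketch of a Data Store "lock" check before continuing a workflow.
// Key name, payload, and steps.trigger.event.row_id are illustrative only.
export default defineComponent({
  props: {
    data: { type: "data_store" },
  },
  async run({ steps, $ }) {
    const key = `lock:${steps.trigger.event.row_id}`; // hypothetical row id
    const existing = await this.data.get(key);

    if (existing) {
      // Another workflow has already claimed this row
      return $.flow.exit(`Row is locked (claimed at ${existing.claimedAt})`);
    }

    // Claim the lock. Note the get + set above is not atomic, so two
    // concurrent runs could both pass the check — the race mentioned above.
    await this.data.set(key, { claimedAt: Date.now() });
  },
});
```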

If you’re using one workflow for all row operations, does setting the concurrency to 1 make it transactional?

These would be good options if I didn’t have to share event information between workflows. The workflow has to check whether another event has happened in order to determine the if/else branch. In this case I can outsource the shared data upstream to Twilio, but if it were internal app data, I don’t know how I would handle this without implementing the transactions in code.

Sounds like you need to intentionally bottleneck your workflows.

The way I would do this is:
• Instead of performing transactional operations inside of each workflow, call a specific workflow to handle these.
◦ That workflow can be configured with a concurrency of 1, so it would be guaranteed never to run more than one transaction at a time. This would basically be the bottleneck.
• But before calling that centralized transactional workflow, I would call $.flow.rerun() in order to get a resume_url to pass to the transactional workflow.
◦ And once the transactional workflow is done, it could call the resume_url (and pass data back to it if necessary). A rough sketch of both sides follows below.
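For example, the caller side might look something like this (a minimal sketch: the 5-minute timeout, the TRANSACTION_WORKFLOW_URL environment variable, and the row_id field on the trigger event are placeholders, and it assumes the centralized workflow has an HTTP trigger with concurrency set to 1):

```javascript
// Caller workflow step: suspend this run, hand the resume_url to the
// centralized transactional workflow, then resume when it calls back.
import { axios } from "@pipedream/platform";

export default defineComponent({
  async run({ steps, $ }) {
    const { run } = $.context;

    if (run.runs === 1) {
      // First pass: get a resume_url and suspend for up to 5 minutes
      const { resume_url } = $.flow.rerun(5 * 60 * 1000, null, 1);

      // TRANSACTION_WORKFLOW_URL is a placeholder for the HTTP trigger URL
      // of the centralized workflow (the one with concurrency = 1)
      await axios($, {
        method: "POST",
        url: process.env.TRANSACTION_WORKFLOW_URL,
        data: {
          resume_url,
          row_id: steps.trigger.event.row_id, // whatever identifies the row
        },
      });
    } else if (run.callback_request) {
      // Resumed: callback_request contains the data the transactional
      // workflow sent to the resume_url
      return run.callback_request;
    }
  },
});
```

And the last step of the transactional workflow would call the resume_url to wake the caller back up:

```javascript
// Final step of the centralized transactional workflow (concurrency = 1):
// report back to the waiting caller via its resume_url.
import { axios } from "@pipedream/platform";

export default defineComponent({
  async run({ steps, $ }) {
    return await axios($, {
      method: "POST",
      url: steps.trigger.event.body.resume_url,
      data: { status: "committed" }, // hypothetical payload shape
    });
  },
});
```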
What do you think?

Yeah this works. Appreciate the help.