$checkpoint failing to persist across executions

This topic was automatically generated from Slack. You can find the original thread here.

Nicholas Reilingh : Hey, I’m having an issue where initially setting $checkpoint from null is not persisting after a workflow exits. I’m having trouble nailing down the issue – I’ve gotten the value to persist immediately after deploying an edit to my workflow, but then if I clear the checkpoint, it doesn’t work on subsequent events…

Nicholas Reilingh : On the timeout task, I am mutating $checkpoint.link_state to 'DOWN', but on the very next event (or in workflow settings), it's still 'UP'. I have this workflow set to serialized concurrency

Nicholas Reilingh : Also, my !$checkpoint case never successfully persisted the checkpoint value. Once I used a different step to just set it to a placeholder value, that 'heartbeat' case is apparently able to update the value

Nicholas Reilingh : n.b. this workflow is triggered by the alpha scheduled task action. Could that have something to do with it?

Nicholas Reilingh : The answer to that is no – I’m seeing exactly the same issues when I factor out the checkpoint stuff into a separate HTTP workflow

Nicholas Reilingh : Yeah, this makes absolutely no sense to me and I need help

Nicholas Reilingh : Okay, so I switched to a step-based this.$checkpoint pattern and things seem to be working more as expected. I did have to refactor a value that I was passing in as a parameter to a subsequent step which I was originally referencing directly from $checkpoint
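The step-based pattern Nicholas landed on can be sketched roughly like this (the `link_state` field comes from the thread; the simulated `step` object is illustrative, since outside Pipedream there is no persistent store — in an actual code step the platform persists `this.$checkpoint` between executions):

```javascript
// Simulates a code step's `this`; inside Pipedream you'd use
// `this.$checkpoint` directly and the platform handles persistence.
const step = { $checkpoint: null };

function heartbeat(ctx) {
  if (!ctx.$checkpoint) {
    // Initialize with a complete object rather than mutating a null value
    ctx.$checkpoint = { link_state: "UP", last_heartbeat: null };
  }
  // Reassign the whole value instead of mutating nested fields, so the
  // write is an explicit top-level assignment
  ctx.$checkpoint = { ...ctx.$checkpoint, last_heartbeat: Date.now() };
}

heartbeat(step);
console.log(step.$checkpoint.link_state); // "UP"
```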

Dylan Sather (Pipedream) : I tried implementing a simple case to reproduce. I also serialized concurrency and hit my endpoint with a large number of events and I confirmed they incremented as I expected. I also cleared the contents of $checkpoint and ran a number more events, and it reset / incremented as I expected.

If you can get a minimal workflow that reproduces the issue you were seeing I’m happy to take a deeper look

Nicholas Reilingh : Yeah, I’ll bet the issue is somewhere in between the simple case and the example in my screenshot, which may take a little bit of time to isolate. Is there anything special or fancy about the way that persistence is implemented that could be running contrary to my assumptions about the feature, or is the design intended to fully abstract that away so I can treat it like any other JavaScript variable?

Dylan Sather (Pipedream) : From the perspective of the workflow, $checkpoint should be a standard JavaScript value corresponding to the data you saved. The main limitation is that it must be JSON-serializable, since that’s how it’s stored between executions
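Since $checkpoint is stored as JSON between executions, a quick JSON round-trip shows which values survive and which are silently lost:

```javascript
const value = {
  count: 1,
  when: new Date(0),  // Date objects become ISO strings
  fn: () => {},       // functions are dropped entirely
  gap: undefined,     // undefined properties are dropped too
};

const restored = JSON.parse(JSON.stringify(value));
console.log(restored); // { count: 1, when: '1970-01-01T00:00:00.000Z' }
```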

Nicholas Reilingh : One more question about $checkpoint — is it expected that a write to this.$checkpoint will consume a considerable amount of compute? I’m seeing a difference of about 4 seconds of compute between a read-only and read-write case.

Dylan Sather (Pipedream) : That’s not expected. Is the longer compute happening immediately after a deploy? Since we have to deploy a new execution environment and download all npm packages, you’ll notice that the first event takes longer.

Nicholas Reilingh : Context: I’m using Pipedream to help monitor a field device with an inconsistent internet connection. That device makes an HTTP request to workflow 1 every 7 minutes. That workflow schedules two events in workflow 2: one for right now (heartbeat), and one for 15 minutes in the future (timeout). The heartbeat event stores the heartbeat timestamp in a checkpoint, and the timeout event checks to see if the timestamp has changed in the intervening 15 minutes. I’m getting 4 sec of compute on every write, which chewed nearly straight through my compute time usage. Workflow 2 is pretty much just my earlier screenshot plus one additional step to send a slack notification.
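The timeout check described here can be sketched as follows (the field and function names are hypothetical, not from the actual workflow): if the stored heartbeat timestamp hasn't advanced in the last 15 minutes, the device is considered down.

```javascript
const TIMEOUT_MS = 15 * 60 * 1000;

// Returns the link state given the persisted checkpoint and the current time
function linkState(checkpoint, now = Date.now()) {
  if (!checkpoint || checkpoint.last_heartbeat == null) return "UNKNOWN";
  return now - checkpoint.last_heartbeat > TIMEOUT_MS ? "DOWN" : "UP";
}

console.log(linkState(null));                                  // "UNKNOWN"
console.log(linkState({ last_heartbeat: 0 }, 5 * 60 * 1000));  // "UP"
console.log(linkState({ last_heartbeat: 0 }, 16 * 60 * 1000)); // "DOWN"
```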

Dylan Sather (Pipedream) : Based on the timing, it looks like the workflow is going “cold” (i.e. the execution environment has been spun down since we haven’t received an event in the last ~5 min). Notice how the request every ~7 minutes takes the longest time (you’re not charged for the time it takes to spin up the virtual machine, but you are charged for the time it takes to download npm packages). Then the request ~1 min after only takes a few hundred ms.

Nicholas Reilingh : Oh wow! So I completely by accident managed to absolutely maximize the number of cold starts on this workflow. Lol. I should be able to delay my timeout events to 17.5m instead of 15m and avoid the cold starts completely, then.
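A back-of-envelope check of the 17.5-minute idea, assuming the ~5-minute warm window Dylan described: workflow 2 receives a heartbeat event every 7 minutes plus a timeout event N minutes after each heartbeat, and what matters is the longest gap in the merged event stream.

```javascript
// Largest gap between consecutive events in workflow 2, once timeout
// events start interleaving with the 7-minute heartbeats
function maxGap(timeoutDelay, heartbeatPeriod = 7, horizon = 200) {
  const times = [];
  for (let t = 0; t <= horizon; t += heartbeatPeriod) {
    times.push(t, t + timeoutDelay); // heartbeat, then its timeout event
  }
  times.sort((a, b) => a - b);
  let gap = 0;
  for (let i = 1; i < times.length; i++) {
    // ignore the start-up transient before timeout events begin arriving
    if (times[i] > timeoutDelay && times[i] <= horizon) {
      gap = Math.max(gap, times[i] - times[i - 1]);
    }
  }
  return gap;
}

console.log(maxGap(15));   // 6   -> 6-minute gaps exceed the warm window
console.log(maxGap(17.5)); // 3.5 -> the environment stays warm
```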

Nicholas Reilingh : Here’s a thought: it would be really neat if the event compute time was displayed with a little :ice_cube: or something to indicate when it was a cold start, assuming that information is easy to surface.

Dylan Sather (Pipedream) : Yeah definitely experiment with the timing a little bit. Unfortunately the cold starts are out of our control (AWS manages this), but that’s a neat idea. We do know when it’s a fresh env.

Hi @user-1 Nicholas, I’m trying to reproduce a similar use case for monitoring a web application. I’m currently having trouble implementing your point “That workflow schedules two events in workflow 2”. Would it be possible to elaborate or even share a relevant code snippet?
Kind regards
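For anyone landing here with the same question, a rough sketch of the two-event pattern from the thread: workflow 1 builds one "heartbeat" event for right now and one "timeout" event ~15 minutes out, then delivers each to workflow 2. The payload shape (timestamp + message) is an assumption, not a confirmed Pipedream API — check the docs for the task scheduler source you're using, and POST each payload to its endpoint (e.g. with axios).

```javascript
// Builds the two scheduled-event payloads; actually sending them to the
// scheduler source's endpoint is left out, since that API may differ.
function buildScheduledEvents(now = Date.now(), delayMs = 15 * 60 * 1000) {
  return [
    { timestamp: new Date(now).toISOString(), message: { type: "heartbeat" } },
    { timestamp: new Date(now + delayMs).toISOString(), message: { type: "timeout" } },
  ];
}

console.log(buildScheduledEvents(0));
// [
//   { timestamp: '1970-01-01T00:00:00.000Z', message: { type: 'heartbeat' } },
//   { timestamp: '1970-01-01T00:15:00.000Z', message: { type: 'timeout' } }
// ]
```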