Today I tried to implement something in Pipedream that I often used to do in Yahoo Pipes, but sadly could not achieve it in an easy manner:
I want to ingest an RSS feed, process (e.g. filter) its items, and then output another RSS feed URL that can be ingested by other RSS-capable tools. A workflow would look something like “RSS trigger → Code step to modify event → Push item to new RSS feed → Publish new RSS feed”. This is not possible with Pipedream in a straightforward way: the only way to call into a workflow is by defining an appropriate trigger (in this case an HTTP API trigger). There is no concept of “Output Actions” or similar (there are actions that pass data through another API to another service, but I wouldn’t call those “Outputs” as they don’t terminate the workflow).
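For context, here is a minimal, hedged sketch of what the “Code step to modify event” could look like as a Node.js code step, assuming the RSS trigger emits one feed item per event and that items carry a `title` field (field names vary by feed, and the keyword filter is just an example):

```javascript
// Sketch of a filter code step. Assumes the RSS trigger emits one
// feed item per event with a `title` field (an assumption; adjust
// to whatever your feed actually provides).
async (event, steps) => {
  // $end() stops the workflow run early, effectively dropping the item.
  if (!/keyword/i.test(event.title || "")) {
    $end("Item filtered out");
  }
  // Items that pass the filter continue to the next step.
  return event;
};
```

It’s the steps after this one (“Push item to new RSS feed → Publish new RSS feed”) that I can’t express in PD.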
I see two ways to support such a use-case on PD:
1.) Add a way to share state between multiple workflows. Even sharing $checkpoint would be enough (though that would make race conditions an even bigger issue). This is probably not going to happen, as the SQL Service will be shut down and the blog post mentions that workflows are the focus.
2.) Create an action that can accumulate data in an internal state through the workflow but can serve this data through an HTTP API.
In my case this would be specific to RSS feeds, and the action would need the flexibility to expose RSS, Atom and JSON feeds, but I imagine this could be generalised to allow any data to be exposed.
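To make 2.) more concrete, here is a standalone Node.js sketch of what such an action might do internally: accumulate items in state and serve them back out over HTTP as a feed. Nothing like this exists in PD today; the port, the item shape and the minimal RSS template are all made up for illustration (and the XML is not escaped, for brevity):

```javascript
// Standalone illustration of an "output action": accumulate items,
// serve them as RSS. Everything here is hypothetical.
const http = require("http");

const items = []; // would be filled by the workflow's earlier steps

function toRss(items) {
  // Minimal RSS 2.0 template; no XML escaping, for brevity.
  const entries = items
    .map((i) => `<item><title>${i.title}</title><link>${i.link}</link></item>`)
    .join("");
  return `<?xml version="1.0"?><rss version="2.0"><channel>` +
    `<title>Processed feed</title>${entries}</channel></rss>`;
}

http
  .createServer((req, res) => {
    if (req.method === "POST") {
      // The workflow would push each processed item here as JSON.
      let body = "";
      req.on("data", (chunk) => (body += chunk));
      req.on("end", () => {
        items.push(JSON.parse(body));
        res.writeHead(204);
        res.end();
      });
    } else {
      // Any RSS-capable tool can poll this URL.
      res.writeHead(200, { "content-type": "application/rss+xml" });
      res.end(toRss(items));
    }
  })
  .listen(8080);
```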
As I mentioned, this concept does not seem to exist in PD as of now: workflows can’t have outputs that can be called into, only triggers.
Is this something PD is interested in adding in the future?
Does anyone else have other ideas on how to approach this?
What I could do right now without using third-party tools like a DB:
I could of course define an HTTP API trigger and then, as a second step, retrieve the feed, parse it to generate the items, dedupe and process them, store them in $checkpoint, and finally $respond with the data from $checkpoint. The problem with this approach is that it re-parses the entire feed on every request, which is wasteful and very slow for large feeds (for me it always ran into timeouts; maybe I used feedparser wrong, since the RSS source itself seems to work even with very large feeds). I could refine this further by only re-parsing the feed when the request to the API has a special shape, and then create another workflow that periodically calls the endpoint to regenerate the items in the main workflow. This is similar to dylan’s workflow here: https://pipedream.com/@dylburger/generate-an-rss-feed-from-http-post-requests-retrieve-via-get-request-p_n1CrQG/edit - All of this is a lot of complexity though.
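For reference, a hedged sketch of that combined step is below. It assumes PD’s legacy runtime behaviour ($checkpoint for state, $respond for custom HTTP responses, query params exposed as event.query) and uses the rss-parser npm package instead of feedparser purely because it is shorter to show here; the feed URL, the keyword filter and the ?refresh=1 convention are placeholders:

```javascript
// Sketch: HTTP-triggered step that caches processed items in
// $checkpoint and serves them on every request. The feed URL,
// the filter and the ?refresh=1 convention are placeholders.
const Parser = require("rss-parser");

async (event, steps) => {
  // Only re-parse the source feed when the request has the
  // "special shape" (here: ?refresh=1), e.g. when pinged by a
  // second, scheduled workflow.
  if (event.query && event.query.refresh === "1") {
    const feed = await new Parser().parseURL("https://example.com/feed.xml");
    $checkpoint = {
      // Dedupe/filter/process here before caching.
      items: feed.items.filter((i) => /keyword/i.test(i.title || "")),
    };
  }

  // All other requests are served straight from the checkpoint.
  $respond({
    status: 200,
    headers: { "content-type": "application/json" },
    // A real version would render RSS/Atom XML here instead of JSON.
    body: { items: ($checkpoint && $checkpoint.items) || [] },
  });
};
```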