Today I tried to implement something in Pipedream that I often used to do in Yahoo Pipes, but sadly could not achieve it in an easy manner:
I want to ingest an RSS feed, process (e.g. filter) its items, and then output another RSS feed URL that can be ingested by other RSS-capable tools. A workflow would look something like “RSS trigger → Code step to modify event → Push item to new RSS feed → Publish new RSS feed”. This is not possible with Pipedream in a straightforward way: the only way to call into a workflow is by defining an appropriate trigger (in this case an HTTP API trigger). There is no concept of “Output Actions” or similar (there are actions that pass data through another API to another service, but I wouldn’t call those “Outputs” as they don’t terminate the workflow).
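For context, here is a minimal, hedged sketch of what the “Code step to modify event” could look like as a Node.js code step, assuming the RSS trigger emits one feed item per event and that items carry a `title` field (field names vary by feed, and the keyword filter is just an example):

```javascript
// Sketch of a filter code step. Assumes the RSS trigger emits one
// feed item per event with a `title` field (an assumption; adjust
// to whatever your feed actually provides).
async (event, steps) => {
  // $end() stops the workflow run early, effectively dropping the item.
  if (!/keyword/i.test(event.title || "")) {
    $end("Item filtered out");
  }
  // Items that pass the filter continue to the next step.
  return event;
};
```

It’s the steps after this one (“Push item to new RSS feed → Publish new RSS feed”) that I can’t express in PD.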
I see two ways to support such a use-case on PD:
1.) Add a way to share state between multiple workflows. Even sharing $checkpoint would be enough (though that would make race conditions an even bigger issue). This is probably not going to happen, as the SQL Service will be shut down and the blog post mentions that workflows are the focus.
2.) Create an action that can accumulate data in an internal state through the workflow but can serve this data through an HTTP API.
In my case this would be specific to RSS feeds, and the action would need the flexibility to expose RSS, Atom and JSON feeds, but I imagine this could be generalised to allow any data to be exposed.
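To make 2.) more concrete, here is a standalone Node.js sketch of what such an action might do internally: accumulate items in state and serve them back out over HTTP as a feed. Nothing like this exists in PD today; the port, the item shape and the minimal RSS template are all made up for illustration (and the XML is not escaped, for brevity):

```javascript
// Standalone illustration of an "output action": accumulate items,
// serve them as RSS. Everything here is hypothetical.
const http = require("http");

const items = []; // would be filled by the workflow's earlier steps

function toRss(items) {
  // Minimal RSS 2.0 template; no XML escaping, for brevity.
  const entries = items
    .map((i) => `<item><title>${i.title}</title><link>${i.link}</link></item>`)
    .join("");
  return `<?xml version="1.0"?><rss version="2.0"><channel>` +
    `<title>Processed feed</title>${entries}</channel></rss>`;
}

http
  .createServer((req, res) => {
    if (req.method === "POST") {
      // The workflow would push each processed item here as JSON.
      let body = "";
      req.on("data", (chunk) => (body += chunk));
      req.on("end", () => {
        items.push(JSON.parse(body));
        res.writeHead(204);
        res.end();
      });
    } else {
      // Any RSS-capable tool can poll this URL.
      res.writeHead(200, { "content-type": "application/rss+xml" });
      res.end(toRss(items));
    }
  })
  .listen(8080);
```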
As I mentioned, this concept does not seem to exist in PD as of now: workflows can’t have outputs that can be called into, only triggers.
Is this something PD is interested in adding in the future?
Does anyone else have other ideas on how to approach this?
What I could do right now without using third-party tools like a DB:
I could of course define an HTTP API trigger and then, as a second step, retrieve the feed, parse it to generate the items, dedupe and process them, store them in $checkpoint, and finally $respond with the data from $checkpoint. The problem with this approach is that it re-parses the entire feed on every request, which is wasteful and very slow for large feeds (for me it always ran into timeouts; maybe I used feedparser wrong, since the RSS source itself seems to work even with very large feeds). I could refine this further by only re-parsing the feed when the request to the API has a special shape, and then create another workflow that periodically calls the endpoint to regenerate the items in the main workflow. This is similar to dylan’s workflow here: https://pipedream.com/@dylburger/generate-an-rss-feed-from-http-post-requests-retrieve-via-get-request-p_n1CrQG/edit - All of this is a lot of complexity though.
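For reference, a hedged sketch of that combined step is below. It assumes PD’s legacy runtime behaviour ($checkpoint for state, $respond for custom HTTP responses, query params exposed as event.query) and uses the rss-parser npm package instead of feedparser purely because it is shorter to show here; the feed URL, the keyword filter and the ?refresh=1 convention are placeholders:

```javascript
// Sketch: HTTP-triggered step that caches processed items in
// $checkpoint and serves them on every request. The feed URL,
// the filter and the ?refresh=1 convention are placeholders.
const Parser = require("rss-parser");

async (event, steps) => {
  // Only re-parse the source feed when the request has the
  // "special shape" (here: ?refresh=1), e.g. when pinged by a
  // second, scheduled workflow.
  if (event.query && event.query.refresh === "1") {
    const feed = await new Parser().parseURL("https://example.com/feed.xml");
    $checkpoint = {
      // Dedupe/filter/process here before caching.
      items: feed.items.filter((i) => /keyword/i.test(i.title || "")),
    };
  }

  // All other requests are served straight from the checkpoint.
  $respond({
    status: 200,
    headers: { "content-type": "application/json" },
    // A real version would render RSS/Atom XML here instead of JSON.
    body: { items: ($checkpoint && $checkpoint.items) || [] },
  });
};
```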