How can I filter and post only the text section from an RSS feed's description in Pipedream's Webhook feature, similar to Zapier's Twitter integration?

user-1 · April 27, 2023, 3:55pm

This topic was automatically generated from Slack. You can find the original thread here.

Hi all, I am a new Pipedream user and really enjoy the Webhook POST feature. I am building an RSS-to-Webhook Pipedream workflow that posts new Letterboxd entries to Mastodon. It generally works well, but I'd like to fine-tune the post using the description section of the RSS entry. The description includes an image and a text section. Pipedream can read the whole description, but I can't filter out just the text section, and posting the whole description messes up the Mastodon post.
Zapier's Twitter integration provides a processed description that keeps only the text content, alongside the raw information.
Can you please help me achieve a similar result? Thanks!

Example Letterboxd RSS feed: Letterboxd - tyler
Example description entry:

<description>
<![CDATA[ 
<p><img src="https://a.ltrbxd.com/resized/film-poster/7/1/5/8/5/6/715856-beau-is-afraid-0-600-0-900-crop.jpg?v=b5fd01665a"/></p> 
<p>the greatest cinematic depiction of mommy issues since <i>psycho (1960) </i></p> 
]]>
</description>

For example, Zapier's filter selects just the text: the greatest cinematic depiction of mommy issues since psycho (1960)
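To see why the raw field garbles the post, here is what an XML parser returns for an entry like this. The sketch is in Python and uses the standard library's ElementTree as a stand-in for whatever parsing the Pipedream RSS trigger does internally (an assumption). The CDATA section is unwrapped, and `description` comes back as a single HTML string that still contains both the poster `<p>` and the review `<p>`.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal <item>, shaped like the Letterboxd entry above
ITEM_XML = """<item>
<description><![CDATA[
<p><img src="https://a.ltrbxd.com/resized/film-poster/7/1/5/8/5/6/715856-beau-is-afraid-0-600-0-900-crop.jpg?v=b5fd01665a"/></p>
<p>the greatest cinematic depiction of mommy issues since <i>psycho (1960) </i></p>
]]></description>
</item>"""

item = ET.fromstring(ITEM_XML)
description = item.find("description")

# The parser unwraps CDATA: .text is one HTML string, not nested <p> elements
description_html = description.text
print(description_html.strip())

# No child elements were created, so the HTML still has to be parsed separately
print(len(list(description)))  # → 0
```

Posting `description_html` unchanged sends the `<img>` and `<p>` markup to Mastodon, so a separate HTML-parsing step is still needed.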

user-1 · April 27, 2023, 3:55pm

I’m going to tag our new support bot, which provides a decent workaround that I think may help.

user-1 · April 27, 2023, 3:55pm

To extract the text content from the description field of an RSS entry in Pipedream, you can use a code step to parse the HTML and remove the unwanted tags. Here’s an example using the cheerio package:

1. Add a new Node.js code step to your Pipedream workflow.
2. Import cheerio at the top of the step. Pipedream installs npm packages automatically when a step imports them.
3. Add the following code:

```javascript
import * as cheerio from "cheerio";

export default defineComponent({
  async run({ steps }) {
    const $ = cheerio.load(steps.trigger.event.description);
    // The first <p> wraps the poster <img>; the second holds the review text
    const textContent = $("p").eq(1).text().trim();
    return { textContent };
  },
});
```

If the description comes from a step other than the RSS trigger, replace `steps.trigger.event.description` with that step's path. This code loads the description with Cheerio, selects the second `<p>` tag, and extracts its text content. The returned object is exported as `steps.your_code_step_name.$return_value.textContent` for the following steps, where `your_code_step_name` is the name of the code step you added. You can then use the extracted text in your webhook POST request to Mastodon.

user-1 · April 27, 2023, 3:55pm

The approach Mario suggests is a little specific to the example you provided, but in general you can use Cheerio the same way to pull text out of other HTML.
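A more general version avoids hard-coding "take the second `<p>`", which would return only the first paragraph of a multi-paragraph review. The sketch below is in Python and uses the standard library's `html.parser` instead of Cheerio, so it is a swapped-in technique. It assumes the review text lives in `<p>` elements, as in the sample. It collects every paragraph's text, drops all tags, and discards paragraphs that end up empty, such as the one that only wraps the poster image. `ReviewTextExtractor` and `extract_review` are hypothetical names.

```python
from html.parser import HTMLParser


class ReviewTextExtractor(HTMLParser):
    """Collect the text of every <p>, ignoring tags such as <img> and <i>."""

    def __init__(self):
        # convert_charrefs=True (the default) delivers entities like &amp; already decoded
        super().__init__()
        self.paragraphs = []
        self._current = None  # text pieces of the <p> being read, or None outside a <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._current = []

    def handle_endtag(self, tag):
        if tag == "p" and self._current is not None:
            text = "".join(self._current).strip()
            if text:  # the poster paragraph contains only an <img>, so it is empty here
                self.paragraphs.append(text)
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)


def extract_review(description_html: str) -> str:
    parser = ReviewTextExtractor()
    parser.feed(description_html)
    parser.close()
    # Separate multiple paragraphs with a blank line so longer reviews stay readable
    return "\n\n".join(parser.paragraphs)


sample = (
    '<p><img src="https://a.ltrbxd.com/resized/film-poster/7/1/5/8/5/6/'
    '715856-beau-is-afraid-0-600-0-900-crop.jpg?v=b5fd01665a"/></p>\n'
    "<p>the greatest cinematic depiction of mommy issues since <i>psycho (1960) </i></p>"
)
print(extract_review(sample))  # → the greatest cinematic depiction of mommy issues since psycho (1960)
```

In a workflow, the input would be the same description field that the RSS trigger exposes.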

user-1 · April 27, 2023, 3:55pm

How do I parse the text in this description tag? I want

<description>
<![CDATA[ 
<p><img src="https://a.ltrbxd.com/resized/film-poster/7/1/5/8/5/6/715856-beau-is-afraid-0-600-0-900-crop.jpg?v=b5fd01665a"/></p> 
<p>the greatest cinematic depiction of mommy issues since <i>psycho (1960) </i></p> 
]]>
</description>

to return

the greatest cinematic depiction of mommy issues since psycho (1960)

user-1 · April 27, 2023, 3:55pm

You can use regular expressions to parse the text in the description tag. Here’s a Pipedream component that demonstrates how to do this:

```javascript
export default defineComponent({
  props: {
    description: {
      type: "string",
      label: "Description",
    },
  },
  async run() {
    // Inner HTML of every <p>...</p>; [\s\S] lets a paragraph span line breaks
    const paragraphs = [...this.description.matchAll(/<p>([\s\S]*?)<\/p>/g)].map((m) => m[1]);
    // Skip the paragraph that only wraps the poster <img>
    const text = paragraphs.find((p) => !p.includes("<img"));
    if (!text) {
      return "No matching text found";
    }
    // Strip inline tags such as <i>...</i>, then trim surrounding whitespace
    return text.replace(/<[^>]+>/g, "").trim();
  },
});
```

Set the description prop to the text you provided, and this component will return the desired output:

```
the greatest cinematic depiction of mommy issues since psycho (1960)
```
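For comparison, here is the same regex approach in Python using the standard library's `re` and `html` modules. This is a minimal sketch, not a drop-in Pipedream step. A negative lookahead skips the poster paragraph, a second pattern strips inline tags, and `html.unescape` decodes HTML entities such as `&amp;`. It assumes bare `<p>` tags with no attributes, as in the Letterboxd sample. `review_text` is a hypothetical name.

```python
import html
import re

# First <p>...</p> whose contents never contain "<img" (this rejects the poster paragraph).
# [\s\S] lets a paragraph span line breaks; the lazy *? stops at the first </p>.
TEXT_PARAGRAPH = re.compile(r"<p>((?:(?!<img)[\s\S])*?)</p>")
TAG = re.compile(r"<[^>]+>")


def review_text(description_html: str) -> str:
    match = TEXT_PARAGRAPH.search(description_html)
    if match is None:
        return "No matching text found"
    # Remove inline markup such as <i>...</i>, decode entities, trim whitespace
    return html.unescape(TAG.sub("", match.group(1))).strip()


sample = (
    '<p><img src="https://a.ltrbxd.com/resized/film-poster/7/1/5/8/5/6/'
    '715856-beau-is-afraid-0-600-0-900-crop.jpg?v=b5fd01665a"/></p>\n'
    "<p>the greatest cinematic depiction of mommy issues since <i>psycho (1960) </i></p>"
)
print(review_text(sample))  # → the greatest cinematic depiction of mommy issues since psycho (1960)
```

Regexes suit a fixed, known format like this feed. For arbitrary HTML, a real parser (Cheerio, BeautifulSoup, or `html.parser`) is the safer choice.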

user-1 · April 27, 2023, 3:55pm

Let me know if that helps. Both are decent approaches and show how you can use a library like Cheerio to parse HTML in a code step, or use regular expressions (in the second example) to parse a more specific format like the one you provided.

user-1 · April 27, 2023, 3:55pm

Thanks! I will try the solution suggested by the bot.

user-1 · April 27, 2023, 3:55pm

Your bot is amazing, and I'm sorry for not reading the docs carefully and not knowing about custom code steps.

user-1 · April 27, 2023, 3:55pm

I am more comfortable with Python, so my solution is:

```python
from bs4 import BeautifulSoup
from pipedream.script_helpers import (steps, export)

# Parse the description with BeautifulSoup
soup = BeautifulSoup(steps["trigger"]["event"]["description"], 'html.parser')

# Find all img tags and remove them
for img in soup.find_all('img'):
    img.extract()

export('review', soup.get_text().strip())
```

user-1 · April 27, 2023, 3:55pm

Very cool!