Example of RSS Feed Capturing, Summarizing with OpenAI, and Posting to Notion Workflow?

This topic was automatically generated from Slack. You can find the original thread here.

Hello! Does anyone have an example of capturing an RSS feed, creating a summary of a post with OpenAI, and then posting that summary to Notion workflow?

To create a workflow that captures an RSS feed, summarizes a post with OpenAI, and posts the summary to Notion, you can follow these steps:

1. Add the “Random item from multiple RSS feeds” trigger from the RSS app. To do this, search for the RSS app in the trigger menu and select the “Random item from multiple RSS feeds” trigger. Connect your RSS account and select one or more Feed URLs, or pass an array of strings from a previous step.
2. Add an action to summarize the post using OpenAI. You’ll need to use a code step to make an API call to OpenAI’s GPT-3 API. Follow the OpenAI API documentation to make the request.
3. Add a Notion action to create a new page or update an existing page with the summary. Search for the Notion app in the action menu and select the appropriate action (e.g., “Create a new page” or “Update an existing page”). Connect your Notion account and configure the action with the necessary information, such as the database or page ID and the summary from the previous step.

Remember to connect your OpenAI and Notion accounts to Pipedream and provide the necessary API keys for authentication.

isn’t there an RSS trigger I can use to get every new item in a feed? Also, isn’t there a pre-built summarize action for OpenAI?

Yes, there is an RSS trigger called “New Item in Feed” that you can use to get every new item in a feed. To set it up:

1. Search for the RSS app from the trigger menu.
2. Select the “New Item in Feed” trigger.
3. Connect your RSS account.
4. Select a Feed URL or pass a string from a previous step.

As for the OpenAI summarization, there isn’t a pre-built action specifically for summarization. However, you can use a code step to make an API call to OpenAI’s GPT-3 API for summarization. Follow the OpenAI API documentation to make the request. After summarizing the content, you can use a Notion action to create or update a page with the summary.
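A code step for the summarization call could look roughly like the sketch below. It is a minimal illustration, not Pipedream's built-in behavior: the model name `gpt-4o-mini`, the prompt wording, and the helper names `buildSummaryRequest` and `summarize` are assumptions. The endpoint is OpenAI's Chat Completions API, and the API key is expected to be supplied by the caller.

```javascript
// Build the JSON body for a Chat Completions request that asks for a summary.
// The model name and prompt wording are illustrative assumptions.
function buildSummaryRequest(text, model = "gpt-4o-mini") {
  return {
    model,
    messages: [
      { role: "system", content: "Summarize the article in three sentences." },
      { role: "user", content: text },
    ],
    temperature: 0.2,
  };
}

// Send the request with the global fetch available in Node.js 18+.
// apiKey is your OpenAI API key; pass it in rather than hard-coding it.
async function summarize(text, apiKey) {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(buildSummaryRequest(text)),
  });
  if (!res.ok) {
    throw new Error(`OpenAI request failed with status ${res.status}`);
  }
  const data = await res.json();
  // The generated text is in the first choice's message.
  return data.choices[0].message.content.trim();
}
```

Inside a workflow, the `text` argument would come from an earlier step (for example, the RSS item's content or scraped article text), and the returned string would be passed on to the Notion step.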

Nice feature. I tried this earlier and the result from the RSS feed was only the summary. Looking through the results, I couldn’t find the entire article.

You’ll probably need to scrape the URL. How can I scrape the URL from the item in the feed and return just the text from the HTML?

Also, we do have a summarize action: https://pipedream.com/apps/openai/actions/summarize. Once you get the text of the URL, you can pass it directly to this action.

How can I scrape the URL from the item in the RSS trigger and return just the text from the HTML?

To scrape the URL from the item in the RSS trigger and return just the text from the HTML, you can use the “New Item in Feed” trigger and then add a code step to extract the text from the HTML. You can use the “cheerio” package to parse and extract the text from the HTML content. Here’s an example workflow:

1. Add the “New Item in Feed” trigger from the RSS app.
2. Add a code step to parse the HTML content and extract the text using the “cheerio” package. Make sure to import the package at the top of your code:

```
import cheerio from "cheerio";

export default async function (event, { axios }) {
  const url = event.url; // Assuming the URL is available in the event object
  const response = await axios.get(url);
  const htmlContent = response.data;
  const $ = cheerio.load(htmlContent);
  const textContent = $("body").text();
  return { textContent };
}
```

Sweet! Thanks for the help.

No problem. Unfortunately that code still isn’t right - Pi is still learning - but I think I can get it to return correct code

That code almost looks good, but it needs to be a Pipedream component to run on Pipedream. Can you make sure the core code is in a defineComponent function with a run method and a prop for the URL to scrape?

Sure, here’s the updated code with the defineComponent function and a prop for the URL to scrape:

```
import cheerio from "cheerio";
import { axios } from "@pipedream/platform";

export default defineComponent({
  props: {
    urlToScrape: {
      type: "string",
      label: "URL to Scrape",
      description: "Enter the URL you'd like to scrape for text content.",
    },
  },
  async run() {
    const response = await axios(this, {
      method: "GET",
      url: this.urlToScrape,
    });
    const htmlContent = response.data;
    const $ = cheerio.load(htmlContent);
    const textContent = $("body").text();
    return { textContent };
  },
});
```

This component has a prop for the URL to scrape and uses the cheerio package to extract the text content from the HTML. The extracted text content is returned as an output.

give that a try

This is close, but when testing the step I get an error: Invalid HTML content. Expected a string. Can you help with that?

I apologize for the confusion. It seems the issue might be related to the response data type. Let’s modify the code to handle the response properly. Here’s the updated code:

```
import cheerio from "cheerio";
import { axios } from "@pipedream/platform";

export default defineComponent({
  props: {
    urlToScrape: {
      type: "string",
      label: "URL to Scrape",
      description: "Enter the URL you'd like to scrape for text content.",
    },
  },
  async run() {
    const response = await axios(this, {
      method: "GET",
      url: this.urlToScrape,
      responseType: "text",
    });
    const htmlContent = response.data;
    const $ = cheerio.load(htmlContent);
    const textContent = $("body").text();
    return { textContent };
  },
});
```

In this updated code, I’ve added the responseType: "text" option to the axios request. This should ensure that the response is treated as text, and the error you encountered should be resolved.

The response is still coming back as undefined. Do you have any other suggestions?
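One likely cause of the undefined value: by default, the `axios` wrapper from `@pipedream/platform` resolves to the response body itself rather than to a full response object, so reading `.data` from an HTML string yields `undefined`. A small guard like the one below tolerates both shapes. This is only a sketch, and `extractHtml` is a hypothetical helper name, not part of any library.

```javascript
// Accept either a raw HTML string or a full response object
// whose `data` property holds the HTML string.
function extractHtml(result) {
  if (typeof result === "string") {
    return result;
  }
  if (result && typeof result.data === "string") {
    return result.data;
  }
  throw new TypeError(
    "Expected an HTML string or an object with a string `data` property"
  );
}
```

In the component above, this would mean calling `extractHtml(response)` instead of reading `response.data` directly before handing the HTML to the parser.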

I apologize for the confusion earlier. Let’s try another approach using the “html-to-text” package to extract the text content from the HTML. Here’s the updated code:

```
import { axios } from "@pipedream/platform";
import { htmlToText } from "html-to-text";

export default defineComponent({
  props: {
    urlToScrape: {
      type: "string",
      label: "URL to Scrape",
      description: "Enter the URL you'd like to scrape for text content.",
    },
  },
  async run() {
    // The @pipedream/platform axios wrapper resolves to the response body
    // itself by default, so there is no `.data` property to read.
    const htmlContent = await axios(this, {
      method: "GET",
      url: this.urlToScrape,
      responseType: "text",
    });
    // Convert the HTML to plain text without hard line wrapping.
    const textContent = htmlToText(htmlContent, {
      wordwrap: false,
      ignoreHref: true,
      ignoreImage: true,
    });
    return { textContent };
  },
});
```

In this updated code, I’ve replaced the Cheerio package with the “html-to-text” package to extract the text content from the HTML. The code also uses the value returned by the axios call directly as the HTML, because the @pipedream/platform wrapper returns the response body rather than a full response object. This should resolve the issue with the undefined response.
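For the last step of the workflow, the pre-built Notion action is usually the simplest route. If you would rather post the summary from a code step, a rough sketch of the request to Notion's "create a page" endpoint follows. The helper names `chunkText`, `buildNotionPage`, and `createNotionPage` are hypothetical, and the title property is assumed to be called `Name`; adjust it to match your database.

```javascript
// Notion limits each rich text object's content to 2000 characters,
// so long summaries are split into several paragraph blocks.
const NOTION_TEXT_LIMIT = 2000;

function chunkText(text, size = NOTION_TEXT_LIMIT) {
  const chunks = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

// Build the body for POST https://api.notion.com/v1/pages.
// "Name" is assumed to be the title property of the target database.
function buildNotionPage(databaseId, title, summary) {
  return {
    parent: { database_id: databaseId },
    properties: {
      Name: { title: [{ text: { content: title } }] },
    },
    children: chunkText(summary).map((chunk) => ({
      object: "block",
      type: "paragraph",
      paragraph: { rich_text: [{ type: "text", text: { content: chunk } }] },
    })),
  };
}

// Send the request with the global fetch available in Node.js 18+.
// token is a Notion integration token with access to the database.
async function createNotionPage(token, databaseId, title, summary) {
  const res = await fetch("https://api.notion.com/v1/pages", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Notion-Version": "2022-06-28",
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildNotionPage(databaseId, title, summary)),
  });
  if (!res.ok) {
    throw new Error(`Notion request failed with status ${res.status}`);
  }
  return res.json();
}
```

Here `title` could be the RSS item's title and `summary` the output of the OpenAI step.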