How to parse HTML and return a found image to the next step?

I have this Node.js step in my workflow:

What I want is to extract an image from that URL (in the code) and pass it to the next step.
What would be the correct way to do it?

I can add a line of code

document.querySelector("#thumbnail > yt-image > img").src

to get the image’s src value, which is the path to the image.

How do I pass it to the next step and access it there?

Does the Node.js code in the screenshot understand the browser’s “document” object?

Thanks!

Hi @slishnevsky,

Node.js code steps don’t run in a browser, so the browser’s document object isn’t available there. To parse HTML and return a found image to the next step in your Pipedream workflow, you can instead use the axios library to fetch the HTML content and a library like cheerio to parse the HTML and extract the image URL. Here’s how you can do it:

  1. Import the axios and cheerio libraries: You don’t need to explicitly install these libraries in Pipedream; just import them at the top of your code step.

  2. Fetch the HTML content: Use axios to fetch the HTML content from the URL.

  3. Parse the HTML and extract the image URL: Use cheerio to parse the HTML and extract the image URL using the selector you provided.

  4. Return the image URL: Return the extracted image URL so it can be used in the next step.

Here’s a complete example of how you can achieve this in a Pipedream Node.js code step:

import { axios } from "@pipedream/platform";
// Namespace import: recent cheerio releases no longer provide a default export
import * as cheerio from "cheerio";

export default defineComponent({
  props: {
    url: {
      type: "string",
      label: "URL to scrape image from",
    },
  },
  async run({ steps, $ }) {
    // Fetch the HTML content from the URL.
    // The @pipedream/platform axios wrapper resolves to the response body.
    const html = await axios($, {
      url: this.url,
    });

    // Load the HTML content into cheerio. The name `$` is already taken by
    // the Pipedream context object passed to run(), so use a different name.
    const page = cheerio.load(html);

    // Extract the image URL using the provided selector
    const imageUrl = page("#thumbnail > yt-image > img").attr("src");

    // Return the image URL so it can be used in the next step
    return imageUrl;
  },
});

Explanation:

  1. Import Libraries: The axios and cheerio libraries are imported at the top of the file.
  2. Props: A url prop is defined to accept the URL from which you want to scrape the image.
  3. Fetch HTML Content: The axios library is used to fetch the HTML content from the provided URL.
  4. Parse HTML: The HTML content is loaded into cheerio for parsing.
  5. Extract Image URL: The image URL is extracted using the provided CSS selector.
  6. Return Image URL: The extracted image URL is returned so it can be used in the next step.
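
Two things may be worth guarding against when extracting and returning the URL. First, .attr("src") returns undefined when the selector matches nothing, which can happen if the page builds its thumbnails with client-side JavaScript, since cheerio only sees the static HTML. Second, a src value can be relative or protocol-relative rather than a full URL. Here is a minimal sketch of a guard, assuming the built-in URL class is enough for your case; toAbsoluteImageUrl is a hypothetical helper name and the example.com URLs are placeholders:

```javascript
// Hypothetical helper: turn whatever .attr("src") returned into an
// absolute URL, or null if there is nothing usable.
function toAbsoluteImageUrl(src, pageUrl) {
  // The selector matched nothing, or the <img> had no src attribute
  if (!src) return null;
  try {
    // Resolves relative ("/img/a.jpg") and protocol-relative
    // ("//cdn.example.com/a.jpg") values against the page URL;
    // values that are already absolute pass through unchanged.
    return new URL(src, pageUrl).href;
  } catch {
    // src could not be parsed as a URL at all
    return null;
  }
}

// Placeholder inputs for illustration only
console.log(toAbsoluteImageUrl("/img/a.jpg", "https://example.com/watch?v=1"));
console.log(toAbsoluteImageUrl(undefined, "https://example.com/watch?v=1"));
```

In the step above, you could return toAbsoluteImageUrl(imageUrl, this.url) instead of imageUrl, and have the next step check for null before using it.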

I’m a bot powered by Pipedream and GPT-4. I’m still learning, so please double-check my answers and code!

And how do I access the returned value in the next step?
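
One way to read it, as a minimal sketch: Pipedream exposes whatever a code step returns under steps.<step_name>.$return_value, so a later step can read it from the steps object passed to run(). The step name scrape_image and the helper readReturnValue below are hypothetical; substitute the name shown on your own step. The steps object is mocked here so the snippet can also be run outside Pipedream.

```javascript
// Hypothetical helper: read the value an earlier step returned.
// Pipedream stores a step's return value at steps.<step_name>.$return_value.
function readReturnValue(steps, stepName) {
  const step = steps[stepName];
  if (!step || step.$return_value === undefined) {
    throw new Error(`Step "${stepName}" has no return value`);
  }
  return step.$return_value;
}

// Inside a real Pipedream step this would be called from run(), e.g.:
//   export default defineComponent({
//     async run({ steps, $ }) {
//       const imageUrl = readReturnValue(steps, "scrape_image");
//       return imageUrl;
//     },
//   });

// Mocked steps object, shaped like the one Pipedream passes to run()
const mockSteps = {
  scrape_image: { $return_value: "https://example.com/thumbnail.jpg" },
};

console.log(readReturnValue(mockSteps, "scrape_image"));
```

In the props form of a later step, the same value can also be referenced with the expression {{steps.scrape_image.$return_value}}, again using your own step name.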