Convert URL To LLM-Friendly Input with Jina Reader API on New Message in Channel from Discord Bot API

Pipedream makes it easy to connect APIs for Jina Reader, Discord Bot and 2,800+ other apps remarkably fast.

Trigger workflow on

New Message in Channel from the Discord Bot API

Next, do this

Convert URL To LLM-Friendly Input with the Jina Reader API

Getting Started

This integration creates a workflow with a Discord Bot trigger and a Jina Reader action. Once you configure and deploy the workflow, it runs on Pipedream's servers 24/7 for free.

Select this integration
Configure the New Message in Channel trigger
1. Connect your Discord Bot account
2. Select a Guild
3. Select one or more Channels
4. (Optional) Configure Emit messages as a single event
5. (Optional) Configure Ignore Bot Messages
6. Configure timer
Configure the Convert URL To LLM-Friendly Input action
1. Connect your Jina Reader account
2. (Optional) Configure URL
3. (Optional) Select a Content Format
4. (Optional) Configure Timeout
5. (Optional) Configure Target Selector
6. (Optional) Configure Wait For Selector
7. (Optional) Configure Excluded Selector
8. (Optional) Configure JSON Response
9. (Optional) Configure Forward Cookie
10. (Optional) Configure Proxy Server URL
11. (Optional) Configure Bypass Cache
12. (Optional) Configure Stream Mode
13. (Optional) Configure Browser Locale
14. (Optional) Configure Iframe
15. (Optional) Configure Include Shadow DOM Content
16. (Optional) Configure PDF File Path or URL
17. (Optional) Configure HTML File Path or URL
18. (Optional) Configure syncDir
Deploy the workflow
Send a test event to validate your setup
Turn on the trigger
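Each trigger event is a Discord message object, while the action's URL prop expects a single URL. One way to bridge the two is a small Node.js code step between them that pulls the first link out of the message text. The sketch below assumes the text lives in the message's `content` field (the field the trigger uses for its event summary); the helper name `extractFirstUrl` and its regular expression are hypothetical and not part of either component.

```javascript
// Hypothetical helper: find the first http(s) link in a Discord message's text.
// Users often wrap links in <...> to suppress embeds, so angle brackets are
// excluded from the match.
function extractFirstUrl(content) {
  if (typeof content !== "string") return null;
  const match = content.match(/https?:\/\/[^\s<>]+/i);
  if (!match) return null;
  // Drop trailing punctuation that usually belongs to the sentence, not the URL
  const candidate = match[0].replace(/[.,!?)\]]+$/, "");
  try {
    // Validate and normalize with the WHATWG URL parser (a Node.js global)
    return new URL(candidate).href;
  } catch {
    return null;
  }
}

// Example: a message that mentions a link in passing
const event = { content: "Summarize this please: <https://example.com/docs/page>." };
console.log(extractFirstUrl(event.content));
```

Inside a workflow, the returned value can then be mapped into the action's URL prop. Stripping a trailing `)` is a simplification that can truncate URLs which legitimately end in a parenthesis.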

Details

This integration uses pre-built, source-available components from Pipedream's GitHub repo. These components are developed by Pipedream and the community, and verified and maintained by Pipedream.

To contribute an update to an existing component or create a new component, create a PR on GitHub. If you're new to Pipedream component development, you can start with the quickstarts for trigger and action development, and then review the component API reference.

Trigger

New Message in Channel on Discord Bot

Description: Emit new event for each message posted to one or more channels

Version: 1.0.0

Key: discord_bot-new-message-in-channel

Discord Bot Overview

The Discord Bot API unlocks the power to interact with Discord users and channels programmatically, making it possible to automate messages, manage servers, and integrate with other services. With Pipedream's serverless platform, you can create complex workflows that respond to events in Discord, process data, and trigger actions in other apps. This opens up opportunities for community engagement, content moderation, analytics, and more, without the overhead of managing infrastructure.

Trigger Code

import { DEFAULT_POLLING_SOURCE_TIMER_INTERVAL } from "@pipedream/platform";
import maxBy from "lodash.maxby";
import common from "../common/common.mjs";
import sampleEmit from "./test-event.mjs";

const { discord } = common.props;

export default {
  ...common,
  key: "discord_bot-new-message-in-channel",
  name: "New Message in Channel",
  description: "Emit new event for each message posted to one or more channels",
  type: "source",
  version: "1.0.0",

  dedupe: "unique", // Dedupe events based on the Discord message ID
  props: {
    ...common.props,
    db: "$.service.db",
    channels: {
      type: "string[]",
      label: "Channels",
      description: "The channels you'd like to watch for new messages",
      propDefinition: [
        discord,
        "channelId",
        ({ guildId }) => ({
          guildId,
        }),
      ],
    },
    emitEventsInBatch: {
      type: "boolean",
      label: "Emit messages as a single event",
      description:
        "If `true`, all messages are emitted as an array, within a single Pipedream event. Defaults to `false`, emitting each Discord message as its own event in Pipedream",
      optional: true,
      default: false,
    },
    ignoreBotMessages: {
      type: "boolean",
      label: "Ignore Bot Messages",
      description: "Set to `true` to only emit messages NOT from the configured Discord bot",
      optional: true,
    },
    timer: {
      type: "$.interface.timer",
      default: {
        intervalSeconds: DEFAULT_POLLING_SOURCE_TIMER_INTERVAL,
      },
    },
  },
  hooks: {
    async deploy() {
      if (this.ignoreBotMessages) {
        const { id } = await this.getBotProfile();
        this._setBotId(id);
      }
    },
  },
  async run({ $ }) {
    // Cursor map: channel ID -> ID of the newest message already seen
    const lastMessageIDs = this._getLastMessageIDs() ?? {};
    const botId = this.ignoreBotMessages
      ? this._getBotId()
      : null;

    for (const channelId of this.channels) {
      // Resume from the stored cursor (undefined the first time a channel is polled)
      let lastMessageID = lastMessageIDs[channelId];
      let messages = [];

      if (!lastMessageID) {
        // First run for this channel: get the last N messages,
        // emit them, and checkpoint
        messages = await this.discord.getMessages({
          $,
          channelId,
          params: {
            limit: 100,
          },
        });

        // Snowflake IDs are numeric strings; compare them as BigInt
        lastMessageID = messages.length
          ? maxBy(messages, (message) => BigInt(message.id)).id
          : 1;
      } else {
        let newMessages = [];

        // Page forward until Discord returns no newer messages. The cursor
        // must advance after every page, or the same page is fetched forever.
        do {
          newMessages = await this.discord.getMessages({
            $,
            channelId,
            params: {
              after: lastMessageID,
            },
          });

          messages = messages.concat(newMessages);

          if (newMessages.length) {
            lastMessageID = maxBy(newMessages, (message) => BigInt(message.id)).id;
          }
        } while (newMessages.length);
      }

      // Set the new high water mark for the last message ID
      // for this channel
      lastMessageIDs[channelId] = lastMessageID;

      // Filter before the empty check so a page containing only
      // bot messages does not produce an empty event
      if (botId) {
        messages = messages.filter((message) => message.author.id !== botId);
      }

      if (!messages.length) {
        console.log(`No new messages in channel ${channelId}`);
        // Move on to the next channel; returning here would skip the
        // remaining channels and never save the updated cursors
        continue;
      }

      console.log(`${messages.length} new messages in channel ${channelId}`);

      // Batched emits do not take advantage of the built-in deduper
      if (this.emitEventsInBatch) {
        const suffixChar =
          messages.length > 1
            ? "s"
            : "";

        this.$emit(messages, {
          summary: `${messages.length} new message${suffixChar}`,
          id: lastMessageID,
        });

      } else {
        messages.forEach((message) => {
          this.$emit(message, {
            summary: message.content,
            id: message.id, // dedupes events based on this ID
          });
        });
      }
    }

    // Save the per-channel cursors for the next run
    this._setLastMessageIDs(lastMessageIDs);
  },
  sampleEmit,
};
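The polling logic above comes down to a per-channel cursor that has to move forward on every page. The sketch below isolates that loop from Pipedream and Discord so it can be run on its own. `fetchPage` is a hypothetical stand-in for `this.discord.getMessages` that returns messages with IDs greater than `after`, and IDs are compared as `BigInt` because Discord snowflake IDs are numeric strings: compared as strings, `"99"` sorts after `"100"`, and converted to `Number` they can exceed `Number.MAX_SAFE_INTEGER`.

```javascript
// Return the larger of two Discord snowflake IDs (numeric strings).
function maxSnowflake(a, b) {
  return BigInt(a) >= BigInt(b) ? a : b;
}

// Page forward from `cursor` until no newer messages are returned.
// `fetchPage(after)` is a hypothetical async function standing in for a
// "get channel messages" call with an `after` parameter.
async function fetchAllAfter(fetchPage, cursor) {
  const messages = [];
  let after = cursor;
  for (;;) {
    const page = await fetchPage(after);
    if (!page.length) break;
    messages.push(...page);
    // Advance to the newest ID on this page; reusing the original
    // cursor here would request the same page forever
    after = page.map((m) => m.id).reduce(maxSnowflake);
  }
  return { messages, cursor: after };
}

// Usage with an in-memory channel and a fake page size of 2
const channel = ["100", "101", "102", "103", "104"].map((id) => ({ id }));
const fakeFetchPage = async (after) =>
  channel.filter((m) => BigInt(m.id) > BigInt(after)).slice(0, 2);

fetchAllAfter(fakeFetchPage, "101").then(({ messages, cursor }) => {
  console.log(messages.map((m) => m.id), cursor);
});
```

Starting from cursor `101`, the loop fetches two pages (`102`, `103`, then `104`), stops on the empty third page, and returns `104` as the new high water mark.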

Trigger Configuration

This component may be configured based on the props defined in the component code. Pipedream automatically prompts for input values in the UI and CLI.

Label	Prop	Type	Description
Discord Bot	`discord`	`app`	This component uses the Discord Bot app.
Guild	`guildId`	`string`	Select a value from the drop-down menu.
N/A	`db`	`$.service.db`	This component uses `$.service.db` to maintain state between executions.
Channels	`channels`	`string[]`	Select one or more values from the drop-down menu.
Emit messages as a single event	`emitEventsInBatch`	`boolean`	If `true`, all messages are emitted as an array, within a single Pipedream event. Defaults to `false`, emitting each Discord message as its own event in Pipedream.
Ignore Bot Messages	`ignoreBotMessages`	`boolean`	Set to `true` to only emit messages NOT from the configured Discord bot.
N/A	`timer`	`$.interface.timer`	Controls how often the source polls Discord for new messages. Defaults to Pipedream's standard polling interval.

Trigger Authentication

Discord Bot uses API keys for authentication. When you connect your Discord Bot account, Pipedream securely stores the keys so you can easily authenticate to Discord Bot APIs in both code and no-code steps.

This app lets you call the Discord API through your own Discord bot. If you don't need a custom bot and just want to use the Discord API, use the Discord app instead.

If you want to use your own Discord bot, but haven't created one yet, see these instructions or watch this video. You'll need to add this bot to your server(s) to make successful API requests.

Once you've created your bot, you'll find the bot token in the Bot section of your app. Enter that token when you connect your Discord Bot account in Pipedream.
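Discord expects a bot token in the `Authorization` header with a `Bot` prefix. The sketch below builds (but does not send) a request to Discord's Get Channel Messages endpoint, which is presumably the kind of call behind the trigger's `this.discord.getMessages`. The helper name `buildGetMessagesRequest` and the choice of API version `v10` are assumptions for illustration.

```javascript
// Hypothetical helper: build a Discord "Get Channel Messages" request.
// Bot tokens are sent as "Authorization: Bot <token>".
function buildGetMessagesRequest(token, channelId, { after, limit } = {}) {
  const url = new URL(
    `https://discord.com/api/v10/channels/${encodeURIComponent(channelId)}/messages`,
  );
  if (after !== undefined) url.searchParams.set("after", String(after));
  if (limit !== undefined) {
    // Discord returns between 1 and 100 messages per request
    url.searchParams.set("limit", String(Math.min(Math.max(limit, 1), 100)));
  }
  return {
    method: "GET",
    url: url.toString(),
    headers: { Authorization: `Bot ${token}` },
  };
}

const req = buildGetMessagesRequest("YOUR_BOT_TOKEN", "123456789012345678", { limit: 100 });
console.log(req.url);
// To send it with Node.js 18+:
//   await fetch(req.url, { method: req.method, headers: req.headers });
```

The bot must already be a member of the server that owns the channel, or Discord rejects the request.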

About Discord Bot

Use this app to interact with the Discord API using a bot in your account

Action

Convert URL To LLM-Friendly Input on Jina Reader

Description: Converts a provided URL to an LLM-friendly input using Jina Reader. [See the documentation](https://github.com/jina-ai/reader)

Version: 1.0.1

Key: jina_reader-convert-to-llm-friendly-input

Action Code

import {
  ConfigurationError, getFileStream,
} from "@pipedream/platform";
import app from "../../jina_reader.app.mjs";

export default {
  key: "jina_reader-convert-to-llm-friendly-input",
  name: "Convert URL To LLM-Friendly Input",
  description: "Converts a provided URL to an LLM-friendly input using Jina Reader. [See the documentation](https://github.com/jina-ai/reader)",
  version: "1.0.1",
  type: "action",
  props: {
    app,
    url: {
      type: "string",
      label: "URL",
      description: "The URL to convert to an LLM-friendly input.",
      optional: true,
    },
    contentFormat: {
      type: "string",
      label: "Content Format",
      description: "You can control the level of detail in the response to prevent over-filtering. The default pipeline is optimized for most websites and LLM input.",
      optional: true,
      options: [
        "markdown",
        "html",
        "text",
        "screenshot",
        "pageshot",
      ],
    },
    timeout: {
      type: "integer",
      label: "Timeout",
      description: "Maximum time, in seconds, to wait for the webpage to load. Note that this is NOT the total time for the whole end-to-end request.",
      optional: true,
    },
    targetSelector: {
      type: "string",
      label: "Target Selector",
      description: "Provide a comma-separated list of CSS selectors to focus on more specific parts of the page. Useful when your desired content doesn't show under the default settings. E.g., `body, .class, #id`.",
      optional: true,
    },
    waitForSelector: {
      type: "string",
      label: "Wait For Selector",
      description: "Provide a comma-separated list of CSS selectors for elements that must appear before the response is returned. Useful when your desired content doesn't show under the default settings. E.g., `body, .class, #id`.",
      optional: true,
    },
    excludedSelector: {
      type: "string",
      label: "Excluded Selector",
      description: "Provide a comma-separated list of CSS selectors for elements to remove from the page. Useful when you want to exclude specific parts of the page like headers, footers, etc. E.g., `header, .class, #id`.",
      optional: true,
    },
    jsonResponse: {
      type: "boolean",
      label: "JSON Response",
      description: "The response will be in JSON format, containing the URL, title, content, and timestamp (if available). In Search mode, it returns a list of five entries, each following the described JSON structure. Keep in mind **JSON Response** takes priority over **Stream mode** if both are enabled.",
      optional: true,
    },
    forwardCookie: {
      type: "string",
      label: "Forward Cookie",
      description: "The API server can forward your custom cookie settings when accessing the URL, which is useful for pages requiring extra authentication. Note that requests with cookies will not be cached. E.g., `<cookie-name>=<cookie-value>, <cookie-name-1>=<cookie-value>; domain=<cookie-1-domain>`. [Learn more here](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie).",
      optional: true,
    },
    useProxyServer: {
      type: "string",
      label: "Proxy Server URL",
      description: "The API server can utilize your proxy to access URLs, which is helpful for pages accessible only through specific proxies. E.g., `http://your_proxy_server.com`. [Learn more here](https://en.wikipedia.org/wiki/Proxy_server).",
      optional: true,
    },
    bypassCache: {
      type: "boolean",
      label: "Bypass Cache",
      description: "The API server caches both Read and Search mode contents for a certain amount of time. Set to `true` to bypass this cache.",
      optional: true,
    },
    streamMode: {
      type: "boolean",
      label: "Stream Mode",
      description: "Stream mode is beneficial for large target pages, allowing more time for the page to fully render. If standard mode results in incomplete content, consider using **Stream mode**. [Learn more here](https://github.com/jina-ai/reader?tab=readme-ov-file#streaming-mode). Keep in mind **JSON Response** takes priority over **Stream mode** if both are enabled.",
      optional: true,
    },
    browserLocale: {
      type: "string",
      label: "Browser Locale",
      description: "Control the browser locale used to render the page, e.g., `en-US`. [Learn more here](https://developer.mozilla.org/en-US/docs/Web/API/Navigator/language).",
      optional: true,
    },
    iframeContent: {
      type: "boolean",
      label: "Iframe",
      description: "If `true`, the returned result also includes the content of iframes on the page.",
      optional: true,
    },
    shadowDomContent: {
      type: "boolean",
      label: "Include Shadow DOM Content",
      description: "If `true`, the returned result also includes the content of the shadow DOM on the page.",
      optional: true,
    },
    pdf: {
      type: "string",
      label: "PDF File Path or URL",
      description: "The path or URL to the PDF file.",
      optional: true,
    },
    html: {
      type: "string",
      label: "HTML File Path or URL",
      description: "The path or URL to the HTML file.",
      optional: true,
    },
    syncDir: {
      type: "dir",
      accessMode: "read",
      sync: true,
      optional: true,
    },
  },
  methods: {
    streamToBase64(stream) {
      return new Promise((resolve, reject) => {
        const chunks = [];
        stream.on("data", (chunk) => chunks.push(chunk));
        stream.on("end", () => {
          const buffer = Buffer.concat(chunks);
          resolve(buffer.toString("base64"));
        });
        stream.on("error", reject);
      });
    },
    streamToUtf8(stream) {
      return new Promise((resolve, reject) => {
        let data = "";
        stream.setEncoding("utf-8");
        stream.on("data", (chunk) => data += chunk);
        stream.on("end", () => resolve(data));
        stream.on("error", reject);
      });
    },
  },
  async run({ $ }) {
    const {
      app,
      url,
      contentFormat,
      timeout,
      targetSelector,
      waitForSelector,
      excludedSelector,
      jsonResponse,
      forwardCookie,
      useProxyServer,
      bypassCache,
      streamMode,
      browserLocale,
      iframeContent,
      shadowDomContent,
      pdf,
      html,
    } = this;

    if (!url && !pdf && !html) {
      throw new ConfigurationError("You must provide at least one of **URL**, **PDF File Path or URL**, or **HTML File Path or URL**.");
    }

    const data = {
      url,
    };

    if (pdf) {
      const stream = await getFileStream(pdf);
      data.pdf = await this.streamToBase64(stream);
    }

    if (html) {
      const stream = await getFileStream(html);
      data.html = await this.streamToUtf8(stream);
    }

    const response = await app.post({
      $,
      headers: {
        "X-Return-Format": contentFormat,
        "X-Timeout": timeout,
        "X-Target-Selector": targetSelector,
        "X-Wait-For-Selector": waitForSelector,
        "X-Remove-Selector": excludedSelector,
        "X-Set-Cookie": forwardCookie,
        "X-Proxy-Url": useProxyServer,
        "X-No-Cache": bypassCache,
        "Accept": jsonResponse
          ? "application/json"
          : streamMode
            ? "text/event-stream"
            : undefined,
        "X-Locale": browserLocale,
        "X-With-Shadow-Dom": shadowDomContent,
        "X-Iframe": iframeContent,
      },
      data,
    });

    $.export("$summary", "Converted URL to LLM-friendly input successfully.");
    return response;
  },
};
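The header block in `run` is where several option interactions are settled, most notably that JSON Response wins over Stream Mode because `jsonResponse` is checked first when building `Accept`. The sketch below reproduces that mapping as a standalone function so it can be inspected without Pipedream. The function name `buildReaderHeaders` is hypothetical, and omitting options that are unset or `false` is a choice made in this sketch, not something the component states about its HTTP client.

```javascript
// Standalone sketch of the option-to-header mapping used by the action.
// Options that are undefined, null, or false are left out entirely.
function buildReaderHeaders(opts = {}) {
  const accept = opts.jsonResponse
    ? "application/json" // JSON Response takes priority...
    : opts.streamMode
      ? "text/event-stream" // ...over Stream Mode
      : undefined;

  const candidates = {
    "X-Return-Format": opts.contentFormat,
    "X-Timeout": opts.timeout,
    "X-Target-Selector": opts.targetSelector,
    "X-Wait-For-Selector": opts.waitForSelector,
    "X-Remove-Selector": opts.excludedSelector,
    "X-Set-Cookie": opts.forwardCookie,
    "X-Proxy-Url": opts.useProxyServer,
    "X-No-Cache": opts.bypassCache,
    "Accept": accept,
    "X-Locale": opts.browserLocale,
    "X-With-Shadow-Dom": opts.shadowDomContent,
    "X-Iframe": opts.iframeContent,
  };

  const headers = {};
  for (const [name, value] of Object.entries(candidates)) {
    if (value === undefined || value === null || value === false) continue;
    headers[name] = String(value); // HTTP header values are sent as strings
  }
  return headers;
}

console.log(buildReaderHeaders({
  contentFormat: "markdown",
  jsonResponse: true,
  streamMode: true,
  timeout: 10,
}));
```

With both JSON Response and Stream Mode enabled, only `Accept: application/json` is sent, which is why the two prop descriptions warn about the precedence.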

Action Configuration

This component may be configured based on the props defined in the component code. Pipedream automatically prompts for input values in the UI.

Label	Prop	Type	Description
Jina Reader	`app`	`app`	This component uses the Jina Reader app.
URL	`url`	`string`	The URL to convert to an LLM-friendly input.
Content Format	`contentFormat`	`string`	Select a value from the drop-down menu: `markdown`, `html`, `text`, `screenshot`, `pageshot`.
Timeout	`timeout`	`integer`	Maximum time, in seconds, to wait for the webpage to load. Note that this is NOT the total time for the whole end-to-end request.
Target Selector	`targetSelector`	`string`	Provide a comma-separated list of CSS selectors to focus on more specific parts of the page. Useful when your desired content doesn't show under the default settings. E.g., `body, .class, #id`.
Wait For Selector	`waitForSelector`	`string`	Provide a comma-separated list of CSS selectors for elements that must appear before the response is returned. Useful when your desired content doesn't show under the default settings. E.g., `body, .class, #id`.
Excluded Selector	`excludedSelector`	`string`	Provide a comma-separated list of CSS selectors for elements to remove from the page. Useful when you want to exclude specific parts of the page like headers, footers, etc. E.g., `header, .class, #id`.
JSON Response	`jsonResponse`	`boolean`	The response will be in JSON format, containing the URL, title, content, and timestamp (if available). In Search mode, it returns a list of five entries, each following the described JSON structure. Keep in mind JSON Response takes priority over Stream mode if both are enabled.
Forward Cookie	`forwardCookie`	`string`	The API server can forward your custom cookie settings when accessing the URL, which is useful for pages requiring extra authentication. Note that requests with cookies will not be cached. E.g., `<cookie-name>=<cookie-value>, <cookie-name-1>=<cookie-value>; domain=<cookie-1-domain>`.
Proxy Server URL	`useProxyServer`	`string`	The API server can use your proxy to access URLs, which is helpful for pages accessible only through specific proxies. E.g., `http://your_proxy_server.com`.
Bypass Cache	`bypassCache`	`boolean`	The API server caches both Read and Search mode contents for a certain amount of time. Set to `true` to bypass this cache.
Stream Mode	`streamMode`	`boolean`	Stream mode is beneficial for large target pages, allowing more time for the page to fully render. If standard mode results in incomplete content, consider using Stream mode. Keep in mind JSON Response takes priority over Stream mode if both are enabled.
Browser Locale	`browserLocale`	`string`	Control the browser locale used to render the page, e.g., `en-US`.
Iframe	`iframeContent`	`boolean`	If `true`, the returned result also includes the content of iframes on the page.
Include Shadow DOM Content	`shadowDomContent`	`boolean`	If `true`, the returned result also includes the content of the shadow DOM on the page.
PDF File Path or URL	`pdf`	`string`	The path or URL to the PDF file.
HTML File Path or URL	`html`	`string`	The path or URL to the HTML file.
syncDir	`syncDir`	`dir`
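When PDF File Path or URL is set, the action reads the file as a stream and sends it base64-encoded in the request body; HTML is sent as UTF-8 text. The action's `streamToBase64` and `streamToUtf8` helpers use event listeners, and the same result can be sketched with async iteration, which Node.js readable streams support. The names `readAllToBuffer`, `toBase64`, `toUtf8`, and `fakeFileStream` are hypothetical; the reader accepts any async iterable of chunks so it can be exercised without a real file.

```javascript
// Collect every chunk of a readable stream (or any async iterable) into one Buffer.
// Node.js readable streams are async-iterable, so stream errors become rejections.
async function readAllToBuffer(stream) {
  const chunks = [];
  for await (const chunk of stream) {
    // Chunks may be Buffers or strings depending on the stream's encoding
    chunks.push(typeof chunk === "string" ? Buffer.from(chunk, "utf8") : chunk);
  }
  return Buffer.concat(chunks);
}

const toBase64 = async (stream) => (await readAllToBuffer(stream)).toString("base64");
const toUtf8 = async (stream) => (await readAllToBuffer(stream)).toString("utf8");

// Stand-in for a file stream: an async generator yielding two chunks
async function* fakeFileStream() {
  yield Buffer.from("%PDF-");
  yield Buffer.from("1.7");
}

toBase64(fakeFileStream()).then((b64) => console.log(b64));
```

Decoding only once, after all bytes are concatenated, keeps multi-byte UTF-8 characters intact even when they straddle a chunk boundary.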