Why am I getting an out-of-memory error when uploading 30-minute voice recordings in my workflow with Google Drive, AWS S3, OpenAI Whisper, and Notion?

Though I did learn that apparently ffmpeg will just cut off some of the end of an audio file if you try to compress it too much

At least the way I was doing it

As for using ffmpeg on Pipedream:

I’d like to make the workflow robust enough that it can handle files over 23 MB. And I think the only way to do that would be to slice any MP3 over that size into chunks, and send the chunks one by one to Whisper for transcription

got it, so you’d like to split the file into chunks, send to Whisper, and recombine the chunks into a single transcription?

Yep!

I’ve already taken care of the other bottlenecks. I have logic that will split the transcript into chunks of 2500 tokens max in order to deal with ChatGPT limits, and then I have another set of loops to deal with the Notion API limits for how many blocks you can send in a single request
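
For reference, the chunking logic looks roughly like this (just a sketch, using a crude ~4 characters per token heuristic; appendBlocks is a hypothetical stand-in for the Notion append call):

// Rough sketch: split a transcript into ~2500-token chunks and batch
// Notion blocks per request. Assumes ~4 characters per token as a
// crude estimate; a real tokenizer would be more accurate.
const MAX_TOKENS = 2500;
const CHARS_PER_TOKEN = 4;
const MAX_CHARS = MAX_TOKENS * CHARS_PER_TOKEN;

function splitTranscript(transcript) {
  const chunks = [];
  for (let i = 0; i < transcript.length; i += MAX_CHARS) {
    chunks.push(transcript.slice(i, i + MAX_CHARS));
  }
  return chunks;
}

// Notion caps how many blocks you can append in a single request
// (100 at the time of writing), so send blocks in batches.
// appendBlocks is a hypothetical helper wrapping the Notion API call.
const NOTION_BLOCK_LIMIT = 100;

async function appendInBatches(blocks, appendBlocks) {
  for (let i = 0; i < blocks.length; i += NOTION_BLOCK_LIMIT) {
    await appendBlocks(blocks.slice(i, i + NOTION_BLOCK_LIMIT));
  }
}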

I think we can build this logic into the action, let me see if I can get that working. If the file exceeds the max Whisper limit, we should just split the file within the action, send to Whisper, recombine, return the full transcript

Oh that would be awesome. Do you know how long that’d take? I’m aiming to make a tutorial for this workflow this week, so if I need to split via ffmpeg in the meantime, I can do that

Though I know y’all ship pretty fast

working on it now, so I’ll see if I can get you something tonight!

Awesome! I’ll be around if you need any additional feedback

In my naive research, I was only able to find the segment_time param for splitting audio files with ffmpeg, so it splits based on time rather than size. The size of each output file depends on the bitrate and other params, but if we split into small enough chunks, we should stay within the Whisper limits (e.g., at 128 kbps, a 60-second chunk is only about 1 MB).

To get you started, here’s code that should split a given .mp3 file into 60-second chunks and place them in the /tmp/chunks dir:

import { axios } from "@pipedream/platform";
import { createWriteStream } from "fs";
import { join } from "path";
import { promisify } from "util";
import { exec } from "child_process";
import ffmpegInstaller from "@ffmpeg-installer/ffmpeg";

const execAsync = promisify(exec);

export default defineComponent({
  props: {
    url: {
      type: "string",
      label: "URL of the MP3 file",
    },
  },
  async run({ $ }) {
    // @ffmpeg-installer/ffmpeg ships a static ffmpeg binary, since
    // ffmpeg isn't preinstalled in the execution environment
    const ffmpegPath = ffmpegInstaller.path;

    // Stream the download to /tmp instead of buffering it in memory.
    // The @pipedream/platform axios wrapper returns the response data
    // directly, so with responseType: "stream" this is a readable stream.
    const downloadPath = join("/tmp", "downloaded.mp3");
    const fileStream = createWriteStream(downloadPath);

    const response = await axios($, {
      method: "GET",
      url: this.url,
      responseType: "stream",
    });

    response.pipe(fileStream);

    // Wait until the file has been fully written to disk
    await new Promise((resolve, reject) => {
      fileStream.on("finish", resolve);
      fileStream.on("error", reject);
    });

    const outputDir = join("/tmp", "chunks");
    await execAsync(`mkdir -p "${outputDir}"`);

    // -f segment splits the input into 60-second pieces; -c copy
    // copies the audio stream without re-encoding, so it's fast
    const command = `${ffmpegPath} -i "${downloadPath}" -f segment -segment_time 60 -c copy "${outputDir}/chunk-%03d.mp3"`;
    await execAsync(command);

    return { message: "MP3 file has been split into 60-second chunks" };
  },
});

[ai.m.pipedream.net](http://ai.m.pipedream.net) generated this code for me:

curl -d '{"prompt": "I want to download an mp3 file via URL, and use ffmpeg to split a file into 60 second chunks. ffmpeg is not installed in the environment, so will need to be downloaded via npm and provided to the environment"}' https://ai.m.pipedream.net

modifying the action now, and making some other improvements that should reduce memory usage

is that ai.m domain a new feature?

I’m working to get this into the Pipedream UI, we’re just testing it via that endpoint for now to figure out the warts. Here’s a quick overview, open to any feedback!

Whoa that’s cool!

I didn’t get this working tonight and have to leave for the evening, would you mind testing and improving this PR? Main changes were to add the stream.PassThrough logic and the file chunking + sending the chunks to OpenAI. You can use a large file like this to test.
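
For context, the stream.PassThrough idea is roughly this (a sketch of the approach, not the actual PR code):

import { createWriteStream } from "fs";
import { PassThrough } from "stream";
import { pipeline } from "stream/promises";
import { axios } from "@pipedream/platform";

// Sketch: pipe the HTTP response through a PassThrough into its
// destination so the whole file never sits in memory at once. The
// PassThrough could just as easily be handed to an S3 or multipart
// upload while the download is still in flight.
async function streamDownload($, url, destPath) {
  const source = await axios($, {
    method: "GET",
    url,
    responseType: "stream",
  });

  const passThrough = new PassThrough();
  // pipeline handles backpressure and error propagation for us
  await pipeline(source, passThrough, createWriteStream(destPath));
  return destPath;
}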

@U04Q6GJ2B7D does the file splitting code above work for you for tonight? You should be able to iterate over the chunked files and send a request to the Whisper API for each file.
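
Something like this should work for the iteration (a sketch; it assumes your OpenAI API key is available and uses the form-data package to build the multipart request):

import { readdirSync, createReadStream } from "fs";
import { join } from "path";
import FormData from "form-data";
import { axios } from "@pipedream/platform";

// Sketch: send each chunk in /tmp/chunks to Whisper in order, then
// join the partial transcripts into one string
async function transcribeChunks($, apiKey) {
  const dir = "/tmp/chunks";
  const files = readdirSync(dir).sort(); // chunk-000.mp3, chunk-001.mp3, ...

  const transcripts = [];
  for (const file of files) {
    const form = new FormData();
    form.append("file", createReadStream(join(dir, file)));
    form.append("model", "whisper-1");

    const data = await axios($, {
      method: "POST",
      url: "https://api.openai.com/v1/audio/transcriptions",
      headers: {
        ...form.getHeaders(),
        Authorization: `Bearer ${apiKey}`,
      },
      data: form,
    });
    transcripts.push(data.text);
  }

  return transcripts.join(" ");
}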

will test first thing tomorrow!

I updated the code to split the file into equal parts of at most 24 MB each, call Whisper on each part asynchronously, and then join the output transcriptions. I tested with this audio file, which is 41.9 MB.
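
The sizing logic is roughly this (a sketch, not the exact code; it assumes a roughly constant bitrate and uses ffprobe from @ffprobe-installer/ffprobe to read the duration):

import { statSync } from "fs";
import { promisify } from "util";
import { exec } from "child_process";
import ffprobeInstaller from "@ffprobe-installer/ffprobe";

const execAsync = promisify(exec);

// Sketch: pick a segment_time that yields equal parts of at most
// ~24 MB each. The result is passed to ffmpeg's -segment_time flag.
async function segmentTimeFor(path, maxBytes = 24 * 1024 * 1024) {
  const { size } = statSync(path);
  const { stdout } = await execAsync(
    `${ffprobeInstaller.path} -v error -show_entries format=duration -of csv=p=0 "${path}"`
  );
  const durationSecs = parseFloat(stdout);
  const numParts = Math.ceil(size / maxBytes);
  return Math.ceil(durationSecs / numParts);
}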

I’ll test a little bit more just to find out how much memory the workflow needs