Why am I getting an out-of-memory error when uploading 30-minute voice recordings in my workflow with Google Drive, AWS S3, OpenAI Whisper, and Notion?

Though I did learn that apparently ffmpeg will just cut off some of the end of an audio file if you try to compress it too much

At least the way I was doing it

As for using ffmpeg on Pipedream:

I’d like to make the workflow robust enough that it can handle files over 23 MB. And I think the only way to do that would be to slice any MP3 over that size into chunks, and send the chunks one by one to Whisper for transcription

got it, so you’d like to split the file into chunks, send to Whisper, and recombine the chunks into a single transcription?

Yep!

I’ve already taken care of the other bottlenecks. I have logic that will split the transcript into chunks of 2500 tokens max in order to deal with ChatGPT limits, and then I have another set of loops to deal with the Notion API limits for how many blocks you can send in a single request
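
For reference, the chunking logic looks roughly like this (just a sketch, using a crude ~4 characters per token heuristic; appendBlocks is a hypothetical stand-in for the Notion append call):

// Rough sketch: split a transcript into ~2500-token chunks and batch
// Notion blocks per request. Assumes ~4 characters per token as a
// crude estimate; a real tokenizer would be more accurate.
const MAX_TOKENS = 2500;
const CHARS_PER_TOKEN = 4;
const MAX_CHARS = MAX_TOKENS * CHARS_PER_TOKEN;

function splitTranscript(transcript) {
  const chunks = [];
  for (let i = 0; i < transcript.length; i += MAX_CHARS) {
    chunks.push(transcript.slice(i, i + MAX_CHARS));
  }
  return chunks;
}

// Notion caps how many blocks you can append in a single request
// (100 at the time of writing), so send blocks in batches.
// appendBlocks is a hypothetical helper wrapping the Notion API call.
const NOTION_BLOCK_LIMIT = 100;

async function appendInBatches(blocks, appendBlocks) {
  for (let i = 0; i < blocks.length; i += NOTION_BLOCK_LIMIT) {
    await appendBlocks(blocks.slice(i, i + NOTION_BLOCK_LIMIT));
  }
}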

I think we can build this logic into the action, let me see if I can get that working. If the file exceeds the max Whisper limit, we should just split the file within the action, send to Whisper, recombine, return the full transcript

Oh that would be awesome. Do you know how long that’d take? I’m aiming to make a tutorial for this workflow this week, so if I need to split via ffmpeg in the meantime, I can do that

Though I know y’all ship pretty fast

working on it now, so I’ll see if I can get you something tonight!

Awesome! I’ll be around if you need any additional feedback

In my naive research, I was only able to find the segment_time param for splitting audio files with ffmpeg, so it splits based on time rather than size. The size of each output file depends on the bitrate and other params, but if we split into small enough chunks, we should stay within the Whisper limits (e.g., at 128 kbps, a 60-second chunk is only about 1 MB).

To get you started, here’s code that should split a given .mp3 file into 60-second chunks and place them in the /tmp/chunks dir:

import { axios } from "@pipedream/platform";
import { createWriteStream } from "fs";
import { join } from "path";
import { promisify } from "util";
import { exec } from "child_process";
import ffmpegInstaller from "@ffmpeg-installer/ffmpeg";

const execAsync = promisify(exec);

export default defineComponent({
  props: {
    url: {
      type: "string",
      label: "URL of the MP3 file",
    },
  },
  async run({ $ }) {
    // @ffmpeg-installer/ffmpeg ships a static ffmpeg binary, since
    // ffmpeg isn't preinstalled in the execution environment
    const ffmpegPath = ffmpegInstaller.path;

    // Stream the download to /tmp instead of buffering it in memory.
    // The @pipedream/platform axios wrapper returns the response data
    // directly, so with responseType: "stream" this is a readable stream.
    const downloadPath = join("/tmp", "downloaded.mp3");
    const fileStream = createWriteStream(downloadPath);

    const response = await axios($, {
      method: "GET",
      url: this.url,
      responseType: "stream",
    });

    response.pipe(fileStream);

    // Wait until the file has been fully written to disk
    await new Promise((resolve, reject) => {
      fileStream.on("finish", resolve);
      fileStream.on("error", reject);
    });

    const outputDir = join("/tmp", "chunks");
    await execAsync(`mkdir -p "${outputDir}"`);

    // -f segment splits the input into 60-second pieces; -c copy
    // copies the audio stream without re-encoding, so it's fast
    const command = `${ffmpegPath} -i "${downloadPath}" -f segment -segment_time 60 -c copy "${outputDir}/chunk-%03d.mp3"`;
    await execAsync(command);

    return { message: "MP3 file has been split into 60-second chunks" };
  },
});

[ai.m.pipedream.net](http://ai.m.pipedream.net) generated this code for me:

curl -d '{"prompt": "I want to download an mp3 file via URL, and use ffmpeg to split a file into 60 second chunks. ffmpeg is not installed in the environment, so will need to be downloaded via npm and provided to the environment"}' https://ai.m.pipedream.net

modifying the action now, and making some other improvements that should reduce memory usage

is that ai.m domain a new feature?

I’m working to get this into the Pipedream UI, we’re just testing it via that endpoint for now to figure out the warts. Here’s a quick overview, open to any feedback!

Whoa that’s cool!

I didn’t get this working tonight and have to leave for the evening, would you mind testing and improving this PR? Main changes were to add the stream.PassThrough logic and the file chunking + sending the chunks to OpenAI. You can use a large file like this to test.
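
For context, the stream.PassThrough idea is roughly this (a sketch of the approach, not the actual PR code):

import { createWriteStream } from "fs";
import { PassThrough } from "stream";
import { pipeline } from "stream/promises";
import { axios } from "@pipedream/platform";

// Sketch: pipe the HTTP response through a PassThrough into its
// destination so the whole file never sits in memory at once. The
// PassThrough could just as easily be handed to an S3 or multipart
// upload while the download is still in flight.
async function streamDownload($, url, destPath) {
  const source = await axios($, {
    method: "GET",
    url,
    responseType: "stream",
  });

  const passThrough = new PassThrough();
  // pipeline handles backpressure and error propagation for us
  await pipeline(source, passThrough, createWriteStream(destPath));
  return destPath;
}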

@U04Q6GJ2B7D does the file splitting code above work for you for tonight? You should be able to iterate over the chunked files and send a request to the Whisper API for each file.
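
Something like this should work for the iteration (a sketch; it assumes your OpenAI API key is available and uses the form-data package to build the multipart request):

import { readdirSync, createReadStream } from "fs";
import { join } from "path";
import FormData from "form-data";
import { axios } from "@pipedream/platform";

// Sketch: send each chunk in /tmp/chunks to Whisper in order, then
// join the partial transcripts into one string
async function transcribeChunks($, apiKey) {
  const dir = "/tmp/chunks";
  const files = readdirSync(dir).sort(); // chunk-000.mp3, chunk-001.mp3, ...

  const transcripts = [];
  for (const file of files) {
    const form = new FormData();
    form.append("file", createReadStream(join(dir, file)));
    form.append("model", "whisper-1");

    const data = await axios($, {
      method: "POST",
      url: "https://api.openai.com/v1/audio/transcriptions",
      headers: {
        ...form.getHeaders(),
        Authorization: `Bearer ${apiKey}`,
      },
      data: form,
    });
    transcripts.push(data.text);
  }

  return transcripts.join(" ");
}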

will test first thing tomorrow!

I updated the code to split the file into equal parts of at most 24 MB each, call Whisper on each part asynchronously, and then join the output transcriptions. I tested with this audio file, which is 41.9 MB.
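
The sizing logic is roughly this (a sketch, not the exact code; it assumes a roughly constant bitrate and uses ffprobe from @ffprobe-installer/ffprobe to read the duration):

import { statSync } from "fs";
import { promisify } from "util";
import { exec } from "child_process";
import ffprobeInstaller from "@ffprobe-installer/ffprobe";

const execAsync = promisify(exec);

// Sketch: pick a segment_time that yields equal parts of at most
// ~24 MB each. The result is passed to ffmpeg's -segment_time flag.
async function segmentTimeFor(path, maxBytes = 24 * 1024 * 1024) {
  const { size } = statSync(path);
  const { stdout } = await execAsync(
    `${ffprobeInstaller.path} -v error -show_entries format=duration -of csv=p=0 "${path}"`
  );
  const durationSecs = parseFloat(stdout);
  const numParts = Math.ceil(size / maxBytes);
  return Math.ceil(durationSecs / numParts);
}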

I’ll test a little bit more just to find out how much memory the workflow needs