What is the coolest use of Pipedream+GPT for transcribing, summarizing, and organizing audio recordings?

This topic was automatically generated from Slack. You can find the original thread here.

I have to say, I think I found one of the coolest uses of Pipedream+GPT (if not the coolest), and it affects my life immediately.

  1. Take audio recordings of meetings, personal monologues, conversations, whatever
  2. Pipe the audio to OpenAI Whisper to get a transcription
  3. Pipe the transcript to ChatGPT to get summaries, advice, devil’s advocate/opposition positions, categorization, whatever
  4. Store the notes directly in OneNote, Notion, Obsidian, etc., with a link back to the original audio recording
    Note: I didn’t come up with this solution. I’m not sure if links are allowed, but it’s adapted from How to Take Perfect Notes with Your Voice Using ChatGPT and Notion
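The steps above could be sketched in plain Python. This is a minimal sketch with hypothetical helper names (`build_chat_messages`, `build_note_markdown` are mine, not from the thread); in the real workflow, steps 2 and 3 are calls to OpenAI’s Whisper and chat endpoints inside Pipedream code steps.

```python
# Hypothetical helpers for steps 3 and 4 of the workflow above.
# The transcript is assumed to come from a prior Whisper step.

def build_chat_messages(transcript: str) -> list[dict]:
    """Build the chat request for step 3: summary, advice, opposition takes."""
    system = (
        "You are a note-taking assistant. Given a transcript, return a "
        "summary, action items, and a devil's-advocate counterposition."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": transcript},
    ]

def build_note_markdown(title: str, summary: str, audio_url: str) -> str:
    """Build the note body for step 4, with a link back to the audio."""
    return (
        f"# {title}\n\n"
        f"{summary}\n\n"
        f"[Original audio recording]({audio_url})\n"
    )

note = build_note_markdown(
    "Weekly sync", "- Discussed roadmap", "https://drive.example/rec.mp3"
)
```

The note string can then go to OneNote, Notion, or a markdown file, depending on which pre-built Pipedream action you wire up.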

I have a Node-heavy workflow that uses OneDrive+Obsidian, if anyone is interested in that instead of the Google Drive+Notion example. This is absolutely life-changing.


That’s awesome! We were just about to send Thomas’s blog out to our community, it’s so good. Your riff on it is really cool.

Yeah, I need a repository of Pipedream+GPT workflows to explore; I’m sure there’s so much cool stuff out there being done! (wink wink)


Completely hear you. Would love to publish all sorts of templates people can pull/fork directly from our main repo, in a workflows dir.

I’m watching the video now. It’s absolute fire! And the breakdown of Pipedream along the way is gold. I’ve been brainstorming a few OpenAI workflows and this is serious inspiration. Thanks for sharing Morgan!

this is very, very good

Thanks for the video. I’m watching it right now and reading around the Whisper docs/forums. It seems Whisper can’t separate two voices. More precisely, it’s so capable of handling two voices that it doesn’t distinguish them in the transcript. Having separate voices in the transcript would be very useful (interviews, podcasts, etc.).

People in the Whisper forums suggest splitting the main audio file into separate voices and then sending them sequentially to Whisper, using GitHub - pyannote/pyannote-audio: Neural building blocks for speaker diarization (speech activity detection, speaker change detection, overlapped speech detection, speaker embedding).

Do you have any experience with that? Can we just import the package on Pipedream, or is it more complicated?

OpenAI’s own Whisper can’t do this, but Deepgram’s models can! They actually claim their new Nova model is even more accurate than Whisper, and they also offer their own hosted version of Whisper that costs less than OpenAI’s. (Nova costs even less)

The feature you want is diarization, which is a flag you can set in an API call: Diarization - Documentation, Use Cases, Posts, and Tutorials - Deepgram Docs

I’m not 100% sure if their Whisper model can do it, but I know their Nova model can. I was talking with them a bunch before I released this tutorial, as the AI community in general has been struggling to get Whisper to do diarization and accurate timestamps for captions.
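For anyone wiring this up: the `diarize` parameter is the key piece per Deepgram’s docs, and the response tags each word with a speaker index. The exact parameter dict and response field names below are an assumption based on Deepgram’s documented response shape; verify against the docs linked above for your API version.

```python
# Query parameters for a Deepgram pre-recorded transcription request
# (assumed shape; check Deepgram's docs for your API version).
DEEPGRAM_PARAMS = {"model": "nova", "diarize": "true", "punctuate": "true"}

def label_speakers(words: list[dict]) -> str:
    """Fold a diarized word list into a speaker-labeled transcript.

    Each word dict is assumed to carry "word" and "speaker" keys, as in
    Deepgram's diarized response.
    """
    lines: list[str] = []
    current_speaker = None
    for w in words:
        if w["speaker"] != current_speaker:
            # Speaker changed: start a new labeled line.
            current_speaker = w["speaker"]
            lines.append(f"Speaker {current_speaker}: {w['word']}")
        else:
            # Same speaker: append to the current line.
            lines[-1] += f" {w['word']}"
    return "\n".join(lines)

# Illustrative diarized output for a two-voice clip:
words = [
    {"word": "Hello", "speaker": 0},
    {"word": "there.", "speaker": 0},
    {"word": "Hi!", "speaker": 1},
]
print(label_speakers(words))
# Speaker 0: Hello there.
# Speaker 1: Hi!
```

This is exactly the interview/podcast formatting asked about earlier in the thread: one labeled line per speaker turn.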

Deepgram is actually how I found out about Pipedream. I ran across this blog post: Automatically Transcribing Podcast Episodes with Pipedream and Python - Deepgram Blog ⚡️

This is what got me to sign up. I made an adaptation of it that uses Obsidian instead of Notion. I love it.

Nice! Did you just save everything to a .md file and send it back to Drive so your Vault picks it up, or can you actually send stuff directly to Obsidian now?

Nah, I just point my Vault to Google Drive, so I upload the .md back to a directory.
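That pattern is simple enough to sketch: write the note into a locally synced Drive folder that the Obsidian Vault points at, and Obsidian picks it up on the next sync. The directory path and function name here are illustrative, not from the thread.

```python
from pathlib import Path

def save_note(vault_dir: str, title: str, body: str) -> Path:
    """Write a markdown note where the Drive-synced Vault will pick it up.

    vault_dir is assumed to be a local folder synced to Google Drive and
    registered as (or inside) the Obsidian Vault.
    """
    path = Path(vault_dir) / f"{title}.md"
    path.write_text(body, encoding="utf-8")
    return path
```

No Obsidian API is involved; the app simply watches the folder, which is why pointing the Vault at the synced directory is all it takes.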

Gotcha, that’s what I figured you were doing!

I watched this video and love it.

I tried it, but I need some code written for my purposes. Can anyone help write a little code so I can make my Pipedream + ChatGPT workflow work?

Love this use case. I followed Thomas Frank’s video exactly and it was enlightening, so much so that my coworkers now want to set up this exact workflow. The only challenge I’ve run into is that my meeting recordings are sometimes over an hour long, and I can’t get the full transcription done because my OpenAI step fails from exceeding the token limit. The only way I could get it to work was to use Audacity and really trim down the recording first. Not sure if there is another way to handle this, but thought I would ask.
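One common workaround (my suggestion, not something confirmed in this thread) is to assume the failure is the chat model’s context limit and chunk the transcript: summarize each chunk separately, then summarize the summaries. This sketch budgets by word count as a rough proxy for tokens; a tokenizer such as tiktoken would be more precise.

```python
def chunk_transcript(text: str, max_words: int = 2000) -> list[str]:
    """Split a long transcript into chunks of at most max_words words.

    Word count is a rough stand-in for token count (tokens per word vary,
    so leave headroom below the model's actual context limit).
    """
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]

# Each chunk would then go through the ChatGPT step on its own, and a final
# call would merge the per-chunk summaries into one set of meeting notes.
```

This avoids trimming the audio in Audacity: the whole recording is still transcribed, and only the summarization is split up.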

Nice work! I attempted a Pipedream transcription workflow by YouTuber Thomas Frank:

But I wanted to use Deepgram and Obsidian instead. Unfortunately, I can’t get it to work. The two blog posts from Deepgram are too code-heavy for me. But Pipedream to the rescue, because we can now share our workflows:

I’d be super grateful if you’d consider sharing your working workflow! :slight_smile:

Sweeeeeet!