I’m watching the video now. It’s absolute fire! And the breakdown of Pipedream along the way is gold. I’ve been brainstorming a few OpenAI workflows and this is serious inspiration. Thanks for sharing Morgan!
Thanks for the video. I’m watching right now and reading around the Whisper docs/forums. It seems it can’t separate 2 voices. Or rather, it’s so good at handling 2 voices that it doesn’t distinguish them in the transcript. Having the voices labeled separately in the transcript would be very useful (interviews, podcasts etc).
OpenAI’s own Whisper can’t do this, but Deepgram’s models can! They actually claim their new Nova model is even more accurate than Whisper, and they also offer their own hosted version of Whisper that costs less than OpenAI’s. (Nova costs even less)
I’m not 100% sure whether their hosted Whisper model can do it, but I know their Nova model can. I was talking with them a bunch before I released this tutorial, as the AI community in general has been struggling to get Whisper to do diarization and accurate timestamps for captions.
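For anyone wanting to try this: when diarization is on, these APIs typically return word-level results tagged with a speaker index, and you stitch those into a speaker-labeled transcript yourself. Here’s a minimal sketch of that stitching step. The input shape loosely mirrors Deepgram’s diarized word list, but treat the field names as assumptions and check their docs for the exact response format.

```python
# Sketch: turn diarized word-level output into a speaker-labeled transcript.
# Each word dict carries a "speaker" index and the "word" text -- field names
# are assumed from Deepgram-style diarized responses, not guaranteed.

def words_to_transcript(words):
    """Group consecutive same-speaker words into labeled lines."""
    lines = []
    current_speaker = None
    current_words = []
    for w in words:
        if w["speaker"] != current_speaker:
            # Speaker changed: flush the utterance we were building.
            if current_words:
                lines.append(f"Speaker {current_speaker}: {' '.join(current_words)}")
            current_speaker = w["speaker"]
            current_words = []
        current_words.append(w["word"])
    if current_words:
        lines.append(f"Speaker {current_speaker}: {' '.join(current_words)}")
    return "\n".join(lines)

# Hypothetical two-voice snippet, e.g. an interview:
sample = [
    {"word": "So", "speaker": 0},
    {"word": "what", "speaker": 0},
    {"word": "happened?", "speaker": 0},
    {"word": "Well,", "speaker": 1},
    {"word": "a", "speaker": 1},
    {"word": "lot.", "speaker": 1},
]
print(words_to_transcript(sample))
```

This prints one line per speaker turn, which is exactly the interview/podcast format mentioned above.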