Can OpenAI Whisper distinguish between different voices during transcription?

This topic was automatically generated from Slack. You can find the original thread here.

Is there a way for OpenAI Whisper to transcribe and delineate between different voices?

Hi, I think your question is a great case for the Pipedream bot to help with. I’ll pass your question to the Pipedream bot now. Please feel free to ask it any questions.

Is there a way for OpenAI Whisper to transcribe and delineate between different voices?

The OpenAI Whisper model is designed to transcribe audio to text, but it does not have a built-in feature to delineate between different voices. It will transcribe the audio content without distinguishing between speakers. If you need to separate speakers in a conversation, you will need additional tools or services designed for speaker diarization (identifying who spoke when).
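As a quick illustration of this limitation, here is a minimal sketch using the open-source `openai-whisper` Python package; the model size and the file name `meeting.mp3` are placeholders. The output contains text and timestamps, but no speaker labels.

```python
# Minimal transcription sketch with the open-source `openai-whisper` package
# (pip install openai-whisper). "meeting.mp3" is a placeholder file name.
import whisper

model = whisper.load_model("base")        # other sizes: tiny, small, medium, large
result = model.transcribe("meeting.mp3")

# The result is plain text plus timestamped segments -- no speaker labels.
print(result["text"])
for segment in result["segments"]:
    print(f'[{segment["start"]:.1f}s - {segment["end"]:.1f}s] {segment["text"]}')
```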

https://github.com/yinruiqing/pyannote-whisper

Is there a way to incorporate this into a transcription model?

Incorporating pyannote-whisper into a transcription workflow requires a custom solution, as Pipedream does not have a built-in integration for this library. You can create a custom Pipedream component that uses pyannote-whisper for speaker diarization and combines its output with the transcription results from a service like OpenAI Whisper.

To achieve this, you would typically set up a custom environment, such as a server or a cloud function, to run pyannote-whisper and process the audio data. You can then create a Pipedream component that sends the audio to that environment, receives the speaker diarization results, and merges them with the transcription. Keep in mind that this requires custom development and is not directly supported by Pipedream’s built-in components.
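For reference, here is a rough sketch of what that combination step could look like in the custom environment, using `pyannote.audio` for diarization and `openai-whisper` for transcription. It assumes pyannote.audio 3.x, a Hugging Face access token (the `HF_TOKEN` string is a placeholder), and a placeholder file name `meeting.wav`; each Whisper segment is assigned to the speaker whose diarization turn overlaps it the most.

```python
# Sketch: combine pyannote.audio speaker diarization with Whisper transcription.
# Assumes pyannote.audio 3.x and a Hugging Face token; "HF_TOKEN" and
# "meeting.wav" are placeholders, not real values.
import whisper
from pyannote.audio import Pipeline

AUDIO = "meeting.wav"

# 1. Transcribe with Whisper (timestamped segments, no speakers).
model = whisper.load_model("base")
transcription = model.transcribe(AUDIO)

# 2. Run speaker diarization (speaker turns, no text).
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1", use_auth_token="HF_TOKEN"
)
diarization = pipeline(AUDIO)
turns = [
    (turn.start, turn.end, speaker)
    for turn, _, speaker in diarization.itertracks(yield_label=True)
]

def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap between two time intervals, in seconds."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

# 3. Label each Whisper segment with the best-overlapping speaker.
for seg in transcription["segments"]:
    best = max(
        turns,
        key=lambda t: overlap(seg["start"], seg["end"], t[0], t[1]),
        default=(None, None, "UNKNOWN"),
    )
    print(f'{best[2]}: {seg["text"].strip()}')
```

A Pipedream workflow could then upload the audio file to this environment (for example via an HTTP endpoint in front of the script) and receive the speaker-labeled transcript back as JSON.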