Transcribing Zoom recordings with WhisperX on AWS Fargate

I recently wrote about AWS Copilot, a CLI tool that makes it easy to deploy Dockerized applications on AWS.

I used AWS Copilot to run CPU-only WhisperX on AWS Fargate: https://github.com/DigitalHarborFoundation/whisperx-on-aws-fargate

See the repository for the details, but the basic point is that it's straightforward to Dockerize complex ML models and deploy them as ETL jobs using Copilot. I used Max Bain's WhisperX library to do automatic speech recognition and speaker diarization, with the intended use case of batch-transcribing Zoom recordings.

If you’re using AWS already and interested in transcribing batches of audio or video files, take a look.

If you're looking for an alternative to using WhisperX directly, Mahmoud Ashraf's whisper-diarization is worth a look.