Walter van Heuven
Speech2Text provides a simple graphical user interface for different automatic speech recognition (ASR) systems and services (OpenAI's whisper, whisper.cpp, faster-whisper, whisper ASR webservice, and the whisper API). Speech2Text transcribes the speech from audio and video files (e.g., .mp3, .m4a, .wav, .mp4, .mov). The output is a text file or a subtitle file (.vtt or .srt). When you select OpenAI's whisper, whisper.cpp, or faster-whisper, the ASR runs locally on your computer.
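The two subtitle formats differ mainly in their header, cue numbering, and the decimal marker in timestamps. The sketch below shows how a list of timed segments could be written as WebVTT or SRT text. It is only an illustration of the file formats, not the app's own code; the function names and the `(start, end, text)` segment layout are assumptions made for this example.

```python
def format_timestamp(seconds, decimal_marker="."):
    """Format a time in seconds as HH:MM:SS<marker>mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{ms:03d}"


def to_vtt(segments):
    """Render (start, end, text) tuples as WebVTT: a 'WEBVTT' header, '.' in timestamps."""
    lines = ["WEBVTT", ""]
    for start, end, text in segments:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text.strip())
        lines.append("")  # blank line ends the cue
    return "\n".join(lines)


def to_srt(segments):
    """Render (start, end, text) tuples as SRT: numbered cues, ',' in timestamps."""
    lines = []
    for index, (start, end, text) in enumerate(segments, start=1):
        lines.append(str(index))
        lines.append(
            f"{format_timestamp(start, ',')} --> {format_timestamp(end, ',')}"
        )
        lines.append(text.strip())
        lines.append("")  # blank line ends the cue
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical segments, only to show the two layouts side by side.
    demo = [(0.0, 2.5, "Hello and welcome."), (2.5, 6.0, "Let's get started.")]
    print(to_vtt(demo))
    print(to_srt(demo))
```

Both formats can be opened in a plain text editor, so you can compare the output of the app with these layouts directly.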
The app will convert .mp4, .mov, .avi, and .m4a files to (16 kHz, 16-bit) .wav files if FFmpeg is installed on your computer. On macOS you can install FFmpeg with brew (brew install ffmpeg), and on Windows with Scoop (scoop install ffmpeg) or Chocolatey (choco install ffmpeg). If you use the whisper ASR webservice or the whisper API, there is no need to install FFmpeg because the file conversion takes place on the server.
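If you want to do the same conversion by hand, the sketch below builds an FFmpeg command that produces a 16 kHz, 16-bit PCM .wav file. The exact options the app passes to FFmpeg are not documented here, so treat this as an assumption: in particular, the mono downmix (`-ac 1`) is a guess based on what whisper-style models commonly expect, and the function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst):
    """Return the FFmpeg argument list for a 16 kHz, 16-bit PCM WAV conversion."""
    return [
        "ffmpeg",
        "-y",                 # overwrite the output file if it already exists
        "-i", src,            # input audio or video file
        "-ar", "16000",       # resample to 16 kHz
        "-ac", "1",           # assumption: downmix to mono
        "-c:a", "pcm_s16le",  # 16-bit signed little-endian PCM
        dst,
    ]


def convert_to_wav(src):
    """Convert src to a .wav file next to it; requires FFmpeg on the PATH."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("FFmpeg was not found on the PATH")
    dst = Path(src).with_suffix(".wav")
    subprocess.run(build_ffmpeg_command(src, str(dst)), check=True)
    return dst


if __name__ == "__main__":
    # "lecture.mp4" is a placeholder file name; print the command without running it.
    print(" ".join(build_ffmpeg_command("lecture.mp4", "lecture.wav")))
```

Calling `convert_to_wav("lecture.mp4")` would write `lecture.wav` in the same folder, provided FFmpeg is installed.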
By default, the application uses 'whisper.cpp' with the 'base' model to transcribe speech. This runs completely locally on your computer. To improve the accuracy of the transcription, select the medium or large/large-v2 model in the Settings. The medium and large models are, in particular, more accurate at transcribing foreign names, specialist terminology, poor or noisy recordings, and accented speech. However, these models require a powerful computer because the transcription process is processor intensive and requires a significant amount of memory.
The macOS versions have been tested with macOS Ventura. If you encounter issues with running the Apple silicon version on macOS Monterey, try the Intel version.
On macOS, open the DMG file and drag Speech2Text to the Applications folder. On Windows, double-click "Speech2Text_Setup.zip" to open the zip file that contains the installer. Next, double-click "Speech2Text Setup" to install Speech2Text.
The app is not code signed. Therefore, you will see a security message when you start the application for the first time or run the installer. Below are instructions to open the app in macOS and to run the installer in Windows.
The first time you start the application on macOS (double click on the application icon in Finder), you will see the dialog box below:
Please note that starting the application the first time might take a while.
To get past the security warning, right-click the application icon in Finder and select "Open" in the popup menu. Next, you will see this message:
Select "Open"; it will then take a few seconds before the app starts. The security warning will no longer appear for this app, and the application should start up quickly.
On Windows, the following message will appear when you start the installer.
Click on "More info", and the following message will appear.
Click on "Run anyway" to start the installer.
Table 1 gives an indication of how long it takes to generate captions (VTT file) for a 19-minute recording.
|Model|Dell Latitude (a)|iMac (M1) (b)|MacBook Pro (M1 Max) (c)|ASR webservice (d)|
(a) Dell Latitude 5520 (Windows 11, 16 GB, Intel i7 11th gen). (b) 24" iMac M1 (macOS Ventura, 16 GB), whisper.cpp using GPU (Metal). (c) 14" MacBook Pro M1 Max (macOS Ventura, 32 GB), whisper.cpp using GPU (Metal). (d) ASR webservice running on a ThinkStation P360 (Windows 11, 32 GB, Intel i9 12th gen, RTX A4000) using CUDA.
On Windows PCs with an 11th generation or older Intel CPU, or on Macs with an Intel CPU, it is best to use the whisper ASR webservice running on a remote computer.
The source code will be made available on GitHub.
If you have any questions, contact Walter van Heuven