Walter van Heuven

 

Speech2Text


Speech2Text provides a simple, easy-to-use graphical user interface for several automatic speech recognition (ASR) systems and services based on OpenAI's Whisper (mlx-whisper, whisper.cpp, faster-whisper, whisper ASR webservice, and the whisper API). Speech2Text transcribes the speech in audio and video files (e.g., .mp3, .m4a, .wav, .mp4, .mov). The output is a text file or a subtitle file (.vtt or .srt). When you select OpenAI's Whisper, mlx-whisper, whisper.cpp, or faster-whisper, the ASR runs locally on your computer.
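To illustrate how the two subtitle formats differ, here is a minimal Python sketch that writes the same cues as WebVTT and as SRT. It is not taken from Speech2Text; the function names and the (start, end, text) cue layout are assumptions made for this example. WebVTT files start with a "WEBVTT" header and use a period before the milliseconds, whereas SRT files number each cue and use a comma.

```python
def format_timestamp(seconds: float, fmt: str = "vtt") -> str:
    """Format seconds as HH:MM:SS.mmm (WebVTT) or HH:MM:SS,mmm (SRT)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    separator = "." if fmt == "vtt" else ","
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{ms:03d}"


def to_vtt(cues):
    """Render (start, end, text) cues as a WebVTT document."""
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append(f"{format_timestamp(start, 'vtt')} --> {format_timestamp(end, 'vtt')}")
        lines.append(text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


def to_srt(cues):
    """Render (start, end, text) cues as an SRT document with numbered cues."""
    lines = []
    for index, (start, end, text) in enumerate(cues, start=1):
        lines.append(str(index))
        lines.append(f"{format_timestamp(start, 'srt')} --> {format_timestamp(end, 'srt')}")
        lines.append(text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


# Hypothetical cues, used only to show the two layouts side by side.
example_cues = [(0.0, 2.5, "Hello"), (2.5, 5.0, "and welcome")]
print(to_vtt(example_cues))
print(to_srt(example_cues))
```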

The app works with wav files directly. When FFmpeg is installed on your computer, it can also convert mp3, mp4, mov, avi, and other audio/video files to 16 kHz, 16-bit wav files (see the Install FFmpeg section for instructions). If you use the whisper ASR webservice or the whisper API, there is no need to install FFmpeg because the file conversion takes place on the server.
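For readers who want to prepare files by hand, the conversion described above can be approximated with a direct FFmpeg call. The Python sketch below is an illustration, not the code Speech2Text runs: the function names and the "_16k.wav" output naming are hypothetical, and the downmix to mono (-ac 1) is an assumption, since the text above only specifies 16 kHz and 16-bit.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(source: str, target: str) -> list[str]:
    """Return an FFmpeg command that writes a 16 kHz, 16-bit PCM WAV file."""
    return [
        "ffmpeg",
        "-y",                 # overwrite the target file if it already exists
        "-i", source,         # input audio or video file
        "-vn",                # ignore any video stream
        "-ar", "16000",       # resample to 16 kHz
        "-ac", "1",           # assumption: downmix to a single (mono) channel
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM samples
        target,
    ]


def convert_to_wav(source: str) -> Path:
    """Convert `source` to a WAV file in the same folder; needs FFmpeg on PATH."""
    if shutil.which("ffmpeg") is None:
        raise FileNotFoundError("FFmpeg was not found on PATH")
    source_path = Path(source)
    # Hypothetical naming scheme, chosen so a .wav input is never overwritten.
    target = source_path.with_name(source_path.stem + "_16k.wav")
    subprocess.run(build_ffmpeg_command(source, str(target)), check=True)
    return target
```

Calling convert_to_wav("lecture.mp4") would write lecture_16k.wav next to the original file, provided FFmpeg is installed.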

By default, the application uses mlx-whisper on Mac computers with Apple silicon; on other computers it uses whisper.cpp with the 'base' model for transcribing speech. Both of these implementations of Whisper are much faster than the original Whisper implementation from OpenAI. To improve the accuracy of the transcription, select the medium model or one of the large models in the Settings (e.g., large-v2 or large-v3-turbo). The medium and large models particularly improve the transcription of foreign names, specialist terminology, poor or noisy recordings, and accented speech. However, these models require a powerful computer because transcription is CPU/GPU intensive and requires a large amount of memory (see the Transcription speed section).
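The default choice described above amounts to a simple platform check. The following Python sketch shows one way such a rule could be expressed; it is an assumption about the logic, not the application's actual code, and the function names are hypothetical.

```python
import platform


def default_backend(system: str, machine: str) -> str:
    """Pick a Whisper implementation from the OS name and CPU architecture."""
    # Apple silicon Macs report "Darwin" and "arm64".
    if system == "Darwin" and machine == "arm64":
        return "mlx-whisper"
    # Every other combination (Intel Macs, Windows, Linux) falls back to whisper.cpp.
    return "whisper.cpp"


def detect_default_backend() -> str:
    """Apply the rule to the computer this script runs on."""
    # Note: a Python build running under Rosetta reports "x86_64" even on Apple silicon.
    return default_backend(platform.system(), platform.machine())


print(detect_default_backend())
```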


Download

Version: 2.2.1

What's new in version 2.2.1

Installation and starting Speech2Text

On macOS, open the DMG file and drag Speech2Text to the Applications folder. On Windows, double-click "Speech2Text_Setup.zip" to open the zip file that contains the installer. Next, double-click "Speech2Text Setup" to install Speech2Text.

The app is not code-signed. Therefore, you will see a security message when you start the application for the first time or run the installer. Below are instructions for opening the app on macOS and running the installer on Windows.

macOS Sequoia

The first time you start the application on macOS (by double-clicking the application icon in Finder), you will see the dialog box below. Click the "Done" button.

[Screenshot: macOS Sequoia security message 1]

To open the app, go to System Settings, select "Privacy & Security", enable Allow applications from "App Store & Known Developers", and click the "Open Anyway" button.

[Screenshot: macOS Sequoia Privacy & Security in Settings]

Next, a dialog box will appear. Click the "Open Anyway" button again.

[Screenshot: macOS Sequoia security message 3, Open Speech2Text]

Finally, enter your password to allow the application to open now and in the future.

[Screenshot: macOS Sequoia security message 4, Use Password...]

macOS Sonoma and earlier

The first time you start the application on macOS (by double-clicking the application icon in Finder), you will see the dialog box below:

[Screenshot: macOS security message 1]

Please note that starting the application for the first time might take a while.

To get past the security warning, right-click the application icon in Finder and select "Open" in the popup menu. Next, you will see this message:

[Screenshot: macOS security message 2]

Select "Open". It will take a few seconds before the app starts. The security warning will no longer appear for this app, and the application should start up quickly from now on.

Windows

The following message will appear when you start the installer.

[Screenshot: Windows security message 1]

Click on "More info", and the following message will appear.

[Screenshot: Windows security message 2]

Click on "Run anyway" to start the installer.

Transcription speed

Tables 1 and 2 indicate how long different Whisper implementations in Speech2Text (v2.2.0) take to generate captions (VTT file) for a 19-minute audio recording.

Table 1. Comparison between whisper, mlx-whisper, whisper.cpp and faster-whisper (base model).
Implementation | MacBook Pro (M1 Max) (a) | PC (Intel i9, RTX A4000) (b)
whisper (CPU) | 04m15s | 02m49s
MLX whisper | 00m22s | NA
whisper.cpp (CPU) | NA | 01m33s
whisper.cpp (CUDA) | NA | 00m22s
whisper.cpp (Metal) | 00m32s | NA
whisper.cpp (CoreML) | 00m33s | NA
faster-whisper (CPU, int8) | 01m00s
(a) 14" MacBook Pro M1 Max (macOS 15.1, 32 GB, 24 GPU cores). (b) ThinkStation P360 (Windows 11, 32 GB, Intel i9 12th gen, RTX A4000).

Table 2. Comparison between whisper, mlx-whisper, whisper.cpp and faster-whisper (large-v2 model).
Implementation | MacBook Pro (M1 Max) (a) | PC (Intel i9, RTX A4000) (b)
whisper (CPU)
MLX whisper | 01m39s | NA
whisper.cpp (CPU) | NA | 21m17s
whisper.cpp (CUDA) | NA | 01m51s
whisper.cpp (Metal) | 03m50s | NA
faster-whisper (CPU, int8)
(a) 14" MacBook Pro M1 Max (macOS 15.1, 32 GB, 24 GPU cores). (b) ThinkStation P360 (Windows 11, 32 GB, Intel i9 12th gen, RTX A4000).
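To put these timings in perspective, they can be converted into a speed relative to real time (audio length divided by processing time). The Python sketch below does this for three large-v2 results from Table 2. It assumes the recording is exactly 19 minutes long, which the text above only states approximately, and the helper names are hypothetical.

```python
import re

# Assumption: the "19-minute" recording is treated as exactly 19 * 60 seconds.
RECORDING_SECONDS = 19 * 60


def parse_duration(text: str) -> int:
    """Convert a table entry such as '04m15s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", text)
    if match is None:
        raise ValueError(f"Unrecognised duration: {text!r}")
    minutes, seconds = map(int, match.groups())
    return minutes * 60 + seconds


def realtime_factor(duration: str, audio_seconds: int = RECORDING_SECONDS) -> float:
    """Return how many times faster than real time the transcription ran."""
    return audio_seconds / parse_duration(duration)


# Three large-v2 timings copied from Table 2.
large_v2 = [
    ("MLX whisper (M1 Max)", "01m39s"),
    ("whisper.cpp CUDA (RTX A4000)", "01m51s"),
    ("whisper.cpp CPU (Intel i9)", "21m17s"),
]
for name, duration in large_v2:
    print(f"{name}: {realtime_factor(duration):.1f}x real time")
```

Under that assumption, the two GPU-accelerated runs finish at roughly ten times real time or better, while whisper.cpp on the CPU takes longer than the recording itself.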

Install FFmpeg

Instructions for installing FFmpeg can be found in the sections below.

macOS

Open the Terminal in macOS.

If you have not yet installed Homebrew (brew) on your computer, follow the instructions on this website: https://brew.sh

Next, enter the following command to install FFmpeg.

%brew install ffmpeg
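After installation, it can be useful to confirm that the ffmpeg executable is actually on the PATH. The Python sketch below is one way to check; the function names are hypothetical, and the check is not specific to macOS, so it applies equally after installing FFmpeg on Windows.

```python
import shutil
import subprocess
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


def ffmpeg_version(executable: str) -> str:
    """Return the first line printed by 'ffmpeg -version'."""
    result = subprocess.run(
        [executable, "-version"], capture_output=True, text=True, check=True
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


path = find_ffmpeg()
if path is None:
    print("FFmpeg was not found on PATH; open a new terminal window or reinstall it.")
else:
    print(f"Found {path}: {ffmpeg_version(path)}")
```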

Windows

You can use Scoop or Chocolatey to install FFmpeg on Windows.

Open the PowerShell Terminal in Windows.

Scoop

If you have not yet installed Scoop on your computer, follow the instructions on this website: https://scoop.sh

Next, enter the following command to install FFmpeg.

>scoop install ffmpeg

Chocolatey

If you have not yet installed Chocolatey on your computer, follow the instructions on this website: https://community.chocolatey.org

Next, enter the following command to install FFmpeg.

>choco install ffmpeg

Source code

The source code will be made available on GitHub.

If you have any questions, contact Walter van Heuven.

Known issues