In recent years, Automatic Speech Recognition (ASR) services have performed notable progress in the research efforts of big companies such as Google and Amazon. However, the ASRs are still sensitive to the audio processing quality in other languages. To solve this issue, various speech enhancement algorithms that are the most prominent in improving speech intelligibility were proposed, such as Singular Value Decomposition (SVD), log Minimum Mean Square Error (log-MMSE) and Wiener. By preprocessing the audio files with these algorithms, we seek to reduce the Word Error Rate (WER), which compares the transcription performed by the ASR against a manual transcription. Thus, we can determine the percentage of error that the ASR service has acquired.
Results demonstrated that Google is more efficient than Amazon and Vosk counterparts. Also, we decided that applying a Low-pass filter combined with a log-MMSE algorithm to the audio files can substantially reduce the WER percentage of transcription depending on the noise characteristics contained in the audio.