I have been studying this article and how I can use its method for one or more of my projects:
This method may be a good way to identify keywords and isolate the audible segments that I need to target.
The current challenge I am facing with speech-to-text recognition is training a profile for a specific narrator. I had to purchase expensive speech-to-text software, I still need to spend time training and optimizing it, and I am only achieving close to 90% accuracy. I also still need to clean up the audio if there is heavy background noise, and reduce or introduce silence. To compound the challenge, my target audible segment is not the narrator but other sounds, which can differ between audio sources.
To generate the spectrogram, I came across SoX. SoX is the Swiss Army knife of sound processing utilities. It can convert audio files to other popular audio file types and also apply sound effects and filters during the conversion.
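As a rough sketch of what that could look like, SoX's `spectrogram` effect can render a PNG image from an audio file. The example below only assembles and launches that command from Python. It assumes SoX is installed, on the PATH, and built with PNG support; the function names and file names are hypothetical and chosen for illustration.

```python
import shutil
import subprocess


def spectrogram_command(audio_path, png_path):
    """Build the SoX argument list that renders a spectrogram PNG."""
    # "-n" is SoX's null output: no audio file is written, only the image.
    return ["sox", audio_path, "-n", "spectrogram", "-o", png_path]


def render_spectrogram(audio_path, png_path):
    """Run SoX if it is available; return True when the command was run."""
    if shutil.which("sox") is None:
        return False  # SoX is not installed, so nothing is executed
    subprocess.run(spectrogram_command(audio_path, png_path), check=True)
    return True


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(spectrogram_command("episode01.wav", "episode01.png")))
```

Keeping the command builder separate from the function that runs it makes it easy to inspect the exact command before any audio is touched.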
In addition to being a nice processing utility, it comes in a command-line version that can be integrated with any scripting language to process audio files as they are submitted. This allows for automation on the back end and, if the scripting language has suitable modules, it can be used on the front end as well.
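To illustrate the kind of back-end automation described here, the following is a minimal sketch that converts every file found in a submission folder using the basic `sox <input> <output>` form. The folder layout, the function names, and the choice of WAV input and FLAC output are assumptions for the example, not part of any existing setup, and FLAC output depends on how the local SoX build was compiled.

```python
import shutil
import subprocess
from pathlib import Path


def conversion_commands(inbox, outbox, pattern="*.wav", target_ext=".flac"):
    """Return one SoX conversion command per matching file in `inbox`."""
    commands = []
    for source in sorted(Path(inbox).glob(pattern)):
        target = Path(outbox) / (source.stem + target_ext)
        # Basic SoX form: sox <input> <output>; the output type follows the extension.
        commands.append(["sox", str(source), str(target)])
    return commands


def process_inbox(inbox, outbox):
    """Convert every submitted file; return the number of files processed."""
    if shutil.which("sox") is None:
        raise RuntimeError("sox was not found on PATH")
    Path(outbox).mkdir(parents=True, exist_ok=True)
    commands = conversion_commands(inbox, outbox)
    for command in commands:
        # check=True raises an error if SoX exits with a failure status.
        subprocess.run(command, check=True)
    return len(commands)
```

A scheduled job or an upload handler could call `process_inbox` whenever new files arrive, and effects or filters could be appended to each command list as the process is refined.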
I will be putting together a process to test this method against my goals. Sign up for notifications.