In this part, I discuss a manual process and my thoughts on using spectrograms with TensorFlow object detection methods. In the video, I just discuss how I am planning to label the images.
This video shows my thoughts on how I would approach using machine learning to identify audio snippets, which would help me identify the audio from an earlier project, the operations investigations project. I am in the process of doing the data collection. My thought comes from an article posted on a blog about the potential use of spectrograms with CNN networks, using the Faster R-CNN model in TensorFlow to do all the training.

I have Audacity open right now, and I am going to record a phrase, for example "hello, hello". This is what I have here when I press play. This is the waveform for this recording. There is very little noise, but this is just an example. What I do now is go to the audio track and switch it to the spectrogram view. I am not yet 100% sure how I will do this, but I am assuming that I will break the spectrogram at every second and create an image from each piece. Here, this is an image of me saying "hello".

I don't know yet whether I am going to break up the word itself. For example, I could select this part here for "hel", add another layer just for text, and label it "hel". Then I could select this part for "lo" and label it "lo". Again, I am not 100% sure what this will look like in the end. The word finishes out as "hello", so we get the idea: we have "hel" and "lo". What I would be trying to do is have the model recognize these pieces and say "hello", since "hel" plus "lo" is the spectrogram for "hello". I think we can delete that label now.
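The spectrogram view described above can also be approximated in code. The following is a minimal sketch, not the method used in the video: it assumes NumPy is available, uses a synthetic 1000 Hz tone in place of the recorded "hello", and the names `spectrogram`, `frame_size`, and `hop` are hypothetical choices for illustration.

```python
import numpy as np


def spectrogram(samples, frame_size=256, hop=128):
    """Return a magnitude spectrogram in dB, shape (n_frames, frame_size // 2 + 1)."""
    samples = np.asarray(samples, dtype=np.float64)
    window = np.hanning(frame_size)
    n_frames = 1 + (len(samples) - frame_size) // hop
    # Slice the signal into overlapping frames and apply the window to each.
    frames = np.stack(
        [samples[i * hop : i * hop + frame_size] * window for i in range(n_frames)]
    )
    # Real-input FFT per frame gives the frequency content over time.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Small offset avoids log(0) for silent bins.
    return 20.0 * np.log10(magnitude + 1e-10)


# Assumption: one second of a 1000 Hz test tone sampled at 8000 Hz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = spectrogram(tone)
# The brightest row of the spectrogram should sit at the tone's frequency.
peak_bin = int(np.argmax(spec.mean(axis=0)))
peak_hz = peak_bin * sample_rate / 256
```

Each row of `spec` is one time slice and each column is one frequency bin, so the array can be saved as an image with any plotting or imaging library instead of being captured by hand.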
I could also highlight the entire thing here and get the entire word "hello", so you would put "hello" here. In my notes, this area here will be the "hel" and this one the "lo". What I would want to do from here, if I were to continue using Audacity as the source for creating the spectrogram, is to work within one second. I said "hello" in less than a second, but the thought is that I have a video, I extract the audio, and then I break that audio up at every second. I will evaluate the audio to the point where I have isolated certain audio snippets, then go into Audacity, create a spectrogram, and process it.

What I would like to do after this point is get an image. This is Windows, so I am using the Snipping Tool to get it. This is just an example; I haven't figured everything out at this point. What I want to do is use the Snipping Tool to capture this whole area as an image and save it into this folder that I have been creating as an example, as part of the model, under "spectrogram". I name the file "hello", save it as a JPEG, and click Save. So this is a spectrogram of me saying "hello". Of course, I will get additional samples from other people talking to build up the data samples. There is also another method of identifying tags, which I will go into in more detail later, as far as sounds and audio go, if you want to do speech-to-text.
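The step of breaking the audio up at every second could be scripted rather than done by hand. This is a rough sketch using only Python's standard `wave` module. It assumes the audio has already been extracted from the video as an uncompressed WAV file; the function name `split_wav_into_seconds`, the chunk file names, and the synthetic 2.5-second test tone are all hypothetical and only there to make the example runnable.

```python
import math
import os
import struct
import tempfile
import wave


def split_wav_into_seconds(wav_path, out_dir):
    """Write one WAV file per second of audio and return the chunk paths."""
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths = []
    with wave.open(wav_path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = src.getframerate()  # one second of frames
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            chunk_path = os.path.join(out_dir, f"chunk_{index:04d}.wav")
            with wave.open(chunk_path, "wb") as dst:
                # Same channels, sample width, and rate as the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            chunk_paths.append(chunk_path)
            index += 1
    return chunk_paths


# Assumption: a synthetic 2.5 second mono tone stands in for the real recording.
work_dir = tempfile.mkdtemp()
source_path = os.path.join(work_dir, "hello.wav")
rate = 8000
with wave.open(source_path, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(rate)
    samples = (
        int(12000 * math.sin(2 * math.pi * 440 * n / rate))
        for n in range(int(2.5 * rate))
    )
    w.writeframes(b"".join(struct.pack("<h", s) for s in samples))

chunks = split_wav_into_seconds(source_path, os.path.join(work_dir, "chunks"))
chunk_lengths = []
for path in chunks:
    with wave.open(path, "rb") as c:
        chunk_lengths.append(c.getnframes())
```

The last chunk is shorter than one second when the recording length is not a whole number of seconds. Whether to pad or discard that remainder is a design choice the video does not settle.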
That other method is for later; this is just an example of the general idea of one approach. When I go on to the labeling, I go back to the image labeling application. Once it is up and running, I open the directory, navigate to the same folder, and select the spectrogram folder, and it shows all the images that I have there. I also change the save directory to the same folder, so the XML file will be saved there once I do the labeling.

When I start labeling, I select this part here. I don't know yet how much I am going to label, but here I would say "hello", and then "hello two channel". I am not putting "English" here, even though I do speak Spanish. So what I do here is "hello two channel", and then I add another label for each of the channels, which will be left and right: the bottom one will be my left and this one will be right. I could also label these peaks here, but again, I am not going to label those now. Then I could select the same area but take out the "lo" part, and label it with the left channel and the right channel, since this is a two-channel recording. Under this one, this is the "lo", so I add "lo two channel", and then "lo left" and "lo right". So you can see these will be my labels for this specific word.

I know there are thousands and thousands of words, so this may be time-consuming, but I think I have an idea of how to automate it: identify the audio as text, and then all I have to do is create the spectrogram and do the labeling process. So this is how I would approach it. I think this is a potential way of teaching a CNN network to learn how to read audio files.

In the end, in a post-production environment, you have a video input and an audio input. You extract the audio from the video, if it has any, and then you break the audio down at every second, create the spectrogram image for each segment, and analyze that image to identify potential words, or fragments of words, in each segment. You analyze each one within the second and identify, with some confidence level, that the image has "hel" or "lo". Of course, when you combine both, you create the word "hello". So it is easy to imagine: once the words are identified, the model starts to learn how to identify them, so we can query and find the information.

I hope this gives you some idea of how I plan to use spectrograms. It has potential, but it is a lot of work. This project is going to take a while to build. I will continue to work on it, and I am hoping that people contribute and help identify some of the words that I will be using. I welcome your feedback, and I hope this information is helpful. Either way, let me know.
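The final step described above, combining fragment detections such as "hel" and "lo" into a word, might look something like the sketch below. It is only an illustration under stated assumptions: the `Detection` class, the `FRAGMENTS_TO_WORD` lookup table, the 0.5 confidence threshold, the example scores and pixel positions, and the choice to report the lowest fragment confidence as the word's confidence are all hypothetical and not taken from the video or from any object detection library.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str         # fragment label, e.g. "hel" or "lo"
    confidence: float  # detector score in [0, 1]
    x_min: float       # left edge of the box along the time axis, in pixels


# Hypothetical lookup from ordered fragment labels to a whole word.
FRAGMENTS_TO_WORD = {("hel", "lo"): "hello"}


def combine_fragments(detections, min_confidence=0.5):
    """Drop weak detections, order the rest in time, and look up the word."""
    kept = sorted(
        (d for d in detections if d.confidence >= min_confidence),
        key=lambda d: d.x_min,
    )
    labels = tuple(d.label for d in kept)
    word = FRAGMENTS_TO_WORD.get(labels)
    if word is None:
        return None, 0.0
    # Assumption: a word is only as certain as its weakest fragment.
    return word, min(d.confidence for d in kept)


# Made-up detections for one spectrogram image, listed out of order on purpose.
detections = [
    Detection("lo", 0.88, 140.0),
    Detection("hel", 0.93, 20.0),
    Detection("lo", 0.20, 300.0),  # below the threshold, so it is ignored
]
word, score = combine_fragments(detections)
```

Sorting by the left edge of each box relies on the horizontal axis of the spectrogram being time, so the fragments come out in the order they were spoken.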