AI Chronicles: Technical Support Edition - Part 2

Converting Training Videos to Audio & Getting Full Transcripts

Introduction

You can find Part 1 of this series here.

This article will show you how to convert any instructional video, training video, or web session recording into an audio file and extract its full transcript. You can then either listen to the audio or read the transcript, which makes the content much easier to digest, and searching the transcript for keywords becomes trivial. This is the first step toward turning your entire knowledge base into a searchable format. And don't worry: if all you want is to listen to the audio or read the transcript, this article has you covered too.

The Problem

As members of a development or support engineering team, we need to deeply understand and support multiple products. We're often handed a lot of training material, including numerous videos, both for reference and to help future support engineers.

Handling a few videos is manageable, but supporting a large product with lots of moving parts means you could end up reviewing a gigantic collection of them. I once sifted through a folder of training videos spanning more than 10 years. Digging out the important details was a headache, because the videos were named almost at random, with little context.

If this situation sounds familiar, what we're about to share here is bound to help.

Remember, in the realm of customer support, time is vital.

So, let's find a way to convert these videos into a searchable format.

In this article, we'll take a sample video, convert it into an audio file, and then get the transcript of that audio, both with and without timestamps. Summarizing the transcript is covered in the next post.

Let's get started

Step 1: The video

Since downloading videos can be a bit of a gray area, I'll use an old training video from archive.org. Feel free to download or watch it here.

The video is essentially a scikit-learn tutorial from 2014 and runs around 40 minutes. Feel free to use any video of your choice.

Step 2: Convert the video into an audio file

We'll be using ffmpeg to convert the video into an audio file. You can download ffmpeg from here. Run the following command in the folder where you've downloaded the video.

Let's assume your file name is training-video.mp4. If it's different, just update the command below.

ffmpeg -i training-video.mp4 -f mp3 -ab 192000 -vn training-audio.mp3

The command will convert the video into an audio file named training-audio.mp3. Feel free to choose your own file name. It took around 19 seconds to convert a 40-minute video into an audio file on my M1 Air base model.

You can now enjoy listening to the training audio file.
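If the audio is only going to be fed to a transcription model rather than listened to, you can also keep the intermediate file much smaller. Whisper resamples its input to 16 kHz mono internally, so a lower-bitrate mono MP3 loses nothing for transcription purposes. Here's a variant of the same command (the output name is just an example):

ffmpeg -i training-video.mp4 -vn -ac 1 -ar 16000 -b:a 48k training-audio-small.mp3

All the flags are standard ffmpeg options: -vn drops the video stream, -ac 1 mixes down to mono, -ar 16000 sets the sample rate, and -b:a 48k sets the audio bitrate.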

Step 3: Get the transcript of the audio file

To get the transcript, we'll be using OpenAI Whisper, one of the best open-source speech-recognition models out there. You can read more about it here. We'll run it locally to avoid API charges and to keep internal training data away from third-party services. For simplicity, we'll use a pre-trained model, but feel free to fine-tune your own. We'll handle all of this in Python.

Install Whisper

pip install -U openai-whisper

I'm using the medium model, but pick one that suits your requirements and your machine's specs. More details about the models are available here.

The following code will merely print the output to your console, which you can save or use as needed. But toward the end of this article, I'll share the complete code that does everything we're discussing here.

import whisper

model = whisper.load_model("medium")

# note: the first run downloads the model; the medium model needs around 1.5 GB of disk space
result = model.transcribe("training-audio.mp3")

# this will print the plain transcript
print(result["text"])

# this will print the transcript with timestamps
print(result["segments"])

Once it's done, you will see the transcript in the console.
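If you want the timestamped version in a readable form rather than the raw segment dictionaries, each entry in result["segments"] carries start, end, and text fields. Here's a minimal sketch that continues from the result above and writes one timestamped line per segment (the helper and output file name are just examples):

def format_timestamp(seconds):
    # convert a time in seconds to an mm:ss label
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"

# "result" is the dictionary returned by model.transcribe() above
with open("training-audio-timestamped.txt", "w") as f:
    for segment in result["segments"]:
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        f.write(f"[{start} - {end}] {segment['text'].strip()}\n")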

Transcribing a long video can take a while. For the 40-minute video, it took around 20-30 minutes on my M1 Air base model, including downloading the model for the first time. So be patient.

Check the screenshot below for the relative speed and accuracy of each model. I recommend starting with the tiny model and seeing how that goes.

OpenAI Whisper Models

Check out how the transcript appears

It's about 5000 words long. The more you explore, the more you'll appreciate the model's accuracy. I've added the transcript below.

 So the talk today is about scikit-learn, or in other words, why I think scikit-learn is so cool. First of all, I would like to ask you three questions. What's your favorite color, actually? But if you already know what machine learning is, how many of you? Oh, great. Perfect. The second one is, have you ever used scikit-learn? And the third one is, how many of you also attend the great training on scikit yesterday? Okay. Okay, it's just to rave questions. So what actually machine learning means? Machine learning, there are many definitions about machine learning. One of these is, machine learning teaches machines how to carry out tasks by themselves. It's very trivial, very simple definition. And it's that simple. The complexity comes with the details. It's a very general definition, but just to give you the intuition behind. Machine learning at a glance. Machine learning is about algorithms that are able to analyze, to crunch the data, and in particular, to learn the data, from the data. They basically exploit statistical approaches. So that's why statistical is a very huge word in this cloud. Machine learning is almost related to data analysis techniques. There are many buzzwords about machine learning. You may have heard about data analysis, data mining, big data, and data science. Data science actually is the study of the generalizable extraction of knowledge from data. And machine learning is related to data science according to Drew Conway with this Vern diagram. Machine learning is in the middle. And data science is a part of machine learning because it exploits machine learning. Machine learning is a fundamental part in the data science steps. But what is actually the relation of data mining and data analysis in general with machine learning? Machine learning is about to make predictions. So instead of only analyzing the data we have, machine learning is also able to generalize from this data. So the idea is we have a bunch of data. We may want to crunch this data to make statistics analysis on this data. And that's it. This is also called data mining, for instance. Machine learning is a bit different because machine learning performs this analysis. The goal is slightly different. The goal is analyze this data and generalize, try to find, to learn from this data, a general model for future data, for data that are already, that are almost unseen at this time. So the idea is a pattern exists in the data. We cannot pin this pattern manually, but we have data on it. So we may learn from this data. In other words, this kind of learning is also known as learning by examples. Machine learning comes in two different settings. There is the supervised settings. This is the general pipeline of a machine learning algorithm. You have all the data on the upper left corner. You translate the data in a feature vector. This is almost a common step in processing the data. Then you feed those feature vectors to your machine learning algorithm. And the supervised learning setting supports also the labels, which is the set of expected results on this data. And then we combine, we generate this model from feature vectors and labels, and we generalize, we get the model to predict for future data in the bottom left corner of the figure. A classical example of supervised learning is the classification. You have two different groups of data in this case, and you want to find a general rule to separate these data. So you find, in this case, a function that separates the data. 
And for future data, you will be able to know which is the class. In this case, it's a binary classification, so you have two classes. In the future, when you've got new data, you will be able to predict which is the class associated to this data. Another example is the clustering. In this case, the setting is called unsupervised learning. The pipeline processing is this one. You have the same old processing, but what you miss is the label part. Because that's why this is called unsupervised, because you have no supervision on the data. You have no label to predict. And as for the clustering, the problem is get a bunch of data and try to clusterize, in other words, to separate the data into different groups. So you have a bunch of data. You want to identify the groups inside this data. Just a brief introduction. So what about Python? Python and data science are very related nowadays. Actually, Python is getting more and more packages for computational science. According to this graph, Python is a cutting-edge technology for this kind of computation. It's in the upper right corner. And actually, it's replacing and substituting other technologies. One of the advantages, such as R or MATLAB, for instance, one of the advantages of Python is that Python provides unique programming language across different applications. It has a very huge set of libraries to exploit. And this is the case. The reason why Python is the language of choice nowadays for data science, almost the language of choice. And this is displacing R or MATLAB. By the way, there will be also a PyData conference at the end of the week. It will be started on Friday. So if you can, please come. Data science in Python. Actually, MATLAB could be easily substituted by all these technologies, such as IPython, NumPy, SciPy, and Matplotlib for plotting. But there are many other possibilities, especially for plotting nowadays. R could be easily substituted with Pandas. It's a great package. And in the Python ecosystem, we have also efficient Python interpreters that have been compiled for this kind of computation, such as Anaconda or AnthocanaPy. And we have also projects like Cyton. Cyton is a very great project to boost the computation of your Python code. The packages for martial learning in Python are manifold, actually. I'm trying to describe a bit of a set of well-known packages for martial learning code. And I would like to make some consideration on why scikit-learn is a very great one. We have Spark Martial Learning Lib, PyML, Natural Language Toolkit, NLTK, sometimes called the Shugen Martial Learning Toolbox. This morning there's been a talk about it. Scikit-learn, of course, PyBrain, MLPy. And there is a guy who set up a list on GitHub where everybody can put his or her contribution to this list in order to distribute the knowledge about available packages in different languages. And Python is very full of. So we have Spark MLib. Spark MLib actually is implemented in Scala. It's not Python. There is a wrapping in Python, which is called PySpark. But actually the library for martial learning is at a very early stage. Shugen is written in C++ and it offers a lot of interfaces. One of these interfaces is in Python. The other packages there are Python powered. So we're trying to talk about these packages. Natural Language Toolkit is implemented in pure Python. So no Numpy or Scipy allowed. But the other packages are implemented in Numpy and Scipy. So the code there is quite more efficient for large scale computations. 
And LTK supports Python 2 and Python 3 is also in a half a stage. PyML supports Python 2. Actually the support Python 3 is not so clear. PyBrain supports only Python 2. And these other two guys there support both Python 2 and Python 3. What about the purpose of these packages? NLTK is for natural language processing. It embeds some algorithms for martial learning. But actually it is not supposed to be used in complete martial learning environment. It's almost related to text analysis, natural language processing in general. PyML is almost focused on supervised learning, in particular to SVM technique, which is support vector machine. It has many algorithms, especially related to supervised learning. PyBrain is for neural network, which is another set of techniques in the martial learning ecosystem. The other two guys there are somewhat general purpose. So Scikit and martial learning py contains algorithms for supervised and unsupervised learning and some others slightly different settings for martial learning. So we will not consider anymore the PyML and PyBrain from here on. So we ended up with these three libraries written in Python for our martial learning code. So why to choose Scikit-learn? Ben Lorrecker, he's a big data guy, recommends Scikit-learn for six reasons. The first one is commitment to the documentation and usability. Scikit-learn has a brilliant documentation. And it's very, very useful for newcomers and for people without any background about martial learning. The second reason is models are chosen and implemented by a dedicated team of experts. And then the set of models supported by the library covers most martial learning tasks. PyData improves the support for data science tools, data science problems. And actually, I don't know if you know Kaggle. Kaggle is a site where you may apply for competition for data science. And Scikit is one of the most used package for this kind of competition. Another reason should be the focus. Scikit-learn is a martial learning library and its goal is to provide a set of common algorithms to Python users through a consistent interface. These two features are two of the features that I like the most. I will be more precise in a few slides about this. And finally, Scikit-learn scales the most data problems. So scalability is another feature that Scikit-learn supports out of the box. If you want to install Scikit-learn, you have to pip very few comments. You need to install numpy, scipy, matplotlib. IPython actually is not needed. It's just for convenience. And then you install Scikit-learn. All the other packages, numpy and scipy in particular, are required because Scikit-learn is based on numpy and scipy. But anyway, if you want to install other version of the Python interpreter, such as Anaconda, it's already provided out of the box. The design philosophy of Scikit, it's one of the greatest feature of this package, I guess, in my opinion. It includes all the batteries necessary for general purpose martial learning code. It has, it supports features and functionalities for data and data sets. Feature selection, feature extraction algorithms, martial learning algorithms in general in different settings. So classification, regression, clustering, and stuff like that. And finally, evaluation functions for cross validation, confusion metrics. We will see some examples in the next slides. 
The algorithm selection philosophy for this package is try to keep the core as light as possible and try to include only the well-known and largely used martial learning algorithms. So the focus here is to be as much general purpose as possible. So in order to include a broad audience of users. At a glance, this is a great picture depicting all the features provided by Scikit-learn. And this figure here has been gathered by the documentation. This is a sort of map you may follow that allows you to choose the particular martial learning techniques you want to use in your martial learning code. There are some clusters in this picture. There is regression over there, classification, clustering, and dimensionality reduction. And you may follow this kind of path over there to decide which is the setting most suited for your problem. The API of Scikit is very intuitive and mostly consistent to every martial learning technique. There are four different objects. There is the estimator, the predictor, transformer, and the model. These interfaces are implemented by almost all the martial learning algorithms included in the library. For instance, let's make an example. The API for the estimator is the method fit. An estimator is an object that fits a model based on some training data and is capable of inferring some properties on new data. For example, if we want to create an algorithm which is called KNN or K-Neighbors Classifiers, the KNN algorithm, which is a classifier, so it's for classification problems and then supervised learning, it has the fit method, but for also unsupervised learning algorithms such as K-Means, the K-Means algorithm is an estimator as well and it implements the fit method too. For feature selection, it's almost the same. Then the predictor. The predictor provides the predict and the predict probability method. And finally, the transformer is about the transform method and sometimes there is also the fit transform method that applies the fit and then the transformation of the data. The transformer is used to make transformation of the data in order to make the data able to end in a form that is able to be processed by the algorithms. Finally, the last one is the model. The model is the general model you may create in your machine learning algorithm. The model is for supervised and for unsupervised algorithms. And another great feature of Scikit is the pipelines because Scikit provides a great way to create pipeline processing. So in this case, you may create a pipeline of different processing steps just out of the box. You may apply these, select KBAS, which is feature selection step. Then after the feature selection, you may apply PCA. PCA is an algorithm for dimensionality reduction. And then you may apply logistic regression, which is a classifier. So you may instantiate pipeline processing very easily. And then you call the fit method on the pipeline and the fit method will and then the predict. The only constraint here is that the last step of the pipeline should be a class that implements the predict method, solve a predictor. So far so good? So let's see some examples. Scikit in action. We have, it's very introductory example. The first thing to consider is the data representation. Actually Scikit is based on NumPy and SciPy as you know. So all the data are usually represented as matrices and vectors. 
In general, in machine learning by definition, we have the X matrix over there, which is usually identified by the capital letter because it is a matrix, as a matrix of N different rows and D different colors. In this case, N is the number of samples we have in our data set and D is the number of features. So the number of relevant information on the data we have. So the data comes, the training data come in this flavor and under the hood it is implemented by SciPy.sparse matrices. Usually it is, if I'm not mistaken, should be CSR implementation, so comma sparse row, compressed sparse row. And finally we have the labels because we know the values for each of this data about the problem we have. The problem we are going to consider is about the Iris data set and we want to design an algorithm that is able to automatically recognize Iris species. So we have three different species of Iris. We have Iris versicolor on the left, Iris virginica here and Iris setosa here. The features we're going to consider are four and are the length of the saple and the width of the saple, the length of the petal and the width of the petal. So every data in this data set comes as a vector, every sample, sorry, comes as a vector of four different features. So those four here. Scikit has a great package to handle the data sets. Actually these particular data sets are very well known in many fields and is already embedded in the Scikit-learn library. So you only need to import the data set package and call the load Iris and then you call the function load Iris and the Iris object is a bunch object that contains different keys. It has the target names, the data, the target, a description of the data set and the feature names. Description is the description, a verbose description of the data set. Feature names are the four different features I already mentioned in the previous slides. The target names are the targets we expected on this data set, in particular setosa, versicolor and the three different Iris species we want to predict. Then we have the data. So we, Iris.data comes as a NumPy matrix, NumPy and the array. The shape of this matrix is 150 rows times four, which is four different columns. The targets are 150 because we have a value of target for each sample in the data set. So n, the number of samples in this case is 150, d, the number of features in this case is four and that's it. The targets here is the result of the target. So we have a value that ranges from zero to two corresponding to the three different classes we want to predict. We may try to apply a classification problem on this data. We want to exploit the KNN algorithm. The idea of the KNN classifiers is pretty simple. For example, if we consider a K which is equal to six, we're going to check the classes. This is a new data. We train our model with the training data and we want to predict the class of this new data on the classes of the six nearest neighbors of this data. In this case, it should be the Virginica, the red dot. Very simple. In Scikit, few lines of code. We import the data set. We call the KNN classifier algorithm. In this case, we select N neighbors equals to one. Then we call the fit method and we train our model. Then if this is what we get, actually, if we want to plot the data, these are called the decision boundaries of the classifier. If you want to know for new data which is the species of iris that has three centimeters times five centimeters and four times two centimeters petal width. 
Let's check iris.targetnames of knn.predict because KNN is a classifier. It may fit the data and also predict after the training and it says, okay, it's a Virginica. So far so good? We may also try to, instead of facing this problem as a classification, you may also face this problem in a non-supervised setting, so as a clustering problem. In this case, we are going to use the k-means algorithm. The k-means algorithm, the idea is pretty simple. We want to recreate a cluster of object and each object is equally distant to the center of this cluster. And that's it. In Scikit, it's very simple. We have the k-means. We specify the number of clusters we want to have. In the k-means, in this case, we want three clusters because we're going to predict three different species for the iris. And then this is the ground-through, so this is the value we expected. This is what we got after calling the k-means. As you may already notice, the interface for the two algorithms is exactly the same even if the machine learning settings are completely different. In the former case, it was supervised. In this latter case, it's unsupervised. So classification versus clustering. Finally, very few slides to conclude. Another great battery included in Scikit, and I don't know how many other machine learning libraries in Python are so complete in terms of batteries, is about the model evaluation algorithm. Model evaluation is necessary to know how do we know if our prediction model is good. So we apply model validation techniques. We may simply try to verify that every prediction corresponds to the actual target. But this is meaningless because we're trying to verify if we train all the data on the training. So this kind of evaluation is very poor because it's based only on the training. So we are just checking if we are able to fit the data, but we are not able to test if the final model is able to generalize. Because a key feature of this kind of technique is the generalization. So don't go too much to the training data because you will end up in a problem which is called overfitting. But you need to generalize to be able to noise and to be able to predict even new data that are not actually identical to the training data. One usually used technique in machine learning is the so-called confusion matrix. Scikit provides in the matrix package provides different kind of matrix to evaluate your performance. In this case we're going to use the confusion matrix. The confusion matrix is very simple. It's a matrix where it's the number of, it has a square matrix where the rows and the columns corresponds to the number of classes you want to predict. And in the diagonal you have all the classes that you expect with respect to the classes that you predict. So you have all the possible matchings. If you have all the data there on the diagonal itself that you predicted perfectly all the classes. Is that clear? Okay, great, thank you. But a very well known for you guys that are already aware of machine learning is the cross-validation technique. Cross-validation is a mode of validation techniques for assessing how the results of the statistical knowledge of the data is able to generalize to independent data sets. Not only to the data set we use for training. And Scikit already provide all the features to handle this kind of stuff. So Scikit imposes us to write very few code. Just the few lines of code necessary to import the functions already provided in the library. 
In other cases we were required to implement this kind of function over and over for every time in our Python code. So this is very useful even for lazy programmers like me. In this case we exploit the train test split. So the idea of the cross-validation here is to splitting the training data into different sets. The training set and the test set. So we fit on the training set and we predict on the test set. So in this case we see that there are some errors coming from this prediction. This is a more robust way to evaluate our prediction model. So the last couple of things is large scale out of the box. Another great battery included in Scikit is the support for large scale computation already out of the box. You may combine Scikit-learn code with every library you want to use for multiple tests. Multi-processing or parallel computation, distributed computation. But if you want to exploit the already provided features for this kind of stuff, there are many techniques in the library that allows for a parameter which is called n underscore jobs. If you set these parameters with a value different to one, which is the default value, it performs the computation on the different CPU you have in your machine. If you put the minus one value here, this means that it is going to exploit all the CPUs you have in your single machine. And this is for different settings or for different kind of application in machine learning. You may apply multiple processing for clustering, the k-means examples we made a few slides ago for cross validation for instance or for grid search. Grid search is another great feature included in Scikit that is able to identify the best parameter for a predictor that maximizes the value for the cross validation. So we want to get the best parameters for our model that maximizes the cross validation so that it is able to generalize the best. Just to give the intuition. This is possible thanks to the joblib library which is provided in the background. Under the hood, the new number jobs here correspond to a call to the joblib. The joblib is well documented as well so you may read the documentation for any additional details. By no means less, Scikit meets any other libraries. Scikit could be integrated with NLTK, that is natural language toolkit and for Scikit image just to make a couple of example. In details, Scikit meets natural language toolkit by design NLTK includes additional module which is nltk.classify.scikitlearn which is actually a wrapper in the NLTK library that allows to translate the API of Scikit in the API used in NLTK. So if you have code on NLTK, you want to apply a classifier exploiting the Scikit library, you may import the classifier from Scikit and then you may use the Scikitlearn classifier from the NLTK package over there and wrap the interface for this classifier to the one of Scikit that it is in this case linear SBCU that stands for support vector classifier. And then you may also include this kind of stuff in a pipeline processing of Scikit. So in conclusion, Scikitlearn is not the only machine learning library available in Python but it is powerful and in my opinion easy to use, very efficient implementation provided it's based on NumPy, Scipy and Sighten under the hood and it is highly integrated for example in NLTK or Scikit image just to make an example. So I really hope that you're looking forward to using it and thanks a lot for your kind attention. Thank you. Thank you very much. We have six minutes left for your questions. 
Please raise your hand and I'll come by with a microphone. Well, thanks for the talk. I have two short questions. Does Scikitlearn provide any online learning methods? Yes. Yes. Actually this is a point I wasn't able to include in the slides. The online learning is already provided and there are many classifiers or techniques that allows for a method which is called partial fit. So you have this method to provide the model, a bunch of data one at a time. So the interface has been extended by a partial fit method. So some techniques allow for online learning and another very great usage of this partial fit is in case of the so called out of core learning. In that case, in the out of core learning setting, your data are too big to fit in the memory. So you provide the data, one bunch of data one at a time because they're too big to fit in the memory. So you call the partial fit method to train in case of a classifier to fit your model a bunch at a time. Okay. Thanks. Second quick question. Is there any support for missing values or missing labels apart from just deleting them? In case of online learning? No, just in general for any machine learning. For missing labels. Missing labels or missing data? What do you mean? So like if you have a feature vector that just misses like a value at the third component? Actually I don't know. Okay. Actually I don't know. So we have a very simple imputer. Yeah, thank you. I'll just let him come by. So we have a very simple imputer that's going to impute by median or mean in the different directions. So if you have very few missing data, it's going to work well. If you have a lot, then you might want to look at matrix completion methods, which we do not have. We had a Google Summer of Code project on this last year. It didn't finish. We welcome contributions of course. Thank you. Hello. Hi. I have some experience actually with Scikit before. And I'm actually a mathematician. I had no idea about all the stuff under the hood. And I didn't want to be too deep inside of the whole algorithm stuff and mathematics and such. And the biggest problem for me was to realize what do I do wrong. So if you got some kind of big data set with features, labels, supervised learning, what would you advise to someone who doesn't know how does it work inside? Which steps or which small or easy solutions should I consider to improve the results of the classification? Thanks. Yeah, actually, machine learning is about finding the right model with the right parameters. So there are many steps you may want to apply in your training, the different algorithms. In general, you apply data normalization steps. So you might, first of all, the first step I suggest is preprocessing of the data. So you analyze the data. You make some statistical tests on the data, some preprocessing, some visualization of your data in order to know what kind of data you're dealing with. So this is the first step. The second one is try the simplest model you want to apply and then improve it one step at a time. If you find the right model you want to use, then you're required to find the best settings for that model. In that case, you might end up using the grid search method, for instance, which is a method provided out of the box just to find the best combination of parameters that maximizes the values of the cross validation, for instance. And of course, it's a training on the job. 
So you may find the right model for your predictions or you may find the worst model and then you start over again and look for different models. Okay? I hope this helps. Yes, thanks again, Valerio. I think he's going to give a talk on the data analysis as well. I think on Saturday, isn't it? So if you attend PIDATA, don't miss that talk as well. Thanks again. Thank you very much.

As promised earlier, here's the code to automate everything we've discussed so far.

The only thing you need to do is download the video(s) and place them in a folder named "video" next to the script. The rest is handled automatically: the script creates the "audio" and "transcripts" folders if they don't already exist and writes the audio and transcript files into them.

I'm providing the code for both macOS/Linux and Windows. Use the one that works for your system. The only significant difference is in the path separator.

Code for macOS and Linux Systems

import json
import os
import subprocess
import whisper

# convert all the videos in the video folder and save them in the audio folder
def convertToAudio(file):
    # equivalent to: ffmpeg -i video.mp4 -f mp3 -ab 192000 -vn audio.mp3
    print("converting to audio", file)
    filename = os.path.splitext(file.split("/")[-1])[0]
    savePath = os.getcwd() + "/audio" + "/" + filename + "-audio.mp3"
    # pass the command as a list so file names with spaces work
    command = ["ffmpeg", "-i", file, "-f", "mp3", "-ab", "192000", "-vn", savePath]
    print(" ".join(command))
    if subprocess.call(command) == 0:
        print("success")
    else:
        print("failed")

def readConvertAllVideos():
    videoPath = os.getcwd() + "/video"
    for file in os.listdir(videoPath):
        if file.endswith(".mp4"):
            convertToAudio(videoPath + "/" + file)


def transcribeAudio():
    # swap "tiny" for "medium" (or another size) depending on your machine
    model = whisper.load_model("tiny")
    transcriptsPath = os.getcwd() + "/transcripts"
    for root, dirs, files in os.walk(os.getcwd() + "/audio"):
        for file in files:
            if file.endswith(".mp3"):
                print("transcribing", file)

                result = model.transcribe(os.getcwd() + "/audio/" + file)
                # save the plain transcript as audiofilename-transcript.txt
                with open(transcriptsPath + "/" + file + "-transcript.txt", "w") as f:
                    f.write(result["text"])
                # save the timestamped segments as valid JSON
                with open(transcriptsPath + "/" + file + "-segments.json", "w") as f:
                    json.dump(result["segments"], f, indent=2)

                print(file, "done")


def main():
    # create the output folders if they don't exist yet
    os.makedirs("audio", exist_ok=True)
    os.makedirs("transcripts", exist_ok=True)
    readConvertAllVideos()
    transcribeAudio()

if __name__ == "__main__":
    main()

Code for Windows Systems

import json
import os
import subprocess
import whisper

# convert all the videos in the video folder and save them in the audio folder
def convertToAudio(file):
    # equivalent to: ffmpeg -i video.mp4 -f mp3 -ab 192000 -vn audio.mp3
    print("converting to audio", file)
    filename = os.path.splitext(file.split("\\")[-1])[0]
    savePath = os.getcwd() + "\\audio" + "\\" + filename + "-audio.mp3"
    # pass the command as a list so file names with spaces work
    command = ["ffmpeg", "-i", file, "-f", "mp3", "-ab", "192000", "-vn", savePath]
    print(" ".join(command))
    if subprocess.call(command) == 0:
        print("success")
    else:
        print("failed")

def readConvertAllVideos():
    videoPath = os.getcwd() + "\\video"
    for file in os.listdir(videoPath):
        if file.endswith(".mp4"):
            convertToAudio(videoPath + "\\" + file)


def transcribeAudio():
    # swap "tiny" for "medium" (or another size) depending on your machine
    model = whisper.load_model("tiny")
    transcriptsPath = os.getcwd() + "\\transcripts"
    for root, dirs, files in os.walk(os.getcwd() + "\\audio"):
        for file in files:
            if file.endswith(".mp3"):
                print("transcribing", file)

                result = model.transcribe(os.getcwd() + "\\audio\\" + file)
                # save the plain transcript as audiofilename-transcript.txt
                with open(transcriptsPath + "\\" + file + "-transcript.txt", "w") as f:
                    f.write(result["text"])
                # save the timestamped segments as valid JSON
                with open(transcriptsPath + "\\" + file + "-segments.json", "w") as f:
                    json.dump(result["segments"], f, indent=2)

                print(file, "done")


def main():
    # create the output folders if they don't exist yet
    os.makedirs("audio", exist_ok=True)
    os.makedirs("transcripts", exist_ok=True)
    readConvertAllVideos()
    transcribeAudio()

if __name__ == "__main__":
    main()
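If you'd rather maintain a single script instead of two, you can let Python pick the path separator with os.path.join rather than hard-coding "/" or "\". Here's a small sketch of just the path handling, using the same folder layout as above (the example file name is hypothetical):

import os

# os.path.join uses the right separator for the current OS,
# so the same code runs on macOS, Linux, and Windows
videoPath = os.path.join(os.getcwd(), "video")
audioPath = os.path.join(os.getcwd(), "audio")
transcriptsPath = os.path.join(os.getcwd(), "transcripts")

filename = "training-video"  # example name, without the extension
savePath = os.path.join(audioPath, filename + "-audio.mp3")
print(savePath)

Everything else in the two scripts is identical, so the path handling is the only part you'd need to merge.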

In the next post, we'll discuss how to summarize the transcript. That's a fascinating problem to solve too. We'll use one of the best open-source summarization models available, so stay tuned.

Happy Learning!
