Harnessing Free Speech-to-Text AI Models For Building High-Power Productivity Tools
New Video On Coding Speech To Text
I recently discovered some new speech-to-text AI models that I expect will be extremely useful to many people in the years to come. I’m amazed by the prospect of using them to improve my brainstorming and video production workflows, along with my other productivity tools. I want to share this exciting technology with all of my readers so you can apply it to your own use cases as well.
HuggingFace Models (& Groq API)
If you haven’t heard of HuggingFace, it’s a place where people who make AI models can share them with the world. Many of the models are available totally free, and some of them offer remarkable capabilities. You can run them with short Python scripts, and you can also integrate them as building blocks into more complex applications.
Today I’m posting a video exploring a few speech-to-text models from HuggingFace, as well as a model hosting provider called Groq, which serves these models quickly and cheaply if you don’t want to worry about running them yourself.
openai/whisper-large-v3-turbo : https://huggingface.co/openai/whisper-large-v3-turbo
distil-whisper/distil-large-v3 : https://huggingface.co/distil-whisper/distil-large-v3
openai/whisper-large-v3 : https://huggingface.co/openai/whisper-large-v3
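To give a feel for how short such a script can be, here is a minimal sketch of running one of the models listed above on your own computer. It is not the code from the video. It assumes you have installed the transformers library, a backend such as PyTorch, and ffmpeg for decoding audio files. The function names transcribe_file and format_chunks are my own, chosen for illustration.

```python
from typing import Any

# Any of the three model IDs listed above should work here.
MODEL_ID = "openai/whisper-large-v3-turbo"


def transcribe_file(audio_path: str, model_id: str = MODEL_ID) -> dict[str, Any]:
    """Transcribe one audio file locally and return the pipeline's raw result."""
    # Imported inside the function so format_chunks() below can be used
    # even on a machine where transformers is not installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # return_timestamps=True adds a "chunks" list with a (start, end) time
    # for each segment and allows audio longer than 30 seconds.
    return asr(audio_path, return_timestamps=True)


def format_chunks(result: dict[str, Any]) -> str:
    """Render each timestamped segment on its own line."""
    lines = []
    for chunk in result.get("chunks", []):
        start, end = chunk["timestamp"]
        # The end time of the final segment can be missing (None).
        end_label = f"{end:.1f}s" if end is not None else "end"
        lines.append(f"[{start:.1f}s - {end_label}] {chunk['text'].strip()}")
    return "\n".join(lines)


# Example usage (the file name is a placeholder):
# result = transcribe_file("my_recording.mp3")
# print(result["text"])
# print(format_chunks(result))
```

The first run downloads the model weights, so expect a wait before the first transcription. After that, everything runs locally.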
Here is a graphic from Groq’s most recent blog post about how to choose between these three speech-to-text models.
Note that Groq’s prices are much lower than Google’s or Amazon’s speech-to-text services, but we can get these models even cheaper: if we run them on our own computers, there is no usage fee at all, and the only cost is our own hardware and electricity.
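If you would rather use the hosted option, a call to Groq can look roughly like the sketch below. It assumes the groq Python package is installed and that your API key is stored in the GROQ_API_KEY environment variable. The mapping from HuggingFace repo names to Groq model names is my assumption based on Groq’s public model list, so check their documentation before relying on it. The helper names are hypothetical.

```python
import os

# Assumed mapping from HuggingFace repo IDs to Groq's model names.
# Verify these against Groq's current model list before use.
GROQ_MODEL_IDS = {
    "openai/whisper-large-v3": "whisper-large-v3",
    "openai/whisper-large-v3-turbo": "whisper-large-v3-turbo",
    "distil-whisper/distil-large-v3": "distil-whisper-large-v3-en",
}


def to_groq_model(hf_repo_id: str) -> str:
    """Translate a HuggingFace repo ID into the name Groq uses for that model."""
    try:
        return GROQ_MODEL_IDS[hf_repo_id]
    except KeyError:
        raise ValueError(f"No known Groq model for {hf_repo_id!r}") from None


def transcribe_with_groq(
    audio_path: str, hf_repo_id: str = "openai/whisper-large-v3-turbo"
) -> str:
    """Send one audio file to Groq and return the transcribed text."""
    # Imported here so the mapping helper works without the groq package.
    from groq import Groq

    client = Groq(api_key=os.environ["GROQ_API_KEY"])
    with open(audio_path, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            file=(os.path.basename(audio_path), audio_file.read()),
            model=to_groq_model(hf_repo_id),
        )
    return transcription.text


# Example usage (the file name is a placeholder):
# print(transcribe_with_groq("my_recording.mp3"))
```

Keeping the model lookup in its own function makes it easy to switch between the three models without touching the upload code.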
This week I built reusable Python code so that people exploring this technology, and developers who want to build with these models, can hit the ground running. Check out my new video on this subject:
(Note: Substack members can now download the code from this tutorial to get started even faster. Thank you for your support!)
Transforming Workflows with Speech-to-Text
1. Brainstorming and Idea Generation
Traditional brainstorming sessions can be limited by the speed of typing or writing down ideas. With speech-to-text, you can:
Capture Ideas Instantly: Speak your thoughts as they come, ensuring no brilliant idea slips away.
Efficient Note-Taking: Capture important points during calls or lectures without interrupting the flow of conversation.
Organize Thoughts Efficiently: Convert spoken ideas into structured text, making it easier to analyze and refine concepts.
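As one possible way to turn a spoken brainstorm into structured text, the sketch below splits a transcript into separate ideas whenever the speaker says a cue phrase. The cue phrase "next idea" and both function names are assumptions of mine for illustration. A real workflow might instead pass the transcript to a language model for summarizing.

```python
import re


def transcript_to_outline(transcript: str, cue: str = "next idea") -> list[str]:
    """Split a transcript into ideas wherever the speaker says the cue phrase."""
    parts = re.split(re.escape(cue), transcript, flags=re.IGNORECASE)
    ideas = []
    for part in parts:
        # Drop the whitespace and punctuation left around each piece.
        cleaned = part.strip(" \t\n.,;:!?")
        if cleaned:
            ideas.append(cleaned[0].upper() + cleaned[1:])
    return ideas


def as_numbered_list(ideas: list[str]) -> str:
    """Format the ideas as a numbered list, one per line."""
    return "\n".join(f"{i}. {idea}" for i, idea in enumerate(ideas, start=1))


spoken = (
    "Record a weekly recap video. Next idea, add captions to old uploads. "
    "next idea: voice notes app"
)
print(as_numbered_list(transcript_to_outline(spoken)))
```

Saying a cue phrase out loud feels a little odd at first, but it gives the script an unambiguous place to cut.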
2. Developing Interactive Websites
Imagine creating websites where users can control features, navigate content, and interact with elements entirely through their voice. Speech-to-text lets users perform actions like searching, navigating, and entering data just by speaking.
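Once the speech has been transcribed, the website still has to decide what the user meant. Here is a minimal sketch of that step, written as server-side Python. The trigger phrases, the action names, and the parse_voice_command function are all hypothetical. A production site would need a far more forgiving matcher.

```python
from typing import NamedTuple, Optional


class VoiceCommand(NamedTuple):
    action: str    # "search", "navigate", or "fill"
    argument: str  # whatever the user said after the trigger phrase


# Hypothetical trigger phrases for each supported action.
TRIGGERS = {
    "search": ("search for", "look up"),
    "navigate": ("go to", "open"),
    "fill": ("type", "enter"),
}


def parse_voice_command(transcript: str) -> Optional[VoiceCommand]:
    """Map a transcribed sentence to an action, or None if nothing matches."""
    text = transcript.strip().rstrip(".!?")
    lowered = text.lower()
    for action, phrases in TRIGGERS.items():
        for phrase in phrases:
            # Require a space after the phrase so "opening" does not match "open".
            if lowered == phrase or lowered.startswith(phrase + " "):
                return VoiceCommand(action, text[len(phrase):].strip())
    return None


for spoken in ("Search for whisper models.", "Go to the pricing page", "Hello there"):
    print(spoken, "->", parse_voice_command(spoken))
```

Returning None for unrecognized speech lets the site ask the user to repeat the request instead of guessing.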
Download Code
Substack members can download the code from this video as well as the other AI tools I’ve released to date: BERT text classification utilities and neural network training/prediction scripts.
Also, if you sign up for a 1-year plan of my Substack, you can opt to get my Etymology Super Pack for free: over 90,000 pages of classic etymology reference books and old dictionaries that trace the history of the words that have become modern English. Read more here: https://createbetter.tv/etymology-book-pack/ (just email me once you join my Substack on an annual plan and I’ll send you the link to download!)