Web Speech API with React

The web was designed to be user friendly and usable by everyone. Two features that make it more accessible, particularly for visually impaired users, are speech recognition and speech synthesis.

The JavaScript Web Speech API provides both speech recognition and speech synthesis. Speech recognition is the browser’s ability to recognize speech through a device’s microphone, and speech synthesis is the browser’s ability to turn text into speech/audio.

I created a basic example of speech recognition based on the Web Speech API. Below is a tutorial for my example.

The first step is to check if the user’s browser supports the Web Speech API.

let SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition
let SpeechGrammarList = window.SpeechGrammarList || window.webkitSpeechGrammarList
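If neither the standard nor the prefixed constructor exists, the app should show a fallback message instead of crashing. One way to keep that check testable is to wrap it in a small helper that takes a window-like object; `getSpeechRecognition` is a name I made up for this sketch:

```javascript
// Resolve the SpeechRecognition constructor from a window-like object,
// preferring the unprefixed name; returns null when unsupported.
const getSpeechRecognition = (win) =>
  win.SpeechRecognition || win.webkitSpeechRecognition || null

// In the browser you would call getSpeechRecognition(window) and render
// a "your browser does not support speech recognition" message on null.
```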

Now we set up the words we want our app to recognize. The grammar is written in JSGF (JSpeech Grammar Format) notation.

let moods = ['happy', 'sad', 'sleepy', 'angry']
let grammar = '#JSGF V1.0; grammar moods; public <moods> = ' + moods.join(' | ') + ';';
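For reference, the join above expands to a single JSGF string, as this quick sketch (runnable anywhere JavaScript runs) shows:

```javascript
const moods = ['happy', 'sad', 'sleepy', 'angry']
// Join the words with " | " (JSGF's alternatives separator) into one rule.
const grammar = '#JSGF V1.0; grammar moods; public <moods> = ' + moods.join(' | ') + ';'

console.log(grammar)
// → #JSGF V1.0; grammar moods; public <moods> = happy | sad | sleepy | angry;
```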

Below we create a speech recognition object along with a grammar list and connect the two.

let recognition = new SpeechRecognition()
let recognitionList = new SpeechGrammarList()
recognitionList.addFromString(grammar, 1)
recognition.grammars = recognitionList

Next, we need to set some properties on our speech recognition object. The first line sets the language to English using a BCP 47 language tag. The second line tells the recognizer to stop listening after a single utterance, and the third sets interimResults to false, so the user only gets their result once they are done speaking. maxAlternatives controls how many alternative transcriptions are returned when the result is unclear; for this demo we only need one, so we set it to 1.

recognition.lang = 'en-US'
recognition.continuous = false;
recognition.interimResults = false;
recognition.maxAlternatives = 1;

To start the speech recognition, all you have to do is call

recognition.start()

For our purposes, we want the user to click a button that says "Press to Talk". So we will add an onClick handler to the button that runs the renderSpeech method when clicked, and inside renderSpeech we call recognition.start().

<button onClick={this.renderSpeech}>Press to Talk</button>

renderSpeech = () => {
  recognition.start()
}

You can use many event handlers to handle the speech results, but the most common one is onresult, as used below. event.results returns a SpeechRecognitionResultList object, which contains our results. Since we set maxAlternatives to 1, we only need the first result, and we can access it like we would an array. Its transcript property gives us the result as a string.

renderSpeech = () => {
  recognition.start()
  recognition.onresult = (event) => {
    // handle result in here
    let word = event.results[0][0].transcript
  }
}

Below, I added some logic to display an emoji that matches the mood the user spoke.

renderSpeech = () => {
  recognition.start()
  recognition.onresult = (event) => {
    let word = event.results[0][0].transcript
    switch(word) {
      case "happy":
        this.setState({
          emoji: "😄",
          wordSpoken: word
        });
        break;
      case "sleepy":
        this.setState({
          emoji: "😴",
          wordSpoken: word
        });
        break;
      case "sad":
        this.setState({
          emoji: "😢",
          wordSpoken: word
        });
        break;
      case "angry":
        this.setState({
          emoji: "😡",
          wordSpoken: word
        });
        break;
      default:
        this.setState({
          emoji: "🧐",
          wordSpoken: `Sorry, I do not recognize the mood ${word}.`
        });
    }
  }
}
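The switch above could also be written as a lookup object, which keeps the mood-to-emoji mapping in one place. This is just an alternative sketch; `moodEmojis` and `moodState` are names I made up for it:

```javascript
// Map each recognized mood word to its emoji.
const moodEmojis = {
  happy: '😄',
  sleepy: '😴',
  sad: '😢',
  angry: '😡'
}

// Build the state update for a spoken word; unknown words get the fallback.
const moodState = (word) =>
  moodEmojis[word]
    ? { emoji: moodEmojis[word], wordSpoken: word }
    : { emoji: '🧐', wordSpoken: `Sorry, I do not recognize the mood ${word}.` }
```

Inside onresult you would then call this.setState(moodState(word)), and adding a new mood only requires one new line in moodEmojis.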

Below is the final product! I also recommend reading the Mozilla Web Speech API docs; they are thorough and easy to follow. Thank you, and happy coding.
