We’re all familiar with the voice commands for smart devices, such as “Alexa,” “Hey Siri,” or “OK, Google.” But how does this actually work? The technique is called keyword spotting, a form of audio classification. It is a machine learning approach that recognizes audible events, particularly speech, even in the presence of background noise or chatter.

Keyword Spotting

Let’s learn how to build a keyword spotting model with Edge Impulse. We’ll collect audio data from microphones, use signal processing to extract the most important information, and train a deep neural network that can tell you whether your keyword was heard in a given clip of audio. Finally, we’ll deploy the system to an embedded device and evaluate how well it works. At the end of this tutorial, you’ll have a firm understanding of how to classify audio using Edge Impulse.
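To make the signal processing step more concrete, here is a minimal sketch of what "extracting the most important information" can look like: the clip is cut into short overlapping frames, and each frame is reduced to a single number. This is not Edge Impulse's processing block. The function names, the 16 kHz sample rate, the 25 ms frame length, and the 10 ms hop are all assumptions chosen for illustration, and log energy stands in for the richer spectral features a real keyword spotting pipeline would typically use.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a list of audio samples into overlapping frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def log_energy(frame):
    """Log of the mean squared amplitude; the epsilon avoids log(0) on silence."""
    mean_sq = sum(x * x for x in frame) / len(frame)
    return math.log(mean_sq + 1e-10)

def extract_features(samples, frame_len=400, hop=160):
    """Return one log-energy value per frame (hypothetical stand-in feature)."""
    return [log_energy(f) for f in frame_signal(samples, frame_len, hop)]

# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
clip = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

# The feature sequence is what a classifier would see instead of raw audio.
features = extract_features(clip)
```

The point of the sketch is the shape of the data: a one-second clip of 16,000 raw samples becomes a much shorter sequence of per-frame features, which is a far easier input for a neural network to learn from.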

There is also a video version of this tutorial. You can view the finished project, including all data, signal processing, and machine lear...