What if you could talk to a machine?
In many households, voice assistants such as Amazon's "Alexa" – alongside Apple's Siri probably the best-known representative of keyword spotting (KWS) solutions – have revolutionised everyday life. After a so-called wake word ("Alexa!", "Hey Siri") activates the application, the device transmits the user's request to a remote server, which analyses the voice command and sends the result back to the device.
In this way, music requests are fulfilled, a film is added to the watch list, or the Christmas present for a loved one is ordered. Similar technology is also found in industry and medical technology, where KWS is becoming increasingly important. Capturing and using speech can already accelerate processes, make them more efficient and even save lives.
"The advantages are obvious," Viacheslav Gromov, Managing Director of AI provider AITAD, goes into detail. "If the employee controls the machine in production with their voice, that means more flexibility. They can operate the equipment remotely and no longer have to press buttons or type their input on the machine screen. It also means avoiding germs and bacteria, an important aspect in medicine. Even a shouted 'stop' command is quicker than running to the machine and stopping it."
Speech recognition in industry and medical technology
Transmitting voice signals to remote servers involves latency that is unacceptable in safety-relevant environments. The risk of manipulation and the possible failure of a network connection also play a role. Use in safety-critical environments in industry and medicine therefore requires solutions that work locally and in real time. This is where innovative voice control models implemented with embedded AI come into play. They recognise not only individual wake words but up to 30 predefined terms, which enables complex commands: a keyword activates the system, and complex combinations from this predefined group of words can then be spoken and evaluated by the AI, which sits on the same small board as the microphones ("embedded AI system component").
Examples of such voice commands (here with "robot" as the wake word):
- "Robot, start programme A on machine 3"
- "Robot, stop conveyor belt 6"
- "Robot, motor 4 in machine 3, increase the speed by 40%"
- "Robot, wheelchair, turn left now"
- "Robot, emergency off"
"The decisive factor here is a maximum level of security, which is guaranteed by the local processing of voice data – without a cloud or server. As well as efficiency, as the commands are evaluated directly in the chip in real time and the machine reacts faster as a result," Gromov continues.
Freely configurable, real-time capable and robust
Companies can configure their voice control individually, from the wake words to the group of command words. This enables customised systems that are also available in several languages. Synthetically generated security queries can be integrated if required.
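As a rough illustration of what such a customer-specific configuration might look like on the device, here is a minimal sketch. The structure, field names and word list are hypothetical assumptions for this example, not part of any real AITAD interface.

```c
/* Hypothetical illustration of a customer-specific KWS configuration.
 * Structure and field names are assumptions for this sketch only. */
#include <stddef.h>

#define KWS_MAX_VOCAB 30  /* the article mentions up to 30 predefined terms */

typedef struct {
    const char *wake_word;                 /* e.g. "robot" */
    const char *vocabulary[KWS_MAX_VOCAB]; /* predefined command words */
    size_t      vocab_size;
    const char *language;                  /* e.g. "en", "de" */
    int         require_confirmation;      /* optional synthetic security query */
} kws_config_t;

/* Example configuration matching the commands listed above. */
static const kws_config_t factory_config = {
    .wake_word  = "robot",
    .vocabulary = { "start", "stop", "programme", "machine", "conveyor",
                    "belt", "motor", "speed", "increase", "emergency",
                    "off", "left", "right", "turn", "wheelchair" },
    .vocab_size = 15,
    .language   = "en",
    .require_confirmation = 1,
};
```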
The possible applications for voice controls are virtually unlimited:
- In the operating room, the surgeon can start, adjust, and switch off the high-frequency scalpel by voice command
- Contactless control of machines and devices in production and hospitals
- An emergency stop by voice command saves the run to the emergency stop button
- People with assistance needs can control their wheelchair by voice
Adaptable in harsh environments
Innovative self-sufficient embedded AI voice control solutions not only work completely locally and in real time, but also prove themselves in harsh environments. The AI is trained not only with the words to be recognised but also with background noise, to ensure reliable speech recognition even when the environment is loud. Integrating an additional microphone makes it possible to determine the speaker's position and suppress disturbing noise (beamforming).
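For readers curious what the two-microphone beamforming idea looks like in code, the following is a textbook delay-and-sum sketch: the second microphone's signal is shifted by the estimated inter-microphone delay and averaged with the first, which reinforces sound arriving from the speaker's direction and averages out uncorrelated noise from elsewhere. It is a simplification for illustration, not the processing actually used in the product.

```c
/* Minimal delay-and-sum beamforming sketch for two microphones.
 * delay_samples is the steering delay (in samples) towards the speaker,
 * which a real system would estimate, e.g. via cross-correlation.
 * Textbook simplification, not a product implementation. */
#include <stddef.h>

void delay_and_sum(const float *mic1, const float *mic2,
                   float *out, size_t n, int delay_samples)
{
    for (size_t i = 0; i < n; ++i) {
        /* Read mic2 shifted by the steering delay; pad with zero at the edges. */
        long j = (long)i - delay_samples;
        float m2 = (j >= 0 && (size_t)j < n) ? mic2[j] : 0.0f;

        /* Averaging the aligned signals boosts the speaker's voice and
         * attenuates off-axis background noise. */
        out[i] = 0.5f * (mic1[i] + m2);
    }
}
```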
"This new pioneering voice control system will permanently change and determine the future of industry and medicine," Gromov is certain. "Requests from industry and medicine are increasing. We have developed a solution ourselves that enables companies to equip their products with this technology today and thus take on a pioneering role in industry and business. This is characterised by a high level of robustness, even against interference noise, and is individually tailored to customer requirements. As we have access to pre-development, we can offer our local voice control with around 30 words to be recognised at a reasonable price."