Understanding computer vision with Zebra Technologies
Since acquiring Cortexica in 2019, Zebra Technologies has been hard at work getting the most from computer vision in a deliverable package.
To learn more about computer vision as well as Zebra’s developments in the technology, Electronic Specifier spoke with Stuart Hubbard, Senior Director of AI and Advanced Development at Zebra.
Zebra’s computer vision technology
Zebra’s computer vision technology was developed through the acquisition of Cortexica, which had expertise in on-device computer vision. As Hubbard explained: “One of the main reasons we were acquired was because Zebra didn't have the computer vision expertise they wanted at the time, especially for on-device applications.”
Since the acquisition, Cortexica’s initial developments have been able to flourish. Zebra’s approach to computer vision involves constantly innovating and evolving the underlying models, applying techniques such as pruning, distillation, and quantisation to balance accuracy, size, and efficiency. Hubbard expands: “We are constantly looking to innovate on our models, looking at different techniques and methods.
“We are always asking ourselves: how do we take custom layers from the model networks? How do we look at the outputs and decide which layer to take from? How can we compress them? There are different techniques such as pruning, distillation, and quantisation, but how can we make them more efficient to run on-device?”
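To make one of those named techniques concrete, the snippet below is a minimal, illustrative sketch of post-training dynamic quantisation in PyTorch: the linear layers of a small stand-in network are converted to int8 weights so the model is smaller and cheaper to run on a CPU. The model, library choice, and settings are assumptions for illustration; the article does not describe Zebra’s actual compression toolchain.

    import torch
    import torch.nn as nn

    # Stand-in model: a small classifier head of the kind that might sit on
    # top of a vision backbone (hypothetical architecture, for illustration).
    model = nn.Sequential(
        nn.Linear(512, 256),
        nn.ReLU(),
        nn.Linear(256, 100),
    )
    model.eval()

    # Post-training dynamic quantisation: Linear weights become int8, which
    # shrinks the model and speeds up CPU inference on a handheld device.
    quantised = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    with torch.no_grad():
        output = quantised(torch.randn(1, 512))
    print(output.shape)  # torch.Size([1, 100])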
Real-world data is an integral part of this development cycle. Zebra’s computer vision technology is used in a range of real-world applications, such as product localisation in retail, warehouse inventory counting, and other use cases that require precise localisation and recognition of products, shelves, labels, and text. Training the computer vision models on this kind of real data gives them an essential base.
As Hubbard explains: “It is important that we can initially train in the environment that the product is going to physically be in. But at the same time, we look at how we can evolve that model to be more generalised across different environments.”
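As a rough illustration of that train-then-generalise workflow, the sketch below fine-tunes a generic pretrained classifier on labelled images from the target environment and then measures accuracy on images from other environments. The datasets, model choice, and hyperparameters are placeholders, not Zebra’s pipeline.

    import torch
    import torch.nn as nn
    import torchvision
    from torch.utils.data import DataLoader

    def fine_tune_and_check(env_train, other_envs_val, num_classes, epochs=3):
        # Generic ImageNet-pretrained backbone as a stand-in starting point.
        model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
        model.fc = nn.Linear(model.fc.in_features, num_classes)
        optimiser = torch.optim.Adam(model.parameters(), lr=1e-4)
        loss_fn = nn.CrossEntropyLoss()

        # Train on images captured in the deployment environment.
        model.train()
        for _ in range(epochs):
            for images, labels in DataLoader(env_train, batch_size=32, shuffle=True):
                optimiser.zero_grad()
                loss_fn(model(images), labels).backward()
                optimiser.step()

        # Check generalisation on environments the model never saw in training.
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for images, labels in DataLoader(other_envs_val, batch_size=32):
                correct += (model(images).argmax(dim=1) == labels).sum().item()
                total += labels.numel()
        return model, correct / max(total, 1)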
Key factors behind Zebra’s computer vision
For Zebra, capabilities like product recognition are crucial to the types of applications its technology is used for. Achieving this successfully is not as simple as a quick product match. Rather, localising and recognising individual products, shelves, labels, and text within an image is a complex but necessary process. As Hubbard puts it: “Localising all these parts so that you can then focus on a specific area of interest is what's really important. If I'm a frontline worker using a handheld device’s camera to detect out-of-stock items and I'm looking at the whole shelf, then I need to focus in on a specific product. If I was using the whole shelf image then I wouldn’t be able to recognise anything because it's too broad; therefore we need to split the image into individual products by localising and then recognising each individual product.”
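As a concrete illustration of that localise-then-recognise step, the sketch below runs a generic pretrained detector over a shelf image and crops each detected item for downstream recognition. Zebra’s own on-device detection models are not public, so the torchvision Faster R-CNN and the score threshold used here are stand-ins.

    import torch
    import torchvision
    from torchvision.transforms.functional import to_tensor
    from PIL import Image

    # Generic pretrained detector as a placeholder for an on-device model.
    detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    detector.eval()

    def localise_products(image_path, score_threshold=0.6):
        image = Image.open(image_path).convert("RGB")
        with torch.no_grad():
            outputs = detector([to_tensor(image)])[0]
        crops = []
        for box, score in zip(outputs["boxes"], outputs["scores"]):
            if score >= score_threshold:
                x1, y1, x2, y2 = [int(v) for v in box.tolist()]
                crops.append(image.crop((x1, y1, x2, y2)))  # one crop per detected item
        return crops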
The approach involves cropping the image to focus on specific areas of interest and representing the image data numerically, which allows for faster matching and differentiation between very similar-looking items. This process is then followed up with additional measures to ensure accuracy, such as reading text attributes or barcodes.
“This [method] is actually a bit like matching a fingerprint, in the sense you are matching one pattern with another. After the match, we give it a confidence score against the product we think it is.”
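A minimal sketch of that fingerprint-style matching might look like the following: each crop’s embedding is compared against a catalogue of reference embeddings using cosine similarity, and the best match is returned with a confidence score. The embedding source and catalogue are assumed; this is not Zebra’s implementation.

    import numpy as np

    def match_product(crop_embedding, catalogue):
        """catalogue maps product names to reference embedding vectors."""
        best_name, best_score = None, -1.0
        for name, reference in catalogue.items():
            # Cosine similarity plays the role of the fingerprint match.
            score = float(
                np.dot(crop_embedding, reference)
                / (np.linalg.norm(crop_embedding) * np.linalg.norm(reference))
            )
            if score > best_score:
                best_name, best_score = name, score
        # e.g. ("cereal_500g", 0.93): the product we think it is, plus confidence.
        return best_name, best_score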
Deep learning for computer vision
Zebra focuses heavily on the deep learning aspects of its computer vision, enabled by a strong team of researchers and engineers behind the scenes, an end-to-end approach, and strategic partnerships within the field.
“We’ve got key strategic partnerships with companies like Qualcomm,” says Hubbard. “Using Qualcomm chips in our devices and working closely with them means that we can optimise our models for each of the processors within our devices, such as the NPU, GPU, and CPU, all on a single device.”
Zebra’s research focuses heavily on industry-specific areas where its deep learning models can be incrementally improved, including by exploring custom layers and optimising model outputs. At its core, this goes back to the previously mentioned pruning, distillation, and quantisation methodology. By following this process, Zebra is able to take a large cloud-based model and ensure that it runs effectively and efficiently on a smaller-scale device. “Pulling all this together is what makes us at Zebra unique: we’ve got the whole stack ready from the hardware all the way to the software side, including algorithms,” enthuses Hubbard.
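One hedged way to picture the cloud-to-device compression Hubbard describes is knowledge distillation, where a small “student” network is trained to mimic a large “teacher” model’s outputs. The sketch below shows a single distillation training step; the models, data, and hyperparameters are assumed, and the article does not state that this is Zebra’s exact method.

    import torch
    import torch.nn.functional as F

    def distillation_step(student, teacher, images, labels, optimizer,
                          temperature=4.0, alpha=0.7):
        teacher.eval()
        with torch.no_grad():
            teacher_logits = teacher(images)
        student_logits = student(images)

        # Soft targets: the student matches the teacher's temperature-scaled
        # output distribution.
        soft_loss = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=1),
            F.softmax(teacher_logits / temperature, dim=1),
            reduction="batchmean",
        ) * (temperature ** 2)

        # Hard targets: the student still learns from the ground-truth labels.
        hard_loss = F.cross_entropy(student_logits, labels)

        loss = alpha * soft_loss + (1 - alpha) * hard_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()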
What’s next at Zebra?
Zebra’s vision for the future centres on integrating computer vision with other AI technologies, such as Voice AI, further machine learning algorithms, and Generative AI, to create a more complete, context-aware solution that is readily deployable.
The focus is on using sensors and contextual information, such as GPS or accelerometers, along with various AI technologies to provide more relevant and proactive information to users.
Zebra also sees strong potential in using Generative AI to create new use cases and improve the human interface aspect of its products. The goal here is to make the technology more natural and intuitive to use for as many people as possible, no matter their background, language, or level of understanding.
Hubbard also mentioned how the company is working to commercialise and enable its partners to build real-world AI applications quickly, moving past the phase of academic research and innovation for its own sake.
As Hubbard puts it: “What we really want is for [our partners] to be able to say: ‘Right, I’ve got this idea, and I’ve used the Zebra AI tool set to make a proof of concept.’ Then we can work with customers to get the application into production within weeks.
“We want to enable people to get to the return on investment stage as quickly as possible. I like to think of it as a recipe: I can pick and choose what I need to be able to do what I want to achieve,” he concludes.