Vision Language Models for the Edge: Cascading Models for Better Reliability
Vision Language Models (VLMs) are generative AI models that take images and text prompts as input. Some of the latest VLMs can also run on low-cost edge hardware, such as the RUBIK Pi 3. This platform has multiple accelerators, so it can run a VLM and an object detection model at the same time. This enables a technique called model cascading, which improves reliability and performance for complex edge AI use cases.
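As a rough illustration of the cascading idea, here is a minimal Python sketch. It assumes one common form of cascade, in which a lightweight object detector looks at every frame and the VLM is only asked about frames where the detector finds something relevant. The names `Detection`, `cascade`, `fake_detector`, and `fake_vlm` are hypothetical, as are the `"person"` label and the 0.5 threshold; the stand-in models are plain functions, not a real detector or VLM API, and the sketch runs the two stages one after the other rather than on separate accelerators.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Detection:
    """One object reported by the (hypothetical) detector."""
    label: str
    confidence: float


def cascade(
    frame: object,
    detect: Callable[[object], List[Detection]],
    describe: Callable[[object, str], str],
    target_label: str = "person",      # assumed class of interest
    min_confidence: float = 0.5,       # assumed threshold
    prompt: str = "Describe what the person is doing.",
) -> Optional[str]:
    """Run the cheap detector first; call the VLM only if it fires."""
    detections = detect(frame)
    hit = any(
        d.label == target_label and d.confidence >= min_confidence
        for d in detections
    )
    if not hit:
        return None  # the VLM is skipped for this frame
    return describe(frame, prompt)


# Stand-in models so the sketch runs without hardware or model files.
def fake_detector(frame: object) -> List[Detection]:
    if frame == "frame_with_person":
        return [Detection("person", 0.9)]
    return []


def fake_vlm(frame: object, prompt: str) -> str:
    return f"VLM answer for {frame!r}"


if __name__ == "__main__":
    for frame in ["empty_frame", "frame_with_person"]:
        print(frame, "->", cascade(frame, fake_detector, fake_vlm))
```

In this arrangement the detector acts as a gate: the VLM's answer is only trusted, and only computed, when a simpler model has already confirmed that the object of interest is present.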
In the last year, we’ve seen a convergence of two technologies that are enabling brand-new ways to build edge AI applications. The first is edge hardware performance. Low-cost single-board computers are now available with powerful hardware acceleration in the form of GPUs (Graphics Processing Units) for general tasks, and NPUs (Neural Processing Units) for running neural networks. A key example of this is the Thundercomm RUBIK Pi 3 dev...
