Vision Language Models (VLMs) are generative AI models that take images and text prompts as input. Some of the latest VLMs can also run on low-cost edge hardware, such as the RUBIK Pi 3. This platform has multiple accelerators, so it can run a VLM and an object detection model at the same time. This enables a technique called model cascading, which can improve reliability and performance for complex edge AI use cases.

Figure 1: A RUBIK Pi 3 dev kit with powerful hardware acceleration in the form of GPUs and NPUs.
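
To make the idea of model cascading more concrete, here is a minimal Python sketch of one common form of it: a lightweight object detector runs on every frame, and the more expensive VLM is only invoked when the detector finds something relevant. This is an illustration under stated assumptions, not necessarily the exact pipeline used on the RUBIK Pi 3. The names `Detection`, `cascade`, `fake_detect`, and `fake_describe` are hypothetical, and the two stub functions stand in for real models.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Detection:
    """One object found by the first-stage detector (hypothetical type)."""
    label: str
    confidence: float


def cascade(
    frame: object,
    detect: Callable[[object], List[Detection]],
    describe: Callable[[object, str], str],
    trigger_label: str = "person",
    min_confidence: float = 0.5,
    prompt: str = "Describe what the person is doing.",
) -> Optional[str]:
    """Run the cheap detector first; call the VLM only when it reports a hit."""
    hits = [
        d for d in detect(frame)
        if d.label == trigger_label and d.confidence >= min_confidence
    ]
    if not hits:
        # Stage 1 found nothing relevant, so the VLM is skipped for this frame.
        return None
    # Stage 2: the VLM sees the frame together with a text prompt.
    return describe(frame, prompt)


# Stub models for illustration only; real code would call an object
# detection model and a VLM here (assumption, not a real API).
def fake_detect(frame: object) -> List[Detection]:
    if frame == "frame_with_person":
        return [Detection("person", 0.9)]
    return []


def fake_describe(frame: object, prompt: str) -> str:
    return f"VLM answer for {frame!r}"


if __name__ == "__main__":
    for frame in ["empty_frame", "frame_with_person"]:
        print(frame, "->", cascade(frame, fake_detect, fake_describe))
```

In this sketch the detector acts as a gate, so the VLM runs only on frames that are likely to matter. The label, the confidence threshold, and the prompt are placeholder values that a real application would tune.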

Over the last year, we’ve seen a convergence of two technologies that enables brand-new ways to build edge AI applications. The first is edge hardware performance. Low-cost single-board computers are now available with powerful hardware acceleration in the form of GPUs (Graphics Processing Units) for general tasks, and NPUs (Neural Processing Units) for running neural networks. A key example of this is the Thundercomm RUBIK Pi 3 dev...