Running Phi-3-vision via ONNX on Jetson Platform

Jambo - Jul 19 - Dev Community

This article shows how to run the quantized Phi-3-vision model in ONNX format on the Jetson platform and perform inference for image+text dialogue tasks.

What is Jetson?

Jetson is a series of small arm64 devices from NVIDIA with capable onboard GPUs, designed for edge computing and AI applications. Running Linux, a Jetson board can handle complex compute tasks at low power consumption, making it ideal for developing embedded AI and machine-learning projects.

Why ONNX Runtime?

ONNX Runtime is a high-performance inference engine for executing ONNX (Open Neural Network Exchange) models. It provides a simple way to run large language models like Llama, Phi, Gemma, and Mistral via the onnxruntime-genai API.
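To make the input side concrete: Phi-3-vision expects a chat-style prompt in which each image is referenced by a placeholder tag (`<|image_1|>`, `<|image_2|>`, …) inside the user turn. A minimal sketch of building that prompt, assuming the documented Phi-3 chat markers; the helper name is mine, not part of any library:

```python
def build_phi3_vision_prompt(user_text: str, num_images: int = 1) -> str:
    """Build the chat prompt Phi-3-vision expects: one <|image_N|>
    placeholder per attached image, followed by the user text, wrapped
    in the <|user|> / <|assistant|> chat markers."""
    image_tags = "".join(f"<|image_{i}|>\n" for i in range(1, num_images + 1))
    return f"<|user|>\n{image_tags}{user_text}<|end|>\n<|assistant|>\n"

# The prompt used later in this article:
prompt = build_phi3_vision_prompt("Convert this image to markdown format")
print(prompt)
```

In an onnxruntime-genai pipeline, a string like this is passed to the model's multimodal processor together with the loaded image before generation starts.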

Running the Phi-3-vision Model

In this project, I ran the Phi-3-vision model on the Jetson platform using ONNX Runtime. Here’s a sneak peek at the results.

Inference Speed and Resource Utilization

Running the Int4-quantized Phi-3-vision ONNX model on the Jetson Orin Nano produced solid performance numbers: the Python process used 5.4 GB of GPU memory while keeping CPU load minimal and driving the GPU to near-full utilization.
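If you want to reproduce a throughput measurement yourself, decode speed is simply generated tokens divided by wall-clock time. A small illustrative helper (the function name is mine, not from a benchmarking library):

```python
import time

def tokens_per_second(num_tokens: int, elapsed_s: float) -> float:
    """Decode throughput: generated tokens divided by wall-clock seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return num_tokens / elapsed_s

# Typical usage around a generation loop:
#   start = time.perf_counter()
#   ... generate N tokens ...
#   print(tokens_per_second(N, time.perf_counter() - start))
print(tokens_per_second(120, 4.0))  # 30.0 tokens/s
```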

Resource Utilization

Inference was impressively fast, even on a device with a power budget of just 15 W.

Inference Speed

Example Output

Here’s an example of running the Phi-3-vision model on the Jetson platform. Given an image and a prompt, the model successfully converted the image to markdown format:

  • Input Image:

    table.png

  • Prompt:

    Convert this image to markdown format
    
  • Output:

    | Product             | Qtr 1      | Qtr 2     | Grand Total |
    |---------------------|------------|-----------|-------------|
    | Chocolade           | $744.60    | $162.56   | $907.16     |
    | Gummibarchen        | $5,079.60  | $1,249.20 | $6,328.80   |
    | Scottish Longbreads | $1,267.50  | $1,062.50 | $2,330.00   |
    | Sir Rodney's Scones | $1,418.00  | $756.00   | $2,174.00   |
    | Tarte au sucre      | $4,728.00  | $4,547.92 | $9,275.92   |
    | Chocolate Biscuits  | $943.89    | $349.60   | $1,293.49   |
    | Total               | $14,181.59 | $8,127.78 | $22,309.37  |
    

    The table lists various products along with their sales figures for Qtr 1, Qtr 2, and the Grand Total. The products include Chocolade, Gummibarchen, Scottish Longbreads, Sir Rodney's Scones, Tarte au sucre, and Chocolate Biscuits. The Grand Total column sums up the sales for each product across the two quarters.
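Because the model emits clean pipe-delimited markdown, its output can also be consumed programmatically. Here is a minimal stdlib sketch (a parser written for this article, not part of any library) that turns such a table into a list of row dictionaries:

```python
def parse_markdown_table(md: str) -> list[dict[str, str]]:
    """Parse a pipe-delimited markdown table into row dicts keyed by header."""
    lines = [ln.strip() for ln in md.strip().splitlines() if ln.strip()]
    # Split each row on '|' and drop the empty edge cells.
    rows = [[c.strip() for c in ln.strip("|").split("|")] for ln in lines]
    header = rows[0]
    # rows[1] is the |---|---| separator row, so data starts at rows[2:].
    return [dict(zip(header, r)) for r in rows[2:]]

table = """
| Product   | Qtr 1   | Qtr 2   |
|-----------|---------|---------|
| Chocolade | $744.60 | $162.56 |
"""
print(parse_markdown_table(table)[0]["Qtr 1"])  # $744.60
```

From here the rows can be fed into pandas, CSV export, or any downstream pipeline.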

Conclusion

Running the Phi-3-vision model in ONNX format on the Jetson platform demonstrates the incredible potential of combining powerful AI models with efficient edge computing devices. The results are impressive, and the resource utilization is optimized for low-power devices.

👉 For a detailed step-by-step guide on setting up and running the Phi-3-vision model on Jetson, including preparation and installation, please visit the complete article here: Running Phi-3-vision via ONNX on Jetson Platform

If you're interested in AI and edge computing, don't miss out on this comprehensive tutorial!
