VSaaS.ai — AI Video Analytics Platform
AI Compatibility

AI Models on GPU, CPU & NPU

VSaaS.ai develops detection models compiled in C language for maximum computational performance. Our intelligent orchestration technology distributes each detection to the optimal processor — GPU, CPU, or NPU — reducing compute costs by up to 80%.

5+ Manufacturers
C/C++ Native Models
3 Processor Types: GPU / CPU / NPU
80% Compute Savings
Native Performance

Detection Models in C Language

Unlike most platforms that run models in Python with interpreted frameworks, VSaaS.ai compiles its detection models directly into native C code. This eliminates interpreter overhead and enables machine-speed inference, making the most of every processor clock cycle.

Native Compilation
Models are compiled to machine code optimized for each processor architecture (x86, ARM, CUDA, NPU).
Zero Runtime Overhead
We eliminate the Python/PyTorch layer in production. The model runs directly on the hardware with no intermediaries.
Architecture-Specific Optimization
Each model is compiled with specific instructions: AVX-512 for CPU, PTX for NVIDIA, native instructions for NPU.
Lower Memory Consumption
C-compiled binaries use up to 10x less RAM than their Python equivalents, allowing more models per server.
Traditional Approach — Python
# Typical Python inference
import torch
model = torch.load("yolov8.pt")
model.eval()

# Overhead: interpreter + GIL + framework
with torch.no_grad():
    results = model(frame)  # ~45ms per frame

# RAM: ~2.1 GB per model
# Latency: 45ms (GPU) / 180ms (CPU)
VSaaS.ai — Native C
// VSaaS inference in compiled C
#include "vsaas_engine.h"

vsaas_model_t* model = vsaas_load(
    "yolov8_vehicle.vsm",
    VSAAS_TARGET_GPU | VSAAS_OPT_TENSORRT
);

// No overhead: direct execution on the hardware
vsaas_result_t* res = vsaas_infer(
    model, frame_ptr, width, height
);  // ~12ms per frame

// RAM: ~210 MB per model
// Latency: 12ms (GPU) / 35ms (CPU)
GPU Latency: 45ms (Python) vs 12ms (VSaaS), 3.7x faster
CPU Latency: 180ms (Python) vs 35ms (VSaaS), 5.1x faster
RAM per Model: 2.1 GB (Python) vs 210 MB (VSaaS), 10x less memory
Compatible Hardware

Compatible AI Accelerators

VSaaS.ai supports multiple processing architectures. Choose the hardware that best fits your budget and requirements.

NVIDIA

GPU
NVIDIA Corporation
Available

Full compatibility with NVIDIA GPUs for datacenter and edge deployments. Native CUDA and TensorRT support for high-performance inference.

Supported Models
Tesla T4, A2, A10, L4, L40, Jetson Orin Nano, Jetson Orin NX, AGX Orin
Type
GPU (CUDA)
Range
Edge to Datacenter
Framework
TensorRT / CUDA
Power
15W - 300W
nvidia.com

Blaize

NPU
Blaize Inc.
Available

Graph Streaming Processor (GSP) designed for efficient edge inference. A unique architecture that processes neural network graphs natively.

Supported Models
Pathfinder P1600, Xplorer X1600
Type
GSP (Graph Streaming)
Range
Edge
Framework
Blaize AI Studio
Power
7W - 15W
blaize.com

Axelera

NPU
Axelera AI
Available

Ultra energy-efficient AI Processing Unit (AIPU). In-memory computing technology delivers maximum efficiency per watt.

Supported Models
Metis AX2185, Metis M.2
Type
AIPU (In-Memory)
Range
Edge
Framework
Voyager SDK
Power
5W - 12W
axelera.ai

Hailo

NPU
Hailo Technologies
In Development

Edge AI processors built on a dataflow architecture. Hailo-8 is marketed as the world's most cost-efficient AI accelerator. Hailo-15 integrates an ISP and AI engine in a single SoC for cameras.

Supported Models
Hailo-8, Hailo-8L, Hailo-10H, Hailo-15L, Hailo-15H
Type
Dataflow Processor
Range
Edge / Cameras
Framework
Hailo Software Suite
Power
2.5W - 8W
hailo.ai

DeepX

NPU
DEEPX Co.
In Development

Edge AI semiconductors that outperform 40W GPUs while consuming only 5W. Intelligent IQ8 quantization combines FP32-level accuracy with INT8 efficiency.

Supported Models
DX-M1, DX-M1M, DX-H1 Quattro, DX-M2
Type
NPU (Neural Engine)
Range
Edge / IoT
Framework
DXNN SDK
Power
5W - 15W
deepx.ai

CPU x86/ARM

CPU
Intel / AMD / ARM
Available

Native support for CPU inference using OpenVINO (Intel) and SIMD optimizations. Ideal for low-complexity detections without dedicated hardware.

Supported Models
Intel Core i5/i7/i9, Intel Xeon, AMD Ryzen, ARM Cortex-A
Type
CPU (x86 / ARM)
Range
Universal
Framework
OpenVINO / ONNX
Power
15W - 125W
Intelligent Orchestration

Every Detection to the Optimal Processor

The VSaaS.ai orchestrator analyzes each detection task and assigns it to the most cost-effective and performant processor. A single camera can generate multiple detections that are simultaneously distributed among different servers and cards.

Example: Vehicle Detection

A camera detects a car → the orchestrator distributes 5 tasks across 5 different servers

IP Camera
1080p RTSP Stream
VSaaS Orchestrator
Intelligent task assignment
1
Detect Vehicle
YOLOv8-Vehicle (C)
GPU
NVIDIA T4
Edge Server A
12ms
2
Detect Plate (LPR)
LPR-Net v3 (C)
NPU
Blaize P1600
Edge Server B
8ms
3
Plate OCR
OCR-Plate v2 (C)
GPU
Cloud GPU
Cloud Server W
15ms
4
Detect Vehicle Color
Color-Class v1 (C)
CPU
Intel Xeon
CPU Server H
5ms
5
Make & Model
Vehicle-Attr v2 (C)
NPU
Axelera Metis
Edge Server C
10ms
Consolidated Result
Vehicle: White Toyota Corolla | Plate: ABC-1234 | Confidence: 97%
Total: 50ms
5 parallel detections

Competitive Cost

Simple detections (color, classification) run on inexpensive CPUs. Only complex detections use expensive GPUs. Result: up to 80% savings.

Horizontal Scalability

Add more cards of any type on demand. The orchestrator automatically redistributes the load among all available processors.

Resilience & Failover

If a server or card fails, the orchestrator reassigns detections to other available processors without service interruption.

Distributed Processing Architecture

How VSaaS.ai combines multiple processor types to create an efficient and cost-effective detection pipeline.

📹

IP Cameras

ONVIF / RTSP
1080p/4K Stream
Multiple sites
Any brand

Video Ingestion

VSaaS Engine
H.264/H.265 Decoding
Frame extraction
Pre-processing
🧠

AI Orchestrator

Task Scheduler
Complexity analysis
Card assignment
Load balancing
🔧

Processing

GPU / CPU / NPU
Native C models
Parallel inference
Multi-server
📊

Results

Events & Alerts
Consolidation
Notifications
Live dashboard

Not sure which hardware you need?

Our technical team can design the optimal processing architecture for your use case, combining GPUs, CPUs, and NPUs to maximize performance and minimize costs.