VSaaS.ai — AI Video Analytics Platform
AI Compatibility

AI Models on GPU, CPU & NPU

VSaaS.ai develops detection models compiled in C language for maximum computational performance. Our intelligent orchestration technology distributes each detection to the optimal processor — GPU, CPU, or NPU — reducing compute costs by up to 80%.

5+ Manufacturers
C/C++ Native Models
3 Processor Types: GPU / CPU / NPU
80% Compute Savings
Native Performance

Detection Models in C Language

Unlike most platforms that run models in Python with interpreted frameworks, VSaaS.ai compiles its detection models directly into native C code. This eliminates interpreter overhead and enables machine-speed inference, making the most of every processor clock cycle.

Native Compilation
Models are compiled to machine code optimized for each processor architecture (x86, ARM, CUDA, NPU).
Zero Runtime Overhead
We eliminate the Python/PyTorch layer in production. The model runs directly on the hardware with no intermediaries.
Architecture-Specific Optimization
Each model is compiled with specific instructions: AVX-512 for CPU, PTX for NVIDIA, native instructions for NPU.
Lower Memory Consumption
C-compiled binaries use up to 10x less RAM than their Python equivalents, allowing more models per server.
Traditional Approach — Python
# Typical Python inference
import torch
model = torch.load("yolov8.pt")
model.eval()

# Overhead: interpreter + GIL + framework
with torch.no_grad():
    results = model(frame)  # ~45ms per frame

# RAM: ~2.1 GB per model
# Latency: 45ms (GPU) / 180ms (CPU)
VSaaS.ai — Native C
// VSaaS inference in compiled C
#include "vsaas_engine.h"

vsaas_model_t* model = vsaas_load(
    "yolov8_vehicle.vsm",
    VSAAS_TARGET_GPU | VSAAS_OPT_TENSORRT
);

// No overhead: direct execution on the hardware
vsaas_result_t* res = vsaas_infer(
    model, frame_ptr, width, height
);  // ~12ms per frame

// RAM: ~210 MB per model
// Latency: 12ms (GPU) / 35ms (CPU)
GPU Latency: 45ms (Python) vs 12ms (VSaaS), 3.7x faster
CPU Latency: 180ms (Python) vs 35ms (VSaaS), 5.1x faster
RAM per Model: 2.1 GB (Python) vs 210 MB (VSaaS), 10x less memory
Compatible Hardware

Compatible AI Accelerators

VSaaS.ai supports multiple processing architectures. Choose the hardware that best fits your budget and requirements.

NVIDIA

GPU
NVIDIA Corporation
Available

Full compatibility with NVIDIA GPUs for datacenter and edge deployments. Native CUDA and TensorRT support for high-performance inference.

Supported Models
Tesla T4, A2, A10, L4, L40, Jetson Orin Nano, Jetson Orin NX, AGX Orin
Type
GPU (CUDA)
Range
Edge to Datacenter
Framework
TensorRT / CUDA
Power
15W - 300W
nvidia.com

Blaize

NPU
Blaize Inc.
Available

Graph Streaming Processor (GSP) designed for efficient edge inference. A unique architecture that processes neural network graphs natively.

Supported Models
Pathfinder P1600, Xplorer X1600
Type
GSP (Graph Streaming)
Range
Edge
Framework
Blaize AI Studio
Power
7W - 15W
blaize.com

Axelera

NPU
Axelera AI
Available

Ultra energy-efficient AI Processing Unit (AIPU). In-memory computing technology delivers maximum efficiency per watt.

Supported Models
Metis AX2185, Metis M.2
Type
AIPU (In-Memory)
Range
Edge
Framework
Voyager SDK
Power
5W - 12W
axelera.ai

Hailo

NPU
Hailo Technologies
In Development

Edge AI processors built on a dataflow architecture. Hailo-8 is marketed as the world's most cost-efficient AI accelerator. Hailo-15 integrates an ISP and AI engine in a single SoC for cameras.

Supported Models
Hailo-8, Hailo-8L, Hailo-10H, Hailo-15L, Hailo-15H
Type
Dataflow Processor
Range
Edge / Cameras
Framework
Hailo Software Suite
Power
2.5W - 8W
hailo.ai

DeepX

NPU
DEEPX Co.
In Development

Edge AI semiconductors that outperform 40W GPUs while consuming only 5W. Intelligent IQ8 quantization combines FP32-level accuracy with INT8 efficiency.

Supported Models
DX-M1, DX-M1M, DX-H1 Quattro, DX-M2
Type
NPU (Neural Engine)
Range
Edge / IoT
Framework
DXNN SDK
Power
5W - 15W
deepx.ai

CPU x86/ARM

CPU
Intel / AMD / ARM
Available

Native support for CPU inference using OpenVINO (Intel) and SIMD optimizations. Ideal for low-complexity detections without dedicated hardware.

Supported Models
Intel Core i5/i7/i9, Intel Xeon, AMD Ryzen, ARM Cortex-A
Type
CPU (x86 / ARM)
Range
Universal
Framework
OpenVINO / ONNX
Power
15W - 125W
Intelligent Orchestration

Every Detection to the Optimal Processor

The VSaaS.ai orchestrator analyzes each detection task and assigns it to the most cost-effective and performant processor. A single camera can generate multiple detections that are simultaneously distributed among different servers and cards.

Example: Vehicle Detection

A camera detects a car → the orchestrator distributes 5 tasks across 5 different servers

IP Camera
1080p RTSP Stream
VSaaS Orchestrator
Intelligent task assignment
1
Detect Vehicle
YOLOv8-Vehicle (C)
GPU
NVIDIA T4
Edge Server A
12ms
2
Detect Plate (LPR)
LPR-Net v3 (C)
NPU
Blaize P1600
Edge Server B
8ms
3
Plate OCR
OCR-Plate v2 (C)
GPU
Cloud GPU
Cloud Server W
15ms
4
Detect Vehicle Color
Color-Class v1 (C)
CPU
Intel Xeon
CPU Server H
5ms
5
Make & Model
Vehicle-Attr v2 (C)
NPU
Axelera Metis
Edge Server C
10ms
Consolidated Result
Vehicle: White Toyota Corolla | Plate: ABC-1234 | Confidence: 97%
Total: 50ms
5 parallel detections

Competitive Cost

Simple detections (color, classification) run on inexpensive CPUs. Only complex detections use expensive GPUs. Result: up to 80% savings.

Horizontal Scalability

Add more cards of any type on demand. The orchestrator automatically redistributes the load among all available processors.

Resilience & Failover

If a server or card fails, the orchestrator reassigns detections to other available processors without service interruption.

Distributed Processing Architecture

How VSaaS.ai combines multiple processor types to create an efficient and cost-effective detection pipeline.

📹

IP Cameras

ONVIF / RTSP
1080p/4K Stream
Multiple sites
Any brand

Video Ingestion

VSaaS Engine
H.264/H.265 Decoding
Frame extraction
Pre-processing
🧠

AI Orchestrator

Task Scheduler
Complexity analysis
Card assignment
Load balancing
🔧

Processing

GPU / CPU / NPU
Native C models
Parallel inference
Multi-server
📊

Results

Events & Alerts
Consolidation
Notifications
Live dashboard

Not sure which hardware you need?

Our technical team can design the optimal processing architecture for your use case, combining GPUs, CPUs, and NPUs to maximize performance and minimize costs.