AI Models on GPU, CPU & NPU
VSaaS.ai develops detection models compiled to native C for maximum computational performance. Our intelligent orchestration technology distributes each detection to the optimal processor — GPU, CPU, or NPU — reducing compute costs by up to 80%.
Detection Models in C Language
Unlike most platforms that run models in Python with interpreted frameworks, VSaaS.ai compiles its detection models directly into native C code. This eliminates interpreter overhead and enables machine-speed inference, making the most of every processor clock cycle.
# Typical Python inference
import torch
model = torch.load("yolov8.pt")
model.eval()
# Overhead: interpreter + GIL + framework
with torch.no_grad():
    results = model(frame)  # ~45 ms per frame
# RAM: ~2.1 GB per model
# Latency: 45 ms (GPU) / 180 ms (CPU)

// VSaaS inference in compiled C
#include "vsaas_engine.h"
vsaas_model_t* model = vsaas_load(
    "yolov8_vehicle.vsm",
    VSAAS_TARGET_GPU | VSAAS_OPT_TENSORRT
);
// No overhead: direct execution on hardware
vsaas_result_t* res = vsaas_infer(
    model, frame_ptr, width, height
);  // ~12 ms per frame
// RAM: ~210 MB per model
// Latency: 12 ms (GPU) / 35 ms (CPU)

Compatible AI Accelerators
VSaaS.ai supports multiple processing architectures. Choose the hardware that best fits your budget and requirements.
NVIDIA (GPU)
Full compatibility with NVIDIA datacenter and edge GPUs. Native CUDA and TensorRT support for high-performance inference.
Blaize (NPU)
Graph Streaming Processor (GSP) designed for efficient edge inference. A unique architecture that processes neural-network graphs natively.
Axelera (NPU)
Ultra energy-efficient AI Processing Unit (AIPU). In-memory computing technology for maximum efficiency per watt.
Hailo (NPU)
Edge AI processors with a dataflow architecture. Hailo-8 is billed as the world's most cost-efficient accelerator; Hailo-15 integrates an ISP and AI engine in a single SoC for cameras.
DeepX (NPU)
Edge AI semiconductors that outperform 40 W GPUs while consuming only 5 W. IQ8 smart quantization delivers FP32-level accuracy with INT8 efficiency.
CPU x86/ARM (CPU)
Native support for CPU inference using OpenVINO (Intel) and SIMD optimizations. Ideal for low-complexity detections without dedicated hardware.
Every Detection to the Optimal Processor
The VSaaS.ai orchestrator analyzes each detection task and assigns it to the most cost-effective and performant processor. A single camera can generate multiple detections that are simultaneously distributed among different servers and cards.
Example: Vehicle Detection
A camera detects a car → the orchestrator distributes 5 tasks to 4 different servers
Competitive Cost
Simple detections (color, classification) run on inexpensive CPUs. Only complex detections use expensive GPUs. Result: up to 80% savings.
Horizontal Scalability
Add more cards of any type on demand. The orchestrator automatically redistributes the load among all available processors.
Resilience & Failover
If a server or card fails, the orchestrator reassigns detections to other available processors without service interruption.
Distributed Processing Architecture
How VSaaS.ai combines multiple processor types to create an efficient and cost-effective detection pipeline.
IP Cameras → Video Ingestion → AI Orchestrator → Processing → Results
Not sure which hardware you need?
Our technical team can design the optimal processing architecture for your use case, combining GPUs, CPUs, and NPUs to maximize performance and minimize costs.
