Software-based acceleration on Nvidia Jetson and Arm Cortex platforms. Up to a 30% speedup over TensorRT or Arm NN with no change to the model, and up to 2x with model compression. Supports most state-of-the-art CV and NLP models.
Software-based acceleration on Intel and AMD CPUs and Nvidia T4/V100 platforms. Up to a 5x speedup with model compression. Supports most state-of-the-art CV and NLP models.
Custom GPU/CPU kernels
If your deep neural network uses novel operators, chances are they are poorly supported by current inference and training frameworks. Work with us to maximize performance on CPUs and GPUs via custom CUDA/C++ kernels.