Artificial Intelligence

Hugging Face inference-as-a-service capabilities with NVIDIA NIM

29th July 2024
Harry Fowle
0

Hugging Face will now offer developers new inference-as-a-service capabilities which are powered by NVIDIA NIM.

The Hugging Face platform is gaining easy access to NVIDIA-accelerated inference on some of the most popular AI models.

New inference-as-a-service capabilities will enable developers to rapidly deploy leading large language models such as the Llama 3 family and Mistral AI models with optimisation from NVIDIA NIM microservices running on NVIDIA DGX Cloud.

Announced at the SIGGRAPH conference, the service will help developers quickly prototype with open-source AI models hosted on the Hugging Face Hub and deploy them in production. Hugging Face Enterprise Hub users can tap serverless inference for increased flexibility, minimal infrastructure overhead, and optimised performance with NVIDIA NIM.

The inference service complements Train on DGX Cloud, an AI training service already available on Hugging Face.

Developers facing a growing number of open-source models can benefit from a hub where they can easily compare options. These training and inference tools give Hugging Face developers new ways to experiment with, test and deploy cutting-edge models on NVIDIA- accelerated infrastructure. They’re made easily accessible using the “Train” and “Deploy” drop-down menus on Hugging Face model cards, letting users get started with just a few clicks.

Beyond a token gesture — NVIDIA NIM brings big benefits

NVIDIA NIM is a collection of AI microservices — including NVIDIA AI foundation models and open-source community models — optimised for inference using industry-standard application programming interfaces, or APIs.

NIM offers users higher efficiency in processing tokens — the units of data used and generated by a language model. The optimised microservices also improve the efficiency of the underlying NVIDIA DGX Cloud infrastructure, which can increase the speed of critical AI applications.

This means developers see faster, more robust results from an AI model accessed as a NIM compared with other versions of the model. The 70-billion-parameter version of Llama 3, for example, delivers up to 5x higher throughput when accessed as a NIM compared with off-the-shelf deployment on NVIDIA H100 Tensor Core GPU-powered systems.

Near-instant access to DGX Cloud provides accessible AI acceleration

The NVIDIA DGX Cloud platform is purpose-built for generative AI, offering developers easy access to reliable accelerated computing infrastructure that can help them bring production-ready applications to market faster.

The platform provides scalable GPU resources that support every step of AI development, from prototype to production, without requiring developers to make long-term AI infrastructure commitments.

Hugging Face inference-as-a-service on NVIDIA DGX Cloud powered by NIM microservices offers easy access to compute resources that are optimised for AI deployment, enabling users to experiment with the latest AI models in an enterprise-grade environment.

Featured products

Upcoming Events

View all events
Newsletter
Latest global electronics news
© Copyright 2024 Electronic Specifier