With Workers AI now generally available, Cloudflare is the first serverless inference partner integrated on the Hugging Face Hub for deploying models, enabling developers to quickly, easily, and affordably deploy AI globally, without managing infrastructure or paying for unused compute capacity.
Cloudflare has announced that developers can now deploy AI applications on Cloudflare’s global network in one simple click directly from Hugging Face. The integration of Workers AI and Hugging Face makes deploying serverless AI easier and more affordable than ever.
“Workers AI is one of the most affordable and accessible solutions to run inference. And with Hugging Face and Cloudflare both deeply aligned in our efforts to democratize AI in a simple, affordable way, we’re giving developers the freedom and agility to choose a model and scale their AI apps from zero to global in an instant,” said Matthew Prince, CEO and co-founder, Cloudflare.
Cloudflare powers one-click deployment with Hugging Face With Workers AI generally available, developers can now deploy AI models in one click directly from Hugging Face, for the fastest way to access a variety of models and run inference requests on Cloudflare’s global network of GPUs. Developers can choose one of the popular open source models and then simply click “Deploy to Cloudflare Workers AI” to deploy a model instantly. There are 14 curated Hugging Face models now optimized for Cloudflare’s global serverless inference platform, supporting three different task categories including text generation, embeddings, and sentence similarity.
“We are excited to work with Cloudflare to make AI more accessible to developers. Offering the most popular open models with a serverless API, powered by a global fleet of GPUs, is an amazing proposition for the Hugging Face community, and I can’t wait to see what they build with it,” said Julien Chaumond, co-founder and Chief Technology Officer, Hugging Face.
Today, Workers AI is generally available, providing the end-to-end infrastructure needed to scale and deploy AI models efficiently and affordably for the next era of AI applications. Cloudflare now has GPUs deployed across more than 150 cities globally, most recently launching in Cape Town, Durban, Johannesburg, and Lagos for the first locations in Africa, as well as Amman, Buenos Aires, Mexico City, Mumbai, New Delhi, and Seoul, to provide low-latency inference around the world. Workers AI is also expanding to support fine-tuned model weights, enabling organizations to build and deploy more specialized, domain-specific applications.