Pruna AI open sources AI model optimization framework

techcrunch.com

Pruna AI, a European startup, is making its AI model optimization framework freely available to the public starting Thursday. The company specializes in compression algorithms for AI models. Its framework applies a range of efficiency techniques, including caching, pruning, quantization, and distillation, which shrink models while preserving their performance. John Rachwan, co-founder and CTO of Pruna AI, explained that the system also makes it easy to save and load compressed models, and can measure how much quality is lost after compression against the efficiency gained. Rachwan likened the approach to how Hugging Face standardized the use and management of transformer models.

Large AI companies, such as OpenAI, already rely on compression methods; OpenAI, for example, has used distillation to create faster versions of its models, such as GPT-4 Turbo. Pruna AI's framework supports various model types but is currently focused on image and video generation. Notable companies using Pruna AI include Scenario and PhotoRoom.

Alongside the open-source release, Pruna AI offers an enterprise version with more advanced features. A notable upcoming addition is a compression agent that optimizes models automatically: developers simply specify their requirements, and the agent handles the rest. The paid version is priced like renting a GPU, allowing companies to save on model inference costs. The framework has already reduced the Llama model's size significantly without losing much quality. Pruna AI recently raised $6.5 million in seed funding from various investors.
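As a rough illustration of one of the techniques mentioned above, the sketch below shows a minimal symmetric int8 quantization of a weight matrix in NumPy. All names and numbers here are illustrative assumptions, not Pruna AI's actual implementation: the point is simply that storing weights as 8-bit integers plus one scale factor cuts memory four-fold versus float32, at the cost of a small rounding error.

```python
import numpy as np

# Illustrative weight matrix (stand-in for one layer of a model).
rng = np.random.default_rng(0)
weights = rng.standard_normal((256, 256)).astype(np.float32)

# Symmetric quantization: map the float range [-max, max] onto int8 [-127, 127].
scale = np.abs(weights).max() / 127.0
quantized = np.round(weights / scale).astype(np.int8)

# Dequantize to approximate the original weights at inference time.
dequantized = quantized.astype(np.float32) * scale

size_fp32 = weights.nbytes    # 256 * 256 * 4 bytes
size_int8 = quantized.nbytes  # 256 * 256 * 1 byte -> 4x smaller
max_error = np.abs(weights - dequantized).max()

print(f"fp32: {size_fp32} bytes, int8: {size_int8} bytes, "
      f"max rounding error: {max_error:.5f}")
```

The rounding error per weight is bounded by half the scale factor, which is why aggressive quantization trades a measurable (but often acceptable) quality loss for the size reduction, exactly the trade-off a framework like Pruna AI's is meant to evaluate.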




