MIT and NVIDIA launch faster image-generating AI tool
Researchers from MIT and NVIDIA have developed a new AI tool called HART (hybrid autoregressive transformer) that can generate high-quality images much faster than current leading techniques. The new approach is about nine times faster and requires far less computing power than traditional diffusion models.

HART combines two methods to create images. First, an autoregressive model quickly captures the overall structure of the image. Then a smaller diffusion model refines the finer details. This combination makes HART well suited to generating images for training self-driving cars, which need realistic simulations to improve safety.

Diffusion models such as Stable Diffusion and DALL-E create detailed images but are often slow and energy-intensive, because they work by repeatedly predicting and correcting noise in the image. Autoregressive models, by contrast, generate images more quickly by predicting small sections of the image one after another, but they usually produce lower-quality results. The researchers found that by running the autoregressive model first and then applying the diffusion model to fill in the details, they could significantly boost image quality. HART achieves results comparable to much larger diffusion models while using far less computation.

The researchers also noted that HART can be easily integrated with new generative models that combine text and images, and they plan to extend HART's capabilities to video generation and other applications in the future.

This work was supported by several organizations, including the MIT-IBM Watson AI Lab and the U.S. National Science Foundation, and aims to explore new possibilities in AI-generated content.
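To make the two-stage idea concrete, the sketch below shows a toy pipeline in the same spirit: an autoregressive model lays down coarse image tokens one at a time, then a small diffusion-style refiner iteratively denoises the result to add fine detail. The class names, layer sizes, token-to-image decoding, and denoising loop are illustrative assumptions for this example only, not the published HART implementation.

```python
# Toy two-stage "autoregressive + diffusion" generator (illustrative only).
import torch
import torch.nn as nn


class CoarseAutoregressiveModel(nn.Module):
    """Predicts a short sequence of discrete image tokens, one at a time."""

    def __init__(self, vocab_size=256, dim=64, seq_len=16):
        super().__init__()
        self.seq_len = seq_len
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    @torch.no_grad()
    def sample(self, batch=1, start_token=0):
        tokens = torch.full((batch, 1), start_token, dtype=torch.long)
        for _ in range(self.seq_len - 1):
            h, _ = self.rnn(self.embed(tokens))
            logits = self.head(h[:, -1])              # next-token logits
            nxt = torch.multinomial(logits.softmax(-1), 1)
            tokens = torch.cat([tokens, nxt], dim=1)  # grow the sequence
        return tokens


class ResidualDiffusionRefiner(nn.Module):
    """Small denoiser that adds fine detail on top of the coarse image."""

    def __init__(self, channels=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, channels, 3, padding=1),
        )

    @torch.no_grad()
    def refine(self, coarse_img, steps=8):
        x = coarse_img + 0.1 * torch.randn_like(coarse_img)  # perturb the coarse image
        for _ in range(steps):                                # iterative denoising steps
            x = x - 0.1 * self.net(x)
        return x


# Usage: coarse layout from the autoregressive model, fine detail from the refiner.
ar = CoarseAutoregressiveModel()
refiner = ResidualDiffusionRefiner()
tokens = ar.sample(batch=1)                        # (1, 16) discrete tokens
coarse = tokens.float().view(1, 1, 4, 4) / 255.0   # toy "decode" into a tiny 4x4 image
image = refiner.refine(coarse)
print(image.shape)  # torch.Size([1, 1, 4, 4])
```

The design point the sketch illustrates is that the expensive, iterative denoising only has to correct small residual details rather than build the whole image from noise, which is where the reported speed and compute savings come from.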