Advances in Diffusion Models for Image Generation: A Post-2024
1. Introduction
This survey covers theoretical developments, algorithmic improvements, and applications across several domains. Specifically, we examine advances in high-resolution image generation, video generation, and the theoretical understanding of generalization in diffusion models. The objective is to give a comprehensive account of how diffusion models have evolved, examining each layer in detail [42]. We also address open challenges and future research directions, with the aim of stimulating forward-looking theories and methodologies for diffusion models [33]. The survey is timely because rapid progress in diffusion models has produced an exponential growth in the literature [20,46]. Keeping up with the daily influx of new work on diffusion-based tools and applications across the computer graphics, computer vision, and AI communities is a significant challenge [20], and this survey serves as an update on the rapidly evolving field [13].
Existing reviews often concentrate on specific areas such as computer vision or medical imaging [35]. In contrast, this survey addresses a broader audience across multiple fields and provides a post-2024 perspective. While other surveys offer comprehensive overviews of diffusion models and serve as good starting points [20,40], this survey emphasizes the latest advancements and emerging trends, which is what makes an updated review necessary. It also highlights system-level optimizations for diffusion model training, an area that complements the architectural and theoretical advancements typically covered.
The survey is organized to provide a clear and structured overview of the field. We begin by introducing the fundamental principles of diffusion models, followed by a discussion of key algorithmic innovations and theoretical advancements. We then explore the diverse applications of diffusion models in image generation, video generation, and other domains. Finally, we discuss the challenges and future research directions, providing insights into the potential of diffusion models for responsible and scalable use [22].
Throughout this survey, we emphasize key performance metrics such as accuracy, robustness, and scalability when evaluating diffusion models [38]. The Fréchet Inception Distance (FID), for instance, is widely used to assess the quality and diversity of generated images [19]. It compares the feature statistics of generated images with those of real images, and lower values indicate a closer match. We also discuss how these metrics, and the datasets used to train and evaluate diffusion models, have evolved, so that progress and remaining challenges in the field can be judged on a common basis.
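To make the FID computation concrete, the following is a minimal sketch of the Fréchet distance between two Gaussians fitted to feature sets, FID = ||mu_r - mu_g||^2 + Tr(C_r + C_g - 2 (C_r C_g)^(1/2)). It rests on assumptions that are not part of this survey: the function name `frechet_distance` is hypothetical, and the features are synthetic random vectors standing in for the Inception-v3 activations that a real FID evaluation would extract from images.

```python
import numpy as np


def frechet_distance(feats_real, feats_gen):
    """Frechet distance between Gaussians fitted to two feature sets.

    Both inputs are (n_samples, n_features) arrays. For FID proper these
    would be Inception-v3 features; here they are arbitrary arrays
    (an assumption made for illustration).
    """
    mu_r = feats_real.mean(axis=0)
    mu_g = feats_gen.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    cov_g = np.atleast_2d(np.cov(feats_gen, rowvar=False))

    # Tr((cov_r cov_g)^(1/2)) is the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g, which are real and non-negative
    # for positive semi-definite inputs. Clip to guard against tiny
    # negative values caused by floating-point error.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_sqrt)


# Synthetic stand-ins for image features (assumption: 8-dimensional).
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
fake_close = rng.normal(size=(500, 8))              # same distribution
fake_shifted = rng.normal(loc=2.0, size=(500, 8))   # shifted mean

d_same = frechet_distance(real, real)
d_close = frechet_distance(real, fake_close)
d_far = frechet_distance(real, fake_shifted)
print(d_same, d_close, d_far)
```

In this sketch the distance of a set to itself is zero up to rounding, and the shifted set scores far worse than the matched one, which mirrors how lower FID is read as better agreement with the real data. Reported FID values also depend on the feature extractor and the sample size, so numbers from a toy setup like this are not comparable to published scores.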