Towards Training One-Step Diffusion Models Without Distillation

Mingtian Zhang*, Wenlin Chen*, Jiajun He*, Zijing Ou, José Miguel Hernández-Lobato, Bernhard Schölkopf, David Barber

April 2025

Abstract

Recent advances in training one-step diffusion models typically follow a two-stage pipeline: first training a teacher diffusion model and then distilling it into a one-step student model. This process often depends on both the teacher's score function for supervision and its weights for initializing the student model. In this paper, we explore whether one-step diffusion models can be trained directly without this distillation procedure. We introduce a family of new training methods that entirely forgo teacher score supervision, yet outperform most teacher-guided distillation approaches. This suggests that score supervision is not essential for effective training of one-step diffusion models. However, we find that initializing the student model with the teacher's weights remains critical. Surprisingly, the key advantage of teacher initialization does not come from better latent-to-output mappings but from the rich set of feature representations across different noise levels that the teacher diffusion model provides. These insights take us one step closer to training one-step diffusion models without distillation and provide a better understanding of the roles of teacher supervision and initialization in the distillation process.

Type: Preprint

Publication: Workshop on Deep Generative Model in Machine Learning: Theory, Principle and Efficacy (DeLTa) at ICLR 2025

Diffusion
