Date: 2023-10-16

Diffusion probabilistic models have become the de facto standard for generative modeling in continuous domains, particularly in text-to-image generation. Among these models, DALL-E has drawn significant attention for its ability to generate images after training on large-scale datasets. However, because these models are trained largely without supervision, controlling their behavior in downstream tasks has proven difficult.

In response to this challenge, researchers have fine-tuned diffusion models with reinforcement learning. However, this approach relies on gradient estimators that are known for their high variance. To address this issue, a new paper introduces a method called “AlignProp” that aligns diffusion models with downstream reward functions by backpropagating the reward gradient end to end through the denoising process.
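The variance gap between the two kinds of gradient estimate can be illustrated with a toy one-dimensional problem. The sketch below is not AlignProp itself and involves no diffusion model: it assumes a single Gaussian sampling step `x = theta + SIGMA * eps` and a hypothetical quadratic reward, then compares a score-function (REINFORCE-style) estimator with the pathwise estimator obtained by differentiating the reward through the sample. All names and constants are illustrative assumptions.

```python
import random
import statistics

TARGET = 3.0  # hypothetical reward optimum (assumption for this toy example)
SIGMA = 1.0   # fixed sampling noise (assumption)


def reward(x):
    """Differentiable toy reward, highest at x == TARGET."""
    return -(x - TARGET) ** 2


def reward_grad(x):
    """Analytic derivative of the toy reward with respect to x."""
    return -2.0 * (x - TARGET)


def gradient_samples(theta, n, seed=0):
    """Draw n per-sample gradient estimates of E[reward] w.r.t. theta."""
    rng = random.Random(seed)
    score_fn, pathwise = [], []
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        x = theta + SIGMA * eps
        # Score-function estimator: reward times d/dtheta log N(x; theta, SIGMA^2).
        score_fn.append(reward(x) * (x - theta) / SIGMA ** 2)
        # Pathwise estimator: backpropagate through the sample, dx/dtheta = 1.
        pathwise.append(reward_grad(x))
    return score_fn, pathwise


theta = 0.0
score_fn, pathwise = gradient_samples(theta, n=20000)
true_grad = -2.0 * (theta - TARGET)  # analytic gradient of E[reward]

print("true gradient      :", true_grad)
print("score-function mean:", statistics.fmean(score_fn))
print("pathwise mean      :", statistics.fmean(pathwise))
print("score-function var :", statistics.variance(score_fn))
print("pathwise var       :", statistics.variance(pathwise))
```

Both estimators target the same gradient, but in this toy setting the per-sample variance of the score-function estimate is much larger. The pathwise estimate requires a differentiable reward, which is the same requirement the method described above places on its reward functions.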

Backpropagating through a modern text-to-image model normally requires a very large amount of memory. AlignProp mitigates this cost in two ways: it fine-tunes low-rank adapter (LoRA) weight modules, and it uses gradient checkpointing.
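Both memory-saving ideas can be sketched in plain Python. The example below is a toy under stated assumptions, not the paper's code: `lora_param_counts` compares the trainable parameters of a full weight matrix with those of a rank-`rank` adapter pair, and `backward_checkpointed` backpropagates through a chain of identical scalar steps (a hypothetical stand-in for denoising steps) while storing only every `every`-th activation and recomputing the rest. Layer sizes, the `tanh` step, and all function names are illustrative.

```python
import math


def lora_param_counts(d_out, d_in, rank):
    """Trainable parameters: full matrix vs. low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return d_out * d_in, rank * (d_in + d_out)


def step(x, w):
    """One toy step of the chain (stand-in for a denoising step)."""
    return math.tanh(w * x)


def step_grad(x, w):
    """Partial derivatives of step(x, w) with respect to x and w."""
    y = math.tanh(w * x)
    s = 1.0 - y * y
    return s * w, s * x


def backward_full(x0, w, T):
    """Store every activation, then backpropagate d x_T / d w."""
    xs = [x0]
    for _ in range(T):
        xs.append(step(xs[-1], w))
    g_x, g_w = 1.0, 0.0
    for t in reversed(range(T)):
        dx, dw = step_grad(xs[t], w)
        g_w += g_x * dw
        g_x *= dx
    return g_w, len(xs)


def backward_checkpointed(x0, w, T, every):
    """Keep only every `every`-th activation; recompute each segment during the backward pass."""
    ckpts = {0: x0}
    x = x0
    for t in range(1, T + 1):
        x = step(x, w)
        if t % every == 0 and t < T:
            ckpts[t] = x
    g_x, g_w = 1.0, 0.0
    longest = 0
    for start in sorted(ckpts, reverse=True):
        end = min(start + every, T)
        seg = [ckpts[start]]  # inputs to steps start .. end-1
        for _ in range(start, end - 1):
            seg.append(step(seg[-1], w))
        longest = max(longest, len(seg))
        for x_t in reversed(seg):
            dx, dw = step_grad(x_t, w)
            g_w += g_x * dw
            g_x *= dx
    # Rough count of activations held at once: checkpoints plus one segment.
    return g_w, len(ckpts) + longest


full_params, lora_params = lora_param_counts(320, 320, rank=4)
g_full, n_full = backward_full(0.5, 0.9, T=12)
g_ckpt, n_ckpt = backward_checkpointed(0.5, 0.9, T=12, every=4)
print("full vs. adapter parameters:", full_params, lora_params)
print("gradients:", g_full, g_ckpt)
print("stored activations:", n_full, n_ckpt)
```

The two backward passes return the same gradient. Checkpointing trades extra forward computation for fewer stored activations, while the low-rank adapter reduces the number of parameters that need gradients and optimizer state.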

AlignProp was evaluated on several objectives: image-text semantic alignment, aesthetics, image compressibility, and control over the number of objects in generated images. The results show that AlignProp outperforms alternative methods, achieving higher rewards in fewer training steps. Its conceptual simplicity also makes it a straightforward choice for optimizing diffusion models against differentiable reward functions.

By using gradients of the reward function directly, AlignProp improves both sample efficiency and computational efficiency when fine-tuning diffusion models. The experiments consistently show that it can optimize a wide range of reward functions, even for tasks that are difficult to specify through prompts alone.

Future research on AlignProp could extend these principles to diffusion-based language models, with the aim of improving their alignment with human feedback.

(Source: Research paper on AlignProp for fine-tuning diffusion models)