• Author(s): Daniel Geng, Inbum Park, Andrew Owens

The research paper “Factorized Diffusion: Perceptual Illusions by Noise Decomposition” introduces a zero-shot method for controlling individual components of an image during diffusion model sampling. The method can create hybrid images whose appearance changes with viewing distance, lighting conditions, or motion blur.

The method works by decomposing an image into a sum of linear components, such as low and high spatial frequencies, grayscale and color components, or a motion-blurred component and its residual. Each component can then be conditioned on a different text prompt during diffusion model sampling, so the generated image matches a different prompt depending on which component the viewer perceives.
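The decomposition described above can be written compactly. The notation below is an illustrative assumption rather than a quotation from the paper: x is the image, each f_i is a linear map, and G_σ is a Gaussian blur kernel used for a two-band frequency example.

```latex
\begin{align}
x &= \sum_{i=1}^{N} f_i(x), \qquad \text{each } f_i \text{ linear} \\
f_{\mathrm{low}}(x) &= G_\sigma * x, \qquad f_{\mathrm{high}}(x) = x - G_\sigma * x
\end{align}
```

In the two-band example the components sum back to x by construction, which is the only property the decomposition needs.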

For instance, by decomposing an image into low and high spatial frequencies and conditioning these components on different text prompts, the method can produce hybrid images that change appearance depending on the viewing distance. Similarly, decomposing an image into three frequency subbands allows for the generation of hybrid images with three distinct prompts.
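A frequency decomposition of this kind can be sketched with NumPy alone. The function names (`gaussian_blur`, `subbands`), the FFT-based blur with circular boundaries, and the sigma values are assumptions chosen for illustration, not the authors' implementation.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Low-pass a 2-D array with a Gaussian filter applied in the Fourier domain.

    Assumption: circular (wrap-around) boundaries, which the FFT implies.
    """
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequency, cycles per pixel
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequency, cycles per pixel
    # The Fourier transform of a Gaussian with std `sigma` is again a Gaussian.
    transfer = np.exp(-2.0 * (np.pi * sigma) ** 2 * (fx ** 2 + fy ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * transfer))

def subbands(img, sigmas):
    """Split `img` into len(sigmas) + 1 bands, highest frequencies first.

    `sigmas` should be increasing. Every band is a linear function of `img`,
    and the bands sum back to `img`.
    """
    bands = []
    current = img
    for sigma in sigmas:
        blurred = gaussian_blur(img, sigma)
        bands.append(current - blurred)  # detail removed by this blur level
        current = blurred
    bands.append(current)  # the remaining lowest-frequency band
    return bands

# Two bands (low/high) and three bands from the same toy image.
rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))
high, low = subbands(image, [2.0])
fine, mid, coarse = subbands(image, [1.0, 4.0])
```

Each band could then be paired with its own prompt; with three bands, the finest band would dominate up close and the coarsest from far away.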

The authors also explore decomposing images into grayscale and color components, which yields images that change appearance when viewed in grayscale. Because color perception fades under dim lighting, this change happens naturally when such an image is viewed in low light. Another application is decomposition by a motion blur kernel, which produces images that change appearance when motion blur is applied.
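Both decompositions in this paragraph follow the same pattern: one linear operator plus its residual. The sketch below assumes that grayscale is the per-pixel channel mean and that motion blur is a horizontal box kernel with wrap-around boundaries; the function names and these specific choices are illustrative, and the paper's exact definitions may differ.

```python
import numpy as np

def grayscale_component(img):
    """Replace every channel with the per-pixel channel mean; img is (H, W, 3)."""
    gray = img.mean(axis=-1, keepdims=True)
    return np.broadcast_to(gray, img.shape).copy()

def color_component(img):
    """Residual that carries only color information (zero channel mean)."""
    return img - grayscale_component(img)

def motion_blur(img, length=9):
    """Average horizontally shifted copies of img (circular boundary).

    Assumption: `length` is odd, so the kernel is centered on each pixel.
    """
    shifts = range(-(length // 2), length // 2 + 1)
    return sum(np.roll(img, s, axis=1) for s in shifts) / len(shifts)

def motion_residual(img, length=9):
    """Everything the motion blur removes."""
    return img - motion_blur(img, length)

# Each pair of components sums back to the original image.
rng = np.random.default_rng(0)
image = rng.random((16, 16, 3))
gray, color = grayscale_component(image), color_component(image)
blurred, residual = motion_blur(image), motion_residual(image)
```

Conditioning the grayscale component on one prompt and the color component on another is what would make the image read differently once color is discarded.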

The method achieves these results by denoising the image with a composite noise estimate. At each sampling step, the diffusion model produces one noise estimate per prompt, the decomposition is applied to each estimate, and the component assigned to each prompt is summed into a single estimate. This gives fine-grained control over the individual components of the generated image. The authors also show that, for certain decompositions, their method recovers prior approaches to compositional generation and spatial control, which places those techniques within the same framework.
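The composite estimate can be sketched in a few lines. Everything here is a stand-in: `toy_noise_model` is a hypothetical placeholder for a real text-conditioned diffusion model, and the grayscale/color split (channel mean plus residual) is just one assumed choice of components.

```python
import numpy as np

def grayscale_component(x):
    """Per-pixel channel mean, repeated across channels; x is (H, W, 3)."""
    return np.broadcast_to(x.mean(axis=-1, keepdims=True), x.shape).copy()

def color_component(x):
    """Residual after removing the grayscale component."""
    return x - grayscale_component(x)

def toy_noise_model(x_t, prompt, t):
    """Hypothetical stand-in for a diffusion model's noise prediction.

    Returns deterministic pseudo-noise that depends on the prompt and timestep.
    """
    seed = sum(ord(ch) for ch in prompt) + t
    return np.random.default_rng(seed).standard_normal(x_t.shape)

def composite_noise_estimate(x_t, t, prompts, components, noise_model):
    """Sum f_i(noise estimate for prompt i) over all components.

    `components` is a list of linear functions that sum to the identity,
    paired one-to-one with `prompts`.
    """
    estimates = [noise_model(x_t, prompt, t) for prompt in prompts]
    return sum(f(eps) for f, eps in zip(components, estimates))

x_t = np.zeros((8, 8, 3))
eps = composite_noise_estimate(
    x_t, t=10,
    prompts=["a photo of a cat", "a photo of a dog"],
    components=[grayscale_component, color_component],
    noise_model=toy_noise_model,
)
```

Because the components sum to the identity, passing the same prompt for every component gives back the ordinary single-prompt estimate, which is a useful sanity check for any implementation along these lines.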

Furthermore, the research paper showcases the extension of this approach to generate hybrid images from real images. By holding one component fixed and generating the remaining components, the method effectively solves an inverse problem, allowing for the manipulation of existing images to create novel perceptual illusions.
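One plausible way to hold a component fixed is to overwrite that component of the current sample with the same component of a noised copy of the real image at every sampling step. The sketch below rests on several assumptions: the function name `hold_component_fixed`, the use of the cumulative noise-schedule term `alpha_bar_t` to noise the reference, and the grayscale component as the fixed part are all illustrative, and the exact update in the paper may differ.

```python
import numpy as np

def grayscale_component(x):
    """Per-pixel channel mean, repeated across channels; x is (H, W, 3)."""
    return np.broadcast_to(x.mean(axis=-1, keepdims=True), x.shape).copy()

def hold_component_fixed(x_t, x_ref, alpha_bar_t, f_fixed, rng):
    """Swap the fixed component of x_t for that of a noised reference image.

    x_t         : current noisy sample in the sampling loop
    x_ref       : the real image whose component should be preserved
    alpha_bar_t : assumed cumulative signal level at this step, in [0, 1]
    f_fixed     : linear map extracting the component to hold fixed
    """
    noise = rng.standard_normal(x_ref.shape)
    # Bring the reference to the same noise level as the current sample.
    ref_t = np.sqrt(alpha_bar_t) * x_ref + np.sqrt(1.0 - alpha_bar_t) * noise
    # Keep the generated remainder, replace only the fixed component.
    return f_fixed(ref_t) + (x_t - f_fixed(x_t))

rng = np.random.default_rng(0)
real_image = rng.random((8, 8, 3))
sample = rng.standard_normal((8, 8, 3))
sample = hold_component_fixed(sample, real_image, 0.5, grayscale_component, rng)
```

In a full sampler this projection would run after each denoising step, so the remaining components are generated from the prompt while the fixed one tracks the real image.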

In conclusion, the factorized diffusion method presented in this paper offers a flexible tool for generating perceptual illusions through noise decomposition. By giving control over individual components during image generation, it makes it possible to create images whose appearance depends on how they are viewed. Potential applications span computer graphics, visual effects, and perceptual psychology.