Image Mixer – Mix up concepts in multiple images and words to generate novel pictures

This model is based on the Stable Diffusion Image Variations model, but it has been fine-tuned to take multiple CLIP image embeddings. During training, up to 5 random crops were taken from the training images and their CLIP image embeddings were computed; these were then concatenated and used as the conditioning for the model. At inference time, we can combine the image embeddings from multiple images to mix their concepts, and we can also use the text encoder to add text concepts.
The model was trained on a subset of LAION Improved Aesthetics at a resolution of 640×640, using 8× A100 GPUs on Lambda GPU Cloud.
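The conditioning scheme described above can be illustrated with a small NumPy sketch. This is not the model's actual code: the embedding width of 768, the helper name `mix_conditioning`, the per-concept `weights`, and the choice to stack embeddings along a sequence axis are all assumptions made for illustration, and random vectors stand in for real CLIP embeddings.

```python
import numpy as np

EMBED_DIM = 768  # assumed embedding width; not stated in the text
MAX_SLOTS = 5    # the text mentions up to 5 crops during training


def mix_conditioning(embeddings, weights=None):
    """Stack per-concept embeddings into one conditioning array.

    Hypothetical helper: each row is one image (or text) embedding,
    optionally scaled by a weight to control its influence.
    """
    if not 1 <= len(embeddings) <= MAX_SLOTS:
        raise ValueError(f"expected 1 to {MAX_SLOTS} embeddings")
    if weights is None:
        weights = [1.0] * len(embeddings)
    if len(weights) != len(embeddings):
        raise ValueError("need exactly one weight per embedding")
    rows = [
        w * np.asarray(e, dtype=np.float32)
        for e, w in zip(embeddings, weights)
    ]
    # Result has shape (n_concepts, EMBED_DIM)
    return np.stack(rows, axis=0)


# Random stand-ins for two image embeddings and one text embedding.
rng = np.random.default_rng(0)
image_a = rng.standard_normal(EMBED_DIM)
image_b = rng.standard_normal(EMBED_DIM)
text_c = rng.standard_normal(EMBED_DIM)

cond = mix_conditioning([image_a, image_b, text_c], weights=[1.0, 0.5, 1.0])
print(cond.shape)
```

In a real pipeline the stand-in vectors would come from the CLIP image and text encoders, and the stacked array would be passed to the diffusion model as its conditioning input. The weights are one plausible way to make a concept more or less prominent in the mix.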