CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutSign UpSign In
huggingface

Real-time collaboration for Jupyter Notebooks, Linux Terminals, LaTeX, VS Code, R IDE, and more,
all in one place. Commercial Alternative to JupyterHub.

GitHub Repository: huggingface/notebooks
Path: blob/main/diffusers/gluegen_stable_diffusion.ipynb
Views: 2932
Kernel: Unknown Kernel

GlueGen Stable Diffusion Pipeline

GlueGen is a minimal adapter that allows alignment between any encoder (Text Encoder of different language, Multilingual Roberta, AudioClip) and CLIP text encoder used in standard Stable Diffusion model. This method allows easy language adaptation to available english Stable Diffusion checkpoints without the need of an image captioning dataset as well as long training hours.

Make sure you downloaded gluenet_French_clip_overnorm_over3_noln.ckpt for French (there are also pre-trained weights for Chinese, Italian, Japanese, Spanish or train your own) at GlueGen's official repo. This script was contributed by Phạm Hồng Vinh and the notebook by Parag Ekbote

pip install diffusers transformers torch pillow sentencepiece
Requirement already satisfied: diffusers in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (0.32.2) Requirement already satisfied: transformers in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (4.48.3) Requirement already satisfied: torch in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (2.6.0) Requirement already satisfied: pillow in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (9.0.0) Requirement already satisfied: sentencepiece in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (0.2.0) Requirement already satisfied: importlib-metadata in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from diffusers) (8.6.1) Requirement already satisfied: filelock in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from diffusers) (3.17.0) Requirement already satisfied: huggingface-hub>=0.23.2 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from diffusers) (0.28.1) Requirement already satisfied: numpy in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from diffusers) (1.26.4) Requirement already satisfied: regex!=2019.12.17 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from diffusers) (2024.11.6) Requirement already satisfied: requests in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from diffusers) (2.32.3) Requirement already satisfied: safetensors>=0.3.1 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from diffusers) (0.5.2) Requirement already satisfied: packaging>=20.0 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from transformers) (24.2) Requirement already satisfied: pyyaml>=5.1 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from transformers) (6.0.2) Requirement already satisfied: tokenizers<0.22,>=0.21 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from transformers) (0.21.0) Requirement already satisfied: tqdm>=4.27 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from transformers) (4.67.1) Requirement already satisfied: typing-extensions>=4.10.0 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (4.12.2) Requirement already satisfied: networkx in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (3.4.2) Requirement already satisfied: jinja2 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (3.1.5) Requirement already satisfied: fsspec in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (2025.2.0) Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.4.127 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (12.4.127) Requirement already satisfied: nvidia-cuda-runtime-cu12==12.4.127 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (12.4.127) Requirement already satisfied: nvidia-cuda-cupti-cu12==12.4.127 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (12.4.127) Requirement already satisfied: nvidia-cudnn-cu12==9.1.0.70 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (9.1.0.70) Requirement already satisfied: nvidia-cublas-cu12==12.4.5.8 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (12.4.5.8) Requirement already satisfied: nvidia-cufft-cu12==11.2.1.3 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (11.2.1.3) Requirement already satisfied: nvidia-curand-cu12==10.3.5.147 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (10.3.5.147) Requirement already satisfied: nvidia-cusolver-cu12==11.6.1.9 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (11.6.1.9) Requirement already satisfied: nvidia-cusparse-cu12==12.3.1.170 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (12.3.1.170) Requirement already satisfied: nvidia-cusparselt-cu12==0.6.2 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (0.6.2) Requirement already satisfied: nvidia-nccl-cu12==2.21.5 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (2.21.5) Requirement already satisfied: nvidia-nvtx-cu12==12.4.127 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (12.4.127) Requirement already satisfied: nvidia-nvjitlink-cu12==12.4.127 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (12.4.127) Requirement already satisfied: triton==3.2.0 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (3.2.0) Requirement already satisfied: sympy==1.13.1 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from torch) (1.13.1) Requirement already satisfied: mpmath<1.4,>=1.1.0 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from sympy==1.13.1->torch) (1.3.0) Requirement already satisfied: zipp>=3.20 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from importlib-metadata->diffusers) (3.21.0) Requirement already satisfied: MarkupSafe>=2.0 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from jinja2->torch) (3.0.2) Requirement already satisfied: charset-normalizer<4,>=2 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from requests->diffusers) (3.4.1) Requirement already satisfied: idna<4,>=2.5 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from requests->diffusers) (3.10) Requirement already satisfied: urllib3<3,>=1.21.1 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from requests->diffusers) (2.3.0) Requirement already satisfied: certifi>=2017.4.17 in /system/conda/miniconda3/envs/cloudspace/lib/python3.10/site-packages (from requests->diffusers) (2025.1.31) Note: you may need to restart the kernel to use updated packages.
import os import gc import urllib.request import torch from transformers import XLMRobertaTokenizer, XLMRobertaForMaskedLM, CLIPTokenizer, CLIPTextModel from diffusers import DiffusionPipeline # Download checkpoints CHECKPOINTS = [ "https://storage.googleapis.com/sfr-gluegen-data-research/checkpoints_all/gluenet_checkpoint/gluenet_Chinese_clip_overnorm_over3_noln.ckpt", "https://storage.googleapis.com/sfr-gluegen-data-research/checkpoints_all/gluenet_checkpoint/gluenet_French_clip_overnorm_over3_noln.ckpt", "https://storage.googleapis.com/sfr-gluegen-data-research/checkpoints_all/gluenet_checkpoint/gluenet_Italian_clip_overnorm_over3_noln.ckpt", "https://storage.googleapis.com/sfr-gluegen-data-research/checkpoints_all/gluenet_checkpoint/gluenet_Japanese_clip_overnorm_over3_noln.ckpt", "https://storage.googleapis.com/sfr-gluegen-data-research/checkpoints_all/gluenet_checkpoint/gluenet_Spanish_clip_overnorm_over3_noln.ckpt", "https://storage.googleapis.com/sfr-gluegen-data-research/checkpoints_all/gluenet_checkpoint/gluenet_sound2img_audioclip_us8k.ckpt" ] LANGUAGE_PROMPTS = { "French": "une voiture sur la plage", #"Chinese": "海滩上的一辆车", #"Italian": "una macchina sulla spiaggia", #"Japanese": "浜辺の車", #"Spanish": "un coche en la playa" } def download_checkpoints(checkpoint_dir): os.makedirs(checkpoint_dir, exist_ok=True) for url in CHECKPOINTS: filename = os.path.join(checkpoint_dir, os.path.basename(url)) if not os.path.exists(filename): print(f"Downloading {filename}...") urllib.request.urlretrieve(url, filename) print(f"Downloaded {filename}") else: print(f"Checkpoint {filename} already exists, skipping download.") return checkpoint_dir def load_checkpoint(pipeline, checkpoint_path, device): state_dict = torch.load(checkpoint_path, map_location=device) state_dict = state_dict.get("state_dict", state_dict) missing_keys, unexpected_keys = pipeline.unet.load_state_dict(state_dict, strict=False) return pipeline def generate_image(pipeline, prompt, device, output_path): with torch.inference_mode(): image = pipeline( prompt, generator=torch.Generator(device=device).manual_seed(42), num_inference_steps=50 ).images[0] image.save(output_path) print(f"Image saved to {output_path}") checkpoint_dir = download_checkpoints("./checkpoints_all/gluenet_checkpoint") device = "cuda" if torch.cuda.is_available() else "cpu" print(f"Using device: {device}") tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base", use_fast=False) model = XLMRobertaForMaskedLM.from_pretrained("xlm-roberta-base").to(device) inputs = tokenizer("Ceci est une phrase incomplète avec un [MASK].", return_tensors="pt").to(device) with torch.inference_mode(): _ = model(**inputs) clip_tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14") clip_text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").to(device) # Initialize pipeline pipeline = DiffusionPipeline.from_pretrained( "stable-diffusion-v1-5/stable-diffusion-v1-5", text_encoder=clip_text_encoder, tokenizer=clip_tokenizer, custom_pipeline="gluegen", safety_checker=None ).to(device) os.makedirs("outputs", exist_ok=True) # Generate images for language, prompt in LANGUAGE_PROMPTS.items(): checkpoint_file = f"gluenet_{language}_clip_overnorm_over3_noln.ckpt" checkpoint_path = os.path.join(checkpoint_dir, checkpoint_file) try: pipeline = load_checkpoint(pipeline, checkpoint_path, device) output_path = f"outputs/gluegen_output_{language.lower()}.png" generate_image(pipeline, prompt, device, output_path) except Exception as e: print(f"Error processing {language} model: {e}") continue if torch.cuda.is_available(): torch.cuda.empty_cache() gc.collect()
Downloading ./checkpoints_all/gluenet_checkpoint/gluenet_Chinese_clip_overnorm_over3_noln.ckpt... Downloaded ./checkpoints_all/gluenet_checkpoint/gluenet_Chinese_clip_overnorm_over3_noln.ckpt Downloading ./checkpoints_all/gluenet_checkpoint/gluenet_French_clip_overnorm_over3_noln.ckpt... Downloaded ./checkpoints_all/gluenet_checkpoint/gluenet_French_clip_overnorm_over3_noln.ckpt Downloading ./checkpoints_all/gluenet_checkpoint/gluenet_Italian_clip_overnorm_over3_noln.ckpt... Downloaded ./checkpoints_all/gluenet_checkpoint/gluenet_Italian_clip_overnorm_over3_noln.ckpt Downloading ./checkpoints_all/gluenet_checkpoint/gluenet_Japanese_clip_overnorm_over3_noln.ckpt... Downloaded ./checkpoints_all/gluenet_checkpoint/gluenet_Japanese_clip_overnorm_over3_noln.ckpt Downloading ./checkpoints_all/gluenet_checkpoint/gluenet_Spanish_clip_overnorm_over3_noln.ckpt... Downloaded ./checkpoints_all/gluenet_checkpoint/gluenet_Spanish_clip_overnorm_over3_noln.ckpt Downloading ./checkpoints_all/gluenet_checkpoint/gluenet_sound2img_audioclip_us8k.ckpt... Downloaded ./checkpoints_all/gluenet_checkpoint/gluenet_sound2img_audioclip_us8k.ckpt Using device: cuda Initializing XLM-RoBERTa...
Some weights of the model checkpoint at xlm-roberta-base were not used when initializing XLMRobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight'] - This IS expected if you are initializing XLMRobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model). - This IS NOT expected if you are initializing XLMRobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Initializing CLIP models... Initializing Diffusion Pipeline...
Expected types for tokenizer: ['AutoTokenizer'], got CLIPTokenizer. Expected types for text_encoder: ['AutoModel'], got CLIPTextModel.
Processing French model...
Image saved to outputs/gluegen_output_french.png Processing complete.