Fast Stable Diffusion

I’ve been playing around with Stable Diffusion recently, from just playing with prompts to trying out Textual Inversion and Dreambooth.

I’ve opted not to use services or web GUIs, not because I dislike them, but because I was looking to learn about diffusion models and the infrastructure to run them, with amazing pictures as a side effect. Stable Diffusion has a great property: it fits on a lot of consumer-grade hardware. Specifically, if you have a GPU with about 14GB of memory you can run SD relatively quickly, and there are optimised versions that run in even less memory. You can even run it on an M1 or M2 Mac, with no external GPU required and no DALL-E 2 credits.

Since I knew I wanted to both run (inference) and train (fine-tune) a model myself plenty of times, I started looking for optimisations to increase throughput. This led me to two options: xFormers, and work from Chris Ré’s lab, Hazy Research (I’m a fan of Chris from his work on Snorkel). I chose the Flash Attention route from Hazy Research.

In this post, I will share the steps I took to improve the speed and stability of Stable Diffusion on an NVIDIA A100 GPU. This is the first step in my journey towards the fastest Stable Diffusion possible on this GPU; my next step will be to experiment with DeepSpeed.

For now, though, the following steps get you very fast transformers (which are used in SD):

  1. Upgrade CUDA to 11.8 (this assumes your machine has an older version of CUDA)
  • Check PyTorch compatibility here: https://pytorch.org/get-started/locally/
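
  • For a quick sanity check of the PyTorch build you already have, the standard version attributes tell you which CUDA version it was compiled against:

    python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"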

  • Also make sure you have the right GPU and driver; you can find driver/CUDA compatibility here if you need it
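
  • nvidia-smi is the easiest way to see your current driver version and the highest CUDA version that driver supports:

    nvidia-smi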

  • Make sure you have cleaned up previous versions of CUDA! You can check which versions you have locally and remotely using this
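
  • On most Linux machines, locally installed toolkits live under /usr/local, so listing that directory (plus checking which nvcc is on your PATH) shows what is already there:

    ls -d /usr/local/cuda*
    nvcc --version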

  • Add the repository keys from NVIDIA; they rotated their keys in April 2022, so this was a bit fiddly:

  • sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub
    sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/7fa2af80.pub
    
  • Download the highest compatible version of the CUDA toolkit and install it (a sketch of the runfile route is below)
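
  • For CUDA 11.8 the runfile route looks roughly like this; the exact installer filename comes from NVIDIA’s download page, so treat it as a sketch:

    # download the CUDA 11.8 runfile installer (filename may differ)
    wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
    # install the toolkit only; skip the bundled driver if yours is already compatible
    sudo sh cuda_11.8.0_520.61.05_linux.run --toolkit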

  • After installation, set your PATH and LD_LIBRARY_PATH:

  • export PATH="/usr/local/cuda-11.8/bin:$PATH"
    export LD_LIBRARY_PATH="/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH"
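
  • To make these persist across shells, you can append the same two exports to your ~/.bashrc (or your shell’s equivalent):

    echo 'export PATH="/usr/local/cuda-11.8/bin:$PATH"' >> ~/.bashrc
    echo 'export LD_LIBRARY_PATH="/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH"' >> ~/.bashrc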
    
  • I ran into an issue, a missing Python.h. The solution can be found here:

    • sudo apt-get install python3.8-dev build-essential

  2. Clone the flash-attention repo and build it

    git clone https://github.com/HazyResearch/flash-attention.git
    cd flash-attention
    git submodule init
    git submodule update
    python setup.py install
    cd ..
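
    Once the build finishes, a quick import check confirms the extension is importable (this assumes the package installs under the module name flash_attn, as in the repo):

    python -c "import flash_attn; print('flash_attn imported OK')"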
    
  3. Clone the HazyResearch fork of diffusers and build it

    git clone https://github.com/HazyResearch/diffusers.git
    cd diffusers
    pip install --prefix=~/.local -e .
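
With everything built, a minimal smoke test looks roughly like the snippet below. This is a sketch rather than a guaranteed recipe: it assumes the HazyResearch fork keeps the upstream diffusers API (StableDiffusionPipeline), and that you have accepted the model licence for CompVis/stable-diffusion-v1-4 on the Hugging Face Hub.

    import torch
    from diffusers import StableDiffusionPipeline

    # load SD v1.4 in half precision (fits comfortably on an A100)
    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        torch_dtype=torch.float16,
    )
    pipe = pipe.to("cuda")

    # generate one image and save it
    image = pipe("an astronaut riding a horse on mars").images[0]
    image.save("astronaut.png")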