Citizendeveloper.codes!

GANs vs. VAEs: A Guide to Generative AI with Python Code Examples

author image
Amit Puri

Advisor and Consultant

Posted on 31-Oct-2024, 13 min(s) read

GANs vs. VAEs: A Guide to Generative AI with Python Code Examples

Generative AI has revolutionized how we create, manipulate, and understand data, and at the forefront of this innovation are Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs). Both models have proven to be invaluable tools in fields ranging from digital art and healthcare to recommendation systems and entertainment, but each brings its own unique strengths and limitations.

In this guide, we’ll dive deep into GANs and VAEs, exploring how they work, their key differences and similarities, and the best ways to implement them in Python. Whether you’re a tech enthusiast or a beginner, understanding GANs and VAEs will help you unlock the potential of generative AI.


Overview: GANs vs. VAEs

Before diving into the technical details, let’s break down GANs and VAEs at a high level. Both models are generative, meaning they’re designed to create new data points from the distribution they learn during training. They can produce entirely new images, sounds, or even synthetic data that resembles real-world examples, but they approach this task in fundamentally different ways.

GANs: Think of GANs as creative artists in a competitive environment. They use a game-like structure, where a Generator tries to create realistic samples, and a Discriminator tries to distinguish them from real samples. This adversarial process pushes GANs to produce high-quality, realistic images.

VAEs: VAEs are more like architects who carefully design a structured, interpretable space (latent space) where each point represents a unique variation of data. This latent space makes VAEs incredibly useful for applications that require control over the features in generated data, such as personalized recommendations.


How GANs and VAEs Work: Key Components and Processes

To understand the strengths and limitations of each model, let’s break down how GANs and VAEs function at a structural level.

Generative Adversarial Networks (GANs)

GANs are composed of two primary components: 1. Generator: This network creates fake data samples from random noise. The Generator’s goal is to produce samples that are as close to the real data distribution as possible. 2. Discriminator: The Discriminator acts as a classifier, evaluating both real and generated samples and identifying which ones are “fake” (generated by the Generator) and which are “real” (from the training dataset).

GANs are trained using an adversarial process. The Generator tries to fool the Discriminator by generating increasingly realistic samples, while the Discriminator improves at distinguishing between real and fake. Over time, the Generator’s outputs become convincingly realistic.

Variational Autoencoders (VAEs)

VAEs consist of two main parts: 1. Encoder: This network compresses input data into a lower-dimensional latent space. In this process, it outputs two parameters: mean and variance, which define the distribution of the data in the latent space. 2. Decoder: The Decoder reconstructs data from the latent space, turning compressed representations back into outputs that resemble the original data.

Instead of relying on a competition between two networks, VAEs use a probabilistic approach. They map data into a smooth latent space where points close together represent similar data. This gives VAEs excellent control over generated data, making them ideal for tasks like personalized recommendations and anomaly detection.


Key Differences Between GANs and VAEs

Understanding the differences between GANs and VAEs is crucial for selecting the right model for your task. Here are some of the most important distinctions:

| Feature | GANs | VAEs | |-----------------------|-------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------| | Training Process | Adversarial, involving two networks (Generator and Discriminator) that compete to improve each other. | Likelihood-based, with a single objective function focusing on reconstruction accuracy and latent space structure. | | Output Quality | Produces high-quality, sharp, and realistic outputs. | Generally produces less sharp, sometimes slightly blurry images due to pixel-wise averaging. | | Latent Space | Unstructured and hard to interpret, making it challenging to control specific features. | Structured and smooth, allowing easy manipulation of features and variations in the output. | | Ease of Training | Difficult to train due to adversarial loss, prone to instability and mode collapse. | Easier to train with a more straightforward optimization process and stable convergence. | | Use Cases | High-quality image generation, virtual models, realistic textures, creative content. | Tasks needing data control, such as anomaly detection, synthetic data generation, and personalized recommendations. |

Let’s expand on each of these to clarify the unique benefits each model brings to generative AI.

1. Training Process: Adversarial vs. Likelihood-Based

  • GANs use a competitive setup where the Generator and Discriminator push each other to improve. While this approach produces stunningly realistic results, it can be tricky to train. GANs often face issues like mode collapse, where the Generator repeatedly produces only a small variety of images, and vanishing gradients, where updates become ineffective.
  • VAEs, on the other hand, are trained using a likelihood-based objective, making their optimization more straightforward and stable. This probabilistic approach means they don’t rely on an adversarial game, leading to more predictable and easier-to-control training.

2. Output Quality: Realism vs. Structured Variety

  • GANs excel at generating photorealistic images with high detail. Their adversarial process encourages the Generator to create outputs that could easily pass for real data, making GANs the model of choice for applications where image quality is paramount, like fashion, gaming, or digital marketing.
  • VAEs produce outputs that might be slightly blurrier than GANs, as they optimize for pixel-wise similarity rather than adversarial feedback. However, VAEs offer structured diversity, making them ideal for tasks that prioritize controlled exploration over high-fidelity realism, such as creating varied synthetic datasets.

3. Latent Space: Control and Interpretability

  • GANs lack a well-defined latent space, making it hard to control specific aspects of generated samples. For instance, altering specific features, like changing the color of a GAN-generated shirt, would be challenging.
  • VAEs create a structured latent space, where nearby points in the latent space correspond to similar data in the original space. This allows for easy manipulation, interpolation, and feature control, which is particularly useful for applications like recommendation systems.

4. Ease of Training: High Maintenance vs. Stability

  • GANs are notoriously hard to train. The adversarial setup makes it difficult to achieve a stable equilibrium between the Generator and Discriminator, often requiring careful tuning and troubleshooting.
  • VAEs are easier to train due to their stable likelihood-based objective, resulting in fewer issues and allowing for faster experimentation. This makes VAEs a go-to for applications where interpretability and stable convergence are more valuable than maximum realism.

Key Similarities Between GANs and VAEs

Despite their differences, GANs and VAEs share several fundamental similarities. Here’s what they have in common:

  1. Generative Nature: Both GANs and VAEs are generative models, meaning they learn the underlying distribution of data and can generate new, similar data points.
  2. Neural Network-Based: Both are based on neural networks, with GANs using a generator-discriminator structure and VAEs using an encoder-decoder structure.
  3. Latent Space Usage: Both models rely on a latent space, a compressed representation of data that serves as a “creative playground” where new data points are sampled.
  4. Non-linear Activations: GANs and VAEs both use non-linear activations, like ReLU or LeakyReLU, in their layers to model complex data distributions.
  5. New Sample Generation: Both models can generate new data samples that didn’t exist in the original dataset, opening up possibilities in image synthesis, synthetic data generation, and personalized content creation.

When to Use GANs vs. VAEs: Real-World Scenarios

Knowing when to use GANs or VAEs is essential for optimizing results. Here are some practical examples:

  • When to Use GANs:
  • Fashion and E-commerce: GANs can generate realistic images of virtual models, reducing the need for extensive photo shoots.
  • Game Development: GANs produce high-resolution textures and characters for immersive environments.
  • Digital Marketing: Marketers can use GANs to generate visually appealing banners and ads quickly.

  • When to Use VAEs:

  • Healthcare: VAEs are great for generating synthetic data, such as additional cell images for training diagnostic AI tools.
  • Recommendation Systems: VAEs create diverse suggestions, helping recommendation engines balance familiarity and novelty.
  • Anomaly Detection: VAEs excel at learning a smooth latent space, making them ideal for detecting irregularities in financial data or security logs.

Implementing GANs and VAEs in Python

Here’s a quick look at how to get started with GANs and VAEs in Python, using TensorFlow/Keras.

1. Building a GAN in Python

from tensorflow.keras.layers import Dense, Reshape, LeakyReLU, Flatten
from tensorflow.keras import Sequential

# Generator
def build_generator():
    model = Sequential()
    model.add(Dense(128, input_dim=100, activation=LeakyReLU(alpha=0.2)))
    model.add(Dense(256, activation=LeakyReLU

(alpha=0.2)))
    model.add(Dense(512, activation=LeakyReLU(alpha=0.2)))
    model.add(Dense(784, activation='tanh'))
    model.add(Reshape((28, 28, 1)))
    return model

# Discriminator
def build_discriminator():
    model = Sequential()
    model.add(Flatten(input_shape=(28, 28, 1)))
    model.add(Dense(512, activation=LeakyReLU(alpha=0.2)))
    model.add(Dense(1, activation='sigmoid'))
    return model

2. Building a VAE in Python

from tensorflow.keras.layers import Lambda, Input
import tensorflow as tf

# Encoder
def build_encoder():
    inputs = Input(shape=(28, 28, 1))
    x = Flatten()(inputs)
    x = Dense(256, activation='relu')(x)
    mean = Dense(2)(x)
    log_var = Dense(2)(x)
    return tf.keras.Model(inputs, [mean, log_var])

# Decoder
def build_decoder():
    latent_inputs = Input(shape=(2,))
    x = Dense(256, activation='relu')(latent_inputs)
    x = Dense(784, activation='sigmoid')(x)
    return tf.keras.Model(latent_inputs, x)

Let's walk through a complete example of both a GAN and a VAE in Python. We'll use the MNIST dataset of handwritten digits to keep things simple, as it's widely used and easy to understand.

For these examples, we'll use TensorFlow and Keras, popular frameworks in Python for building and training neural networks. Here’s what we’ll cover:

  1. Building and training a GAN to generate new images of handwritten digits.
  2. Building and training a VAE to learn and reconstruct images of handwritten digits with control over the latent space.

Setting Up the Environment

To get started, make sure you have the required libraries installed:

pip install tensorflow numpy matplotlib

Import the necessary libraries:

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Reshape, Flatten, LeakyReLU, BatchNormalization, Input, Lambda
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.datasets import mnist
import matplotlib.pyplot as plt

1. Generative Adversarial Network (GAN)

Let’s start with a GAN. Our goal here is to generate new images of handwritten digits by training a Generator to produce images that fool a Discriminator into thinking they’re real.

Step 1: Prepare the Data

Load and preprocess the MNIST data:

# Load the MNIST dataset
(x_train, _), (_, _) = mnist.load_data()
x_train = (x_train.astype(np.float32) - 127.5) / 127.5  # Scale images to [-1, 1]
x_train = np.expand_dims(x_train, axis=-1)  # Reshape for compatibility

Step 2: Define the GAN Components

  1. Generator: Takes random noise as input and generates an image.
  2. Discriminator: Determines if an image is real (from the dataset) or fake (from the Generator).
# Generator Model
def build_generator():
    model = Sequential([
        Dense(128, input_dim=100),
        LeakyReLU(0.2),
        BatchNormalization(),
        Dense(256),
        LeakyReLU(0.2),
        BatchNormalization(),
        Dense(512),
        LeakyReLU(0.2),
        BatchNormalization(),
        Dense(784, activation='tanh'),
        Reshape((28, 28, 1))
    ])
    return model

# Discriminator Model
def build_discriminator():
    model = Sequential([
        Flatten(input_shape=(28, 28, 1)),
        Dense(512),
        LeakyReLU(0.2),
        Dense(256),
        LeakyReLU(0.2),
        Dense(1, activation='sigmoid')
    ])
    return model

Step 3: Compile the Models

Compile the Discriminator first, then combine the Generator and Discriminator to create the GAN.

# Build and compile the Discriminator
discriminator = build_discriminator()
discriminator.compile(optimizer=tf.keras.optimizers.Adam(0.0002, 0.5), loss='binary_crossentropy', metrics=['accuracy'])

# Build the Generator
generator = build_generator()

# Create and compile the GAN
z = Input(shape=(100,))
img = generator(z)
discriminator.trainable = False  # Only train the generator in the combined model
validity = discriminator(img)

gan = Model(z, validity)
gan.compile(optimizer=tf.keras.optimizers.Adam(0.0002, 0.5), loss='binary_crossentropy')

Step 4: Train the GAN

Define a function to train the GAN.

def train_gan(epochs, batch_size=128, sample_interval=1000):
    real = np.ones((batch_size, 1))
    fake = np.zeros((batch_size, 1))

    for epoch in range(epochs):
        # Train Discriminator
        idx = np.random.randint(0, x_train.shape[0], batch_size)
        real_imgs = x_train[idx]

        noise = np.random.normal(0, 1, (batch_size, 100))
        gen_imgs = generator.predict(noise)

        d_loss_real = discriminator.train_on_batch(real_imgs, real)
        d_loss_fake = discriminator.train_on_batch(gen_imgs, fake)
        d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)

        # Train Generator
        noise = np.random.normal(0, 1, (batch_size, 100))
        g_loss = gan.train_on_batch(noise, real)

        # Print progress
        if epoch % sample_interval == 0:
            print(f"{epoch} [D loss: {d_loss[0]}] [G loss: {g_loss}]")
            sample_images(epoch)

# Helper function to save generated images
def sample_images(epoch, rows=5, cols=5):
    noise = np.random.normal(0, 1, (rows * cols, 100))
    gen_imgs = generator.predict(noise)
    gen_imgs = 0.5 * gen_imgs + 0.5  # Rescale to [0, 1]

    fig, axs = plt.subplots(rows, cols)
    cnt = 0
    for i in range(rows):
        for j in range(cols):
            axs[i, j].imshow(gen_imgs[cnt, :, :, 0], cmap='gray')
            axs[i, j].axis('off')
            cnt += 1
    plt.show()

# Train the GAN
train_gan(epochs=10000, batch_size=64, sample_interval=2000)

2. Variational Autoencoder (VAE)

Now, let’s implement a VAE to learn and reconstruct images of handwritten digits while providing a structured latent space.

Step 1: Define the Encoder and Decoder

The Encoder compresses the data into a latent space, while the Decoder reconstructs the data from the latent space.

# Encoder
def build_encoder():
    inputs = Input(shape=(28, 28, 1))
    x = Flatten()(inputs)
    x = Dense(256, activation='relu')(x)
    z_mean = Dense(2)(x)
    z_log_var = Dense(2)(x)
    return Model(inputs, [z_mean, z_log_var])

# Sampling Layer
def sampling(args):
    z_mean, z_log_var = args
    epsilon = tf.keras.backend.random_normal(shape=(tf.keras.backend.shape(z_mean)[0], 2))
    return z_mean + tf.keras.backend.exp(0.5 * z_log_var) * epsilon

# Decoder
def build_decoder():
    latent_inputs = Input(shape=(2,))
    x = Dense(256, activation='relu')(latent_inputs)
    x = Dense(784, activation='sigmoid')(x)
    outputs = Reshape((28, 28, 1))(x)
    return Model(latent_inputs, outputs)

Step 2: Build the VAE Model

Connect the Encoder, Sampling layer, and Decoder to form the complete VAE.

# Instantiate Encoder and Decoder
encoder = build_encoder()
decoder = build_decoder()

# VAE Model
inputs = Input(shape=(28, 28, 1))
z_mean, z_log_var = encoder(inputs)
z = Lambda(sampling, output_shape=(2,))([z_mean, z_log_var])
outputs = decoder(z)
vae = Model(inputs, outputs)

Step 3: Define the VAE Loss

The VAE loss is composed of a reconstruction loss and a KL divergence regularization term.

# VAE Loss Function
reconstruction_loss = tf.keras.losses.binary_crossentropy(inputs, outputs)
reconstruction_loss *= 784  # Scale by image size
kl_loss = -0.5 * tf.reduce_sum(1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=-1)
vae_loss = tf.reduce_mean(reconstruction_loss + kl_loss)
vae.add_loss(vae_loss)
vae.compile(optimizer='adam')

Step 4: Train the VAE

Train the VAE on the MNIST dataset and use the Decoder to generate new images.

# Train VAE
vae.fit(x_train, x_train, epochs=50, batch_size=128)

# Visualize Results
def plot_latent_space(decoder, n=15, digit_size=28):
    figure = np.zeros((digit_size * n, digit_size * n))
    grid_x = np.linspace(-3, 3, n)
    grid_y = np.linspace(-3, 3, n)

    for i, yi in enumerate(grid_y):
        for j, xi in enumerate(grid_x):
            z_sample = np.array([[xi, yi]])
            x_decoded = decoder.predict(z_sample)
            digit = x_decoded[0].reshape(digit_size, digit_size)
            figure[i * digit_size: (i + 1) * digit_size,
                   j * digit_size: (j + 1) * digit_size] = digit

    plt.figure(figsize=(10, 10))
    plt.imshow(figure, cmap='Greys_r')
    plt.axis('off')
    plt.show()

# Plot the latent space
plot_latent_space(decoder)

Summary

This tutorial demonstrated how to implement both a GAN and a VAE to generate and reconstruct images, respectively. Here’s a quick recap of what each model did:

  • GAN: Generated new images by training a Generator against a Discriminator in an adversarial game. This is suitable for high-quality image generation.
  • VAE: Learned a latent representation of images, which allowed for controlled image generation and reconstruction.

This complete walkthrough provides a foundation to explore and experiment with GANs and VAEs on your own projects.

Choosing Between GANs and VAEs: Final Thoughts

GANs are your best choice for generating high-quality, realistic visuals. They shine in applications like digital marketing, gaming, and e-commerce where photorealism is key. VAEs, on the other hand, offer structured control, making them ideal for personalized recommendations, synthetic data generation, and anomaly detection. Understanding GANs and VAEs’ differences and similarities can help you make informed choices and harness the full potential of generative AI.

Now that you’re equipped with this knowledge—and some Python code—you’re ready to bring your digital visions to life with GANs and VAEs. Happy generating!

Share on

Tags

Subscribe to see what we're thinking

Subscribe to get access to premium content or contact us if you have any questions.

Subscribe Now