EGAN (Enhanced GAN)

EGAN (Enhanced Generative Adversarial Network) is an extension of Generative Adversarial Networks (GANs) that builds on training improvements proposed by Salimans et al. in 2016 and in subsequent work. GANs are a class of generative models that learn the distribution of a dataset and generate new samples from it. The core idea is to train two neural networks against each other: a generator that learns to produce realistic samples, and a discriminator that learns to distinguish real samples from generated ones. The generator improves by trying to fool the discriminator, while the discriminator improves by trying to classify samples correctly as real or fake.
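
To make the adversarial setup concrete, here is a minimal sketch of one GAN training step in PyTorch. The network definitions and data loading are omitted; `generator`, `discriminator` (assumed to output a logit of shape `(batch, 1)`), the optimizers, and `real_batch` are illustrative assumptions, not part of any specific EGAN reference implementation:

```python
import torch
import torch.nn.functional as F

def gan_step(generator, discriminator, g_opt, d_opt, real_batch, z_dim=100):
    batch = real_batch.size(0)

    # Discriminator step: push real samples toward label 1, fakes toward 0.
    z = torch.randn(batch, z_dim)
    fake = generator(z).detach()  # detach so this step does not update G
    d_loss = F.binary_cross_entropy_with_logits(
        discriminator(real_batch), torch.ones(batch, 1)
    ) + F.binary_cross_entropy_with_logits(
        discriminator(fake), torch.zeros(batch, 1)
    )
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator step: try to make the discriminator label fakes as real.
    fake = generator(torch.randn(batch, z_dim))
    g_loss = F.binary_cross_entropy_with_logits(
        discriminator(fake), torch.ones(batch, 1)
    )
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```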

EGAN builds on the GAN framework with modifications to the loss function and to the architectures of the generator and discriminator networks, aimed at improving training stability and the quality of the generated samples. In this article, we explain these modifications in detail.

Wasserstein distance-based loss function: One of the main issues with the original GAN loss function is that it can make training unstable. In particular, early in training the generator may produce samples that are very different from the training data; the discriminator can then classify real and fake samples almost perfectly, and its gradients stop providing informative feedback to the generator. To address this, EGAN adopts a loss function based on the Wasserstein distance between the generated and real data distributions, known as the Wasserstein GAN (WGAN) loss (Arjovsky et al., 2017).

The Wasserstein distance measures how different two probability distributions are; intuitively, it is the minimum cost of transporting probability mass from one distribution to the other. Using it as the loss encourages the generator to move its distribution steadily toward the training data, even when the two distributions barely overlap. Under this loss the discriminator (usually called a critic) no longer outputs a probability for each sample; it outputs an unbounded scalar score, and the gap between its average scores on real and generated samples estimates the Wasserstein distance between the two distributions.
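
As a sketch, the WGAN objective fits in a few lines of PyTorch. Here `critic` is assumed to be a network with a single unbounded scalar output; in practice it must also be kept approximately 1-Lipschitz, for example by weight clipping, a gradient penalty, or the spectral normalization described below:

```python
import torch

def critic_loss(critic, real, fake):
    # Widen the score gap between real and generated samples:
    # minimizing this is equivalent to maximizing
    # E[critic(real)] - E[critic(fake)], the Wasserstein estimate.
    return critic(fake).mean() - critic(real).mean()

def generator_loss(critic, fake):
    # The generator tries to raise the critic's score on its samples.
    return -critic(fake).mean()
```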

Spectral normalization: Another issue with GANs is that the discriminator can become too powerful, leaving the generator with little useful gradient to learn from. To address this, EGAN uses spectral normalization (Miyato et al., 2018), a technique for constraining the weights of the discriminator network. It limits how rapidly the discriminator's output can change with its input, which helps stabilize training and guards against the generator collapsing to a narrow set of outputs.

Spectral normalization works by dividing each weight matrix in the discriminator by its spectral norm, i.e., its largest singular value. This bounds the Lipschitz constant of each layer, and hence of the discriminator's mapping as a whole, making it harder for the discriminator to overpower the generator.
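
In PyTorch, spectral normalization is available as a built-in wrapper. Below is a sketch of a spectrally normalized discriminator; the layer sizes are illustrative assumptions (64x64 RGB inputs), not a prescribed architecture:

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

# spectral_norm rescales each wrapped layer's weight matrix by an
# estimate of its largest singular value (one power-iteration step
# per forward pass), keeping each layer roughly 1-Lipschitz.
discriminator = nn.Sequential(
    spectral_norm(nn.Conv2d(3, 64, 4, stride=2, padding=1)),   # 64x64 -> 32x32
    nn.LeakyReLU(0.2),
    spectral_norm(nn.Conv2d(64, 128, 4, stride=2, padding=1)), # 32x32 -> 16x16
    nn.LeakyReLU(0.2),
    nn.Flatten(),
    spectral_norm(nn.Linear(128 * 16 * 16, 1)),                # scalar score
)
```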

Feature matching: In the original GAN setup, the generator is trained directly against the discriminator's final real/fake output. This can lead to mode collapse, where the generator produces samples that are too similar to each other, resulting in low diversity. To address this, EGAN uses feature matching, proposed by Salimans et al. (2016), which trains the generator to match the statistics of the discriminator's intermediate-layer activations rather than just its final output.

Feature matching works by computing statistics, typically the mean (and in some variants also the covariance) of the discriminator's intermediate-layer activations, for both real and generated batches. The generator is then trained to minimize the difference between these statistics. This pushes the generator toward a diverse set of samples whose aggregate features match those of the real data, rather than a few near-identical samples that merely fool the final classifier.
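
A sketch of the mean-matching variant in PyTorch. Here `features` is an assumed helper that returns an intermediate activation of the discriminator for a batch; a covariance term could be added analogously:

```python
import torch

def feature_matching_loss(features, real, fake):
    # Compare batch statistics of an intermediate discriminator layer
    # rather than its final real/fake output.
    f_real = features(real).mean(dim=0)  # average feature vector, real batch
    f_fake = features(fake).mean(dim=0)  # average feature vector, fake batch
    return ((f_real - f_fake) ** 2).sum()
```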

Self-attention mechanism: Finally, EGAN adds a self-attention mechanism, in the style of the Self-Attention GAN (SAGAN) of Zhang et al. (2019), to improve the quality of the generated samples. Convolutions only see local neighborhoods, whereas self-attention models long-range dependencies; in the context of EGAN, it is used to model the relationships between distant parts of an image.

The self-attention mechanism computes attention weights for each spatial position in a feature map based on its similarity to every other position. These weights are used to form a weighted sum of the feature values, so each position can draw on information from the whole image rather than only its local neighborhood. This helps generate globally coherent images with more realistic details.

In EGAN, the self-attention mechanism is applied in both the generator and discriminator networks, inserted between convolutional layers at intermediate feature-map resolutions. With it, EGAN is able to generate more realistic and higher-quality images than traditional GANs.
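
Below is a sketch of a SAGAN-style self-attention block in PyTorch. The channel-reduction factor (//8) and the zero-initialized gate follow common SAGAN implementations and are assumptions rather than EGAN-specific choices:

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Self-attention block over spatial positions of a feature map."""

    def __init__(self, channels):
        super().__init__()
        # 1x1 convolutions project features into query/key/value spaces.
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        # Learned gate, initialized to 0 so training starts from the
        # plain convolutional features and blends attention in gradually.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).view(b, -1, h * w)  # (b, c//8, hw)
        k = self.key(x).view(b, -1, h * w)    # (b, c//8, hw)
        v = self.value(x).view(b, -1, h * w)  # (b, c,    hw)
        # Attention weights: each position attends to every other position.
        attn = torch.softmax(torch.bmm(q.transpose(1, 2), k), dim=-1)  # (b, hw, hw)
        out = torch.bmm(v, attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x  # residual connection
```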

In summary, EGAN extends GANs with several modifications to the loss function and to the generator and discriminator architectures: a Wasserstein distance-based loss, spectral normalization, feature matching, and a self-attention mechanism. Together, these techniques stabilize training and allow EGAN to generate high-quality images with more realistic detail and greater diversity.