Generative Adversarial Networks (GANs)

Overview

Generative Adversarial Networks (GANs) are a class of generative models designed to learn the underlying distribution of a dataset and generate new, realistic samples from it. A GAN consists of two neural networks trained simultaneously in an adversarial setup: a generator, which attempts to produce synthetic data indistinguishable from real data, and a discriminator, which attempts to distinguish between real and generated samples. Training proceeds as a minimax game in which the generator improves by fooling the discriminator, while the discriminator improves by better detecting fakes.
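Formally, the minimax objective is min_G max_D E_{x~p_data}[log D(x)] + E_{z~p_z}[log(1 − D(G(z)))]. The loop below is a minimal sketch of this game on 1-D data; both networks are deliberately linear so the gradients can be written by hand, and the hyperparameters, target distribution, and use of the non-saturating generator loss (maximizing log D(G(z)) instead of minimizing log(1 − D(G(z)))) are illustrative choices, not a recipe.

```python
import numpy as np

# Toy GAN on 1-D data: the generator learns to map z ~ N(0, 1)
# toward the "real" distribution N(3, 0.5). Linear networks are an
# illustrative simplification so gradients can be derived by hand.
rng = np.random.default_rng(0)
TARGET_MEAN, TARGET_STD = 3.0, 0.5
BATCH, STEPS, LR = 64, 2000, 0.05

a, b = 1.0, 0.0   # generator:      G(z) = a * z + b
w, c = 0.1, 0.0   # discriminator:  D(x) = sigmoid(w * x + c)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

for _ in range(STEPS):
    real = rng.normal(TARGET_MEAN, TARGET_STD, BATCH)
    z = rng.normal(0.0, 1.0, BATCH)
    fake = a * z + b

    # Discriminator step: minimize -log D(real) - log(1 - D(fake)).
    d_real = sigmoid(w * real + c)
    d_fake = sigmoid(w * fake + c)
    g_real = d_real - 1.0   # d(loss)/d(logit) for real samples
    g_fake = d_fake         # d(loss)/d(logit) for fake samples
    w -= LR * np.mean(g_real * real + g_fake * fake)
    c -= LR * np.mean(g_real + g_fake)

    # Generator step (non-saturating): minimize -log D(G(z)).
    fake = a * z + b
    d_fake = sigmoid(w * fake + c)
    g_x = (d_fake - 1.0) * w        # gradient w.r.t. each fake sample
    a -= LR * np.mean(g_x * z)
    b -= LR * np.mean(g_x)

# b should have drifted toward TARGET_MEAN as the generator
# learns to place its samples where the discriminator says "real".
```

Note that each discriminator update treats the generator as fixed and vice versa; in practice the two steps are often alternated one-for-one, as here, though other ratios are common.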

GANs were introduced by Goodfellow et al. in 2014 and quickly became influential due to their ability to produce high-fidelity samples, particularly in image generation tasks. Unlike likelihood-based generative models, GANs do not explicitly define or optimize a probability density function; instead, they learn through adversarial feedback. Over time, numerous architectural and training refinements have been introduced to improve stability, convergence, and sample quality, including alternative loss functions, normalization strategies, and architectural constraints.

GANs are especially well suited to tasks where perceptual quality is important, though they are known to be challenging to train and sensitive to hyperparameters and data quality.

Applications and Use Cases

  • Image generation and synthesis
  • Image-to-image translation
  • Super-resolution
  • Style transfer
  • Data augmentation for low-resource domains
  • Domain adaptation
  • Video and audio generation (less common, more complex)
  • Synthetic data generation for privacy-preserving analytics

Notable Variants

  • Vanilla GAN
  • DCGAN (Deep Convolutional GAN)
  • WGAN / WGAN-GP
  • Pix2Pix
  • CycleGAN
  • StyleGAN / StyleGAN2 / StyleGAN3
  • BigGAN
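Of these variants, WGAN-GP is a good example of a training refinement aimed at stability: it replaces the classifier discriminator with a critic constrained (via a gradient penalty) to be approximately 1-Lipschitz. The sketch below computes that penalty for a deliberately simple linear critic f(x) = w·x + c, whose input-gradient is just w everywhere; the function names and the linear critic are illustrative assumptions, and real implementations compute the input-gradient of a deep critic with autodiff.

```python
import numpy as np

LAMBDA = 10.0  # penalty weight; 10 is the value suggested in the WGAN-GP paper

def gradient_penalty(critic_w, real, fake, rng):
    """LAMBDA * E[(||grad_x f(x_hat)|| - 1)^2] for a linear critic."""
    # Interpolate between real and fake batches: x_hat = eps*real + (1-eps)*fake.
    eps = rng.uniform(0.0, 1.0, size=(real.shape[0], 1))
    x_hat = eps * real + (1.0 - eps) * fake  # kept for clarity; unused below,
    # because a linear critic's gradient w.r.t. its input is w at every x_hat
    grad_norm = np.linalg.norm(critic_w)
    return LAMBDA * np.mean((grad_norm - 1.0) ** 2)

rng = np.random.default_rng(1)
real = rng.normal(0.0, 1.0, size=(8, 2))
fake = rng.normal(3.0, 1.0, size=(8, 2))
w = np.array([1.2, 1.6])  # ||w|| = 2.0
penalty = gradient_penalty(w, real, fake, rng)
print(penalty)  # 10 * (2 - 1)^2 = 10.0
```

The penalty is added to the critic's loss, pushing the critic's input-gradient norm toward 1 along lines between real and generated samples.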

Strengths

  • Capable of producing highly realistic and sharp samples
  • Flexible framework applicable across many data modalities
  • Does not require explicit likelihood modeling
  • Particularly strong for image-based generative tasks
  • Enables unsupervised or weakly supervised representation learning

Drawbacks

  • Training is unstable and sensitive to hyperparameters
  • Mode collapse can occur, reducing sample diversity
  • Difficult to evaluate quantitatively; metrics are often proxy-based
  • Requires careful balancing between generator and discriminator
  • Less effective for tasks requiring explicit probability estimates
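The evaluation difficulty above is why proxy metrics such as Fréchet Inception Distance (FID) are used: they compare summary statistics of real and generated samples rather than likelihoods. The sketch below illustrates the underlying idea with the Fréchet distance between two Gaussians fitted to 1-D samples; FID applies the same formula to deep Inception-network features, so this 1-D toy is only an illustration of the principle, not the actual metric.

```python
import numpy as np

def frechet_1d(x, y):
    """Squared Frechet distance between Gaussians fitted to 1-D samples.
    In one dimension: (mu_x - mu_y)^2 + (sqrt(var_x) - sqrt(var_y))^2."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    return (mx - my) ** 2 + (np.sqrt(vx) - np.sqrt(vy)) ** 2

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, 10_000)
good = rng.normal(0.05, 1.0, 10_000)  # samples close to the real distribution
bad = rng.normal(2.0, 0.3, 10_000)    # samples far from the real distribution

d_good = frechet_1d(real, good)
d_bad = frechet_1d(real, bad)
# d_good should be much smaller than d_bad: lower distance = better match.
```

Because such metrics compare fitted statistics rather than the distributions themselves, a low score does not rule out failure modes like mode collapse, which is why qualitative inspection remains common alongside them.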
