Cycle GAN Architecture

Saif Gazali
Aug 8, 2021

Image-to-image translation is the process of converting a source image into a target image. It typically requires a specialized model and a custom loss function for a given task or dataset, trained on paired examples that are difficult and expensive to prepare. Such a dataset must consist of many input images (e.g. summer scenes), each paired with the same image carrying the desired modification as the expected output (e.g. the corresponding winter scene). Because paired datasets are so hard to prepare, techniques that do not require paired examples are preferred.

Cycle GAN is an approach that learns the special characteristics of a source image collection and figures out how those characteristics can be translated to a target image collection, in the absence of any paired training examples. A Generative Adversarial Network includes a generator model, which generates new plausible fake samples that appear to come from an existing distribution of samples, and a discriminator model, which classifies a given sample as real or fake. Cycle GAN is an extension of the GAN architecture in which two generator models and two discriminator models are trained simultaneously.

The first generator model takes an image from the first domain and generates a translated image in the second domain, whereas the second generator takes an image from the second domain and generates a translated image in the first domain. The two discriminators then determine whether the generated images are real or fake. This adversarial loss is enough to generate plausible images in the target domain, but not enough to guarantee a plausible translation of a given input. Cycle consistency is the property that when an image output by the first generator is provided as input to the second generator, the output of the second generator should match the original image. Cycle GAN introduces cycle consistency through an additional loss that measures the difference between the second generator's output and the original image.

Cycle GAN Model Architecture

Consider the problem of translating images of apples into images of oranges, and the reverse. We have a dataset of apple images and a dataset of orange images, and they are unpaired, meaning one is not the actual translation of the other. Two GANs are developed, each having its own generator and discriminator model. The first GAN generates an image of an orange given an image of an apple; the second does the reverse. Each GAN is also updated using a cycle consistency loss, which is designed to encourage a plausible translation of the input image: it compares an image input to the Cycle GAN with the reconstructed image, calculating the difference between their pixel values.

Cycle consistency loss is calculated in two directions, as a forward cycle consistency loss and a backward cycle consistency loss (a minimal code sketch follows the two lists below).

Forward Cycle Consistency Loss:

— Input photo of Apple(collection 1) to GAN 1

— Output photo of Orange from GAN 1

— Input photo of Orange from GAN 1 to GAN 2

— Output photo of Apple from GAN 2

— Compare photo of Apple(collection 1) to photo of Apple from GAN 2

Backward Cycle Consistency Loss:

— Input photo of Orange(collection 2) to GAN 2

— Output photo of Apple from GAN 2

— Input photo of Apple from GAN 2 to GAN 1

— Output photo of Orange from GAN 1

— Compare photo of Orange(collection 2) to photo of Orange from GAN 1
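Both directions reduce to an L1 (mean absolute error) penalty between an original image and its reconstruction after a full cycle. Below is a minimal TensorFlow sketch of this loss; the generator_1/generator_2 names and the weight of 10.0 (the lambda used in the paper) are illustrative assumptions, not code from this article.

```python
import tensorflow as tf

def cycle_consistency_loss(real_image, reconstructed_image, weight=10.0):
    # L1 (mean absolute error) between the original image and its
    # reconstruction after a full cycle, scaled by lambda (10 in the paper).
    return weight * tf.reduce_mean(tf.abs(real_image - reconstructed_image))

# Forward cycle (hypothetical model names):
#   fake_orange   = generator_1(real_apple)   # GAN 1: apple -> orange
#   cycled_apple  = generator_2(fake_orange)  # GAN 2: orange -> apple
#   loss_forward  = cycle_consistency_loss(real_apple, cycled_apple)
#
# Backward cycle:
#   fake_apple    = generator_2(real_orange)  # GAN 2: orange -> apple
#   cycled_orange = generator_1(fake_apple)   # GAN 1: apple -> orange
#   loss_backward = cycle_consistency_loss(real_orange, cycled_orange)
```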

Discriminator model

Cycle GAN uses a Patch GAN rather than a standard deep convolutional network (DCNN) as its discriminator model, which lets it classify patches of an image as real or fake instead of the entire image. The discriminator is run convolutionally across the image, and all of the responses are averaged to give the final output. Concretely, the network outputs a single feature map of real/fake predictions that is averaged into a single score. A 70x70 patch size has proven effective across different image-to-image translation tasks, so the Patch GAN predicts whether each 70x70 patch in the input image is real or fake.

The discriminator model described in the paper uses blocks of Conv2D-InstanceNorm-LeakyReLU layers with 4x4 filters and 2x2 strides. The architecture is C64-C128-C256-C512. After the last layer, a convolution is applied to produce a 1-dimensional output. Instance normalization, which standardizes the values on each feature map to remove image-specific contrast information, is used instead of batch normalization.
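Below is a minimal Keras sketch of this discriminator. It assumes InstanceNormalization from the tensorflow_addons package, and it compiles with a least-squares (MSE) adversarial loss and a 0.5 loss weight, which are common Cycle GAN implementation choices rather than details stated above. Note that the C512 block uses stride 1; that is what brings the receptive field to 70x70 in the paper's configuration.

```python
from tensorflow.keras.initializers import RandomNormal
from tensorflow.keras.layers import Conv2D, Input, LeakyReLU
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow_addons.layers import InstanceNormalization  # assumed dependency

def define_discriminator(image_shape=(256, 256, 3)):
    init = RandomNormal(stddev=0.02)
    in_image = Input(shape=image_shape)
    # C64: no instance normalization on the first block
    d = Conv2D(64, (4, 4), strides=(2, 2), padding='same',
               kernel_initializer=init)(in_image)
    d = LeakyReLU(alpha=0.2)(d)
    # C128 and C256: Conv2D-InstanceNorm-LeakyReLU with stride 2
    for n_filters in (128, 256):
        d = Conv2D(n_filters, (4, 4), strides=(2, 2), padding='same',
                   kernel_initializer=init)(d)
        d = InstanceNormalization(axis=-1)(d)
        d = LeakyReLU(alpha=0.2)(d)
    # C512: stride 1, giving the 70x70 receptive field
    d = Conv2D(512, (4, 4), padding='same', kernel_initializer=init)(d)
    d = InstanceNormalization(axis=-1)(d)
    d = LeakyReLU(alpha=0.2)(d)
    # Final convolution: one real/fake prediction per patch
    patch_out = Conv2D(1, (4, 4), padding='same', kernel_initializer=init)(d)
    model = Model(in_image, patch_out)
    model.compile(loss='mse',
                  optimizer=Adam(learning_rate=0.0002, beta_1=0.5),
                  loss_weights=[0.5])
    return model
```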

Discriminator model summary and plot (the layer listing and plot image are omitted here).
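One way to reproduce them, assuming the define_discriminator sketch above (plot_model also requires the pydot and graphviz packages to be installed):

```python
from tensorflow.keras.utils import plot_model

model = define_discriminator()
model.summary()  # prints the layer-by-layer listing
plot_model(model, to_file='discriminator_model_plot.png',
           show_shapes=True, show_layer_names=True)
```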

Generator Model

The generator is a deep convolutional model built from multiple residual blocks from the ResNet architecture, based on the approach described in the paper Perceptual Losses for Real-Time Style Transfer and Super-Resolution. The model uses a number of down-sampling convolutional blocks to encode the input image, a number of ResNet blocks to transform the image, and a number of up-sampling convolutional blocks to generate the output image.

The architecture of the 6-resnet-block generator is as follows:

C7s1-64, d128, d256, R256, R256, R256, R256, R256, R256, u128, u64, C7s1-3

Each ResNet block (R256) contains two 3x3 convolutional layers, with the block's input merged back channel-wise with its output. The down-sampling (d) and up-sampling (u) blocks are Conv2D-InstanceNormalization-Activation blocks with stride 2 and the indicated number of filters (e.g. d128 uses 128 filters). C7s1-64 and C7s1-3 are also Conv2D-InstanceNormalization-Activation blocks, using 7x7 filters with stride 1 and 64 and 3 filters respectively.
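A minimal Keras sketch of this generator follows, again assuming tensorflow_addons for InstanceNormalization; the tanh output activation and the Gaussian weight initialization are common implementation choices, not details stated above.

```python
from tensorflow.keras.initializers import RandomNormal
from tensorflow.keras.layers import (Activation, Concatenate, Conv2D,
                                     Conv2DTranspose, Input)
from tensorflow.keras.models import Model
from tensorflow_addons.layers import InstanceNormalization  # assumed dependency

def resnet_block(n_filters, input_layer):
    init = RandomNormal(stddev=0.02)
    # Two 3x3 convolutions, then the block input merged back channel-wise
    g = Conv2D(n_filters, (3, 3), padding='same',
               kernel_initializer=init)(input_layer)
    g = InstanceNormalization(axis=-1)(g)
    g = Activation('relu')(g)
    g = Conv2D(n_filters, (3, 3), padding='same', kernel_initializer=init)(g)
    g = InstanceNormalization(axis=-1)(g)
    return Concatenate()([g, input_layer])

def define_generator(image_shape=(256, 256, 3), n_resnet=6):
    init = RandomNormal(stddev=0.02)
    in_image = Input(shape=image_shape)
    # C7s1-64: 7x7 convolution with stride 1
    g = Conv2D(64, (7, 7), padding='same', kernel_initializer=init)(in_image)
    g = InstanceNormalization(axis=-1)(g)
    g = Activation('relu')(g)
    # d128, d256: down-sampling convolutions with stride 2
    for n_filters in (128, 256):
        g = Conv2D(n_filters, (3, 3), strides=(2, 2), padding='same',
                   kernel_initializer=init)(g)
        g = InstanceNormalization(axis=-1)(g)
        g = Activation('relu')(g)
    # R256 x 6: transformation blocks
    for _ in range(n_resnet):
        g = resnet_block(256, g)
    # u128, u64: up-sampling transpose convolutions with stride 2
    for n_filters in (128, 64):
        g = Conv2DTranspose(n_filters, (3, 3), strides=(2, 2), padding='same',
                            kernel_initializer=init)(g)
        g = InstanceNormalization(axis=-1)(g)
        g = Activation('relu')(g)
    # C7s1-3: 7x7 convolution to 3 output channels
    g = Conv2D(3, (7, 7), padding='same', kernel_initializer=init)(g)
    g = InstanceNormalization(axis=-1)(g)
    out_image = Activation('tanh')(g)  # assumed output activation
    return Model(in_image, out_image)
```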

Generator model plot (image omitted).

Resources

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017.

Perceptual Losses for Real-Time Style Transfer and Super-Resolution, 2016.

Image-to-Image Translation with Conditional Adversarial Networks, 2016.
