Cell Nuclei Segmentation using VGG16-UNET And Double-UNET

Saif Gazali
9 min read · Aug 14, 2021

Image Segmentation is the process of dividing an image into segments so that only the important segments are processed instead of the entire image. An image is made of pixels, and through segmentation, pixels with similar attributes are grouped together, which gives us a far more granular understanding of the content of the image. Each group is assigned a label, and using these labels we specify boundaries that separate the objects of interest from the rest of the image.

In this article we segment cell nuclei images, which lets us automate the process of nucleus identification and leads to more efficient drug testing. The dataset can be downloaded from here. It contains cell images along with their corresponding masks (annotated images). Below, we plot an image next to its annotated image.
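For reference, the snippets below rely on the following imports (consolidated here; inferred from the code that follows):

import os
from glob import glob

import cv2
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import (Input, Conv2D, Conv2DTranspose, BatchNormalization,
                                     Activation, Concatenate, MaxPool2D, AveragePooling2D,
                                     UpSampling2D, GlobalAveragePooling2D, Reshape, Dense,
                                     Multiply)
from tensorflow.keras.initializers import RandomNormal
from tensorflow.keras.applications import VGG16, VGG19
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import Recall, Precision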

Loading Dataset

Creating training, testing and validation set from our dataset.

from sklearn.model_selection import train_test_split

def load_path(path, split=0.2):
    X = sorted(glob(os.path.join(path, "images", "*.jpg")))
    Y = sorted(glob(os.path.join(path, "masks", "*.jpg")))
    size = int(len(X) * split)
    train_x, valid_x = train_test_split(X, test_size=size, random_state=1)
    train_y, valid_y = train_test_split(Y, test_size=size, random_state=1)
    train_x, test_x = train_test_split(train_x, test_size=size, random_state=1)
    train_y, test_y = train_test_split(train_y, test_size=size, random_state=1)
    return (train_x, train_y), (valid_x, valid_y), (test_x, test_y)

Creating a method to shuffle the images.

from sklearn.utils import shuffle

def shuffling(x, y):
    x, y = shuffle(x, y)
    return x, y

Loading the training, testing and validation datasets.

path = '/content/drive/MyDrive/DSB'
(train_x, train_y), (valid_x, valid_y), (test_x, test_y) = load_path(path)

We have 402 images in our training set and 134 images each in the testing and validation sets.

print(len(train_x), '-', len(train_y))  # 402 - 402
print(len(valid_x), '-', len(valid_y))  # 134 - 134
print(len(test_x), '-', len(test_y))    # 134 - 134

Data Pipeline

Creating a method to read an image: it takes a path, reads the image, resizes it to 256×256 (to match the shapes set later in tf_parse), and normalizes it.

def read_image(path):
    path = path.decode()
    x = cv2.imread(path, cv2.IMREAD_COLOR)
    x = cv2.resize(x, (256, 256))  # resize so shapes match those set in tf_parse
    x = x / 255.0
    x = x.astype(np.float32)
    return x

Creating a method to read the annotated image (mask): it takes a path, reads the mask in grayscale, resizes and normalizes it, then expands its dimensions to make it 3D.

def read_mask(path):
    path = path.decode()
    x = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    x = cv2.resize(x, (256, 256))  # resize so shapes match those set in tf_parse
    x = x / 255.0
    x = x.astype(np.float32)
    x = np.expand_dims(x, axis=-1)  # (256, 256, 1)
    return x

Creating a method to parse the image and its mask which would process the images.

def tf_parse(x, y):
    def _parse(x, y):
        x = read_image(x)
        y = read_mask(y)
        return x, y
    x, y = tf.numpy_function(_parse, [x, y], [tf.float32, tf.float32])
    x.set_shape([256, 256, 3])
    y.set_shape([256, 256, 1])
    return x, y

Creating a method tf_dataset() which takes a list of images and masks along with a batch size. It uses the TensorFlow Dataset API and returns a batched dataset after processing it through the methods defined above.

def tf_dataset(X, Y, batch_size=2):
    dataset = tf.data.Dataset.from_tensor_slices((X, Y))
    dataset = dataset.map(tf_parse)
    dataset = dataset.batch(batch_size)
    dataset = dataset.prefetch(4)
    return dataset

Getting the dataset.

train_dataset = tf_dataset(train_x,train_y)
valid_dataset = tf_dataset(valid_x,valid_y)

Calculating the number of steps to train our model.

# number of steps
batch_size = 2
train_steps = len(train_x) // batch_size
test_steps = len(valid_x) // batch_size
if len(train_x) % batch_size != 0:
    train_steps += 1
if len(valid_x) % batch_size != 0:
    test_steps += 1
print(train_steps)  # 201
print(test_steps)   # 67

Evaluation Metrics

Intersection-over-Union (IoU) is a common evaluation metric for semantic image segmentation; for multi-class problems it is computed per class and then averaged over classes. It is defined as IOU = true_positive / (true_positive + false_positive + false_negative). For our binary masks we compute it directly from the intersection and union of the predicted and ground-truth pixels.

def iou(y_true, y_pred):
    def f(y_true, y_pred):
        intersection = (y_true * y_pred).sum()
        union = y_true.sum() + y_pred.sum() - intersection
        x = (intersection + 1e-15) / (union + 1e-15)
        x = x.astype(np.float32)
        return x
    return tf.numpy_function(f, [y_true, y_pred], tf.float32)

Dice Loss is essentially a measure of overlap between two samples. The Dice coefficient ranges from 0 to 1, where 1 denotes perfect and complete overlap: Dice = 2·|A ∩ B| / (|A| + |B|). It was originally developed for binary data, so it suits our binary masks; the loss is simply 1 − Dice.

smooth = 1e-15

def dice_coef(y_true, y_pred):
    y_true = tf.keras.layers.Flatten()(y_true)
    y_pred = tf.keras.layers.Flatten()(y_pred)
    intersection = tf.reduce_sum(y_true * y_pred)
    return (2. * intersection + smooth) / (tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + smooth)

def dice_loss(y_true, y_pred):
    return 1.0 - dice_coef(y_true, y_pred)

VGG16-UNET Model

The encoder layers in our UNET model will use pretrained layers from the VGG-16 model, whereas the other layers will be trained during model training. Defining a method that returns a convolutional block (Convolution-BatchNormalization-ReLU).

def conv_block(inputs, num_filters):
    x = Conv2D(num_filters, 3, padding='same')(inputs)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Conv2D(num_filters, 3, padding='same')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    return x

Defining our decoder block.

def define_decoder(inputs, skip_layer, num_filters):
    init = RandomNormal(stddev=0.02)
    x = Conv2DTranspose(num_filters, (2, 2), strides=(2, 2), padding='same',
                        kernel_initializer=init)(inputs)
    g = Concatenate()([x, skip_layer])
    g = conv_block(g, num_filters)
    return g

Importing the VGG-16 model from Keras applications library and printing its summary.

vgg16 = VGG16(include_top=False,weights='imagenet')
vgg16.summary()

We select layers according to the output shapes required at each stage of the traditional UNET encoder: we need skip connections with 64, 128, 256 and 512 feature channels, so we pick the VGG16 layers that satisfy these requirements. A quick way to find them is to print each layer's output shape, as in the sketch below.
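A minimal sketch, reusing the vgg16 model loaded above:

# Print each layer's name and output shape; the channel counts identify the
# skip-connection candidates (spatial dims print as None because no input
# shape was fixed when the model was loaded above).
for layer in vgg16.layers:
    print(layer.name, layer.output.shape)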

def vgg16_unet(input_shape):
    inputs = Input(shape=input_shape)
    vgg16 = VGG16(include_top=False, weights='imagenet', input_tensor=inputs)

    # Extract encoder layers from the vgg16 model based on their output shape
    s1 = vgg16.get_layer('block1_conv2').output
    s2 = vgg16.get_layer('block2_conv2').output
    s3 = vgg16.get_layer('block3_conv3').output
    s4 = vgg16.get_layer('block4_conv3').output

    # bottleneck/bridge layer from vgg16 (16x16 for a 256x256 input)
    b1 = vgg16.get_layer('block5_conv3').output

    # Decoder blocks
    d1 = define_decoder(b1, s4, 512)
    d2 = define_decoder(d1, s3, 256)
    d3 = define_decoder(d2, s2, 128)
    d4 = define_decoder(d3, s1, 64)

    # Output layer
    outputs = Conv2D(1, 1, padding='same', activation='sigmoid')(d4)
    model = Model(inputs, outputs)
    return model

Defining our VGG16-UNET model and compiling it.

lr = 1e-4  # learning rate (an assumed value; not defined elsewhere in the code)
model = vgg16_unet((256, 256, 3))
model.compile(loss=dice_loss, optimizer=Adam(learning_rate=lr),
              metrics=[dice_coef, iou, Recall(), Precision()])

Training our model for 25 epochs.

model.fit(
    train_dataset,
    epochs=25,
    validation_data=valid_dataset,
    steps_per_epoch=train_steps,
    validation_steps=test_steps)

The Intersection-over-Union score is around 88%.

Using our model to predict segmentation masks for the test dataset and plotting the results. Each plot shows the cell image, its annotated mask, and our predicted mask; a minimal sketch of this step follows.
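A rough sketch of this step, assuming the read_image/read_mask helpers above and a 0.5 threshold on the sigmoid output:

import matplotlib.pyplot as plt

for img_path, mask_path in zip(test_x[:5], test_y[:5]):
    x = read_image(img_path.encode())  # the helpers expect byte strings
    y = read_mask(mask_path.encode())
    y_pred = model.predict(np.expand_dims(x, axis=0))[0]
    y_pred = (y_pred > 0.5).astype(np.float32)  # binarize the sigmoid output

    # cv2 loads images as BGR; convert for display with true colors
    fig, axes = plt.subplots(1, 3, figsize=(12, 4))
    axes[0].imshow(cv2.cvtColor((x * 255).astype(np.uint8), cv2.COLOR_BGR2RGB))
    axes[0].set_title('Cell image')
    axes[1].imshow(y[..., 0], cmap='gray')
    axes[1].set_title('Annotated mask')
    axes[2].imshow(y_pred[..., 0], cmap='gray')
    axes[2].set_title('Predicted mask')
    for ax in axes:
        ax.axis('off')
    plt.show()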

Double UNET Model

Double UNET Model Architecture

Double UNET starts with VGG-19 as the encoder sub-network, followed by a decoder sub-network. What distinguishes Double UNET from UNET in the first network (NETWORK 1) is the use of VGG-19 (marked in yellow), ASPP (marked in blue), and the decoder block (marked in light green). The squeeze-and-excite block is used in the encoder of NETWORK 1 and in the decoder blocks of NETWORK 1 and NETWORK 2. An element-wise multiplication is performed between the output of NETWORK 1 and the input of the same network.

The first encoder in Double UNET (encoder1) uses a pretrained VGG-19 model, whereas the second encoder (encoder2) is built from scratch. We use two decoders in the entire network, with small modifications compared with the decoder of the original UNET. Each block in the decoder performs a 2 × 2 bilinear up-sampling of the input feature maps, doubling their spatial dimensions. We then concatenate the corresponding skip-connection feature maps from the encoder with the output feature maps.

Squeeze and Excitation block

The Squeeze-and-Excitation Block is an architectural unit designed to improve the representational power of a network by enabling it to perform dynamic channel-wise feature recalibration. The process is:

  • The block takes a convolutional feature map as input.
  • Each channel is “squeezed” into a single numeric value using global average pooling.
  • A dense layer followed by a ReLU adds non-linearity, and the number of output channels is reduced by a ratio.
  • Another dense layer followed by a sigmoid gives each channel a smooth gating value.
  • Finally, each feature map of the convolutional block is weighted by the output of this side network; this is the “excitation”.
def squeeze_excite_block(inputs, ratio=8):
    init = inputs  # e.g. (b, 128, 128, 32)
    channel_axis = -1
    filters = init.shape[channel_axis]  # 32 in this example
    se_shape = (1, 1, filters)

    # Global average pooling to squeeze each channel into a single value
    se = GlobalAveragePooling2D()(init)  # (b, 32)
    se = Reshape(se_shape)(se)           # (b, 1, 1, 32)
    # ratio controls how much the number of features is reduced in the Dense layer
    se = Dense(filters // ratio, activation='relu', use_bias=False)(se)
    # Sigmoid outputs values in [0, 1] - suppresses unimportant features (channel-wise attention)
    se = Dense(filters, activation='sigmoid', use_bias=False)(se)
    x = Multiply()([inputs, se])
    return x

Atrous Spatial Pyramid Pooling

Atrous Spatial Pyramid Pooling (ASPP) is a semantic segmentation module for resampling a given feature layer at multiple rates prior to convolution. This amounts to probing the original image with multiple filters that have complementary effective fields of view, thus capturing objects as well as useful image context at multiple scales. Rather than actually resampling features, the mapping is implemented using multiple parallel atrous convolutional layers with different sampling rates.

def ASPP(x, filter):
    shape = x.shape

    # Take the feature map -> downsample it -> upsample it back to the original shape
    y1 = AveragePooling2D(pool_size=(shape[1], shape[2]))(x)
    y1 = Conv2D(filter, 1, padding='same')(y1)
    y1 = BatchNormalization()(y1)
    y1 = Activation('relu')(y1)
    y1 = UpSampling2D((shape[1], shape[2]), interpolation='bilinear')(y1)

    # dilation_rate=1 is a normal convolution
    y2 = Conv2D(filter, 1, dilation_rate=1, padding='same', use_bias=False)(x)
    y2 = BatchNormalization()(y2)
    y2 = Activation('relu')(y2)

    y3 = Conv2D(filter, 3, dilation_rate=6, padding='same', use_bias=False)(x)
    y3 = BatchNormalization()(y3)
    y3 = Activation('relu')(y3)

    y4 = Conv2D(filter, 3, dilation_rate=12, padding='same', use_bias=False)(x)
    y4 = BatchNormalization()(y4)
    y4 = Activation('relu')(y4)

    y5 = Conv2D(filter, 3, dilation_rate=18, padding='same', use_bias=False)(x)
    y5 = BatchNormalization()(y5)
    y5 = Activation('relu')(y5)

    y = Concatenate()([y1, y2, y3, y4, y5])

    y = Conv2D(filter, 1, dilation_rate=1, padding='same', use_bias=False)(y)
    y = BatchNormalization()(y)
    y = Activation('relu')(y)
    return y

Defining encoder-1 of the Double UNET model, which uses pretrained VGG19 layers.

# Encoder-1
def encoder1(input):
    skip_connections = []
    model = VGG19(include_top=False, weights='imagenet', input_tensor=input)
    names = ["block1_conv2", "block2_conv2", "block3_conv4", "block4_conv4"]
    for name in names:
        skip_connections.append(model.get_layer(name).output)
    output = model.get_layer("block5_conv4").output
    return output, skip_connections

The second encoder block of the Double UNET model uses Convolution-BatchNormalization-ReLU blocks of layers to build a deep convolutional neural network.

def conv_block(x, filters):
    x = Conv2D(filters, 3, padding='same')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Conv2D(filters, 3, padding='same')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    return x

Defining our second encoder block.

def encoder2(inputs):
    num_filters = [32, 64, 128, 256]
    skip_connections = []
    x = inputs
    for i, f in enumerate(num_filters):
        x = conv_block(x, f)
        skip_connections.append(x)
        x = MaxPool2D((2, 2))(x)
    return x, skip_connections

The architecture of decoder-1 is the same as in UNET, except that it uses bilinear upsampling instead of a transposed convolution.

def decoder_1(input, skip_connection):
    num_filters = [256, 128, 64, 32]
    skip_connection.reverse()
    x = input
    for i, f in enumerate(num_filters):
        x = UpSampling2D((2, 2), interpolation='bilinear')(x)
        x = Concatenate()([x, skip_connection[i]])
        x = conv_block(x, f)
    return x

Similarly, defining our second decoder, which concatenates skip connections from both encoders (note that skip_1 was already reversed in place by decoder_1).

def decoder_2(input, skip_1, skip_2):
    num_filters = [256, 128, 64, 32]
    skip_2.reverse()
    x = input
    for i, f in enumerate(num_filters):
        x = UpSampling2D((2, 2), interpolation='bilinear')(x)
        x = Concatenate()([x, skip_1[i], skip_2[i]])
        x = conv_block(x, f)
    return x

Defining our output block.

def output_block(inputs):
    x = Conv2D(1, 1, padding='same')(inputs)
    x = Activation('sigmoid')(x)
    return x

Creating a method to build our Double UNET model.

def build_model(input_shape):
    inputs = Input(input_shape)

    x, skip_1 = encoder1(inputs)
    x = ASPP(x, 64)
    x = decoder_1(x, skip_1)
    output1 = output_block(x)

    x = inputs * output1
    x, skip_2 = encoder2(x)
    x = ASPP(x, 64)
    x = decoder_2(x, skip_1, skip_2)
    output2 = output_block(x)

    outputs = Concatenate()([output1, output2])
    model = Model(inputs, outputs)
    return model

Creating our Double UNET model and plotting it.

input_shape = (256, 256, 3)
doubleUnetModel = build_model(input_shape)
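The plotting step can be done with Keras' built-in utility; a minimal sketch (pydot and graphviz need to be installed):

from tensorflow.keras.utils import plot_model

# Render the model graph to an image file
plot_model(doubleUnetModel, to_file='double_unet.png', show_shapes=True)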

Compiling our Double UNET model.

doubleUnetModel.compile(loss=dice_loss, optimizer=Adam(learning_rate=lr),
                        metrics=[dice_coef, iou, Recall(), Precision()])

Training our model for 30 epochs.

doubleUnetModel.fit(
    train_dataset,
    epochs=30,
    validation_data=valid_dataset,
    steps_per_epoch=train_steps,
    validation_steps=test_steps)

The Intersection-over-Union score has improved to around 92.8% for the Double UNET model, from 88% for the VGG16-UNET model.

Using our model to predict segmentation masks for the test dataset and plotting the results. Each plot shows the cell image, its annotated mask, and our predicted mask. Note that the Double UNET output concatenates output1 and output2, so each prediction has two channels; a sketch of extracting the final mask follows.
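A minimal sketch, assuming we keep the second channel (NETWORK 2's output) as the final mask and binarize it at 0.5:

x = read_image(test_x[0].encode())
y_pred = doubleUnetModel.predict(np.expand_dims(x, axis=0))[0]  # shape (256, 256, 2)
final_mask = (y_pred[..., 1] > 0.5).astype(np.float32)  # channel 1 is NETWORK 2's output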

Resources

DoubleU-Net: A Deep Convolutional Neural Network for Medical Image Segmentation

Atrous Spatial Pyramid Pooling

Squeeze-and-Excitation Block
