Image Classification on Grape Leaves Disease using Deep Learning

aiman hasni
10 min read · Feb 7, 2022

Viticulture is an important part of agricultural production, and grapes play a vital role in winemaking. Caring for the vineyard is therefore the first crucial step in the whole winemaking process. One of the most challenging parts of that care is monitoring the condition of every vineyard, which takes a lot of time and effort.

With the rise of machine learning, deep learning and artificial intelligence, productivity has jumped in many fields. Viticulture can also apply these technologies to work more effectively, for example by using computer vision to detect grape diseases. In this article, we discuss how to use a deep learning approach to distinguish leaves affected by esca disease from healthy grape leaves.

Photo by Jovana Askrabic on Unsplash


In this discussion, we use deep learning to classify images of grape leaves as either affected by esca or healthy. We use a Convolutional Neural Network (CNN) because it can extract features from images and recognize the pattern differences between classes.


The dataset contains two classes: unhealthy leaves affected by esca disease, and healthy leaves. The data comes from a research project jointly developed by the Department of Information Engineering, Polytechnic University of Marche, Ancona, Italy and STMicroelectronics, Italy, under the cooperation of the Umani Ronchi SPA winery, Osimo, Ancona, Marche, Italy. There are 888 images of esca-affected leaves and 882 images of healthy leaves, so the whole dataset contains 1770 images. It can be downloaded here.

Figure 1: Grape leaves with esca disease images.

At this point, the dataset is too small to feed into our network, so augmentation techniques are used to increase the amount of data.

Augmentation Procedure:

In data augmentation, the images in our dataset are transformed into diverse variants by applying random but realistic transformations such as geometric transformations, color space transformations, cropping, noise injection and random erasing.

This code downloads and extracts the dataset straight from the source:

dataset_name = "esca_dataset"

# Url to repo (repo temporary saved in Google Drive but intended to Mendeley repo)
dataset_url = ""   # Google Drive -> to change with Mendeley Link

# Trick to use wget with gDrive: use ''
# where FILEID is extracted from the virtual link provided from Google drive
dataset_url4wget = ""

# Download the archive directly from url
!wget -r --no-check-certificate "$dataset_url4wget" -O $dataset_name".zip"
!ls

# Unzip data
!unzip $dataset_name".zip"
!ls

Then we can use the ImageDataGenerator class, provided by the Keras (v 2.4.3) deep learning library, to augment all the images.

# The new dataset 'augmented_esca_dataset' will be created.
# This dataset contains the augmented images created by the ImageDataGenerator class and the original images,
# in order to obtain an expanded version of the original dataset ready-to-use
from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
import tensorflow as tf
import os
from numpy import expand_dims
import cv2
import matplotlib.pyplot as plt
from pathlib import Path

def blur(img):
    return cv2.blur(img, (30, 30))

def horizontal_flip(img):
    return tf.image.flip_left_right(img)

def vertical_flip(img):
    return tf.image.flip_up_down(img)

def contrast(img):
    return tf.image.adjust_contrast(img, 0.5)

def saturation(img):
    return tf.image.adjust_saturation(img, 3)

def hue(img):
    return tf.image.adjust_hue(img, 0.1)

def gamma(img):
    return tf.image.adjust_gamma(img, 2)

new_dataset = 'augmented_esca_dataset'
classes = ['esca', 'healthy']

# Not defined in the original listing. The names are taken from the branches
# of the if/elif chain below; enable_show is assumed to be a plotting toggle.
transformation_array = ["horizontalFlip", "verticalFlip", "rotation", "widthShift",
                        "heightShift", "shearRange", "zoom", "blur", "brightness",
                        "contrast", "saturation", "hue", "gamma"]
enable_show = False

for class_tag in classes:
    input_path = '/content/' + dataset_name + '/' + class_tag + '/'
    output_path = '/content/' + dataset_name + '/' + new_dataset + '/' + class_tag + '/'
    print(input_path)
    print(output_path)

    # TMP
    !rm -rf $output_path
    # END TMP

    try:
        if not os.path.exists(output_path):
            os.makedirs(output_path)
    except OSError:
        print("Creation of the directory %s failed\n\n" % output_path)
    else:
        print("Successfully created the directory %s\n\n" % output_path)

    for filename in os.listdir(input_path):
        if filename.endswith(".jpg"):
            # Copy the original image in the new dataset
            original_file_path = input_path + filename
            original_newname_file_path = output_path + Path(filename).stem + "_original.jpg"
            %cp $original_file_path $original_newname_file_path

            # Initialising the ImageDataGenerator class.
            # We will pass in the augmentation parameters in the constructor.
            for transformation in transformation_array:
                if transformation == "horizontalFlip":
                    # datagen = ImageDataGenerator(horizontal_flip = True) # for random flip
                    datagen = ImageDataGenerator(preprocessing_function=horizontal_flip)  # all imgs flipped
                elif transformation == "verticalFlip":
                    # datagen = ImageDataGenerator(vertical_flip = True) # for random flip
                    datagen = ImageDataGenerator(preprocessing_function=vertical_flip)  # all imgs flipped
                elif transformation == "rotation":
                    datagen = ImageDataGenerator(rotation_range=40, fill_mode='nearest')
                elif transformation == "widthShift":
                    datagen = ImageDataGenerator(width_shift_range=0.2, fill_mode='nearest')
                elif transformation == "heightShift":
                    datagen = ImageDataGenerator(height_shift_range=0.2, fill_mode='nearest')
                elif transformation == "shearRange":
                    datagen = ImageDataGenerator(shear_range=0.2)
                elif transformation == "zoom":
                    datagen = ImageDataGenerator(zoom_range=[0.5, 1.0])
                elif transformation == "blur":
                    datagen = ImageDataGenerator(preprocessing_function=blur)
                elif transformation == "brightness":
                    # Values less than 1.0 darken the image, e.g. [0.5, 1.0],
                    # whereas values larger than 1.0 brighten the image, e.g. [1.0, 1.5],
                    # where 1.0 has no effect on brightness.
                    datagen = ImageDataGenerator(brightness_range=[1.1, 1.5])
                elif transformation == "contrast":
                    datagen = ImageDataGenerator(preprocessing_function=contrast)
                elif transformation == "saturation":
                    datagen = ImageDataGenerator(preprocessing_function=saturation)
                elif transformation == "hue":
                    datagen = ImageDataGenerator(preprocessing_function=hue)
                elif transformation == "gamma":
                    datagen = ImageDataGenerator(preprocessing_function=gamma)

                # Loading a sample image
                img = load_img(input_path + filename)
                # Converting the input sample image to an array
                data = img_to_array(img)
                # Reshaping the input image: expand dimension to one sample
                samples = expand_dims(data, 0)

                # Plot original image
                print("Original image:")
                print(filename)
                if enable_show:
                    plt.imshow(img)
                    plt.show()
                    print("\n\n")

                # Generating and saving the augmented sample
                print("Apply " + transformation + ".")
                # prepare iterator
                it = datagen.flow(samples, batch_size=1,
                                  save_to_dir=output_path,
                                  save_prefix=Path(filename).stem + "_" + transformation,
                                  save_format='jpg')
                batch = next(it)

                # Plot transformed image
                image = batch[0].astype('uint8')
                if enable_show:
                    print("Transformed image:")
                    plt.imshow(image)
                    plt.show()
                    print("\n\n")

print("Done!\n\n")

This will take some time, so sit back and sip your coffee until it prints "Done!". You can view the whole code on my github.

This is the result of the augmentation for one image:

Figure 2: Augmentation result.

As you can see, one image now has 14 different variants (the original plus 13 transformations). Each variant is unique, which increases the variety of the dataset.
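As a quick sanity check on the dataset size, the arithmetic can be sketched as follows. This assumes every original image yields exactly one saved output per transformation plus the renamed original; the variable names are illustrative only.

```python
# Originals per class, as reported for the dataset.
originals = {"esca": 888, "healthy": 882}

# One renamed copy of the original plus one output for each of the
# 13 transformations handled in the augmentation loop.
variants_per_image = 1 + 13

augmented = {name: count * variants_per_image for name, count in originals.items()}
total_original = sum(originals.values())
total_augmented = sum(augmented.values())

print(augmented)                        # per-class counts after augmentation
print(total_original, total_augmented)  # dataset size before and after
```

The total after augmentation matches the 24780 images quoted at the end of the article.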

Deep Learning Implementation:

Import All Important Dependencies

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import load_model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.layers import Activation, Dropout, Flatten, Dense
from tensorflow.keras.preprocessing import image_dataset_from_directory
import numpy as np
import matplotlib.pyplot as plt
import os
import time

Extract Our Dataset Information

import pathlib

# dir_original is the path to the dataset folder, defined elsewhere in the notebook
data_dir = pathlib.Path(dir_original)  # Import dataset directory

set_samples = ['train', 'validation', 'test']
print("set_samples: ", set_samples, "\n")

CLASS_NAMES = np.array([item.name for item in sorted(data_dir.glob('*'))])
print("class: ", CLASS_NAMES, "\n")

# number of images for class
N_IMAGES = np.array([len(list(data_dir.glob(item.name + '/*.jpg'))) for item in sorted(data_dir.glob('*'))])
print("number of images for class: ", N_IMAGES, "\n")

# number of images for set (train, validation, test)
N_samples = np.array([(int(np.around(n*60/100)), int(np.around(n*15/100)), int(np.around(n*25/100))) for n in N_IMAGES])
print("split of dataset: \n ", N_samples, "\n")

With this code, we can view the information about our dataset and divide it in an appropriate ratio: 60% training, 15% validation and 25% test.
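The listing above only computes how many images go into each set. A minimal sketch of actually dividing one class's file list by the same 60/15/25 ratio might look like this. The `split_files` helper, the seed and the demo file names are hypothetical, and the test set here simply takes whatever remains after the first two cuts rather than rounding 25% separately.

```python
import random

def split_files(files, train=0.60, validation=0.15, seed=0):
    """Shuffle a list of file names and cut it into train/validation/test."""
    shuffled = list(files)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return {
        "train": shuffled[:n_train],
        "validation": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # the remainder, about 25%
    }

# Hypothetical file names standing in for one class folder.
demo_files = [f"leaf_{i:03d}.jpg" for i in range(100)]
demo_split = split_files(demo_files)
print({name: len(items) for name, items in demo_split.items()})
```

Splitting each class separately, as the article does, keeps the class balance roughly the same in all three sets.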

This is the result of dataset information:

Figure 3: Dataset Information.

Model Architecture

This model consists of 5 2D convolutional layers, each followed by a ReLU activation function and a 2D max pooling layer with a 2×2 pool size. In the final stage, a flatten layer and two dense layers (with ReLU and softmax activation functions respectively, and a dropout layer between them) classify the input images into the 2 classes.

Convolutional Neural Network Reference for further reading

  1. CNN documentation from Tensorflow
  2. CNN Architecture Explanation

Now we go to the code:

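model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same', input_shape=input_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(64))  # hidden size is not given in the article; 64 is an assumption
model.add(Activation('relu'))
model.add(Dropout(0.5))  # dropout rate is not given either; 0.5 is an assumption
model.add(Dense(2))  # because we have 2 classes
model.add(Activation('softmax'))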
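To see what the flatten layer receives, the spatial size can be traced through the five conv and pooling stages by hand. The sketch below assumes a 300×300 input (the size passed to InceptionV3 later in the article; the value of input_shape for this model is not shown) and uses a hypothetical helper, `trace_feature_map`.

```python
def trace_feature_map(size, filters, pool=2):
    """Return (height/width, channels) after each conv + max-pool stage.

    'same' padding keeps the spatial size through the convolution, and a
    2x2 max pool with default settings halves it, rounding down.
    """
    shapes = []
    for channels in filters:
        size = size // pool
        shapes.append((size, channels))
    return shapes

# Filter counts taken from the model listing; 300 is an assumed input size.
stages = trace_feature_map(300, [32, 32, 64, 64, 32])
final_size, final_channels = stages[-1]
flattened = final_size * final_size * final_channels

print(stages)     # spatial size and channel count after each stage
print(flattened)  # number of features handed to the dense layers
```

Under that assumption the image shrinks from 300 to 9 pixels per side, so the dense part of the network works on a few thousand features rather than the raw pixels.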

Summary of the architecture:

Figure 4: Architecture summary.

Model Compilation & Training

After that we can compile the model using .compile() method and start the training session by calling .fit() method like this:

model.compile(loss='categorical_crossentropy', optimizer=keras.optimizers.Adadelta(learning_rate=1, name='Adadelta'), metrics=['accuracy'])
with tf.device('/device:GPU:0'):
    # train_dataset is an assumed name: the first argument was lost from this listing
    history = model.fit(train_dataset, epochs=epochs,
                        validation_data=validation_dataset)
model.save('your directory')  # save your model in any file path you want

Model Evaluation

When the last epoch of training is done, the model is saved in the directory we specified earlier. From there we can download the model or load it back to evaluate it on the test dataset we split off earlier.

Before that, we can use the matplotlib.pyplot library we imported to plot the model's performance during training and summarize it.

This is the graph of the model's performance during training:

Figure 5: Performance Graph.


As we can see, the model reached 96% accuracy on the testing dataset, which means it classifies the test images with high accuracy.

The training and validation losses both show a declining trend, which means the model learned from its prediction errors to predict better. The graph shows that our model fits reasonably well, as the gaps between training and validation in both loss and accuracy are small.
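One way to put a number on that gap is to compare the last-epoch metrics directly. The sketch below assumes a dictionary shaped like Keras's `history.history` (keys `loss`, `val_loss`, `accuracy`, `val_accuracy`). The `summarize_history` helper is hypothetical, and the values are made up for illustration; they are not the results from this article.

```python
def summarize_history(history):
    """Report the final-epoch gap between training and validation metrics."""
    val_acc = history["val_accuracy"]
    best_epoch = max(range(len(val_acc)), key=lambda i: val_acc[i]) + 1
    return {
        "accuracy_gap": history["accuracy"][-1] - val_acc[-1],
        "loss_gap": history["val_loss"][-1] - history["loss"][-1],
        "best_val_epoch": best_epoch,  # 1-based epoch with the highest validation accuracy
    }

# Illustrative values only, standing in for history.history after training.
demo_history = {
    "loss": [0.60, 0.35, 0.20],
    "val_loss": [0.55, 0.38, 0.24],
    "accuracy": [0.70, 0.88, 0.95],
    "val_accuracy": [0.74, 0.86, 0.93],
}
summary = summarize_history(demo_history)
print(summary)
```

Small gaps suggest the model generalizes about as well as it fits the training data, while a training accuracy far above the validation accuracy would point to overfitting.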


Although this model already performs well with a default CNN architecture, we can always compare it with another technique to see how the performance changes. In this case, I use transfer learning to see whether the accuracy can be improved further.

Applying transfer learning is not as difficult as it seems, as we only need to change a few parameters. Using the same dataset, we will train the model and run a test to compare the two models.

Transfer Learning


First we need to import the pre-trained network. I'm using InceptionV3, so we import it and freeze its layers to prevent their weights from being modified during the backward pass of training. Freezing "locks in" the weights of each layer, which reduces the amount of computation in the backward pass and decreases training time. To do this, we set layer.trainable = False.

import keras
import pandas as pd
from keras.applications.inception_v3 import InceptionV3
from keras.models import Model, load_model

conv_base = InceptionV3(weights='imagenet', include_top=False, input_shape=(300, 300, 3))
output = conv_base.layers[-1].output
output = keras.layers.Flatten()(output)
model_tl = Model(conv_base.input, output)

# Freeze every layer so the pre-trained weights are not updated
model_tl.trainable = False
for layer in model_tl.layers:
    layer.trainable = False

layers = [(layer, layer.name, layer.trainable) for layer in model_tl.layers]
model_layers = pd.DataFrame(layers, columns=["Layer Type", "Layer Name", "Layer Trainable"])
print(model_layers)

Architecture summary for InceptionV3:

Figure 6: Architecture Summary for Transfer Learning CNN.

We use the same method to compile and train the model as before, and here is the performance graph for our Transfer Learning model:

Figure 7: Performance Graph.

We can see that the accuracy is slightly better than the default CNN's, at 97%, and the graph shows a good fit without needing many training epochs.

Significant Test

Although the two models do not differ much in accuracy, we can examine their performance in more detail using a confusion matrix.

Using the sklearn library, we can plot the confusion matrix for each model to get a more detailed summary of its performance.

Here are the confusion matrix results for the default CNN architecture:
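To make clear what a confusion matrix counts, here is a dependency-free sketch in plain Python rather than sklearn. The helper names and the toy labels are hypothetical, and treating "esca" as the positive class is an assumption made for illustration.

```python
def confusion_matrix_2x2(y_true, y_pred, positive="esca"):
    """Count true/false positives and negatives for a two-class problem."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            if truth == positive:
                tp += 1  # diseased leaf correctly flagged
            else:
                fp += 1  # healthy leaf flagged as diseased
        else:
            if truth == positive:
                fn += 1  # diseased leaf missed
            else:
                tn += 1  # healthy leaf correctly passed
    return tp, fp, fn, tn

def precision_recall(tp, fp, fn):
    """Precision and recall for the positive class, guarding against empty counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy labels for illustration only, not the article's predictions.
y_true = ["esca", "esca", "esca", "healthy", "healthy", "healthy"]
y_pred = ["esca", "esca", "healthy", "healthy", "healthy", "esca"]

tp, fp, fn, tn = confusion_matrix_2x2(y_true, y_pred)
precision, recall = precision_recall(tp, fp, fn)
print(tp, fp, fn, tn)
print(precision, recall)
```

For a disease detector, the false negatives (diseased leaves classified as healthy) are usually the cell worth watching most closely, which plain accuracy does not show.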

Figure 8

Confusion matrix results for CNN with Transfer Learning:

Figure 9


Using a deep convolutional neural network architecture, we trained a model on images of grape leaves with the goal of distinguishing esca-affected leaves from healthy ones on images the model had not seen before. On the Mendeley dataset of 24780 images (after augmentation) in 2 classes, this goal was achieved, as demonstrated by the top accuracy of 97.05% for the transfer learning method and 96.88% for the default CNN. Thus, without any feature engineering, both models are able to correctly classify esca-affected and healthy grape leaves.

As we can see, the two neural network architectures perform almost the same, but we can still review the factors behind each one. To be fair, all hyperparameters are standardized for both models, as shown in this table:

Table 1: Hyperparameters for both CNN architecture.

The only parameter that differs is the number of epochs used for the CNN with transfer learning, as we don't want to overfit the model with an excessive number of epochs. Even with fewer epochs, the CNN with transfer learning performed better than the default CNN, which is good in terms of reducing training time.

However, this work can be improved with new image collection efforts, where we could try to obtain images from many different perspectives, ideally in settings that are as realistic as possible.

You can view the whole code on my github.

Model Deployment:

Once training and evaluation are done, our model can be deployed on any platform, be it the web, apps or more advanced devices. In this case, I'm deploying the model in a Flask web application. The web app opens the webcam, you show it images of grape leaves, and it predicts the class probabilities and classifies each image as either esca or healthy. You can watch the video below or try it in real time using this link.
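The last step of such an app, turning the model's softmax output into a label, can be sketched in a few lines. This is not the deployed app's code: `label_from_probabilities` is a hypothetical helper, and the class order is assumed to follow the alphabetically sorted folder names used when the dataset was loaded.

```python
CLASS_NAMES = ("esca", "healthy")  # assumed order: alphabetical folder names

def label_from_probabilities(probabilities, class_names=CLASS_NAMES):
    """Pick the most probable class from one softmax output vector."""
    if len(probabilities) != len(class_names):
        raise ValueError("expected one probability per class")
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return class_names[best], probabilities[best]

# Example softmax output for a single webcam frame (made-up numbers).
label, confidence = label_from_probabilities([0.12, 0.88])
print(label, confidence)
```

In a web app, the returned label and probability would typically be rendered on the page next to the captured frame.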

Here is the video demo: