Deep Learning Based on Residual Networks for Automatic Sorting of Bananas

Abdulkader Helwan
6 min read · Apr 23, 2021

In the food industry, the quality of processed fruits is extremely important. Meeting consumer demand while producing high-quality fruits on a fast production line requires high-performance technologies [1]. Moreover, the food industry is one of the few fields subject to restrictive conditions and constraints, because it depends on the weather and on the labor market [2]. For example, if fruits are not harvested at the most suitable time because of bad weather, the quality and quantity of the harvest may decrease, partly because the fruits over-ripen. Over the years, most technological processes in this industry were controlled by human operators, and delicate tasks such as postharvest handling and the grading of healthy and defective products relied on human decisions. Human operators can suffer from eye strain, lack of sleep, and fatigue caused by overwork, all of which can degrade their performance. Fruit sorting is a decision-making task: based on visual features of each fruit passing along a conveyor belt, the operator decides whether the fruit is healthy or defective. It is therefore a computer vision problem that is well suited to machine learning, which can prevent the errors caused by human operators [3].

Several research works have recently applied computer vision and machine learning techniques to fruit quality control and grading. Common applications include the classification and sorting of fruits [4, 5], identification of fruit defects [6], ripeness detection [7], and estimation of food security [8]. Reference [6] argued that products should have a certain weight, size, colour, and density to meet quality standards. The authors therefore proposed a machine vision system that controls 1–10 conveyor belts with a maximum throughput of 15 fruits per second and classifies the fruits into a set of classes using their weight, size, and colour. The system performed automatic visual inspection of fruits and vegetables using machine vision algorithms and sensors, implementing colour processing, weight detection, size measurement, and density detection. The authors reported that the system's performance was satisfactory: compared with human grading criteria, no significant differences were observed. In operation, the system reached a rate of 15 fruits per second while controlling 2 conveyor belts.

In recent years, research has addressed the determination of banana size [9], banana ripeness [10], and the sorting of healthy and unhealthy bananas [11]. Reference [5] presented an automatic sorting system for bananas based on texture features extracted with the gray-level co-occurrence matrix (GLCM). Three classifiers, a backpropagation neural network (BPNN), a support vector machine (SVM), and a radial basis function network (RBFN), were compared. The SVM achieved the highest classification rate, 100%, while the RBFN and BPNN scored 96.25% and 98.8%, respectively. According to these studies, such systems increased production quality and quantity and allowed the production process to switch to a faster operating mode.
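
The GLCM texture features used in [5] can be sketched as follows. This is a minimal NumPy sketch, not the implementation from [5]: the source does not say which pixel offsets, number of gray levels, or GLCM statistics were used, so the single horizontal offset, the 8 gray levels, and the four Haralick-style statistics (contrast, homogeneity, energy, correlation) are assumptions, and the names `glcm` and `glcm_features` are hypothetical. scikit-image provides equivalent functionality in `skimage.feature.graycomatrix` and `graycoprops`.

```python
import numpy as np

def glcm(image, dx=1, dy=0, levels=8):
    """Symmetric, normalized gray-level co-occurrence matrix for one offset (dx, dy).

    Assumes an 8-bit grayscale image (values 0..255) and non-negative offsets.
    """
    img = np.asarray(image)
    # Quantize 0..255 intensities into `levels` gray levels.
    q = (img.astype(np.int64) * levels) // 256
    h, w = q.shape
    # Pair every pixel with its neighbour at offset (dy, dx).
    ref = q[: h - dy, : w - dx]
    nbr = q[dy:, dx:]
    m = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(m, (ref.ravel(), nbr.ravel()), 1)
    m = m + m.T  # count each pair in both directions (symmetric GLCM)
    return m / m.sum()

def glcm_features(p):
    """Four common Haralick-style statistics of a normalized GLCM `p`."""
    i, j = np.indices(p.shape)
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum(p * (i - mu_i) ** 2))
    sd_j = np.sqrt(np.sum(p * (j - mu_j) ** 2))
    cov = np.sum(p * (i - mu_i) * (j - mu_j))
    return {
        "contrast": float(np.sum(p * (i - j) ** 2)),
        "homogeneity": float(np.sum(p / (1.0 + (i - j) ** 2))),
        "energy": float(np.sqrt(np.sum(p ** 2))),
        # Correlation is undefined for a constant image; report 1.0 by convention.
        "correlation": float(cov / (sd_i * sd_j)) if sd_i * sd_j > 0 else 1.0,
    }

rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(64, 64))  # stand-in for a grayscale banana image
features = glcm_features(glcm(patch, dx=1, dy=0))
print(sorted(features))  # ['contrast', 'correlation', 'energy', 'homogeneity']
```

In practice such statistics are often computed for several offsets and angles and concatenated; the resulting vector is the kind of input a classifier such as the SVM, BPNN, or RBFN in [5] would receive.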

Recently, various machine learning algorithms have been applied to engineering and image processing problems. Machine learning, and deep learning in particular, has developed rapidly, sharply improving performance in areas such as medicine [12, 13], agriculture [14], and food engineering [15]. Several deep learning architectures have been designed to improve performance, including AlexNet [16] with 8 layers, VGG [17] with up to 19 layers, and GoogLeNet [18] with 22 layers. Chronologically, these networks became progressively deeper. However, greater depth made the networks harder to optimize during training, for example because of vanishing gradients: as depth increased, accuracy saturated and then degraded rapidly. To overcome this problem, residual learning was introduced for training very deep networks [19]. A few studies have applied residual networks to engineering problems. Reference [20] combines a deep residual neural network (ResNet) with lower and upper bound estimation to construct prediction intervals for forecasting future flow. In reference [21], a deep neural network is used to identify six kinds of grain pests, with residual connections introduced to improve the model's convolutional feature extraction. Reference [22] presents a local binary residual block that reduces the number of trainable parameters of very deep residual networks by at least 69.2%. Study [23] presents a deep convolutional neural network, termed the dense residual network, for optical character recognition, and study [24] presents multiple improved residual networks for super-resolution reconstruction of medical images.
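
The residual learning idea of [19] mentioned above can be summarized in a few lines. In the standard formulation, a block with input x and stacked layers F outputs F(x) + x; the stacking notation and the Jacobians J_k = ∂F_k/∂x_k are introduced here only for illustration, with gradients written as row vectors. The identity term I survives in every factor of the residual product even when the J_k are small, whereas the plain product of small J_k shrinks toward zero as depth grows.

```latex
\begin{aligned}
\text{residual block:}\quad & \mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + \mathbf{x}
  \;\Rightarrow\;
  \frac{\partial \mathcal{L}}{\partial \mathbf{x}}
  = \frac{\partial \mathcal{L}}{\partial \mathbf{y}}
    \Bigl(I + \frac{\partial \mathcal{F}}{\partial \mathbf{x}}\Bigr) \\
\text{residual stack:}\quad & \mathbf{x}_{k+1} = \mathbf{x}_k + \mathcal{F}_k(\mathbf{x}_k)
  \;\Rightarrow\;
  \frac{\partial \mathcal{L}}{\partial \mathbf{x}_l}
  = \frac{\partial \mathcal{L}}{\partial \mathbf{x}_L}\,(I + J_{L-1})\cdots(I + J_l) \\
\text{plain stack:}\quad & \mathbf{x}_{k+1} = \mathcal{F}_k(\mathbf{x}_k)
  \;\Rightarrow\;
  \frac{\partial \mathcal{L}}{\partial \mathbf{x}_l}
  = \frac{\partial \mathcal{L}}{\partial \mathbf{x}_L}\,J_{L-1}\cdots J_l
\end{aligned}
```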

Residual networks (ResNets) add shortcut connections that skip over some layers. Typical ResNet models skip two or three layers at a time, instead of connecting only consecutive layers as plain deep networks such as AlexNet do. Because a shortcut carries the signal, and therefore its gradient, directly past the skipped layers, it helps avoid the vanishing gradient problem. This study uses a 50-layer residual network, ResNet-50, to sort bananas into healthy or defective categories, applying transfer learning and residual learning to optimize the network parameters and develop the system.
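
The effect of skipping layers on gradient flow can be illustrated with a toy NumPy experiment. This is a simplified sketch, not ResNet-50: it assumes purely linear layers with small random weights and no activations or batch normalization, and the helper `backprop_gradient_norm` is hypothetical. It backpropagates a unit gradient through 50 layers, once through a plain stack and once through a stack with identity shortcuts.

```python
import numpy as np

def backprop_gradient_norm(weights, residual):
    """Norm of dL/dx_0 for a stack of linear layers, given a unit gradient at the output.

    Plain layer:    x_{k+1} = W_k x_k         -> Jacobian W_k
    Residual layer: x_{k+1} = x_k + W_k x_k   -> Jacobian I + W_k
    """
    d = weights[0].shape[0]
    g = np.ones(d) / np.sqrt(d)  # unit-norm gradient dL/dx_L at the output
    for w in reversed(weights):
        jac = np.eye(d) + w if residual else w
        g = jac.T @ g  # chain rule: dL/dx_k = J_k^T dL/dx_{k+1}
    return float(np.linalg.norm(g))

rng = np.random.default_rng(0)
depth, d = 50, 16
# Small random weights (spectral norm well below 1): the regime where plain stacks vanish.
weights = [0.1 * rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(depth)]

plain = backprop_gradient_norm(weights, residual=False)
resid = backprop_gradient_norm(weights, residual=True)
print(f"plain:    {plain:.3e}")
print(f"residual: {resid:.3e}")
```

With these settings the plain gradient shrinks by roughly a factor of ten per layer and becomes vanishingly small, while the residual gradient stays on the order of one, because each Jacobian I + W_k stays close to the identity.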

In this study, transfer learning was employed to transfer the knowledge of ResNet-50 to a different classification task: sorting bananas. Transfer learning of ResNet-50 can be described in two stages, freezing and fine-tuning. In the freezing stage, the publicly available pretrained weights of the model were loaded and frozen. Fine-tuning begins by removing the fully connected (FC) layer of ResNet-50 and replacing it with three FC layers, the last of which has two output neurons corresponding to healthy and defective bananas. The weights of the new FC layers were initialized randomly before training. In contrast, the weights of the remaining layers were kept frozen so that they act as a strong extractor of high-level features from the input images, since they had already been trained on millions of images from the ImageNet dataset [33].
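
The new classification head described above can be sketched in NumPy. This is an illustration under stated assumptions, not the author's code: the frozen ResNet-50 backbone is represented only by its 2048-dimensional pooled output, random numbers stand in for real features, and the hidden widths (512, 128), the ReLU activations between the FC layers, and the He-normal initialization are hypothetical choices the source does not specify.

```python
import numpy as np

FEATURE_DIM = 2048          # size of ResNet-50's global-average-pooled features
HIDDEN = (512, 128)         # hypothetical widths; the source does not give them
CLASSES = ("healthy", "defective")

def init_head(rng):
    """Randomly initialize the three new FC layers (He-normal init, an assumption)."""
    sizes = (FEATURE_DIM, *HIDDEN, len(CLASSES))
    return [
        {"W": rng.standard_normal((n_in, n_out)) * np.sqrt(2.0 / n_in), "b": np.zeros(n_out)}
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def head_forward(params, features):
    """FC -> ReLU -> FC -> ReLU -> FC -> softmax over the two banana classes."""
    h = features
    for k, layer in enumerate(params):
        h = h @ layer["W"] + layer["b"]
        if k < len(params) - 1:
            h = np.maximum(h, 0.0)  # ReLU between FC layers (assumed)
    h = h - h.max(axis=1, keepdims=True)  # numerically stable softmax
    e = np.exp(h)
    return e / e.sum(axis=1, keepdims=True)

# Only the head parameters would be trained; the frozen backbone just supplies features.
rng = np.random.default_rng(0)
head = init_head(rng)
fake_features = rng.standard_normal((4, FEATURE_DIM))  # stand-in for backbone output
probs = head_forward(head, fake_features)
print(probs.shape)  # (4, 2)
```

In a deep learning framework the same idea means loading pretrained ResNet-50 weights, disabling gradient updates for the backbone, and replacing its final FC layer (in torchvision, the `fc` attribute of `resnet50`) with the new head.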

As mentioned, the network was trained using only 40% of the data. Stochastic gradient descent (SGD) [34] was used to train the network, with a mini-batch of 64 images per iteration.
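
The 40:60 data split and the 64-image mini-batches can be illustrated with a short NumPy sketch. The dataset size of 300 images and the helper names `split_40_60` and `minibatches` are hypothetical; the source states only the split ratio and the batch size. Each yielded batch is where one SGD update would be applied.

```python
import numpy as np

def split_40_60(n_samples, rng):
    """Shuffle sample indices and keep 40% for training, 60% for testing."""
    idx = rng.permutation(n_samples)
    n_train = int(round(0.4 * n_samples))
    return idx[:n_train], idx[n_train:]

def minibatches(indices, batch_size, rng):
    """Yield shuffled mini-batches of indices; the last batch may be smaller."""
    order = rng.permutation(indices)
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]

rng = np.random.default_rng(0)
n_images = 300  # hypothetical dataset size, not taken from the source
train_idx, test_idx = split_40_60(n_images, rng)
batches = list(minibatches(train_idx, batch_size=64, rng=rng))
print(len(train_idx), len(test_idx), [len(b) for b in batches])
# 120 180 [64, 56]
```

Reshuffling the training indices at the start of every epoch, as `minibatches` does, is what makes the gradient estimate stochastic.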

To minimize the cost function, the initial learning rate and the reducing factor of the fully connected layers were set to 0.0001 and 0.1, respectively. Selecting the number of epochs was difficult, because it determines how many optimization passes are made over the training data: if the number of epochs is too high, the network may overfit and perform poorly on unseen images. To avoid overfitting, the error and accuracy on the validation images were monitored. ResNet-50 achieved its highest training accuracy and best generalization capability at epoch 6. Table 1 shows that the network trained well: it reached 100% training accuracy in a very short time (37 seconds) and in only 6 epochs, despite the depth of the network and the 40:60 training scheme.
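
Choosing the number of epochs by monitoring validation performance amounts to early stopping. The sketch below uses a generic patience-based rule, which is an assumption: the source says only that validation error and accuracy were monitored. The function name `select_best_epoch`, the patience of 2, and the validation curve are hypothetical and do not reproduce Table 1.

```python
def select_best_epoch(val_accuracy, patience=2):
    """Return (best_epoch, stop_epoch) for per-epoch validation accuracy (epochs from 1).

    Training stops once validation accuracy has not improved for `patience`
    consecutive epochs; the weights from `best_epoch` would be kept.
    """
    best_epoch, best_acc, waited = 0, float("-inf"), 0
    for epoch, acc in enumerate(val_accuracy, start=1):
        if acc > best_acc:
            best_epoch, best_acc, waited = epoch, acc, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_accuracy)

# Hypothetical validation curve (illustrative numbers, not the paper's results).
curve = [0.71, 0.83, 0.90, 0.94, 0.97, 0.99, 0.98, 0.98, 0.97]
print(select_best_epoch(curve))  # (6, 8)
```

A larger patience tolerates noisy validation curves at the cost of a few extra epochs of training.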

Find the full paper here: