DEEP LEARNING-BASED VECTOR MOSQUITOES CLASSIFICATION FOR PREVENTING INFECTIOUS DISEASES TRANSMISSION

Healthcare systems worldwide are burdened by mosquitoes transmitting dangerous diseases. Conventional mosquito surveillance methods to mitigate these diseases rely on expert entomologists' manual examination of morphological characteristics, which is time-consuming and unscalable. The shortage of such experts creates a strong need for cheap and accurate automated alternatives for mosquito classification. This paper proposes an end-to-end deep Convolutional Neural Network (CNN) for mosquito species classification that leverages both dropout layers and transfer learning to enhance classification accuracy. Dropout layers randomly disable neurons of the network, mitigating co-adaptation and overfitting. Transfer learning efficiently applies the features extracted from one dataset to others. Furthermore, a Region of Interest (ROI) visualization component is adopted to gain insight into what the model learns. The generalization ability and feasibility of the proposed model are validated on four publicly available mosquito datasets. Experimental results on these datasets, with accuracies of 98.82%, 98.92%, 94.66%, and 98.40%, demonstrate the superiority of our proposed system over recent state-of-the-art approaches. The effects of the number of dropout layers, their positions in the network, and their dropout values are all investigated through ablation studies. Visualizing the model's attention confirms that our model learns useful mosquito features from the insects' legs and thorax, leading to accurate predictions.


INTRODUCTION
Mosquitoes are tiny creatures with more than 3600 species, among which a few dozen are vectors of deadly diseases such as malaria, dengue, yellow fever, and chikungunya. These diseases cause over one billion infections and around one million deaths worldwide annually Organization et al. (2017); Omodior et al. (2018), making mosquitoes the deadliest animals in the world Gates (2014). Aedes, Anopheles, and Culex are considered the most dangerous genera Roth et al. (2014); Kittichai et al. (2021a). These mosquitoes are found in almost every region of the world, and the females transmit diseases by injecting infected saliva into human hosts. To prevent and minimize the spread of mosquito-borne diseases and assist health authorities, it is beneficial to identify and classify the disease-spreading mosquitoes and monitor their populations. Vector control programs are fundamentally carried out through manual microscopic observation, in which the insects are identified by morphological and dichotomous keys Rueda (2004); Park et al. (2016); Eritja et al. (2019). However, these conventional methods require highly trained entomologists, making the process time-consuming, laborious, barely scalable, and costly, and thus infeasible for practical implementation. Recently, identification has become even more challenging due to the shortage of experts relative to the sharp increase in mosquito diversity and populations Audisio (2017). Furthermore, the external morphological characteristics are susceptible to damage during sample acquisition, preservation, or transportation, so mosquito identification is challenging even for professionals Mewara et al. (2018).
Molecular-based methods such as polymerase chain reaction Clapp (1996), ELISA, and DNA barcoding Wang et al. (2012); Beebe (2018); Mee et al. (2021) are another practical alternative for identifying and classifying mosquitoes, but they are also impractical in real-world applications as they follow slow procedures, require expensive technical equipment, and must be performed by molecular biology experts Kittichai et al. (2021a). These limitations inspired researchers to develop automated systems to classify mosquito species. The earlier frequency-based automated mosquito surveillance systems analyzed wingbeat harmonics using acoustic recorders Jackson and Robert (2006); Silva et al. (2013); Arthur et al. (2014); Ouyang et al. (2015a); Mukundarajan et al. (2017). Despite their high classification accuracy, these devices suffered from limited storage memory, short-distance operational ranges, and the dependence of data acquisition on the position of the recorder relative to the mosquitoes.
Consequently, vision-based methods have been extensively applied to address these challenges over the last five years. These approaches are generally divided into two main categories: 1) conventional machine learning (ML)-based approaches and 2) deep learning (DL)-based approaches. In conventional approaches, handcrafted features are first extracted and then fed into ML-based classifiers for mosquito identification Ouyang et al. (2015b); Reyes et al. (2016). Although these methods performed reasonably well, their accuracy is still not satisfactory for real-world applications. Furthermore, they have low generalization ability because the handcrafted features are dataset-specific, so the model's performance degrades significantly when the dataset changes. Thanks to recent advances in powerful GPUs and large-scale datasets, DL-based models such as convolutional neural networks (CNNs) have been extensively applied to numerous computer vision tasks, including mosquito identification and classification. However, they also suffer from limitations restricting their applicability in real-world scenarios: i) they obtained satisfactory performance only for images acquired in a laboratory under controlled environmental conditions Park et al. (2020); Goodwin et al. (2021), ii) even on those datasets, their accuracy is still not at the human-expert level Motta et al. (2019), and iii) they have poor generalization capability, as they are typically validated on a limited number of datasets (mostly a single dataset) Park et al. (2020); Rustam et al. (2022).
To address these issues, a novel deep CNN (DCNN) model is proposed to simultaneously improve the generalization ability and the accuracy of mosquito classification, benefiting from the strengths of both regularization layers and transfer learning. The proposed model can learn and extract fine-grained features from the discriminant parts of the mosquitoes, similar to those used by entomologists in manual examination. Visualization of these features based on Grad-CAM Selvaraju et al. (2016) further proves the capability and effectiveness of the proposed mosquito classification system. Overall, the key contributions of the paper are summarized as follows:
-A novel end-to-end deep neural network is proposed for mosquito classification based on modifying the VGG16 architecture and applying transfer learning from the ImageNet Deng et al. (2009) dataset to take advantage of features extracted from non-mosquito images. Consequently, the model performs accurately on both small-scale and large-scale datasets.
-Inspired by the regularization technique, the original architecture of the pre-trained VGG16 is modified by adding two dropout layers which effectively increase the classification accuracy while mitigating model overfitting.
-Proper locations, along with the optimal number and value of the dropout layers, are selected through extensive ablation studies, resulting in high classification accuracy. In addition to the promising quantitative results, the feasibility of the model is evaluated qualitatively through a visualization component based on the Grad-CAM algorithm.
-By assessing the performance of the proposed model on four different public datasets along with their combination, it is demonstrated that our modified model outperforms four pre-trained models under both controlled and uncontrolled environments with small inter-class and large intra-class variations, indicating a high level of generalization.
The remainder of the paper is organized as follows. Recent related works are briefly reviewed. Then, the proposed model is presented in detail. The model's performance is evaluated and compared with recent approaches through experiments, alongside descriptions of the employed datasets, experimental setup, and evaluation metrics. Ablation studies and failure cases of the proposed model are then discussed. Finally, we conclude with possible future research directions.

RELATED WORK
Due to the importance of automated mosquito identification and classification in monitoring the population of vector mosquitoes and controlling mosquito-borne diseases, researchers have developed numerous approaches using both audio and visual features. Among these studies, the recent competitive vision-based approaches using deep learning models are summarized in Table 1. For instance, Adhane et al. (2021) applied transfer learning on pre-trained models, where VGG16 surpassed ResNet50 with a validation accuracy of 94.6%. Furthermore, the regions used by the model to learn the features were visualized with explainable models. Although it was demonstrated that their model effectively learned features from discriminative morphological patterns of mosquitoes, the final accuracy is still insufficient for real-world scenarios. Moreover, it suffers from poor generalization, as the performance was investigated only on a single dataset and for a binary classification task, i.e., tiger vs. non-tiger mosquitoes. At the same time, there are other genera of vector mosquitoes whose identification plays a significant role in efficiently and practically controlling mosquito-borne diseases. Akter et al. Akter et al. (2021) collected their dataset from different web sources. Their dataset was formed by 442 images, which were increased to 3600 images by applying four types of augmentation. Proposing a custom CNN model with convolutional, pooling, and dropout layers, they achieved a classification accuracy of 70%, which improved to 93% after augmentation, outperforming VGG16, Random Forest, XGBoost, and SVM. Goodwin et al. Goodwin et al. (2021) also developed a publicly available dataset with 2696 images in 39 classes and investigated the performance of Xception on it, reporting an accuracy of 97%.
Recently, Rustam et al. Rustam et al. (2022) proposed a new feature selection method, RIFS, i.e., a combination of ROI- and wrapper-based feature selection, for binary classification of Aedes and Culex mosquitoes. Different ML- and DL-based classification models were applied, among which the Extra Tree Classifier (ETC) (99.2% accuracy) and VGG16 (98.6% accuracy) achieved the best performance while reducing computational time and cost. Their experiments were all carried out on a single dataset developed by Pise et al. Pise et al. (2020), which includes mosquito images on various backgrounds. Although the existing approaches obtained satisfactory performance for mosquito classification, they still need to be improved to reach expert-level performance. On the other hand, validating performance mostly on a single dataset calls their generalization capability into question. Considering these issues, a high-quality mosquito classification system validated on various datasets captured in controlled and uncontrolled environments is in high demand to support effective preventive strategies and control the spread of arboviruses.

PROPOSED METHOD
The main flowchart of the proposed mosquito classification system is shown in Fig. 1. It comprises three main components: 1) feature extraction based on pre-trained VGG16, 2) a classification module with fully-connected, dropout, and softmax layers, and 3) an explainable model based on Grad-CAM for visualization. Each of these components and the techniques applied in the training process are explained in detail in the following subsections.

VGG16-BASED FEATURE EXTRACTOR BACKBONE
Before feature extraction, all the RGB input mosquito images are resized to 224 × 224 pixels, and their intensities are normalized to zero mean and unit standard deviation. These pre-processed images are fed into the pre-trained VGG16 model Simonyan and Zisserman (2014) to extract rich discriminative features by minimizing the cross-entropy loss function. VGG16 is one of the most successful vision architectures. It extracts feature maps from the input images through a total of 13 convolutional and five max-pooling layers arranged in 5 blocks. The first two blocks have similar structures, each formed by two convolutional layers with a 3 × 3 kernel size and stride of 1, followed by a max-pooling layer with a 2 × 2 pooling size and stride of 2. The only difference between these two blocks is the number of filters in the convolutional layers, 64 and 128, respectively. In the last three blocks, the number of convolutional layers is increased to three, with 256, 512, and 512 filters from the third to fifth blocks, respectively. The max-pooling layers are the same as in the previous blocks. The activation function of all convolutional layers is the Rectified Linear Unit (ReLU) Agarap (2018). It introduces non-linearity, speeds up learning, enhances performance, and mitigates the vanishing gradient issue while remaining computationally simple. Defined as ReLU(x) = max(0, x), it returns its input when positive and zero otherwise. The final feature maps extracted by the VGG16 backbone have a dimension of 7 × 7 × 512.
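As an illustration, the per-image standardization and the ReLU non-linearity described above can be sketched in a few lines of NumPy; the toy image below is an assumption for demonstration, not part of the actual pipeline:

```python
import numpy as np

def standardize(image):
    """Normalize pixel intensities so the image has zero mean and unit
    standard deviation, as done before feeding images to the backbone."""
    image = image.astype(np.float64)
    return (image - image.mean()) / image.std()

def relu(x):
    """ReLU(x) = max(0, x): pass positive values through, zero out the rest."""
    return np.maximum(0.0, x)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(224, 224, 3))   # toy RGB image after resizing
norm = standardize(img)
activated = relu(np.array([-2.0, 0.0, 3.0]))      # -> array([0., 0., 3.])
```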
As the training of CNN models is based on a feed-forward process, the shallower convolutional layers learn general patterns such as edges, corners, and boundaries, whereas the deeper layers learn more complex, task-specific patterns Mayer et al. (2018). VGG16 was originally pre-trained on ImageNet, a large dataset with over 14 million images. Taking advantage of transfer learning, this pre-trained model and its extracted features are fine-tuned for mosquito classification. The main benefits of applying transfer learning instead of training the model from scratch are faster and easier convergence, rich feature representations, and high accuracy even on small-scale datasets.

MODIFIED CLASSIFICATION BLOCK
Once the feature maps are extracted from the last block of the VGG16 model, they are flattened into a linear vector and fed into the classification block to predict the type of mosquito. Overall, the VGG16 model has approximately 138 million parameters, about 90% of which belong to its classification block, emphasizing its importance. The original pre-trained VGG16 model has two fully-connected layers (with 4096 neurons each), each followed by a ReLU activation function, and a Softmax layer. To further mitigate the effects of data scarcity, prevent the model from overfitting, and enhance the model's generalization, a dropout layer is added after each fully-connected layer as a regularization technique. Overfitting occurs when the trained model is too complex and performs poorly when confronted with new data. To address this issue, the dropout layers with a value of 0.5 in our modified classification block randomly ignore 50% of the neurons and remove their contribution to the forward pass during training Baldi and Sadowski (2013). Consequently, their weights are not updated on the backward pass, and the remaining neurons provide the desired representation of the input for the final prediction. In this way, the network learns multiple independent representations, leading to better generalization and a lower probability of overfitting.
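The dropout mechanism described above can be sketched as follows. This is the common "inverted dropout" formulation (survivors are rescaled so inference needs no correction), which is how frameworks such as Keras implement the layer; the activation values are made up for illustration:

```python
import random

def dropout(activations, p=0.5, training=True, seed=None):
    """Zero each activation with probability p during training and rescale
    the survivors by 1/(1-p); at inference time the layer is a no-op."""
    if not training:
        return list(activations)
    rng = random.Random(seed)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

acts = [1.0] * 1000                      # toy fully-connected activations
dropped = dropout(acts, p=0.5, seed=0)   # roughly half of the neurons silenced
```

Because a different random subset of neurons survives on every forward pass, the network cannot rely on any single co-adapted group of units, which is the co-adaptation-breaking effect described above.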
The final layer is another fully-connected layer with a Softmax activation function, whose number of neurons equals the number of classes. The Softmax activation function converts the logits z_i into class probabilities as ŷ_i = Softmax(z_i) = e^{z_i} / ∑_j e^{z_j}, and the model is trained with the cross-entropy loss L = −∑_i y_i log(ŷ_i), where ŷ_i and y_i are the predicted scalar probability and the corresponding target value.
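A minimal, numerically stable sketch of the Softmax and the cross-entropy loss used to train the classifier (the logit values are arbitrary):

```python
import math

def softmax(logits):
    """Map raw class scores to probabilities that sum to 1; subtracting the
    maximum logit is the standard trick to avoid overflow in exp()."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    """Cross-entropy with a one-hot target reduces to -log(p_target)."""
    return -math.log(probs[target_index])

probs = softmax([2.0, 1.0, 0.1])   # three-class example
loss = cross_entropy(probs, 0)     # low loss: the correct class dominates
```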

GRAD-CAM-BASED EXPLAINABLE MODEL
Grad-CAM, which stands for Gradient-weighted Class Activation Mapping, is a visualization algorithm that uses the gradients of the predicted class score (computed from the Softmax output) with respect to the feature maps of the final convolutional layer to generate heatmaps depicting the ROI on which the model focuses for its final prediction Selvaraju et al. (2016). To this end, as illustrated in Fig. 1, for a given resized image x of height H and width W, the importance weight of the k-th feature map for class c is computed by global-average-pooling the gradients as α^c_k = (1/Z) ∑_i ∑_j ∂y^c/∂A^k(i, j), where A^k(i, j) is the k-th activation map in the last convolutional layer and Z is the number of spatial locations. The regions important for the given class c are then highlighted in the heatmap by applying the ReLU activation function to the weighted sum of the activation maps, Grad-CAM = ReLU(∑_k α^c_k A^k). Practically, these maps help to qualitatively assess a model, showing whether it effectively learns important features from the morphological characteristics of the mosquitoes or merely makes predictions from unrelated parts of the image.
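Given the activations of the last convolutional block and the gradients of the class score with respect to them, the Grad-CAM computation above reduces to a few array operations. The random tensors below stand in for real activations and gradients, which would come from an actual backward pass:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap: alpha^c_k is the spatial average of the gradients
    (global average pooling), and the map is ReLU(sum_k alpha^c_k * A^k),
    normalized to [0, 1] for visualization."""
    alpha = gradients.mean(axis=(0, 1))                        # shape (K,)
    cam = np.maximum(0.0, (activations * alpha).sum(axis=-1))  # shape (H, W)
    if cam.max() > 0:
        cam /= cam.max()
    return cam

rng = np.random.default_rng(0)
A = rng.random((7, 7, 512))            # VGG16's final 7x7x512 feature maps
dA = rng.standard_normal((7, 7, 512))  # stand-in for dy^c/dA from backprop
heatmap = grad_cam(A, dA)              # 7x7 map, upsampled to 224x224 for overlay
```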

DATASETS
The proposed mosquito classification model is evaluated on four publicly available datasets captured in both controlled and uncontrolled environments with variations in the background and illumination conditions (shown in Fig. 2). The details of each dataset regarding the year of publication, the number of classes and images, the class labels, and further information about the environmental conditions are summarized in Table 2.

Park
This dataset was developed by Park et al. Park et al. (2020) in 2020 and includes a total of 4290 images of vector and non-vector mosquitoes. The vector mosquitoes comprise 5 sub-classes: Aedes albopictus, Aedes vexans, Anopheles sinensis, Culex pipiens, and Culex tritaeniorhynchus. All the images were captured from dead adult mosquitoes in a lab environment with a plain gray background and under stable lighting conditions. In our experiments, the images of all classes are used without enlarging the dataset through augmentation.

IEEE
This publicly available dataset was developed by Pise et al. Pise et al. (2020) for binary classification of mosquitoes into the two general classes of Aedes and Culex, with 1404 images overall. All the images were captured in nature from live mosquitoes, some of which had fed on blood just before the images were taken, resulting in visibly red abdomens. The number of images was increased by applying rotation as augmentation.

Kaggle
Isawasan et al. Isawasan (2020) provided this dataset for classifying two types of Aedes mosquitoes, namely Aedes albopictus and Aedes aegypti. The images were captured from dead mosquitoes in a laboratory environment with a plain blue background. All images of both classes are employed in our experiments for binary classification.

Goodwin
This dataset was provided by Goodwin et al. Goodwin et al. (2021) and contains 6548 images in 67 classes overall. It is the mosquito dataset with the highest number of classes and the greatest variety of mosquito species to date. However, the data distribution among the classes is highly imbalanced. To achieve a good balance, we use only the classes with more than 100 images, leaving 18 of the 67 classes. The dataset is balanced by subsampling or applying augmentation (i.e., random 0-360 degree rotation and random brightness, hue, contrast, and saturation variations in the ranges of 20%, 10%, 20%, and 20%, respectively) so that each class consists of 800 images, resulting in 14400 images overall.
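The class filtering and balancing described above can be sketched as a simple planning function; the class labels and counts below are hypothetical, not taken from the Goodwin dataset:

```python
def balance_plan(class_counts, target=800, min_images=100):
    """For each class with more than `min_images` samples, decide how many
    originals to keep and how many augmented copies to synthesize so that
    every kept class ends up with exactly `target` images."""
    plan = {}
    for label, n in class_counts.items():
        if n <= min_images:
            continue  # class discarded, as in the 18-of-67 selection
        plan[label] = {"keep": min(n, target),
                       "augment": max(0, target - n)}
    return plan

counts = {"species_a": 950, "species_b": 120, "species_c": 40}  # hypothetical
plan = balance_plan(counts)
```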

TRAINING DETAILS
All the experiments are conducted on a PC with the Windows 10 operating system, an Intel Core i7-10700F CPU @ 2.90 GHz, an Nvidia GeForce RTX 2080 GPU with 8 GB of memory, and the TensorFlow framework with the Keras deep learning API. The training process is carried out for 100 epochs with a batch size of 16, using the Adam optimizer with an initial learning rate of 5e-6. A StepLR scheduler decays this rate by a factor of 0.25 every 15 epochs. All the input images are resized to 224 × 224 pixels, and the performance is evaluated by applying a 5-fold cross-validation strategy and averaging the results.
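The step-decay schedule above corresponds to the following function (a sketch of the behavior; the actual run uses the framework's scheduler):

```python
def step_lr(epoch, initial_lr=5e-6, gamma=0.25, step_size=15):
    """StepLR schedule: multiply the learning rate by `gamma` once every
    `step_size` epochs, starting from `initial_lr`."""
    return initial_lr * gamma ** (epoch // step_size)

# Learning rate over the 100-epoch run: constant within each 15-epoch step.
schedule = [step_lr(e) for e in range(100)]
```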

EVALUATION METRICS
One of the critical metrics for evaluating classification models is accuracy, defined as Accuracy = (TP + TN) / (TP + TN + FP + FN), where TP, TN, FP, and FN are respectively the numbers of true-positive, true-negative, false-positive, and false-negative samples obtained from the confusion matrix. In other words, accuracy is the fraction of correct predictions over the total number of predictions. However, accuracy alone is not enough to evaluate the model effectively. Consequently, three more evaluation metrics, Precision, Recall/Sensitivity, and F1-Score, are used to assess the performance of the model. Precision is the fraction of true-positive predictions over all positive predictions, correct and incorrect: Precision = TP / (TP + FP). Recall/Sensitivity, also referred to as the True Positive Rate, divides the number of true-positive predictions by the number of all positive samples: Recall/Sensitivity = TP / (TP + FN). The F1-Score is defined from Precision and Recall as F1-Score = (2 × Precision × Recall) / (Precision + Recall).
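All four metrics follow directly from the confusion-matrix counts; a small sketch with made-up counts:

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, Precision, Recall, and F1-Score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

m = classification_metrics(tp=90, tn=85, fp=15, fn=10)  # hypothetical counts
```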

PERFORMANCE ASSESSMENT
The performance of our proposed mosquito classification model is evaluated using the four metrics of accuracy, precision, recall, and F1-score on four publicly available datasets and their combination. These values, along with the loss values on the test set, are presented in Table 3. Obtaining accuracies of 98.82%, 98.92%, 94.66%, 98.40%, and 97.55% on the Park, IEEE, Kaggle, and Goodwin datasets and their combination demonstrates performance improvements in the ranges of 1.31-10.85%, 2.37-10.67%, 2.55-19.96%, 1.13-17.47%, and 1.69-18.12%, respectively. In particular, by modifying the VGG16 architecture with two dropout layers, our model surpassed the compared pre-trained models. Confusion matrices for the proposed model with dropout layers are depicted in Fig. 4 for the Park, IEEE, and Kaggle datasets and in Fig. 5 for the Goodwin dataset to further demonstrate its feasibility. It is worth mentioning that the misclassified samples mainly belong to different species of the same genus, whose morphological characteristics bear a striking resemblance. In the Park dataset, the most misclassifications occur between Culex pipiens and the two species Aedes vexans and Anopheles sinensis.
In addition to the quantitative evaluation, the efficiency of our model is also evaluated qualitatively through an explainable model based on the Grad-CAM algorithm. As demonstrated in Fig. 6, the heatmaps generated from the final convolutional layer highlight the importance of the mosquitoes' thorax and legs for the model to learn the discriminative features. The regions used by the model for mosquito recognition are highly similar to those used by entomologists in manual examination. These heatmaps give real insight into the network, confirming its promising performance and capabilities.

COMPARISON WITH THE STATE-OF-THE-ART
The performance of the proposed method is compared with state-of-the-art approaches in Table 4 on four different datasets in terms of classification accuracy. For a fair comparison, we adopt the same data splits as the compared methods and use their reported accuracy values, except for the model proposed in Adhane et al. (2021), whose accuracy we obtained on the same datasets as ours using their publicly available source code. Our proposed model outperforms the other approaches with accuracies of 98.46%, 98.89%, 98.92%, and 94.66% on the four datasets of Park, Goodwin, IEEE, and Kaggle, respectively. It is worth mentioning that, although our proposed system is closely related to the architecture in Adhane et al. (2021), it obtains superior performance by a significant margin (increments of 2.05%, 4.07%, 2.26%, and 3.33%, respectively) by applying the Adam optimizer instead of SGD and adequately adjusting the hyperparameters of the dropout layers and the training process. Adam is an extended version of the SGD optimizer, and once its hyperparameters (i.e., learning rate and weight decay) are efficiently tuned, it obtains better performance than SGD. In addition to high accuracy, our model has superior generalization ability compared to the other approaches. As mentioned, most existing approaches generalize poorly because they have been evaluated only on a single dataset. For instance, the performance of Adhane et al. (2021) was only evaluated on a subset of the Mosquito Alert dataset for binary classification of tiger and non-tiger mosquitoes. In contrast, our model was evaluated on binary and multi-class (6- and 18-class) recognition tasks on four different datasets, attaining competitive performance. Consequently, high generalization ability as well as high classification accuracy make our proposed system applicable to real-world vector control programs.

ABLATION STUDY
To investigate the effectiveness of the dropout layers, different numbers of layers and positions in the network are examined, with the results reported in Table 5. The impact of the dropout value, the prominent hyper-parameter of this layer, is investigated in Table 6. Good performance is achieved when the dropout value lies between 0.4 and 0.6. Within this range, the highest accuracy is achieved at 0.5, which is therefore the optimal value for the dropout layers in our proposed model. It is worth mentioning that when the dropout value is increased to 0.7 or more, most of the neurons are ignored during training and their weights are not updated. As a consequence, performance degrades dramatically for large dropout values.

FAILURE CASES
Although our modified architecture performs well on mosquito recognition, there are also some misclassified cases. Heatmaps of misclassified samples from the different datasets are illustrated in Fig. 7. Model attention has been carefully investigated for the wrong predictions, where the model fails to focus on the discriminative regions of the mosquitoes. Analyzing these samples in depth, we infer that three main conditions challenge our system and degrade its performance: 1) a cluttered background, 2) dark shadows of mosquitoes on the background due to improper illumination, and 3) damaged or occluded morphological features such as legs and thorax.

CONCLUSION
This paper tackled the problem of classifying vector mosquitoes by modifying the VGG16 model with dropout layers and taking advantage of transfer learning. The main focus of this paper was introducing an automated mosquito classification system with high accuracy and generalization ability. To this end, the performance of the proposed model was evaluated on four publicly available datasets and their combination, proving its capabilities and feasibility. It outperformed the original VGG16 model and three other pre-trained models on all five datasets. An ablation study identified a good trade-off between the number of dropout layers and classification accuracy. Our proposed model surpassed the other existing approaches with accuracies of 98.46%, 98.89%, 98.92%, and 94.66% on the Park, Goodwin, IEEE, and Kaggle datasets, respectively. In addition to the quantitative evaluation, the model's performance was assessed with the Grad-CAM algorithm by visualizing the attention of the network during feature extraction. The generated heatmaps confirmed that the model learned from the discriminative regions of the mosquitoes, which further supports the model's reliability. In future work, we plan to minimize the misclassification between different species of the same genus to improve accuracy while pruning the model to reduce its computational cost. Providing a new complete dataset, either by physically capturing the images or by artificially generating them with generative adversarial networks, would greatly benefit the research community in this domain.

Fig. 1 :
Fig. 1: The main flowchart of the proposed model includes the convolutional blocks as feature extractors, fully-connected and softmax layers as classifiers, and two dropout layers as regularization. The Grad-CAM component shows the model's focus while learning the discriminant features.

Fig. 2 :
Fig. 2: Sample images from four different mosquito datasets used to evaluate the performance and generalization of our proposed model.

Fig. 3 :
Fig. 3: Accuracy curve of the proposed model on the test set of the Park dataset compared to those of four pre-trained models.

Fig. 4 :
Fig. 4: Confusion matrices on the test set for three datasets of (a) Kaggle, (b) IEEE, and (c) Park.

Fig. 5 :
Fig. 5: Confusion matrix on the test set for Goodwin dataset.

Fig. 6 :
Fig. 6: Heatmaps generated based on the Grad-CAM algorithm from the last convolutional layer of the model demonstrate the discriminant regions used to learn the features.

Fig. 7 :
Fig. 7: Heatmaps generated from the last convolutional layer of the model for the misclassified samples of four datasets (the first and second row depict the original images and the heatmaps, respectively).

Table 1 :
Summary of recent deep learning-based mosquito classification approaches, including brief details about their evaluated dataset, methodology, and the reported accuracy (ACC) (MA: Mosquito Alert).
…ResNet50, and SqueezeNet models. Considering three experimental schemes of neither augmentation nor fine-tuning, only fine-tuning, and both fine-tuning and augmentation, VGG16 outperformed the other models with accuracies of 56.7%, 91.1%, and 97.2%, respectively. Kittichai et al. Kittichai et al. (2021b) developed another mosquito dataset with 15 different classes covering both newborn and adult mosquitoes of both sexes. They applied different YOLO-based models Redmon et al. (2015) for real-time classification, among which YOLO-v3 achieved the highest accuracy of 98.9% after enriching the dataset with augmentation. Adhane et al. Adhane et al. (2021) presented a comparative study of two DL-based models, i.e., VGG16 and ResNet50, for binary classification of tiger mosquitoes on the Mosquito Alert dataset.

Table 2 :
Summarized details of four different mosquito datasets adopted to evaluate the performance and generalization of our proposed model.

Table 3 :
Performance evaluation and comparison between our proposed model with four pre-trained models on the test set of four different datasets and their combination.

Table 4 :
The performance comparison of the proposed model with the recent approaches on four different datasets.

Table 5 :
The results of ablation study on Goodwin dataset for different numbers of dropout layers and their locations (the bold values indicate two best results).

Table 6 :
The results of ablation study on Goodwin dataset for different values of dropout layer.