Skin cancer is one of the deadliest diseases in humans. Research shows that early detection of cancerous tumors significantly increases the chances of survival. Therefore, physicians take more time to investigate these types of lesions. The purpose of this paper is to propose a system that automatically classifies skin cancer with a higher classification rate than other models, using convolutional neural networks and transfer learning models. The models tested were Xception, Inception V3, and ResNet 50. These models were chosen because they had the highest “top-5 accuracy” and smallest size among the pretrained models considered. In this work, the models were trained for 25 epochs each in order to study the impact of the number of epochs on classification performance. An epoch is one complete pass through the training dataset, and the number of epochs is chosen by trial and error. It was found that running the models for a small number of epochs (25) gave acceptable results. In comparison, when the models were run for 50 epochs, they achieved accuracies of 74%, 79%, and 34% for the pretrained models respectively, and 71.34% for the proposed Convolutional Neural Network (CNN). It was interpreted that transfer learning provides reliable results in the case of a small dataset. The proposed pre-trained models achieved 81%, 83%, and 54% classification accuracy respectively on the Skin Cancer Dataset downloaded from Kaggle. The proposed CNN achieved 78.48% classification accuracy.
Machine Learning, Artificial Intelligence, Convolutional Neural Networks, Transfer Learning, Skin Cancer Detection.
Because malignant and benign skin lesions look very similar, diagnosing skin cancer by visual inspection alone is hard, and physicians spend much more time investigating these lesions. Incidences of skin cancer have increased significantly in recent years because of melanoma lesions. About 75% of deaths related to skin cancer come from malignant lesions. The survival rate of patients could be increased if skin cancer were recognised accurately in its early stages. Therefore, the automated classification of skin lesions has the potential to save effort, time, and human lives. In this project, a system that can assist doctors in the automatic diagnosis of skin cancer is proposed. Experiments with different machine learning techniques led to choosing the model with the best performance.
This section describes the method used to classify coloured images of skin lesions using four different models.
A) THE AUGMENTATION PROCESS
A larger number of training images generally allows a model to achieve higher accuracy. The augmentation process, which includes rescaling, shearing, zooming, rotating, and flipping images, is used to increase the number of images in a dataset so the model can undergo more training. The dataset used in this study was from Kaggle. It had only 66 images for training, which made it easier to download and work on. Since the number of downloaded images was small, the number of training images had to be increased using different augmentation processes. The following settings were applied: a rescale factor of 1/255, which multiplies every pixel value by that ratio; a validation split of 0.2; shear and zoom ranges of 0.2 each; horizontal flipping; and a rotation range of 0.1. These values were chosen arbitrarily; other values could have been selected, and the main effect would be a change in how strongly each image is transformed. In the end, there were 660 images in total.
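Two of the augmentation steps named above can be illustrated with a minimal, library-free sketch. It is not the pipeline used in this study: the tiny grayscale image and the helper names `rescale` and `horizontal_flip` are hypothetical, and shearing, zooming, and rotating are omitted for brevity.

```python
def rescale(image, factor=1 / 255):
    """Multiply every pixel by `factor` (1/255 maps 0-255 values into 0-1)."""
    return [[pixel * factor for pixel in row] for row in image]


def horizontal_flip(image):
    """Mirror the image left to right by reversing each row."""
    return [row[::-1] for row in image]


# Hypothetical 2x3 grayscale image with 8-bit pixel values (illustration only).
image = [
    [0, 51, 255],
    [102, 204, 153],
]

scaled = rescale(image)            # same shape, values now between 0 and 1
flipped = horizontal_flip(image)   # same shape, columns in reverse order

print(scaled)
print(flipped)
```

Both operations change pixel values or positions while leaving the image dimensions unchanged, so each transformed copy can be fed to the model like an original image.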
B) CONVOLUTIONAL NEURAL NETWORKS (CNNs)
A CNN is a class of deep neural networks that is most commonly used to analyse or classify images. The architecture of a CNN consists of convolutional layers, pooling layers, and fully connected layers. Convolutional layers extract low- and high-level features by filtering images. These convolutional filters (kernels) are optimised during the training of the model, and their sizes are specified by the programmer. Pooling layers are inserted between the convolutional layers. These layers prevent the output from getting too large by reducing the dimensions of the data. Fully connected layers perform the actual classification: every neuron in one layer is connected to every neuron in the next, and the extracted features are passed through these layers to classify the images.
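The filtering and pooling steps described above can be sketched in plain Python. This is only an illustration under stated assumptions: the 5x5 image, the 2x2 edge-detecting kernel, and the function names are hypothetical, and a trained CNN learns its kernel values rather than having them written by hand.

```python
def convolve2d(image, kernel):
    """Slide a square kernel over the image (no padding, stride 1)."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(k) for b in range(k))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [
        [
            max(feature_map[i + a][j + b]
                for a in range(size) for b in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


# Hypothetical 5x5 image: dark on the left, bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hand-written kernel that responds to a dark-to-bright vertical edge.
kernel = [[-1, 1],
          [-1, 1]]

features = convolve2d(image, kernel)  # 4x4 feature map
pooled = max_pool2d(features)         # 2x2 map after pooling

print(features)
print(pooled)
```

The feature map is non-zero only where the edge lies, and pooling halves each dimension while keeping that response.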
A CNN was used in this experiment to solve the image classification problem efficiently, as CNNs can approximate complicated functions. The CNN used in this experiment had 3 convolutional layers, 3 pooling layers, and 2 fully connected layers. The model was compiled with a ‘binary crossentropy’ loss function, which measures the penalty paid for inaccurate predictions, and an ‘Adam’ optimiser, which is an algorithm that updates the weights of the model to reduce the loss.
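To make the idea of a "price paid for inaccurate predictions" concrete, the following sketch computes binary cross-entropy by hand. The labels and predicted probabilities are invented for illustration, and the clipping constant `eps` is an assumption used only to avoid taking the logarithm of zero.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # keep p away from exactly 0 or 1
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


# Hypothetical labels and two sets of predicted probabilities.
labels = [1, 0, 1, 0]
good_predictions = [0.9, 0.1, 0.8, 0.2]  # close to the true labels
bad_predictions = [0.1, 0.9, 0.2, 0.8]   # confidently wrong

good_loss = binary_crossentropy(labels, good_predictions)
bad_loss = binary_crossentropy(labels, bad_predictions)

print(good_loss, bad_loss)
```

Predictions that are confidently wrong receive a much larger loss than predictions close to the true labels, which is the signal the optimiser uses when updating the weights.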
C) TRANSFER LEARNING
Transfer learning takes the knowledge gained from solving one problem and applies it to different but related problems by using pre-trained models. Specifically, the weights learned from a large source dataset are reused when training on the target task. The pre-trained architectures used in this project (Xception, Inception V3, and ResNet-50) were trained on the ImageNet dataset, which consists of 1000 classes and 1.28 million training images (images used to train the model), with 100,000 test images (images used to test the model) and 50,000 validation images (images used to provide an unbiased evaluation of the model). The Xception, Inception V3, and ResNet-50 architectures are 71, 48, and 50 layers deep, respectively.
These three models were chosen as they yielded the best accuracy, had a compact size, and had a simple architecture. Alternatives that could have been considered are VGG 16, VGG 19, and ResNet50V2, due to their small size and top accuracy. For this project, the priority was choosing the smallest model (the model with the fewest neurons) that worked well for the data. This was done to prevent high memory usage while still getting a fair accuracy. Complex models were not chosen due to the risk of overfitting (the model picking up noise or random fluctuations in the training data).
An advantage of transfer learning is that it improves classifier accuracy and accelerates the learning process while using less training data. In transfer learning, the model has already been trained to recognise some features. The first few layers are kept the same, and the deeper layers of the models are fine-tuned by adding new layers. Then, the model is trained with new training data. In this research, we focused on classifying skin cancer images from the Kaggle dataset into two types of lesions. The last classification layer is replaced with a sigmoid layer, which applies an activation function that maps its input to an output value between 0 and 1. The optimiser, loss function, weights, and learning rate were adapted to the classification task through trial and error and fine-tuning across experiments.
The dataset from Kaggle consisted of Red Green Blue (RGB) coloured skin images (see Figure 1). The RGB images are significant for this experiment as colour helps the model to better understand and classify the images. This dataset contained test and train data, which consisted of images of benign and malignant lesions. The implementation for this classification was run on Google Colab using a Graphics Processing Unit (GPU). Using a GPU makes it practical to train on a large number of images with low model error rates, as GPUs can process many computations simultaneously. In this experiment, the original classification layer of each pre-trained model is replaced with a new sigmoid layer suited to skin lesions. In the sigmoid layer, the algorithm returns a value between 0 and 1, which can be used to categorise the skin lesion as malignant or benign.
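The way a sigmoid output can be turned into a class label is sketched below. Two details are assumptions that the text does not state: that an output near 1 means malignant, and that the decision threshold is 0.5. The input `z` stands for the raw score produced by the layer before the activation is applied.

```python
import math


def sigmoid(z):
    """Map any real number to a value between 0 and 1 (numerically stable form)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify(z, threshold=0.5):
    """Assumed convention: outputs at or above the threshold mean malignant."""
    return "malignant" if sigmoid(z) >= threshold else "benign"


# Hypothetical raw scores from the final layer.
for score in (-2.0, 0.3, 2.0):
    print(score, round(sigmoid(score), 3), classify(score))
```

Lowering the threshold would flag more lesions as malignant, trading extra false positives for fewer missed malignant cases.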
Figure 1: Example of a benign skin lesion image. The fact that the “redness” is concentrated in a smaller area allows for the correct diagnosis of “benign”, which increases classification accuracy.
The image size was (224, 224), in RGB. The image size has no special significance, but it allows the model to process the images more efficiently. Image augmentation, as described in the proposed method section above, was performed to increase the number of images in the dataset by rescaling, shearing, zooming, rotating, and flipping horizontally. The input shape of all images was (224, 224, 3). All experiments were performed with a batch size of 32, which means that 32 training images were passed in at each iteration. The batch size influences how the model learns and therefore its accuracy. The CNN model was trained for 15 epochs, while the three pre-trained CNN models were each trained for 25 epochs. The data was divided into training and test datasets, where 80% of the images were used for training and 20% for testing, which is the most commonly used ratio. Typically, a higher percentage of images is used for training and a lower percentage for testing.
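The arithmetic behind the split and the batch size can be checked with a short sketch. It assumes, for illustration only, that the 80/20 split is applied to the 660 images obtained after augmentation; the variable names are hypothetical.

```python
import math

total_images = 660   # assumed: the image count after augmentation
train_percent = 80   # 80% training, 20% testing
batch_size = 32

train_images = total_images * train_percent // 100
test_images = total_images - train_images

# The last batch of an epoch may be smaller than 32, so round up.
steps_per_epoch = math.ceil(train_images / batch_size)

print(train_images, test_images, steps_per_epoch)
```

Under these assumptions, one epoch corresponds to 17 weight-update iterations over 528 training images, with 132 images held out for testing.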
Training parameters were chosen on a trial and error basis. They were initially selected arbitrarily and then repeatedly tuned until the desired accuracy was achieved. The final parameters were considered optimal because the accuracy went down when their values were increased or decreased.
The training parameters for the CNN were:
- Batch size: 32
- Epochs: 15
- Learning Rate: 0.001
The training parameters for all of the transfer learning models were:
- Batch Size: 32
- Epochs: 25
- Learning Rate: 0.001
The training parameters were the same for all of the transfer learning models because, even though their architectures differ slightly, a custom classifier layer with the same architecture was added to each of them.
Trained networks were used to classify the test images and calculate the overall classification accuracy, precision, and F-1 score.
The formulae below define how the accuracy, precision, and F-1 score of each model are calculated. They are also a good way of cross-checking that the correct values have been computed for each metric.
Accuracy is the percentage of all predictions that are correct.
Accuracy = [(TP+TN)/(TP+TN+FP+FN)]*100
Precision is the positive predictive value: the percentage of predicted positives that are actually positive.
Precision = [TP/(FP+TP)]*100
Recall is the percentage of actual positives that are correctly identified.
Recall = [TP/(TP+FN)]*100
F-1 score is the harmonic mean of recall and precision, which balances the two measures.
F-1 Score = 2*(Recall*Precision)/(Recall+Precision)
TP means True Positives (correctly identified positives: if the image is malignant and the model predicts malignant), TN means True Negatives (correctly identified negatives: if the image is benign and the model predicts benign), FP means False Positives (incorrectly identified positives: if the image is benign and the model predicts malignant), and FN means False Negatives (incorrectly identified negatives: if the image is malignant and the model predicts benign).
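The formulae above can be cross-checked with a short sketch. The counts used here are hypothetical and are not results from this study. Recall, which the F-1 formula relies on, is taken as TP/(TP+FN), and returning 0 when a denominator is zero is a convention assumed for this example.

```python
def accuracy(tp, tn, fp, fn):
    """Percentage of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn) * 100


def precision(tp, fp):
    """Percentage of predicted positives that are truly positive."""
    return tp / (tp + fp) * 100 if (tp + fp) else 0.0


def recall(tp, fn):
    """Percentage of actual positives that were found."""
    return tp / (tp + fn) * 100 if (tp + fn) else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, in percent."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * (r * p) / (r + p) if (r + p) else 0.0


# Hypothetical counts for 100 test images (not results from this study).
tp, tn, fp, fn = 40, 45, 5, 10

print(f"Accuracy:  {accuracy(tp, tn, fp, fn):.2f}%")
print(f"Precision: {precision(tp, fp):.2f}%")
print(f"Recall:    {recall(tp, fn):.2f}%")
print(f"F-1 Score: {f1_score(tp, fp, fn):.2f}%")
```

Because precision and recall are both expressed as percentages, the F-1 score computed from them is also a percentage.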
Evaluation of the classification performance was done using the three pre-trained architectures and CNN model. The results were summarised in the form of confusion matrices (see Figures 2, 3, 4, and 5), which are visual summaries of the models’ performances on a classification problem.
The information in a confusion matrix, namely the counts of TPs, TNs, FPs, and FNs shown side by side, allows the calculation of measures that help determine how useful the model is. Each model’s matrix compares the values predicted by that model to the actual values. Confusion matrices are an efficient way to summarise a classification algorithm’s performance, as the classification accuracy alone can sometimes be deceptive.
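How the four counts in a confusion matrix are obtained from predictions can be sketched as follows. The label lists are invented for illustration, and encoding malignant as 1 and benign as 0 is an assumption.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, and FN for binary labels (1 = malignant, 0 = benign)."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            counts["TP"] += 1
        elif actual == 0 and predicted == 0:
            counts["TN"] += 1
        elif actual == 0 and predicted == 1:
            counts["FP"] += 1
        else:  # actual == 1 and predicted == 0
            counts["FN"] += 1
    return counts


# Hypothetical true labels and model predictions for eight images.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
print(counts)
```

Laying the counts out as a 2x2 grid of actual against predicted classes gives the confusion matrix, from which accuracy, precision, and F-1 score follow.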
Accuracy: 78.48% Precision: 79% F-1 Score: 78%
Figure 2: CNN model confusion matrix shows the number of correct (true positives and negatives) and incorrect (false negatives and positives) predictions
Accuracy: 81% Precision: 82% F-1 Score: 81%
Figure 3: Xception model confusion matrix shows the number of correct (true positives and negatives) and incorrect (false negatives and positives) predictions
Accuracy: 83% Precision: 84% F-1 Score: 83%
Figure 4: Inception V3 model confusion matrix shows the number of correct (true positives and negatives) and incorrect (false negatives and positives) predictions
Accuracy: 54% Precision: 30% F-1 Score: 39%
Figure 5: ResNet 50 model confusion matrix shows the number of correct (true positives and negatives) and incorrect (false negatives and positives) predictions
The figures above (see Figures 2, 3, 4, and 5) show that, overall, the Inception V3 model classified images of skin lesions the most accurately. This means that if the system were to be deployed, the Inception V3 model would be the one used. Doctors could use any of these models to diagnose skin cancer more efficiently, which would also help save effort, time, and human lives.
The Kaggle dataset used in this experiment did not have any pictures of people with dark skin, which is a big limitation and source of error. Bias is a current problem in machine learning for medical diagnosis, as models are often trained on images of people with white skin. When these models are used by doctors, they give more accurate results for patients with white skin than for people of colour. Training models with images of people with darker skin would make diagnosis more reliable for patients with dark skin, which the current model does not handle as well. A future experiment could train the models on pictures of people with dark skin in order to reduce this bias. Images of people of colour could be collected in hospitals and then used to train the model. This system (the CNN model or the Inception V3 model) could be further applied to other medical purposes, such as classifying different types of cancers, diagnosing diseases, and predicting the location of cancerous cells.
This paper presents a fully automatic system for classifying skin cancer into two kinds, malignant and benign, using a Kaggle dataset. The proposed system applied the concepts of deep transfer learning using three pre-trained architectures, together with a CNN, to skin lesion images. The classification accuracy of these models is higher with a larger number of training samples and a small number of epochs. The architectures which have fewer layers perform better than the deeper architectures. The Xception, Inception V3, and ResNet50 models yielded an accuracy of 81% (see Figure 3), 83% (see Figure 4), and 54% (see Figure 5), respectively. The CNN yielded an accuracy of 78.48% (see Figure 2). Therefore, it is concluded that Inception V3 yielded the highest accuracy overall.
This paper and the research behind it would not have been possible without the support of my mentors, Ivoline Ngong and Purvi Goel. Their enthusiasm, knowledge and attention to detail have been an inspiration and kept my work on track from my first experience with AI and Machine Learning to my final project and draft of this paper. I am very grateful for the insightful comments offered by them and I really appreciate their immediate response to my doubts. Their generosity made me improve this study in many ways and it saved me from making errors, and this could not have been possible without them.
- Jerant, Anthony F., Jennifer T. Johnson, Catherine Demastes Sheridan, and Timothy J. Caffrey. “Early Detection and Treatment of Skin Cancer.” American Family Physician. July 15, 2000. Accessed April 14, 2021. URL: https://www.aafp.org/afp/2000/0715/p357.html.
- Khalid M. Hosny, Mohamed A. Kassem, Mohamed M. Foaud. Classification of skin lesions using transfer learning and augmentation with Alex-net; Jie Zhang 2019. May 21, 2019. URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0217293#abstract0
- “DAGNetwork.” Xception Convolutional Neural Network – MATLAB. Accessed May 01, 2021. URL: https://www.mathworks.com/help/deeplearning/ref/xception.html?s_tid=doc_ta.
- “DAGNetwork.” Inception-v3 Convolutional Neural Network – MATLAB. Accessed May 01, 2021. URL: https://www.mathworks.com/help/deeplearning/ref/inceptionv3.html?s_tid=doc_ta.
- “DAGNetwork.” ResNet-50 Convolutional Neural Network – MATLAB. Accessed May 01, 2021. URL: https://www.mathworks.com/help/deeplearning/ref/resnet50.html?s_tid=doc_ta.
- Team, Keras. “Keras Documentation: Keras Applications.” Keras. Accessed April 12, 2021. URL: https://keras.io/api/applications/
- Dsouza, Jason. “What Is a GPU and Do You Need One in Deep Learning?” Medium. December 26, 2020. Accessed May 01, 2021. URL: https://towardsdatascience.com/what-is-a-gpu-and-do-you-need-one-in-deep-learning-718b9597aa0d#:~:text=Why choose GPUs for Deep,computation of multiple parallel processes.
- Fanconi, Claudio. “Skin Cancer: Malignant vs. Benign.” Kaggle. June 19, 2019. Accessed April 27, 2021. URL: https://www.kaggle.com/fanconic/skin-cancer-malignant-vs-benign/.
- Tokuç, A. Aylin. “Splitting a Dataset into Train and Test Sets | Baeldung on Computer Science.” Baeldung on Computer Science, Baeldung, 14 Jan. 2021, URL: https://www.baeldung.com/cs/train-test-datasets-ratio#:~:text=If%20we%20search%20the%20Internet,even%20a%2050%3A50%20split.
About the Author
The author is Arya Gijare, a sophomore at Parkway Central High School in St. Louis, MO. Arya is interested in computer science and wants to broaden her interest in the subject by exploring its different branches. She wants to explore AI professionally in the future and major in computer science. In her free time, she likes to play the piano and code in Java.