Detecting Open-angle Glaucoma Using a Two-parts Deep Learning Architecture


Glaucoma is a chronic eye disease that damages the optic nerve and leads to irreversible blindness in its terminal stage. This project proposes a feasible and accurate system to detect open-angle glaucoma, the most common form of glaucoma. Using two deep learning architectures, Single Shot MultiBox Detector (SSD) and Visual Geometry Group Neural Network (VggNet), the system is able to accurately extract the optic disc from a retinal image and detect glaucoma patterns. The training datasets were obtained from multiple public sources, consisting of 5532 retinal images, with 3670 images labeled with the optic disc areas, 901 normal retinae, and 761 retinae with glaucoma. The accuracy of the system was found to be 98.8% when tested on a separate dataset of 332 retinal images. In conclusion, this system with its two deep learning architectures is able to effectively diagnose open-angle glaucoma.

Keywords (see glossary): Glaucoma, Convolutional Neural Network, VggNet, SSD, Image Classification, Deep Learning

1. Introduction

Glaucoma is the second leading cause of blindness in the world. By 2020, 80 million people (3 million in the United States) are predicted to be living with glaucoma [1]. Glaucoma is a chronic eye disease that leads to elevated eye pressure, optic atrophy and visual field defects. The treatment for terminal glaucoma is limited and the resulting blindness is irreversible. However, the progression of glaucoma can be prevented, given proper treatment at its early stage. The greatest difficulty in detecting glaucoma is that the disease is asymptomatic: patients are often unaware of it until severe visual loss has occurred. The current recognition of early glaucoma is problematic since it is time-consuming and based on visual examination [2], which makes it subjective and prone to errors. Hence, it is essential to develop an early-stage diagnosis system that detects glaucoma features accurately so that the progression of glaucoma can be prevented.

Since a large number of Digital Fundus Images (DFIs) are easily accessible on public databases, analysing DFIs is especially useful for training the glaucoma screening program: an automated system that determines whether any potential glaucoma features are present in a DFI.

Fig 1. A Glaucoma DFI

Glaucoma can be diagnosed preliminarily based on the assessment of the Optic Disc (OD) through ophthalmoscopy. The OD is the area of the retina where retinal ganglion cells exit the eye to form the optic nerve, which transmits visual information from the retina to the brain. In DFIs, two distinct parts of the OD with different features can be found: the optic cup, a central bright zone; and the neuroretinal rim, the peripheral region surrounding the cup. Optic atrophy, as one indicator of glaucoma, is the loss of some or most nerve fibers in the optic nerve, which changes the structure of the OD. Additionally, various other indicators may be used to diagnose glaucoma: optic disc dilation, vertical cup-to-disc ratio [4], the ISNT rule (disc rim thickness: inferior ≥ superior ≥ nasal ≥ temporal) [5], and the occurrence of parapapillary atrophy [6].

Fig 2. A processed DFI with the optic disc indicated in the yellow box.

1.1. Prior Related Work

A number of methods for automatic open-angle glaucoma detection have been developed by universities and related organizations. Singh et al. proposed a technique using features from segmented optic discs (with blood vessels removed) on 44 DFIs and achieved an accuracy of 94.7% [8]. Chen et al. utilised a six-layer convolutional neural network (CNN) based deep learning architecture with data augmentation and achieved accuracies of 83.1% and 88.7% on two different datasets [9]. Chakrabarty et al. used a feature extraction technique which reported an accuracy of 79.2% based on datasets of 2252 DFIs [10]. Raghavendra et al. developed an 18-layer CNN based system and reported an accuracy of 98.13% with 1426 DFIs [21].

1.2. Contribution

The various approaches and systems developed in previous experiments have one similarity: in order to enhance performance, most of the methods manually pre-process images. Although the overall performance indicates that manual operation does contribute to high accuracy, the preprocessing inevitably introduces errors from imprecise operation and operator mistakes. Additionally, such manual operation is time-consuming and requires expertise, which makes it impractical to train models with very large datasets.

The work shown in this paper made two major improvements compared to previous studies. First, this system employs a deep learning architecture to detect and segment the OD from a DFI. This allows the system to bypass the need for manual operations, which consequently eliminates the errors associated with such operations. Second, the automated segmentation of the OD allows the system to be applied to relatively larger datasets for training to achieve a more stable and reliable accuracy measure.

Thus, this paper focuses on developing a two-part deep learning architecture which utilises SSD [19] and VggNet [20] architectures to detect open-angle glaucoma through digital image analysis. The first part of the system is able to recognise and separate the optic disc (OD) area from digital retina images. This step enhances the accuracy and time-efficiency of the system. The second part of the system focuses on detecting glaucoma patterns within the OD. In this part, Vgg16 is used to classify different OD images. Since the first part significantly reduces computational requirement, the training speed of the second is greatly enhanced: it becomes possible to run and analyse thousands of pictures at once. The datasets utilised in the training of this system contain over 5000 images.

This paper is divided into five sections. Section 1 presents the background, motivation, and the improvements of this work upon previous works. Section 2 gives a detailed introduction to the components of a convolutional neural network (CNN) classifier. Section 3 describes the structure of the system and its two convolutional architectures. Section 4 elaborates upon the datasets used in this work, the results and discussion. Conclusions are in the last section.

In summary, this paper makes two contributions:

  • It proposes a highly accurate model which consists of a modified VggNet to effectively diagnose glaucoma with an accuracy of 98.8%.
  • It utilises Single Shot MultiBox Detector (SSD) to automatically detect and separate the OD area from DFIs to achieve a higher accuracy with lower computational power.

2. Convolutional filters and networks

Deep learning is a class of machine learning algorithms that uses multiple layers consisting of multiple linear and non-linear transformations of the data to progressively extract higher-level and more abstract features from the raw input [7]. Convolutional neural networks (CNNs) are a class of deep learning architecture which are commonly applied to analyse and classify images by assigning importance (or weights and biases) to various features in the image. Eventually, CNNs are able to differentiate between these features automatically. Deep learning architectures contain multilayer neural networks, including different configurations of networks and training strategies aiming to optimize the outcome. With enough training, a CNN will be able to acquire a set of best-fit weights for each feature in order to reach optimal classification results [22]. A CNN is composed of four major parts: convolutional layers, activation functions such as Rectified Linear Unit (ReLU), pooling layers, and fully connected layers.

Fig 3. A Simplified Structure of CNN

Fig. 3 illustrates an example of a CNN, which contains convolutional layers, pooling layers, ReLU (activation function), batch normalization, fully connected layers and a SoftMax function. In the case of the glaucoma detection system, an input image of width 32 and height 32 holds the original pixel values in three color channels R, G, B, giving a data size of [32x32x3].

2.1 Convolutional Layer

The convolutional layers are the core of a CNN and are responsible for feature extraction. A convolutional layer uses a set of learnable filters to perform operations on the original input image, and each filter produces a 2-dimensional activation map. Using a CNN allows the system to capture relevant features from an image at a relatively low computational cost as a result of weight sharing [11]. In the glaucoma detection system, the data size becomes [32x32x12] after 12 convolutional filters of size 3×3 are applied (the spatial size stays at 32×32 only if the input is zero-padded by one pixel on each side).

Fig 4. A convolutional operation utilizing a 3×3 filter with learnable weights producing a 2-dimensional activation map [11].
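The data sizes quoted above follow from the standard output-size formula for a convolution. The sketch below is illustrative only: the function name and the choice of one pixel of zero padding are assumptions, since the paper does not state the padding that was used.

```python
def conv_output_shape(height, width, num_filters, kernel_size, stride=1, padding=0):
    """Return (height, width, depth) of a convolutional layer's output.

    Uses the standard formula: out = (in - kernel + 2 * padding) // stride + 1.
    """
    out_h = (height - kernel_size + 2 * padding) // stride + 1
    out_w = (width - kernel_size + 2 * padding) // stride + 1
    # Each filter produces one 2-D activation map, so depth = number of filters.
    return (out_h, out_w, num_filters)


# 32x32x3 input, 12 filters of size 3x3, stride 1, one pixel of zero padding
# (the padding value is an assumption, chosen so the spatial size is preserved).
print(conv_output_shape(32, 32, num_filters=12, kernel_size=3, padding=1))  # (32, 32, 12)

# Without padding the spatial size would shrink instead.
print(conv_output_shape(32, 32, num_filters=12, kernel_size=3, padding=0))  # (30, 30, 12)
```

Note that the number of input channels (3 for RGB) does not appear in the output shape: each filter spans all input channels and still yields a single activation map.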

2.2 ReLU

ReLU (Rectified Linear Unit) is one type of activation function, a function that is applied to the weighted sum of the input values and decides how much of that sum is passed on. ReLU is less computationally expensive than other activation functions because it involves simpler calculations. ReLU computes the function f(x) = max(0, x). This leaves the data size unchanged ([32x32x12]).

Fig 5. Rectified Linear Unit (ReLU) activation function, which outputs zero when x < 0 and is linear with slope 1 when x > 0 [11].

2.3 Pooling Layer

Pooling layers in a CNN architecture are used for abstracting image features [11]. The most common form of pooling layer employs a 2×2 filter with a stride of 2, which keeps 25% of the activations on an activation map from the convolutional layers. Every max-pooling operation selects the largest number from a 2×2 region. The data size thus becomes [16x16x12]. Applying the pooling layer reduces the number of feature parameters and the amount of computation in the network and helps to control overfitting, improving the overall performance of the network.

Fig 6. A max pooling operation extracting the max number from each 2×2 region [11].
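To make the two operations concrete, here is a minimal sketch of ReLU followed by 2×2 max pooling on a single 4×4 activation map. It uses plain Python lists rather than a deep learning library, and the example values are made up for illustration.

```python
def relu(feature_map):
    """Apply f(x) = max(0, x) element-wise; the shape is unchanged."""
    return [[max(0, value) for value in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Keep the largest value of each non-overlapping 2x2 region (stride 2)."""
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        pooled_row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            window = (
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4x4 activation map produced by one convolutional filter.
activation_map = [
    [1, -3, 2, 4],
    [5, 6, -7, 8],
    [-1, 2, 3, -4],
    [0, -2, 1, 2],
]
rectified = relu(activation_map)      # negative values become 0
print(max_pool_2x2(rectified))        # [[6, 8], [2, 3]]
```

The 4×4 map shrinks to 2×2, so one value in four survives, which is the 25% figure mentioned above.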

2.4 Batch normalization

Batch normalization is a transform that normalizes the input of each layer over a mini-batch to zero mean and unit variance, and then applies a learnable scale and shift. The implementation of batch normalization effectively reduces the time cost of training neural networks. A batch normalization process is shown below [12].

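Input: Values of $x$ over a mini-batch $\mathcal{B} = \{x_1, \dots, x_m\}$; learnable parameters $\gamma$, $\beta$

$\mu_{\mathcal{B}} = \frac{1}{m} \sum_{i=1}^{m} x_i$ // mini-batch mean

$\sigma_{\mathcal{B}}^2 = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_{\mathcal{B}})^2$ // mini-batch variance

$\hat{x}_i = \frac{x_i - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^2 + \epsilon}}$ // normalize

$y_i = \gamma \hat{x}_i + \beta$ // scale and shift

($m$ refers to sample size, $\epsilon$ is a constant added to the mini-batch variance for numerical stability, $x_i$ is each element in $\mathcal{B}$)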
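The batch normalization transform of [12] (mini-batch mean, mini-batch variance, normalization, then scale and shift) can be sketched in a few lines of plain Python. The default values for `gamma`, `beta` and `epsilon` below are illustrative assumptions; in a real network `gamma` and `beta` are learned during training.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, epsilon=1e-5):
    """Normalize a mini-batch of scalars, then scale and shift."""
    m = len(batch)                                       # sample size
    mean = sum(batch) / m                                # mini-batch mean
    variance = sum((x - mean) ** 2 for x in batch) / m   # mini-batch variance
    # epsilon keeps the division stable when the variance is close to zero.
    normalized = [(x - mean) / math.sqrt(variance + epsilon) for x in batch]
    return [gamma * x_hat + beta for x_hat in normalized]  # scale and shift


outputs = batch_norm([2.0, 4.0, 6.0, 8.0])
print(outputs)  # values centred on 0 with variance close to 1
```

With `gamma=1` and `beta=0` the outputs have zero mean and (almost exactly) unit variance; they are not restricted to any fixed interval such as 0 to 1.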

2.5 Fully connected layer

A fully connected layer serves as a means of mapping features, with full connections to all activations from the previous layer [11]. Those activations are multiplied by weights and summed with a bias in the next hidden layer, and eventually reach the output layer. The usage of fully connected layers helps to select representative features for further classification.

Fig 7. Two fully connected layers.

2.6 SoftMax

A SoftMax layer works as an activation function that turns the output of a fully connected layer into probabilities that sum to one. It outputs a vector that indicates the potential output classes together with the probabilities associated with each [12].
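As a minimal illustration of how SoftMax turns raw scores into probabilities, consider the sketch below. The two input scores are hypothetical outputs of a final fully connected layer for the classes (glaucoma, healthy); they are not taken from the paper.

```python
import math


def softmax(scores):
    """Convert a list of raw scores into probabilities that sum to one."""
    # Subtracting the maximum does not change the result but avoids overflow.
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for the two classes (glaucoma, healthy).
probabilities = softmax([2.0, 0.5])
print(probabilities)  # the larger score receives the larger probability
```

The class with the highest probability is then taken as the prediction.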

3. System Structure

A distinguishing feature of the diagnostic system proposed in this paper is that it utilises a two-part deep learning architecture, aiming to avoid the complex and error-prone manual pre-processing of the input DFIs. The system uses SSD and VggNet CNN architectures to detect open-angle glaucoma through digital image analysis. Data augmentation was applied to enlarge the datasets during training. The system proposed in this paper is constructed using TensorFlow and Python.

Fig 8. A flowchart displaying the structure of the detection system
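The flow of the two-part system can be sketched as a small pipeline: locate the optic disc, crop it, and classify the crop. Everything below is a hypothetical stand-in for illustration. The function names, the box format, and the toy detector and classifier are assumptions; in the actual system the two callables would be the trained SSD and VggNet models.

```python
def crop_box(image, box):
    """Crop a rectangular region from an image stored as a list of pixel rows.

    box is (x_min, y_min, x_max, y_max) in pixels, end-exclusive.
    """
    x_min, y_min, x_max, y_max = box
    return [row[x_min:x_max] for row in image[y_min:y_max]]


def diagnose(image, detect_optic_disc, classify_optic_disc):
    """Run the two-part pipeline: locate the OD, crop it, then classify it."""
    box = detect_optic_disc(image)         # part 1 (SSD in the paper)
    od_patch = crop_box(image, box)
    return classify_optic_disc(od_patch)   # part 2 (VggNet in the paper)


# Placeholder models with no diagnostic meaning, used only to run the sketch.
def fake_detector(image):
    return (1, 1, 3, 3)


def fake_classifier(patch):
    return "glaucoma" if sum(map(sum, patch)) > 10 else "healthy"


toy_image = [
    [0, 0, 0, 0],
    [0, 5, 4, 0],
    [0, 3, 2, 0],
    [0, 0, 0, 0],
]
print(diagnose(toy_image, fake_detector, fake_classifier))  # glaucoma
```

Because the classifier only ever sees the cropped optic disc, its input is much smaller than the full DFI, which is where the reduction in computation comes from.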

3.1 Visual Geometry Group Neural Network (VggNet)

VggNet is a CNN architecture that emphasises depth as the key to its performance [20]. Using very small (3×3) convolution filters in all layers, it steadily increases the depth of the network to 16-19 weight layers. The increase in depth significantly increases the accuracy of the network. Using a stack of three very small (3×3) convolutional layers instead of one single larger convolutional layer (7×7) not only makes the decision function more discriminative, but also reduces the number of parameters by about 45%: the stack needs roughly 55% of the parameters of the 7×7 layer.
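The parameter saving can be checked with a short count, following the argument in [20]. The count assumes that every layer has $C$ input and $C$ output channels and ignores bias terms.

```latex
\begin{aligned}
\text{three stacked } 3\times3 \text{ layers:}\quad & 3\,(3^2 C^2) = 27\,C^2 \\
\text{one } 7\times7 \text{ layer:}\quad & 7^2 C^2 = 49\,C^2 \\
\text{ratio:}\quad & \frac{27\,C^2}{49\,C^2} \approx 0.55
\end{aligned}
```

Under these assumptions the stack of three 3×3 layers covers the same 7×7 receptive field with about 55% of the parameters of the single 7×7 layer, while inserting two additional non-linearities.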


Fig 9. A basic VggNet structure with convolutional layers, ReLU activation function, max-pooling layers, fully connected layers and SoftMax function [20]

3.2 Single Shot Multibox Detector(SSD)

Single Shot MultiBox Detector (SSD) is a popular object detection architecture with high accuracy and low time cost [19]. Compared to previous object detection architectures, SSD predicts detections from multiple layers (multi-scale feature maps). As the network gradually reduces the spatial dimension of its feature maps, SSD uses the lower-resolution layers to detect larger objects and the higher-resolution layers to detect smaller ones. It produces a fixed set of detection predictions using a set of convolutional filters.

Fig 10. A basic SSD structure [19]

3.3 Data Augmentation

Data augmentation, which helps to improve the performance of the deep learning model, was utilised in this system to enlarge the datasets. During this process, original images were transformed to new images with different sizes, viewpoints, brightness, orientations and translations.

Fig 11. Augmented data based on one image[11]
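The idea can be sketched with three simple transformations on a tiny grayscale image stored as nested lists. This is not the augmentation code used in the project: the function names, the parameter ranges and the choice of transformations are assumptions made for illustration.

```python
import random


def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def adjust_brightness(image, factor):
    """Scale pixel intensities by factor and clip to the 0-255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]


def translate_right(image, shift, fill=0):
    """Shift the image to the right by `shift` pixels, padding with `fill`."""
    return [[fill] * shift + row[:len(row) - shift] for row in image]


def augment(image, rng):
    """Return one randomly transformed copy of a grayscale image."""
    result = image
    if rng.random() < 0.5:
        result = horizontal_flip(result)
    result = adjust_brightness(result, rng.uniform(0.8, 1.2))
    result = translate_right(result, rng.randint(0, 1))
    return result


image = [[10, 20, 30], [40, 50, 60]]
rng = random.Random(0)  # fixed seed so the sketch is reproducible
augmented_set = [augment(image, rng) for _ in range(4)]
print(len(augmented_set))  # 4
```

Each call to `augment` yields a new variant with the same dimensions as the original, so one labeled image can contribute several training samples.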

4. Experiment and Discussion

4.1 Sources of DFI Datasets

The training and testing DFIs were obtained from multiple sources. The RIGA database [13], which contains 3220 DFIs labeled by six expert ophthalmologists, and the ORIGA database [14], which consists of 650 DFIs annotated by trained professionals, were used to train the SSD. Four other public databases were used to train and test VggNet: the ACRIMA database [15], which is composed of 705 images; the RIM-ONE database [16], which contains 455 images; the sjchoi86-HRF database [17], which consists of 401 images; and the Drishti-GS database [18], which is composed of 101 images. The composition of all the data is summarised in Table 1.

Table 1. List of public glaucoma databases

Italic represents the database used for SSD.

4.2 Training Strategy to Avoid Overfitting

Overfitting is one of the most common problems during model training. An overfitted deep learning model lacks the ability to give sensible outputs for inputs that it has never seen before. In other words, overfitting occurs when a model has learned noise or unrepresentative features from the training datasets rather than representative features, so that it may fail to fit new datasets.

In this paper, multiple techniques were deployed to reduce overfitting:

  • Early Stopping: The model is monitored by a function that stops training when the accuracy of the model does not increase for 40 consecutive epochs (in each epoch, the model runs through all the training data). This prevents overfitting by stopping the training before the model converges too closely to the training data.
  • Data Augmentation: The model's ability to generalise is largely dependent on the size and variety of the datasets. By applying data augmentation to the training datasets, the model has a reduced chance of overfitting.
  • Regularisation: The model's complexity increases with more parameters, and a complex model overfits more easily. Regularisation reduces the chance of overfitting by adding information, typically a penalty on large weights, that favours a simpler model.
  • Architecture Selection: The model's learning capacity and complexity determine whether the model generalizes well. In this paper, architectures were selected based on their number of parameters (fewer than 3e7) to reduce overfitting.
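The early stopping rule in the list above can be sketched as a small function that scans a per-epoch accuracy history. The function name and the synthetic accuracy curve are assumptions for illustration; only the patience of 40 epochs comes from the text, and the text does not say whether training or validation accuracy was monitored.

```python
def early_stopping_epoch(accuracies, patience=40):
    """Return the 1-based epoch at which training stops.

    Training stops once accuracy has not improved for `patience`
    consecutive epochs; if that never happens, all epochs are run.
    """
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, accuracy in enumerate(accuracies, start=1):
        if accuracy > best:
            best = accuracy
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(accuracies)


# Synthetic accuracy curve: improves for 60 epochs, then plateaus below its best.
history = [0.5 + 0.008 * e for e in range(60)] + [0.9] * 90
print(early_stopping_epoch(history))  # 100
```

In this synthetic run the best accuracy is reached at epoch 60, so training halts 40 epochs later instead of using all 150 epochs.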

4.3 Finding the Optimal Pre-trained Model

In this paper, the Vgg16, Vgg19, ResNet50V2, DenseNet121, Xception, InceptionResnetV2, and Inception architectures were applied to the glaucoma diagnosis task using their pre-trained ImageNet weights available in Keras. These weights were obtained by training on 1.28 million images from the ImageNet database, on which the architectures achieved outstanding accuracy. The pre-trained models were extremely helpful for the system to extract features from DFIs. The architectures were modified for this case such that their last fully connected layers were replaced by a global average pooling layer (GlobalAveragePooling2D) followed by a Dense layer with ReLU activation, a dropout layer of rate 0.5, and a fully connected layer of two nodes, representing the two classes (glaucoma and healthy), with a sigmoid classifier. These modified pre-trained models were trained on 1329 training and 332 testing DFIs with learning rates (lr) of 1e-4 and 1e-5. They were trained for 150 epochs each on a device using GPU RTX-2080Ti 11GB in Ubuntu-16.11. Table 2 presents the validation accuracy of each model on both learning rates. Figures 12-17 show how the models' accuracies changed during training.
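The modified head described above could be assembled with the Keras API roughly as follows. This is a sketch under stated assumptions rather than the project's actual code: the Dense layer width, the input size and the loss function are not given in the text and are marked as assumptions, and Vgg16 is used as the example base model.

```python
DENSE_UNITS = 256            # assumption: the Dense layer size is not stated
DROPOUT_RATE = 0.5           # from the text
NUM_CLASSES = 2              # glaucoma and healthy
INPUT_SHAPE = (224, 224, 3)  # assumption: the usual ImageNet input size for Vgg16


def build_glaucoma_classifier(learning_rate=1e-5):
    """Build a pre-trained Vgg16 base with the replacement head from the text."""
    # Imported here so this module can be loaded without TensorFlow installed.
    import tensorflow as tf

    # include_top=False drops the original fully connected layers.
    base = tf.keras.applications.VGG16(
        weights="imagenet", include_top=False, input_shape=INPUT_SHAPE
    )
    x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
    x = tf.keras.layers.Dense(DENSE_UNITS, activation="relu")(x)
    x = tf.keras.layers.Dropout(DROPOUT_RATE)(x)
    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="sigmoid")(x)

    model = tf.keras.Model(inputs=base.input, outputs=outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="binary_crossentropy",  # assumption: the loss is not given in the text
        metrics=["accuracy"],
    )
    return model
```

Calling `build_glaucoma_classifier()` downloads the ImageNet weights on first use. Swapping `VGG16` for another class in `tf.keras.applications` (for example `VGG19` or `DenseNet121`) would give the other candidates with the same head.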

Table 2. Performance of Architectures on Learning Rates 1e-4 and 1e-5.

Fig 12. Model Accuracy of Vgg16 and Vgg19 (lr=1e-4)

Fig 13. Model Accuracy of ResNet50 and DenseNet 121 (lr=1e-4)

Fig 14. Model Accuracy of Xception, Inception, and InceptionResnet (lr=1e-4)

Fig 15. Model Accuracy of Vgg16 and Vgg19 (lr=1e-5)

Fig 16. Model Accuracy of ResNet50 and DenseNet 121 (lr=1e-5)

Fig 17. Model Accuracy of Xception, Inception, and InceptionResnet (lr=1e-5)

4.4 Finding the Optimal Optimiser

The error in the predictions made by the model after each iteration is measured using a loss function. During the training process, the parameters of the model are adjusted to minimise the loss function for optimal performance. Optimisers play the role of updating the model according to the output of the loss function. In this paper, five popular optimisers were deployed on the same Vgg16 model on 1329 training and 332 testing DFIs with a learning rate of 1e-5. Results showed that the Adam optimiser outperformed the other optimisers. Details are shown in Table 3. Figure 18 presents the model's performance with different optimisers over 150 epochs.
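To show what "updating the model according to the output of the loss function" means in practice, the sketch below applies the Adam update rule to a toy one-parameter loss. The loss, the starting point, the learning rate of 0.1 and the number of steps are illustrative assumptions; the `beta1`, `beta2` and `epsilon` defaults are the commonly used values for Adam.

```python
import math


def adam_minimize(gradient, w, learning_rate=0.1, steps=500,
                  beta1=0.9, beta2=0.999, epsilon=1e-8):
    """Minimize a one-parameter loss with the Adam update rule."""
    m = 0.0  # running mean of gradients
    v = 0.0  # running mean of squared gradients
    for t in range(1, steps + 1):
        g = gradient(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the early steps
        v_hat = v / (1 - beta2 ** t)
        w -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return w


# Toy loss L(w) = (w - 3)^2 with gradient 2 * (w - 3); the minimum is at w = 3.
w_final = adam_minimize(lambda w: 2 * (w - 3), w=0.0)
print(w_final)  # close to 3
```

In a real network the same update is applied to every weight, with the gradient supplied by backpropagation of the loss.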

Table 3. Performance of Optimizers on Vgg16

Learning rate

Fig 18. Validation accuracy of the model, using different optimisers.

4.5 Hyperparameter Tuning

To achieve the optimal performance of the model, multiple learning rates were tested. The learning rate is a crucial hyperparameter in CNN model training: it controls how much the model is changed after each update according to the estimated error. A very large learning rate leads to quick convergence to a suboptimal solution, whereas a very small learning rate can significantly slow down the training process. The system was trained with learning rates of 1e-3, 1e-4, 1e-5, and 1e-6. The highest performance was obtained with a learning rate of 1e-5, at which the system achieved an accuracy of 98.8%, as shown in Table 2. Each training run lasted 150 epochs, by which point the accuracy of the system was stable. The detailed results of the detection system with different learning rates are presented in Table 4 and Fig 19.

Table 4. Performance of the proposed system

Learning rate

Fig 19. Validation accuracy of the model, using different learning rates.

4.6 Limitations

Due to the nature of the two-part architecture, the system requires two types of training datasets: a special type of DFI dataset labeled with the optic disc area for SSD training, and a more common type of DFI dataset labeled as glaucoma or normal for VggNet training. The number of datasets of the first type, with optic disc labels, is limited, which prevents the model from developing to its full potential. Additionally, it is not clear whether the training images are representative of all the variations of glaucoma. Abnormal exception cases might not be represented in the dataset.

5. Conclusion

Glaucoma is the second leading cause of blindness that affects a large population. One of the difficulties related to the diagnosis of open-angle glaucoma is its early detection. Nowadays, diagnosis is based on ophthalmologists’ visual observation, which is not necessarily reliable and accurate. This paper develops a highly accurate and time-effective open-angle glaucoma diagnosis system using deep learning architectures. The system obtained an accuracy of 98.8%.

This work proposes an innovative, highly accurate system structure which uses SSD and VggNet, respectively, for optic disc segmentation and glaucoma detection. The main advantage of this configuration is that it preprocesses the DFIs by segmenting out the optic discs, which significantly decreases the computational power necessary for the second part of the system and boosts its accuracy. Data augmentation was also used to enlarge the datasets for optimal results. In future work, a third part is planned to classify early glaucoma, terminal glaucoma and normal eyes. Moreover, an app interface will be developed so that users without expertise can detect glaucoma in its early stage, and to serve as a screening tool for people with limited access to ophthalmologists. Instead of sets of professional equipment, this system requires only a smartphone and a special 3D-printable lens. If people test positive for glaucoma using this app, they can take further action to prevent its progression and reduce the chance of becoming blind.


[1] Quigley H A 2006 The number of people with glaucoma worldwide in 2010 and 2020 British Journal of Ophthalmology 90 262–7

[2] Khouri A S and Fechtner R D 2015 Primary Open-Angle Glaucoma Glaucoma 333–45

[3] Oddone F, Virgili G, Parravano M, Brazzelli M, Novielli N and Michelessi M 2010 Optic nerve head and fibre layer imaging for diagnosing glaucoma Cochrane Database of Systematic Reviews

[4] O.d. M D H 1999 Optic disc size, an important consideration in the glaucoma evaluation Clinical Eye and Vision Care 11 59–62

[5] Harizman N 2006 The ISNT Rule and Differentiation of Normal From Glaucomatous Eyes Archives of Ophthalmology 124 1579

[6] Jonas J B 1992 Glaucomatous Parapapillary Atrophy Archives of Ophthalmology 110 214

[7] Bengio Y, Courville A and Vincent P 2013 Representation Learning: A Review and New Perspectives IEEE Transactions on Pattern Analysis and Machine Intelligence 35 1798–828

[8] Singh A, Dutta M K, Parthasarathi M, Uher V and Burget R 2016 Image processing based automatic diagnosis of glaucoma using wavelet features of segmented optic disc from fundus image Computer Methods and Programs in Biomedicine 124 108–20

[9] Chen X, Xu Y, Wong D W K, Wong T Y and Liu J 2015 Glaucoma detection based on deep convolutional neural network 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)

[10] Chakrabarty L, Joshi G D, Chakravarty A, Raman G V, Krishnadas S and Sivaswamy J 2016 Automated Detection of Glaucoma From Topographic Features of the Optic Nerve Head in Color Fundus Photographs Journal of Glaucoma 25 590–7

[11] Anon CS231n Convolutional Neural Networks for Visual Recognition

[12] Ioffe S and Szegedy C 2015 Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

[13] Almazroa Retinal fundus images for glaucoma analysis: the RIGA dataset Deep Blue Data

[14] Anon

[15] Diaz-Pinto A, Morales S, Naranjo V, Köhler T, Mossi J M and Navea A 2019 CNNs for automatic glaucoma assessment using fundus images: an extensive validation BioMedical Engineering OnLine 18

[16] Anon RIM-ONE: An open retinal image database for optic nerve evaluation – IEEE Conference Publication

[17] Anon cvblab – Overview GitHub

[18] Anon Drishti-GS: Retinal image dataset for optic nerve head(ONH) segmentation – IEEE Conference Publication

[19] Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y and Berg A C 2016 SSD: Single Shot MultiBox Detector Computer Vision – ECCV 2016 Lecture Notes in Computer Science 21–37

[20] Simonyan K and Zisserman A 2015 Very Deep Convolutional Networks for Large-Scale Image Recognition

[21] Raghavendra U, Fujita H, Bhandary S V, Gudigar A, Tan J H and Acharya U R 2018 Deep convolutional neural network for accurate diagnosis of glaucoma using digital fundus images

[22] Ren S, He K, Girshick R and Sun J 2017 Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks IEEE Transactions on Pattern Analysis and Machine Intelligence 39 1137–49

Glossary of key terms

Activation function

A function applied to the weighted sum of the input values that decides how much of that sum is passed on in the model.

Batch normalization

A transform that normalizes the input of each layer over a mini-batch to zero mean and unit variance, followed by a learnable scale and shift.

Convolutional layer

A layer that uses a set of learnable filters to perform operations on the original input image.

Convolutional neural networks

A convolutional neural network is a class of deep neural networks, most commonly applied to analyzing visual imagery.

DFI

Digital Fundus Image, an image of the fundus produced by medical equipment and stored in public databases for studies.

Deep learning

A subset of machine learning based on artificial neural networks with representation learning.

Deep learning architectures

Deep learning architectures contain multilayer neural networks, including different configurations of networks.

Fully connected layer

A fully connected layer serves as a means of mapping features, with full connections to all activations from the previous layer.

Fundus

The interior surface of the eye opposite the lens, including the retina, optic disc, macula, fovea, and posterior pole.

Glaucoma

A group of eye diseases that damage the optic nerve and lead to vision loss.

Hyperparameter

A parameter whose value is used to control the learning process. By contrast, the values of other parameters are derived via training.

Image Classification

Image classification accepts the given input images and produces an output classification identifying whether the disease is present or not.

Loss function

A function to calculate the error of prediction of the model.

Machine Learning

The study of computer algorithms that improve automatically through experience.

Model

A deep learning model contains the weights and biases for each layer in the structure of the neural network.

Optic disc (OD)

An area in the retina where retinal ganglion cells exit the eye to form the optic nerve.

Optic nerve

The nerve that transfers visual information from the retina to the vision centers of the brain.

Optimizer

An algorithm that updates the model weights based on the error estimated by the loss function.

Overfitting

A common problem in deep learning model training: the production of an analysis that corresponds too closely or exactly to a particular set of data.

Pooling layers

Pooling layers in a CNN architecture are used for abstracting image features.

Retina

A layer at the back of the eyeball containing cells that are sensitive to light.

ReLU

ReLU (Rectified Linear Unit) is one type of activation function; it computes f(x) = max(0, x).

SSD

A deep learning architecture used here for segmenting the optic disc from a retina image.

VggNet

A deep learning architecture used here for detecting glaucoma features from images of the optic disc.



About the Author

Tony Dang is a high school junior at Iolani School in Honolulu, Hawaii. Inspired by his grandma, he started his research on glaucoma and eye diseases. He identified flaws in the current diagnosis of glaucoma and applied machine learning to develop a more effective method for detecting glaucoma in its early stage. He received an award at the state science fair for his project, Detecting Open-angle Glaucoma Using a Two-parts Deep Learning Architecture.
