A new technique for cataract eye disease diagnosis in deep learning
Automated diagnosis of eye diseases from fundus images is challenging because manual analysis is time-consuming, error-prone, and complicated. Computer-aided tools for automatically detecting various ocular disorders from fundus images are therefore needed. Deep learning algorithms enable improved image classification, making automated detection of targeted ocular diseases feasible. This study employed state-of-the-art deep learning image classifiers, such as VGG-19 (Visual Geometry Group), to categorize the highly imbalanced ODIR-5K (Ocular Disease Intelligent Recognition) dataset of 5,000 fundus images across eight disease classes, including cataract, glaucoma, diabetic retinopathy, and age-related macular degeneration. To address this imbalance, the multiclass problem was converted into binary classification tasks with equal samples in each category; the dataset was preprocessed and augmented to generate balanced datasets. The binary classifiers were then trained on the balanced data using the VGG-19 model. This approach achieved an accuracy of 95% for distinguishing normal versus cataract cases in only 15 epochs, outperforming previous methods. Precision and recall were high for both classes (normal and cataract), with F1 scores of 0.95-0.96. Balancing the dataset and using deep VGG-19 classifiers significantly improved the accuracy of automated eye disease diagnosis from fundus images. With further research, this approach could lead to deployable AI (artificial intelligence)-assisted tools that help ophthalmologists screen patients and support clinical decision-making.
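The class-balancing step described in this abstract can be sketched in a few lines. The following is an illustrative reduction of a multiclass label set to a balanced binary task by undersampling; the `(image_id, label)` record format and the function name are assumptions for the sketch, not the study's actual code, and the study also used augmentation rather than pure undersampling.

```python
import random

def make_balanced_binary(samples, pos_class, neg_class, seed=0):
    """Reduce a multiclass dataset to a balanced binary task.

    `samples` is a list of (image_id, label) pairs. Only the two
    requested classes are kept, and each class is randomly
    undersampled so both contribute the same number of examples.
    """
    pos = [s for s in samples if s[1] == pos_class]
    neg = [s for s in samples if s[1] == neg_class]
    n = min(len(pos), len(neg))
    rng = random.Random(seed)
    subset = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(subset)
    return subset
```

A call such as `make_balanced_binary(records, "cataract", "normal")` would yield the equal-sample normal-versus-cataract set on which a VGG-19 binary classifier could then be trained.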
A multitask learning system for segmenting dark and bright retinal lesions in fundus images
This work focuses on automatic diagnosis from fundus images, which are a two-dimensional representation of the inner structure of the eye. The aim of this master's thesis is to present a solution for the automatic segmentation of the lesions that can be observed in the retina. The
proposed methodology groups those lesions into two categories: red (dark) and bright. Obtaining this double segmentation simultaneously is a novel approach; most previous works focus on detecting a single type of lesion. However, due to time constraints and the tedious nature of this work, clinicians usually cannot test all the existing methods. Moreover, from a screening perspective, the clinician has no a priori knowledge of the nature of the pathology at hand, and thus of which algorithm to start with. Therefore, the proposed algorithm needs to be versatile, fast, and easily deployable. Encouraged by the recent progress obtained with machine learning methods (especially deep learning), we decided to develop a novel convolutional neural network able to segment both types of lesions in fundus images. To reach this goal, our methodology relies on a new multitask architecture trained with a hybrid method combining fully supervised and weakly supervised training. The architecture relies on hard parameter sharing: two decoders (one per type of lesion) share a single encoder. The encoder is therefore trained to derive an abstract representation of the input image whose features permit discrimination between bright and red lesions; in other words, the encoder learns to distinguish pathological tissue from normal tissue. The training is done in two steps. During the first, the whole architecture is trained on patches with pixel-level ground truth, the typical way of training a segmentation network. The second step consists of weak supervision: only the encoder is trained, with full images, and its task is to predict the status of the given image (pathological or healthy) without specifying anything about the potential lesions in it (neither location nor type). In this case, the ground truth is a simple boolean label. This second step allows the network to see a larger number of images: this type of ground truth is considerably easier to acquire and is already available in large public databases. The step relies on the hypothesis that an annotation at the image level (global) can be used to enhance performance at the pixel level (local). This is an intuitive idea, as the pathological status is directly correlated with the presence of lesions.
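The hard-parameter-sharing layout can be illustrated with a toy PyTorch module. This is a minimal sketch of the idea only (a shared encoder, one decoder per lesion category, and an image-level head for the weakly supervised phase); the layer sizes and names are arbitrary and do not reproduce the thesis' actual network.

```python
import torch
import torch.nn as nn

class TwoHeadSegmenter(nn.Module):
    """Shared encoder with one decoder per lesion type (hard parameter sharing)."""

    def __init__(self, in_ch=3, feat=16):
        super().__init__()
        # Encoder shared by both decoders: its features must serve both tasks.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, feat, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(),
        )

        def decoder():
            return nn.Sequential(
                nn.Upsample(scale_factor=2, mode="nearest"),
                nn.Conv2d(feat, 1, 3, padding=1),  # one-channel lesion mask
            )

        self.dark_head = decoder()    # dark (red) lesions
        self.bright_head = decoder()  # bright lesions
        # Image-level head used during the weakly supervised phase
        # (predicts pathological vs healthy from encoder features alone).
        self.cls_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(feat, 1)
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.dark_head(z), self.bright_head(z), self.cls_head(z)
```

During the first training phase both mask outputs would be supervised with pixel-level ground truth; during the second, only `cls_head` (and the encoder behind it) would be trained against the image-level boolean label.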
Deep learning for diabetic retinopathy detection and classification based on fundus images: A review.
Diabetic retinopathy is a retinal disease caused by diabetes mellitus and the leading cause of blindness globally. Early detection and treatment are necessary in order to delay or avoid vision deterioration and vision loss. To that end, many artificial-intelligence-powered methods have been proposed by the research community for the detection and classification of diabetic retinopathy on fundus retina images. This review article provides a thorough analysis of the use of deep learning methods at the various steps of the diabetic retinopathy detection pipeline based on fundus images. We discuss several aspects of that pipeline, ranging from the datasets widely used by the research community and the preprocessing techniques employed (and how these accelerate and improve model performance), to the development of deep learning models for the diagnosis and grading of the disease and the localization of its lesions. We also discuss models that have been applied in real clinical settings. Finally, we conclude with some important insights and provide future research directions.
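As one concrete illustration of the preprocessing step the review discusses, a widely used operation for fundus images is extracting the green channel (where vessels and lesions typically show the highest contrast) and standardizing it before feeding it to a CNN. The sketch below assumes an RGB image as a NumPy array; it is only one of many possible pipelines, not a method prescribed by the review.

```python
import numpy as np

def preprocess_fundus(img):
    """Illustrative fundus preprocessing: green-channel extraction
    followed by zero-mean, unit-variance standardization."""
    green = img[..., 1].astype(np.float64)
    std = green.std()
    if std == 0:  # degenerate flat image; avoid dividing by zero
        return np.zeros_like(green)
    return (green - green.mean()) / std
```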
Deep Learning Techniques for Automated Analysis and Processing of High Resolution Medical Imaging
Programa Oficial de Doutoramento en Computación, 5009V01. [Abstract]
Medical imaging plays a prominent role in modern clinical practice for numerous
medical specialties. For instance, in ophthalmology, different imaging techniques are
commonly used to visualize and study the eye fundus. In this context, automated
image analysis methods are key towards facilitating the early diagnosis and adequate
treatment of several diseases. Nowadays, deep learning algorithms have already
demonstrated a remarkable performance for different image analysis tasks. However,
these approaches typically require large amounts of annotated data for the training
of deep neural networks. This complicates the adoption of deep learning approaches,
especially in areas where large scale annotated datasets are harder to obtain, such
as in medical imaging.
This thesis aims to explore novel approaches for the automated analysis of medical
images, particularly in ophthalmology. In this regard, the main focus is on
the development of novel deep learning-based approaches that do not require large
amounts of annotated training data and can be applied to high resolution images.
For that purpose, we have presented a novel paradigm that makes it possible to take advantage
of unlabeled complementary image modalities for the training of deep neural
networks. Additionally, we have also developed novel approaches for the detailed
analysis of eye fundus images. In that regard, this thesis explores the analysis of
relevant retinal structures as well as the diagnosis of different retinal diseases. In
general, the developed algorithms provide satisfactory results for the analysis of the
eye fundus, even when limited annotated training data is available.
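The paradigm of training with unlabeled complementary image modalities can be sketched as a reconstruction-style pretraining step: a network learns to predict a complementary modality from the fundus image, so no manual annotations are needed at this stage. This toy PyTorch fragment assumes paired one-channel complementary images and arbitrary layer sizes; it illustrates the general idea rather than the thesis' actual method.

```python
import torch
import torch.nn as nn

# Tiny network mapping a 3-channel fundus image to a 1-channel
# complementary modality (e.g. an unlabeled paired scan).
net = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 1, 3, padding=1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def pretrain_step(fundus, complementary):
    """One pretraining step; the reconstruction loss needs no human labels."""
    opt.zero_grad()
    loss = nn.functional.l1_loss(net(fundus), complementary)
    loss.backward()
    opt.step()
    return float(loss)
```

After such pretraining, the learned features could be fine-tuned on the small annotated set for the actual diagnostic task.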
Automated Analysis of Retinal and Choroidal OCT and OCTA Images in AMD
Age-related macular degeneration (AMD) is a progressive eye disease that manifests primarily in the outer retina and choroid. The research project aimed to determine whether measures obtained from optical coherence tomography (OCT) and OCT angiography (OCTA) images could provide novel AMD biomarker insight and an early disease detection method. To that end, an OCT- and OCTA-enabled device was used to image AMD subjects and controls. At the selected scan size, each acquisition of one eye provides a volume of data constructed of 300 cross-sectional images termed B-scans. In total, scans of 10 eyes from subjects with early and intermediate AMD (3,000 B-scan images) plus one case of neovascular AMD, 12 eyes from subjects over the age of 50 (3,600 B-scan images), and 11 eyes from subjects under the age of 50 (3,300 B-scan images) were obtained. Five feature extraction methods were either reproduced or developed in order to determine whether significant differences could be observed at the eye level between the early and intermediate AMD subjects and control subjects. Through non-parametric testing, it was established that two AMD biomarker extraction methods (choriocapillaris flow-void analysis and a drusen segmentation method) produced measures that showed significant differences between groups and were uniformly represented across the frontal plane of the eye. It was then desired to leverage these measures and generate an interpretable, B-scan-level, machine-learning-based AMD classification model. Frequency spectra resulting from the fast Fourier transforms of spatial series derived from measures believed to be representative of the two biomarkers were obtained and used as features to train a random forest and a deep forest classifier. Principal component analysis was used to reduce the dimensionality of the feature space, and model performance and predictor importance were assessed.
A new method was devised that allows automated 3D reconstruction and quantitative evaluation of retinal flow-signal patterns and, incidentally, of the retinal microvasculature. Measures representative of drusen and the choriocapillaris were leveraged to create interpretable models for the classification of early and intermediate AMD. As the worldwide prevalence of AMD increases and OCT devices become more available, more highly trained personnel are needed to interpret medical information and provide the appropriate clinical care. Expert analysis and grading of AMD through OCT images are expensive and time-consuming. The proposed models could serve to automate AMD detection, even when it is asymptomatic, and signal to an ophthalmologist the need to monitor and treat the condition before severe visual loss occurs. The models are transparent and provide classification from single cross-sectional images. Therefore, the automated diagnostic tool could also be used in situations where only partial medical data are available, or where access to health care resources is limited.
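The feature-extraction step (a fast Fourier transform over a spatial series of biomarker measures along a B-scan) can be sketched as follows. The series contents and the number of retained coefficients are illustrative assumptions; in the described pipeline, the resulting vectors would feed a random forest or deep forest classifier after PCA.

```python
import numpy as np

def fft_features(spatial_series, n_coeffs=8):
    """Turn a spatial series of biomarker measures (e.g. drusen height
    or choriocapillaris flow-void fraction sampled along a B-scan)
    into low-frequency magnitude features for a classifier."""
    spectrum = np.abs(np.fft.rfft(np.asarray(spatial_series, dtype=float)))
    return spectrum[:n_coeffs]
```

For a B-scan of 300 positions, `np.fft.rfft` yields 151 coefficients, of which only the first few (the coarse spatial structure) are kept as features.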
Advanced Representation Learning for Dense Prediction Tasks in Medical Image Analysis
Machine learning is a rapidly growing field of artificial intelligence that allows computers to learn from human-labeled data and make predictions. However, traditional machine learning methods have many drawbacks: they are time-consuming, inefficient, biased toward specific tasks, and require a large amount of domain knowledge. A subfield of machine learning, representation learning, focuses on learning meaningful and useful features or representations from input data. It aims to automatically learn relevant features from raw data, saving time, increasing efficiency and generalization, and reducing reliance on expert knowledge. Recently, deep learning has further accelerated the development of representation learning. It leverages deep architectures to extract complex and abstract representations, resulting in significantly better performance in many areas.
In the field of computer vision, deep learning has made remarkable progress, particularly in high-level, real-world tasks. Because deep learning methods do not require handcrafted features and can understand complex visual information, they enable researchers to design automated systems that make accurate diagnoses and interpretations, especially in medical image analysis. Deep learning has achieved state-of-the-art performance in many medical image analysis tasks, such as regression/classification, generation, and segmentation. Compared to regression/classification, medical image generation and segmentation are more complex dense prediction tasks that require understanding semantic representations and generating pixel-level predictions.
This thesis focuses on designing representation learning methods to improve the performance of dense prediction tasks in medical image analysis. With advances in imaging technology, increasingly complex medical images are becoming available in this field. In contrast to traditional machine learning algorithms, current deep learning-based representation learning methods provide an end-to-end approach that automatically extracts representations without manual feature engineering from the complex data. In medical image analysis, three particular challenges require the design of advanced representation learning architectures, i.e., limited labeled medical images, overfitting with limited data, and lack of interpretability. To address these challenges, we aim to design robust representation learning architectures for the two main directions of dense prediction tasks, namely medical image generation and segmentation.
For medical image generation, the specific topic we focus on is chromosome straightening: generating a straightened chromosome image from a curved chromosome input. The challenges of this task include insufficient training images and corresponding ground truth, as well as the non-rigid nature of chromosomes, which leads to distorted details and shapes after straightening. We first introduce a novel framework using image-to-image translation and demonstrate its efficacy and robustness in generating straightened chromosomes; this framework addresses the challenge of limited training data and outperforms existing studies. We then present a subsequent study that addresses the limitations of the earlier framework, achieving new state-of-the-art performance with better interpretability and generalization capability: a robust chromosome straightening framework, named Vit-Patch GAN, which instead learns the motion representation of chromosomes for straightening while retaining more details of shape and banding patterns.
For medical image segmentation, we focus on fovea localization, which we recast from point localization as small-region segmentation. Accurate segmentation of the fovea region is crucial for monitoring and analyzing retinal diseases to prevent irreversible vision loss. The task also requires incorporating global features to reliably identify the fovea region and handle hard cases associated with retinal diseases and non-standard fovea locations. We first propose a novel two-branch architecture, Bilateral-ViT, for fovea localization framed as retina image segmentation. This vision-transformer-based architecture incorporates global image context and blood vessel structure; it surpasses existing methods and achieves state-of-the-art results on two public datasets. We then propose a subsequent method to further improve fovea localization: a novel dual-stream deep learning architecture called Bilateral-Fuser. In contrast to Bilateral-ViT, Bilateral-Fuser globally incorporates long-range connections from multiple cues, including the fundus and vessel distribution. Moreover, with the newly designed Bilateral Token Incorporation module, Bilateral-Fuser learns anatomy-aware tokens, significantly reducing computational cost while achieving new state-of-the-art performance. Our comprehensive experiments also demonstrate that Bilateral-Fuser achieves better accuracy and robustness on both normal and diseased retina images, with excellent generalization capability.
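The recasting of fovea localization as small-region segmentation can be illustrated by converting a fovea point annotation into a small disk mask that a segmentation network can be trained against. The radius below is an arbitrary illustrative choice, not a value from the thesis.

```python
import numpy as np

def fovea_point_to_mask(h, w, cy, cx, radius=8):
    """Build an h-by-w binary mask with a disk of the given radius
    around the annotated fovea point (cy, cx)."""
    yy, xx = np.mgrid[:h, :w]
    return ((yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2).astype(np.uint8)
```

With such masks, a standard segmentation loss (e.g. Dice or cross-entropy) can supervise the network, and the predicted region's centroid recovers the fovea location at inference time.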