
    Learning universal representations across tasks and domains

    A longstanding goal in computer vision research is to produce broad, general-purpose systems that work well on a wide range of vision problems and can learn concepts from only a few labelled samples. In contrast, existing models are limited to specific tasks or domains (datasets), e.g., a semantic segmentation model for indoor images (Silberman et al., 2012). In addition, they are data-inefficient and require a large labelled dataset for each task or domain. While domain/task-agnostic representations have been pursued through loss-balancing strategies or architecture design, optimizing such a universal representation network remains a challenging problem. This thesis focuses on addressing the challenges of learning universal representations that generalize well over multiple tasks (e.g. segmentation, depth estimation) or various visual domains (e.g. image object classification, image action classification). The thesis also shows that these representations can be learned from partial supervision and transferred and adapted to previously unseen tasks/domains in a data-efficient manner. The first part of the dissertation focuses on learning universal representations, i.e. a single universal network for multi-task learning (e.g., learning a single network jointly for different dense prediction tasks such as segmentation and depth estimation) and multi-domain learning (e.g. image classification across various vision datasets, each collected for a different problem such as texture, flower or action classification). Learning such universal representations by jointly minimizing the sum of all task-specific losses is challenging because of interference between tasks, and it leads to unbalanced results (i.e. some tasks dominate or interfere with others, and the universal network performs worse than task/domain-specific networks, each trained independently for one task/domain). Hence a new solution is proposed that regularizes the optimization of the universal network by encouraging it to produce the same features as the task-specific networks. The experimental results demonstrate that the proposed method learns a single universal network that performs well on multiple tasks or various visual domains. Despite recent advances in multi-task learning of dense prediction problems, most methods rely on expensive labelled datasets. Relaxing this assumption gives rise to a new multi-task learning setting, called multi-task partially-supervised learning in this thesis, in which the goal is to jointly learn multiple dense prediction tasks from partially annotated data (i.e. not all task labels are available for each training image). A label-efficient approach is proposed that leverages task relations to supervise multi-task learning when data is partially annotated. In particular, the proposed method learns to map each task pair to a joint pairwise task-space, which enables sharing information between the tasks in a computationally efficient way through another network conditioned on task pairs, and avoids learning trivial cross-task relations by retaining high-level information about the input image. The final part of the dissertation studies the problem of adapting a model to previously unseen tasks (from seen or unseen domains) with very few labelled training samples of the new tasks, i.e. cross-domain few-shot learning.
Recent methods have focused on using various adaptation strategies to align their visual representations to new domains, or on selecting the relevant representations from multiple domain-specific feature extractors. In this dissertation, new methods are formulated that learn a single task-agnostic network from multiple domains during meta-training and attach lightweight task-specific parameters, learned from the limited training samples, that adapt the task-agnostic network to previously unseen tasks. A systematic analysis of various task adaptation strategies for few-shot learning is performed. Extensive experimental evidence demonstrates that the proposed methods, which learn a single set of task-agnostic representations and adapt them via residual adapters in matrix form attached to the task-agnostic model, significantly benefit cross-domain few-shot learning.
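
To make the feature-matching regularizer described in this abstract concrete, here is a minimal PyTorch-style sketch, assuming frozen task-specific networks whose intermediate features have already been extracted; the function name, `lam`, and the tensor shapes are illustrative assumptions, not the thesis' exact objective.

```python
import torch
import torch.nn.functional as F

def universal_loss(task_losses, universal_feats, specialist_feats, lam=1.0):
    """Sum of task-specific losses plus a feature-matching regularizer
    that pulls the universal network's features towards those of frozen
    task/domain-specific networks (a sketch of the idea, not the
    thesis' exact formulation)."""
    loss = sum(task_losses)
    for f_u, f_s in zip(universal_feats, specialist_feats):
        # match features per task; specialists stay frozen via detach()
        loss = loss + lam * F.mse_loss(f_u, f_s.detach())
    return loss

# toy usage with random features for two tasks
f_u = [torch.randn(4, 16, requires_grad=True) for _ in range(2)]
f_s = [torch.randn(4, 16) for _ in range(2)]
print(universal_loss([torch.tensor(0.5), torch.tensor(0.7)], f_u, f_s))
```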

    Understanding Video Transformers for Segmentation: A Survey of Application and Interpretability

    Video segmentation encompasses a wide range of problem formulations, e.g., object, scene, actor-action and multimodal video segmentation, for delineating task-specific scene components with pixel-level masks. Recently, approaches in this research area have shifted from ConvNet-based to transformer-based models. In addition, various interpretability approaches have appeared for transformer models and video temporal dynamics, motivated by the growing interest in basic scientific understanding, model diagnostics and the societal implications of real-world deployment. Previous surveys mainly focused on ConvNet models for a subset of video segmentation tasks, or on transformers for classification tasks. Moreover, a component-wise discussion of transformer-based video segmentation models has not yet received due attention. Likewise, previous reviews of interpretability methods focused on transformers for classification, while analysis of the video temporal dynamics modelling capabilities of video models has received less attention. In this survey, we address the above with a thorough discussion of the various categories of video segmentation, a component-wise discussion of state-of-the-art transformer-based models, and a review of related interpretability methods. We first introduce the different video segmentation task categories, their objectives, specific challenges and benchmark datasets. Next, we provide a component-wise review of recent transformer-based models and document the state of the art on different video segmentation tasks. Subsequently, we discuss post-hoc and ante-hoc interpretability methods for transformer models, as well as interpretability methods for understanding the role of the temporal dimension in video models. Finally, we conclude our discussion with future research directions.

    Representation Analysis Methods to Model Context for Speech Technology

    Speech technology has developed to levels comparable with human performance through the use of deep neural networks. However, it is unclear how the dependencies learned within these networks relate to metrics such as recognition performance. This research focuses on strategies to interpret and exploit these learned context dependencies to improve speech recognition models. Context dependency analysis had not previously been explored for speech recognition networks. In order to highlight and observe dependent representations within speech recognition models, a novel analysis framework is proposed. This framework uses statistical correlation indexes to measure how strongly neural representations co-vary. By comparing these correlations between models built with different approaches, it is possible to observe specific context dependencies within network layers. Such insights on context dependencies then make it possible to adapt modelling approaches to become more computationally efficient and to improve recognition performance. The performance of end-to-end speech recognition models is analysed, providing insights into the acoustic and language modelling context dependencies. The modelling approach for a speaker recognition task is adapted to exploit acoustic context dependencies and reach performance comparable with state-of-the-art methods: a 2.89% equal error rate using the Voxceleb1 training and test sets with 50% of the parameters. Furthermore, empirical analysis of the role of acoustic context in speech emotion recognition modelling revealed that emotion cues are presented as a distributed event. These analyses and results for speech recognition applications aim to provide objective direction for the future development of automatic speech recognition systems.
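
The abstract does not name the specific correlation indexes used, so as one hedged example, the sketch below computes linear Centered Kernel Alignment (CKA), a widely used similarity index between two layers' activation matrices; treat it as a stand-in for, not a reproduction of, the framework's actual measures.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two activation matrices
    X (n_samples, d1) and Y (n_samples, d2). Returns a similarity in
    [0, 1]; higher means the two representations co-vary more."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, 'fro') ** 2
    den = np.linalg.norm(X.T @ X, 'fro') * np.linalg.norm(Y.T @ Y, 'fro')
    return num / den

# toy usage: a linear transform of X stays highly similar to X
X = np.random.randn(100, 64)
print(linear_cka(X, X @ np.random.randn(64, 32)))
```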

    Relevant data representation by a Kernel-based framework

    Nowadays, the analysis of large amounts of data has emerged as an issue of great interest in the scientific community, especially in automation, signal processing, pattern recognition, and machine learning. In this sense, the identification, description, classification, visualization, and clustering of events or patterns are important problems for engineering developments and scientific issues, such as biology, medicine, economy, artificial vision, artificial intelligence, and industrial production. Nonetheless, it is difficult to interpret the available information due to its complexity and the large number of obtained features. In addition, the analysis of the input data requires the development of methodologies that reveal the relevant behaviors of the studied process, particularly when the signals contain hidden structures varying over a given domain, e.g., space and/or time. When the analyzed signal contains such properties, directly applying signal processing and machine learning procedures without a suitable model that accounts for both the statistical distribution and the structure of the data can lead to unstable performance. In this regard, kernel functions appear as an alternative approach to address the aforementioned issues by providing flexible mathematical tools that enhance data representation in support of signal processing and machine learning systems. Moreover, kernel-based methods are powerful tools for developing better-performing solutions by adapting the kernel to a given problem, instead of learning data relationships from explicit raw vector representations. However, building suitable kernels requires some prior user knowledge about the input data, which is not available in most practical cases. Furthermore, directly using the definitions of traditional kernel methods poses a challenging estimation problem that often leads to strong simplifications restricting the kind of representation that can be used on the data. In this study, we propose a data representation framework based on kernel methods to automatically learn relevant sample relationships in learning systems. Namely, the proposed framework is divided into five kernel-based approaches, which aim to compute relevant data representations by adapting them according to both the imposed sample-relationship constraints and the learning scenario (unsupervised or supervised task). First, we develop a kernel-based representation approach that reveals the main input sample relations by including relevant data structures using graph-based sparse constraints. Thus, salient data structures are highlighted, favoring further unsupervised clustering stages. This approach can be viewed as a graph-pruning strategy within a spectral clustering framework that enhances both local and global data consistency for a given input similarity matrix. Second, we introduce a kernel-based representation methodology that captures meaningful data relations in terms of their statistical distribution. An information theoretic learning (ITL) based penalty function is introduced to estimate a kernel-based similarity that maximizes the overall information potential variability. That is, we seek a reproducing kernel Hilbert space (RKHS) that spans the widest information force magnitudes among data points to support further clustering stages.
Third, an entropy-like functional on positive definite matrices, based on Renyi's definition, is adapted to develop a kernel-based representation approach that considers both the statistical distribution and the salient data structures. Thereby, relevant input patterns are highlighted in unsupervised learning tasks. In particular, the introduced approach is tested as a tool to encode relevant local and global input data relationships in dimensionality reduction applications. Fourth, a supervised kernel-based representation is introduced via a metric learning procedure in an RKHS that takes advantage of the user's prior knowledge, when available, regarding the studied learning task. This approach incorporates the proposed ITL-based kernel functional estimation strategy to automatically adapt the representation using both the supervised information and the statistical distribution of the input data. As a result, relevant sample dependencies are highlighted by weighting the input features that most strongly encode the supervised learning task. Finally, a new generalized kernel-based measure is proposed that takes advantage of different RKHSs. In this way, relevant dependencies are highlighted automatically by considering the domain-varying behavior of the input data and the user's prior knowledge (supervised information) when available. The proposed measure is an extension of the well-known cross-correntropy function based on Hilbert space embeddings. Throughout the study, the proposed kernel-based framework is applied to biosignal and image data as an alternative to support aided-diagnosis systems and image-based object analysis. Indeed, the introduced kernel-based framework improves, in most cases, unsupervised and supervised learning performance, aiding researchers in their quest to process and understand complex data.
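
For readers unfamiliar with the ITL quantities named above, here is a minimal NumPy sketch of the standard Parzen-window estimators of the information potential (whose negative log is Renyi's quadratic entropy) and of correntropy between paired samples; the Gaussian kernel and the `sigma` choice are assumptions, and the thesis' estimators may differ in detail.

```python
import numpy as np

def gaussian_kernel(d2, sigma):
    """Gaussian kernel evaluated on squared distances d2."""
    return np.exp(-d2 / (2.0 * sigma ** 2))

def information_potential(X, sigma=1.0):
    """Parzen estimate of the information potential V(X) over samples
    X (n, d); Renyi's quadratic entropy is H2(X) = -log V(X)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
    return gaussian_kernel(d2, sigma).mean()             # (1/n^2) double sum

def correntropy(x, y, sigma=1.0):
    """Sample estimator of correntropy between paired 1-D signals x, y."""
    return gaussian_kernel((x - y) ** 2, sigma).mean()

X = np.random.randn(200, 3)
print(-np.log(information_potential(X)))           # Renyi quadratic entropy
print(correntropy(X[:, 0], X[:, 0] + 0.1))         # high for similar signals
```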

    Towards Real-World Data Streams for Deep Continual Learning

    Continual Learning deals with artificial intelligent agents striving to learn from a never-ending stream of data. Recently, Deep Continual Learning has focused on the design of new strategies to endow Artificial Neural Networks with the ability to learn continuously without forgetting previous knowledge. In fact, the learning process of any Artificial Neural Network model is well known to lack sufficient stability to preserve existing knowledge when learning new information. This phenomenon, called catastrophic forgetting, or simply forgetting, is considered one of the main obstacles to the design of effective Continual Learning agents. However, existing strategies designed to mitigate forgetting have been evaluated on a restricted set of Continual Learning scenarios. The most used one is, by far, the Class-Incremental scenario applied to object detection tasks. Even though it drove interest in Continual Learning, the Class-Incremental scenario strongly constrains the properties of the data stream, limiting its ability to model real-world environments. The core of this thesis concerns the introduction of three Continual Learning data streams, whose design is centered around properties of specific real-world environments. First, we propose the Class-Incremental with Repetition scenario, which builds a data stream including both the introduction of new concepts and the repetition of previous ones. Repetition is naturally present in many environments and constitutes an important source of information. Second, we formalize the Continual Pre-Training scenario, which leverages a data stream of unstructured knowledge to keep a pre-trained model updated over time. One important objective of this scenario is to study how to continuously build general, robust representations that do not strongly depend on the specific task to be solved. This is a fundamental property of real-world agents, which build cross-task knowledge and then adapt it to specific needs. Third, we study Continual Learning scenarios where data streams are composed of temporally correlated data. Temporal correlation is ubiquitous and lies at the foundation of most environments we, as humans, experience during our lives. We leverage Recurrent Neural Networks as our main model, due to their intrinsic ability to model temporal correlations. We discovered that, when applied to recurrent models, Continual Learning strategies behave in an unexpected manner. This highlights the limits of the current experimental validation, mostly focused on Computer Vision tasks. Ultimately, the introduction of new data streams contributed to deepening our understanding of how Artificial Neural Networks learn continuously. We discovered that forgetting strongly depends on the properties of the data stream, and we observed large changes from one data stream to another. Moreover, when forgetting is mild, we were able to mitigate it effectively with simple strategies, or even without any specific ones. Loosening the focus on forgetting allows us to turn our attention to other interesting problems, outlined in this thesis, such as (i) the separation between continual representation learning and quick adaptation to novel tasks, (ii) robustness to unbalanced data streams and (iii) the ability to continuously learn temporal correlations. These objectives currently defy existing strategies and will likely represent the next challenge for Continual Learning research.
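
As a toy illustration of the Class-Incremental with Repetition idea described above, the sketch below generates a stream in which each experience introduces a few new classes and re-samples previously seen ones with some probability; `new_per_exp` and `rep_prob` are invented knobs for illustration, not the thesis' actual protocol.

```python
import random

def cir_stream(classes, n_experiences, new_per_exp=2, rep_prob=0.5, seed=0):
    """Toy Class-Incremental with Repetition stream: each experience
    mixes newly introduced classes with repetitions of earlier ones."""
    rng = random.Random(seed)
    unseen, seen, stream = list(classes), [], []
    for _ in range(n_experiences):
        new = [unseen.pop(0) for _ in range(min(new_per_exp, len(unseen)))]
        repeated = [c for c in seen if rng.random() < rep_prob]
        seen.extend(new)
        stream.append(sorted(new + repeated))
    return stream

# e.g. [[0, 1], [1, 2, 3], [0, 3, 4, 5], ...]: new concepts plus repetition
print(cir_stream(range(10), n_experiences=4))
```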

    Deep representation learning for speech recognition

    Representation learning is a fundamental ingredient of deep learning. However, learning a good representation is a challenging task. For speech recognition, such a representation should contain the information needed to perform well in this task. A robust representation should also be reusable, hence it should capture the structure of the data. Interpretability is another desired characteristic. In this thesis we strive to learn an optimal deep representation for speech recognition using feed-forward Neural Networks (NNs) with different connectivity patterns. First and foremost, we aim to improve the robustness of the acoustic models. We use attribute-aware and adaptive training strategies to model the underlying factors of variation related to the speakers and the acoustic conditions, focusing on low-latency and real-time decoding scenarios. We explore different utterance summaries (referred to as utterance embeddings), capturing various sources of speech variability, and we seek to optimise speaker adaptive training (SAT) with control networks acting on the embeddings. We also propose a multi-scale CNN layer to learn factorised representations; the multi-scale approach also improves computational and memory efficiency. We then present a number of approaches as an attempt to better understand learned representations. First, with a controlled design, we assess the role of individual components of deep CNN acoustic models. Next, with saliency maps, we evaluate the importance of each input feature with respect to the classification criterion. Then, we evaluate layer-wise and model-wise learned representations in different diagnostic verification tasks (speaker and acoustic condition verification). We propose a deep CNN model as the embedding extractor, merging the information learned at different layers of the network. Similarly, we analyse the embeddings used in SAT-DNNs to gain more insight. For the multi-scale models, we also show how to compare learned representations (and assess their robustness) with a metric invariant to affine transformations.
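
One simple reading of the multi-scale CNN layer mentioned above is a set of parallel convolutions with different kernel sizes whose outputs are concatenated; the PyTorch sketch below implements that reading, with the kernel sizes and the 1-D setting chosen for illustration rather than taken from the thesis.

```python
import torch
import torch.nn as nn

class MultiScaleConv1d(nn.Module):
    """Parallel 1-D convolutions at several receptive-field scales,
    concatenated channel-wise (an illustrative reading of a
    'multi-scale CNN layer'; the thesis' exact design may differ)."""
    def __init__(self, in_ch, out_ch, kernel_sizes=(3, 5, 9)):
        super().__init__()
        assert out_ch % len(kernel_sizes) == 0
        branch_ch = out_ch // len(kernel_sizes)
        self.branches = nn.ModuleList(
            nn.Conv1d(in_ch, branch_ch, k, padding=k // 2)
            for k in kernel_sizes
        )

    def forward(self, x):  # x: (batch, in_ch, time)
        return torch.cat([b(x) for b in self.branches], dim=1)

# toy usage: 40-dim acoustic features over 100 frames -> 96 channels
print(MultiScaleConv1d(40, 96)(torch.randn(2, 40, 100)).shape)
```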

    New Fundamental Technologies in Data Mining

    The progress of data mining technology and its broad public popularity establish a need for a comprehensive text on the subject. The series of books entitled "Data Mining" addresses this need by presenting in-depth descriptions of novel mining algorithms and many useful applications. In addition to helping readers understand each section deeply, the two books present useful hints and strategies for solving the problems discussed in the subsequent chapters. The contributing authors have highlighted many future research directions that will foster multi-disciplinary collaborations and hence lead to significant developments in the field of data mining.

    AVATAR - Machine Learning Pipeline Evaluation Using Surrogate Model

    © 2020, The Author(s). The evaluation of machine learning (ML) pipelines is essential during automatic ML pipeline composition and optimisation. Previous methods, such as the Bayesian-based and genetic-based optimisation implemented in Auto-Weka, Auto-sklearn and TPOT, evaluate pipelines by executing them. Therefore, pipeline composition and optimisation with these methods requires a tremendous amount of time, which prevents them from exploring complex pipelines to find better predictive models. To further explore this research challenge, we have conducted experiments showing that many of the generated pipelines are invalid, and that it is unnecessary to execute them to find out whether they are good pipelines. To address this issue, we propose a novel method to evaluate the validity of ML pipelines using a surrogate model (AVATAR). AVATAR accelerates automatic ML pipeline composition and optimisation by quickly discarding invalid pipelines. Our experiments show that AVATAR is more efficient at evaluating complex pipelines than traditional evaluation approaches that require their execution.
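
AVATAR's actual surrogate model is richer than this, but the toy sketch below conveys the core idea of judging a pipeline's validity without executing it: each component declares the data properties it requires and produces, and a pipeline is valid only if every step's requirements are met by the state accumulated so far. The `CAPABILITIES` table and property names are entirely invented for illustration.

```python
# component: (required input properties, properties added to the output)
CAPABILITIES = {
    "Imputer":      ({"numeric"},             {"numeric", "complete"}),
    "OneHotEnc":    ({"categorical"},         {"numeric"}),
    "PCA":          ({"numeric", "complete"}, {"numeric", "complete"}),
    "RandomForest": ({"numeric", "complete"}, {"predictions"}),
}

def is_valid(pipeline, data_props):
    """Return True if every step's requirements are satisfiable,
    checked symbolically instead of by running the pipeline."""
    state = set(data_props)
    for step in pipeline:
        required, produced = CAPABILITIES[step]
        if not required <= state:
            return False
        state |= produced  # toy: properties only accumulate
    return True

print(is_valid(["Imputer", "PCA", "RandomForest"], {"numeric"}))  # True
print(is_valid(["PCA", "RandomForest"], {"numeric"}))  # False: not "complete"
```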

    Segmentation and Characterization of Small Retinal Vessels in Fundus Images Using the Tensor Voting Approach

    As an easily accessible site for the direct observation of the circulatory system, the human retina can offer unique insight into disease development and outcome. Retinal vessels are representative of the general condition of the whole systemic circulation, and thus can act as a "window" onto the status of the vascular network in the whole body. Each complication on the retina can have an adverse impact on the patient's sight. In this respect, the small vessels are highly relevant, as they are among the first anatomical structures affected as diseases progress. Moreover, changes in the small vessels' state, appearance, morphology, functionality, or even growth indicate the severity of disease. This thesis focuses on retinal lesions due to diabetes, a serious metabolic disease affecting millions of people around the world. This disorder disturbs natural blood glucose levels, causing various pathophysiological changes in different systems across the human body. Diabetic retinopathy is the medical term for the condition in which the fundus and the retinal vessels are affected by diabetes. As in other diseases, small vessels play a crucial role in the onset, the development, and the outcome of the retinopathy. More importantly, at the latest stage, the growth of new small vessels, or neovascularizations, constitutes a significant risk factor for blindness. Therefore, there is a need to detect all the changes that occur in the small retinal vessels, with the aim of characterizing the vessels as healthy or abnormal. The characterization, in turn, can facilitate the local detection of a specific retinopathy, like the sight-threatening proliferative diabetic retinopathy. Segmentation techniques can automatically isolate important anatomical structures like the vessels and provide this information to the physician to assist the final decision. In comprehensive systems for the automation of DR detection, the role of small vessels is significant, as missing them early in a CAD pipeline might increase the false positive rate for red lesions in subsequent steps. So far, efforts have concentrated mostly on the accurate localization of medium-range vessels. In contrast, the existing models are weak in the case of the small vessels. The generalization required to adapt an existing model does not allow the approaches to be flexible yet robust enough to compensate for the increased variability in appearance as well as the interference with the background. The current template models (matched filtering, line detection, and morphological processing) assume a general shape for the vessels that is insufficient to approximate the narrow, curved characteristics of the small vessels.
Additionally, due to the weak contrast in small vessel regions, current segmentation and tracking methods produce fragmented or discontinuous results. Alternatively, small vessel segmentation can be accomplished at the expense of magnifying background noise, in the case of thresholding or image-derivative methods. Furthermore, the proposed deformable models are not able to propagate a contour to the full extent of the vasculature so as to enclose all the small vessels. The external forces of deformable models are ineffective at compensating for the low contrast, the small width, the high variability in small vessel appearance, and the discontinuities. Internal forces, likewise, cannot impose a global shape constraint on the contour capable of approximating the variability in the appearance of the vasculature across different categories of vessels. Finally, machine learning approaches require training a classifier on a labelled set, and such sets are difficult to obtain, especially for the smallest vessels. In the case of unsupervised methods, the user has to predefine the number of clusters and perform an effective initialization of the cluster centers in order to converge to the global minimum. This dissertation expands on previous research and provides a new segmentation method for the smallest retinal vessels. Multi-scale line detection (MSLD) is a recent method that demonstrates good segmentation performance on retinal images, while tensor voting is a method first proposed for reconnecting pixels. For the first time, we combine line detection with the tensor voting framework. The application of line detectors has proved an effective way to segment medium-sized vessels. Additionally, perceptual organization approaches like tensor voting demonstrate increased robustness by combining information from the neighborhood in a hierarchical way. Tensor voting is closer than standard models to the way human perception functions. As we show, it is a more powerful tool for segmenting small vessels than the existing methods. This specific combination allows us to overcome the apparent fragmentation challenge of the template methods at the smallest vessels. Moreover, we threshold the line detection response adaptively to compensate for non-uniform images. We also combine the two individual methods in a multi-scale scheme in order to reconnect vessels at variable distances. Finally, we reconstruct the vessels from their extracted centerlines based on pixel painting, as complete geometric information is required to utilize the segmentation in a CAD system. The segmentation was validated on a high-resolution fundus image database that includes diabetic retinopathy images of varying stages, using standard discrepancy as well as perceptual-based measures. When only the smallest vessels are considered, the improvement in the sensitivity rate on this database over the standard multi-scale line detection method is 6.47%. For the perceptual-based measure, the improvement is 7.8% over the basic method. The second objective of the thesis was to implement a method for characterizing isolated retinal areas as healthy or abnormal. Some of the original images from which these patches are extracted contain neovascularizations.
Investigating image features for characterizing vessels as healthy or abnormal constitutes an essential step towards developing a CAD system for the automation of DR screening. Given that the amount of data will increase significantly under CAD systems, focusing on this category of vessels can facilitate the referral of sight-threatening cases to early treatment. In addition to the challenges that small healthy vessels pose, neovessels demonstrate an even higher degree of complexity, as they form networks of convolved, twisted, looped, thin vessels. Existing work is limited to first-order characteristics extracted from the small segmented vessels, which limits the study of patterns. Our contribution is to use the tensor voting framework to isolate the retinal vascular junctions and, in turn, to use those junctions as points of interest. Second, we exploit second-order statistics computed on the spatial distribution of the junctions to characterize the vessels as healthy or neovascularizations. In fact, the second-order spatial statistics extracted from the junction distribution, combined with widely used features, improve the characterization sensitivity by 9.09% over the state of the art. The developed method proved effective for the segmentation of the retinal vessels. Higher-order tensors, along with an implementation of tensor voting via steerable filtering, could be employed to further reduce the execution time and resolve the remaining challenges at vascular junctions. Moreover, the characterization could be advanced to the detection of proliferative retinopathy by extending the supervised learning to include non-proliferative diabetic retinopathy cases or other pathologies. Ultimately, the incorporation of the methods into CAD systems could facilitate screening for the effective reduction of vision-threatening diabetic retinopathy rates, and the early detection of other ocular pathologies.
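
As a rough sketch of the line-detection building block named above, the code below scores each pixel by the best average intensity along a short line through it minus the local window mean, and averages the responses over several line lengths. The window size, scales and rotation-based implementation are assumptions; the published MSLD method additionally standardizes the responses and combines them with the inverted green channel.

```python
import numpy as np
from scipy.ndimage import rotate, uniform_filter

def line_detector_response(img, L, W=15, n_angles=12):
    """Single-scale line-detector response: best mean intensity along a
    length-L line through each pixel, minus the WxW window mean
    (a simplified sketch of multi-scale line detection)."""
    img = np.asarray(img, dtype=float)
    win_mean = uniform_filter(img, size=W)
    best = np.full(img.shape, -np.inf)
    for angle in np.linspace(0, 180, n_angles, endpoint=False):
        # average along a horizontal line in a rotated copy of the
        # image, then rotate back so responses stay pixel-aligned
        r = rotate(img, angle, reshape=False, order=1, mode='nearest')
        line_mean = uniform_filter(r, size=(1, L))
        back = rotate(line_mean, -angle, reshape=False, order=1,
                      mode='nearest')
        best = np.maximum(best, back)
    return best - win_mean  # high where a bright line stands out

def msld(img, scales=(3, 7, 11, 15), W=15):
    """Multi-scale response: average of the single-scale responses."""
    return np.mean([line_detector_response(img, L, W) for L in scales],
                   axis=0)
```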