
    GAN-Based Super-Resolution and Segmentation of Retinal Layers in Optical Coherence Tomography Scans

    Optical Coherence Tomography (OCT) has been identified as a noninvasive and cost-effective imaging modality for identifying potential biomarkers for Alzheimer's diagnosis and progression detection. Current hypotheses indicate that retinal layer thickness, which can be assessed via OCT scans, is an effective biomarker for identifying Alzheimer's disease. Due to factors such as speckle noise, a small target region, and unfavorable imaging conditions, manual segmentation of retinal layers is a challenging task. Therefore, as a reasonable first step, this study focuses on automatically segmenting retinal layers to separate them for subsequent investigations. Another common challenge is the lack of clarity of layer boundaries in retinal OCT scans, which motivates super-resolving the images for improved clarity. Deep learning pipelines have stimulated substantial progress on segmentation tasks. Generative adversarial networks (GANs) are a prominent branch of deep learning that has achieved remarkable performance in semantic segmentation. Conditional adversarial networks, as a general-purpose solution to image-to-image translation problems, not only learn the mapping from the input image to the output image but also learn a loss function to train this mapping. We propose a GAN-based segmentation model and evaluate incorporating popular networks, namely U-Net and ResNet, into the GAN architecture, with additional blocks of transposed convolution and sub-pixel convolution for the task of upscaling OCT images from low to high resolution by a factor of four. We also incorporate the Dice loss as an additional reconstruction loss term to improve the performance of this joint optimization task. Our best model configuration empirically achieved a Dice coefficient of 0.867 and an mIoU of 0.765.
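    The abstract describes, but does not include, an implementation. As a minimal illustration, the PyTorch sketch below shows the two mechanisms it names: a 4x sub-pixel-convolution (PixelShuffle) upsampling path and a Dice term added to the reconstruction part of the generator objective. The channel sizes, loss weights, and function names here are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch (not the authors' code): 4x sub-pixel upsampling and a
# Dice + reconstruction + adversarial generator loss, as described above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SubPixelUpsampler(nn.Module):
    """Upscales feature maps by 4x via two stacked sub-pixel (PixelShuffle) blocks."""
    def __init__(self, channels=64):  # channel count is an assumption
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels * 4, kernel_size=3, padding=1),
            nn.PixelShuffle(2),   # rearranges 4*C channels into a 2x larger map
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels * 4, kernel_size=3, padding=1),
            nn.PixelShuffle(2),   # second 2x step -> 4x total upscaling
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)

def dice_loss(logits, target, eps=1e-6):
    """Soft Dice loss for a one-hot segmentation target of shape (N, C, H, W)."""
    probs = torch.softmax(logits, dim=1)
    inter = (probs * target).sum(dim=(2, 3))
    union = probs.sum(dim=(2, 3)) + target.sum(dim=(2, 3))
    return 1.0 - ((2 * inter + eps) / (union + eps)).mean()

def generator_loss(d_fake_logits, sr, hr, seg_logits, seg_target,
                   lambda_rec=100.0, lambda_dice=1.0):
    """Adversarial + L1 reconstruction + Dice terms, jointly optimized.
    The weighting coefficients are illustrative, not the paper's values."""
    adv = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))
    rec = F.l1_loss(sr, hr)            # super-resolution reconstruction term
    dice = dice_loss(seg_logits, seg_target)  # additional segmentation term
    return adv + lambda_rec * rec + lambda_dice * dice
```

    A transposed-convolution variant, also mentioned in the abstract, would replace each Conv2d + PixelShuffle pair with nn.ConvTranspose2d(channels, channels, kernel_size=4, stride=2, padding=1), which likewise doubles the spatial resolution per block.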

    Efficient Methods for the Design and Training of Neural Networks

    The field of artificial intelligence has seen significant advances with the development of neural networks, which have numerous applications in computer vision, natural language processing, and speech processing. Despite these advances, designing and training such networks still poses substantial challenges. This thesis addresses two critical aspects of neural network development, design and training, within the context of computer vision tasks, and focuses on three main challenges. The first is finding an efficient way to perform architecture search in an extremely large or even unbounded search space; to address it, the thesis proposes a Neural Search-space Evolution (NSE) scheme that enables efficient and effective architecture search in large-scale search spaces. The second is improving the efficiency of self-supervised learning for model pretraining; to address it, the thesis proposes a combinatorial-patches approach that significantly improves the efficiency of self-supervised pretraining. The third is developing an efficient and versatile multitask model that can leverage the benefits of large-scale multitask training; to address it, the thesis proposes a Unified model for Human-Centric Perceptions (UniHCP), a simple and scalable solution that unifies multiple human-centric tasks into a single efficient model. The NSE scheme, the combinatorial-patches approach, and UniHCP have been tested on a broad range of datasets, tasks, and settings, yielding strong results that demonstrate the effectiveness of the proposed methods in improving the practicality and performance of neural network design and training.

    Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

    Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design computer agents with intelligent capabilities such as understanding, reasoning, and learning through integrating multiple communicative modalities, including linguistic, acoustic, visual, tactile, and physiological messages. With the recent interest in video understanding, embodied autonomous agents, text-to-image generation, and multisensor fusion in application domains such as healthcare and robotics, multimodal machine learning has brought unique computational and theoretical challenges to the machine learning community, given the heterogeneity of data sources and the interconnections often found between modalities. However, the breadth of progress in multimodal research has made it difficult to identify the common themes and open questions in the field. By synthesizing a broad range of application domains and theoretical frameworks from both historical and recent perspectives, this paper is designed to provide an overview of the computational and theoretical foundations of multimodal machine learning. We start by defining two key principles, modality heterogeneity and interconnections, that have driven subsequent innovations, and propose a taxonomy of six core technical challenges: representation, alignment, reasoning, generation, transference, and quantification, covering historical and recent trends. Recent technical achievements are presented through the lens of this taxonomy, allowing researchers to understand the similarities and differences across new approaches. We end by motivating several open problems for future research as identified by our taxonomy.