1,778 research outputs found

    Generative Adversarial Networks (GANs): Challenges, Solutions, and Future Directions

    Full text link
    Generative Adversarial Networks (GANs) is a novel class of deep generative models which has recently gained significant attention. GANs learns complex and high-dimensional distributions implicitly over images, audio, and data. However, there exists major challenges in training of GANs, i.e., mode collapse, non-convergence and instability, due to inappropriate design of network architecture, use of objective function and selection of optimization algorithm. Recently, to address these challenges, several solutions for better design and optimization of GANs have been investigated based on techniques of re-engineered network architectures, new objective functions and alternative optimization algorithms. To the best of our knowledge, there is no existing survey that has particularly focused on broad and systematic developments of these solutions. In this study, we perform a comprehensive survey of the advancements in GANs design and optimization solutions proposed to handle GANs challenges. We first identify key research issues within each design and optimization technique and then propose a new taxonomy to structure solutions by key research issues. In accordance with the taxonomy, we provide a detailed discussion on different GANs variants proposed within each solution and their relationships. Finally, based on the insights gained, we present the promising research directions in this rapidly growing field.Comment: 42 pages, Figure 13, Table

    A detection-based pattern recognition framework and its applications

    Get PDF
    The objective of this dissertation is to present a detection-based pattern recognition framework and demonstrate its applications in automatic speech recognition and broadcast news video story segmentation. Inspired by the studies of modern cognitive psychology and real-world pattern recognition systems, a detection-based pattern recognition framework is proposed to provide an alternative solution for some complicated pattern recognition problems. The primitive features are first detected and the task-specific knowledge hierarchy is constructed level by level; then a variety of heterogeneous information sources are combined together and the high-level context is incorporated as additional information at certain stages. A detection-based framework is a â divide-and-conquerâ design paradigm for pattern recognition problems, which will decompose a conceptually difficult problem into many elementary sub-problems that can be handled directly and reliably. Some information fusion strategies will be employed to integrate the evidence from a lower level to form the evidence at a higher level. Such a fusion procedure continues until reaching the top level. Generally, a detection-based framework has many advantages: (1) more flexibility in both detector design and fusion strategies, as these two parts can be optimized separately; (2) parallel and distributed computational components in primitive feature detection. In such a component-based framework, any primitive component can be replaced by a new one while other components remain unchanged; (3) incremental information integration; (4) high level context information as additional information sources, which can be combined with bottom-up processing at any stage. This dissertation presents the basic principles, criteria, and techniques for detector design and hypothesis verification based on the statistical detection and decision theory. In addition, evidence fusion strategies were investigated in this dissertation. Several novel detection algorithms and evidence fusion methods were proposed and their effectiveness was justified in automatic speech recognition and broadcast news video segmentation system. We believe such a detection-based framework can be employed in more applications in the future.Ph.D.Committee Chair: Lee, Chin-Hui; Committee Member: Clements, Mark; Committee Member: Ghovanloo, Maysam; Committee Member: Romberg, Justin; Committee Member: Yuan, Min

    Unsupervised Superpixel Generation using Edge-Sparse Embedding

    Full text link
    Partitioning an image into superpixels based on the similarity of pixels with respect to features such as colour or spatial location can significantly reduce data complexity and improve subsequent image processing tasks. Initial algorithms for unsupervised superpixel generation solely relied on local cues without prioritizing significant edges over arbitrary ones. On the other hand, more recent methods based on unsupervised deep learning either fail to properly address the trade-off between superpixel edge adherence and compactness or lack control over the generated number of superpixels. By using random images with strong spatial correlation as input, \ie, blurred noise images, in a non-convolutional image decoder we can reduce the expected number of contrasts and enforce smooth, connected edges in the reconstructed image. We generate edge-sparse pixel embeddings by encoding additional spatial information into the piece-wise smooth activation maps from the decoder's last hidden layer and use a standard clustering algorithm to extract high quality superpixels. Our proposed method reaches state-of-the-art performance on the BSDS500, PASCAL-Context and a microscopy dataset

    Computational Methods for Segmentation of Multi-Modal Multi-Dimensional Cardiac Images

    Get PDF
    Segmentation of the heart structures helps compute the cardiac contractile function quantified via the systolic and diastolic volumes, ejection fraction, and myocardial mass, representing a reliable diagnostic value. Similarly, quantification of the myocardial mechanics throughout the cardiac cycle, analysis of the activation patterns in the heart via electrocardiography (ECG) signals, serve as good cardiac diagnosis indicators. Furthermore, high quality anatomical models of the heart can be used in planning and guidance of minimally invasive interventions under the assistance of image guidance. The most crucial step for the above mentioned applications is to segment the ventricles and myocardium from the acquired cardiac image data. Although the manual delineation of the heart structures is deemed as the gold-standard approach, it requires significant time and effort, and is highly susceptible to inter- and intra-observer variability. These limitations suggest a need for fast, robust, and accurate semi- or fully-automatic segmentation algorithms. However, the complex motion and anatomy of the heart, indistinct borders due to blood flow, the presence of trabeculations, intensity inhomogeneity, and various other imaging artifacts, makes the segmentation task challenging. In this work, we present and evaluate segmentation algorithms for multi-modal, multi-dimensional cardiac image datasets. Firstly, we segment the left ventricle (LV) blood-pool from a tri-plane 2D+time trans-esophageal (TEE) ultrasound acquisition using local phase based filtering and graph-cut technique, propagate the segmentation throughout the cardiac cycle using non-rigid registration-based motion extraction, and reconstruct the 3D LV geometry. Secondly, we segment the LV blood-pool and myocardium from an open-source 4D cardiac cine Magnetic Resonance Imaging (MRI) dataset by incorporating average atlas based shape constraint into the graph-cut framework and iterative segmentation refinement. The developed fast and robust framework is further extended to perform right ventricle (RV) blood-pool segmentation from a different open-source 4D cardiac cine MRI dataset. Next, we employ convolutional neural network based multi-task learning framework to segment the myocardium and regress its area, simultaneously, and show that segmentation based computation of the myocardial area is significantly better than that regressed directly from the network, while also being more interpretable. Finally, we impose a weak shape constraint via multi-task learning framework in a fully convolutional network and show improved segmentation performance for LV, RV and myocardium across healthy and pathological cases, as well as, in the challenging apical and basal slices in two open-source 4D cardiac cine MRI datasets. We demonstrate the accuracy and robustness of the proposed segmentation methods by comparing the obtained results against the provided gold-standard manual segmentations, as well as with other competing segmentation methods
    corecore