7 research outputs found

    Hybrid machine learning approaches for scene understanding: From segmentation and recognition to image parsing

    We address semantic scene understanding through two lines of study: object segmentation and recognition, and scene labeling. We propose new techniques for joint recognition, segmentation and pose estimation of infrared (IR) targets. The problem is formulated in a probabilistic level set framework in which a shape-constrained generative model provides a multi-class, multi-view shape prior, and the shape model involves a couplet of view and identity manifolds (CVIM). A level set energy function is then iteratively optimized under the shape constraints provided by the CVIM. Since both the view and identity variables are expressed explicitly in the objective function, this approach naturally accomplishes recognition, segmentation and pose estimation as joint products of the optimization process. For realistic target chips, we solve the resulting multi-modal optimization problem with a particle swarm optimization (PSO) algorithm and then improve the computational efficiency with a gradient-boosted PSO (GB-PSO). Evaluation was performed on the Military Sensing Information Analysis Center (SENSIAC) automatic target recognition (ATR) database, and experimental results show that both PSO algorithms reduce the cost of shape matching during CVIM-based shape inference. In particular, GB-PSO outperforms other recent ATR algorithms that require intensive shape matching, either explicitly (with pre-segmentation) or implicitly (without pre-segmentation). For situations in which target boundaries are not clearly observable and object shapes cannot be reliably detected, we explored sparse representation classification (SRC) methods for ATR applications and developed a fusion technique that combines traditional SRC with a group-constrained SRC algorithm, regulated by a sparsity concentration index, for improved classification accuracy on the Comanche dataset.
Moreover, we present a compact rare class-oriented scene labeling framework (RCSL) with a global scene-assisted rare class retrieval process, in which the retrieved subset is expanded by choosing scene-regulated rare class patches. A complementary rare class-balanced CNN is learned to alleviate the imbalanced data distribution problem at lower cost. A superpixel-based re-segmentation is implemented to produce more perceptually meaningful object boundaries. Quantitative results demonstrate the promising performance of the proposed framework in both pixel and class accuracy for scene labeling on the SIFTflow dataset, especially for rare class objects.
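
The abstract above mentions a sparsity concentration index (SCI) that regulates the fusion of two SRC classifiers, but does not give the formula or the fusion rule. As a rough illustration only, the sketch below uses the definition of SCI that is common in the SRC literature: for a coefficient vector x over k classes, SCI(x) = (k * max_i ||delta_i(x)||_1 / ||x||_1 - 1) / (k - 1), where delta_i keeps only the coefficients belonging to class i. The function names, and the rule of preferring the more concentrated coefficient vector, are assumptions made for this example and are not taken from the thesis.

```python
def sparsity_concentration_index(coeffs, labels, num_classes):
    """Return the SCI of a sparse coefficient vector, a value in [0, 1].

    coeffs      -- sparse coefficients, one per dictionary atom
    labels      -- class index (0 .. num_classes - 1) of each dictionary atom
    num_classes -- number of classes k (must be at least 2)
    """
    if num_classes < 2:
        raise ValueError("SCI needs at least two classes")
    total = sum(abs(c) for c in coeffs)
    if total == 0:
        return 0.0  # no evidence for any class
    # L1 mass of the coefficients assigned to each class
    per_class = [0.0] * num_classes
    for c, lab in zip(coeffs, labels):
        per_class[lab] += abs(c)
    return (num_classes * max(per_class) / total - 1) / (num_classes - 1)


def pick_more_concentrated(coeffs_a, coeffs_b, labels, num_classes):
    """Hypothetical fusion rule (an assumption, not the authors' method):
    trust whichever classifier produced the more concentrated coefficients."""
    sci_a = sparsity_concentration_index(coeffs_a, labels, num_classes)
    sci_b = sparsity_concentration_index(coeffs_b, labels, num_classes)
    return "a" if sci_a >= sci_b else "b"
```

An SCI of 1 means all coefficient mass falls on a single class, and an SCI of 0 means it is spread evenly across classes, so the index can serve as a confidence score when deciding which classifier's output to trust.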

    Speech Recognition

    Chapters in the first part of the book cover the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals and methods for speech feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition: speaker identification and tracking, prosody modeling in emotion-detection systems, and applications that operate in real-world environments such as mobile communication services and smart homes.