1,214 research outputs found

    Learning Generative ConvNets via Multi-grid Modeling and Sampling

    Full text link
    This paper proposes a multi-grid method for learning energy-based generative ConvNet models of images. For each grid, we learn an energy-based probabilistic model where the energy function is defined by a bottom-up convolutional neural network (ConvNet or CNN). Learning such a model requires generating synthesized examples from the model. Within each iteration of our learning algorithm, for each observed training image, we generate synthesized images at multiple grids by initializing the finite-step MCMC sampling from a minimal 1 x 1 version of the training image. The synthesized image at each subsequent grid is obtained by a finite-step MCMC initialized from the synthesized image generated at the previous coarser grid. After obtaining the synthesized examples, the parameters of the models at multiple grids are updated separately and simultaneously based on the differences between synthesized and observed examples. We show that this multi-grid method can learn realistic energy-based generative ConvNet models, and it outperforms the original contrastive divergence (CD) and persistent CD.Comment: CVPR 201

    Panoramic Annular Localizer: Tackling the Variation Challenges of Outdoor Localization Using Panoramic Annular Images and Active Deep Descriptors

    Full text link
    Visual localization is an attractive problem that estimates the camera localization from database images based on the query image. It is a crucial task for various applications, such as autonomous vehicles, assistive navigation and augmented reality. The challenging issues of the task lie in various appearance variations between query and database images, including illumination variations, dynamic object variations and viewpoint variations. In order to tackle those challenges, Panoramic Annular Localizer into which panoramic annular lens and robust deep image descriptors are incorporated is proposed in this paper. The panoramic annular images captured by the single camera are processed and fed into the NetVLAD network to form the active deep descriptor, and sequential matching is utilized to generate the localization result. The experiments carried on the public datasets and in the field illustrate the validation of the proposed system.Comment: Accepted by ITSC 201

    Two Heads are Better than One: A Bio-inspired Method for Improving Classification on EEG-ET Data

    Full text link
    Classifying EEG data is integral to the performance of Brain Computer Interfaces (BCI) and their applications. However, external noise often obstructs EEG data due to its biological nature and complex data collection process. Especially when dealing with classification tasks, standard EEG preprocessing approaches extract relevant events and features from the entire dataset. However, these approaches treat all relevant cognitive events equally and overlook the dynamic nature of the brain over time. In contrast, we are inspired by neuroscience studies to use a novel approach that integrates feature selection and time segmentation of EEG data. When tested on the EEGEyeNet dataset, our proposed method significantly increases the performance of Machine Learning classifiers while reducing their respective computational complexity.Comment: 6 pages, 3 figures, HCI International 2023 Poste

    A Developmental Learning Approach of Mobile Manipulator via Playing

    Get PDF
    Inspired by infant development theories, a robotic developmental model combined with game elements is proposed in this paper. This model does not require the definition of specific developmental goals for the robot, but the developmental goals are implied in the goals of a series of game tasks. The games are characterized into a sequence of game modes based on the complexity of the game tasks from simple to complex, and the task complexity is determined by the applications of developmental constraints. Given a current mode, the robot switches to play in a more complicated game mode when it cannot find any new salient stimuli in the current mode. By doing so, the robot gradually achieves it developmental goals by playing different modes of games. In the experiment, the game was instantiated into a mobile robot with the playing task of picking up toys, and the game is designed with a simple game mode and a complex game mode. A developmental algorithm, “Lift-Constraint, Act and Saturate,” is employed to drive the mobile robot move from the simple mode to the complex one. The experimental results show that the mobile manipulator is able to successfully learn the mobile grasping ability after playing simple and complex games, which is promising in developing robotic abilities to solve complex tasks using games

    IMPROVED DESIGN OF DTW AND GMM CASCADED ARABIC SPEAKER

    Get PDF
    In this paper, we discuss about the design, implementation and assessment of a two-stage Arabic speaker recognition system, which aims to recognize a target Arabic speaker among several people. The first stage uses improved DTW (Dynamic Time Warping) algorithm and the second stage uses SA-KM-based GMM (Gaussian Mixture Model). MFCC (Mel Frequency Cepstral Coefficients) and its differences form, as acoustic feature, are extracted from the sample speeches. DTW provides three most possible speakers and then the recognition results are conveyed to GMM training processes. A specified similarity assessment algorithm, KL distance, is applied to find the best match with the target speaker. Experimental results show that text-independent recognition rate of the cascaded system reaches 90 percent
    • …
    corecore