5 research outputs found

    Comparison for Improvements of Singing Voice Detection System Based on Vocal Separation

    Singing voice detection is the task of identifying which frames contain the singer's voice. It is one of the main components of music information retrieval (MIR) and is applicable to melody extraction, artist recognition, and music discovery in popular music. Although several methods have been proposed, a more robust and more complete system is desired to improve detection performance. In this paper, our motivation is to provide an extensive comparison of the different stages of singing voice detection. Based on the analysis, a novel method is proposed to build a more efficient singing voice detection system. The proposed system has three main parts. The first is a pre-processing step of singing voice separation that extracts the vocal without the music. The improvements from several singing voice separation methods were compared to decide the best one to integrate into the singing voice detection system. The second is a deep neural network based classifier that identifies the given frames; different deep models for classification were also compared. The last is a post-processing step that filters out anomalous frames in the classifier's predictions; a median filter and a Hidden Markov Model (HMM) based filter were compared for this step. Through step-by-step module extension, the different methods were compared and analyzed. Finally, classification performance on two public datasets indicates that the proposed approach, which is based on the Long-term Recurrent Convolutional Networks (LRCN) model, is a promising alternative. Comment: 15 pages
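
    The median-filter post-processing step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `median_smooth`, the window size, and the edge handling (repeating the first and last label) are assumptions chosen for illustration only.

```python
def median_smooth(labels, kernel=5):
    """Median-filter a sequence of 0/1 frame labels with an odd-sized window."""
    if kernel < 1 or kernel % 2 == 0:
        raise ValueError("kernel must be a positive odd integer")
    half = kernel // 2
    n = len(labels)
    smoothed = []
    for i in range(n):
        # Edge handling (assumption): indices outside the sequence are
        # clamped, which repeats the first or last label as padding.
        window = [labels[min(max(j, 0), n - 1)]
                  for j in range(i - half, i + half + 1)]
        # The median of an odd-length window is its middle sorted element.
        smoothed.append(sorted(window)[half])
    return smoothed


# Hypothetical per-frame classifier output (1 = vocal, 0 = non-vocal).
frames = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0]
print(median_smooth(frames, kernel=3))
```

    With a window of three frames, the isolated vocal frame near the start and the single non-vocal gap inside the vocal run are both removed, which is the kind of anomaly such a filter is meant to suppress.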

    Combining blockwise and multi-coefficient stepwise approaches in a general framework for online audio source separation

    This article considers the problem of online audio source separation. Various algorithms can be found in the literature, featuring either blockwise or stepwise approaches, and using either the spectral or the spatial characteristics of the sound sources in a mixture. We offer an algorithm that can combine stepwise and blockwise approaches and that can use both spectral and spatial information. We propose a method for pre-processing the data of each block and offer a way to derive an Equivalent Rectangular Bandwidth (ERB) time-frequency representation from a Short-Time Fourier Transform (STFT). The efficiency of our algorithm is then tested for various parameter settings, and the effect of each parameter on the quality of separation and on the computation time is discussed.
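
    One simple way to obtain an ERB-scale representation from an STFT frame is to pool the power of the STFT bins into bands that are equally spaced on the ERB-rate scale. The sketch below does this with the Glasberg and Moore ERB-rate formula. It is an illustrative assumption, not the method of the article, whose derivation is not given in the abstract; the function names, the band count, and the rectangular (non-overlapping) pooling are all hypothetical choices.

```python
import math


def hz_to_erb(f_hz):
    """ERB-rate scale (Glasberg and Moore): 21.4 * log10(1 + 0.00437 * f)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_to_hz(erb):
    """Inverse of hz_to_erb."""
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437


def stft_power_to_erb(power_frame, sample_rate, n_bands):
    """Sum the power of one STFT frame into n_bands ERB-spaced bands.

    power_frame holds the power of bins 0..Nyquist (n_fft // 2 + 1 values).
    """
    n_bins = len(power_frame)
    if n_bins < 2:
        raise ValueError("need at least two frequency bins")
    nyquist = sample_rate / 2.0
    erb_max = hz_to_erb(nyquist)
    bands = [0.0] * n_bands
    for k, power in enumerate(power_frame):
        f_hz = k * nyquist / (n_bins - 1)  # centre frequency of bin k
        # Position of the bin on the ERB scale, mapped to a band index;
        # the Nyquist bin is clamped into the last band.
        band = min(int(hz_to_erb(f_hz) / erb_max * n_bands), n_bands - 1)
        bands[band] += power
    return bands


# Hypothetical flat spectrum: 513 bins (n_fft = 1024) at 16 kHz.
flat = [1.0] * 513
erb_bands = stft_power_to_erb(flat, sample_rate=16000, n_bands=20)
print(len(erb_bands), sum(erb_bands))
```

    Because ERB bands widen with frequency, the low bands collect only a few STFT bins and the high bands collect many, so the total power is preserved while the frequency resolution becomes coarser at high frequencies.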

    Browse-to-search

    This demonstration presents a novel interactive online shopping application based on visual search technologies. When users want to buy something on a shopping site, they usually need to look for related information on other websites, so they have to switch between the web page being browsed and other sites that provide search results. The proposed application enables users to search naturally for products of interest while they browse a web page, so that even a casual purchase intent is easily satisfied. The interactive shopping experience is characterized by: 1) in session - it allows users to specify the purchase intent in the browsing session, instead of leaving the current page and navigating to other websites; 2) in context - the browsed web page provides implicit context information which helps infer user purchase preferences; 3) in focus - users easily specify their search interest using gestures on touch devices and do not need to formulate queries in a search box; 4) natural - gesture inputs and visual-based search provide users with a natural shopping experience. The system is evaluated against a data set consisting of several million commercial product images. © 2012 Authors