
    Automated classification of cricket pitch frames in cricket video

    The automated detection of the cricket pitch in a video recording of a cricket match is a fundamental step in content-based indexing and summarization of cricket videos. In this paper, we propose visual-content-based algorithms to automate the extraction of video frames with the cricket pitch in focus. As a preprocessing step, we first select a subset of frames with a view of the cricket field, of which the cricket pitch forms a part. This filtering process reduces the search space by eliminating frames that contain a view of the audience, close-up shots of specific players, advertisements, etc. The subset of frames containing the cricket field is then subject to statistical modeling of the grayscale (brightness) histogram (SMoG). Since SMoG does not utilize color or domain-specific information such as the region in the frame where the pitch is expected to be located, we propose an alternative algorithm: component quantization based region of interest extraction (CQRE) for the extraction of pitch frames. Experimental results demonstrate that, regardless of the quality of the input, successive application of the two methods outperforms either one applied exclusively. The SMoG-CQRE combination for pitch frame classification yields an average accuracy of 98.6% in the best case (a high-resolution video with good contrast) and an average accuracy of 87.9% in the worst case (a low-resolution video with poor contrast). Since the extraction of pitch frames forms the first step in analyzing the important events in a match, we also present a post-processing step, viz., an algorithm to detect players in the extracted pitch frames.

    BaNa: a noise resilient fundamental frequency detection algorithm for speech and music

    Fundamental frequency (F0) is one of the essential features in many acoustic-related applications. Although numerous F0 detection algorithms have been developed, the detection accuracy in noisy environments still needs improvement. We present a hybrid noise-resilient F0 detection algorithm named BaNa that combines the approaches of harmonic ratios and Cepstrum analysis. A Viterbi algorithm with a cost function is used to identify the F0 value among several F0 candidates. Speech and music databases with eight different types of additive noise are used to evaluate the performance of the BaNa algorithm and several classic and state-of-the-art F0 detection algorithms. Results show that for almost all types of noise and signal-to-noise ratio (SNR) values investigated, BaNa achieves the lowest Gross Pitch Error (GPE) rate among all the algorithms. Moreover, for the 0 dB SNR scenarios, the BaNa algorithm is shown to achieve 20% to 35% GPE rate for speech and 12% to 39% GPE rate for music. We also describe implementation issues that must be addressed to run the BaNa algorithm as a real-time application on a smartphone platform.
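
The Gross Pitch Error rate used as the metric above can be illustrated with a minimal sketch. The abstract does not state the error threshold or how unvoiced frames are handled, so the 10% relative tolerance, the convention that a non-positive reference F0 marks an unvoiced frame, and the function name `gross_pitch_error_rate` are all assumptions made for illustration.

```python
def gross_pitch_error_rate(f0_true, f0_est, tolerance=0.10):
    """Fraction of voiced frames whose F0 estimate is off by more than `tolerance`.

    Assumptions (not taken from the abstract): frames with a reference F0 <= 0
    are unvoiced and skipped, and the tolerance is relative to the reference.
    """
    voiced = [(t, e) for t, e in zip(f0_true, f0_est) if t > 0]
    if not voiced:
        raise ValueError("no voiced frames to score")
    gross = sum(1 for t, e in voiced if abs(e - t) > tolerance * t)
    return gross / len(voiced)


# Hypothetical per-frame F0 values in Hz; the third frame is unvoiced.
reference = [100.0, 100.0, 0.0, 200.0, 200.0]
estimate = [101.0, 200.0, 150.0, 195.0, 100.0]
gpe = gross_pitch_error_rate(reference, estimate)
```

In this toy input, two of the four voiced frames are octave errors, which is the kind of mistake a GPE-style metric is meant to count.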

    Multiple-F0 estimation of piano sounds exploiting spectral structure and temporal evolution

    This paper proposes a system for multiple fundamental frequency estimation of piano sounds using pitch candidate selection rules which employ spectral structure and temporal evolution. As a time-frequency representation, the Resonator Time-Frequency Image of the input signal is employed, a noise suppression model is used, and a spectral whitening procedure is performed. In addition, a spectral flux-based onset detector is employed in order to select the steady-state region of the produced sound. In the multiple-F0 estimation stage, tuning and inharmonicity parameters are extracted and a pitch salience function is proposed. Pitch presence tests are performed utilizing information from the spectral structure of pitch candidates, aiming to suppress errors occurring at multiples and sub-multiples of the true pitches. A novel feature for the estimation of harmonically related pitches is proposed, based on the common amplitude modulation assumption. Experiments are performed on the MAPS database using 8784 piano samples of classical, jazz, and random chords with polyphony levels between 1 and 6. The proposed system is computationally inexpensive, being able to perform multiple-F0 estimation experiments in real time. Experimental results indicate that the proposed system outperforms state-of-the-art approaches for the aforementioned task in a statistically significant manner. Index Terms: multiple-F0 estimation, resonator time-frequency image, common amplitude modulation.
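
To make the role of an inharmonicity parameter concrete, the sketch below uses the standard stiff-string model, in which the k-th partial lies at k * f0 * sqrt(1 + B * k^2). The abstract does not say which model the system uses, so this formula, the function name `partial_frequencies`, and the example value of B are assumptions for illustration only.

```python
import math


def partial_frequencies(f0, inharmonicity, count):
    """Partial frequencies under the stiff-string model (an assumed model).

    f_k = k * f0 * sqrt(1 + B * k^2), with B the inharmonicity coefficient.
    With B = 0 the partials fall on exact integer multiples of f0.
    """
    return [
        k * f0 * math.sqrt(1.0 + inharmonicity * k * k)
        for k in range(1, count + 1)
    ]


# Hypothetical values: an ideal string versus a slightly stiff one at 220 Hz.
ideal = partial_frequencies(220.0, 0.0, 4)
stiff = partial_frequencies(220.0, 4e-4, 4)
```

Under this model the upper partials are progressively sharper than integer multiples of F0, so a salience function that searched only at exact multiples would tend to miss them.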

    Exploiting Contextual Information for Prosodic Event Detection Using Auto-Context

    Prosody and prosodic boundaries carry significant information regarding linguistics and paralinguistics and are important aspects of speech. In the field of prosodic event detection, many local acoustic features have been investigated; however, contextual information has not yet been thoroughly exploited. The most difficult aspect of this lies in learning the long-distance contextual dependencies effectively and efficiently. To address this problem, we introduce the use of an algorithm called auto-context. In this algorithm, a classifier is first trained based on a set of local acoustic features, after which the generated probabilities are used along with the local features as contextual information to train new classifiers. By iteratively using updated probabilities as the contextual information, the algorithm can accurately model contextual dependencies and improve classification ability. The advantages of this method include its flexible structure and the ability to capture contextual relationships. When using the auto-context algorithm based on support vector machine, we can improve the detection accuracy by about 3% and F-score by more than 7% on both two-way and four-way pitch accent detections in combination with the acoustic context. For boundary detection, the accuracy improvement is about 1% and the F-score improvement reaches 12%. The new algorithm outperforms conditional random fields, especially on boundary detection in terms of F-score. It also outperforms an n-gram language model on the task of pitch accent detection.
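
The iterative scheme described above can be sketched roughly as follows. This is not the authors' implementation: the abstract uses support vector machines, whereas the sketch swaps in a toy nearest-centroid classifier (`fit_centroid`) to stay dependency-free, and the context window size, the 0.5 padding value at sequence edges, and all function names are assumptions.

```python
import math


def fit_centroid(X, y):
    """Toy stand-in for the SVM in the abstract: soft nearest-centroid scores."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    c0 = centroid([x for x, t in zip(X, y) if t == 0])
    c1 = centroid([x for x, t in zip(X, y) if t == 1])

    def predict(rows):
        scores = []
        for x in rows:
            d0, d1 = math.dist(x, c0), math.dist(x, c1)
            # Closer to the class-1 centroid gives a score nearer to 1.
            scores.append(0.5 if d0 + d1 == 0 else d0 / (d0 + d1))
        return scores

    return predict


def add_context(X, probs, window):
    """Append the probabilities of each frame and its neighbours to its features."""
    n = len(X)
    return [
        list(x) + [probs[j] if 0 <= j < n else 0.5
                   for j in range(i - window, i + window + 1)]
        for i, x in enumerate(X)
    ]


def auto_context(X, y, fit, iterations=1, window=1):
    """Train on local features, then retrain with probability context added."""
    probs = fit(X, y)(X)            # first classifier: local features only
    for _ in range(iterations):     # later classifiers: local + context
        augmented = add_context(X, probs, window)
        probs = fit(augmented, y)(augmented)
    return probs


# Hypothetical one-dimensional features for six consecutive frames.
features = [[0.1], [0.2], [0.9], [0.8], [0.15], [0.85]]
labels = [0, 0, 1, 1, 0, 1]
final_probs = auto_context(features, labels, fit_centroid)
predicted = [1 if p > 0.5 else 0 for p in final_probs]
```

The point of the sketch is the data flow: each round feeds the previous round's probabilities for neighbouring frames back in as extra features, so widening `window` is how longer-range context would enter.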

    Pitch-based speech perturbation measures using a novel GCI detection algorithm: Application to pathological voice classification

    Classical pitch-based perturbation measures, such as Jitter and Shimmer, are generally based on detection algorithms of pitch marks which assume the existence of a periodic pitch pattern and/or rely on the linear source-filter speech model. While these assumptions can hold for normal speech, they are generally not valid for pathological speech. The latter can indeed present strong aperiodicity, nonlinearity and turbulence/noise. Recently, we introduced a novel nonlinear algorithm for Glottal Closure Instants (GCI) detection which has the strong advantage of not making such assumptions. In this paper, we use this new algorithm to compute standard pitch-based perturbation measures and compare its performance to the widely used tool PRAAT. We address the task of classification between normal and pathological speech, and carry out the experiments using the popular MEEI database. The results show that our algorithm leads to significantly higher classification accuracy than PRAAT. Moreover, some important statistical features become significantly discriminative, while they are meaningless when using PRAAT (in the sense that they have almost no discrimination power).
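
As a rough illustration of how perturbation measures follow from detected pitch marks, the sketch below computes the common "local" variants of jitter and shimmer: the mean absolute difference between consecutive values divided by their mean. The abstract does not specify which variants the paper uses, so that choice, the function names, and the sample values are assumptions.

```python
def local_perturbation(values):
    """Mean absolute difference of consecutive values, relative to their mean."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def jitter_from_gci(gci_times):
    """Local jitter from glottal closure instants (times in seconds)."""
    periods = [b - a for a, b in zip(gci_times, gci_times[1:])]
    return local_perturbation(periods)


def shimmer_from_peaks(peak_amplitudes):
    """Local shimmer from one peak amplitude per glottal cycle."""
    return local_perturbation(peak_amplitudes)


# Hypothetical GCI times (s) and per-cycle peak amplitudes.
gci_times = [0.000, 0.010, 0.021, 0.031, 0.042]
amplitudes = [1.0, 0.9, 1.0, 0.9]
jitter = jitter_from_gci(gci_times)
shimmer = shimmer_from_peaks(amplitudes)
```

Both measures depend entirely on where the cycle boundaries are placed, which is why the quality of the GCI detector matters for the downstream classification.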

    A scalable system for microcalcification cluster automated detection in a distributed mammographic database

    A computer-aided detection (CADe) system for microcalcification cluster identification in mammograms has been developed in the framework of the EU-funded MammoGrid project. The CADe software is mainly based on wavelet transforms and artificial neural networks. It is able to identify microcalcifications in different datasets of mammograms (i.e. acquired with different machines and settings, digitized with different pitch and bit depth or direct digital ones). The CADe can be remotely run from GRID-connected acquisition and annotation stations, supporting clinicians from geographically distant locations in the interpretation of mammographic data. We report and discuss the system performance on different datasets of mammograms and the status of the GRID-enabled CADe analysis. Comment: 6 pages, 4 figures; Proceedings of the IEEE NNS and MIC Conference, October 23-29, 2005, Puerto Rico.

    Joint Multi-Pitch Detection Using Harmonic Envelope Estimation for Polyphonic Music Transcription

    In this paper, a method for automatic transcription of music signals based on joint multiple-F0 estimation is proposed. As a time-frequency representation, the constant-Q resonator time-frequency image is employed, while a novel noise suppression technique based on a pink noise assumption is applied in a preprocessing step. In the multiple-F0 estimation stage, the optimal tuning and inharmonicity parameters are computed and a salience function is proposed in order to select pitch candidates. For each pitch candidate combination, an overlapping partial treatment procedure is used, which is based on a novel spectral envelope estimation procedure for the log-frequency domain, in order to compute the harmonic envelope of candidate pitches. In order to select the optimal pitch combination for each time frame, a score function is proposed which combines spectral and temporal characteristics of the candidate pitches and also aims to suppress harmonic errors. For postprocessing, hidden Markov models (HMMs) and conditional random fields (CRFs) trained on MIDI data are employed, in order to boost transcription accuracy. The system was trained on isolated piano sounds from the MAPS database and was tested on classic and jazz recordings from the RWC database, as well as on recordings from a Disklavier piano. A comparison with several state-of-the-art systems is provided using a variety of error metrics, where encouraging results are indicated.