355 research outputs found

    Learning sound representations using trainable COPE feature extractors

    Sound analysis research has mainly focused on speech and music processing. The deployed methodologies are not suitable for the analysis of sounds with varying background noise, in many cases with a very low signal-to-noise ratio (SNR). In this paper, we present a method for the detection of patterns of interest in audio signals. We propose novel trainable feature extractors, which we call COPE (Combination of Peaks of Energy). The structure of a COPE feature extractor is determined using a single prototype sound pattern in an automatic configuration process, which is a type of representation learning. We construct a set of COPE feature extractors, configured on a number of training patterns. Then we take their responses to build feature vectors that we use in combination with a classifier to detect and classify patterns of interest in audio signals. We carried out experiments on four public data sets: MIVIA audio events, MIVIA road events, ESC-10 and TU Dortmund. The results that we achieved (recognition rate equal to 91.71% on the MIVIA audio events, 94% on the MIVIA road events, 81.25% on the ESC-10 and 94.27% on the TU Dortmund data sets) demonstrate the effectiveness of the proposed method and are higher than the ones obtained by other existing approaches. The COPE feature extractors have high robustness to variations of SNR. Real-time performance is achieved even when the values of a large number of features are computed. Comment: Accepted for publication in Pattern Recognition.

    A Graph-Kernel Method for Re-identification

    Re-identification, that is, recognizing that an object appearing in a scene is a reoccurrence of an object seen previously by the system (by the same camera or possibly by a different one), is a challenging problem in video surveillance. In this paper, the problem is addressed using a structural, graph-based representation of the objects of interest. A recently proposed graph kernel is adopted to extend the Principal Component Analysis (PCA) technique to this representation. An experimental evaluation of the method has been performed on two video sequences from the publicly available PETS2009 database.

    Supervised vessel delineation in retinal fundus images with the automatic selection of B-COSFIRE filters

    The inspection of retinal fundus images allows medical doctors to diagnose various pathologies. Computer-aided diagnosis systems can be used to assist in this process. As a first step, such systems delineate the vessel tree from the background. We propose a method for the delineation of blood vessels in retinal images that is effective for vessels of different thickness. In the proposed method, we employ a set of B-COSFIRE filters selective for vessels and vessel endings. Such a set is determined in an automatic selection process and can adapt to different applications. We compare the performance of different selection methods based upon machine learning and information theory. The results that we achieve by performing experiments on two public benchmark data sets, namely DRIVE and STARE, demonstrate the effectiveness of the proposed approach.

    Vessels delineation in retinal images using COSFIRE filters

    Retinal image analysis is widely used in the medical community to diagnose several pathologies. The automatic analysis of such images is important for more efficient diagnosis. We propose an effective method for the delineation of blood vessels in retinal images using trainable bar-selective COSFIRE filters. The results that we achieve on three publicly available data sets (DRIVE: Se = 0.7655, Sp = 0.9704; STARE: Se = 0.7763, Sp = 0.9695; CHASE DB1: Se = 0.7699, Sp = 0.9476) demonstrate the effectiveness of the proposed approach. Peer-reviewed.
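
    The Se and Sp figures above are presumably sensitivity and specificity. As a minimal sketch, assuming the standard definitions (Se = TP / (TP + FN), Sp = TN / (TN + FP)) applied to per-pixel vessel labels, they could be computed as follows. The function name and the toy labels are hypothetical and are not taken from the paper.

```python
def sensitivity_specificity(predicted, truth):
    """Return (Se, Sp) for two equal-length sequences of binary labels.

    A truthy value marks a vessel pixel, a falsy value marks background.
    This follows the standard definitions; the paper's exact evaluation
    protocol (for example, masking of the field of view) is not modelled.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p and t)          # vessel found
    fn = sum(1 for p, t in pairs if not p and t)      # vessel missed
    tn = sum(1 for p, t in pairs if not p and not t)  # background kept
    fp = sum(1 for p, t in pairs if p and not t)      # false vessel
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must occur in the ground truth")
    return tp / (tp + fn), tn / (tn + fp)


# Toy example (hypothetical labels, not data from the paper).
predicted = [1, 1, 0, 0, 1, 0, 0, 0]
truth = [1, 0, 0, 0, 1, 1, 0, 0]
se, sp = sensitivity_specificity(predicted, truth)
print(f"Se = {se:.4f}, Sp = {sp:.4f}")
```

    Reporting both values matters here because vessel pixels are a small minority of a fundus image, so a method can score well on one measure while doing poorly on the other.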

    Learning skeleton representations for human action recognition

    Automatic interpretation of human actions has gained strong interest among researchers in pattern recognition and computer vision because of its wide range of applications, such as social and home robotics, elderly people health care, and surveillance, among others. In this paper, we propose a method for recognition of human actions by analysis of skeleton poses. The method that we propose is based on novel trainable feature extractors, which can learn the representation of prototype skeleton examples and can be employed to recognize skeleton poses of interest. We combine the proposed feature extractors with an approach for classification of pose sequences based on string kernels. We carried out experiments on three benchmark data sets (MIVIA-S, MSRSDA and MHAD) and the results that we achieved are comparable to or higher than the ones obtained by other existing methods. A further important contribution of this work is the MIVIA-S dataset, which we collected and made publicly available.

    A Method for Counting People in Crowded Scenes

    This paper presents a novel method to count people for video surveillance applications. Methods in the literature either follow a direct approach, by first detecting people and then counting them, or an indirect approach, by establishing a relation between some easily detectable scene features and the estimated number of people. The indirect approach is considerably more robust, but it does not easily take into account such factors as perspective or groups of people with different densities. The proposed technique, while based on the indirect approach, specifically addresses these problems; furthermore, it is based on a trainable estimator that does not require an explicit formulation of a priori knowledge about the perspective and density effects present in the scene at hand. In the experimental evaluation, the method has been extensively compared with the algorithm by Albiol et al., which provided the highest performance at the PETS 2009 contest on people counting. The experiments used the public PETS 2009 datasets. The results confirm that the proposed method improves the accuracy, while retaining the robustness of the indirect approach.

    An ensemble of rejecting classifiers for anomaly detection of audio events

    Audio analytic systems are receiving increasing interest in the scientific community, not only as standalone systems for the automatic detection of abnormal events by the interpretation of the audio track, but also in conjunction with video analytics tools for reinforcing the evidence of anomaly detection. In this paper, we present an automatic recognizer of a set of abnormal audio events that works by extracting suitable features from the signals obtained by microphones installed in a surveilled area, and by classifying them using two classifiers that operate at different time resolutions. An original aspect of the proposed system is the estimation of the reliability of each response of the individual classifiers. In this way, each classifier is able to reject the samples having an overall reliability below a threshold. This approach allows our system to combine only reliable decisions, thereby increasing the overall performance of the method. The system has been tested on a large dataset of samples acquired from real-world scenarios; the audio classes of interest are gunshot, scream and glass breaking, in addition to the background sounds. The preliminary results obtained encourage further research in this direction.
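
    The reject-then-combine idea described above can be illustrated with a minimal sketch. The abstract does not state how the reliability is estimated or how the surviving decisions are fused, so the reliability-weighted vote, the function name and the threshold value below are all assumptions made for illustration only.

```python
from collections import defaultdict


def combine_with_reject(decisions, threshold):
    """Fuse classifier decisions, ignoring the unreliable ones.

    decisions: list of (label, reliability) pairs, one per classifier,
               with reliability in [0, 1].
    threshold: decisions with reliability below this value are rejected.

    Returns the winning label, or None if every classifier rejected.
    The reliability-weighted vote is an assumed fusion rule, not the
    rule used in the paper.
    """
    votes = defaultdict(float)
    for label, reliability in decisions:
        if reliability >= threshold:  # keep only reliable decisions
            votes[label] += reliability
    if not votes:
        return None  # all classifiers rejected the sample
    return max(votes, key=votes.get)


# Two classifiers working at different time resolutions (toy values).
print(combine_with_reject([("gunshot", 0.9), ("scream", 0.4)], 0.5))
print(combine_with_reject([("scream", 0.3), ("gunshot", 0.2)], 0.5))
```

    In the first call only the reliable "gunshot" decision survives; in the second both decisions fall below the threshold, so the sample is rejected rather than forced into a class.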