
    Audio-visual foreground extraction for event characterization

    This paper presents a new method for integrating audio and visual information for scene analysis in a typical surveillance scenario, using only one camera and one monaural microphone. Visual information is analyzed by a standard visual background/foreground (BG/FG) modelling module, enhanced with a novelty detection stage and coupled with an audio BG/FG modelling scheme. The audio-visual association is performed on-line by exploiting the concept of synchrony. Experimental tests carrying out classification and clustering of events demonstrate the potential of the proposed approach, also in comparison with the results obtained using the single modalities.
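    As a hedged illustration of the synchrony idea described above, the sketch below computes a peak normalized cross-correlation between a per-frame visual foreground-activity signal and an audio foreground-energy signal. The function name, lag window, and synthetic signals are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch of audio-visual association via synchrony, assuming
# per-frame foreground measures have already been produced by separate
# visual and audio BG/FG modules; names and parameters are illustrative.
import numpy as np

def synchrony_score(visual_fg, audio_fg, max_lag=5):
    """Peak normalized cross-correlation between two per-frame
    foreground-activity signals, searched over small temporal lags."""
    v = (visual_fg - visual_fg.mean()) / (visual_fg.std() + 1e-8)
    a = (audio_fg - audio_fg.mean()) / (audio_fg.std() + 1e-8)
    n = len(v)
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            c = np.dot(v[lag:], a[:n - lag]) / (n - lag)
        else:
            c = np.dot(v[:n + lag], a[-lag:]) / (n + lag)
        best = max(best, c)
    return best

# Example: a synthetic audio burst roughly aligned with a visual motion burst.
t = np.arange(200)
visual = np.exp(-((t - 100) ** 2) / 50.0)
audio = np.exp(-((t - 102) ** 2) / 60.0)
print(synchrony_score(visual, audio))  # near 1 -> likely the same event
```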

    Review of Person Re-identification Techniques

    Person re-identification across different surveillance cameras with disjoint fields of view has become one of the most interesting and challenging subjects in the area of intelligent video surveillance. Although several methods have been developed and proposed, certain limitations and unresolved issues remain. In all existing re-identification approaches, feature vectors are extracted from segmented still images or video frames, and different similarity or dissimilarity measures are applied to these vectors. Some methods use simple constant metrics, whereas others utilise models to obtain optimised metrics. Some create models based on local colour or texture information, and others build models based on the gait of people. In general, the main objective of all these approaches is to achieve a higher accuracy rate and lower computational costs. This study summarises several developments in the recent literature and discusses the various available methods used in person re-identification. Specifically, their advantages and disadvantages are mentioned and compared.
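    Since the survey contrasts simple constant metrics with learned, optimised metrics, a minimal sketch of the two matching styles may help. The matrix M stands in for whatever a metric-learning method (e.g. KISSME or XQDA) would produce; here it is purely a placeholder assumption.

```python
# Minimal sketch of the two families of matching described above: a fixed
# metric (Euclidean) versus a learned Mahalanobis-style metric over
# extracted feature vectors.
import numpy as np

def euclidean(x, y):
    return np.linalg.norm(x - y)

def mahalanobis(x, y, M):
    d = x - y
    return float(np.sqrt(d @ M @ d))

rng = np.random.default_rng(0)
probe, gallery = rng.normal(size=16), rng.normal(size=16)
M = np.eye(16)  # identity recovers Euclidean; a learned M reweights features
print(euclidean(probe, gallery), mahalanobis(probe, gallery, M))
```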

    Real-time Multiple People Tracking with Deeply Learned Candidate Selection and Person Re-Identification

    Online multi-object tracking is a fundamental problem in time-critical video analysis applications. A major challenge in the popular tracking-by-detection framework is how to associate unreliable detection results with existing tracks. In this paper, we propose to handle unreliable detections by collecting candidates from the outputs of both detection and tracking. The intuition behind generating redundant candidates is that detections and tracks can complement each other in different scenarios: high-confidence detection results prevent tracking drift in the long term, while the predictions of tracks can handle noisy detections caused by occlusion. To select the optimal candidates from this considerable pool in real time, we present a novel scoring function based on a fully convolutional neural network that shares most computations on the entire image. Moreover, we adopt a deeply learned appearance representation, trained on large-scale person re-identification datasets, to improve the identification ability of our tracker. Extensive experiments show that our tracker achieves real-time, state-of-the-art performance on a widely used people tracking benchmark.
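    The candidate-pooling step lends itself to a short sketch: merge detection boxes with track predictions, then greedily keep the highest-scoring, non-overlapping candidates. The greedy IoU suppression below is a simplified stand-in for the paper's learned scoring function and selection, with all names and thresholds assumed.

```python
# Hedged sketch of pooling candidates from detections and track predictions.
# Boxes are (x1, y1, x2, y2, score) tuples; scores here replace the paper's
# fully convolutional scorer.
def iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-8)

def select_candidates(detections, track_predictions, iou_thresh=0.5):
    pool = sorted(detections + track_predictions, key=lambda c: -c[4])
    kept = []
    for cand in pool:
        # keep a candidate only if it does not overlap a better one
        if all(iou(cand, k) < iou_thresh for k in kept):
            kept.append(cand)
    return kept

dets = [(10, 10, 50, 90, 0.9)]
preds = [(12, 11, 52, 92, 0.6), (100, 30, 140, 110, 0.7)]
print(select_candidates(dets, preds))  # overlapping prediction is suppressed
```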

    Improving Bag-of-Words model with spatial information

    Bag-of-Words (BOW) models have recently become popular for the task of object recognition, owing to their good performance and simplicity. Much work has been proposed over the years to improve the BOW model, of which the Spatial Pyramid Matching technique is the most notable. In this work, we propose three novel techniques to capture more refined spatial information between image features than that provided by Spatial Pyramids. Our techniques demonstrate a performance gain over the Spatial Pyramid representation of the BOW model.
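    A compact sketch of the Spatial Pyramid representation this work builds on: per-cell visual-word histograms at several grid levels, concatenated into one descriptor. The inputs (keypoint positions with quantized word ids) and the brute-force binning are illustrative assumptions, not the paper's three proposed techniques.

```python
# Illustrative Spatial Pyramid descriptor: histogram visual words per grid
# cell at levels 0..L and concatenate, preserving coarse spatial layout.
import numpy as np

def spatial_pyramid(points, words, width, height, vocab_size, levels=2):
    chunks = []
    for level in range(levels + 1):
        cells = 2 ** level  # grid is cells x cells at this level
        for gy in range(cells):
            for gx in range(cells):
                hist = np.zeros(vocab_size)
                for (x, y), w in zip(points, words):
                    if (int(x * cells / width) == gx and
                            int(y * cells / height) == gy):
                        hist[w] += 1
                chunks.append(hist)
    return np.concatenate(chunks)

pts = [(10, 20), (200, 150), (300, 40)]
ws = [3, 1, 3]  # quantized visual-word ids for each keypoint
desc = spatial_pyramid(pts, ws, width=320, height=240, vocab_size=8)
print(desc.shape)  # 8 words * (1 + 4 + 16) cells = (168,)
```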

    Automatic recognition of Persian musical modes in audio musical signals

    This research proposes new approaches for the computational identification of Persian musical modes. It involves constructing a database of audio musical files and developing computer algorithms to perform a musical analysis of the samples. Essential features, namely the spectral average, chroma and pitch histograms, and the use of symbolic data, are discussed and compared. A tonic detection algorithm is developed to align the feature vectors and to make the mode recognition methods independent of changes in tonality. Subsequently, a geometric distance measure (the Manhattan distance, which is preferred), cross-correlation, or a machine learning method (Gaussian Mixture Models) is used to gauge the similarity between a signal and a set of templates constructed in the training phase, in which data-driven patterns are made for each dastgàh (Persian mode). The effects of the following parameters are considered and assessed: the amount of training data; the parts of the frequency range used for training; downsampling; tone resolution (12-TET, 24-TET, 48-TET and 53-TET); the use of overlapping or non-overlapping frames; and silence and high-energy suppression in pre-processing. The santur (hammered string instrument), which is used extensively in the musical database samples, is described and its physical properties are characterised; its characteristic pitch and harmonic deviations are measured; and the inharmonicity factor of the instrument is calculated for the first time. The results are applicable to Persian music and to other closely related musical traditions of the Mediterranean and the Near East. This approach enables content-based analyses and searches of musical archives. Potential applications of this research include: music information retrieval, audio snippets (thumbnailing), music archiving and access to archival content, audio compression and coding, association of images with audio content, music transcription, music synthesis, music editors, music instruction, automatic music accompaniment, and the setting of new standards and symbols for musical notation.
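    The template-matching stage described above can be sketched in a few lines: circularly shift a chroma histogram so the detected tonic sits at bin 0, then pick the mode whose template is closest in Manhattan distance. The 12-bin resolution and the toy templates are assumptions for illustration; the thesis considers finer resolutions such as 24-TET and 53-TET.

```python
# Minimal sketch of tonic-aligned template matching for mode recognition,
# assuming a chroma histogram and a detected tonic index are given.
import numpy as np

def align_to_tonic(chroma, tonic_index):
    # rotate so the tonic pitch class occupies bin 0
    return np.roll(chroma, -tonic_index)

def classify_mode(chroma, tonic_index, templates):
    aligned = align_to_tonic(chroma, tonic_index)
    aligned = aligned / (aligned.sum() + 1e-8)  # normalize to a distribution
    # pick the mode template with the smallest Manhattan distance
    return min(templates, key=lambda name: np.abs(aligned - templates[name]).sum())

templates = {  # toy normalized dastgah templates, purely illustrative
    "shur": np.array([.2, 0, .1, .15, 0, .2, 0, .15, .1, 0, .1, 0]),
    "mahur": np.array([.2, 0, .15, 0, .15, .1, 0, .2, 0, .1, 0, .1]),
}
chroma = np.array([4, 0, 2, 3, 0, 4, 0, 3, 2, 0, 2, 0], dtype=float)
print(classify_mode(chroma, tonic_index=0, templates=templates))  # "shur"
```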