14 research outputs found

    Learning Multimodal Temporal Representation for Dubbing Detection in Broadcast Media

    Person discovery in the absence of prior identity knowledge requires accurate association of visual and auditory cues. In broadcast data, multimodal analysis faces additional challenges due to narrated voices over muted scenes or dubbing in different languages. To address these challenges, we define and analyze the problem of dubbing detection in broadcast data, which has not been explored before. We propose a method to represent the temporal relationship between the auditory and visual streams. The method combines canonical correlation analysis, to learn a joint multimodal space, with long short-term memory (LSTM) networks, to model cross-modality temporal dependencies. Our contributions also include a newly acquired dataset of face-speech segments from TV data, which we have made publicly available. The proposed method achieves promising performance on this real-world dataset compared to several baselines.

    Random noise suppression using normalized convolution filter


    Exploiting stereoscopic disparity for augmenting human activity recognition performance

    This work investigates several ways to exploit scene depth information, implicitly available through stereoscopic disparity in 3D videos, in order to improve the recognition of complex human activities in natural settings. The standard state-of-the-art activity recognition pipeline consists of three consecutive stages: video description, video representation and video classification. Multimodal, depth-aware modifications to standard methods are proposed and studied, both for video description and for video representation, that indirectly incorporate scene geometry information derived from stereo disparity. At the description level, this is made possible by suitably manipulating video interest points based on disparity data. At the representation level, each video is represented by multiple vectors corresponding to different disparity zones, resulting in multiple activity descriptions defined by disparity characteristics. In both cases, a scene segmentation is thus implicitly implemented, based on the distance of each imaged object from the camera during video acquisition. The investigated approaches are flexible and can work with any monocular low-level feature descriptor. They are evaluated on a publicly available activity recognition dataset of unconstrained stereoscopic 3D videos, consisting of extracts from Hollywood movies, and compared against both competing depth-aware approaches and a state-of-the-art monocular algorithm. Quantitative evaluation reveals that some of the examined approaches achieve state-of-the-art performance.

    An efficient algorithm for smoke and flame detection using color and wavelet analysis

    Fire detection is an important task in many applications. Smoke and flame are two essential indicators of fire in images. In this paper, we propose an algorithm that detects smoke and flame simultaneously in color video sequences obtained from a stationary camera in open space. Motion is a common feature of smoke and flame and is usually used first to extract candidate areas from the current frame. Adaptive background subtraction is used at the motion detection stage. In addition, optical flow-based motion estimation is applied to identify chaotic motion. Moving blobs are then classified using spatial and temporal wavelet analysis, Weber contrast analysis and color segmentation. Real video surveillance sequences from publicly available datasets were used for smoke detection with our algorithm. Experimental results show that our algorithm achieves detection rates of 87% for smoke and 92% for flame.