CORE search results

4 research outputs found

Research and Practice on Fusion of Visual and Audio Perception

Author: 李剑
Publication date: 22/05/2015

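With the rapid development of intelligent surveillance systems, surveillance data plays an increasingly important role in traffic, the environment, security, and other fields. Inspired by models of human perception, using the complementary effect of audio and video data to perceive a scene has considerable research value. However, the resulting mass of surveillance data is increasingly difficult to search, which forces people to look for more effective analysis methods that free humans from repetitive labor. Audio-visual fusion perception therefore has both significant theoretical research value and broad application prospects. This thesis surveys the current state of the audio-visual fusion perception field and, building on a conventional video surveillance platform, designs an architecture for audio-visual fusion perception. Grounded in audio-visual content analysis, it studies a violent-scene analysis model based on audio-visual fusion perception. The main contributions of this thesis are as follows: 1. Taking a surveillance platform with audio-visual fusion perception as the starting point, designed...

Degree: Master of Engineering. School and major: School of Information Science and Technology, Computer Science and Technology. Student ID: 2302012115292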

Xiamen University Institutional Repository

Recognition of Emotions using Energy Based Bimodal Information Fusion and Correlation

Author: Asawa Krishna
Manchanda Priyanka
Publication venue: Universidad Internacional de La Rioja
Publication date: 05/02/2020

Multi-sensor information fusion is a rapidly developing research area that forms the backbone of numerous essential technologies, such as intelligent robotic control, sensor networks, and video and image processing. In this paper, we develop a novel technique to analyze and correlate human emotions expressed in voice tone and facial expression. Audio and video streams are captured to populate bimodal audio and video data sets, which are used to sense the emotions expressed in voice tone and facial expression, respectively. An energy-based mapping is performed to overcome the inherent heterogeneity of the recorded bimodal signal. The fusion process uses the sampled and mapped energy signals of both modalities' data streams and then recognizes the overall emotional component using a Support Vector Machine (SVM) classifier, with an accuracy of 93.06%.

Re-UNIR
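The abstract does not say how the energy-based mapping is computed. Here is a minimal sketch, assuming audio energy is short-time energy (sum of squared samples per frame) and video energy is the mean squared difference between consecutive grayscale frames. Under those assumptions, the two energy sequences are resampled to a common length, min-max normalized, and concatenated into one fused feature vector. All function names are hypothetical. The SVM stage reported in the paper is not shown; the fused vectors would be the input to a classifier such as scikit-learn's `SVC`.

```python
from typing import List, Sequence


def short_time_energy(samples: Sequence[float], frame_len: int) -> List[float]:
    """Audio energy per non-overlapping frame: sum of squared amplitudes.

    Assumption: the paper's exact audio energy definition is not given.
    """
    return [
        sum(x * x for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def motion_energy(frames: Sequence[Sequence[float]]) -> List[float]:
    """Video energy: mean squared pixel difference between consecutive frames.

    Assumption: each frame is a flattened grayscale image.
    """
    return [
        sum((a - b) ** 2 for a, b in zip(prev, cur)) / len(cur)
        for prev, cur in zip(frames, frames[1:])
    ]


def resample(seq: Sequence[float], n: int) -> List[float]:
    """Linearly interpolate seq onto n evenly spaced points to align the two streams."""
    if n == 1 or len(seq) == 1:
        return [float(seq[0])] * n
    out = []
    for k in range(n):
        pos = k * (len(seq) - 1) / (n - 1)
        lo = int(pos)  # pos >= 0, so int() is floor
        hi = min(lo + 1, len(seq) - 1)
        frac = pos - lo
        out.append(seq[lo] * (1 - frac) + seq[hi] * frac)
    return out


def min_max(seq: Sequence[float]) -> List[float]:
    """Scale values to [0, 1] so both modalities share one range."""
    lo, hi = min(seq), max(seq)
    if hi == lo:
        return [0.0] * len(seq)
    return [(x - lo) / (hi - lo) for x in seq]


def fuse(audio: Sequence[float], video_frames: Sequence[Sequence[float]],
         frame_len: int, n_points: int) -> List[float]:
    """Feature-level fusion: normalized audio energy followed by normalized video energy."""
    a = min_max(resample(short_time_energy(audio, frame_len), n_points))
    v = min_max(resample(motion_energy(video_frames), n_points))
    return a + v


if __name__ == "__main__":
    audio = [0, 0, 1, 1, 2, 2, 0, 0]
    frames = [[0, 0], [1, 1], [1, 1], [3, 3]]
    print(fuse(audio, frames, frame_len=2, n_points=4))
```

The fused vector always has `2 * n_points` entries, one fixed-length block per modality. Resampling to a shared length is one simple way to make two streams with different frame rates comparable before classification.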

Unsupervised methods in multilingual and multimodal semantic modeling

Author: Hazara Murtaza
Publication date: 29/09/2014

In the first part of this project, independent component analysis (ICA) was applied to extract word clusters from two Farsi corpora. Both word-document and word-context matrices were used to extract such clusters. Applying ICA to the word-document matrices from these two corpora led to the detection of syntagmatic word clusters, while the word-context matrices yielded both syntagmatic and paradigmatic word clusters. We also discuss some potential benefits of this automatically extracted thesaurus. In such a thesaurus, a word is defined by other words without being connected to external physical objects. To fill this gap, philosophers have proposed symbol grounding as a mechanism that might connect words to their physical referents. From their point of view, if words are properly connected to their referents, their meaning might be realized. Once this objective is achieved, a promising new horizon would open in artificial intelligence. In the second part of the project, we offer a simple but novel method for grounding words based on features from the visual modality. First, indexical grounding is implemented: in this naïve symbol grounding method, a word is characterized using video indexes as its context. Second, these indexical word vectors are normalized according to the features calculated for motion videos; this multimodal fusion is referred to as pattern grounding. In addition, the indexical word vectors were normalized using randomly generated data instead of the original motion features; this third case is called randomized grounding. The three cases of symbol grounding were compared in terms of translation performance.
In addition, word clusters were extracted by comparing vector distances and from dendrograms generated with an agglomerative hierarchical clustering method. We observed that pattern grounding outperformed indexical grounding in translating the motion-annotated words, while randomized grounding degraded translation significantly. Moreover, pattern grounding produced clusters in which each word fit semantically with the other members, whereas with indexical grounding some closely related words were dispersed into arbitrary clusters.

Aaltodoc Publication Archive
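The abstract describes a pipeline in three steps: building word-document and word-context matrices, applying ICA, and extracting clusters from agglomerative hierarchical clustering dendrograms. Here is a minimal sketch under stated assumptions. It builds both count matrices from a tokenized corpus, then clusters the word-context rows directly with average-linkage agglomerative clustering on cosine distance. The ICA step is not shown; scikit-learn's `FastICA` is one available implementation. The linkage and distance choices are assumptions, and the function names and toy corpus are hypothetical. For real data, SciPy's `scipy.cluster.hierarchy.linkage` and `dendrogram` provide the same kind of clustering with a plotted dendrogram.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def word_document_matrix(docs: List[List[str]]) -> Dict[str, List[int]]:
    """Rows are words, columns are documents, entries are raw counts."""
    vocab = sorted({w for doc in docs for w in doc})
    counts = [Counter(doc) for doc in docs]
    return {w: [c[w] for c in counts] for w in vocab}


def word_context_matrix(docs: List[List[str]], window: int = 1) -> Dict[str, List[int]]:
    """Rows are words, columns are context words within +/- window tokens."""
    vocab = sorted({w for doc in docs for w in doc})
    col = {w: i for i, w in enumerate(vocab)}
    rows = {w: [0] * len(vocab) for w in vocab}
    for doc in docs:
        for i, w in enumerate(doc):
            for j in range(max(0, i - window), min(len(doc), i + window + 1)):
                if j != i:
                    rows[w][col[doc[j]]] += 1
    return rows


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0  # treat an all-zero row as unrelated to everything
    return 1.0 - dot / (nu * nv)


def agglomerate(vectors: Dict[str, Sequence[float]], n_clusters: int
                ) -> Tuple[List[List[str]], List[Tuple[List[str], List[str], float]]]:
    """Average-linkage agglomerative clustering down to n_clusters.

    Returns the final clusters and the merge history (merged pairs and their
    distance), which is the information a dendrogram plots.
    """
    clusters: List[List[str]] = [[w] for w in sorted(vectors)]
    merges: List[Tuple[List[str], List[str], float]] = []
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean pairwise distance between the two clusters' members.
                total = sum(cosine_distance(vectors[a], vectors[b])
                            for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges


if __name__ == "__main__":
    docs = [["the", "cat", "sat"], ["the", "dog", "sat"]]
    ctx = word_context_matrix(docs)
    print(agglomerate(ctx, 2)[0])
```

In this toy corpus, `cat` and `dog` receive identical word-context rows because they fill the same slot, so they are merged first. In their word-document rows they differ. This mirrors, in miniature, why context windows can surface paradigmatic relations that document co-occurrence misses.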