2,520 research outputs found

    K-Space at TRECVid 2007

    In this paper we describe K-Space participation in TRECVid 2007. K-Space participated in two tasks, high-level feature extraction and interactive search. We present our approaches for each of these activities and provide a brief analysis of our results. Our high-level feature submission utilized multi-modal low-level features which included visual, audio and temporal elements. Specific concept detectors (such as Face detectors) developed by K-Space partners were also used. We experimented with different machine learning approaches including logistic regression and support vector machines (SVM). Finally we also experimented with both early and late fusion for feature combination. This year we also participated in interactive search, submitting 6 runs. We developed two interfaces which both utilized the same retrieval functionality. Our objective was to measure the effect of context, which was supported to different degrees in each interface, on user performance. The first of the two systems was a ‘shot’ based interface, where the results from a query were presented as a ranked list of shots. The second interface was ‘broadcast’ based, where results were presented as a ranked list of broadcasts. Both systems made use of the outputs of our high-level feature submission as well as low-level visual features.
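
    The abstract above contrasts early and late fusion without giving details. As a rough illustration only, the sketch below shows the general distinction: early fusion joins per-modality feature vectors before a single classifier sees them, while late fusion combines the scores of separate per-modality classifiers. The function names, the toy values, and the weighted-mean combination rule are assumptions made for this example and are not taken from the K-Space system.

```python
from typing import List, Optional, Sequence


def early_fusion(features_by_modality: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate per-modality feature vectors into one vector.

    A single classifier would then be trained on the fused vector.
    (Hypothetical helper, not part of any K-Space code.)
    """
    fused: List[float] = []
    for features in features_by_modality:
        fused.extend(features)
    return fused


def late_fusion(scores_by_modality: Sequence[float],
                weights: Optional[Sequence[float]] = None) -> float:
    """Combine the scores of separate per-modality classifiers.

    Uses a weighted mean, which is only one of many possible rules
    (an assumption for this sketch). Equal weights by default.
    """
    if weights is None:
        weights = [1.0] * len(scores_by_modality)
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, scores_by_modality)) / total


# Toy example for one shot: a 2-d visual vector and a 1-d audio vector.
visual_features = [0.1, 0.2]
audio_features = [0.3]
fused_vector = early_fusion([visual_features, audio_features])

# Toy detector scores for the same shot from two per-modality classifiers.
combined_score = late_fusion([0.9, 0.7])
```

    Here `fused_vector` holds all three feature values in one list, and `combined_score` is the equal-weight mean of the two detector scores. Passing `weights` lets one modality count for more than another.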

    Adaptive Decision Fusion for Audio-Visual Speech Recognition

    Fuzzy Layered Convolution Neutral Network for Feature Level Fusion Based On Multimodal Sentiment Classification

    Multimodal sentiment analysis (MSA) is one of the core research topics of natural language processing (NLP). MSA is challenging for researchers and equally difficult for a machine to perform, because it involves learning opinions, emotions, and attitudes from audio-visual data. In other words, diverse modalities must be used together to obtain opinions and identify emotions. This can be achieved via modality data fusion, such as feature fusion. For fusing such diverse modalities while obtaining high performance, a typical machine learning approach is Deep Learning (DL), particularly the Convolutional Neural Network (CNN), which has the capacity to handle tasks of great intricacy and difficulty. In this paper, we present a CNN architecture with an integrated layer based on fuzzy methodologies for MSA, an approach not yet explored for improving the accuracy of CNNs on diverse inputs. Experiments conducted on a benchmark multimodal dataset, MOSI, obtained 37.5% and 81% on seven-class and binary classification respectively, an improvement in accuracy over the typical CNN, which obtained 28.9% and 78%, respectively.

    A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community

    In recent years, deep learning (DL), a re-branding of neural networks (NNs), has risen to the top in numerous areas, namely computer vision (CV), speech recognition, natural language processing, etc. Whereas remote sensing (RS) possesses a number of unique challenges, primarily related to sensors and applications, inevitably RS draws from many of the same theories as CV; e.g., statistics, fusion, and machine learning, to name a few. This means that the RS community should be aware of, if not at the leading edge of, advancements like DL. Herein, we provide the most comprehensive survey of state-of-the-art RS DL research. We also review recent new developments in the DL field that can be used in DL for RS. Namely, we focus on theories, tools and challenges for the RS community. Specifically, we focus on unsolved challenges and opportunities as they relate to (i) inadequate data sets, (ii) human-understandable solutions for modelling physical phenomena, (iii) Big Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and learning algorithms for spectral, spatial and temporal data, (vi) transfer learning, (vii) an improved theoretical understanding of DL systems, (viii) high barriers to entry, and (ix) training and optimizing the DL. Comment: 64 pages, 411 references. To appear in Journal of Applied Remote Sensing.