Search CORE

13 research outputs found

TRECVID 2019: An evaluation campaign to benchmark video activity detection, video captioning and matching, and video search & retrieval

Author: Awad George M.
Butt Asad A.
Delgado Andrew
Fiscus Jon
Godil Afzal
Graham Yvette
Lee Yooyoung
Smeaton Alan F.
Publication date: 12/11/2019

DCU Online Research Access Service

TRECVID 2018: Benchmarking Video Activity Detection, Video Captioning and Matching, Video Storytelling Linking and Video Search

Author: Awad George
Blasi Saverio
Butt Asad
Curtis Keith
Delgado Andrew
Fiscus Jonathan
Godil Afzal
Graham Yvette
Joy David
Kraaij Wessel
Lee Yooyoung
Magalhaes Joao
Quénot Georges
Semedo David
Smeaton Alan
Publication venue: HAL CCSD
Publication date: 13/11/2018

International audience

HAL - Université Grenoble Alpes

TRECVID 2015 – An Overview of the Goals, Tasks, Data, Evaluation Mechanisms, and Metrics

Author: Aly Robin
Awad George
Fiscus Jon
Joy David
Kraaij Wessel
Michel Martial
Ordelman Roeland
Over Paul
Quénot Georges
Smeaton Alan
Publication venue: HAL CCSD
Publication date: 16/11/2015

International audience

Discovering visual concept structure with sparse and incomplete tags

Author: Wang Jingya
Zhu Xiatian
Gong Shaogang
Publication venue: Elsevier BV
Publication date: 30/05/2017

This work was partially supported by the China Scholarship Council, Vision Semantics Limited, and the Royal Society Newton Advanced Fellowship Programme (NA150459).

arXiv.org e-Print Archive

Crossref

Queen Mary Research Online

Intelligent Data Analytics using Deep Learning for Data Science

Author: Presa Reyes Maria E
Publication venue: FIU Digital Commons
Publication date: 13/05/2022

Data science attracts the interest of academics and practitioners because it can help extract significant insights from massive amounts of data. According to the International Data Corporation, the Global Datasphere is expected to grow from 33 zettabytes in 2018 to 175 zettabytes by 2025. This dissertation proposes an intelligent data analytics framework that uses deep learning to tackle several difficulties that arise when implementing a data science application: high inter-class similarity, the availability and quality of hand-labeled data, and the need for a feasible approach to modeling significant correlations among features gathered from various data sources. The framework employs a novel strategy for improving data representation learning by incorporating supplemental data from various sources and structures. First, the research presents a multi-source fusion approach that uses confident learning techniques to improve the quality of data drawn from many noisy sources. Meta-learning methods based on techniques such as mixture of experts and differential evolution combine the predictive capacity of individual learners with a gating mechanism, ensuring that only the most trustworthy features or predictions are used to train the model. Next, a Multi-Level Convolutional Fusion method is presented that trains a model on the correspondence between local and global deep feature interactions in order to identify easily confused samples from different classes. The convolutional fusion is further enhanced with Graph Transformers, which aggregate relevant neighboring features in graph-structured input data and achieve state-of-the-art performance on a large-scale building damage dataset.
Finally, weakly supervised strategies, noise regularization, and label propagation are proposed to train a model on sparsely labeled input data, ensuring the model's robustness to errors and supporting automatic expansion of the training set. The proposed approaches outperformed competing strategies when training a model on a large-scale dataset of 500k photos, only about 7% of which were annotated by a human. The framework's capabilities have benefited various data science applications, including fluid dynamics, geometric morphometrics, building damage classification from satellite images, disaster scene description, and storm-surge visualization.

DigitalCommons@Florida International University
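The abstract above lists label propagation as one way to train on sparsely labeled data. The dissertation's exact formulation is not given here, so the following is only a minimal sketch of the classic iterative scheme (spread class scores along weighted edges, clamp the human-annotated nodes) under stated assumptions: the graph is a small symmetric adjacency dict, and the function name `propagate_labels` and the toy node IDs are hypothetical.

```python
def propagate_labels(graph, seeds, num_classes, iterations=20):
    """Iterative label propagation with clamped seed labels (illustrative sketch).

    graph: dict mapping node -> {neighbour: edge weight}; assumed symmetric,
           with every neighbour also present as a key.
    seeds: dict mapping the human-annotated nodes -> class index.
    Returns a dict mapping every node -> predicted class index.
    """
    # Seed nodes start one-hot; unlabeled nodes start with a uniform distribution.
    scores = {
        node: ([1.0 if c == seeds[node] else 0.0 for c in range(num_classes)]
               if node in seeds else [1.0 / num_classes] * num_classes)
        for node in graph
    }
    for _ in range(iterations):
        updated = {}
        for node, neighbours in graph.items():
            total = sum(neighbours.values())
            if node in seeds or total == 0:
                # Clamp annotated nodes (and leave isolated nodes unchanged).
                updated[node] = scores[node]
            else:
                # New distribution = edge-weighted average of the neighbours' distributions.
                updated[node] = [
                    sum(w * scores[nb][c] for nb, w in neighbours.items()) / total
                    for c in range(num_classes)
                ]
        scores = updated
    # Predict the class with the highest propagated score.
    return {node: max(range(num_classes), key=lambda c: dist[c])
            for node, dist in scores.items()}


# Hypothetical toy graph: two tight clusters joined by one weak edge (c-d).
graph = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"a": 1.0, "b": 1.0, "d": 0.1},
    "d": {"c": 0.1, "e": 1.0, "f": 1.0},
    "e": {"d": 1.0, "f": 1.0},
    "f": {"d": 1.0, "e": 1.0},
}
labels = propagate_labels(graph, seeds={"a": 0, "f": 1}, num_classes=2)
```

With only one annotated node per cluster, the weak bridge edge keeps each cluster's label from leaking into the other, which is the intuition behind expanding a training set from a small labeled fraction.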

Spatio-Temporal Multimedia Big Data Analytics Using Deep Neural Networks

Author: Pouyanfar Samira
Publication venue: FIU Digital Commons
Publication date: 01/01/2019

With the proliferation of online services and mobile technologies, the world has entered a multimedia big data era, in which highly diverse multimedia data, together with a huge amount of social data, bring new opportunities and challenges. Multimedia data consisting of audio, text, images, and video have grown tremendously. With such an increase, the main question is how to analyze this high volume and variety of data efficiently and effectively. A vast amount of research in the multimedia area has targeted different aspects of big data analytics, such as the capture, storage, indexing, mining, and retrieval of multimedia big data. However, there is insufficient research that provides a comprehensive framework for multimedia big data analytics and management. To address the major challenges in this area, a new framework based on deep neural networks is proposed for multimedia semantic concept detection, with a focus on spatio-temporal information analysis and rare event detection. The framework discovers patterns and knowledge in multimedia data using both static deep data representations and temporal semantics, and it is specifically designed to handle data with skewed distributions.
The proposed framework includes the following components: (1) a synthetic data generation component based on simulation and adversarial networks for data augmentation and deep learning training; (2) an automatic sampling model to overcome the imbalanced data issue in multimedia data; (3) a deep representation learning model that leverages novel deep learning techniques to generate the most discriminative static features from multimedia data; (4) an automatic hyper-parameter learning component for faster training and convergence of the learning models; (5) a spatio-temporal deep learning model to analyze dynamic features from multimedia data; and (6) a multimodal deep learning fusion model to integrate different data modalities. The whole framework has been evaluated on various large-scale multimedia datasets, including a newly collected disaster-events video dataset and other public datasets.

DigitalCommons@Florida International University
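The abstract above mentions an automatic sampling model for imbalanced multimedia data but does not describe it. Purely as a baseline illustration of the problem it addresses (not the dissertation's model), plain random oversampling duplicates minority-class samples until every class matches the largest one. The function name `random_oversample` and the toy frame IDs are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    is as frequent as the largest one (plain random oversampling)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    # Group sample indices by class so duplicates are drawn within a class.
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    out_x, out_y = list(samples), list(labels)
    for y, idxs in by_class.items():
        for _ in range(target - counts[y]):
            j = rng.choice(idxs)
            out_x.append(samples[j])
            out_y.append(y)
    return out_x, out_y


# Hypothetical toy data: four "normal" frames and one "rare" event frame.
x, y = random_oversample(["n1", "n2", "n3", "n4", "r1"],
                         ["normal"] * 4 + ["rare"])
```

Naive duplication like this is prone to overfitting on the repeated rare samples, which is one motivation for learned or synthetic sampling strategies such as those the framework proposes.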

Multimodal Data Analytics and Fusion for Data Science

Author: Tian Haiman
Publication venue: FIU Digital Commons
Publication date: 01/01/2019

Technological advances have led to a zettabyte of “new” data accumulating every two years. This huge amount of data has a powerful impact on many areas of science and engineering and creates enormous research opportunities, which call for the design and development of advanced approaches in data analytics. Given such demands, data science has become an emerging hot topic in both industry and academia, with applications ranging from basic business solutions, technological innovations, and multidisciplinary research to political decisions, urban planning, and policymaking. Within the scope of this dissertation, a multimodal data analytics and fusion framework is proposed for data-driven knowledge discovery and cross-modality semantic concept detection. The framework can explore useful knowledge hidden in different data formats and incorporate representation learning from multimodal data, especially for disaster information management. First, a Feature Affinity-based Multiple Correspondence Analysis (FA-MCA) method is presented to analyze the correlations between different low-level features, and an MCA-based Neural Network (MCA-NN) is proposed to capture the high-level features from individual FA-MCA models and seamlessly integrate the semantic data representations for video concept detection. Next, a genetic algorithm-based approach is presented for deep neural network selection. Furthermore, the improved genetic algorithm is integrated with deep neural networks to generate populations for producing optimal deep representation learning models. Then, a multimodal deep representation learning framework is proposed to efficiently incorporate the semantic representations from data in multiple modalities. Finally, fusion strategies are applied to accommodate multiple modalities. Cross-modal mapping strategies are also proposed within this framework to organize the features in a better structure and improve overall performance.

DigitalCommons@Florida International University
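The abstract above describes a genetic algorithm for selecting deep neural networks without giving its encoding or fitness function. As a minimal sketch under assumptions, each individual below is a choice of hypothetical hyperparameters (`layers`, `units`, `dropout`); in practice the fitness would be a validation score obtained by training each candidate network, but here a cheap stand-in (`toy_fitness`) keeps the example runnable. Selection is truncation selection with uniform crossover and per-gene mutation; all names are illustrative.

```python
import random

# Hypothetical search space: each gene picks one option for a network hyperparameter.
SEARCH_SPACE = {
    "layers": [2, 4, 8],
    "units": [64, 128, 256],
    "dropout": [0.0, 0.25, 0.5],
}
GENES = list(SEARCH_SPACE)


def random_individual(rng):
    return {g: rng.choice(SEARCH_SPACE[g]) for g in GENES}


def crossover(a, b, rng):
    # Uniform crossover: each gene is inherited from either parent with equal probability.
    return {g: (a[g] if rng.random() < 0.5 else b[g]) for g in GENES}


def mutate(ind, rng, rate=0.1):
    # Each gene is resampled from the search space with probability `rate`.
    return {g: (rng.choice(SEARCH_SPACE[g]) if rng.random() < rate else v)
            for g, v in ind.items()}


def evolve(fitness, population_size=10, generations=15, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(population_size)]
    for _ in range(generations):
        # Truncation selection: the fitter half survives unchanged and breeds.
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=fitness)


def toy_fitness(ind):
    # Stand-in for validation accuracy: peaks at 4 layers, 128 units, no dropout.
    return -abs(ind["layers"] - 4) - abs(ind["units"] - 128) / 64 - ind["dropout"]


best = evolve(toy_fitness)
```

Because the surviving parents are carried over each generation, the best fitness never decreases, which matters when every fitness evaluation means training a network.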

Audio Event Classification Using Deep Learning Methods

Author: Xu Zhicun
Publication date: 10/12/2018

Whether crossing the road or enjoying a concert, sound carries important information about the world around us. Audio event classification refers to recognition tasks that assign one or several labels, such as ‘dog bark’ or ‘doorbell’, to a particular audio signal, so teaching machines to perform this task can help humans in many fields. Since deep learning has shown great potential in many AI applications, this thesis studies deep learning methods and builds suitable neural networks for audio event classification. To evaluate the performance of different neural networks, we tested them on both Google AudioSet and the DCASE 2018 Task 2 dataset. Instead of providing original audio files, AudioSet offers compact 128-dimensional embeddings output by a modified VGG model for audio, with a frame length of 960 ms. For DCASE 2018 Task 2, we first preprocessed the soundtracks and then fine-tuned the VGG model that AudioSet used as a feature extractor. Thus, each soundtrack in both tasks is represented as a series of 128-dimensional features. We then compared DNN, LSTM, and multi-level attention models with different hyperparameters. The results show that fine-tuning the feature generation model for the DCASE task greatly improved the evaluation score, and that the attention models performed best in our settings for both tasks. They indicate that using a CNN-like model as a feature extractor for log-mel spectrograms and modeling the temporal dynamics with an attention model can achieve state-of-the-art results in audio event classification. For future research, the thesis suggests training a better CNN model for feature extraction, using multi-scale and multi-level features for better classification, and combining the audio features with other multimodal information for audiovisual data analysis.

Aaltodoc Publication Archive
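The abstract above represents each clip as a series of 128-dimensional frame embeddings and finds attention models work best. The thesis used multi-level attention; as a simplified, single-level sketch only, attention pooling scores each frame with a parameter vector, normalizes the scores with a softmax, and returns the weighted average of the frames. The vector `w` would be learned during training; here it is fixed, and the 3-dimensional toy frames stand in for real 128-dimensional embeddings. Function names are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, w):
    """Collapse T frame embeddings into one clip-level vector.

    frames: list of T equal-length vectors (e.g. one per 960 ms frame)
    w: attention parameter vector of the same length (learned in practice)
    Returns (pooled vector, attention weights).
    """
    # One relevance score per frame: dot product with the attention vector.
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in frames]
    alphas = softmax(scores)
    dim = len(frames[0])
    # Weighted average of the frames, dimension by dimension.
    pooled = [sum(a * x[d] for a, x in zip(alphas, frames)) for d in range(dim)]
    return pooled, alphas


# Toy clip: only the middle frame contains the "event".
frames = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
pooled, alphas = attention_pool(frames, w=[2.0, 2.0, 2.0])
```

Unlike mean pooling, the softmax weights let a short event dominate the clip representation even when most frames are background.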

Interactive video retrieval using implicit user feedback

Author: Vrochidis Stefanos
Publication venue: Queen Mary University of London
Publication date: 01/03/2013

In recent years, the rapid development of digital technologies and the low cost of recording media have led to a great increase in the availability of multimedia content worldwide, creating demand for advanced search engines. Traditionally, manual video annotation was a common practice for supporting retrieval. However, the vast amount of multimedia content makes such practices very expensive in terms of human effort. At the same time, low-cost wearable sensors deliver a plethora of user-machine interaction data. There is therefore an important challenge in exploiting implicit user feedback (such as user navigation patterns and eye movements) during interactive multimedia retrieval sessions in order to improve video search engines. In this thesis, we focus on automatically annotating video content by exploiting the aggregated implicit feedback of past users, expressed as click-through data and gaze movements. Towards this goal, we conducted interactive video retrieval experiments to collect click-through and eye movement data in environments that were not strictly controlled. First, we generate semantic relations between multimedia items by proposing a graph representation of aggregated past interaction data, and we exploit these relations to generate recommendations and to improve content-based search. Then, we investigate the role of user gaze movements in interactive video retrieval and propose a methodology for inferring user interest using support vector machines and gaze movement-based features. Finally, we propose an automatic video annotation framework that combines query clustering into topics, performed by constructing gaze movement-driven random forests and temporally enhanced dominant sets, with video shot classification for predicting the relevance of viewed items with respect to a topic.
The results show that exploiting heterogeneous implicit feedback from past users adds value for future users of interactive video retrieval systems.

Queen Mary Research Online
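The abstract above builds a graph from aggregated past interaction data and uses it to generate recommendations. The thesis's edge weighting is not described here; as a minimal sketch, assuming edges are weighted by how many past sessions clicked both shots (a simple co-click count), the graph and a top-k neighbour recommender could look like this. The function names and shot IDs are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_coclick_graph(sessions):
    """Weight an edge between two shots by the number of past sessions
    in which users clicked on both of them."""
    graph = defaultdict(lambda: defaultdict(int))
    for clicked in sessions:
        # Deduplicate within a session so repeat clicks don't inflate weights.
        for a, b in combinations(sorted(set(clicked)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def recommend(graph, shot, k=3):
    """Return up to k shots most strongly linked to `shot`, strongest first
    (ties broken alphabetically for deterministic output)."""
    neighbours = graph.get(shot, {})
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [s for s, _ in ranked[:k]]


# Hypothetical click-through logs: one list of clicked shot IDs per session.
sessions = [["s1", "s2", "s3"], ["s1", "s2"], ["s2", "s4"]]
graph = build_coclick_graph(sessions)
```

For example, `recommend(graph, "s2")` ranks `s1` first because two past sessions clicked it together with `s2`.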