Spatio-Temporal Multimedia Big Data Analytics Using Deep Neural Networks
With the proliferation of online services and mobile technologies, the world has entered a multimedia big data era, in which new opportunities and challenges arise from highly diverse multimedia data combined with enormous volumes of social data. Multimedia data consisting of audio, text, images, and video has grown tremendously. Given this growth, the central question is how to analyze such high-volume, high-variety data efficiently and effectively. A vast amount of research has been done in the multimedia area, targeting different aspects of big data analytics, such as the capture, storage, indexing, mining, and retrieval of multimedia big data. However, little research provides a comprehensive framework for multimedia big data analytics and management.
To address the major challenges in this area, a new framework is proposed based on deep neural networks for multimedia semantic concept detection, with a focus on spatio-temporal information analysis and rare event detection. The proposed framework discovers patterns and knowledge in multimedia data using both static deep data representations and temporal semantics, and it is specifically designed to handle data with skewed distributions. The framework includes the following components: (1) a synthetic data generation component based on simulation and adversarial networks for data augmentation and deep learning training; (2) an automatic sampling model to overcome the imbalanced-data problem in multimedia data; (3) a deep representation learning model that leverages novel deep learning techniques to generate the most discriminative static features from multimedia data; (4) an automatic hyper-parameter learning component for faster training and convergence of the learning models; (5) a spatio-temporal deep learning model to analyze dynamic features in multimedia data; and (6) a multimodal deep learning fusion model to integrate different data modalities. The whole framework has been evaluated on various large-scale multimedia datasets, including a newly collected disaster-events video dataset and other public datasets.
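The abstract does not give implementation details for component (2), the automatic sampling model. As a minimal sketch of the underlying rebalancing idea, the snippet below draws training batches with inverse-class-frequency weights using PyTorch's WeightedRandomSampler; all sizes and names are illustrative assumptions, not the framework's actual design.

```python
# Minimal sketch of rebalanced sampling for skewed multimedia data.
# All sizes and names are illustrative, not from the dissertation.
import torch
from torch.utils.data import TensorDataset, DataLoader, WeightedRandomSampler

# Toy data: 95 majority-class and 5 minority-class instances.
features = torch.randn(100, 128)
labels = torch.cat([torch.zeros(95, dtype=torch.long),
                    torch.ones(5, dtype=torch.long)])

# Weight each instance by the inverse frequency of its class, so
# minority instances are drawn as often as majority ones on average.
class_counts = torch.bincount(labels).float()
instance_weights = 1.0 / class_counts[labels]
sampler = WeightedRandomSampler(instance_weights,
                                num_samples=len(labels),
                                replacement=True)

loader = DataLoader(TensorDataset(features, labels),
                    batch_size=16, sampler=sampler)
for batch_features, batch_labels in loader:
    pass  # batches are now roughly class-balanced
```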
Deep Spatio-Temporal Representation Learning for Multi-Class Imbalanced Data Classification
Deep learning, particularly Convolutional Neural Networks (CNNs), has significantly improved visual data processing. In recent years, video classification has attracted significant attention in the multimedia and deep learning communities. It is one of the most challenging tasks, since both visual and temporal information must be processed effectively. Existing techniques either disregard the temporal information between video sequences or build very complex and computationally expensive models to integrate the spatio-temporal data. In addition, most deep learning techniques do not automatically account for the data imbalance problem. This paper presents an effective deep learning framework for imbalanced video classification that utilizes both spatial and temporal information. The framework includes spatio-temporal synthetic oversampling to handle data with a skewed distribution, a pre-trained CNN model for spatial sequence feature extraction, followed by a residual bidirectional Long Short-Term Memory (LSTM) network to capture temporal knowledge in video datasets. Experimental results on two imbalanced video datasets demonstrate the superiority of the proposed framework over state-of-the-art approaches.
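To make the CNN-plus-residual-bidirectional-LSTM pipeline concrete, here is a minimal PyTorch sketch: a frozen pre-trained ResNet-18 extracts per-frame features, a bidirectional LSTM models the sequence, and a linear projection adds the frame features back around the LSTM as a residual connection. The backbone choice, layer sizes, and temporal pooling are assumptions for illustration, not the paper's exact architecture.

```python
# Minimal sketch of CNN feature extraction followed by a residual
# bidirectional LSTM for video classification. Sizes are illustrative.
import torch
import torch.nn as nn
from torchvision import models

class ResidualBiLSTMClassifier(nn.Module):
    def __init__(self, num_classes, hidden=256):
        super().__init__()
        # Pre-trained CNN used as a frozen per-frame feature extractor.
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # -> 512-d
        for p in self.cnn.parameters():
            p.requires_grad = False
        # Bidirectional LSTM over the sequence of frame features.
        self.lstm = nn.LSTM(512, hidden, batch_first=True, bidirectional=True)
        # Project frame features so they can be added back residually.
        self.proj = nn.Linear(512, 2 * hidden)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, clips):            # clips: (batch, time, 3, 224, 224)
        b, t = clips.shape[:2]
        frames = clips.flatten(0, 1)     # fold time into the batch axis
        feats = self.cnn(frames).flatten(1).view(b, t, 512)
        out, _ = self.lstm(feats)
        out = out + self.proj(feats)     # residual connection around the LSTM
        return self.fc(out.mean(dim=1))  # average over time, then classify

logits = ResidualBiLSTMClassifier(num_classes=5)(torch.randn(2, 8, 3, 224, 224))
```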
An efficient deep residual-inception network for multimedia classification
Deep learning has led to many breakthroughs in machine perception and data mining. Although deep learning has produced substantial advances in image recognition and natural language processing, very little work has been done in video analysis and semantic event detection. Very deep inception and residual networks yielded promising results in the 2014 and 2015 ILSVRC challenges, respectively. The question now is whether these architectures are applicable to, and computationally reasonable for, a variety of multimedia datasets. To answer this question, an efficient and lightweight deep convolutional network is proposed in this paper. The network is carefully designed to reduce the depth and width of the state-of-the-art networks while maintaining their high performance. It combines a traditional convolutional architecture with residual connections and very light inception modules. Experimental results demonstrate that the proposed network not only accelerates the training procedure but also improves performance on different multimedia classification tasks.
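The abstract names the two ingredients, residual connections and light inception modules, without specifying the block design. Below is a minimal PyTorch sketch of one plausible combination: a four-branch inception module whose concatenated output is added back to the input as a shortcut. The branch widths and kernel sizes are assumptions, not the paper's configuration.

```python
# Minimal sketch of a light inception module wrapped in a residual
# connection. Branch widths and kernel sizes are illustrative.
import torch
import torch.nn as nn

class LightResidualInception(nn.Module):
    def __init__(self, channels):
        super().__init__()
        branch = channels // 4  # four branches re-assemble to `channels`
        self.b1 = nn.Conv2d(channels, branch, kernel_size=1)
        self.b3 = nn.Sequential(
            nn.Conv2d(channels, branch, kernel_size=1),
            nn.Conv2d(branch, branch, kernel_size=3, padding=1))
        self.b5 = nn.Sequential(
            nn.Conv2d(channels, branch, kernel_size=1),
            nn.Conv2d(branch, branch, kernel_size=5, padding=2))
        self.bp = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(channels, branch, kernel_size=1))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # Concatenate the four branches, then add the shortcut.
        y = torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)
        return self.act(y + x)  # residual add keeps gradients flowing

out = LightResidualInception(64)(torch.randn(1, 64, 32, 32))
```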
Multimodal deep learning based on multiple correspondence analysis for disaster management
The fast and explosive growth of digital data on social media and the World Wide Web has led to numerous opportunities and research activities in multimedia big data. Among them, disaster management applications have attracted considerable attention in recent years due to their impact on society and government. This study targets content analysis and mining for disaster management. Specifically, a multimedia big data framework based on advanced deep learning techniques is proposed. First, a video dataset of natural disasters is collected from YouTube. Then, two separate deep networks, a temporal audio model and a spatio-temporal visual model, are presented to analyze the audio-visual modalities in video clips effectively. Thereafter, the results of both models are integrated using the proposed fusion model based on the Multiple Correspondence Analysis (MCA) algorithm, which considers the correlations between data modalities and the final classes. The proposed multimodal framework is evaluated on the collected disaster dataset and compared with several state-of-the-art single-modality and fusion techniques. The results demonstrate the effectiveness of both the visual model and the fusion model compared to the baseline approaches. Specifically, the accuracy of the final multi-class classification using the proposed MCA-based fusion reaches 73% on this challenging dataset.
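For orientation only, here is a minimal late-fusion sketch of the step that combines the two models' outputs. The paper derives its fusion weights from MCA; in this stand-in the weights are fixed numbers, so every value below is an illustrative assumption rather than the paper's method.

```python
# Minimal sketch of weighted late fusion of audio and visual class
# scores. The MCA step in the paper would estimate how strongly each
# modality correlates with the final classes; fixed weights stand in.
import numpy as np

def fuse(audio_scores, visual_scores, audio_weight, visual_weight):
    """Weighted sum of per-class probabilities from the two modalities."""
    fused = audio_weight * audio_scores + visual_weight * visual_scores
    return fused / fused.sum(axis=1, keepdims=True)  # renormalize

audio = np.array([[0.6, 0.3, 0.1]])   # temporal audio model output
visual = np.array([[0.2, 0.7, 0.1]])  # spatio-temporal visual model output
# Stand-in weights, not MCA-derived correlations.
print(fuse(audio, visual, audio_weight=0.4, visual_weight=0.6).argmax())
```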
Multimodal deep representation learning for video classification
Real-world applications usually encounter data with various modalities, each containing valuable information. To enhance these applications, it is essential to analyze all the information extracted from the different data modalities, whereas most existing learning models ignore some data types and focus on a single modality. This paper presents a new multimodal deep learning framework for event detection from videos that leverages recent advances in deep neural networks. First, several deep learning models are utilized to extract useful information from multiple modalities, including pre-trained Convolutional Neural Networks (CNNs) for visual and audio feature extraction and a word-embedding model for textual analysis. Then, a novel fusion technique is proposed that integrates the different data representations at two levels, namely the frame level and the video level. Unlike existing multimodal learning algorithms, the proposed framework can reason about a missing data type using the other available data modalities. The proposed framework is applied to a new video dataset containing natural disaster classes. The experimental results illustrate the effectiveness of the proposed framework compared to several single-modality deep learning models as well as conventional fusion techniques. Specifically, the final accuracy is improved by more than 16% and 7% compared to the best results from the single-modality and fusion models, respectively.
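As a minimal sketch of the two-level structure, the snippet below pools frame-level features into a video-level vector per modality and then fuses whichever modalities are present. Note that it merely skips a missing modality, whereas the paper actively reasons about the missing type from the others; all shapes and names here are illustrative assumptions.

```python
# Minimal sketch of frame-level then video-level fusion that tolerates
# a missing modality. Shapes and names are illustrative assumptions.
import numpy as np

def video_level(frame_features):
    """Frame-level step: average per-frame vectors over time."""
    return frame_features.mean(axis=0)

def fuse_modalities(modalities):
    """Video-level step: mean-fuse whichever modalities are present."""
    available = [video_level(f) for f in modalities.values() if f is not None]
    return np.mean(available, axis=0)

clip = {
    "visual": np.random.rand(30, 128),  # 30 frames of CNN features
    "audio":  np.random.rand(30, 128),  # 30 frames of audio features
    "text":   None,                     # missing modality for this clip
}
video_vector = fuse_modalities(clip)    # still well-defined without text
```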
Multimedia Big Data Analytics: A Survey
With the proliferation of online services and mobile technologies, the world has stepped into a multimedia big data era. A vast amount of research has been done in the multimedia area, targeting different aspects of big data analytics, such as the capture, storage, indexing, mining, and retrieval of multimedia big data. However, very little work surveys the whole pipeline of multimedia big data analytics, including the management and analysis of large amounts of data, the challenges and opportunities, and the promising research directions. To serve this purpose, we present this survey, which provides a comprehensive overview of the state-of-the-art research on multimedia big data analytics. It also aims to bridge the gap between multimedia challenges and big data solutions by presenting the current big data frameworks, their applications in multimedia analyses, the strengths and limitations of existing methods, and potential future directions in multimedia big data analytics. To the best of our knowledge, this is the first survey that targets the most recent multimedia management techniques for very large-scale data and also covers the research studies and technologies advancing multimedia analyses in this big data era.
IF-MCA: Importance Factor-Based Multiple Correspondence Analysis for Multimedia Data Analytics
Multimedia concept detection is a challenging topic due to the well-known class imbalance issue, in which data instances are distributed unevenly across classes. This problem becomes even more prominent when the minority class, which contains an extremely small proportion of the data, represents the concept of interest, as occurs in many real-world applications such as fraud in banking transactions and goal events in soccer videos. Traditional data mining approaches often have difficulty handling heavily skewed data distributions. To address this issue, this paper proposes an importance-factor (IF)-based Multiple Correspondence Analysis (MCA) framework for imbalanced datasets. Specifically, a hierarchical information gain analysis method, inspired by the decision tree algorithm, is presented for critical feature selection and IF assignment. The derived IFs are then incorporated into the MCA algorithm for effective concept detection and retrieval. Comparison results on video concept detection using the disaster dataset and the soccer dataset demonstrate the effectiveness of the proposed framework.
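To illustrate the importance-factor idea, here is a minimal sketch that scores each feature by information gain against the class label and uses the normalized scores as weights before a downstream detection step. The paper's procedure is hierarchical and decision-tree-inspired; this flat version, along with the toy data and threshold, is an assumption for illustration.

```python
# Minimal sketch: information-gain scores as importance factors for
# feature selection and reweighting. Flat stand-in for the paper's
# hierarchical method; data and threshold are illustrative.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
X = rng.random((200, 10))                  # 200 instances, 10 features
y = (X[:, 3] > 0.5).astype(int)            # class driven by feature 3

info_gain = mutual_info_classif(X, y, random_state=0)
importance = info_gain / info_gain.sum()   # normalized importance factors

# Keep only features whose importance clears a threshold, then reweight
# the surviving columns before the MCA-based detection step.
keep = importance > 0.05
X_weighted = X[:, keep] * importance[keep]
```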
A Scalable and Automatic Validation Process for Florida Public Hurricane Loss Model (Invited Paper)
The Florida Public Hurricane Loss Model (FPHLM) is a public catastrophe model that integrates and regulates all key components, such as the meteorology, engineering, and actuarial components, by following a defined workflow in the execution phase. The validation phase, governed by an Automatic Data Validation (ADV) program, exercises each modeled execution component with large amounts of historical insurance data from specific hurricane events. The differences between the actual losses and the modeled losses on the insurance portfolios are evaluated to validate the model. The original validation process was time-consuming and error-prone when handling large datasets. This paper presents how the automated program efficiently and correctly incorporates the key components and produces useful reports for validation purposes. By considering sixty-six combinations of claim data (i.e., one company paired with one hurricane), the FPHLM adopts the largest set of portfolios compared to the other four private models, which makes the validation process more challenging.
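For a sense of the core comparison, the sketch below computes the relative error between actual and modeled losses per company-hurricane combination and ranks the combinations by magnitude. The column names, values, and error metric are illustrative assumptions, not the ADV program's actual schema or reports.

```python
# Minimal sketch of the actual-vs-modeled loss comparison across
# company-hurricane combinations. Names and values are illustrative.
import pandas as pd

records = pd.DataFrame({
    "company":      ["A", "A", "B"],
    "hurricane":    ["Andrew", "Irma", "Andrew"],
    "actual_loss":  [1.20e6, 8.50e5, 2.10e6],
    "modeled_loss": [1.05e6, 9.10e5, 1.70e6],
})

# Per-combination relative error, the core validation quantity.
records["relative_error"] = (
    (records["modeled_loss"] - records["actual_loss"])
    / records["actual_loss"]
)
# Rank combinations by how far the model deviates from the actual loss.
report = records.sort_values("relative_error", key=abs, ascending=False)
print(report.to_string(index=False))
```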
Unconstrained Flood Event Detection Using Adversarial Data Augmentation
The world now faces extreme climate change, resulting in an increase in the frequency and severity of natural disaster events. Under these conditions, disaster information management systems have become ever more imperative. Specifically, this paper addresses the problem of flood event detection from images captured under real-world conditions. That is, the images may be taken under varied conditions: day or night, blurry or clear, foggy, rainy, differing lighting, and so on. All of these abnormal scenarios significantly reduce the performance of learning algorithms. In addition, many existing image classification methods use datasets that typically contain high-resolution images without considering real-world noise. In this paper, we propose a new image classification framework based on adversarial data augmentation and deep learning algorithms to address these problems. We validate the performance of the flood event detection framework on a real-world noisy visual dataset collected from social networks.
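As a minimal sketch of the kinds of degradations the paper targets, the snippet below applies lighting shifts, blur, and sensor-like noise to a training image with fixed torchvision transforms. The paper generates such variations adversarially; these hand-picked transforms and parameters are an illustrative stand-in, not the proposed augmentation method.

```python
# Minimal sketch of training-time degradations (lighting, blur, noise)
# standing in for the paper's adversarial augmentation. Parameters are
# illustrative assumptions.
import torch
from torchvision import transforms

augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.5, contrast=0.5),      # day/night shifts
    transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),  # blur, fog-like haze
    transforms.Lambda(lambda img: img + 0.05 * torch.randn_like(img)),  # sensor noise
    transforms.Lambda(lambda img: img.clamp(0.0, 1.0)),        # keep valid pixel range
])

clean = torch.rand(3, 224, 224)   # a clean flood image as a float tensor
noisy = augment(clean)            # degraded variant for robustness training
```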