
    End-to-End Audiovisual Fusion with LSTMs

    Several end-to-end deep learning approaches have recently been presented which simultaneously extract visual features from the input images and perform visual speech classification. However, research on jointly extracting audio and visual features and performing classification is very limited. In this work, we present an end-to-end audiovisual model based on Bidirectional Long Short-Term Memory (BLSTM) networks. To the best of our knowledge, this is the first audiovisual fusion model which simultaneously learns to extract features directly from pixels and spectrograms and to perform classification of speech and nonlinguistic vocalisations. The model consists of multiple identical streams, one per modality, which extract features directly from mouth regions and spectrograms. The temporal dynamics in each stream/modality are modeled by a BLSTM, and the fusion of multiple streams/modalities takes place via another BLSTM. An absolute improvement of 1.9% in the mean F1 measure of 4 nonlinguistic vocalisations over audio-only classification is reported on the AVIC database. At the same time, the proposed end-to-end audiovisual fusion system improves the state-of-the-art performance on the AVIC database, leading to a 9.7% absolute increase in the mean F1 measure. We also perform audiovisual speech recognition experiments on the OuluVS2 database using different views of the mouth, from frontal to profile. The proposed audiovisual system significantly outperforms the audio-only model for all views when the acoustic noise is high. Comment: Accepted to AVSP 2017. arXiv admin note: substantial text overlap with arXiv:1709.00443 and text overlap with arXiv:1701.0584
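    The abstract does not include an implementation, but the described architecture lends itself to a compact sketch: one stream per modality, a BLSTM over each stream, and a second BLSTM that fuses the concatenated stream outputs before classification. The PyTorch sketch below is a loose reconstruction under assumed layer sizes, with a simple linear front-end standing in for the learned pixel/spectrogram feature extractors; it is not the authors' exact model.

```python
# Hedged sketch of a two-stream BLSTM fusion model in the spirit of the
# abstract above. Layer sizes and the classifier head are illustrative
# assumptions, not the authors' exact configuration.
import torch
import torch.nn as nn

class StreamEncoder(nn.Module):
    """Frame-wise feature extractor + BLSTM for one modality."""
    def __init__(self, in_dim, feat_dim=256, hidden=128):
        super().__init__()
        self.frontend = nn.Sequential(      # stands in for the learned feature
            nn.Linear(in_dim, feat_dim),    # extractor on raw pixels or
            nn.ReLU(),                      # spectrogram bins (flattened)
        )
        self.blstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                             bidirectional=True)

    def forward(self, x):                   # x: (batch, time, in_dim)
        h = self.frontend(x)
        out, _ = self.blstm(h)              # (batch, time, 2*hidden)
        return out

class AVFusion(nn.Module):
    def __init__(self, video_dim, audio_dim, hidden=128, n_classes=4):
        super().__init__()
        self.video = StreamEncoder(video_dim, hidden=hidden)
        self.audio = StreamEncoder(audio_dim, hidden=hidden)
        self.fusion = nn.LSTM(4 * hidden, hidden, batch_first=True,
                              bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, video, audio):
        # Both streams are assumed to be temporally aligned.
        fused_in = torch.cat([self.video(video), self.audio(audio)], dim=-1)
        fused, _ = self.fusion(fused_in)
        return self.classifier(fused[:, -1])  # classify from the last step

model = AVFusion(video_dim=96 * 96, audio_dim=128)
logits = model(torch.randn(2, 29, 96 * 96), torch.randn(2, 29, 128))
```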

    Data-driven Flood Emulation: Speeding up Urban Flood Predictions by Deep Convolutional Neural Networks

    Computational complexity has been the bottleneck of applying physically based simulations to large urban areas at high spatial resolution for efficient and systematic flooding analyses and risk assessments. To address this issue of long computational time, this paper proposes that the prediction of maximum water depth rasters can be treated as an image-to-image translation problem, where the results are generated from input elevation rasters using information learned from data rather than by running simulations, which can significantly accelerate the prediction process. The proposed approach was implemented with a deep convolutional neural network trained on flood simulation data of 18 designed hyetographs on three selected catchments. Multiple tests with both designed and real rainfall events were performed, and the results show that flood prediction by the neural network requires only 0.5% of the time of physically based approaches, with promising accuracy and ability to generalize. The proposed neural network can also potentially be applied to different but related problems, including flood prediction for urban layout planning.
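    As a rough illustration of the image-to-image framing, the following PyTorch sketch maps an elevation raster to a maximum-water-depth raster with a small encoder-decoder CNN. The channel counts, input size and loss target are assumptions for illustration only, not the network trained in the paper.

```python
# Minimal encoder-decoder sketch: elevation raster in, depth raster out.
import torch
import torch.nn as nn

class FloodEmulator(nn.Module):
    def __init__(self, in_channels=1, base=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, base, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(base, base * 2, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(base * 2, base, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(base, 1, 4, stride=2, padding=1),
            nn.ReLU(),  # water depth is non-negative
        )

    def forward(self, elevation):            # (batch, 1, H, W) elevation raster
        return self.decoder(self.encoder(elevation))

# Training would regress against simulated maximum-depth rasters, e.g.:
model = FloodEmulator()
dem = torch.randn(4, 1, 256, 256)            # synthetic stand-in for DEM tiles
pred_depth = model(dem)                       # (4, 1, 256, 256)
loss = nn.functional.mse_loss(pred_depth, torch.rand(4, 1, 256, 256))
```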

    Feature discovery and visualization of robot mission data using convolutional autoencoders and Bayesian nonparametric topic models

    The gap between our ability to collect interesting data and our ability to analyze these data is growing at an unprecedented rate. Recent algorithmic attempts to fill this gap have employed unsupervised tools to discover structure in data. Some of the most successful approaches have used probabilistic models to uncover latent thematic structure in discrete data. Despite the success of these models on textual data, they have not generalized as well to image data, in part because of the spatial and temporal structure that may exist in an image stream. We introduce a novel unsupervised machine learning framework that incorporates the ability of convolutional autoencoders to discover features from images that directly encode spatial information, within a Bayesian nonparametric topic model that discovers meaningful latent patterns within discrete data. By using this hybrid framework, we overcome the fundamental dependency of traditional topic models on rigidly hand-coded data representations, while simultaneously encoding spatial dependency in our topics without adding model complexity. We apply this model to the motivating application of high-level scene understanding and mission summarization for exploratory marine robots. Our experiments on a seafloor dataset collected by a marine robot show that the proposed hybrid framework outperforms current state-of-the-art approaches on the task of unsupervised seafloor terrain characterization. Comment: 8 pages
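    One way to picture the hybrid idea, without claiming it matches the authors' joint model, is a pipeline sketch: encode image patches with the encoder half of a convolutional autoencoder, quantize the latent codes into visual words, and fit a nonparametric (HDP) topic model over the resulting bag-of-words corpus. The encoder architecture, vocabulary size and gensim/scikit-learn usage below are illustrative assumptions.

```python
# Loose sketch of a conv-autoencoder + HDP topic-model pipeline.
import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import KMeans
from gensim.corpora import Dictionary
from gensim.models import HdpModel

class ConvEncoder(nn.Module):
    """Encoder half of a convolutional autoencoder (decoder omitted)."""
    def __init__(self, latent=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, latent),
        )

    def forward(self, patches):              # (N, 3, H, W) image patches
        return self.net(patches)

encoder = ConvEncoder()                       # assume weights loaded from training
images = [torch.randn(50, 3, 32, 32) for _ in range(10)]   # 10 images, 50 patches each

# 1) Encode patches and learn a visual vocabulary by clustering latent codes.
codes = [encoder(p).detach().numpy() for p in images]
kmeans = KMeans(n_clusters=64, n_init=10).fit(np.vstack(codes))

# 2) Represent each image as a bag of visual words.
docs = [[f"w{w}" for w in kmeans.predict(c)] for c in codes]
vocab = Dictionary(docs)
corpus = [vocab.doc2bow(d) for d in docs]

# 3) Fit a nonparametric (HDP) topic model over the visual-word corpus.
hdp = HdpModel(corpus, id2word=vocab)
print(hdp.print_topics(num_topics=5, num_words=5))
```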

    Probabilistic XGBoost Threshold Classification with Autoencoder for Credit Card Fraud Detection

    Because legitimate transactions vastly outnumber fraudulent ones, the resulting class imbalance makes credit card fraud detection a challenging task for which an effective solution is hard to find. In this study, an autoencoder with probabilistic threshold shifting of XGBoost (AE-XGB) is designed for credit card fraud detection. First, AE-XGB employs an autoencoder, a prevalent dimensionality reduction technique, to extract data features from the latent space representation. The reconstructed lower-dimensional features are then passed to eXtreme Gradient Boosting (XGBoost), an ensemble boosting algorithm, which classifies the data as fraudulent or legitimate using a probabilistic threshold. In addition to AE-XGB, other ensemble algorithms such as Adaptive Boosting (AdaBoost), Gradient Boosting Machine (GBM), Random Forest, Categorical Boosting (CatBoost), LightGBM and XGBoost are compared with optimal and default thresholds. To validate the methodology, we used the IEEE-CIS fraud detection dataset for our experiments. The class imbalance and high dimensionality of the dataset reduce model performance, so the data is preprocessed before training. To evaluate the model, indicators such as precision, recall, F1-score, G-mean and the Matthews Correlation Coefficient (MCC) are computed. The findings reveal that the proposed AE-XGB model is effective in handling imbalanced data and able to detect fraudulent transactions among incoming new transactions with a recall of 90.4% and an F1-score of 90.5%.
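    A hedged sketch of the AE-XGB recipe as described: compress the transaction features with an autoencoder, train XGBoost on the latent representation, and then shift the probability threshold to the value that maximises F1 instead of using the default 0.5 cut-off. Network sizes, the threshold grid and the synthetic data below are assumptions, not the study's configuration.

```python
# Autoencoder feature extraction + XGBoost with threshold shifting.
import numpy as np
import torch
import torch.nn as nn
from sklearn.metrics import f1_score
from xgboost import XGBClassifier

class AutoEncoder(nn.Module):
    def __init__(self, in_dim, latent=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                     nn.Linear(64, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, 64), nn.ReLU(),
                                     nn.Linear(64, in_dim))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

def train_ae(model, x, epochs=20, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        recon, _ = model(x)
        loss = nn.functional.mse_loss(recon, x)
        opt.zero_grad(); loss.backward(); opt.step()

# Synthetic stand-in for the (preprocessed) transaction features.
X = torch.randn(2000, 50)
y = (np.random.rand(2000) < 0.05).astype(int)        # ~5% "fraud" labels

ae = AutoEncoder(in_dim=50)
train_ae(ae, X)
Z = ae.encoder(X).detach().numpy()                    # latent features

clf = XGBClassifier(n_estimators=200, eval_metric="logloss")
clf.fit(Z, y)

# Probabilistic threshold shifting: pick the cut-off that maximises F1.
proba = clf.predict_proba(Z)[:, 1]
thresholds = np.linspace(0.05, 0.95, 19)
best_t = max(thresholds, key=lambda t: f1_score(y, (proba >= t).astype(int)))
print("best threshold:", best_t,
      "F1:", f1_score(y, (proba >= best_t).astype(int)))
```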

    Data Augmentation Using Generative Adversarial Networks

    Most labelled real-world data is not uniformly distributed across classes but is imbalanced, which can have a severe impact on the prediction quality of classification models. A general approach to overcoming this issue is to modify the original data so as to restore the balance of the classes. This thesis deals with balancing image datasets by data augmentation using generative adversarial neural networks. The primary focus is on generating images of underrepresented classes in imbalanced datasets, a process known as class balancing. The aim of this thesis is to analyse and compare data augmentation techniques, including standard methods such as geometric transformations, generative adversarial networks and autoencoders. Evaluation is done using classifiers trained on the original, unbalanced and augmented datasets. The results suggest how the performance of the methods deteriorates proportionally with increasing imbalance rate and diversity of the datasets.
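    To make the class-balancing procedure concrete, here is a minimal DCGAN-style sketch: a generator is trained on images of an underrepresented class and its samples are then added to the training set. The 32x32 grayscale image size, architectures and hyperparameters are illustrative assumptions rather than the exact setup evaluated in the thesis.

```python
# Toy GAN for generating extra samples of a minority image class.
import torch
import torch.nn as nn

latent_dim = 100

generator = nn.Sequential(                   # z (100) -> 1x32x32 image
    nn.Unflatten(1, (latent_dim, 1, 1)),
    nn.ConvTranspose2d(latent_dim, 128, 4, 1, 0), nn.BatchNorm2d(128), nn.ReLU(),
    nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),
    nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(),
    nn.ConvTranspose2d(32, 1, 4, 2, 1), nn.Tanh(),
)

discriminator = nn.Sequential(               # 1x32x32 image -> real/fake logit
    nn.Conv2d(1, 32, 4, 2, 1), nn.LeakyReLU(0.2),
    nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
    nn.Conv2d(128, 1, 4, 1, 0), nn.Flatten(),
)

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.5, 0.999))

minority_images = torch.rand(64, 1, 32, 32) * 2 - 1   # stand-in minority class

for step in range(100):                       # toy training loop
    z = torch.randn(64, latent_dim)
    fake = generator(z)

    # Discriminator: real images -> 1, generated images -> 0.
    d_loss = bce(discriminator(minority_images), torch.ones(64, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: fool the discriminator into predicting 1.
    g_loss = bce(discriminator(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# Generate synthetic minority samples to top up the imbalanced class.
synthetic = generator(torch.randn(500, latent_dim)).detach()
```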