End-to-End Audiovisual Fusion with LSTMs
Several end-to-end deep learning approaches have recently been presented
which simultaneously extract visual features from the input images and perform
visual speech classification. However, research on jointly extracting audio and
visual features and performing classification is very limited. In this work, we
present an end-to-end audiovisual model based on Bidirectional Long Short-Term
Memory (BLSTM) networks. To the best of our knowledge, this is the first
audiovisual fusion model which simultaneously learns to extract features
directly from the pixels and spectrograms and perform classification of speech
and nonlinguistic vocalisations. The model consists of multiple identical
streams, one for each modality, which extract features directly from mouth
regions and spectrograms. The temporal dynamics in each stream/modality are
modeled by a BLSTM and the fusion of multiple streams/modalities takes place
via another BLSTM. An absolute improvement of 1.9% in the mean F1 of 4
nonlinguistic vocalisations over audio-only classification is reported on the
AVIC database. At the same time, the proposed end-to-end audiovisual fusion
system improves the state-of-the-art performance on the AVIC database leading
to a 9.7% absolute increase in the mean F1 measure. We also perform audiovisual
speech recognition experiments on the OuluVS2 database using different views of
the mouth, frontal to profile. The proposed audiovisual system significantly
outperforms the audio-only model for all views when the acoustic noise is high.
Comment: Accepted to AVSP 2017. arXiv admin note: substantial text overlap
with arXiv:1709.00443 and text overlap with arXiv:1701.0584
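As a rough shape-level sketch of the multi-stream layout the abstract describes (one recurrent stream per modality, fused by another recurrent layer), the following NumPy code uses plain tanh recurrences as stand-ins for the paper's BLSTM cells; all dimensions, weights, and inputs are invented for illustration, not taken from the paper:

```python
import numpy as np

def simple_rnn(x, W, U, b):
    # x: (T, d_in); returns hidden states (T, d_h) of a plain tanh RNN
    T = x.shape[0]
    h = np.zeros((T, b.shape[0]))
    prev = np.zeros(b.shape[0])
    for t in range(T):
        prev = np.tanh(x[t] @ W + prev @ U + b)
        h[t] = prev
    return h

def bidirectional(x, params_f, params_b):
    # concatenate a forward pass with a time-reversed backward pass
    hf = simple_rnn(x, *params_f)
    hb = simple_rnn(x[::-1], *params_b)[::-1]
    return np.concatenate([hf, hb], axis=1)

rng = np.random.default_rng(0)
T, d_a, d_v, d_h = 20, 64, 32, 16   # made-up sequence length and dims
make = lambda d_in: (rng.normal(size=(d_in, d_h)) * 0.1,
                     rng.normal(size=(d_h, d_h)) * 0.1,
                     np.zeros(d_h))

audio = rng.normal(size=(T, d_a))   # spectrogram frames (audio stream)
video = rng.normal(size=(T, d_v))   # mouth-region features (visual stream)

# per-modality bidirectional recurrence (BLSTM stand-in)
h_audio = bidirectional(audio, make(d_a), make(d_a))
h_video = bidirectional(video, make(d_v), make(d_v))

# fusion layer: another bidirectional recurrence over both streams
fused_in = np.concatenate([h_audio, h_video], axis=1)   # (T, 4*d_h)
fused = bidirectional(fused_in, make(4 * d_h), make(4 * d_h))
print(fused.shape)  # (20, 32)
```

A classifier head over `fused` would then produce per-utterance labels; the point of the sketch is only how the two modality streams keep separate parameters until the dedicated fusion layer.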
Data-driven Flood Emulation: Speeding up Urban Flood Predictions by Deep Convolutional Neural Networks
Computational complexity has been the bottleneck of applying physically-based
simulations on large urban areas with high spatial resolution for efficient and
systematic flooding analyses and risk assessments. To address this issue of
long computational time, this paper proposes that the prediction of maximum
water depth rasters can be considered as an image-to-image translation problem
where the results are generated from input elevation rasters using the
information learned from data rather than by conducting simulations, which can
significantly accelerate the prediction process. The proposed approach was
implemented by a deep convolutional neural network trained on flood simulation
data of 18 designed hyetographs on three selected catchments. Multiple tests
with both designed and real rainfall events were performed, and the results show
that the flood predictions by the neural network use only 0.5% of the time
required by physically-based approaches, with promising accuracy and
generalization ability. The proposed neural network can also potentially be
applied to different but related problems, including flood prediction for urban
layout planning.
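The image-to-image framing above can be illustrated with a minimal NumPy sketch: an elevation raster goes in, a water-depth raster of the same grid comes out. A single hand-written averaging kernel stands in for the trained convolutional network; the raster size and kernel are assumptions for illustration only:

```python
import numpy as np

def conv2d_same(image, kernel):
    # naive 'same'-padded 2D convolution: output keeps the input raster shape
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="edge")
    out = np.zeros_like(image, dtype=float)
    H, W = image.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

# hypothetical 64x64 elevation raster (one catchment tile)
rng = np.random.default_rng(1)
elevation = rng.normal(size=(64, 64))

# an untrained smoothing kernel stands in for the CNN's learned filters
kernel = np.full((3, 3), 1.0 / 9.0)
depth_pred = conv2d_same(elevation, kernel)
print(depth_pred.shape)  # (64, 64): same grid as the input raster
```

The speed-up in the abstract comes from replacing the physical solver with one such forward pass; a real model would stack many learned convolutional layers rather than this single fixed kernel.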
Feature discovery and visualization of robot mission data using convolutional autoencoders and Bayesian nonparametric topic models
The gap between our ability to collect interesting data and our ability to
analyze these data is growing at an unprecedented rate. Recent algorithmic
attempts to fill this gap have employed unsupervised tools to discover
structure in data. Some of the most successful approaches have used
probabilistic models to uncover latent thematic structure in discrete data.
Despite the success of these models on textual data, they have not generalized
as well to image data, in part because of the spatial and temporal structure
that may exist in an image stream.
We introduce a novel unsupervised machine learning framework that
incorporates the ability of convolutional autoencoders to discover features
from images that directly encode spatial information, within a Bayesian
nonparametric topic model that discovers meaningful latent patterns within
discrete data. By using this hybrid framework, we overcome the fundamental
dependency of traditional topic models on rigidly hand-coded data
representations, while simultaneously encoding spatial dependency in our topics
without adding model complexity. We apply this model to the motivating
application of high-level scene understanding and mission summarization for
exploratory marine robots. Our experiments on a seafloor dataset collected by a
marine robot show that the proposed hybrid framework outperforms current
state-of-the-art approaches on the task of unsupervised seafloor terrain
characterization.
Comment: 8 pages
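The key interface in the hybrid framework above is turning continuous image features into the discrete observations a topic model expects. A minimal NumPy sketch of that step (a random codebook stands in for the convolutional autoencoder's learned features; patch and codebook sizes are invented):

```python
import numpy as np

def extract_patches(image, size):
    # non-overlapping square patches, each flattened to a feature vector
    H, W = image.shape
    patches = [image[i:i + size, j:j + size].ravel()
               for i in range(0, H - size + 1, size)
               for j in range(0, W - size + 1, size)]
    return np.array(patches)

def quantize(features, codebook):
    # nearest-codeword assignment: continuous features -> discrete "words"
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

rng = np.random.default_rng(2)
frame = rng.normal(size=(32, 32))          # one seafloor image (grayscale)
feats = extract_patches(frame, 8)          # (16, 64) patch feature vectors
codebook = rng.normal(size=(10, 64))       # stand-in for learned latent codes
words = quantize(feats, codebook)          # discrete tokens for the topic model
counts = np.bincount(words, minlength=10)  # bag-of-words for this frame
print(counts.sum())  # 16 patches -> 16 word tokens
```

A Bayesian nonparametric topic model (e.g. an HDP) would then consume these per-frame count vectors; because the "words" come from spatial patches, the topics inherit spatial structure without extra model machinery, which is the hybrid's selling point.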
Probabilistic XGBoost Threshold Classification with Autoencoder for Credit Card Fraud Detection
Because legitimate transactions vastly outnumber fraudulent ones, the resulting class imbalance makes fraud detection a challenging task. In this study, an autoencoder with probabilistic threshold shifting of XGBoost (AE-XGB) for credit card fraud detection is designed. AE-XGB first employs an autoencoder, a prevalent dimensionality-reduction technique, to extract data features from a latent space representation. The reconstructed lower-dimensional features are then fed to eXtreme Gradient Boosting (XGBoost), an ensemble boosting algorithm, with a probabilistic threshold to classify each transaction as fraudulent or legitimate. In addition to AE-XGB, other existing ensemble algorithms such as Adaptive Boosting (AdaBoost), Gradient Boosting Machine (GBM), Random Forest, Categorical Boosting (CatBoost), LightGBM and XGBoost are compared with optimal and default thresholds. To validate the methodology, the IEEE-CIS fraud detection dataset is used for the experiments. Because the class imbalance and high dimensionality of the dataset reduce model performance, the data is preprocessed before training. Model performance is evaluated with precision, recall, F1-score, G-mean and the Matthews Correlation Coefficient (MCC). The findings reveal that the proposed AE-XGB model handles imbalanced data effectively and detects fraudulent transactions among incoming new transactions with 90.4% recall and 90.5% F1-score.
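The "probabilistic threshold shifting" idea above can be sketched without XGBoost itself: given validation labels and model scores, scan candidate cut-offs and keep the one that maximises F1 instead of the default 0.5. The toy scores below are synthetic; a real pipeline would take them from the classifier's predicted fraud probabilities:

```python
import numpy as np

def f1(y_true, y_pred):
    # F1 computed from scratch to keep the sketch dependency-free
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    if tp == 0:
        return 0.0
    p, r = tp / (tp + fp), tp / (tp + fn)
    return 2 * p * r / (p + r)

def best_threshold(y_true, scores):
    # scan every distinct score as a cut-off; keep the F1-maximising one
    candidates = np.unique(scores)
    f1s = [f1(y_true, (scores >= t).astype(int)) for t in candidates]
    return candidates[int(np.argmax(f1s))]

# toy imbalanced validation set: 10% fraud, frauds tend to score higher
rng = np.random.default_rng(3)
y = np.array([0] * 90 + [1] * 10)
scores = np.where(y == 1,
                  rng.uniform(0.4, 1.0, 100),
                  rng.uniform(0.0, 0.6, 100))

t = best_threshold(y, scores)
print(f1(y, (scores >= t).astype(int)) >= f1(y, (scores >= 0.5).astype(int)))  # True
```

Shifting the threshold this way trades precision for recall, which is usually the right trade on heavily imbalanced fraud data where missed frauds are costly.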
Data Augmentation Using Generative Adversarial Networks
Most labelled real-world data is not uniformly distributed within classes but is imbalanced, which can have a severe impact on the prediction quality of classification models. A general approach to overcoming this issue is to modify the original data so as to restore the balance of the classes. This thesis deals with balancing image datasets by data augmentation using generative adversarial networks. The primary focus is on generating images of underrepresented classes in imbalanced datasets, a process known as class balancing. The aim of this thesis is to analyse and compare data augmentation techniques, including standard geometric methods, generative adversarial networks and autoencoders. Evaluation is done using classifiers trained on the original, imbalanced and artificially balanced datasets. The results achieved suggest how the performance of the individual methods deteriorates with increasing imbalance rate and diversity of the datasets.
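Before any generator runs, class balancing needs a per-class target: how many synthetic images each underrepresented class requires to match the majority class. A small stdlib sketch of that bookkeeping step (the class names and counts are made up; the generator itself is out of scope here):

```python
from collections import Counter

def augmentation_plan(labels):
    # number of synthetic samples each class needs to match the largest class
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}

# hypothetical imbalanced image dataset: 'cat' is underrepresented
labels = ["dog"] * 500 + ["cat"] * 50 + ["bird"] * 200
plan = augmentation_plan(labels)
print(plan)  # {'dog': 0, 'cat': 450, 'bird': 300}
```

A GAN-based pipeline would then sample `plan[cls]` images from a generator conditioned on each class, whereas geometric augmentation would produce them by transforming existing samples of that class.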