328 research outputs found
Exploring convolutional, recurrent, and hybrid deep neural networks for speech and music detection in a large audio dataset
Audio signals represent a wide diversity of acoustic events, from background environmental noise to spoken
communication. Machine learning models such as neural networks have already been proposed for audio signal
modeling, where recurrent structures can take advantage of temporal dependencies. This work aims to study the
implementation of several neural network-based systems for speech and music event detection over a collection of
77,937 10-second audio segments (216 h), selected from the Google AudioSet dataset. These segments belong to
YouTube videos and have been represented as mel-spectrograms. We propose and compare two approaches. The
first one is the training of two different neural networks, one for speech detection and another for music detection.
The second approach consists of training a single neural network to tackle both tasks at the same time. The studied
architectures include fully connected, convolutional and LSTM (long short-term memory) recurrent networks.
Comparative results are provided in terms of classification performance and model complexity. We would like to
highlight the performance of convolutional architectures, especially in combination with an LSTM stage. The hybrid
convolutional-LSTM models achieve the best overall results (85% accuracy) in the three proposed tasks. Furthermore,
a distractor analysis of the results has been carried out in order to identify which events in the ontology are the most
harmful for the performance of the models, showing some difficult scenarios for the detection of music and speech.
This work has been supported by project “DSSL: Redes Profundas y Modelos de Subespacios para Deteccion y Seguimiento de Locutor, Idioma y Enfermedades Degenerativas a partir de la Voz” (TEC2015-68172-C2-1-P), funded by the Ministry of Economy and Competitiveness of Spain and FEDE
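The hybrid pipeline described above, convolutional feature extraction over mel-spectrogram frames followed by an LSTM stage and one sigmoid head per task, can be sketched in plain NumPy. All shapes, weights, and layer sizes below are illustrative placeholders, not the architecture actually trained in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w, b):
    """Valid 1-D convolution over time with ReLU. x: (T, F), w: (K, F, C), b: (C,)."""
    T, F = x.shape
    K, _, C = w.shape
    out = np.empty((T - K + 1, C))
    for t in range(T - K + 1):
        # Contract the (K, F) window against the kernel to get C channels.
        out[t] = np.tensordot(x[t:t + K], w, axes=([0, 1], [0, 1])) + b
    return np.maximum(out, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_last_state(x, Wx, Wh, b):
    """Run a single-layer LSTM over x (T, C); return the final hidden state."""
    H = Wh.shape[0]
    h = np.zeros(H); c = np.zeros(H)
    for xt in x:
        z = xt @ Wx + h @ Wh + b
        i, f, o, g = np.split(z, 4)           # input, forget, output gates + candidate
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)
        h = o * np.tanh(c)
    return h

# Illustrative sizes: 100 frames, 64 mel bins, kernel 5, 8 channels, 16 hidden units.
T, F, K, C, H = 100, 64, 5, 8, 16
mel = rng.standard_normal((T, F))             # stand-in for one mel-spectrogram
w_conv = rng.standard_normal((K, F, C)) * 0.1
b_conv = np.zeros(C)
Wx = rng.standard_normal((C, 4 * H)) * 0.1
Wh = rng.standard_normal((H, 4 * H)) * 0.1
b_lstm = np.zeros(4 * H)
w_speech = rng.standard_normal(H) * 0.1       # one detection head per task
w_music = rng.standard_normal(H) * 0.1

feats = conv1d(mel, w_conv, b_conv)           # (T-K+1, C) local spectral features
h = lstm_last_state(feats, Wx, Wh, b_lstm)    # temporal summary of the segment
p_speech = sigmoid(h @ w_speech)              # P(speech present)
p_music = sigmoid(h @ w_music)                # P(music present)
```

The single-network variant of the paper corresponds to sharing `feats` and `h` across both heads, as done here; the two-network variant would train the whole stack separately per task.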
Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking
Discriminative Correlation Filters (DCF) have demonstrated excellent
performance for visual object tracking. The key to their success is the ability
to efficiently exploit available negative data by including all shifted
versions of a training sample. However, the underlying DCF formulation is
restricted to single-resolution feature maps, significantly limiting its
potential. In this paper, we go beyond the conventional DCF framework and
introduce a novel formulation for training continuous convolution filters. We
employ an implicit interpolation model to pose the learning problem in the
continuous spatial domain. Our proposed formulation enables efficient
integration of multi-resolution deep feature maps, leading to superior results
on three object tracking benchmarks: OTB-2015 (+5.1% in mean OP), Temple-Color
(+4.6% in mean OP), and VOT2015 (20% relative reduction in failure rate).
Additionally, our approach is capable of sub-pixel localization, crucial for
the task of accurate feature point tracking. We also demonstrate the
effectiveness of our learning formulation in extensive feature point tracking
experiments. Code and supplementary material are available at
http://www.cvl.isy.liu.se/research/objrec/visualtracking/conttrack/index.html.
Comment: Accepted at ECCV 201
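The implicit interpolation model can be written, roughly and omitting the paper's regularization terms, as follows (notation is a paraphrase, not a verbatim reproduction). Each feature channel $x_d$, sampled at $N_d$ points, is mapped to the continuous interval $[0, T)$ by an interpolation operator, and learned continuous filters $f_d$ are convolved with the result:

```latex
J_d\{x_d\}(t) = \sum_{n=0}^{N_d - 1} x_d[n]\, b_d\!\left(t - \tfrac{T}{N_d} n\right),
\qquad
S_f\{x\}(t) = \sum_{d=1}^{D} f_d * J_d\{x_d\}(t)
```

Because each channel has its own operator $J_d$ with its own sample count $N_d$, feature maps of different resolutions (e.g. from different CNN layers) can be fused in a single continuous confidence function $S_f\{x\}(t)$, which is what lifts the single-resolution restriction of the standard DCF formulation.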
Protein interface prediction using graph convolutional networks
2017 Fall. Includes bibliographical references.
Proteins play a critical role in processes both within and between cells, through their interactions with each other and with other molecules. Proteins interact via an interface, forming a protein complex that is difficult, expensive, and time-consuming to determine experimentally, giving rise to computational approaches. These computational approaches use known electrochemical properties of protein amino acid residues to predict whether or not they are part of an interface. Prediction can occur in a partner-independent fashion, where amino acid residues are considered independently of their neighbors, or in a partner-specific fashion, where pairs of potentially interacting residues are considered together. Ultimately, prediction of protein interfaces can help illuminate cellular biology, improve our understanding of diseases, and aid pharmaceutical research.
Interface prediction has historically been performed with a variety of methods, including docking, template matching, and, more recently, machine learning approaches. The field of machine learning has undergone a revolution of sorts with the emergence of convolutional neural networks as the leading method of choice for a wide swath of tasks. Enabled by large quantities of data and the increasing power and availability of computing resources, convolutional neural networks efficiently detect patterns in grid-structured data and generate hierarchical representations that prove useful for many types of problems. This success has motivated the work presented in this thesis, which seeks to improve upon state-of-the-art interface prediction methods by incorporating concepts from convolutional neural networks. Proteins are inherently irregular, so they do not easily conform to a grid structure, whereas a graph representation is much more natural. Various convolution operations have been proposed for graph data, each geared towards a particular application.
We adapted these convolutions for use in interface prediction and proposed two new variants. Neural networks were trained on the Docking Benchmark Dataset version 4.0 complexes and tested on the new complexes added in version 5.0. Results were compared against the state-of-the-art partner-specific method, PAIRpred [1], and show that multiple variants of graph convolution outperform PAIRpred, with no single variant emerging as the clear winner. In the future, additional training data may be incorporated from other sources, unsupervised pretraining such as autoencoding may be employed, and a generalization of convolution to simplicial complexes may be explored. In addition, the various graph convolution approaches may be applied to other applications with graph-structured data, such as Quantitative Structure-Activity Relationship (QSAR) learning and knowledge base inference.
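A minimal graph-convolution layer of the general kind adapted here can be sketched in NumPy. This follows the common normalized-adjacency formulation; the thesis's own operators and its pairing scheme differ in detail, so the graph, features, and weights below are an illustrative toy, not the actual method:

```python
import numpy as np

rng = np.random.default_rng(1)

def gcn_layer(A, H, W):
    """One graph-convolution layer: aggregate each node's neighborhood with a
    symmetrically normalized adjacency, then apply a shared linear map + ReLU."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W, 0.0)

# Toy "protein": 6 residues (nodes), edges between spatially close residues.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)]:
    A[i, j] = A[j, i] = 1.0
H = rng.standard_normal((6, 4))               # 4 per-residue input features
W = rng.standard_normal((4, 8)) * 0.5
H1 = gcn_layer(A, H, W)                       # (6, 8) learned residue embeddings

def pair_score(h_l, h_r, w):
    """Partner-specific scoring: concatenate two residue embeddings and squash."""
    return 1.0 / (1.0 + np.exp(-(np.concatenate([h_l, h_r]) @ w)))

w_pair = rng.standard_normal(16) * 0.1
s = pair_score(H1[0], H1[3], w_pair)          # interaction score for residues 0 and 3
```

The key property motivating the thesis is visible here: the convolution consumes an arbitrary adjacency matrix rather than a fixed grid, so irregular residue neighborhoods need no artificial gridding.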
Analysis of machine learning techniques applied to sensory detection of vehicles in intelligent crosswalks
Improving road safety through artificial intelligence-based systems is now crucial to turning smart cities into a reality. Under this highly relevant and extensive heading, an approach is proposed to improve vehicle detection in smart crosswalks using machine learning models. In contrast to classic fuzzy classifiers, machine learning models do not require the readjustment of labels that depend on the location of the system and the road conditions. Several machine learning models were trained and tested using real traffic data taken from urban scenarios in both Portugal and Spain. These include random forest, time-series forecasting, multi-layer perceptron, support vector machine, and logistic regression models. A deep reinforcement learning agent, based on a state-of-the-art double-deep recurrent Q-network, is also designed and compared with the machine learning models just mentioned. Results show that the machine learning models can efficiently replace the classic fuzzy classifier.
This work was supported by the Ministry of Economy and Knowledge of the Andalusian Government, Spain.
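As a toy illustration of why a trained model avoids the per-site label readjustment that fuzzy classifiers need, the sketch below fits a from-scratch logistic regression to synthetic stand-in sensor readings; the features, data distribution, and learning rate are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in for crosswalk sensor readings: two features per sample
# (e.g. signal magnitude and duration), higher on average when a vehicle passes.
n = 400
vehicle = rng.normal(loc=[3.0, 2.5], scale=0.8, size=(n, 2))
no_vehicle = rng.normal(loc=[0.5, 0.5], scale=0.8, size=(n, 2))
X = np.vstack([vehicle, no_vehicle])
y = np.concatenate([np.ones(n), np.zeros(n)])

def sigmoid(z):
    # Clip to keep np.exp numerically safe once the weights grow.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

# Logistic regression fit by plain gradient descent. The decision boundary is
# learned from site data instead of hand-tuned like a fuzzy rule set, so moving
# the system to a new location just means refitting on that location's data.
w = np.zeros(2); b = 0.0
for _ in range(500):
    p = sigmoid(X @ w + b)
    w -= 0.5 * (X.T @ (p - y) / len(y))
    b -= 0.5 * np.mean(p - y)

acc = np.mean((sigmoid(X @ w + b) > 0.5) == y)  # training accuracy on the toy data
```

The same refit-instead-of-retune argument applies to the heavier models compared in the paper (random forests, MLPs, SVMs, the recurrent Q-network); logistic regression is used here only because it fits in a few lines.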
Improving 3D convolutional neural network comprehensibility via interactive visualization of relevance maps: Evaluation in Alzheimer's disease
Background: Although convolutional neural networks (CNN) achieve high
diagnostic accuracy for detecting Alzheimer's disease (AD) dementia based on
magnetic resonance imaging (MRI) scans, they are not yet applied in clinical
routine. One important reason for this is a lack of model comprehensibility.
Recently developed visualization methods for deriving CNN relevance maps may
help to fill this gap. We investigated whether models with higher accuracy also
rely more on discriminative brain regions predefined by prior knowledge.
Methods: We trained a CNN for the detection of AD in N=663 T1-weighted MRI
scans of patients with dementia and amnestic mild cognitive impairment (MCI)
and verified the accuracy of the models via cross-validation and in three
independent samples including N=1655 cases. We evaluated the association of
relevance scores and hippocampus volume to validate the clinical utility of
this approach. To improve model comprehensibility, we implemented an
interactive visualization of 3D CNN relevance maps.
Results: Across three independent datasets, group separation showed high
accuracy for AD dementia vs. controls (AUC ≈ 0.92) and moderate accuracy for
MCI vs. controls (AUC ≈ 0.75). Relevance maps indicated that hippocampal
atrophy was considered as the most informative factor for AD detection, with
additional contributions from atrophy in other cortical and subcortical
regions. Relevance scores within the hippocampus were highly correlated with
hippocampal volumes (Pearson's r = −0.86, p < 0.001).
Conclusion: The relevance maps highlighted atrophy in regions that we had
hypothesized a priori. This strengthens the comprehensibility of the CNN
models, which were trained in a purely data-driven manner based on the scans
and diagnosis labels.
Comment: 24 pages, 9 figures/tables, supplementary material, source code available on GitHub
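One common family of techniques for deriving such relevance maps is gradient-based attribution; a gradient-times-input map can be illustrated on a toy fully connected network. The real models are 3D CNNs and the paper's visualization method may differ, so every shape and weight here is a placeholder:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy stand-in for a diagnostic network: a 2-layer MLP over a flattened "scan".
D, H = 64, 16                                  # input voxels, hidden units
W = rng.standard_normal((H, D)) * 0.2
v = rng.standard_normal(H) * 0.2
x = rng.standard_normal(D)                     # stand-in for one MRI scan

def score(x):
    """Pre-sigmoid 'disease evidence' output of the toy network."""
    return v @ np.maximum(W @ x, 0.0)

# Gradient-times-input relevance: which voxels push the score up or down.
mask = (W @ x > 0.0).astype(float)             # ReLU gate pattern for this input
grad = W.T @ (v * mask)                        # analytic d(score)/dx
relevance = grad * x                           # per-voxel relevance map

# Sanity-check the analytic gradient against a central finite difference.
eps = 1e-6
e0 = np.eye(D)[0]
num = (score(x + eps * e0) - score(x - eps * e0)) / (2 * eps)
```

In the paper's setting, the resulting per-voxel map is what gets overlaid on the MRI and compared against a priori regions such as the hippocampus; the correlation with hippocampal volume is then an ordinary Pearson correlation over subjects.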
Analysis of human-computer interaction time series using Deep Learning
Integrated master's dissertation in Informatics Engineering.
The collection and use of data resulting from human-computer interaction are becoming more and more common. These data have enabled intelligent systems that extract powerful knowledge, potentially
improving the user experience or even originating various digital services. With the rapid scientific advancements
that have been taking place in the field of Deep Learning, it is convenient to review the underlying techniques
currently used in these systems.
In this work, we propose an approach to the general task of analyzing such interactions in the form of time
series, using Deep Learning. We then rely on this approach to develop an anti-cheating system for video games
using only keyboard and mouse input data. This system can work with any video game, and with minor adjustments, it can be easily adapted to new platforms (such as mobile and gaming consoles).
Experiments suggest that analyzing HCI time series data with deep learning yields better results while providing solutions that do not rely as heavily on domain knowledge as traditional systems do.
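One plausible preprocessing step for such a system, turning raw keyboard and mouse events into a fixed-rate time series that a recurrent or convolutional model can consume, might look like the sketch below; the event codes, timestamps, and window length are invented for illustration:

```python
import numpy as np

# Toy stream of (timestamp_ms, event_type) input events; event types are
# illustrative codes: 0 = key press, 1 = mouse move, 2 = mouse click.
events = [(0, 1), (12, 1), (30, 0), (55, 2), (90, 1), (130, 0), (175, 2)]

def windowed_features(events, window_ms=50, n_types=3):
    """Bin raw input events into fixed-length time windows, counting each
    event type per window. The result is a (num_windows, n_types) time series
    that is platform-agnostic: only the event codes change across devices."""
    horizon = events[-1][0] + 1
    n_windows = int(np.ceil(horizon / window_ms))
    feats = np.zeros((n_windows, n_types))
    for ts, kind in events:
        feats[ts // window_ms, kind] += 1
    return feats

series = windowed_features(events)   # shape (4, 3) for the toy stream above
```

Because the representation is just counts per window, porting the system to a new platform (mobile, console) mainly means remapping the raw events to codes, which matches the abstract's claim of adaptation with minor adjustments.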