Deep learning for video game playing
In this article, we review recent Deep Learning advances in the context of
how they have been applied to play different types of video games such as
first-person shooters, arcade games, and real-time strategy games. We analyze
the unique requirements that different game genres pose to a deep learning
system and highlight important open challenges in the context of applying these
machine learning methods to video games, such as general game playing, dealing
with extremely large decision spaces and sparse rewards.
Speech Processing in Computer Vision Applications
Deep learning has recently been proven to be a viable asset in determining features in the field of speech analysis. Deep learning methods like convolutional neural networks facilitate the expansion of specific feature information in waveforms, allowing networks to create more feature-dense representations of data. Our work attempts to address the problems of re-creating a face given a speaker's voice and of speaker identification using deep learning methods. In this work, we first review the fundamental background in speech processing and its related applications. Then we introduce novel deep learning-based methods for speech feature analysis. Finally, we present our deep learning approaches to speaker identification and speech-to-face synthesis. The presented method can convert a speaker audio sample to an image of their predicted face. This framework is composed of several networks chained together, each performing an essential step in the conversion process: audio embedding, encoding, and face generation networks, respectively. Our experiments show that certain features can map to the face, that given a speaker's voice DNNs can create their face, and that a GUI can be used in conjunction to display a speaker recognition network's data.
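The chained pipeline described in the abstract (audio embedding, then encoding, then face generation) can be sketched as three composed stages. All dimensions, weight matrices and nonlinearities below are hypothetical stand-ins for illustration, not the paper's actual networks:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; the paper's actual sizes are not specified here.
AUDIO_DIM, EMBED_DIM, LATENT_DIM, IMG_PIXELS = 128, 64, 32, 16 * 16

# Each stage is stood in for by a random linear map with a nonlinearity;
# in the real framework these are trained deep networks.
W_embed = rng.standard_normal((AUDIO_DIM, EMBED_DIM)) * 0.1
W_encode = rng.standard_normal((EMBED_DIM, LATENT_DIM)) * 0.1
W_generate = rng.standard_normal((LATENT_DIM, IMG_PIXELS)) * 0.1

def audio_embedding(waveform_features):
    """Map raw audio features to a speaker embedding."""
    return np.tanh(waveform_features @ W_embed)

def encoder(embedding):
    """Compress the speaker embedding into a face latent code."""
    return np.tanh(embedding @ W_encode)

def face_generator(latent):
    """Decode the latent code into a (flattened) face image in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(latent @ W_generate)))

# Chain the three stages, mirroring the audio embedding -> encoding ->
# face generation structure described in the abstract.
audio = rng.standard_normal(AUDIO_DIM)
face = face_generator(encoder(audio_embedding(audio)))
print(face.shape)  # (256,)
```

The point of the sketch is only the composition: each stage consumes the previous stage's output, so the whole speech-to-face conversion is one function chain.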
WESPE: Weakly Supervised Photo Enhancer for Digital Cameras
Low-end and compact mobile cameras demonstrate limited photo quality mainly
due to space, hardware and budget constraints. In this work, we propose a deep
learning solution that translates photos taken by cameras with limited
capabilities into DSLR-quality photos automatically. We tackle this problem by
introducing a weakly supervised photo enhancer (WESPE) - a novel image-to-image
Generative Adversarial Network-based architecture. The proposed model is
trained under weak supervision: unlike previous works, there is no need for
strong supervision in the form of a large annotated dataset of aligned
original/enhanced photo pairs. The sole requirement is two distinct datasets:
one from the source camera, and one composed of arbitrary high-quality images
that can be generally crawled from the Internet - the visual content they
exhibit may be unrelated. Hence, our solution is repeatable for any camera:
collecting the data and training can be achieved in a couple of hours. In this
work, we emphasize extensive evaluation of the obtained results. Besides
standard objective metrics and subjective user study, we train a virtual rater
in the form of a separate CNN that mimics human raters on Flickr data and use
this network to get reference scores for both original and enhanced photos. Our
experiments on the DPED, KITTI and Cityscapes datasets as well as pictures from
several generations of smartphones demonstrate that WESPE produces
qualitative results comparable to or better than those of state-of-the-art
strongly supervised methods.
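The weak-supervision setup above (two unpaired datasets and an adversarial loss, with no aligned photo pairs) can be illustrated with a toy sketch. The linear "discriminator", the feature vectors and the identity "enhancer" below are placeholders, not WESPE's actual networks:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two unpaired datasets, as in the weak-supervision setup: source-camera
# photos and arbitrary high-quality photos with unrelated content.
source_photos = rng.random((100, 8))        # stand-in features, not real images
high_quality_photos = rng.random((120, 8))  # different size: no pairing assumed

def sample_unpaired_batch(batch_size=4):
    """Draw independent batches; the indices are unrelated, so no aligned
    original/enhanced pairs are ever needed."""
    src = source_photos[rng.integers(0, len(source_photos), batch_size)]
    hq = high_quality_photos[rng.integers(0, len(high_quality_photos), batch_size)]
    return src, hq

def discriminator(x, w):
    """Toy linear discriminator returning 'realness' probabilities."""
    return 1.0 / (1.0 + np.exp(-x @ w))

w = rng.standard_normal(8) * 0.1
src, hq = sample_unpaired_batch()
enhanced = src  # the (untrained) enhancer is the identity map here

# Standard GAN discriminator loss: high-quality photos labelled real,
# enhancer outputs labelled fake -- no aligned photo pairs required.
eps = 1e-8
d_loss = -np.mean(np.log(discriminator(hq, w) + eps)) \
         - np.mean(np.log(1.0 - discriminator(enhanced, w) + eps))
print(d_loss > 0.0)
```

The key property the sketch shows is that the loss only ever compares distributions of batches, so the two datasets can be collected independently.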
A theory of relation learning and cross-domain generalization
People readily generalize knowledge to novel domains and stimuli. We present
a theory, instantiated in a computational model, based on the idea that
cross-domain generalization in humans is a case of analogical inference over
structured (i.e., symbolic) relational representations. The model is an
extension of the LISA and DORA models of relational inference and learning. The
resulting model learns both the content and format (i.e., structure) of
relational representations from non-relational inputs without supervision; when
augmented with the capacity for reinforcement learning, it leverages these
representations to learn individual domains, and then generalizes to new
domains on the first exposure (i.e., zero-shot learning) via analogical
inference. We demonstrate the capacity of the model to learn structured
relational representations from a variety of simple visual stimuli, and to
perform cross-domain generalization between video games (Breakout and Pong) and
between several psychological tasks. We demonstrate that the model's trajectory
closely mirrors the trajectory of children as they learn about relations,
accounting for phenomena from the literature on the development of children's
reasoning and analogy making. The model's ability to generalize between domains
demonstrates the flexibility afforded by representing domains in terms of their
underlying relational structure, rather than simply in terms of the statistical
relations between their inputs and outputs.
Comment: Includes supplemental material
Multi-Agent Actor-Critic with Hierarchical Graph Attention Network
Most previous studies on multi-agent reinforcement learning focus on deriving
decentralized and cooperative policies to maximize a common reward and rarely
consider the transferability of trained policies to new tasks. This prevents
such policies from being applied to more complex multi-agent tasks. To resolve
these limitations, we propose a model that conducts both representation
learning for multiple agents using hierarchical graph attention network and
policy learning using multi-agent actor-critic. The hierarchical graph
attention network is specially designed to model the hierarchical relationships
among multiple agents that either cooperate or compete with each other to
derive more advanced strategic policies. Two attention networks, the
inter-agent and inter-group attention layers, are used to effectively model
individual and group level interactions, respectively. The two attention
networks have been proven to facilitate the transfer of learned policies to new
tasks with different agent compositions and allow one to interpret the learned
strategies. Empirically, we demonstrate that the proposed model outperforms
existing methods in several mixed cooperative and competitive tasks.
Comment: Accepted as a conference paper at the Thirty-Fourth AAAI Conference
on Artificial Intelligence (AAAI-20), New York, USA
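The two-level attention described above (inter-agent within a group, then inter-group over group summaries) can be sketched with plain scaled dot-product attention. The embeddings, the group split and the single-head pooling are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(query, keys):
    """Scaled dot-product attention of one query over a set of vectors
    used as both keys and values; returns the pooled vector and weights."""
    scores = keys @ query / np.sqrt(len(query))
    weights = softmax(scores)
    return weights @ keys, weights

DIM = 4
# Two groups of agents (e.g. cooperating teams); each agent is an embedding.
groups = [rng.standard_normal((3, DIM)), rng.standard_normal((2, DIM))]
query = rng.standard_normal(DIM)  # the focal agent's embedding

# Inter-agent attention: summarize each group from the focal agent's view.
group_summaries = np.stack([attention_pool(query, g)[0] for g in groups])

# Inter-group attention: attend over the group summaries.
state, group_weights = attention_pool(query, group_summaries)
print(state.shape, round(group_weights.sum(), 6))  # (4,) 1.0
```

Because the pooled state has a fixed dimension regardless of how many agents or groups exist, the same policy can in principle be reused when the agent composition changes, which is the transfer property the abstract highlights.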
Attention is more than prediction precision [Commentary on target article]
A cornerstone of the target article is that, in a predictive coding framework, attention can be modelled by weighting prediction error with a measure of precision. We argue that this is not a complete explanation, especially in light of ERP (event-related potential) data showing large evoked responses for frequently presented target stimuli, which are thus predicted.
Computational models of the human visual cortex: on individual differences and ecologically valid input statistics
Perception relies on cortical processes in response to sensory stimuli. Visual input entering the
eyes ascends a cascade of processing steps from the retina to high-level regions of the cortex.
Vision science investigates these transformations that give rise to high-level processing of
visual objects, such as object recognition. In this thesis I investigate computational models
of the human visual cortex with regard to their ability to predict cortical responses to visual
objects. In particular, I describe two factors playing an important role in using deep neural
networks (DNNs) to better understand cortical functioning: the initial weight state and
ecologically more valid input statistics.
In Chapter 1 of this thesis I will introduce relevant literature pertaining to deep neural
networks as a modeling framework for the visual cortex. Next, I will lay out the motivation
for the research questions investigated in this thesis and described in detail in Chapters 2, 3,
and 4.
Chapter 2 focuses on the impact of the initial weight state of a model on its ability
to predict cortical representations. I describe work in which we demonstrate that two
DNN instances identical in every aspect but their initial weights, yield very dissimilar
representations. Relying on single network instances to predict cortical activation patterns
in response to sensory stimuli poses a problem for computational neuroscience: depending
on the initial set of weights the ability to mirror the cortical representations of these stimuli
might vary. Thus, results based on single ("off-the-shelf") model instances - as commonly
used in computational neuroscience - may not generalize. In contrast, using multiple DNN
instances might alleviate this problem as they allow insights in the variability of a given
model architecture to predict cortical representations. These individual differences between
model instances suggest that, to allow results to generalize more easily, the model instances
should be treated similarly to human experimental participants.
In Chapter 3 I focus on ecologically more valid input statistics (in the form of training
images) aiming to improve a model's ability to predict cortical representations. The most
successful models of the human visual cortex to date are DNNs trained on object recognition
tasks designed with machine learning goals in mind. However, the image sets used for training
these DNNs are often not ecologically realistic. For example, training on the most-widely used image set in computational neuroscience (ImageNet Large Scale Visual Recognition
Challenge (ILSVRC) 2012) requires the fine-grained distinction of 120 dog breeds, but does
not contain visual object categories encountered frequently in everyday human life (e.g.
woman, man, or child). This suggests that taking into account the human visual experience
when training models of the human visual cortex on a categorization task might help to
predict cortical representations. In this Chapter I describe the creation of a set of images
aimed at mimicking the human visual diet: ecoset. Ecoset contains more than 1.5 million
images from 565 basic level categories and is the largest image set specifically designed for
computational neuroscience to date. Ecoset is freely available to allow the community to test
their own hypotheses of models trained with input statistics matched to the human visual
environment.
In Chapter 4 we build on the results from the previous two Chapters. Using multiple
DNN instances I investigate whether a brain-inspired model architecture (vNet) trained on
ecologically more valid input statistics (ecoset) might improve its ability to predict cortical
representations. I first demonstrate that ecoset might improve an architecture's ability to
mirror cortical representations. Furthermore, ecoset-trained vNet also outperforms
state-of-the-art computer vision and computational neuroscience models in terms of mirroring cortical
representations in the human brain. Thus, incorporating biological and ecological aspects,
such as brain-inspired architectural features and ecologically more valid input statistics, into
computational models may yield better predictions of response patterns in the human visual
cortex.
Treating DNN instances similar to human experimental participants and considering
ecological and biological factors for building these DNNs may be an important step towards
better models of the human visual cortex. Such models might allow a better understanding of
the cortical processes underlying high-level vision in the human brain.
Cambridge Trust - Vice Chancellor's Award 2015
Cambridge Philosophical Society
MRC Cognition and Brain Sciences Unit
Object detection and recognition with event driven cameras
This thesis presents the study, analysis and implementation of algorithms
to perform object detection and recognition using an event-based camera.
This sensor represents a novel paradigm which opens a wide range
of possibilities for future developments of computer vision. In particular,
it produces a fast, compressed, illumination-invariant output, which can be
exploited for robotic tasks, where fast dynamics and significant illumination
changes are frequent. The experiments are carried out on the neuromorphic
version of the iCub humanoid platform. The robot is equipped with a novel
dual camera setup mounted directly in the robot's eyes, used to generate
data with a moving camera. The motion causes the presence of background
clutter in the event stream.
In such a scenario, the detection problem has been addressed with an
attention mechanism, specifically designed to respond to the presence of
objects while discarding clutter. The proposed implementation takes
advantage of the nature of the data to simplify the original proto-object
saliency model which inspired this work.
Subsequently, the recognition task was first tackled with a feasibility
study to demonstrate that the event stream carries sufficient information
to classify objects, and then with the implementation of a spiking
neural network. The feasibility study provides the proof of concept
that events are informative enough in the context of object classification,
whereas the spiking implementation improves the results by
employing an architecture specifically designed to process event data.
The spiking network was trained with a three-factor local learning rule
which overcomes the weight transport, update locking and non-locality
problems.
The presented results prove that both detection and classification can
be carried out in the target application using the event data.
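Three-factor local rules of the kind mentioned above are commonly written as delta_w = lr * m * pre * post, where m is a global modulatory signal; because each weight update uses only locally available activity plus that broadcast factor, no backpropagated error (and hence no weight transport) is needed. The following is a generic sketch of that form under assumed trace values, not the thesis's specific rule:

```python
import numpy as np

rng = np.random.default_rng(4)

N_PRE, N_POST = 6, 3
w = rng.standard_normal((N_PRE, N_POST)) * 0.1

def three_factor_update(w, pre, post, modulator, lr=0.01):
    """Local three-factor rule: each weight changes based on its own pre-
    and postsynaptic activity, gated by a global third factor, so no
    backpropagated per-weight error signal is required."""
    return w + lr * modulator * np.outer(pre, post)

pre = rng.random(N_PRE)    # presynaptic spike traces (hypothetical)
post = rng.random(N_POST)  # postsynaptic spike traces (hypothetical)
modulator = 1.0            # e.g. a scalar reward / teaching signal

w_new = three_factor_update(w, pre, post, modulator)
print(w_new.shape)  # (6, 3)
```

The update can also be applied immediately and independently per synapse, which is why such rules avoid the update-locking problem of end-to-end backpropagation.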
Top-Down Selection in Convolutional Neural Networks
Feedforward information processing fills the role of hierarchical feature encoding, transformation, reduction, and abstraction in a bottom-up manner. This paradigm of information processing is sufficient for task requirements that are satisfied in the one-shot rapid traversal of sensory information through the visual hierarchy. However, some tasks demand higher-order information processing using short-term recurrent, long-range feedback, or other processes. The predictive, corrective, and modulatory information processing in top-down fashion complement the feedforward pass to fulfill many complex task requirements. Convolutional neural networks have recently been successful in addressing some aspects of the feedforward processing. However, the role of top-down processing in such models has not yet been fully understood. We propose a top-down selection framework for convolutional neural networks to address the selective and modulatory nature of top-down processing in vision systems. We examine various aspects of the proposed model in different experimental settings such as object localization, object segmentation, task priming, compact neural representation, and contextual interference reduction. We test the hypothesis that the proposed approach is capable of accomplishing hierarchical feature localization according to task cuing. Additionally, feature modulation using the proposed approach is tested for demanding tasks such as segmentation and iterative parameter fine-tuning. Moreover, the top-down attentional traces are harnessed to enable a more compact neural representation. The experimental achievements support the practical complementary role of the top-down selection mechanisms to the bottom-up feature encoding routines
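The selective, modulatory character of top-down processing described above can be caricatured as keeping only the task-relevant channels of a feedforward feature map. The relevance score, the task cue and the hard top-k gating below are illustrative assumptions, not the paper's proposed selection mechanism:

```python
import numpy as np

rng = np.random.default_rng(3)

# A tiny stand-in "feature hierarchy": 8 channels of 5x5 activations.
features = rng.random((8, 5, 5))

def topdown_select(feature_maps, task_weights, k=2):
    """Keep only the k channels most relevant to the task cue and zero out
    the rest -- a crude stand-in for top-down selection / task priming."""
    relevance = feature_maps.sum(axis=(1, 2)) * task_weights
    keep = np.argsort(relevance)[-k:]
    gated = np.zeros_like(feature_maps)
    gated[keep] = feature_maps[keep]
    return gated, keep

task_cue = rng.random(8)  # hypothetical task-priming weight per channel
gated, kept = topdown_select(features, task_cue)

# Only the selected channels survive the top-down pass; everything else
# is suppressed, which also yields a more compact representation.
print(int((gated.sum(axis=(1, 2)) > 0).sum()))  # 2
```

Applied layer by layer from the output back to the input, such a gating trace is one way to localize which features drove a decision, which is the spirit of the localization experiments the abstract mentions.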
Machine Learning Applications for Load Predictions in Electrical Energy Network
In this work, collected operational data of typical urban and rural energy networks are analysed for predictions of energy consumption, as well as for a selected region of the Nordpool electricity market. Regression techniques are systematically investigated for electrical energy prediction and for correlating other impacting parameters. k-Nearest Neighbour (kNN), Random Forest (RF) and Linear Regression (LR) are analysed and evaluated using both a continuous and a vertical time approach. It is observed that for 30-minute predictions RF regression gives the best results, shown by a mean absolute percentage error (MAPE) in the range of 1-2 %. kNN shows the best results for day-ahead forecasting, with a MAPE of 2.61 %. The presented vertical time approach outperforms the continuous time approach. To enhance the pre-processing stage, refined techniques from the domains of statistics and time series analysis are adopted in the modelling. Reducing the dimensionality through principal component analysis (PCA) improves the predictive performance of Recurrent Neural Networks (RNN). In the case of Gated Recurrent Unit (GRU) networks, the results for all seasons are improved through PCA. This work also considers abnormal operation due to various causes (e.g. random effects, intrusion, abnormal operation of smart devices, cyber-threats, etc.). From the results of kNN, iForest and Local Outlier Factor (LOF) on urban-area and rural-region data, it is observed that the anomaly detection differs between the scenarios. For the rural region, most of the anomalies occur late in the timeline, concentrated in the last year of the collected data; for the urban area, the anomalies are spread out over the entire timeline. The frequency of detected anomalies was considerably higher for the rural-area load demand than for the urban-area load demand.
From the considered case scenarios, it is observed that the incidents of detected anomalies are driven more by the data than by exceptions in the algorithms. With domain knowledge of smart energy systems, LOF is able to detect observations that could not have been detected by visual inspection alone, in contrast to kNN and iForest. Whereas kNN and iForest exclude an upper and a lower bound, LOF is density based and separates out anomalies amidst the data. LOF's capability to identify anomalies amidst the data, together with deep domain knowledge, is an advantage when detecting anomalies in smart meter data. This work has shown that instance-based models can compete with models of higher complexity, yet some pre-processing methods (such as circular coding) do not function for an instance-based learner such as k-Nearest Neighbour, and hence kNN cannot benefit from this kind of complexity even in the feature engineering of the model. It will be interesting for future work on electrical load forecasting to develop solutions that combine high complexity in the feature engineering with the explainability of instance-based models.
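The MAPE values used above to rank the forecasters (1-2 % for RF at 30 minutes, 2.61 % for kNN day-ahead) come from a metric that is simple to state; below is a minimal sketch with hypothetical half-hourly load values, not data from the study:

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error, the metric used to rank the models."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))

# Hypothetical half-hourly load values (MW) and two competing forecasts.
actual = np.array([52.0, 48.0, 50.0, 55.0])
forecast_a = np.array([51.5, 48.5, 49.5, 54.0])  # tight forecast
forecast_b = np.array([50.0, 50.0, 48.0, 58.0])  # looser forecast

print(round(mape(actual, forecast_a), 2))  # 1.21
print(mape(actual, forecast_a) < mape(actual, forecast_b))  # True
```

Because the error is expressed as a percentage of the actual load, MAPE lets forecasts at different load levels (urban vs rural, winter vs summer) be compared on one scale, which is what makes it a natural ranking metric here.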
- …