597 research outputs found

    Deep learning for video game playing

    In this article, we review recent deep learning advances in the context of how they have been applied to play different types of video games, such as first-person shooters, arcade games, and real-time strategy games. We analyze the unique requirements that different game genres pose to a deep learning system and highlight important open challenges in applying these machine learning methods to video games, such as general game playing, dealing with extremely large decision spaces, and coping with sparse rewards.

    Speech Processing in Computer Vision Applications

    Deep learning has recently been proven a viable asset for determining features in the field of speech analysis. Deep learning methods such as convolutional neural networks facilitate the extraction of specific feature information from waveforms, allowing networks to create more feature-dense representations of data. Our work addresses the problems of re-creating a face from a speaker's voice and of speaker identification using deep learning methods. We first review the fundamental background in speech processing and its related applications. We then introduce novel deep learning-based methods for speech feature analysis. Finally, we present our deep learning approaches to speaker identification and speech-to-face synthesis. The presented method can convert a speaker's audio sample into an image of their predicted face. The framework is composed of several networks chained together, each performing an essential step in the conversion process: audio embedding, encoding, and face generation networks, respectively. Our experiments show that certain vocal features map to facial features, that DNNs can generate a predicted face from a speaker's voice, and that a GUI can be used in conjunction with the system to display a speaker recognition network's output.
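The chained pipeline this abstract describes (audio embedding, encoding, face generation) might be sketched as follows. This is only an illustrative skeleton: the function names, dimensions, and random projections are invented stand-ins for the trained networks, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def audio_embedding(waveform, dim=128):
    # Hypothetical stand-in: project a raw waveform to a fixed-size embedding.
    w = rng.standard_normal((waveform.shape[0], dim))
    return waveform @ w

def encoder(embedding, dim=64):
    # Hypothetical encoder: compress the audio embedding into a face code.
    w = rng.standard_normal((embedding.shape[0], dim))
    return np.tanh(embedding @ w)

def face_generator(code, size=32):
    # Hypothetical generator: map the face code to a size x size grayscale image.
    w = rng.standard_normal((code.shape[0], size * size))
    return (code @ w).reshape(size, size)

waveform = rng.standard_normal(16000)   # one second of 16 kHz audio
face = face_generator(encoder(audio_embedding(waveform)))
print(face.shape)  # (32, 32)
```

In the real system each stage would be a trained deep network; the point here is only the chained structure, where each network's output is the next network's input.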

    WESPE: Weakly Supervised Photo Enhancer for Digital Cameras

    Low-end and compact mobile cameras demonstrate limited photo quality, mainly due to space, hardware and budget constraints. In this work, we propose a deep learning solution that automatically translates photos taken by cameras with limited capabilities into DSLR-quality photos. We tackle this problem by introducing a weakly supervised photo enhancer (WESPE), a novel image-to-image architecture based on generative adversarial networks. The proposed model is trained under weak supervision: unlike previous works, there is no need for strong supervision in the form of a large annotated dataset of aligned original/enhanced photo pairs. The sole requirement is two distinct datasets: one from the source camera, and one composed of arbitrary high-quality images, which can simply be crawled from the Internet; the visual content they exhibit may be unrelated. Hence, our solution is repeatable for any camera: collecting the data and training can be achieved in a couple of hours. In this work, we place particular emphasis on extensive evaluation of the obtained results. Besides standard objective metrics and a subjective user study, we train a virtual rater in the form of a separate CNN that mimics human raters on Flickr data, and use this network to obtain reference scores for both original and enhanced photos. Our experiments on the DPED, KITTI and Cityscapes datasets, as well as on pictures from several generations of smartphones, demonstrate that WESPE produces qualitative results comparable to or better than those of state-of-the-art strongly supervised methods.
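Enhancers of this kind typically combine adversarial terms with a total-variation regularizer that keeps the output smooth and artifact-free. Purely as a hedged illustration of that one regularizer (not the authors' training code), here is a total-variation term in NumPy:

```python
import numpy as np

def total_variation(img):
    """Total-variation regularizer: penalizes abrupt pixel-to-pixel jumps,
    encouraging the enhanced image to stay smooth and artifact-free."""
    dh = np.abs(np.diff(img, axis=0)).sum()    # vertical differences
    dw = np.abs(np.diff(img, axis=1)).sum()    # horizontal differences
    return (dh + dw) / img.size

flat = np.full((4, 4), 0.5)              # perfectly smooth image
noisy = np.indices((4, 4)).sum(0) % 2    # checkerboard, maximally "jumpy"
print(total_variation(flat))   # 0.0
print(total_variation(noisy))  # 1.5
```

A smooth image incurs zero penalty, while a checkerboard incurs the maximum; during training this term trades off against the adversarial losses that push the image toward DSLR-like statistics.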

    A theory of relation learning and cross-domain generalization

    People readily generalize knowledge to novel domains and stimuli. We present a theory, instantiated in a computational model, based on the idea that cross-domain generalization in humans is a case of analogical inference over structured (i.e., symbolic) relational representations. The model is an extension of the LISA and DORA models of relational inference and learning. The resulting model learns both the content and format (i.e., structure) of relational representations from non-relational inputs without supervision; when augmented with the capacity for reinforcement learning, it leverages these representations to learn individual domains and then generalizes to new domains on first exposure (i.e., zero-shot learning) via analogical inference. We demonstrate the model's capacity to learn structured relational representations from a variety of simple visual stimuli, and to perform cross-domain generalization between video games (Breakout and Pong) and between several psychological tasks. We demonstrate that the model's trajectory closely mirrors the trajectory of children as they learn about relations, accounting for phenomena from the literature on the development of children's reasoning and analogy making. The model's ability to generalize between domains demonstrates the flexibility afforded by representing domains in terms of their underlying relational structure, rather than simply in terms of the statistical relations between their inputs and outputs. Comment: Includes supplemental material.

    Multi-Agent Actor-Critic with Hierarchical Graph Attention Network

    Most previous studies on multi-agent reinforcement learning focus on deriving decentralized and cooperative policies to maximize a common reward, and rarely consider the transferability of trained policies to new tasks. This prevents such policies from being applied to more complex multi-agent tasks. To resolve these limitations, we propose a model that conducts both representation learning for multiple agents, using a hierarchical graph attention network, and policy learning, using a multi-agent actor-critic. The hierarchical graph attention network is specially designed to model the hierarchical relationships among multiple agents that either cooperate or compete with each other, in order to derive more advanced strategic policies. Two attention networks, the inter-agent and inter-group attention layers, are used to effectively model individual- and group-level interactions, respectively. The two attention networks have been shown to facilitate the transfer of learned policies to new tasks with different agent compositions, and allow one to interpret the learned strategies. Empirically, we demonstrate that the proposed model outperforms existing methods in several mixed cooperative and competitive tasks. Comment: Accepted as a conference paper at the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), New York, USA.
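The inter-agent attention layer can be illustrated, under assumptions not taken from the paper, with plain scaled dot-product attention over agent embeddings: each agent's query attends to every agent's key, yielding per-agent mixing weights over the others' representations.

```python
import numpy as np

def scaled_dot_attention(q, k, v):
    """Scaled dot-product attention over agent embeddings: rows of the
    weight matrix say how much each agent attends to every other agent."""
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v, weights

rng = np.random.default_rng(1)
agents = rng.standard_normal((5, 8))   # 5 agents, 8-dim embeddings (invented)
mixed, w = scaled_dot_attention(agents, agents, agents)
print(mixed.shape)                      # (5, 8)
print(np.allclose(w.sum(axis=1), 1.0))  # True: each row is a distribution
```

A hierarchical variant would apply a second such layer across group-level summaries (the inter-group layer); inspecting the rows of `w` is what makes the learned strategies interpretable.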

    Attention is more than prediction precision [Commentary on target article]

    A cornerstone of the target article is that, in a predictive coding framework, attention can be modelled by weighting prediction error with a measure of precision. We argue that this is not a complete explanation, especially in the light of event-related potential (ERP) data showing large evoked responses for frequently presented target stimuli, which are thus predicted.
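The precision-weighting idea the commentary responds to can be written in one line: the same sensory prediction error produces a larger weighted response when precision (the proposed stand-in for attention) is high. A minimal sketch, with invented numbers:

```python
import numpy as np

def weighted_error(obs, pred, precision):
    # Precision acts as a multiplicative gain on the prediction error.
    return precision * (obs - pred)

obs, pred = np.array([1.0, 0.8]), np.array([0.5, 0.5])
attended = weighted_error(obs, pred, precision=2.0)    # high precision
unattended = weighted_error(obs, pred, precision=0.1)  # low precision
print(np.all(np.abs(attended) > np.abs(unattended)))   # True
```

The commentary's objection is that this gain mechanism alone cannot explain large evoked responses to stimuli that are frequent, and hence well predicted, yet task-relevant.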

    Object detection and recognition with event driven cameras

    This thesis presents the study, analysis and implementation of algorithms to perform object detection and recognition using an event-based camera. This sensor represents a novel paradigm which opens a wide range of possibilities for future developments of computer vision. In particular, it produces a fast, compressed, illumination-invariant output, which can be exploited for robotic tasks where fast dynamics and significant illumination changes are frequent. The experiments are carried out on the neuromorphic version of the iCub humanoid platform. The robot is equipped with a novel dual camera setup mounted directly in the robot's eyes, used to generate data with a moving camera. The motion causes the presence of background clutter in the event stream. In such a scenario, the detection problem has been addressed with an attention mechanism, specifically designed to respond to the presence of objects while discarding clutter. The proposed implementation takes advantage of the nature of the data to simplify the original proto-object saliency model which inspired this work. Subsequently, the recognition task was first tackled with a feasibility study, to demonstrate that the event stream carries sufficient information to classify objects, and then with the implementation of a spiking neural network. The feasibility study provides the proof of concept that events are informative enough in the context of object classification, whereas the spiking implementation improves the results by employing an architecture specifically designed to process event data. The spiking network was trained with a three-factor local learning rule which overcomes the weight transport, update locking and non-locality problems. The presented results prove that both detection and classification can be carried out in the target application using the event data.

    Top-Down Selection in Convolutional Neural Networks

    Feedforward information processing fills the role of hierarchical feature encoding, transformation, reduction, and abstraction in a bottom-up manner. This paradigm of information processing is sufficient for task requirements that are satisfied in the one-shot rapid traversal of sensory information through the visual hierarchy. However, some tasks demand higher-order information processing using short-term recurrent, long-range feedback, or other processes. Predictive, corrective, and modulatory information processing in a top-down fashion complements the feedforward pass to fulfill many complex task requirements. Convolutional neural networks have recently been successful in addressing some aspects of feedforward processing. However, the role of top-down processing in such models has not yet been fully understood. We propose a top-down selection framework for convolutional neural networks to address the selective and modulatory nature of top-down processing in vision systems. We examine various aspects of the proposed model in different experimental settings, such as object localization, object segmentation, task priming, compact neural representation, and contextual interference reduction. We test the hypothesis that the proposed approach is capable of accomplishing hierarchical feature localization according to task cuing. Additionally, feature modulation using the proposed approach is tested for demanding tasks such as segmentation and iterative parameter fine-tuning. Moreover, the top-down attentional traces are harnessed to enable a more compact neural representation. The experimental achievements support the practical complementary role of top-down selection mechanisms to the bottom-up feature encoding routines.
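The selective, modulatory top-down pass can be caricatured, purely as a hedged sketch and not the paper's mechanism, as per-channel gains applied to bottom-up feature maps: a task cue amplifies relevant channels and suppresses the rest.

```python
import numpy as np

def top_down_select(feature_maps, gains):
    """Modulate bottom-up feature maps with top-down per-channel gains:
    channels relevant to the current task are kept, others are suppressed."""
    return feature_maps * gains[:, None, None]   # broadcast over H x W

rng = np.random.default_rng(2)
fmaps = rng.random((4, 6, 6))              # 4 channels of 6x6 activations
gains = np.array([1.0, 0.0, 0.0, 0.0])     # task cue selects channel 0 only
selected = top_down_select(fmaps, gains)
print(np.allclose(selected[1:], 0.0))      # True: other channels gated off
print(np.allclose(selected[0], fmaps[0]))  # True: cued channel untouched
```

In a full model the gains would themselves be computed from a high-level task signal and propagated layer by layer, which is what lets the same backbone localize different features under different cues.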

    Machine Learning Applications for Load Predictions in Electrical Energy Network

    In this work, collected operational data from typical urban and rural energy networks, as well as from a selected region of the Nord Pool electricity market, are analysed for predictions of energy consumption. Regression techniques are systematically investigated for electrical energy prediction and for correlating other impacting parameters. k-Nearest Neighbours (kNN), Random Forest (RF) and Linear Regression (LR) are analysed and evaluated using both a continuous and a vertical time approach. It is observed that for 30-minute predictions RF regression gives the best results, shown by a mean absolute percentage error (MAPE) in the range of 1-2 %. kNN shows the best results for day-ahead forecasting, with a MAPE of 2.61 %. The presented vertical time approach outperforms the continuous time approach. To enhance the pre-processing stage, refined techniques from the domains of statistics and time-series analysis are adopted in the modelling. Reducing the dimensionality through principal component analysis (PCA) improves the predictive performance of Recurrent Neural Networks (RNN). In the case of Gated Recurrent Unit (GRU) networks, the results for all seasons are improved through PCA. This work also considers abnormal operation due to various causes (e.g. random effects, intrusion, abnormal operation of smart devices, cyber-threats). From the results of kNN, isolation forest (iForest) and Local Outlier Factor (LOF) on urban-area and rural-region data, it is observed that the detected anomalies differ between the two scenarios. For the rural region, most of the anomalies occur late in the timeline, concentrated in the last year of the collected data. For the urban area, the anomalies are spread over the entire timeline. The frequency of detected anomalies was considerably higher for the rural load demand than for the urban load demand.
    Observing the considered case scenarios, the detected anomalies are driven more by the data than by exceptions in the algorithms. It is observed that, with domain knowledge of smart energy systems, LOF is able to detect observations that could not have been detected by visual inspection alone, in contrast to kNN and iForest. Whereas kNN and iForest exclude observations beyond an upper and lower bound, LOF is density based and separates out anomalies lying amidst the data. LOF's capability to identify anomalies amidst the data, together with deep domain knowledge, is an advantage when detecting anomalies in smart meter data. This work has shown that instance-based models can compete with models of higher complexity, yet some pre-processing methods (such as circular coding) do not work for an instance-based learner such as k-Nearest Neighbours, so kNN cannot exploit this kind of complexity in the feature engineering of the model. For future work on electrical load forecasting, it will be interesting to develop solutions that combine high complexity in the feature engineering with the explainability of instance-based models.
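The two central ingredients of the forecasting part, an instance-based kNN regressor and the MAPE metric it is scored with, are simple enough to sketch. The synthetic data, features, and choice of k below are invented for illustration and are not the study's actual dataset.

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error, the metric the study reports."""
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

def knn_forecast(train_x, train_y, query, k=3):
    """Minimal k-nearest-neighbour regressor: predict the mean load of the
    k historical instances whose features are closest to the query."""
    dists = np.linalg.norm(train_x - query, axis=1)
    return train_y[np.argsort(dists)[:k]].mean()

rng = np.random.default_rng(3)
x = rng.random((200, 2))                        # e.g. time-of-day and temperature features
y = 50 + 30 * x[:, 0] + rng.normal(0, 1, 200)   # synthetic load in MW
preds = np.array([knn_forecast(x[:150], y[:150], q) for q in x[150:]])
print(mape(y[150:], preds))  # a few percent on this easy synthetic series
```

The instance-based nature is visible here: there is no training step at all, only a lookup at prediction time, which is also why engineered features that change the distance geometry (such as circular coding) can silently break the neighbour search.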