1,607 research outputs found

    Representation Learning: A Review and New Perspectives

    The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide, to varying degrees, the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, auto-encoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, about computing representations (i.e., inference), and about the geometrical connections between representation learning, density estimation, and manifold learning.
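    Among the families the review covers, auto-encoders are perhaps the easiest to make concrete. The following is a minimal sketch (not code from the paper) of a one-hidden-layer auto-encoder trained by gradient descent on synthetic data; all sizes and the learning rate are hypothetical choices.

    # A minimal sketch of an unsupervised feature learner: a one-hidden-layer
    # auto-encoder trained to reconstruct its input through a bottleneck.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 20))          # toy data: 500 samples, 20 features
    X = (X - X.mean(0)) / X.std(0)          # standardize

    n_in, n_hid, lr = X.shape[1], 8, 0.01   # hypothetical sizes / learning rate
    W1 = rng.normal(scale=0.1, size=(n_in, n_hid)); b1 = np.zeros(n_hid)
    W2 = rng.normal(scale=0.1, size=(n_hid, n_in)); b2 = np.zeros(n_in)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for epoch in range(200):
        H = sigmoid(X @ W1 + b1)            # encoder: data -> representation
        X_hat = H @ W2 + b2                 # decoder: representation -> reconstruction
        err = X_hat - X                     # reconstruction error
        # back-propagate the squared-error loss
        gW2 = H.T @ err / len(X)
        gb2 = err.mean(0)
        dH = (err @ W2.T) * H * (1 - H)
        gW1 = X.T @ dH / len(X)
        gb1 = dH.mean(0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2

    print("final reconstruction MSE:", float((err ** 2).mean()))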

    Radar signal categorization using a neural network

    Neural networks were used to analyze a complex simulated radar environment containing noisy radar pulses generated by many different emitters. The network used is an energy-minimizing network (the BSB model), which forms energy minima (attractors in the network's dynamical system) based on learned input data. The system first determines how many emitters are present (the deinterleaving problem): pulses from individual simulated emitters give rise to separate stable attractors in the network. Once individual emitters are characterized, it is possible to make tentative identifications of them based on their observed parameters. As a test of this idea, a neural network was used to form a small database from which emitter identifications could potentially be made.
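    To make the attractor dynamics concrete, here is a minimal sketch assuming the standard Brain-State-in-a-Box update x <- clip(x + alpha * A x) with a Hebbian outer-product matrix A; the patterns, sizes, and constants are hypothetical stand-ins, not the paper's simulation.

    # A minimal sketch of Brain-State-in-a-Box (BSB) dynamics: stored
    # pulse-parameter patterns become attractors of a clipped linear system.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 16
    # hypothetical "emitter" patterns: +/-1 vectors standing in for pulse parameters
    patterns = rng.choice([-1.0, 1.0], size=(3, n))
    A = sum(np.outer(p, p) for p in patterns) / n   # Hebbian outer-product learning
    np.fill_diagonal(A, 0.0)

    def bsb(x, alpha=0.3, steps=50):
        """Iterate x <- clip(x + alpha * A x) until it settles in a corner (attractor)."""
        for _ in range(steps):
            x = np.clip(x + alpha * (A @ x), -1.0, 1.0)
        return x

    # a noisy observation of emitter 0 relaxes toward the stored attractor
    noisy = patterns[0] + 0.4 * rng.normal(size=n)
    settled = bsb(np.clip(noisy, -1.0, 1.0))
    print("agreement with emitter 0:", float((np.sign(settled) == patterns[0]).mean()))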

    Using visualization, variable selection and feature extraction to learn from industrial data

    Although the engineers of industry have access to process data, they seldom use advanced statistical tools to solve process control problems. Why this reluctance? I believe the reason lies in the history of statistical tools, which were developed in an era of rigorous mathematical modelling, manual computation and small data sets. This produced sophisticated tools, but engineers do not understand the requirements these algorithms place, for example, on the pre-processing of data. If algorithms are fed unsuitable data, or are parameterized poorly, they produce unreliable results, which may lead an engineer to reject statistical analysis in general. This thesis looks for algorithms that may not impress the champions of statistics, but that serve process engineers.

    The thesis advocates three properties in an algorithm: supervised operation, robustness and understandability. Supervised operation allows, and requires, the user to explicate the goal of the analysis, which lets the algorithm discover results that are relevant to the user. Robust algorithms allow engineers to analyse raw process data collected from the automation system of the plant. The third property is understandability: the user must understand how to parameterize the model, what the principle of the algorithm is, and how to interpret the results.

    These criteria are justified with theories of human learning. The basis is the theory of constructivism, which defines learning as the construction of mental models. I then discuss theories of organisational learning, which show how mental models influence the behaviour of groups of people. The next level discusses statistical methodologies of data analysis and binds them to the theories of organisational learning. The last level discusses individual statistical algorithms and introduces the methodology and algorithms proposed by this thesis. The methodology uses three types of algorithms: visualization, variable selection and feature extraction. Its goal is to reliably and understandably provide the user with information related to a problem the user has defined as interesting; a sketch of these three steps follows below.

    The methodology is illustrated with an industrial case: the concentrator of the Hitura mine. The case illustrates how to define the problem with off-line laboratory data, and how to search the on-line data for solutions. A major advantage of the algorithmic study of data is efficiency: the manual approach reported earlier took approximately six man-months, while the automated approach of this thesis produced comparable results in a few weeks.
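    As an illustration of the three-step methodology, here is a minimal sketch using scikit-learn on synthetic stand-in data; the particular algorithms (SelectKBest for variable selection, PCA for feature extraction, a coloured scatter plot for visualization) and all parameters are assumptions for illustration, not the thesis's actual tools.

    # A minimal sketch of variable selection, feature extraction and visualization
    # applied to stand-in "process data".
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_regression
    from sklearn.decomposition import PCA
    from sklearn.feature_selection import SelectKBest, f_regression

    # stand-in for raw process data: 1000 samples, 30 noisy process variables
    X, y = make_regression(n_samples=1000, n_features=30, n_informative=8,
                           noise=10.0, random_state=0)

    # supervised variable selection: keep the variables most related to the target
    selector = SelectKBest(score_func=f_regression, k=8).fit(X, y)
    X_sel = selector.transform(X)

    # feature extraction: compress the selected variables to two components
    X_2d = PCA(n_components=2).fit_transform(X_sel)

    # visualization: colour the 2-D feature space by the quality variable
    plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, s=8)
    plt.xlabel("component 1"); plt.ylabel("component 2")
    plt.colorbar(label="target (e.g. concentrate grade)")
    plt.show()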

    Hierarchical Feature Learning

    The success of many tasks depends on good feature representation, which is often domain-specific and hand-crafted, requiring substantial human effort. Such feature representation is not general, i.e. it is unsuitable for even the same task across multiple domains, let alone different tasks.

    To address these issues, a multilayered convergent neural architecture is presented for learning from repeating, spatially and temporally coincident patterns in data at multiple levels of abstraction. The bottom-up weights in each layer are learned to encode a hierarchy of overcomplete and sparse feature dictionaries from space- and time-varying sensory data. Two algorithms are investigated for learning feature hierarchies: recursive layer-by-layer spherical clustering and sparse coding; a sketch of the former appears below. The model scales to full-sized, high-dimensional input data and to an arbitrary number of layers, and can therefore capture features at any level of abstraction. It learns features that correspond to objects in higher layers and to object parts in lower layers.

    Learning features invariant to arbitrary transformations in the data is a requirement for any effective and efficient representation system, biological or artificial. Each layer in the proposed network is composed of simple and complex sublayers, motivated by the layered organization of the primary visual cortex. When exposed to natural videos, the model develops simple- and complex-cell-like receptive field properties. The model can predict by learning lateral connections among the simple-sublayer neurons. A topographic map of their spatial features emerges by minimizing wiring length simultaneously with feature learning.

    The model is general-purpose, unsupervised and online. Operations in each layer can be implemented in parallelized hardware, making the model efficient for real-world applications.
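    The sketch below shows one layer of the first algorithm, spherical clustering: unit-normalized patches are clustered on the hypersphere and the resulting dictionary encodes inputs for the next layer. The patch size, dictionary size, and rectified-projection encoding are illustrative assumptions, not the paper's exact formulation.

    # A minimal sketch of one layer of spherical clustering for dictionary learning.
    import numpy as np

    def spherical_kmeans(X, k, iters=30, seed=0):
        """Cluster unit vectors by cosine similarity; centroids are re-normalized means."""
        rng = np.random.default_rng(seed)
        X = X / np.linalg.norm(X, axis=1, keepdims=True)
        D = X[rng.choice(len(X), k, replace=False)]      # init dictionary from data
        for _ in range(iters):
            assign = np.argmax(X @ D.T, axis=1)          # nearest centroid by cosine
            for j in range(k):
                members = X[assign == j]
                if len(members):                         # keep old centroid if empty
                    c = members.sum(0)
                    D[j] = c / np.linalg.norm(c)
        return D

    rng = np.random.default_rng(0)
    patches = rng.normal(size=(2000, 64))                # stand-in for 8x8 video patches
    patches /= np.linalg.norm(patches, axis=1, keepdims=True)
    layer1_dict = spherical_kmeans(patches, k=128)       # overcomplete: 128 atoms > 64 dims
    codes = np.maximum(patches @ layer1_dict.T, 0)       # rectified projections as features
    print(codes.shape)                                   # (2000, 128): input to layer 2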

    Practical recommendations for gradient-based training of deep architectures

    Learning algorithms related to artificial neural networks, and in particular to Deep Learning, may seem to involve many bells and whistles, called hyper-parameters. This chapter is meant as a practical guide with recommendations for some of the most commonly used hyper-parameters, in particular in the context of learning algorithms based on back-propagated gradients and gradient-based optimization. It also discusses how to deal with the fact that more interesting results can be obtained when one is allowed to adjust many hyper-parameters. Overall, it describes elements of the practice used to successfully and efficiently train and debug large-scale, and often deep, multi-layer neural networks. It closes with open questions about the training difficulties observed with deeper architectures.
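    To make the hyper-parameters concrete, here is a minimal sketch of a mini-batch SGD loop with classical momentum for logistic regression; the specific values (learning rate 0.1, momentum 0.9, batch size 32, 20 epochs) are hypothetical illustrations, not the chapter's recommendations.

    # A minimal sketch of the hyper-parameters governing a gradient-based training loop.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 10))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)      # toy binary labels

    lr, momentum, batch_size, epochs = 0.1, 0.9, 32, 20  # hyper-parameters to tune
    w = np.zeros(X.shape[1]); b = 0.0
    vw = np.zeros_like(w); vb = 0.0                      # momentum buffers

    for epoch in range(epochs):
        order = rng.permutation(len(X))                  # reshuffle each epoch
        for i in range(0, len(X), batch_size):
            idx = order[i:i + batch_size]
            p = 1 / (1 + np.exp(-(X[idx] @ w + b)))      # forward pass
            g = p - y[idx]                               # cross-entropy gradient signal
            gw, gb = X[idx].T @ g / len(idx), g.mean()
            vw = momentum * vw - lr * gw                 # classical momentum update
            vb = momentum * vb - lr * gb
            w, b = w + vw, b + vb

    acc = ((1 / (1 + np.exp(-(X @ w + b))) > 0.5) == y).mean()
    print(f"training accuracy: {acc:.3f}")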

    Machine Learning Predicts Reach-Scale Channel Types From Coarse-Scale Geospatial Data in a Large River Basin

    Hydrologic and geomorphic classifications have gained traction in response to the increasing need for basin-wide water resources management. Regardless of the selected classification scheme, an open scientific challenge is how to extend information from limited field sites to classify the tens of thousands to millions of channel reaches across a basin. To address this spatial scaling challenge, this study leverages machine learning to predict reach-scale geomorphic channel types from publicly available geospatial data. A bottom-up machine learning approach selects the most accurate and stable model among ~20,000 combinations of 287 coarse geospatial predictors, preprocessing methods, and algorithms in a three-tiered framework designed to (i) define a tractable problem and reduce predictor noise, (ii) assess model performance in statistical learning, and (iii) assess model performance in prediction. The study also addresses key issues related to the design, interpretation, and diagnosis of machine learning models in the hydrologic sciences. In an application to the Sacramento River basin (California, USA), the framework selects a Random Forest model to predict 10 channel types, previously determined from 290 field surveys, over 108,943 reaches of 200 m each. Performance in statistical learning is reasonable, with a 61% median cross-validation accuracy, a sixfold increase over the 10% accuracy of a random baseline model, and the predictions coherently capture the large-scale geomorphic organization of the landscape. In the study area, the persistent roughness of the topography partially controls channel types, and the variation in entropy-based predictive performance is explained by imperfect training information and a scale mismatch between labels and predictors.
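    As an illustration of the statistical-learning tier, here is a minimal sketch of a Random Forest scored by cross-validation; the synthetic data merely stands in for the study's 290 surveyed reaches and 287 geospatial predictors, and all model settings are assumptions.

    # A minimal sketch of cross-validated Random Forest classification of channel types.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    # stand-in for 290 surveyed reaches x 287 coarse predictors, 10 channel types
    X, y = make_classification(n_samples=290, n_features=287, n_informative=30,
                               n_classes=10, n_clusters_per_class=1, random_state=0)

    rf = RandomForestClassifier(n_estimators=500, random_state=0)
    scores = cross_val_score(rf, X, y, cv=5)             # 5-fold cross-validation accuracy
    print(f"median CV accuracy: {np.median(scores):.2f} "
          f"(random baseline ~ {1/10:.2f} for 10 balanced classes)")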