The Missing Margin: How Sample Corruption Affects Distance to the Boundary in ANNs
Classification margins are commonly used to estimate the generalization
ability of machine learning models. We present an empirical study of these
margins in artificial neural networks. A global estimate of margin size is
usually used in the literature. In this work, we point out seldom considered
nuances regarding classification margins. Notably, we demonstrate that some
types of training samples are modelled with consistently small margins while
affecting generalization in different ways. By showing a link with the minimum
distance to a different-target sample and the remoteness of samples from one
another, we provide a plausible explanation for this observation. We support
our findings with an analysis of fully-connected networks trained on
noise-corrupted MNIST data, as well as convolutional networks trained on
noise-corrupted CIFAR10 data.
Comment: This work is a preprint of a published paper by the same name, which
it subsumes. This preprint is an extended version: it contains additional
empirical evidence and discussion.
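The link the abstract draws between small margins and the minimum distance to a different-target sample can be illustrated with a small input-space sketch. This is illustrative only, with hypothetical helper names, and is not the paper's code: it computes, for each sample, the distance to the closest sample with a different label, a simple upper bound on any classifier's margin for that sample.

```python
import math

def nearest_different_target_distance(samples, labels):
    """For each sample, the Euclidean distance to the closest sample
    carrying a different label -- an input-space proxy that bounds the
    margin a classifier could achieve on that sample."""
    dists = []
    for x, y in zip(samples, labels):
        best = math.inf
        for x2, y2 in zip(samples, labels):
            if y2 != y:
                best = min(best, math.dist(x, x2))
        dists.append(best)
    return dists

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
labels = [0, 0, 1]
print(nearest_different_target_distance(points, labels))
# -> [2.0, 2.23606..., 2.0]
```

A sample whose nearest different-target neighbour is close (e.g. a mislabeled point dropped inside another class) is forced to have a small margin, regardless of how well the network generalizes.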
Studies on dimension reduction and feature spaces
Today's world produces and stores huge amounts of data, which calls for methods that can tackle both growing sizes and growing dimensionalities of data sets. Dimension reduction aims at answering the challenges posed by the latter.
Many dimension reduction methods consist of a metric transformation part followed by optimization of a cost function. Several classes of cost functions have been developed and studied, while metrics have received less attention. We promote the view that metrics should be lifted to a more independent role in dimension reduction research. The subject of this work is the interaction of metrics with dimension reduction. The work is built on a series of studies on current topics in dimension reduction and neural network research. Neural networks are used both as a tool and as a target for dimension reduction.
When the results of modeling or clustering are represented as a metric, they can be studied using dimension reduction, or they can be used to introduce new properties into a dimension reduction method. We give two examples of such use: visualizing results of hierarchical clustering, and creating supervised variants of existing dimension reduction methods by using a metric that is built on the feature space of a neural network. Combining clustering with dimension reduction results in a novel way for creating space-efficient visualizations, that tell both about hierarchical structure and about distances of clusters.
We study feature spaces used in a recently developed neural network architecture called extreme learning machine. We give a novel interpretation for such neural networks, and recognize the need to parameterize extreme learning machines with the variance of network weights. This has practical implications for use of extreme learning machines, since the current practice emphasizes the role of hidden units and ignores the variance.
A current trend in the research of deep neural networks is to use cost functions from dimension reduction methods to train the network for supervised dimension reduction. We show that equally good results can be obtained by training a bottlenecked neural network for classification or regression, which is faster than using a dimension reduction cost.
We demonstrate that, contrary to the current belief, using sparse distance matrices for creating fast dimension reduction methods is feasible, provided a proper balance between short-distance and long-distance entries in the sparse matrix is maintained. This observation opens up a promising research direction, with the possibility of applying modern dimension reduction methods to much larger data sets than are manageable today.
Robust recognition and exploratory analysis of crystal structures using machine learning
In materials science, artificial-intelligence tools are driving a paradigm shift towards big-data-centric research. Large computational databases with millions of entries, as well as high-resolution experiments such as electron microscopy, contain a large and growing amount of information. To leverage this under-utilized, yet very valuable, data, automatic analytical methods need to be developed. The classification of the crystal structure of a material is essential for its characterization. The available data is structurally diverse but often defective and incomplete. A suitable method should therefore be robust with respect to sources of inaccuracy while being able to treat multiple systems; available methods do not fulfill both criteria at the same time. In this work, we introduce ARISE, a Bayesian-deep-learning-based framework that can treat more than 100 structural classes in a robust fashion, without any predefined threshold. The selection of structural classes, which can easily be extended on demand, encompasses a wide range of materials, including not only bulk but also two- and one-dimensional systems. For the local study of large, polycrystalline samples, we extend ARISE by introducing so-called strided pattern matching. While trained on ideal structures only, ARISE correctly characterizes strongly perturbed single- and polycrystalline systems of both synthetic and experimental origin. The probabilistic nature of the Bayesian-deep-learning model yields principled uncertainty estimates, which are found to correlate with the crystalline order of metallic nanoparticles in electron-tomography experiments. Applying unsupervised learning to the internal neural-network representations reveals grain boundaries and (unapparent) structural regions sharing easily interpretable geometrical properties.
This work enables the hitherto hindered analysis of noisy atomic structural data.
Deep learning topological phases of matter
This thesis shows how to set up a typical problem of condensed matter physics in a deep learning framework. To do this, we introduce the Kitaev model (a superconducting quantum wire with topological properties) with nearest-neighbor coupling, next-to-nearest-neighbor coupling, and an interacting term. We then present the machine learning techniques we use. Finally, we apply them to train a neural network and a convolutional neural network to recognize the topological phases of matter of the non-interacting model, and test them on the classification of interacting data.
3D-VField: Adversarial Augmentation of Point Clouds for Domain Generalization in 3D Object Detection
As 3D object detection on point clouds relies on the geometrical
relationships between the points, non-standard object shapes can hinder a
method's detection capability. However, in safety-critical settings, robustness
to out-of-domain and long-tail samples is fundamental to circumvent dangerous
issues, such as the misdetection of damaged or rare cars. In this work, we
substantially improve the generalization of 3D object detectors to
out-of-domain data by deforming point clouds during training. We achieve this
with 3D-VField: a novel data augmentation method that plausibly deforms objects
via vector fields learned in an adversarial fashion. Our approach constrains 3D
points to slide along their sensor view rays while neither adding nor removing
any of them. The obtained vectors are transferable, sample-independent and
preserve shape and occlusions. Despite training only on a standard dataset,
such as KITTI, augmenting with our vector fields significantly improves the
generalization to differently shaped objects and scenes. Towards this end, we
propose and share CrashD: a synthetic dataset of realistic damaged and rare
cars, with a variety of crash scenarios. Extensive experiments on KITTI, Waymo,
our CrashD and SUN RGB-D show the generalizability of our techniques to
out-of-domain data, different models and sensors, namely LiDAR and ToF cameras,
for both indoor and outdoor scenes. Our CrashD dataset is available at
https://crashd-cars.github.io.
Comment: CVPR 2022. Project page: https://3d-vfield.github.io
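The geometric constraint described above, that points may only slide along their sensor view rays, with none added or removed, can be sketched in a few lines. This is an illustrative scalar-displacement toy with hypothetical names, not the paper's learned adversarial vector fields: each point is moved a given number of units along the ray from the sensor (taken as the origin) through the point.

```python
def slide_along_rays(points, magnitudes):
    """Displace each 3D point along the ray from the sensor (origin)
    through the point. Scaling the position vector keeps the point on
    its sensor view ray, and no points are added or removed."""
    out = []
    for (x, y, z), m in zip(points, magnitudes):
        norm = (x * x + y * y + z * z) ** 0.5
        if norm == 0.0:
            out.append((x, y, z))  # point at the sensor: leave unchanged
            continue
        s = 1.0 + m / norm  # move m units along the unit ray direction
        out.append((x * s, y * s, z * s))
    return out

pts = [(3.0, 0.0, 4.0)]               # 5 units from the sensor
moved = slide_along_rays(pts, [1.0])  # push 1 unit outward
print(moved)  # -> [(3.6, 0.0, 4.8)], now 6 units away, same direction
```

Because the displacement is purely radial, occlusion relationships as seen from the sensor are preserved, which is the property the augmentation relies on.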
Model-Augmented Estimation of Conditional Mutual Information for Feature Selection
Markov blanket feature selection, while theoretically optimal, is generally
challenging to implement. This is due to the shortcomings of existing
approaches to conditional independence (CI) testing, which tend to struggle
either with the curse of dimensionality or computational complexity. We propose
a novel two-step approach which facilitates Markov blanket feature selection in
high dimensions. First, neural networks are used to map features to
low-dimensional representations. In the second step, CI testing is performed by
applying the k-NN conditional mutual information estimator to the learned
feature maps. The mappings are designed to ensure that mapped samples both
preserve information and share similar information about the target variable if
and only if they are close in Euclidean distance. We show that these properties
boost the performance of the k-NN estimator in the second step. The
performance of the proposed method is evaluated on both synthetic and real
data.
Comment: Accepted to UAI 202
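The second step repeatedly performs k-nearest-neighbour queries in the learned low-dimensional feature space. A minimal sketch of that search primitive (illustrative only, with hypothetical names; not the full conditional mutual information estimator):

```python
import math

def k_nearest(query, points, k):
    """Return the indices of the k points closest to `query` in
    Euclidean distance -- the search primitive that k-NN-based
    conditional mutual information estimators are built on."""
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    return order[:k]

feats = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0), (0.2, 0.0)]
print(k_nearest((0.0, 0.0), feats, 2))  # -> [0, 1]
```

The mapping properties described above matter precisely because this primitive uses Euclidean distance: only if nearby mapped samples share similar information about the target does neighbour counting say anything about dependence.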
A study of instance-based algorithms for supervised learning tasks: mathematical, empirical, and psychological evaluations
This dissertation introduces a framework for specifying instance-based algorithms that can solve supervised learning tasks. These algorithms input a sequence of instances and yield a partial concept description, which is represented by a set of stored instances and associated information. This description can be used to predict values for subsequently presented instances. The thesis of this framework is that extensional concept descriptions and lazy generalization strategies can support efficient supervised learning behavior.
The instance-based learning framework consists of three components. The pre-processor component transforms an instance into a more palatable form for the performance component, which computes the instance's similarity with a set of stored instances and yields a prediction for its target value(s). Therefore, the similarity and prediction functions impose generalizations on the stored instances to inductively derive predictions. The learning component assesses the accuracy of these predictions and updates partial concept descriptions to improve their predictive accuracy.
This framework is evaluated in four ways. First, its generality is evaluated by mathematically determining the classes of symbolic concepts and numeric functions that can be closely approximated by IB_1, a simple algorithm specified by this framework. Second, this framework is empirically evaluated for its ability to specify algorithms that improve IB_1's learning efficiency. Significant efficiency improvements are obtained by instance-based algorithms that reduce storage requirements, tolerate noisy data, and learn domain-specific similarity functions, respectively. Alternative component definitions for these algorithms are empirically analyzed in a set of five high-level parameter studies. Third, this framework is evaluated for its ability to specify psychologically plausible process models for categorization tasks.
Results from subject experiments indicate a positive correlation between a model's ability to utilize attribute correlation information and its ability to explain psychological phenomena. Finally, this framework is evaluated for its ability to explain and relate a dozen prominent instance-based learning systems. The survey shows that this framework requires only slight modifications to fit these highly diverse systems. Relationships with edited nearest neighbor algorithms, case-based reasoners, and artificial neural networks are also described.
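The lazy, extensional strategy the dissertation describes can be sketched as a toy nearest-neighbour predictor in the spirit of IB_1: the "concept description" is just the stored (features, target) pairs, and prediction is deferred until a query arrives. Names here are illustrative, not the dissertation's code.

```python
import math

def ib1_predict(stored, query):
    """IB1-style lazy prediction: return the target of the stored
    instance most similar (closest in Euclidean distance) to the
    query. The stored set IS the partial concept description."""
    _, target = min(stored, key=lambda inst: math.dist(inst[0], query))
    return target

# Partial concept description: (features, target) pairs seen so far.
stored = [((0.0, 0.0), "neg"), ((1.0, 1.0), "pos")]
print(ib1_predict(stored, (0.9, 0.8)))  # -> "pos"
```

The learning component would then extend `stored` (and, in the storage-reducing variants, prune it) as each new instance and its observed target arrive.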
Statistical Significance of the Netflix Challenge
Inspired by the legacy of the Netflix contest, we provide an overview of what
has been learned---from our own efforts, and those of others---concerning the
problems of collaborative filtering and recommender systems. The data set
consists of about 100 million movie ratings (from 1 to 5 stars) involving some
480 thousand users and some 18 thousand movies; the associated ratings matrix
is about 99% sparse. The goal is to predict ratings that users will give to
movies; systems which can do this accurately have significant commercial
applications, particularly on the world wide web. We discuss, in some detail,
approaches to "baseline" modeling, singular value decomposition (SVD), as well
as kNN (nearest neighbor) and neural network models; temporal effects,
cross-validation issues, ensemble methods and other considerations are
discussed as well. We compare existing models in a search for new models, and
also discuss the mission-critical issues of penalization and parameter
shrinkage which arise when the dimensions of a parameter space reach into the
millions. Although much work on such problems has been carried out by the
computer science and machine learning communities, our goal here is to address
a statistical audience, and to provide a primarily statistical treatment of the
lessons that have been learned from this remarkable set of data.
Comment: Published at http://dx.doi.org/10.1214/11-STS368 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
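The "baseline" modeling mentioned above is usually a global mean plus per-user and per-movie offsets. A toy sketch (hypothetical names; it omits the penalization and shrinkage the paper discusses, which matter when biases are estimated from few ratings):

```python
def baseline_predict(ratings, user, movie):
    """Baseline rating model: global mean + user bias + movie bias,
    where each bias is that user's/movie's mean deviation from the
    global mean (zero if unseen). No shrinkage applied."""
    mu = sum(r for _, _, r in ratings) / len(ratings)
    user_rs = [r for u, _, r in ratings if u == user]
    movie_rs = [r for _, m, r in ratings if m == movie]
    b_u = sum(user_rs) / len(user_rs) - mu if user_rs else 0.0
    b_i = sum(movie_rs) / len(movie_rs) - mu if movie_rs else 0.0
    return mu + b_u + b_i

# (user, movie, stars) triples
ratings = [("u1", "m1", 5), ("u1", "m2", 3), ("u2", "m1", 4)]
print(baseline_predict(ratings, "u2", "m2"))  # -> 3.0
```

Here mu = 4.0, u2's bias is 0.0 and m2's bias is -1.0, giving 3.0; SVD and kNN models are then typically fit to the residuals left over after subtracting such baselines.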