Classification methods for noise transients in advanced gravitational-wave detectors
Noise of non-astrophysical origin will contaminate science data taken by the
Advanced Laser Interferometer Gravitational-wave Observatory (aLIGO) and
Advanced Virgo gravitational-wave detectors. Prompt characterization of
instrumental and environmental noise transients will be critical for improving
the sensitivity of the advanced detectors in the upcoming science runs. During
the science runs of the initial gravitational-wave detectors, noise transients
were manually classified by visually examining the time-frequency scan of each
event. Here, we present three new algorithms designed for the automatic
classification of noise transients in advanced detectors. Two of these
algorithms are based on Principal Component Analysis: Principal Component
Analysis for Transients (PCAT) and an adaptation of LALInference
Burst (LIB). The third algorithm is a combination of an event generator called
Wavelet Detection Filter (WDF) and machine learning techniques for
classification. We test these algorithms on simulated data sets, and we show
their ability to automatically classify transients by frequency, SNR and
waveform morphology.
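The PCA-based approach can be illustrated with a minimal sketch (not the actual PCAT or LIB pipeline): simulated sine-Gaussian transients of two central frequencies, a standard burst-like waveform model, are projected onto their leading principal components and classified by nearest class centroid. All waveform parameters and the toy data set are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(-0.5, 0.5, 256)

def sine_gaussian(f0, tau=0.1, snr=8.0):
    # Simulated noise transient: sine-Gaussian burst in unit white noise
    h = snr * np.exp(-t**2 / (2 * tau**2)) * np.sin(2 * np.pi * f0 * t)
    return h + rng.normal(size=t.size)

# Two morphological classes, distinguished by central frequency (Hz)
X = np.array([sine_gaussian(20) for _ in range(50)]
             + [sine_gaussian(60) for _ in range(50)])
labels = np.array([0] * 50 + [1] * 50)

# PCA via SVD on the mean-centered waveforms
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:5].T          # project onto the first 5 principal components

# Classify by nearest class centroid in the reduced PC space
c0 = Z[labels == 0].mean(axis=0)
c1 = Z[labels == 1].mean(axis=0)
pred = (np.linalg.norm(Z - c1, axis=1)
        < np.linalg.norm(Z - c0, axis=1)).astype(int)
accuracy = (pred == labels).mean()
```

With a clear frequency separation the PC projections of the two classes form well-separated clusters, so even this crude centroid rule classifies nearly all transients correctly.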
New insights into the classification and nomenclature of cortical GABAergic interneurons.
A systematic classification and accepted nomenclature of neuron types is much needed but is currently lacking. This article describes a possible taxonomical solution for classifying GABAergic interneurons of the cerebral cortex based on a novel, web-based interactive system that allows experts to classify neurons with pre-determined criteria. Using Bayesian analysis and clustering algorithms on the resulting data, we investigated the suitability of several anatomical terms and neuron names for cortical GABAergic interneurons. Moreover, we show that supervised classification models could automatically categorize interneurons in agreement with experts' assignments. These results demonstrate a practical and objective approach to the naming, characterization and classification of neurons based on community consensus.
A high-reproducibility and high-accuracy method for automated topic classification
Much of human knowledge sits in large databases of unstructured text.
Leveraging this knowledge requires algorithms that extract and record metadata
on unstructured text documents. Assigning topics to documents will enable
intelligent search, statistical characterization, and meaningful
classification. Latent Dirichlet allocation (LDA) is the state-of-the-art in
topic classification. Here, we perform a systematic theoretical and numerical
analysis demonstrating that current optimization techniques for LDA often
fail to infer the most suitable model parameters accurately. Adapting
approaches for community detection in networks, we propose a new algorithm
that displays high reproducibility and high accuracy, and is also
computationally efficient. We apply it to a large set of documents in
the English Wikipedia and reveal its hierarchical structure. Our algorithm
promises to make "big data" text analysis systems more reliable.
Comment: 23 pages, 24 figures
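The kind of LDA inference the abstract is concerned with can be shown at toy scale with a minimal collapsed Gibbs sampler, run on an invented corpus whose two topics use disjoint vocabularies. This is a sketch of standard LDA inference, not the authors' network-based algorithm; all corpus details are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy corpus: two obvious topics (vocabulary ids 0-3 vs 4-7)
docs = ([[0, 1, 2, 3, 0, 1] for _ in range(10)]
        + [[4, 5, 6, 7, 4, 5] for _ in range(10)])
V, K, alpha, beta = 8, 2, 0.1, 0.01

# Initialize topic assignments and count tables
z = [[rng.integers(K) for _ in d] for d in docs]
ndk = np.zeros((len(docs), K))   # doc-topic counts
nkw = np.zeros((K, V))           # topic-word counts
nk = np.zeros(K)                 # topic totals
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        k = z[d][i]
        ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

# Collapsed Gibbs sweeps: resample each token's topic given the rest
for _ in range(200):
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
            p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
            k = rng.choice(K, p=p / p.sum())
            z[d][i] = k
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

# Mass each inferred topic puts on the two vocabulary halves
half_lo = nkw[:, :4].sum(axis=1)
half_hi = nkw[:, 4:].sum(axis=1)
```

On this trivially separable corpus the sampler concentrates each topic on one vocabulary half; the paper's point is that at realistic scale such inference becomes unreliable and seed-dependent.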
Explicit Learning Curves for Transduction and Application to Clustering and Compression Algorithms
Inductive learning is based on inferring a general rule from a finite data
set and using it to label new data. In transduction one attempts to solve the
problem of using a labeled training set to label a set of unlabeled points,
which are given to the learner prior to learning. Although transduction seems
at the outset to be an easier task than induction, there have not been many
provably useful algorithms for transduction. Moreover, the precise relation
between induction and transduction has not yet been determined. The main
theoretical developments related to transduction were presented by Vapnik more
than twenty years ago. One of Vapnik's basic results is a rather tight error
bound for transductive classification based on an exact computation of the
hypergeometric tail. While tight, this bound is given implicitly via a
computational routine. Our first contribution is a somewhat looser but explicit
characterization of a slightly extended PAC-Bayesian version of Vapnik's
transductive bound. This characterization is obtained using concentration
inequalities for the tail of sums of random variables obtained by sampling
without replacement. We then derive error bounds for compression schemes such
as (transductive) support vector machines and for transduction algorithms based
on clustering. The main observation used for deriving these new error bounds
and algorithms is that the unlabeled test points, which in the transductive
setting are known in advance, can be used in order to construct useful data
dependent prior distributions over the hypothesis space.
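The exact hypergeometric tail that Vapnik's implicit bound rests on is easy to compute directly for small instances, and can be compared against an explicit exponential bound of Hoeffding type, which remains valid under sampling without replacement. The numbers below are illustrative; this is not the paper's PAC-Bayesian bound.

```python
from math import ceil, comb, exp

def hypergeom_tail(N, K, n, t):
    # P[X/n - K/N >= t] for X ~ Hypergeometric(N, K, n):
    # X counts the "errors" in a size-n sample drawn without
    # replacement from N points, K of which are errors.
    x_min = ceil(n * (K / N + t))
    total = comb(N, n)
    return sum(comb(K, x) * comb(N - K, n - x)
               for x in range(x_min, min(K, n) + 1)) / total

# Illustrative instance: 100 points, 30 errors, sample of 20, deviation 0.15
N, K, n, t = 100, 30, 20, 0.15
exact = hypergeom_tail(N, K, n, t)
# Hoeffding's inequality also holds for sampling without replacement,
# giving an explicit (but looser) bound on the same deviation probability
explicit = exp(-2 * n * t ** 2)
```

The exact tail is computed by a summation routine, exactly the "implicit" character the abstract describes, whereas the exponential expression is explicit at the cost of looseness.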
Random deep neural networks are biased towards simple functions
We prove that the binary classifiers of bit strings generated by random wide
deep neural networks with ReLU activation function are biased towards simple
functions. The simplicity is captured by the following two properties. For any
given input bit string, the average Hamming distance of the closest input bit
string with a different classification is at least sqrt(n / (2π log n)),
where n is the length of the string. Moreover, if the bits of the initial
string are flipped randomly, the average number of flips required to change the
classification grows linearly with n. These results are confirmed by numerical
experiments on deep neural networks with two hidden layers, and settle the
conjecture stating that random deep neural networks are biased towards simple
functions. This conjecture was proposed and numerically explored in [Valle
P\'erez et al., ICLR 2019] to explain the unreasonably good generalization
properties of deep learning algorithms. The probability distribution of the
functions generated by random deep neural networks is a good choice for the
prior probability distribution in the PAC-Bayesian generalization bounds. Our
results constitute a fundamental step forward in the characterization of this
distribution, therefore contributing to the understanding of the generalization
properties of deep learning algorithms.
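A numerical experiment in the spirit of the one described, though with arbitrary widths and weight scales rather than the paper's setup, draws a random two-hidden-layer ReLU network over bit strings and counts how many uniformly random bit flips are needed to change a string's classification.

```python
import numpy as np

rng = np.random.default_rng(2)
n, width = 20, 512   # string length and (hypothetical) hidden width

# Random wide two-hidden-layer ReLU network, weights fixed at initialization
W1 = rng.normal(0, 1 / np.sqrt(n), (width, n))
W2 = rng.normal(0, 1 / np.sqrt(width), (width, width))
w3 = rng.normal(0, 1 / np.sqrt(width), width)

def classify(bits):
    x = 2.0 * bits - 1.0                 # map {0,1}^n to {-1,+1}^n
    h = np.maximum(W1 @ x, 0.0)
    h = np.maximum(W2 @ h, 0.0)
    return int(w3 @ h > 0)

def flips_to_change(bits, cap=10 * n):
    # Flip uniformly random bits until the classification changes
    y0, b, k = classify(bits), bits.copy(), 0
    while k < cap:
        b[rng.integers(n)] ^= 1
        k += 1
        if classify(b) != y0:
            return k
    return cap   # classification never changed within the budget

counts = [flips_to_change(rng.integers(0, 2, n)) for _ in range(30)]
avg = sum(counts) / len(counts)
```

Repeating this for increasing n is the kind of experiment that, per the abstract, shows the average flip count growing linearly with the string length.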
Overview: Computer vision and machine learning for microstructural characterization and analysis
The characterization and analysis of microstructure is the foundation of
microstructural science, connecting the materials structure to its composition,
process history, and properties. Microstructural quantification traditionally
involves a human deciding a priori what to measure and then devising a
purpose-built method for doing so. However, recent advances in data science,
including computer vision (CV) and machine learning (ML) offer new approaches
to extracting information from microstructural images. This overview surveys CV
approaches to numerically encode the visual information contained in a
microstructural image, which then provides input to supervised or unsupervised
ML algorithms that find associations and trends in the high-dimensional image
representation. CV/ML systems for microstructural characterization and analysis
span the taxonomy of image analysis tasks, including image classification,
semantic segmentation, object detection, and instance segmentation. These tools
enable new approaches to microstructural analysis, including the development of
new, rich visual metrics and the discovery of
processing-microstructure-property relationships.
Comment: submitted to Materials and Metallurgical Transactions
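The encode-then-learn pipeline the overview describes can be sketched end to end on synthetic data: a hypothetical "micrograph" generator, a hand-crafted visual encoding (phase fraction and edge density), and an unsupervised 2-means grouping of the resulting feature vectors. Every detail here is an invented stand-in, not a method from the survey.

```python
import numpy as np

rng = np.random.default_rng(3)

def micrograph(grain_scale):
    # Synthetic "microstructure": thresholded smoothed noise, where
    # grain_scale controls the characteristic feature size (hypothetical)
    img = rng.normal(size=(32, 32))
    k = np.ones((grain_scale, grain_scale)) / grain_scale**2
    # crude circular 2-D smoothing via FFT convolution
    sm = np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(k, s=img.shape)))
    return (sm > 0).astype(float)

def encode(img):
    # Visual encoding: phase fraction plus edge density
    edges = (np.abs(np.diff(img, axis=0)).mean()
             + np.abs(np.diff(img, axis=1)).mean())
    return np.array([img.mean(), edges])

X = np.array([encode(micrograph(2)) for _ in range(20)]
             + [encode(micrograph(8)) for _ in range(20)])

# Unsupervised grouping: Lloyd's 2-means on the feature vectors
c = X[rng.choice(len(X), 2, replace=False)]
for _ in range(20):
    lab = np.argmin(((X[:, None] - c[None]) ** 2).sum(-1), axis=1)
    # keep the old centroid if a cluster empties out
    c = np.array([X[lab == k].mean(axis=0) if (lab == k).any() else c[k]
                  for k in (0, 1)])
```

Fine-grained and coarse-grained images differ strongly in edge density, so the clusters tend to recover the two processing conditions, which is the "associations and trends" step the overview describes.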
Piecewise linear regularized solution paths
We consider the generic regularized optimization problem
$\hat\beta(\lambda) = \arg\min_\beta L(y, X\beta) + \lambda J(\beta)$. Efron,
Hastie, Johnstone and Tibshirani [Ann. Statist. 32 (2004) 407--499] have shown
that for the LASSO--that is, if $L$ is squared error loss and
$J(\beta) = \|\beta\|_1$ is the $\ell_1$ norm of $\beta$--the optimal
coefficient path $\hat\beta(\lambda)$ is piecewise linear, that is,
$\partial\hat\beta(\lambda)/\partial\lambda$ is piecewise constant. We derive
a general characterization of the properties of (loss $L$, penalty $J$) pairs
which give piecewise linear coefficient paths. Such pairs
allow for efficient generation of the full regularized coefficient paths. We
investigate the nature of efficient path following algorithms which arise. We
use our results to suggest robust versions of the LASSO for regression and
classification, and to develop new, efficient algorithms for existing problems
in the literature, including Mammen and van de Geer's locally adaptive
regression splines.
Comment: Published at http://dx.doi.org/10.1214/009053606000001370 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org)