
    Classification methods for noise transients in advanced gravitational-wave detectors

    Noise of non-astrophysical origin will contaminate science data taken by the Advanced Laser Interferometer Gravitational-wave Observatory (aLIGO) and Advanced Virgo gravitational-wave detectors. Prompt characterization of instrumental and environmental noise transients will be critical for improving the sensitivity of the advanced detectors in the upcoming science runs. During the science runs of the initial gravitational-wave detectors, noise transients were manually classified by visually examining the time-frequency scan of each event. Here, we present three new algorithms designed for the automatic classification of noise transients in advanced detectors. Two of these algorithms are based on Principal Component Analysis: Principal Component Analysis for Transients (PCAT) and an adaptation of LALInference Burst (LIB). The third algorithm combines an event generator called Wavelet Detection Filter (WDF) with machine learning techniques for classification. We test these algorithms on simulated data sets and show their ability to automatically classify transients by frequency, SNR and waveform morphology.
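    The following sketch illustrates the PCA step that two of these methods build on. It is not the PCAT or LIB code, and their preprocessing and clustering stages are not described here. The sketch finds the leading principal component of a set of equal-length, time-aligned transient snippets by power iteration, then scores each snippet along that component. Transients with different waveform morphology then separate along that axis. The function names and the plain-list data layout are hypothetical, chosen only for illustration.

```python
import math
import random

def leading_component(samples, iters=200, seed=0):
    """Return (mean, unit vector) for the first principal component of `samples`.

    `samples` is a list of equal-length lists (assumed: time-aligned transient
    snippets). Uses power iteration on the sample covariance matrix.
    """
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    centered = [[s[j] - mean[j] for j in range(d)] for s in samples]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        # Multiply by the covariance matrix and renormalize.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def project(sample, mean, component):
    """Score of one snippet along a principal component."""
    return sum((x - m) * c for x, m, c in zip(sample, mean, component))
```

    The sign of the returned component is arbitrary, so only the relative sign or spacing of the scores is meaningful. A real pipeline would keep several components and pass the scores to a clustering or classification step.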

    New insights into the classification and nomenclature of cortical GABAergic interneurons.

    A systematic classification and accepted nomenclature of neuron types is much needed but is currently lacking. This article describes a possible taxonomical solution for classifying GABAergic interneurons of the cerebral cortex based on a novel, web-based interactive system that allows experts to classify neurons with pre-determined criteria. Using Bayesian analysis and clustering algorithms on the resulting data, we investigated the suitability of several anatomical terms and neuron names for cortical GABAergic interneurons. Moreover, we show that supervised classification models could automatically categorize interneurons in agreement with experts' assignments. These results demonstrate a practical and objective approach to the naming, characterization and classification of neurons based on community consensus.

    A high-reproducibility and high-accuracy method for automated topic classification

    Much of human knowledge sits in large databases of unstructured text. Leveraging this knowledge requires algorithms that extract and record metadata on unstructured text documents. Assigning topics to documents will enable intelligent search, statistical characterization, and meaningful classification. Latent Dirichlet allocation (LDA) is the state of the art in topic classification. Here, we perform a systematic theoretical and numerical analysis demonstrating that current optimization techniques for LDA often fail to infer the most suitable model parameters accurately. Adapting approaches for community detection in networks, we propose a new algorithm that displays high reproducibility and high accuracy, and also has high computational efficiency. We apply it to a large set of documents in the English Wikipedia and reveal its hierarchical structure. Our algorithm promises to make "big data" text analysis systems more reliable. Comment: 23 pages, 24 figures

    Explicit Learning Curves for Transduction and Application to Clustering and Compression Algorithms

    Inductive learning is based on inferring a general rule from a finite data set and using it to label new data. In transduction one attempts to solve the problem of using a labeled training set to label a set of unlabeled points, which are given to the learner prior to learning. Although transduction seems at the outset to be an easier task than induction, there have not been many provably useful algorithms for transduction. Moreover, the precise relation between induction and transduction has not yet been determined. The main theoretical developments related to transduction were presented by Vapnik more than twenty years ago. One of Vapnik's basic results is a rather tight error bound for transductive classification based on an exact computation of the hypergeometric tail. While tight, this bound is given implicitly via a computational routine. Our first contribution is a somewhat looser but explicit characterization of a slightly extended PAC-Bayesian version of Vapnik's transductive bound. This characterization is obtained using concentration inequalities for the tail of sums of random variables obtained by sampling without replacement. We then derive error bounds for compression schemes such as (transductive) support vector machines and for transduction algorithms based on clustering. The main observation used for deriving these new error bounds and algorithms is that the unlabeled test points, which in the transductive setting are known in advance, can be used to construct useful data-dependent prior distributions over the hypothesis space.

    Random deep neural networks are biased towards simple functions

    We prove that the binary classifiers of bit strings generated by random wide deep neural networks with ReLU activation function are biased towards simple functions. The simplicity is captured by the following two properties. For any given input bit string, the average Hamming distance of the closest input bit string with a different classification is at least $\sqrt{n/(2\pi \log n)}$, where $n$ is the length of the string. Moreover, if the bits of the initial string are flipped randomly, the average number of flips required to change the classification grows linearly with $n$. These results are confirmed by numerical experiments on deep neural networks with two hidden layers, and settle the conjecture stating that random deep neural networks are biased towards simple functions. This conjecture was proposed and numerically explored in [Valle Pérez et al., ICLR 2019] to explain the unreasonably good generalization properties of deep learning algorithms. The probability distribution of the functions generated by random deep neural networks is a good choice for the prior probability distribution in the PAC-Bayesian generalization bounds. Our results constitute a fundamental step forward in the characterization of this distribution, therefore contributing to the understanding of the generalization properties of deep learning algorithms.
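    The sketch below makes the flip-count property concrete. It samples a random fully connected ReLU network on bit strings and counts how many random bit flips are needed to change the predicted label. Several details are assumptions, not the paper's exact model:

    - inputs are mapped from {0, 1} to {-1, +1};
    - weights are i.i.d. Gaussian with variance 2/fan-in;
    - there are no bias terms;
    - each trial flips distinct bits in a uniformly random order.

    The function names are hypothetical.

```python
import random

def random_relu_net(n, width, depth, rng):
    """Sample a fully connected ReLU net with i.i.d. Gaussian weights
    (variance 2 / fan_in, an assumed He-style scaling), no biases,
    and a linear scalar output layer."""
    layers, fan_in = [], n
    for _ in range(depth):
        layers.append([[rng.gauss(0.0, (2.0 / fan_in) ** 0.5) for _ in range(fan_in)]
                       for _ in range(width)])
        fan_in = width
    out = [rng.gauss(0.0, (1.0 / fan_in) ** 0.5) for _ in range(fan_in)]
    return layers, out

def classify(net, bits):
    """Binary label: sign of the network output on a {0, 1} bit string."""
    layers, out = net
    h = [2.0 * b - 1.0 for b in bits]  # map {0, 1} -> {-1, +1}
    for W in layers:
        h = [max(0.0, sum(w * x for w, x in zip(row, h))) for row in W]
    return sum(w * x for w, x in zip(out, h)) >= 0.0

def flips_to_change(net, bits, rng):
    """Flip distinct bits in a uniformly random order until the label changes.

    Returns the number of flips, or None if the label never changes
    (even after every bit has been flipped once). `bits` is not modified.
    """
    label = classify(net, bits)
    x = list(bits)
    for count, i in enumerate(rng.sample(range(len(x)), len(x)), start=1):
        x[i] = 1 - x[i]
        if classify(net, x) != label:
            return count
    return None
```

    One way to probe the linear-growth claim numerically is to average `flips_to_change` over many random strings for increasing `n` (for example 16, 32, 64), using two hidden layers as in the abstract. Pure Python is slow, so keep the width modest.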

    Overview: Computer vision and machine learning for microstructural characterization and analysis

    The characterization and analysis of microstructure is the foundation of microstructural science, connecting the materials structure to its composition, process history, and properties. Microstructural quantification traditionally involves a human deciding a priori what to measure and then devising a purpose-built method for doing so. However, recent advances in data science, including computer vision (CV) and machine learning (ML), offer new approaches to extracting information from microstructural images. This overview surveys CV approaches to numerically encode the visual information contained in a microstructural image, which then provides input to supervised or unsupervised ML algorithms that find associations and trends in the high-dimensional image representation. CV/ML systems for microstructural characterization and analysis span the taxonomy of image analysis tasks, including image classification, semantic segmentation, object detection, and instance segmentation. These tools enable new approaches to microstructural analysis, including the development of new, rich visual metrics and the discovery of processing-microstructure-property relationships. Comment: submitted to Materials and Metallurgical Transactions

    Piecewise linear regularized solution paths

    We consider the generic regularized optimization problem $\hat{\beta}(\lambda)=\arg\min_{\beta} L(y, X\beta)+\lambda J(\beta)$. Efron, Hastie, Johnstone and Tibshirani [Ann. Statist. 32 (2004) 407-499] have shown that for the LASSO (that is, if $L$ is squared error loss and $J(\beta)=\|\beta\|_1$ is the $\ell_1$ norm of $\beta$), the optimal coefficient path is piecewise linear, that is, $\partial \hat{\beta}(\lambda)/\partial \lambda$ is piecewise constant. We derive a general characterization of the properties of (loss $L$, penalty $J$) pairs which give piecewise linear coefficient paths. Such pairs allow for efficient generation of the full regularized coefficient paths. We investigate the nature of the efficient path-following algorithms which arise. We use our results to suggest robust versions of the LASSO for regression and classification, and to develop new, efficient algorithms for existing problems in the literature, including Mammen and van de Geer's locally adaptive regression splines. Comment: Published at http://dx.doi.org/10.1214/009053606000001370 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
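    The simplest case where the LASSO path can be written down exactly is an orthonormal design. It shows why the path is piecewise linear. This is only that special case, not the paper's general characterization or a path-following algorithm for arbitrary $X$. Take $L(y, X\beta)=\|y - X\beta\|_2^2$, $X^\top X = I$ and $z = X^\top y$. The problem then separates by coordinate, and $\hat{\beta}_j(\lambda)=\operatorname{sign}(z_j)\max(|z_j| - \lambda/2,\, 0)$. So $\partial \hat{\beta}_j/\partial\lambda$ equals $-\operatorname{sign}(z_j)/2$ until the coefficient reaches zero, and $0$ afterwards. The function name below is hypothetical.

```python
def lasso_path_orthonormal(z, lam):
    """LASSO coefficients at penalty `lam` for an orthonormal design (X^T X = I),
    where z = X^T y and the objective is ||y - X b||^2 + lam * ||b||_1.

    Each coordinate is soft-thresholded at lam / 2, so every coefficient is a
    piecewise linear function of lam with a single kink where it hits zero.
    """
    return [(1 if zj > 0 else -1) * max(abs(zj) - lam / 2.0, 0.0) for zj in z]
```

    For example, with `z = [3.0, -1.0, 0.5]` the coefficients at `lam = 1.0` are `[2.5, -0.5, 0.0]`. The third coefficient has just reached zero, and the second reaches zero at `lam = 2.0`. Between kinks every coefficient moves at a constant rate of 1/2 per unit of `lam`.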