1,601,711 research outputs found
Proof-Pattern Recognition and Lemma Discovery in ACL2
We present a novel technique for combining statistical machine learning for
proof-pattern recognition with symbolic methods for lemma discovery. The
resulting tool, ACL2(ml), gathers proof statistics and uses statistical
pattern-recognition to pre-processes data from libraries, and then suggests
auxiliary lemmas in new proofs by analogy with already seen examples. This
paper presents the implementation of ACL2(ml) alongside theoretical
descriptions of the proof-pattern recognition and lemma discovery methods
involved in it
Novel statistical approaches for non-normal censored immunological data: analysis of cytokine and gene expression data
Background: For several immune-mediated diseases, immunological analysis will become more complex in the future with datasets in which cytokine and gene expression data play a major role. These data have certain characteristics that require sophisticated statistical analysis such as strategies for non-normal distribution and censoring. Additionally, complex and multiple immunological relationships need to be adjusted for potential confounding and interaction effects.
Objective: We aimed to introduce and apply different methods for statistical analysis of non-normal censored cytokine and gene expression data. Furthermore, we assessed the performance and accuracy of a novel regression approach in order to allow adjusting for covariates and potential confounding.
Methods: For non-normally distributed censored data traditional means such as the Kaplan-Meier method or the generalized Wilcoxon test are described. In order to adjust for covariates the novel approach named Tobit regression on ranks was introduced. Its performance and accuracy for analysis of non-normal censored cytokine/gene expression data was evaluated by a simulation study and a statistical experiment applying permutation and bootstrapping.
Results: If adjustment for covariates is not necessary traditional statistical methods are adequate for non-normal censored data. Comparable with these and appropriate if additional adjustment is required, Tobit regression on ranks is a valid method. Its power, type-I error rate and accuracy were comparable to the classical Tobit regression.
Conclusion: Non-normally distributed censored immunological data require appropriate statistical methods. Tobit regression on ranks meets these requirements and can be used for adjustment for covariates and potential confounding in large and complex immunological datasets
Statistical inference using SGD
We present a novel method for frequentist statistical inference in
-estimation problems, based on stochastic gradient descent (SGD) with a
fixed step size: we demonstrate that the average of such SGD sequences can be
used for statistical inference, after proper scaling. An intuitive analysis
using the Ornstein-Uhlenbeck process suggests that such averages are
asymptotically normal. From a practical perspective, our SGD-based inference
procedure is a first order method, and is well-suited for large scale problems.
To show its merits, we apply it to both synthetic and real datasets, and
demonstrate that its accuracy is comparable to classical statistical methods,
while requiring potentially far less computation.Comment: To appear in AAAI 201
Statistical Models of Reconstructed Phase Spaces for Signal Classification
This paper introduces a novel approach to the analysis and classification of time series signals using statistical models of reconstructed phase spaces. With sufficient dimension, such reconstructed phase spaces are, with probability one, guaranteed to be topologically equivalent to the state dynamics of the generating system, and, therefore, may contain information that is absent in analysis and classification methods rooted in linear assumptions. Parametric and nonparametric distributions are introduced as statistical representations over the multidimensional reconstructed phase space, with classification accomplished through methods such as Bayes maximum likelihood and artificial neural networks (ANNs). The technique is demonstrated on heart arrhythmia classification and speech recognition. This new approach is shown to be a viable and effective alternative to traditional signal classification approaches, particularly for signals with strong nonlinear characteristics
Fast and Guaranteed Tensor Decomposition via Sketching
Tensor CANDECOMP/PARAFAC (CP) decomposition has wide applications in
statistical learning of latent variable models and in data mining. In this
paper, we propose fast and randomized tensor CP decomposition algorithms based
on sketching. We build on the idea of count sketches, but introduce many novel
ideas which are unique to tensors. We develop novel methods for randomized
computation of tensor contractions via FFTs, without explicitly forming the
tensors. Such tensor contractions are encountered in decomposition methods such
as tensor power iterations and alternating least squares. We also design novel
colliding hashes for symmetric tensors to further save time in computing the
sketches. We then combine these sketching ideas with existing whitening and
tensor power iterative techniques to obtain the fastest algorithm on both
sparse and dense tensors. The quality of approximation under our method does
not depend on properties such as sparsity, uniformity of elements, etc. We
apply the method for topic modeling and obtain competitive results.Comment: 29 pages. Appeared in Proceedings of Advances in Neural Information
Processing Systems (NIPS), held at Montreal, Canada in 201
Joint and individual analysis of breast cancer histologic images and genomic covariates
A key challenge in modern data analysis is understanding connections between
complex and differing modalities of data. For example, two of the main
approaches to the study of breast cancer are histopathology (analyzing visual
characteristics of tumors) and genetics. While histopathology is the gold
standard for diagnostics and there have been many recent breakthroughs in
genetics, there is little overlap between these two fields. We aim to bridge
this gap by developing methods based on Angle-based Joint and Individual
Variation Explained (AJIVE) to directly explore similarities and differences
between these two modalities. Our approach exploits Convolutional Neural
Networks (CNNs) as a powerful, automatic method for image feature extraction to
address some of the challenges presented by statistical analysis of
histopathology image data. CNNs raise issues of interpretability that we
address by developing novel methods to explore visual modes of variation
captured by statistical algorithms (e.g. PCA or AJIVE) applied to CNN features.
Our results provide many interpretable connections and contrasts between
histopathology and genetics
- …
