Proof-Pattern Recognition and Lemma Discovery in ACL2
We present a novel technique for combining statistical machine learning for
proof-pattern recognition with symbolic methods for lemma discovery. The
resulting tool, ACL2(ml), gathers proof statistics and uses statistical
pattern-recognition to pre-process data from libraries, and then suggests
auxiliary lemmas in new proofs by analogy with already seen examples. This
paper presents the implementation of ACL2(ml) alongside theoretical
descriptions of the proof-pattern recognition and lemma discovery methods
involved in it.
A deep matrix factorization method for learning attribute representations
Semi-Non-negative Matrix Factorization is a technique that learns a
low-dimensional representation of a dataset that lends itself to a clustering
interpretation. It is possible that the mapping between this new representation
and our original data matrix contains rather complex hierarchical information
with implicit lower-level hidden attributes that classical one-level
clustering methodologies cannot interpret. In this work we propose a novel
model, Deep Semi-NMF, that is able to learn such hidden representations, which
lend themselves to a clustering interpretation according to different,
unknown attributes of a given dataset. We also present a semi-supervised
version of the algorithm, named Deep WSF, that allows the use of (partial)
prior information for each of the known attributes of a dataset, so that
the model can be used on datasets with mixed attribute knowledge. Finally, we
show that our models are able to learn low-dimensional representations that are
better suited not only for clustering but also for classification, outperforming
Semi-Non-negative Matrix Factorization as well as other state-of-the-art
variants.
Comment: Submitted to TPAMI (16-Mar-2015)
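As a rough illustration of the factorization this abstract refers to, the sketch below implements plain Semi-NMF (X ≈ Z H with H ≥ 0 and Z unconstrained) using the standard multiplicative update for H, followed by a greedy layer-by-layer stacking that factorizes each H again. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the random initialization, the iteration counts, and the omission of any joint fine-tuning step are all choices made here for illustration.

```python
import numpy as np

def pos(A):
    """Element-wise positive part of A."""
    return (np.abs(A) + A) / 2.0

def neg(A):
    """Element-wise negative part of A (returned as non-negative values)."""
    return (np.abs(A) - A) / 2.0

def semi_nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Factorize X (features x samples) as Z @ H with H >= 0.

    Z is solved by least squares; H uses the multiplicative Semi-NMF update.
    Initialization and stopping rule are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((k, X.shape[1])) + 0.1  # strictly positive start
    for _ in range(n_iter):
        Z = X @ H.T @ np.linalg.pinv(H @ H.T)
        ZtX, ZtZ = Z.T @ X, Z.T @ Z
        num = pos(ZtX) + neg(ZtZ) @ H
        den = neg(ZtX) + pos(ZtZ) @ H + eps  # eps avoids division by zero
        H = H * np.sqrt(num / den)
    return Z, H

def deep_semi_nmf_pretrain(X, layer_sizes, **kw):
    """Greedy stacking: X ~ Z1 H1, H1 ~ Z2 H2, ... (no joint fine-tuning)."""
    Zs, H = [], X
    for k in layer_sizes:
        Z, H = semi_nmf(H, k, **kw)
        Zs.append(Z)
    return Zs, H
```

The columns of the final H can be read as soft cluster memberships, for example by taking the arg-max over rows for each sample.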
Multi-Task Learning for Email Search Ranking with Auxiliary Query Clustering
User information needs vary significantly across different tasks, and
therefore their queries will also differ considerably in their expressiveness
and semantics. Many studies have been proposed to model such query diversity by
obtaining query types and building query-dependent ranking models. These
studies typically require either a labeled query dataset or clicks from
multiple users aggregated over the same document. These techniques, however,
are not applicable when manual query labeling is not viable, and aggregated
clicks are unavailable due to the private nature of the document collection,
e.g., in email search scenarios. In this paper, we study how to obtain query
types in an unsupervised fashion and how to incorporate this information into
query-dependent ranking models. We first develop a hierarchical clustering
algorithm based on truncated SVD and varimax rotation to obtain coarse-to-fine
query types. Then, we study three query-dependent ranking models, including two
neural models that leverage query type information as additional features, and
one novel multi-task neural model that views query type as the label for the
auxiliary query cluster prediction task. This multi-task model is trained to
simultaneously rank documents and predict query types. Our experiments on tens
of millions of real-world email search queries demonstrate that the proposed
multi-task model can significantly outperform the baseline neural ranking
models, which either do not incorporate query type information or simply
feed query type as an additional feature.
Comment: CIKM 201
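The clustering step named in this abstract (truncated SVD followed by varimax rotation) can be sketched at a single level as follows. This is only an illustration under assumptions: the abstract does not specify the query feature matrix, the number of components, or how the hierarchy is built, so `query_types`, its arguments, and the arg-max assignment rule are hypothetical. A coarse-to-fine hierarchy could plausibly be obtained by applying the same routine again within each resulting cluster.

```python
import numpy as np

def varimax(Phi, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a loading matrix Phi (rows x factors)."""
    p, k = Phi.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        L = Phi @ R
        # Gradient-like matrix of the varimax criterion at the current rotation
        G = Phi.T @ (L**3 - (gamma / p) * L @ np.diag(np.sum(L**2, axis=0)))
        U, s, Vt = np.linalg.svd(G)
        R = U @ Vt  # nearest orthogonal matrix
        d_new = s.sum()
        if d != 0 and d_new / d < 1 + tol:
            break
        d = d_new
    return Phi @ R, R

def query_types(X, k):
    """Assign each row of X (queries x features) to one of k types.

    Hypothetical helper: truncated SVD, varimax rotation, then arg-max
    of the absolute rotated loading per query.
    """
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    Z = U[:, :k] * s[:k]          # rank-k query embedding
    L, _ = varimax(Z)             # rotate toward simple structure
    return np.argmax(np.abs(L), axis=1)
```

The rotation matters because raw singular vectors tend to mix several groups in one component, while varimax pushes each query to load mainly on a single component.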
Classifying the unknown: discovering novel gravitational-wave detector glitches using similarity learning
The observation of gravitational waves from compact binary coalescences by
LIGO and Virgo has begun a new era in astronomy. A critical challenge in making
detections is determining whether loud transient features in the data are
caused by gravitational waves or by instrumental or environmental sources. The
citizen-science project Gravity Spy has been demonstrated as an
efficient infrastructure for classifying known types of noise transients
(glitches) through a combination of data analysis performed by both citizen
volunteers and machine learning. We present the next iteration of this project,
using similarity indices to empower citizen scientists to create large data
sets of unknown transients, which can then be used to facilitate supervised
machine-learning characterization. This new evolution aims to alleviate a
persistent challenge that plagues both citizen-science and instrumental
detector work: the difficulty of building large samples of relatively rare events.
Using two families of transient noise that appeared unexpectedly during LIGO's
second observing run (O2), we demonstrate the impact that the similarity
indices could have had on finding these new glitch types in the Gravity Spy
program.
BlockTag: Design and applications of a tagging system for blockchain analysis
Annotating blockchains with auxiliary data is useful for many applications.
For example, e-crime investigations of illegal Tor hidden services, such as
Silk Road, often involve linking Bitcoin addresses, from which money is sent or
received, to user accounts and related online activities. We present BlockTag,
an open-source tagging system for blockchains that facilitates such tasks. We
describe BlockTag's design and present three analyses that illustrate its
capabilities in the context of privacy research and law enforcement.
The Augmented Lagrange Multiplier Method for Exact Recovery of Corrupted Low-Rank Matrices
This paper proposes scalable and fast algorithms for solving the Robust PCA
problem, namely recovering a low-rank matrix with an unknown fraction of its
entries being arbitrarily corrupted. This problem arises in many applications,
such as image processing, web data ranking, and bioinformatic data analysis. It
was recently shown that under surprisingly broad conditions, the Robust PCA
problem can be exactly solved via convex optimization that minimizes a
combination of the nuclear norm and the ℓ1-norm. In this paper, we apply
the method of augmented Lagrange multipliers (ALM) to solve this convex
program. As the objective function is non-smooth, we show how to extend the
classical analysis of ALM to such new objective functions and prove the
optimality of the proposed algorithms and characterize their convergence rate.
Empirically, the proposed new algorithms can be more than five times faster
than the previous state-of-the-art algorithms for Robust PCA, such as the
accelerated proximal gradient (APG) algorithm. Moreover, the new algorithms
achieve higher precision while demanding less storage/memory. We also show
that the ALM technique can be used to solve the (related but somewhat simpler)
matrix completion problem and obtain rather promising results too. We further
prove the necessary and sufficient condition for the inexact ALM to converge
globally. Matlab code for all algorithms discussed is available at
http://perception.csl.illinois.edu/matrix-rank/home.html
Comment: Please cite "Zhouchen Lin, Risheng Liu, and Zhixun Su, Linearized
Alternating Direction Method with Adaptive Penalty for Low Rank
Representation, NIPS 2011." (available at arXiv:1109.0367) instead for a more
general method called Linearized Alternating Direction Method. This manuscript
first appeared as University of Illinois at Urbana-Champaign technical report
#UILU-ENG-09-2215 in October 2009.
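To make the convex program in this abstract concrete, the sketch below solves min ‖A‖_* + λ‖E‖_1 subject to D = A + E with an inexact augmented Lagrange multiplier loop: singular value thresholding for the low-rank part A, soft thresholding for the sparse part E, then a multiplier update. It is a minimal sketch and not the authors' Matlab code; the default λ = 1/sqrt(max(m, n)), the initial penalty, the growth factor `rho`, and the stopping tolerance are commonly used values assumed here, and D is assumed to be a non-zero floating-point matrix.

```python
import numpy as np

def shrink(M, tau):
    """Soft thresholding: proximal operator of tau * l1-norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def svd_threshold(M, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def rpca_inexact_alm(D, lam=None, tol=1e-7, max_iter=500, rho=1.5):
    """Split D into low-rank A and sparse E (parameter defaults are assumptions)."""
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))
    norm_D = np.linalg.norm(D, 'fro')
    spec = np.linalg.norm(D, 2)                    # largest singular value
    Y = D / max(spec, np.abs(D).max() / lam)       # initial multiplier
    mu, mu_max = 1.25 / spec, 1.25e7 / spec        # penalty and its cap
    A = np.zeros_like(D)
    E = np.zeros_like(D)
    for _ in range(max_iter):
        A = svd_threshold(D - E + Y / mu, 1.0 / mu)   # low-rank update
        E = shrink(D - A + Y / mu, lam / mu)          # sparse update
        R = D - A - E                                 # constraint residual
        Y = Y + mu * R                                # multiplier update
        mu = min(rho * mu, mu_max)
        if np.linalg.norm(R, 'fro') <= tol * norm_D:
            break
    return A, E
```

Each iteration costs one SVD, which is why implementations aimed at large matrices usually replace the full SVD used here with a partial one.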