Narrowing the Gap: Random Forests In Theory and In Practice
Despite widespread interest and practical use, the theoretical properties of
random forests are still not well understood. In this paper we contribute to
this understanding in two ways. We present a new theoretically tractable
variant of random regression forests and prove that our algorithm is
consistent. We also provide an empirical evaluation, comparing our algorithm
and other theoretically tractable random forest models to the random forest
algorithm used in practice. Our experiments provide insight into the relative
importance of different simplifications that theoreticians have made to obtain
tractable models for analysis.

Comment: Under review by the International Conference on Machine Learning
(ICML) 2014
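The abstract does not spell out the tractable variant, so as a point of
reference, here is a minimal sketch of the style of simplification such
analyses typically use: split features and thresholds are chosen at random,
so the tree structure depends on the labels only through the leaf averages.
The function names, depth cap, and sampling rules below are illustrative
assumptions, not the paper's algorithm.

    import numpy as np

    def build_tree(X, y, depth, rng):
        # Grow a tree with purely random splits: a random feature and a
        # random threshold inside its observed range. The labels y are
        # never consulted when choosing splits.
        if depth == 0 or len(y) < 2:
            return ("leaf", float(y.mean()) if len(y) else 0.0)
        f = int(rng.integers(X.shape[1]))
        lo, hi = X[:, f].min(), X[:, f].max()
        if lo == hi:
            return ("leaf", float(y.mean()))
        t = rng.uniform(lo, hi)
        mask = X[:, f] <= t
        return ("split", f, t,
                build_tree(X[mask], y[mask], depth - 1, rng),
                build_tree(X[~mask], y[~mask], depth - 1, rng))

    def predict_tree(node, x):
        while node[0] == "split":
            _, f, t, left, right = node
            node = left if x[f] <= t else right
        return node[1]

    def forest_predict(X, y, X_test, n_trees=100, depth=10, seed=0):
        # The forest prediction is the plain average over trees.
        rng = np.random.default_rng(seed)
        trees = [build_tree(X, y, depth, rng) for _ in range(n_trees)]
        return np.array([np.mean([predict_tree(t, x) for t in trees])
                         for x in X_test])

Decoupling split selection from the labels is exactly the kind of
simplification whose practical cost the paper's experiments measure.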
Linear and Parallel Learning of Markov Random Fields
We introduce a new embarrassingly parallel parameter-learning algorithm for
Markov random fields with untied parameters that is efficient for a large
class of practical models. Our algorithm parallelizes naturally over cliques
and, for graphs of bounded degree, its complexity is linear in the number of
cliques. Unlike its competitors, our algorithm is fully parallel and for
log-linear models it is also data efficient, requiring only the local
sufficient statistics of the data to estimate parameters
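To picture the "embarrassingly parallel over cliques" structure, here is a
minimal sketch in which each worker sees one clique and fits its parameters
from local sufficient statistics alone. The local estimator (smoothed
empirical log-frequencies of clique configurations), the binary chain model,
and the names fit_clique and parallel_fit are hypothetical stand-ins, not
the paper's estimator.

    import numpy as np
    from concurrent.futures import ProcessPoolExecutor
    from itertools import product

    def fit_clique(args):
        # Local estimator for one clique: smoothed empirical
        # log-frequencies of its joint configurations. Only the
        # clique's own sufficient statistics are touched.
        clique, data = args
        counts = {cfg: 1e-6 for cfg in product([0, 1], repeat=len(clique))}
        for row in data:
            counts[tuple(row[list(clique)])] += 1
        total = sum(counts.values())
        return clique, {cfg: np.log(c / total) for cfg, c in counts.items()}

    def parallel_fit(cliques, data, workers=4):
        # Cliques are processed independently, so the total work is
        # linear in the number of cliques for graphs of bounded degree.
        with ProcessPoolExecutor(max_workers=workers) as pool:
            return dict(pool.map(fit_clique, [(c, data) for c in cliques]))

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        data = rng.integers(0, 2, size=(1000, 4))  # binary samples
        cliques = [(0, 1), (1, 2), (2, 3)]         # a chain-shaped MRF
        params = parallel_fit(cliques, data)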
Distributed Parameter Estimation in Probabilistic Graphical Models
This paper presents foundational theoretical results on distributed parameter
estimation for undirected probabilistic graphical models. It introduces a
general condition on composite likelihood decompositions of these models that
guarantees the global consistency of distributed estimators, provided the
local estimators are consistent.
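For reference, the object in question can be written down directly: a
composite likelihood replaces the full likelihood with a sum of
lower-dimensional conditional log-likelihoods over variable blocks
(A_i, B_i),

    \ell_{CL}(\theta) = \sum_{i=1}^{m} \log p(x_{A_i} \mid x_{B_i}; \theta),

so that each component, and hence each local estimator, involves only the
variables in A_i ∪ B_i. The paper's contribution is a condition on the
choice of decomposition under which locally consistent component estimators
yield a globally consistent one; the abstract does not state the condition
itself, so only the standard form is shown here.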
Modelling, Visualising and Summarising Documents with a Single Convolutional Neural Network
Capturing the compositional process which maps the meaning of words to that
of documents is a central challenge for researchers in Natural Language
Processing and Information Retrieval. We introduce a model that is able to
represent the meaning of documents by embedding them in a low-dimensional
vector space, while preserving distinctions of word and sentence order crucial
for capturing nuanced semantics. Our model is based on an extended Dynamic
Convolutional Neural Network, which learns convolution filters at both the
sentence and document level, hierarchically learning to capture and compose
low-level lexical features into high-level semantic concepts. We demonstrate the
effectiveness of this model on a range of document modelling tasks, achieving
strong results with no feature engineering and with a more compact model.
Inspired by recent advances in visualising deep convolutional networks for
computer vision, we present a novel visualisation technique for our document
networks which not only provides insight into their learning process, but can
also be interpreted to produce a compelling automatic summarisation system for
texts.
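To make the two-level hierarchy concrete, here is a toy numpy sketch of the
pattern the abstract describes: a narrow one-dimensional convolution followed
by k-max pooling turns each sentence into a fixed-size vector, and the same
operation applied over the sequence of sentence vectors yields a document
vector. The filter widths, pooling sizes, single filter per level, and tanh
nonlinearity are illustrative assumptions; the DCNN this model extends uses
wide convolutions and multiple feature maps, among other details.

    import numpy as np

    def k_max_pool(a, k):
        # Keep the k largest values in each row, preserving their
        # original order along the sequence axis.
        idx = np.sort(np.argsort(a, axis=1)[:, -k:], axis=1)
        return np.take_along_axis(a, idx, axis=1)

    def conv1d(x, filt):
        # Narrow 1-D convolution along the sequence axis; x has shape
        # (features, length) and filt has shape (width,).
        w = len(filt)
        cols = [x[:, i:i + w] @ filt for i in range(x.shape[1] - w + 1)]
        return np.tanh(np.stack(cols, axis=1))

    def encode(x, filt, k):
        # One convolution + k-max pooling layer, flattened to a vector.
        return k_max_pool(conv1d(x, filt), k).ravel()

    rng = np.random.default_rng(0)
    dim, k_sent, k_doc = 8, 3, 2
    # Toy document: three sentences of word embeddings, varying lengths.
    doc = [rng.normal(size=(dim, n)) for n in (7, 6, 9)]
    sent_filt = rng.normal(size=4)  # hypothetical sentence-level filter
    doc_filt = rng.normal(size=2)   # hypothetical document-level filter

    # Sentence level: each sentence becomes a vector of size dim * k_sent.
    sent_vecs = np.stack([encode(s, sent_filt, k_sent) for s in doc], axis=1)
    # Document level: convolve over the sequence of sentence vectors.
    doc_vec = encode(sent_vecs, doc_filt, k_doc)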