Are you going to the party: depends, who else is coming? [Learning hidden group dynamics via conditional latent tree models]
Scalable probabilistic modeling and prediction in high-dimensional
multivariate time series is a challenging problem, particularly for systems
with hidden sources of dependence and/or homogeneity. Examples of such problems
include dynamic social networks with co-evolving nodes and edges and dynamic
student learning in online courses. Here, we address these problems through the
discovery of hierarchical latent groups. We introduce a family of Conditional
Latent Tree Models (CLTM), in which tree-structured latent variables
incorporate the unknown groups. The latent tree itself is conditioned on
observed covariates such as seasonality, historical activity, and node
attributes. We propose a statistically efficient framework for learning both
the hierarchical tree structure and the parameters of the CLTM. We demonstrate
competitive performance on multiple real-world datasets from different domains.
These include a dataset on students' attempts at answering questions in a
psychology MOOC, Twitter users participating in an emergency management
discussion and interacting with one another, and windsurfers interacting on a
beach in Southern California. In addition, our modeling framework provides
valuable and interpretable information about the hidden group structures and
their effect on the evolution of the time series.
Decoding the urban grid: or why cities are neither trees nor perfect grids
In a previous paper (Figueiredo and Amorim, 2005), we introduced continuity
lines, a compressed description that encapsulates topological and geometrical
properties of urban grids. In this paper, we apply this technique to a large
database of maps that includes cities from 22 countries. We explore how this
representation encodes universal features of urban grids into networks and, at the
same time, retrieves differences that reflect classes of cities. We then propose an
emergent taxonomy for urban grids.
Cayley Trees and Bethe Lattices, a concise analysis for mathematicians and physicists
We critically review the concepts and applications of Cayley Trees and
Bethe Lattices in statistical mechanics, in a tentative effort to remove
widespread misuse of these simple yet important (and different) ideal
graphs. We illustrate, in particular, two rigorous techniques for dealing with
Bethe Lattices, based respectively on self-similarity and on the Kolmogorov
consistency theorem, linking the latter with the Cavity and Belief Propagation
methods, better known to the physics community. Comment: 10 pages, 2 figures
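As a brief illustration of why the two graphs should not be conflated, the sketch below counts the nodes of a finite Cayley tree shell by shell. The function name `cayley_tree_counts` and its parameters are hypothetical, introduced here only for illustration. The construction (a root with z neighbours, every later node with z - 1 children) is the standard textbook one and is not taken from the paper itself.

```python
def cayley_tree_counts(z: int, n_shells: int) -> tuple[int, int]:
    """Return (total_nodes, boundary_nodes) for a finite Cayley tree.

    z        -- coordination number (assumed >= 2)
    n_shells -- number of shells grown around the root (assumed >= 1)
    Both names are illustrative assumptions, not notation from the paper.
    """
    total = 1          # the root node
    shell = z          # the first shell holds the root's z neighbours
    for _ in range(n_shells - 1):
        total += shell
        shell *= z - 1  # each node on a shell has z - 1 children
    total += shell      # add the outermost shell
    return total, shell


if __name__ == "__main__":
    for n in (2, 5, 10, 20):
        total, boundary = cayley_tree_counts(3, n)
        print(n, total, boundary, boundary / total)
```

For z = 3 the boundary fraction approaches (z - 2)/(z - 1) = 1/2 as shells are added, so the surface of a Cayley tree never becomes negligible. The Bethe lattice, by contrast, is usually understood as the boundary-free interior of such a tree.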
Impact of Biases in Big Data
The underlying paradigm of big data-driven machine learning reflects the
desire to derive better conclusions simply by analyzing more data, without
the necessity of looking at theory and models. Is simply having more data
always helpful? In 1936, The Literary Digest collected 2.3M filled-in
questionnaires to predict the outcome of that year's US presidential election.
The outcome of this big data prediction proved to be entirely wrong, whereas
George Gallup only needed 3K handpicked people to make an accurate prediction.
Generally, biases occur in machine learning whenever the distributions of the
training set and the test set differ. In this work, we provide a review of
different sorts of biases in (big) data sets in machine learning. We provide
definitions and discussions of the most commonly appearing biases in machine
learning: class imbalance and covariate shift. We also show how these biases
can be quantified and corrected. This work is an introductory text for both
researchers and practitioners to become more aware of this topic and thus to
derive more reliable models for their learning problems.
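One common way to correct a covariate shift of the kind described above is importance weighting, where each training example is weighted by the ratio of target to training probability of its covariate. The sketch below is a minimal illustration for a single discrete covariate. It is not the procedure from the paper; the function names, the group labels, and all numbers are made-up assumptions chosen so the arithmetic is easy to follow.

```python
from collections import Counter


def importance_weights(train_groups, target_share):
    """Return w(g) = p_target(g) / p_train(g) for a discrete covariate g.

    train_groups -- covariate value of each training example
    target_share -- assumed known share of each value in the target population
    """
    counts = Counter(train_groups)
    n = len(train_groups)
    return {g: target_share[g] / (counts[g] / n) for g in counts}


def weighted_mean(values, groups, weights):
    """Mean of `values`, with each example weighted by its group's weight."""
    total_weight = sum(weights[g] for g in groups)
    weighted_sum = sum(weights[g] * v for v, g in zip(values, groups))
    return weighted_sum / total_weight


# Toy sample (invented): group "a" is heavily over-represented.
groups = ["a"] * 80 + ["b"] * 20
# Group "a": 60 of 80 answer 1; group "b": 4 of 20 answer 1.
answers = [1] * 60 + [0] * 20 + [1] * 4 + [0] * 16
# Assumed true population shares (hypothetical).
target_share = {"a": 0.3, "b": 0.7}

naive = sum(answers) / len(answers)
weights = importance_weights(groups, target_share)
corrected = weighted_mean(answers, groups, weights)

if __name__ == "__main__":
    print("naive estimate:    ", naive)
    print("reweighted estimate:", corrected)
```

In this toy sample the naive mean is 0.64, while the reweighted estimate is about 0.365, which matches the population value 0.3 * 0.75 + 0.7 * 0.20. The correction only works when the target shares are known and every group appears in the training sample.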
Surface networks
© Copyright CASA, UCL. The desire to understand and exploit the structure of continuous surfaces is common to researchers in a range of disciplines. A few examples of the varied surfaces that form an integral part of modern subjects include terrain, population density, surface atmospheric pressure, physico-chemical surfaces, computer graphics, and metrological surfaces. The focus of the work here is a group of data structures called Surface Networks, which abstract 2-dimensional surfaces by storing only the most important (also called fundamental, critical or surface-specific) points and lines in the surfaces. Surface networks are intelligent and "natural" data structures because, unlike the DEM or TIN data structures, they store a surface as a framework of "surface" elements. This report presents an overview of previous work and of the ideas being developed by the authors of this report. The research on surface networks has fou…