A Dynamic Data Structure to Efficiently Find the Points below a Line and Estimate Their Number
A basic question in computational geometry is how to find the relationship between a set of points and a line in the real plane. In this paper, we present multidimensional data structures for N points that allow answering the following queries for any given input line: (1) estimate in O(log N) time the number of points below the line; (2) return in O(log N + k) time the k ≤ N points that are below the line; and (3) return in O(log N) time the point that is closest to the line. We illustrate the utility of this computational question with GIS applications in air defense and traffic control.
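The three queries can be pinned down with a brute-force reference that scans every point. This is only a linear-time baseline for checking answers, not the paper's logarithmic-time structure. The function names and the restriction to non-vertical lines y = a*x + b are assumptions made for illustration.

```python
import math

def signed_offset(point, a, b):
    """Vertical offset of (x, y) from the line y = a*x + b; negative means below."""
    x, y = point
    return y - (a * x + b)

def points_below(points, a, b):
    """Queries (1) and (2): the points strictly below the line, found by a full scan."""
    return [p for p in points if signed_offset(p, a, b) < 0]

def closest_point(points, a, b):
    """Query (3): the point with the smallest perpendicular distance to the line."""
    norm = math.hypot(1.0, a)  # converts vertical offset to perpendicular distance
    return min(points, key=lambda p: abs(signed_offset(p, a, b)) / norm)

# Hypothetical sample data, queried against the line y = x + 0.5.
points = [(0, 0), (1, 3), (2, 1), (3, 5)]
below = points_below(points, 1.0, 0.5)
print(len(below), below, closest_point(points, 1.0, 0.5))
```

Each call costs O(N), so the sketch is useful mainly as a correctness oracle when testing a faster structure.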
Models and algorithms for the next generation of glass transition studies
Successful computer studies of glass-forming materials need to overcome both
the natural tendency to structural ordering and the dramatic increase of
relaxation times at low temperatures. We present a comprehensive analysis of
eleven glass-forming models to demonstrate that both challenges can be
efficiently tackled using carefully designed models of size polydisperse
supercooled liquids together with an efficient Monte Carlo algorithm where
translational particle displacements are complemented by swaps of particle
pairs. We study a broad range of size polydispersities, using both discrete and
continuous mixtures, and we systematically investigate the role of particle
softness, attractivity and non-additivity of the interactions. Each system is
characterized by its robustness against structural ordering and by the
efficiency of the swap Monte Carlo algorithm. We show that the combined
optimisation of the potential's softness, polydispersity and non-additivity
leads to novel computer models with excellent glass-forming ability. For such
models, we achieve over ten orders of magnitude gain in the equilibration
timescale using the swap Monte Carlo algorithm, thus paving the way to
computational studies of static and thermodynamic properties under experimental
conditions. In addition, we provide microscopic insights into the performance
of the swap algorithm, which should help optimize models and algorithms even
further.
Comment: 22 pages, 15 fig
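A minimal sketch of the two move types the abstract describes, a translational displacement and a swap of a particle pair, each accepted with the Metropolis rule. The inverse-power pair potential, the additive diameter rule, the open boundaries and all function names are assumptions chosen for brevity, not the models studied in the paper.

```python
import math
import random

def pair_energy(r, sigma_i, sigma_j):
    """Hypothetical soft repulsion (sigma_ij / r)**12 with additive diameters."""
    sigma_ij = 0.5 * (sigma_i + sigma_j)
    return (sigma_ij / r) ** 12

def particle_energy(i, pos, sigma):
    """Energy of particle i with every other particle (no cutoff, open boundaries)."""
    return sum(pair_energy(math.dist(pos[i], pos[j]), sigma[i], sigma[j])
               for j in range(len(pos)) if j != i)

def metropolis_accept(delta_e, temperature, rng):
    """Standard Metropolis rule: always accept downhill, uphill with Boltzmann weight."""
    return delta_e <= 0.0 or rng.random() < math.exp(-delta_e / temperature)

def displacement_move(pos, sigma, temperature, rng, max_step=0.1):
    """Try to shift one randomly chosen particle by a small random vector."""
    i = rng.randrange(len(pos))
    old = pos[i]
    e_old = particle_energy(i, pos, sigma)
    pos[i] = tuple(c + rng.uniform(-max_step, max_step) for c in old)
    if metropolis_accept(particle_energy(i, pos, sigma) - e_old, temperature, rng):
        return True
    pos[i] = old  # rejected: restore the previous position
    return False

def swap_move(pos, sigma, temperature, rng):
    """Try to exchange the diameters of two randomly chosen particles."""
    i, j = rng.sample(range(len(pos)), 2)
    # The i-j pair term is counted twice, but it is unchanged by the swap,
    # so it cancels in the energy difference.
    e_old = particle_energy(i, pos, sigma) + particle_energy(j, pos, sigma)
    sigma[i], sigma[j] = sigma[j], sigma[i]
    e_new = particle_energy(i, pos, sigma) + particle_energy(j, pos, sigma)
    if metropolis_accept(e_new - e_old, temperature, rng):
        return True
    sigma[i], sigma[j] = sigma[j], sigma[i]  # rejected: swap back
    return False
```

A simulation would interleave the two moves in some ratio; the swap leaves positions untouched and only permutes diameters, so the size distribution is conserved exactly.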
The group fused Lasso for multiple change-point detection
We present the group fused Lasso for detection of multiple change-points
shared by a set of co-occurring one-dimensional signals. Change-points are
detected by approximating the original signals with a constraint on the
multidimensional total variation, leading to piecewise-constant approximations.
Fast algorithms are proposed to solve the resulting optimization problems,
either exactly or approximately. Conditions are given for consistency of both
algorithms as the number of signals increases, and empirical evidence is
provided to support the results on simulated and array comparative genomic
hybridization data.
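The quantity being constrained can be made concrete with a small sketch. It treats the signals as rows of positions by signals, computes the multidimensional total variation as the sum of Euclidean jump sizes between consecutive positions, and reads change-points off a piecewise-constant approximation. The penalized (Lagrangian) form of the objective and all names here are assumptions for illustration; no solver is included.

```python
import math

def multidim_total_variation(U):
    """Sum over positions of the Euclidean norm of the jump between consecutive rows."""
    return sum(math.dist(U[i], U[i + 1]) for i in range(len(U) - 1))

def change_points(U, tol=1e-12):
    """Positions where the approximation U jumps, shared by all signals."""
    return [i + 1 for i in range(len(U) - 1) if math.dist(U[i], U[i + 1]) > tol]

def penalized_objective(Y, U, lam):
    """Assumed penalized form: squared-error fit plus lam times the total variation."""
    fit = 0.5 * sum((y - u) ** 2 for row_y, row_u in zip(Y, U)
                    for y, u in zip(row_y, row_u))
    return fit + lam * multidim_total_variation(U)

# Hypothetical example: two signals observed at four positions, one shared jump.
U = [[0.0, 0.0], [0.0, 0.0], [3.0, 4.0], [3.0, 4.0]]
print(multidim_total_variation(U), change_points(U))
```

Because the norm is taken across signals at each position, a jump is either present for the whole group or absent, which is what makes the detected change-points shared.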
Coarse Molecular Dynamics of a Peptide Fragment: Free Energy, Kinetics, and Long-Time Dynamics Computations
We present a "coarse molecular dynamics" approach and apply it to studying
the kinetics and thermodynamics of a peptide fragment dissolved in water. Short
bursts of appropriately initialized simulations are used to infer the
deterministic and stochastic components of the peptide motion parametrized by
an appropriate set of coarse variables. Techniques from traditional numerical
analysis (Newton-Raphson, coarse projective integration) are thus enabled;
these techniques help analyze important features of the free-energy landscape
(coarse transition states, eigenvalues and eigenvectors, transition rates,
etc.). Reverse integration of (irreversible) expected coarse variables backward
in time can assist escape from free energy minima and trace low-dimensional
free energy surfaces. To illustrate the "coarse molecular dynamics" approach,
we combine multiple short (0.5-ps) replica simulations to map the free energy
surface of the "alanine dipeptide" in water, and to determine the ~1/(1000 ps)
rate of interconversion between the two stable configurational basins
corresponding to the alpha-helical and extended minima.
Comment: The article has been submitted to "The Journal of Chemical Physics."
Cover Tree Bayesian Reinforcement Learning
This paper proposes an online tree-based Bayesian approach for reinforcement
learning. For inference, we employ a generalised context tree model. This
defines a distribution on multivariate Gaussian piecewise-linear models, which
can be updated in closed form. The tree structure itself is constructed using
the cover tree method, which remains efficient in high dimensional spaces. We
combine the model with Thompson sampling and approximate dynamic programming to
obtain effective exploration policies in unknown environments. The flexibility
and computational simplicity of the model render it suitable for many
reinforcement learning problems in continuous state spaces. We demonstrate this
in an experimental comparison with least squares policy iteration.
Are you going to the party: depends, who else is coming? [Learning hidden group dynamics via conditional latent tree models]
Scalable probabilistic modeling and prediction in high dimensional
multivariate time-series is a challenging problem, particularly for systems
with hidden sources of dependence and/or homogeneity. Examples of such problems
include dynamic social networks with co-evolving nodes and edges and dynamic
student learning in online courses. Here, we address these problems through the
discovery of hierarchical latent groups. We introduce a family of Conditional
Latent Tree Models (CLTM), in which tree-structured latent variables
incorporate the unknown groups. The latent tree itself is conditioned on
observed covariates such as seasonality, historical activity, and node
attributes. We propose a statistically efficient framework for learning both
the hierarchical tree structure and the parameters of the CLTM. We demonstrate
competitive performance in multiple real world datasets from different domains.
These include a dataset on students' attempts at answering questions in a
psychology MOOC, Twitter users participating in an emergency management
discussion and interacting with one another, and windsurfers interacting on a
beach in Southern California. In addition, our modeling framework provides
valuable and interpretable information about the hidden group structures and
their effect on the evolution of the time series.
Analysis of Dynamic Brain Imaging Data
Modern imaging techniques for probing brain function, including functional
Magnetic Resonance Imaging, intrinsic and extrinsic contrast optical imaging,
and magnetoencephalography, generate large data sets with complex content. In
this paper we develop appropriate techniques of analysis and visualization of
such imaging data, in order to separate the signal from the noise, as well as
to characterize the signal. The techniques developed fall into the general
category of multivariate time series analysis, and in particular we extensively
use the multitaper framework of spectral analysis. We develop specific
protocols for the analysis of fMRI, optical imaging and MEG data, and
illustrate the techniques by applications to real data sets generated by these
imaging modalities. In general, the analysis protocols involve two distinct
stages: 'noise' characterization and suppression, and 'signal' characterization
and visualization. An important general conclusion of our study is the utility
of a frequency-based representation, with short, moving analysis windows to
account for non-stationarity in the data. Of particular note are (a) the
development of a decomposition technique ('space-frequency singular value
decomposition') that is shown to be a useful means of characterizing the image
data, and (b) the development of an algorithm, based on multitaper methods, for
the removal of approximately periodic physiological artifacts arising from
cardiac and respiratory sources.
Comment: 40 pages; 26 figures with subparts including 3 figures as .gif files.
Originally submitted to the neuro-sys archive which was never publicly
announced (was 9804003
Fast Hierarchical Clustering and Other Applications of Dynamic Closest Pairs
We develop data structures for dynamic closest pair problems with arbitrary
distance functions, that do not necessarily come from any geometric structure
on the objects. Based on a technique previously used by the author for
Euclidean closest pairs, we show how to insert and delete objects from an
n-object set, maintaining the closest pair, in O(n log^2 n) time per update and
O(n) space. With quadratic space, we can instead use a quadtree-like structure
to achieve an optimal time bound, O(n) per update. We apply these data
structures to hierarchical clustering, greedy matching, and TSP heuristics, and
discuss other potential applications in machine learning, Groebner bases, and
local improvement algorithms for partition and placement problems. Experiments
show our new methods to be faster in practice than previously used heuristics.
Comment: 20 pages, 9 figures. A preliminary version of this paper appeared at
the 9th ACM-SIAM Symp. on Discrete Algorithms, San Francisco, 1998, pp.
619-628. For source code and experimental results, see
http://www.ics.uci.edu/~eppstein/projects/pairs
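For orientation, the problem interface can be sketched with a naive baseline: a set of objects, an arbitrary user-supplied distance function, and a closest pair maintained across insertions and deletions. This is not one of the paper's data structures. It costs O(n) distance evaluations per insertion and up to O(n^2) per deletion, and the class and method names are hypothetical.

```python
from itertools import combinations

class NaiveClosestPair:
    """Brute-force dynamic closest pair for an arbitrary distance function."""

    def __init__(self, distance):
        self.distance = distance  # any callable taking two objects
        self.items = []
        self.best = None          # (distance, a, b), or None with fewer than two items

    def insert(self, obj):
        # Only pairs involving the new object can beat the current best.
        for other in self.items:
            d = self.distance(obj, other)
            if self.best is None or d < self.best[0]:
                self.best = (d, other, obj)
        self.items.append(obj)

    def delete(self, obj):
        self.items.remove(obj)
        # A full rescan is needed only if the removed object was in the best pair.
        if self.best is not None and obj in self.best[1:]:
            self.best = None
            for a, b in combinations(self.items, 2):
                d = self.distance(a, b)
                if self.best is None or d < self.best[0]:
                    self.best = (d, a, b)

# Usage with a non-geometric-looking distance on plain numbers.
cp = NaiveClosestPair(lambda a, b: abs(a - b))
for value in (1, 10, 4, 5):
    cp.insert(value)
print(cp.best)
```

Repeatedly taking the closest pair, merging it, and inserting the merged object is how a structure with this interface drives agglomerative hierarchical clustering.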
Approximate Profile Maximum Likelihood
We propose an efficient algorithm for approximate computation of the profile
maximum likelihood (PML), a variant of maximum likelihood maximizing the
probability of observing a sufficient statistic rather than the empirical
sample. The PML has appealing theoretical properties, but is difficult to
compute exactly. Inspired by observations gleaned from exactly solvable cases,
we look for an approximate PML solution, which, intuitively, clumps comparably
frequent symbols into one symbol. This amounts to lower-bounding a certain
matrix permanent by summing over a subgroup of the symmetric group rather than
the whole group during the computation. We extensively experiment with the
approximate solution, and find the empirical performance of our approach is
competitive and sometimes significantly better than state-of-the-art
performance for various estimation problems.
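The sufficient statistic in question is usually called the profile of the sample: it records how many distinct symbols appear once, twice, and so on, while discarding the symbol labels. The sketch below computes it under that reading; the function name is an assumption, and the PML optimization itself is not attempted.

```python
from collections import Counter

def profile(sample):
    """Map each multiplicity to the number of distinct symbols having it."""
    multiplicities = Counter(sample)          # symbol -> number of occurrences
    return Counter(multiplicities.values())   # multiplicity -> number of symbols

# 'a' occurs 5 times, 'b' and 'r' twice each, 'c' and 'd' once each.
print(profile("abracadabra"))
```

Relabeling the symbols leaves the profile unchanged, which is why symbols with comparable frequencies can be clumped together without the statistic distinguishing them by name.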