Efficient Management of Short-Lived Data
Motivated by the increasing prominence of loosely-coupled systems, such as
mobile and sensor networks, which are characterised by intermittent
connectivity and volatile data, we study the tagging of data with so-called
expiration times. More specifically, when data are inserted into a database,
they may be tagged with time values indicating when they expire, i.e., when
they are regarded as stale or invalid and thus are no longer considered part of
the database. In a number of applications, expiration times are known and can
be assigned at insertion time. We present data structures and algorithms for
online management of data tagged with expiration times. The algorithms are
based on fully functional, persistent treaps, which are a combination of binary
search trees with respect to a primary attribute and heaps with respect to a
secondary attribute. The primary attribute implements primary keys, and the
secondary attribute stores expiration times in a minimum heap, thus keeping a
priority queue of tuples to expire. A detailed and comprehensive experimental
study demonstrates the well-behavedness and scalability of the approach as well
as its efficiency with respect to a number of competitors.
Comment: switched to TimeCenter LaTeX style
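The treap idea in this abstract can be illustrated with a minimal sketch. It assumes a tuple counts as expired once its expiration time is less than or equal to the current time, and it uses path copying to keep old versions intact. The names `Node`, `merge`, `split`, `insert`, `expire`, and `keys` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Node:
    key: Any                      # primary attribute: binary-search-tree order
    expires: float                # secondary attribute: min-heap order
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def merge(a, b):
    """Merge two treaps, assuming every key in a is smaller than every key in b."""
    if a is None:
        return b
    if b is None:
        return a
    if a.expires <= b.expires:
        return Node(a.key, a.expires, a.left, merge(a.right, b))
    return Node(b.key, b.expires, merge(a, b.left), b.right)


def split(t, key):
    """Split t into (keys < key, keys > key); an existing entry for key is dropped."""
    if t is None:
        return None, None
    if key < t.key:
        lo, hi = split(t.left, key)
        return lo, Node(t.key, t.expires, hi, t.right)
    if t.key < key:
        lo, hi = split(t.right, key)
        return Node(t.key, t.expires, t.left, lo), hi
    return t.left, t.right


def insert(t, key, expires):
    """Return a new treap containing (key, expires); the input treap is unchanged."""
    lo, hi = split(t, key)
    return merge(merge(lo, Node(key, expires)), hi)


def expire(t, now):
    """Drop every tuple with expires <= now by repeatedly removing the root."""
    while t is not None and t.expires <= now:
        t = merge(t.left, t.right)
    return t


def keys(t):
    """In-order traversal: keys in sorted order."""
    return [] if t is None else keys(t.left) + [t.key] + keys(t.right)


t1 = insert(insert(insert(None, "b", 5), "a", 9), "c", 2)
t2 = expire(t1, now=5)   # "c" (expires 2) and "b" (expires 5) are dropped
print(keys(t1), keys(t2))
```

Because the root always holds the smallest expiration time, `expire` only ever inspects the root. Unlike a randomized treap, the shape here is determined by the expiration times, so the sketch gives no balance guarantee.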
Mapping Topographic Structure in White Matter Pathways with Level Set Trees
Fiber tractography on diffusion imaging data offers rich potential for
describing white matter pathways in the human brain, but characterizing the
spatial organization in these large and complex data sets remains a challenge.
We show that level set trees, which provide a concise representation of the
hierarchical mode structure of probability density functions, offer a
statistically-principled framework for visualizing and analyzing topography in
fiber streamlines. Using diffusion spectrum imaging data collected on
neurologically healthy controls (N=30), we mapped white matter pathways from
the cortex into the striatum using a deterministic tractography algorithm that
estimates fiber bundles as dimensionless streamlines. Level set trees were used
for interactive exploration of patterns in the endpoint distributions of the
mapped fiber tracks and an efficient segmentation of the tracks that has
empirical accuracy comparable to standard nonparametric clustering methods. We
show that level set trees can also be generalized to model pseudo-density
functions in order to analyze a broader array of data types, including entire
fiber streamlines. Finally, resampling methods show the reliability of the
level set tree as a descriptive measure of topographic structure, illustrating
its potential as a statistical descriptor in brain imaging analysis. These
results highlight the broad applicability of level set trees for visualizing
and analyzing high-dimensional data like fiber tractography output.
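As a rough illustration of the mode hierarchy that a level set tree summarizes, the sketch below finds the connected components of the upper level sets of a one-dimensional density sampled on a grid. The function name `upper_level_components` and the toy density values are assumptions made for illustration; the paper's estimator and data are not reproduced here.

```python
def upper_level_components(density, level):
    """Return (start, end) index ranges of maximal runs where density >= level."""
    components, start = [], None
    for i, value in enumerate(density):
        if value >= level and start is None:
            start = i                       # a new component begins
        elif value < level and start is not None:
            components.append((start, i - 1))
            start = None                    # the current component ends
    if start is not None:
        components.append((start, len(density) - 1))
    return components


# Toy density with two modes (hypothetical values).
density = [0.1, 0.4, 0.2, 0.5, 0.1]
low = upper_level_components(density, 0.05)   # one component: the tree's root
mid = upper_level_components(density, 0.3)    # the root splits into two branches
high = upper_level_components(density, 0.45)  # only the taller mode survives
print(low, mid, high)
```

Tracking how these components appear, split, and vanish as the level rises gives the tree: each component is a node, and a split marks a branch point.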
Distill-and-Compare: Auditing Black-Box Models Using Transparent Model Distillation
Black-box risk scoring models permeate our lives, yet are typically
proprietary or opaque. We propose Distill-and-Compare, a model distillation and
comparison approach to audit such models. To gain insight into black-box
models, we treat them as teachers, training transparent student models to mimic
the risk scores assigned by black-box models. We compare the student model
trained with distillation to a second un-distilled transparent model trained on
ground-truth outcomes, and use differences between the two models to gain
insight into the black-box model. Our approach can be applied in a realistic
setting, without probing the black-box model API. We demonstrate the approach
on four public data sets: COMPAS, Stop-and-Frisk, Chicago Police, and Lending
Club. We also propose a statistical test to determine if a data set is missing
key features used to train the black-box model. Our test finds that the
ProPublica data is likely missing key feature(s) used in COMPAS.
Comment: Camera-ready version for AAAI/ACM AIES 2018. Data and pseudocode at
https://github.com/shftan/auditblackbox. Previously titled "Detecting Bias in
Black-Box Models Using Transparent Model Distillation". A short version was
presented at NIPS 2017 Symposium on Interpretable Machine Learning
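The distill-and-compare idea can be sketched on toy data. The abstract does not say which transparent model class is used, so a one-feature least-squares line stands in for it here. The data values and the names `fit_line`, `mimic`, and `direct` are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


# Hypothetical audit data: one feature, the black-box score, the true outcome.
feature = [0.0, 1.0, 2.0, 3.0, 4.0]
risk_score = [0.1, 0.3, 0.5, 0.7, 0.9]   # assigned by the black-box model
outcome = [0.0, 0.0, 0.0, 1.0, 1.0]      # ground truth

mimic = fit_line(feature, risk_score)    # student distilled from the teacher
direct = fit_line(feature, outcome)      # transparent model trained on outcomes

# A large gap suggests the black box weights this feature differently
# than the outcomes alone would justify.
slope_gap = mimic[0] - direct[0]
print(mimic, direct, slope_gap)
```

Only the recorded risk scores are needed, so the black-box model itself is never queried, which matches the setting the abstract describes.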
Galacticus: A Semi-Analytic Model of Galaxy Formation
We describe a new, free and open source semi-analytic model of galaxy
formation, Galacticus. The Galacticus model was designed to be highly modular
to facilitate expansion and the exploration of alternative descriptions of key
physical ingredients. We detail the Galacticus engine for evolving galaxies
through a merging hierarchy of dark matter halos and give details of the
specific implementations of physics currently available in Galacticus. Finally,
we show results from an example model that is in reasonably good agreement with
several observational datasets. We use this model to explore numerical
convergence and to demonstrate the types of information which can be extracted
from Galacticus.
Comment: 35 pages, submitted to New Astronomy