Distributed multinomial regression
This article introduces a model-based approach to distributed computing for
multinomial logistic (softmax) regression. We treat counts for each response
category as independent Poisson regressions via plug-in estimates for fixed
effects shared across categories. The work is driven by the
high-dimensional-response multinomial models that are used in analysis of a
large number of random counts. Our motivating applications are in text
analysis, where documents are tokenized and the token counts are modeled as
arising from a multinomial dependent upon document attributes. We estimate such
models for a publicly available data set of reviews from Yelp, with text
regressed onto a large set of explanatory variables (user, business, and rating
information). The fitted models serve as a basis for exploring the connection
between words and variables of interest, for reducing dimension into supervised
factor scores, and for prediction. We argue that the approach herein provides
an attractive option for social scientists and other text analysts who wish to
bring familiar regression tools to bear on text data.
Comment: Published at http://dx.doi.org/10.1214/15-AOAS831 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
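The factorization described in this abstract can be illustrated with a toy sketch. It assumes that the plug-in estimate of each document's fixed effect is the log of its total token count, and it uses a hypothetical three-word vocabulary with a single covariate (a centered rating). The function name, the data, and the Newton solver are illustrative choices, not the authors' implementation.

```python
import math

def fit_poisson(x, y, offset, iters=25):
    """Fit log E[y_i] = offset_i + a + b * x_i by Newton's method."""
    # Start at the intercept-only solution so the Newton steps stay stable.
    a = math.log(sum(y) / sum(math.exp(o) for o in offset))
    b = 0.0
    for _ in range(iters):
        mu = [math.exp(o + a + b * xi) for o, xi in zip(offset, x)]
        # Score: gradient of the Poisson log-likelihood in (a, b).
        ga = sum(yi - m for yi, m in zip(y, mu))
        gb = sum((yi - m) * xi for yi, m, xi in zip(y, mu, x))
        # Negative Hessian, a symmetric 2x2 matrix.
        haa = sum(mu)
        hab = sum(m * xi for m, xi in zip(mu, x))
        hbb = sum(m * xi * xi for m, xi in zip(mu, x))
        det = haa * hbb - hab * hab
        # Newton update: solve the 2x2 system in closed form.
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    return a, b

# Hypothetical data: rows are documents, columns are counts for three words.
counts = [[4, 1, 5], [3, 2, 5], [2, 3, 5], [1, 4, 5]]
rating = [-1.5, -0.5, 0.5, 1.5]

# Assumed plug-in fixed effect: log of each document's total count.
offset = [math.log(sum(row)) for row in counts]

# Each word is fitted on its own, so this loop could be spread over workers.
fits = [fit_poisson(rating, [row[j] for row in counts], offset)
        for j in range(3)]
```

Because the per-word fits share nothing except the precomputed offsets, the final list comprehension is the part that a distributed implementation would farm out across machines.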
A modular and interactive OLED-based lighting system
The concept of a flexible, large-area, organic light emitting diode (OLED)-based lighting system with a modular structure and built-in intelligent light management is introduced. Such a flexible, thin, portable lighting system with discreetly integrated electronics is important in order to allow the implementation of the lighting system into a variety of places, such as cars and temporary expedition areas. A modular construction of an OLED lighting panel makes it possible to control each OLED cell individually. This not only enables us to counteract aging or degradation effects in the OLED cells but it also allows individual OLED module brightness control to support human or ambient interaction based on integrated or centralized sensors. Moreover, integrating the driving electronics in the backplane of an OLED module improves the energy efficiency of operating large OLED panels. The thin, modular construction and individual, dynamic control are successfully demonstrated.
How proofs are prepared at Camelot
We study a design framework for robust, independently verifiable, and
workload-balanced distributed algorithms working on a common input. An
algorithm based on the framework is essentially a distributed encoding
procedure for a Reed-Solomon code, which enables (a) robustness against
byzantine failures with intrinsic error-correction and identification of failed
nodes, and (b) independent randomized verification to check the entire
computation for correctness, which takes essentially no more resources than
each node individually contributes to the computation. The framework builds on
recent Merlin-Arthur proofs of batch evaluation of Williams [Electron.
Colloq. Comput. Complexity, Report TR16-002, January 2016] with the
observation that Merlin's magic is not needed for batch evaluation: mere
Knights can prepare the proof, in parallel, and with intrinsic
error-correction.
The contribution of this paper is to show that in many cases the verifiable
batch evaluation framework admits algorithms that match in total resource
consumption the best known sequential algorithm for solving the problem. As our
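main result, we show that the $k$-cliques in an $n$-vertex graph can be counted
and verified in per-node $O(n^{(\omega+\epsilon)k/6})$ time and space on
$O(n^{(\omega+\epsilon)k/6})$ compute nodes, for any constant $\epsilon>0$ and
positive integer $k$ divisible by $6$, where $2\leq\omega<2.3728639$ is the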
exponent of matrix multiplication. This matches in total running time the best
known sequential algorithm, due to Nešetřil and Poljak [Comment. Math. Univ.
Carolin. 26 (1985) 415-419], and considerably improves
its space usage and parallelizability. Further results include novel algorithms
for counting triangles in sparse graphs, computing the chromatic polynomial of
a graph, and computing the Tutte polynomial of a graph.
Comment: 42 p
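The randomized verification step mentioned in this abstract can be illustrated with a toy sketch. It assumes the proof is a list of evaluations of a low-degree polynomial over a prime field (one evaluation per node) and that the verifier can evaluate the same polynomial directly at one random point. The field size, the example polynomial, and all function names are hypothetical. Error correction and identification of failed nodes are omitted, and this is not the authors' algorithm.

```python
import random

Q = 2**31 - 1  # a prime modulus, chosen here only for illustration

def poly_eval(coeffs, x, q=Q):
    """Horner evaluation of sum(coeffs[i] * x**i) modulo q."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % q
    return acc

def lagrange_eval(points, x, q=Q):
    """Value at x of the unique polynomial of degree < len(points)
    passing through the given (xi, yi) pairs, modulo q."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % q
                den = den * (xi - xj) % q
        # pow(den, -1, q) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, q)) % q
    return total

def verify(proof, direct_eval, rng):
    """Accept the proof if it agrees with a direct evaluation
    at one uniformly random field element."""
    r = rng.randrange(Q)
    return lagrange_eval(proof, r) == direct_eval(r)

# Hypothetical proof polynomial of degree 4; each "node" supplies one value.
coeffs = [7, 0, 3, 1, 9]
honest = [(x, poly_eval(coeffs, x)) for x in range(len(coeffs))]

# A faulty node reports a wrong value at x = 2.
tampered = list(honest)
tampered[2] = (2, (tampered[2][1] + 1) % Q)

ok = verify(honest, lambda r: poly_eval(coeffs, r), random.Random(0))
bad = verify(tampered, lambda r: poly_eval(coeffs, r), random.Random(0))
```

Two distinct polynomials of degree at most 4 agree on at most 4 field elements, so in this sketch a tampered proof passes the check with probability at most 4/Q.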
Recent Advances in Graph Partitioning
We survey recent trends in practical algorithms for balanced graph
partitioning together with applications and future research directions.
Early Accurate Results for Advanced Analytics on MapReduce
Approximate results based on samples often provide the only way in which
advanced analytical applications on very massive data sets can satisfy their
time and resource constraints. Unfortunately, methods and tools for the
computation of accurate early results are currently not supported in
MapReduce-oriented systems although these are intended for `big data'.
Therefore, we proposed and implemented a non-parametric extension of Hadoop
which allows the incremental computation of early results for arbitrary
work-flows, along with reliable on-line estimates of the degree of accuracy
achieved so far in the computation. These estimates are based on a technique
called bootstrapping that has been widely employed in statistics and can be
applied to arbitrary functions and data distributions. In this paper, we
describe our Early Accurate Result Library (EARL) for Hadoop that was designed
to minimize the changes required to the MapReduce framework. Various tests of
EARL for Hadoop are presented to characterize the frequent situations where EARL
can provide major speed-ups over the current version of Hadoop.
Comment: VLDB201
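The idea of an early result with a bootstrap accuracy estimate can be illustrated outside Hadoop with a short sketch. It assumes the records arrive in random order, so any prefix is a fair sample, and it doubles the sample until the bootstrap standard error falls below a relative tolerance. The function names, the tolerance, and the doubling schedule are assumptions for illustration. This is not the EARL API.

```python
import random
import statistics

def bootstrap_std_error(sample, statistic, rng, n_resamples=200):
    """Standard deviation of the statistic over resamples drawn
    with replacement from the sample."""
    n = len(sample)
    estimates = [
        statistic([sample[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    ]
    return statistics.stdev(estimates)

def early_result(data, statistic, rel_tol=0.02, start=100):
    """Grow a prefix sample until the bootstrap error is small enough.
    Returns (estimate, bootstrap error, number of records used)."""
    rng = random.Random(0)
    n = min(start, len(data))
    while True:
        sample = data[:n]
        estimate = statistic(sample)
        error = bootstrap_std_error(sample, statistic, rng)
        if error <= rel_tol * abs(estimate) or n == len(data):
            return estimate, error, n
        n = min(2 * n, len(data))

# Hypothetical data set: 20,000 uniform draws standing in for a large input.
gen = random.Random(42)
data = [gen.random() for _ in range(20000)]
estimate, error, used = early_result(data, statistics.fmean)
```

The statistic is passed in as a plain function, which reflects the abstract's point that bootstrapping applies to arbitrary functions and not only to means.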
Monte Carlo Radiative Transfer
I outline methods for calculating the solution of Monte Carlo Radiative
Transfer (MCRT) in scattering, absorption and emission processes of dust and
gas, including polarization. I provide a bibliography of relevant papers on
methods with astrophysical applications.
Comment: To appear in the Chandra Centennial issue of the Bulletin of the
Astronomical Society of India, volume 39 (2011), eds D.J. Saikia and Virginia
Trimble; 27 pages, 1 figur
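The basic Monte Carlo loop behind such methods can be illustrated with a deliberately simplified sketch. It assumes a plane-parallel slab measured in optical depth, photons entering perpendicular to the lower face, isotropic scattering, and a constant albedo. Polarization and emission are left out. The function name and parameters are hypothetical and are not taken from the paper.

```python
import math
import random

def slab_transmission(tau_max, albedo, n_photons=20000, seed=1):
    """Fraction of photons that leave through the top of a slab
    of total vertical optical depth tau_max."""
    rng = random.Random(seed)
    escaped_top = 0
    for _ in range(n_photons):
        z = 0.0   # vertical position, in units of optical depth
        mu = 1.0  # direction cosine; photons start moving straight up
        while True:
            # Sample the optical depth to the next interaction.
            step = -math.log(1.0 - rng.random())
            z += mu * step
            if z >= tau_max:
                escaped_top += 1
                break
            if z < 0.0:
                break  # left through the bottom face
            if rng.random() >= albedo:
                break  # absorbed
            # Isotropic scattering: new direction cosine uniform in [-1, 1).
            mu = 2.0 * rng.random() - 1.0
    return escaped_top / n_photons

direct = slab_transmission(tau_max=1.0, albedo=0.0)
scattered = slab_transmission(tau_max=1.0, albedo=1.0)
```

With the albedo set to zero the result should approach the analytic value exp(-tau_max), which gives a simple sanity check on the sampling of optical depth.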
Inference of single-cell phylogenies from lineage tracing data using Cassiopeia.
The pairing of CRISPR/Cas9-based gene editing with massively parallel single-cell readouts now enables large-scale lineage tracing. However, the rapid growth in complexity of data from these assays has outpaced our ability to accurately infer phylogenetic relationships. First, we introduce Cassiopeia, a suite of scalable maximum parsimony approaches for tree reconstruction. Second, we provide a simulation framework for evaluating algorithms and exploring lineage tracer design principles. Finally, we generate the most complex experimental lineage tracing dataset to date, 34,557 human cells continuously traced over 15 generations, and use it for benchmarking phylogenetic inference approaches. We show that Cassiopeia outperforms traditional methods by several metrics and under a wide variety of parameter regimes, and provide insight into the principles for the design of improved Cas9-enabled recorders. Together, these should broadly enable large-scale mammalian lineage tracing efforts. Cassiopeia and its benchmarking resources are publicly available at www.github.com/YosefLab/Cassiopeia