A Dynamic I/O-Efficient Structure for One-Dimensional Top-k Range Reporting
We present a structure in external memory for "top-k range reporting", which
uses linear space, answers a query in O(lg_B n + k/B) I/Os, and supports an
update in O(lg_B n) amortized I/Os, where n is the input size, and B is the
block size. This improves the state of the art, which incurs O(lg^2_B n)
amortized I/Os per update. Comment: In PODS'1
Learning by stochastic serializations
Complex structures are typical in machine learning. Tailoring learning
algorithms for every structure requires an effort that may be saved by defining
a generic learning procedure adaptive to any complex structure. In this paper,
we propose to map any complex structure onto a generic form, called
serialization, over which we can apply any sequence-based density estimator. We
then show how to transfer the learned density back onto the space of original
structures. To expose the learning procedure to the structural particularities
of the original structures, we take care that the serializations reflect
the structures' properties accurately. Enumerating all serializations is
infeasible. We propose an effective way to sample representative serializations,
which preserves the statistics of the complete set. Our method is competitive
with or better than state-of-the-art learning algorithms that have been
specifically designed for given structures.
In addition, since the serialization involves sampling from a combinatorial
process, it provides considerable protection from overfitting, which we clearly
demonstrate in a number of experiments. Comment: Submission to NeurIPS 201
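The sampling idea can be sketched for the simplest structured case, a tree: each random traversal order yields one serialization (a token sequence), and drawing many of them samples from the combinatorially large set of all serializations. The function name and dict-based tree encoding are assumptions for this sketch, not the paper's notation.

```python
import random

def sample_serialization(tree, root, rng):
    """Serialize a tree into a token sequence via a randomized DFS.

    tree: dict mapping node -> list of children.
    Shuffling the child order per call produces a different member of
    the set of serializations on each draw, so repeated sampling can
    approximate statistics of the full set without enumerating it.
    """
    seq = []
    def dfs(node):
        seq.append(node)
        children = list(tree.get(node, []))
        rng.shuffle(children)  # a different child order per sample
        for c in children:
            dfs(c)
    dfs(root)
    return seq

tree = {"a": ["b", "c"], "b": ["d"]}
rng = random.Random(0)
samples = [sample_serialization(tree, "a", rng) for _ in range(3)]
```

Each sample is a plain sequence over the same node set, so any sequence-based density estimator can be trained on the pooled samples.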
Accounting for Individual Differences in Bradley-Terry Models by Means of Recursive Partitioning
The preference scaling of a group of subjects may not be homogeneous, but different
groups of subjects with certain characteristics may show different preference scalings,
each of which can be derived from paired comparisons by means of the Bradley-Terry model.
Usually, either different models are fit in predefined subsets of the
sample, or the effects of subject covariates are explicitly specified in a parametric
model. In both cases, categorical covariates can be employed directly to distinguish
between the different groups, while numeric covariates are typically discretized
prior to modeling.
Here, a semi-parametric approach for recursive partitioning of Bradley-Terry models is
introduced as a means for identifying groups of subjects with homogeneous preference scalings
in a data-driven way. In this approach, the covariates that -- in main effects or
interactions -- distinguish between groups of subjects with different preference
orderings are detected automatically from the set of candidate covariates. One main
advantage of this approach is that sensible partitions in numeric covariates are
also detected automatically.
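The base model being partitioned can be illustrated with the classical minorization-maximization (MM) fit of Bradley-Terry worth parameters from a win matrix; within each node of the tree, a fit of this kind is performed on that subgroup's paired comparisons. The function name and matrix layout are assumptions for this sketch, and the recursive covariate splitting itself is not shown.

```python
def bradley_terry(wins, n_items, iters=200):
    """Fit Bradley-Terry worth parameters by the classical MM updates.

    wins[i][j] = number of times item i was preferred over item j.
    Under the model, P(i beats j) = p_i / (p_i + p_j).  The MM update is
    p_i <- W_i / sum_{j != i} n_ij / (p_i + p_j), with W_i the total wins
    of i and n_ij the number of i-vs-j comparisons.  Returns worths
    normalized to sum to 1; the comparison graph should be connected.
    """
    p = [1.0] * n_items
    for _ in range(iters):
        new = []
        for i in range(n_items):
            w_i = sum(wins[i])  # total wins of item i
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n_items) if j != i)
            new.append(w_i / denom if denom > 0 else p[i])
        s = sum(new)
        p = [x / s for x in new]  # normalize for identifiability
    return p

# toy data: item 0 is preferred most often
wins = [[0, 8, 6],
        [2, 0, 5],
        [4, 5, 0]]
worths = bradley_terry(wins, 3)
```

Recursive partitioning then searches the covariates for splits across which these fitted worths differ significantly, instead of fitting one global model.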
Scraping the Social? Issues in live social research
What makes scraping methodologically interesting for social and cultural research? This paper seeks to contribute to debates about digital social research by exploring how a ‘medium-specific’ technique for online data capture may be rendered analytically productive for social research. As a device that is currently being imported into social research, scraping has the capacity to re-structure social research, and this in at least two ways. Firstly, as a technique that is not native to social research, scraping risks introducing ‘alien’ methodological assumptions into social research (such as a preoccupation with freshness). Secondly, to scrape is to risk importing into our inquiry categories that are prevalent in the social practices enabled by the media: scraping makes available already formatted data for social research. Scraped data, and online social data more generally, tend to come with ‘external’ analytics already built in. This circumstance is often approached as a ‘problem’ with online data capture, but we propose it may be turned into a virtue, insofar as data formats that have currency in the areas under scrutiny may serve as a source of social data themselves. Scraping, we propose, makes it possible to render traffic between the object and process of social research analytically productive. It enables a form of ‘real-time’ social research, in which the formats and life cycles of online data may lend structure to the analytic objects and findings of social research. By way of a conclusion, we demonstrate this point in an exercise of online issue profiling, and more particularly, by relying on Twitter to profile the issue of ‘austerity’. Here we distinguish between two forms of real-time research: those dedicated to monitoring live content (which terms are current?) and those concerned with analysing the liveliness of issues (which topics are happening?).
Programming with process groups: Group and multicast semantics
Process groups are a natural tool for distributed programming and are increasingly important in distributed computing environments. Discussed here is a new architecture that arose from an effort to simplify Isis process group semantics. The findings include a refined notion of how the clients of a group should be treated, what the properties of a multicast primitive should be when systems contain large numbers of overlapping groups, and a new construct called the causality domain. A system based on this architecture is now being implemented in collaboration with the Chorus and Mach projects.
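The causal-ordering property underlying such multicast primitives can be sketched with the standard vector-clock delivery condition. This illustrates causal multicast in general, not Isis's specific protocol or its causality-domain construct; the function name is an assumption for this sketch.

```python
def causally_ready(msg_vc, sender, local_vc):
    """Standard causal-delivery test for a multicast message.

    Deliver a message stamped with vector clock msg_vc from process
    `sender` iff it is the next message expected from the sender
    (msg_vc[sender] == local_vc[sender] + 1) and every message it
    causally depends on has been delivered (msg_vc[k] <= local_vc[k]
    for all k != sender).  Otherwise the message must be buffered.
    """
    for k in range(len(msg_vc)):
        if k == sender:
            if msg_vc[k] != local_vc[k] + 1:
                return False
        elif msg_vc[k] > local_vc[k]:
            return False
    return True

# receiver's view is [0, 0, 0]; the first message from process 0 is ready:
assert causally_ready([1, 0, 0], 0, [0, 0, 0])
# not ready: it depends on a message from process 1 not yet delivered
assert not causally_ready([1, 1, 0], 0, [0, 0, 0])
```

With many overlapping groups, the cost of maintaining such timestamps per group is exactly the kind of overhead the article's refined multicast semantics aim to control.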
Vincia for Hadron Colliders
We present the first public implementation of antenna-based QCD initial- and
final-state showers. The shower kernels are antenna functions, which
capture not only the collinear dynamics but also the leading soft (coherent)
singularities of QCD matrix elements. We define the evolution measure to be
inversely proportional to the leading poles, hence gluon emissions are evolved
in a p_T measure inversely proportional to the eikonal, while processes
that only contain a single pole (e.g., g -> q qbar) are evolved in
virtuality. Non-ordered emissions are allowed, suppressed by an additional
power of 1/Q^2. Recoils and kinematics are governed by exact on-shell
phase-space factorisations. This first implementation is limited to massless
QCD partons and colourless resonances. Tree-level matrix-element corrections
are included for QCD up to O(alpha_s^4) (4 jets), and for
Drell-Yan and Higgs production up to O(alpha_s^3) (V/H + 3
jets). The resulting algorithm has been made publicly available in Vincia 2.0.
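The evolution in a shower of this kind is typically generated with the Sudakov veto algorithm: scales are drawn from an overestimated emission density and accepted with the ratio of the true to the overestimated kernel. The toy kernel below (f(t) = alpha/t with a constant-coefficient overestimate g(t) = c/t) is an assumption for illustration; Vincia's actual antenna kernels and evolution variables differ.

```python
import random

def next_emission_scale(t_start, t_cut, rng, c=1.0):
    """Toy Sudakov veto algorithm (generic shower technique).

    True emission density f(t) = alpha/t with alpha = 0.2 (toy choice),
    overestimated by g(t) = c/t with c >= alpha.  A trial scale solves
    exp(-c * ln(t/t_new)) = R for uniform R, i.e. t_new = t * R**(1/c),
    and is accepted with probability f/g = alpha/c.  Returns the
    accepted scale, or None if evolution falls below the cutoff t_cut.
    """
    alpha = 0.2
    t = t_start
    while t > t_cut:
        t = t * rng.random() ** (1.0 / c)  # trial scale from overestimate
        if t <= t_cut:
            return None                    # no emission above the cutoff
        if rng.random() < alpha / c:       # veto step: accept with f/g
            return t
    return None

rng = random.Random(1)
scales = [next_emission_scale(100.0, 1.0, rng) for _ in range(200)]
```

Rejected trial scales are simply used as the starting point for the next trial, which is what makes the veto algorithm exact for the true density despite sampling from the overestimate.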