Darwinian Data Structure Selection
Data structure selection and tuning is laborious but can vastly improve an
application's performance and memory footprint. Some data structures share a
common interface and enjoy multiple implementations. We call them Darwinian
Data Structures (DDS), since we can subject their implementations to survival
of the fittest. We introduce ARTEMIS, a multi-objective, cloud-based
search-based optimisation framework that automatically finds optimal, tuned DDS
modulo a test suite, then changes an application to use that DDS. ARTEMIS
achieves substantial performance improvements for \emph{every} project in
Java projects from DaCapo benchmark, popular projects and uniformly
sampled projects from GitHub. For execution time, CPU usage, and memory
consumption, ARTEMIS finds at least one solution that improves \emph{all}
measures for () of the projects. The median improvement across
the best solutions is , , for runtime, memory and CPU
usage.
These aggregate results understate ARTEMIS's potential impact. Some of the
benchmarks it improves are libraries or utility functions. Two examples are
gson, a ubiquitous Java serialization framework, and xalan, Apache's XML
transformation tool. ARTEMIS improves gson by \%, and for
memory, runtime, and CPU; ARTEMIS improves xalan's memory consumption by
\%. \emph{Every} client of these projects will benefit from these
performance improvements.
Comment: 11 page
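The core idea of the abstract above, several implementations behind one interface with the fastest one chosen against a test workload, can be illustrated with a small sketch. It is not ARTEMIS: it measures a single objective (wall-clock time) instead of running a multi-objective search, it does not rewrite any application, and the names `ListQueue`, `DequeQueue`, `workload`, and `select_fittest` are hypothetical ones introduced only for this illustration.

```python
import time
from collections import deque


class ListQueue:
    """FIFO queue backed by a Python list (popping the front is O(n))."""

    def __init__(self):
        self._items = []

    def push(self, x):
        self._items.append(x)

    def pop(self):
        return self._items.pop(0)

    def __len__(self):
        return len(self._items)


class DequeQueue:
    """Same interface, backed by collections.deque (popping the front is O(1))."""

    def __init__(self):
        self._items = deque()

    def push(self, x):
        self._items.append(x)

    def pop(self):
        return self._items.popleft()

    def __len__(self):
        return len(self._items)


def workload(queue_cls, n=5000):
    """Stand-in for a test suite: push n items, drain them, return a checksum."""
    q = queue_cls()
    for i in range(n):
        q.push(i)
    total = 0
    while len(q):
        total += q.pop()
    return total


def select_fittest(candidates, run=workload, repeats=3):
    """Return (best_name, timings) using the best-of-`repeats` runtime per candidate."""
    expected = None
    timings = {}
    for name, cls in candidates.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            result = run(cls)
            best = min(best, time.perf_counter() - start)
            # Every implementation must produce the same observable result.
            if expected is None:
                expected = result
            elif result != expected:
                raise ValueError(f"{name} changed the workload's result")
        timings[name] = best
    return min(timings, key=timings.get), timings


if __name__ == "__main__":
    best, timings = select_fittest({"list": ListQueue, "deque": DequeQueue})
    print(best, timings)
```

Which candidate wins depends on the workload and the machine, which is why the selection is made by measurement and not fixed in advance.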
Structure identification in relational data
This paper presents several investigations into the prospects for identifying meaningful structures in empirical data, namely, structures permitting effective organization of the data to meet requirements of future queries. We propose a general framework whereby the notion of identifiability is given a precise formal definition similar to that of learnability. Using this framework, we then explore if a tractable procedure exists for deciding whether a given relation is decomposable into a constraint network or a CNF theory with desirable topology and, if the answer is positive, identifying the desired decomposition. Finally, we address the problem of expressing a given relation as a Horn theory and, if this is impossible, finding the best k-Horn approximation to the given relation. We show that both problems can be solved in time polynomial in the length of the data
Structure Selection from Streaming Relational Data
Statistical relational learning techniques have been successfully applied in
a wide range of relational domains. In most of these applications, the human
designers capitalized on their background knowledge by following a
trial-and-error trajectory, where relational features are manually defined by a
human engineer, parameters are learned for those features on the training data,
the resulting model is validated, and the cycle repeats as the engineer adjusts
the set of features. This paper seeks to streamline application development in
large relational domains by introducing a light-weight approach that
efficiently evaluates relational features on pieces of the relational graph
that are streamed to it one at a time. We evaluate our approach on two social
media tasks and demonstrate that it leads to more accurate models that are
learned faster
The Skip Quadtree: A Simple Dynamic Data Structure for Multidimensional Data
We present a new multi-dimensional data structure, which we call the skip
quadtree (for point data in R^2) or the skip octree (for point data in R^d,
with constant d>2). Our data structure combines the best features of two
well-known data structures, in that it has the well-defined "box"-shaped
regions of region quadtrees and the logarithmic-height search and update
hierarchical structure of skip lists. Indeed, the bottom level of our structure
is exactly a region quadtree (or octree for higher dimensional data). We
describe efficient algorithms for inserting and deleting points in a skip
quadtree, as well as fast methods for performing point location and approximate
range queries.
Comment: 12 pages, 3 figures. A preliminary version of this paper appeared in
the 21st ACM Symp. Comp. Geom., Pisa, 2005, pp. 296-30
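As a rough illustration of the quadtree layer mentioned in the abstract above, here is a minimal sketch of a plain point quadtree with insertion and exact rectangular range queries. It deliberately omits the skip-list hierarchy that gives the skip quadtree its logarithmic height, as well as deletion and approximate range queries; the class name `QuadTree`, its parameters, and the half-open square convention are assumptions made for this example, not the paper's definitions.

```python
class QuadTree:
    """Point quadtree over the half-open square [x, x+size) x [y, y+size)."""

    def __init__(self, x, y, size, capacity=1):
        self.x, self.y, self.size = x, y, size
        self.capacity = capacity
        self.points = []      # points stored in this node while it is a leaf
        self.children = None  # four sub-squares once the node has been split

    def _contains(self, p):
        return (self.x <= p[0] < self.x + self.size
                and self.y <= p[1] < self.y + self.size)

    def _child_for(self, p):
        half = self.size / 2
        col = 1 if p[0] >= self.x + half else 0
        row = 1 if p[1] >= self.y + half else 0
        return self.children[2 * row + col]

    def _split(self):
        half = self.size / 2
        self.children = [
            QuadTree(self.x + dx * half, self.y + dy * half, half, self.capacity)
            for dy in (0, 1) for dx in (0, 1)
        ]
        old, self.points = self.points, []
        for p in old:
            self._child_for(p).insert(p)

    def insert(self, p):
        """Insert point p = (px, py); return False if p lies outside this square."""
        if not self._contains(p):
            return False
        if self.children is None:
            if p in self.points:
                return True  # duplicates are stored once
            if len(self.points) < self.capacity:
                self.points.append(p)
                return True
            self._split()
        return self._child_for(p).insert(p)

    def range_query(self, x0, y0, x1, y1):
        """Return stored points with x0 <= px <= x1 and y0 <= py <= y1."""
        # Prune squares that cannot intersect the query rectangle.
        if (x1 < self.x or x0 >= self.x + self.size
                or y1 < self.y or y0 >= self.y + self.size):
            return []
        if self.children is None:
            return [p for p in self.points
                    if x0 <= p[0] <= x1 and y0 <= p[1] <= y1]
        found = []
        for child in self.children:
            found.extend(child.range_query(x0, y0, x1, y1))
        return found


if __name__ == "__main__":
    tree = QuadTree(0, 0, 16)
    for point in [(1, 1), (2, 3), (9, 9), (15, 0), (8, 8), (3, 2)]:
        tree.insert(point)
    print(sorted(tree.range_query(0, 0, 3, 3)))
```

The depth of this plain tree can grow with how closely points cluster, which is the weakness the skip-list layering in the paper is meant to address.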
Static Data Structure Lower Bounds Imply Rigidity
We show that static data structure lower bounds in the group (linear) model
imply semi-explicit lower bounds on matrix rigidity. In particular, we prove
that an explicit lower bound of on the cell-probe
complexity of linear data structures in the group model, even against
arbitrarily small linear space , would already imply a
semi-explicit () construction of rigid matrices with
significantly better parameters than the current state of the art (Alon, Panigrahy
and Yekhanin, 2009). Our results further assert that polynomial () data structure lower bounds against near-optimal space, would
imply super-linear circuit lower bounds for log-depth linear circuits (a
four-decade open question). In the succinct space regime , we show
that any improvement on current cell-probe lower bounds in the linear model
would also imply new rigidity bounds. Our results rely on a new connection
between the "inner" and "outer" dimensions of a matrix (Paturi and Pudlak,
2006), and on a new reduction from worst-case to average-case rigidity, which
is of independent interest
H1 Diffractive Structure Functions Measurement from new data
New measurements of the reduced cross section for the
diffractive process in the kinematic domain
GeV, and x_IP < 0.1 are presented. Data events
recorded by the H1 detector during the years 1999--2000 and 2004 have been
used, corresponding to a total integrated luminosity of 68 pb^-1. The
measurements are derived in the same range as previous H1 data, namely GeV and GeV. Two different analysis methods, rapidity gap
and , are used and similar results are obtained in the kinematic domain of
overlap. Finally, together with previous data, the diffractive structure
function measurements are analysed with a model based on the dipole formulation
of diffractive scattering. It is found to give a very good description of the
data over the whole kinematic range.
Comment: 4 pages, 4 figure; To appear in the proceedings of 14th International
Workshop on Deep Inelastic Scattering (DIS 2006), Tsukuba, Japan, 20-24 Apr
200
An Optimized Data Structure for High Throughput 3D Proteomics Data: mzRTree
As an emerging field, MS-based proteomics still requires software tools for
efficiently storing and accessing experimental data. In this work, we focus on
the management of LC-MS data, which are typically made available in standard
XML-based portable formats. The structures that are currently employed to
manage these data can be highly inefficient, especially when dealing with
high-throughput profile data. LC-MS datasets are usually accessed through 2D
range queries. Optimizing this type of operation could dramatically reduce the
complexity of data analysis. We propose a novel data structure for LC-MS
datasets, called mzRTree, which embodies a scalable index based on the R-tree
data structure. mzRTree can be efficiently created from the XML-based data
formats and it is suitable for handling very large datasets. We experimentally
show that, on all range queries, mzRTree outperforms other known structures
used for LC-MS data, even on those queries these structures are optimized for.
Besides, mzRTree is also more space efficient. As a result, mzRTree reduces
data analysis computational costs for very large profile datasets.
Comment: Paper details: 10 pages, 7 figures, 2 tables. To be published in
Journal of Proteomics. Source code available at
http://www.dei.unipd.it/mzrtre
