
Darwinian Data Structure Selection

Author: Basios M.
Binder R. V.
Dan H.
Li L.
Nagel F.
Nanavati J.
Petke J.
Vlissides J.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/08/2018

Data structure selection and tuning is laborious but can vastly improve an application's performance and memory footprint. Some data structures share a common interface and enjoy multiple implementations. We call them Darwinian Data Structures (DDS), since we can subject their implementations to survival of the fittest. We introduce ARTEMIS, a multi-objective, cloud-based, search-based optimisation framework that automatically finds optimal, tuned DDS modulo a test suite, then changes an application to use that DDS. ARTEMIS achieves substantial performance improvements for every project in 5 Java projects from the DaCapo benchmark, 8 popular projects and 30 uniformly sampled projects from GitHub. For execution time, CPU usage, and memory consumption, ARTEMIS finds at least one solution that improves all measures for 86% (37/43) of the projects. The median improvement across the best solutions is 4.8%, 10.1% and 5.1% for runtime, memory and CPU usage, respectively. These aggregate results understate ARTEMIS's potential impact. Some of the benchmarks it improves are libraries or utility functions. Two examples are gson, a ubiquitous Java serialization framework, and xalan, Apache's XML transformation tool. ARTEMIS improves gson by 16.5%, 1% and 2.2% for memory, runtime, and CPU; ARTEMIS improves xalan's memory consumption by 23.5%. Every client of these projects will benefit from these performance improvements. Comment: 11 pages

arXiv.org e-Print Archive

Crossref

UCL Discovery

Recommended from our members

Structure identification in relational data

Author: Dechter Rina
Pearl Judea
Publication venue: eScholarship, University of California
Publication date: 08/07/1992

This paper presents several investigations into the prospects for identifying meaningful structures in empirical data, namely, structures permitting effective organization of the data to meet requirements of future queries. We propose a general framework whereby the notion of identifiability is given a precise formal definition similar to that of learnability. Using this framework, we then explore if a tractable procedure exists for deciding whether a given relation is decomposable into a constraint network or a CNF theory with desirable topology and, if the answer is positive, identifying the desired decomposition. Finally, we address the problem of expressing a given relation as a Horn theory and, if this is impossible, finding the best k-Horn approximation to the given relation. We show that both problems can be solved in time polynomial in the length of the data.
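As background for the Horn question, it is a standard result that a set of Boolean tuples is exactly the model set of some Horn theory if and only if it is closed under componentwise AND (intersection). The sketch below tests that condition in time polynomial in the size of the relation. It also computes the intersection closure, which corresponds to the least Horn upper bound. That is a different notion from the paper's best k-Horn approximation and is included only for illustration. The function names are hypothetical.

```python
from itertools import combinations
from typing import Iterable, Set, Tuple

Row = Tuple[int, ...]  # one 0/1 tuple of the relation


def intersect(a: Row, b: Row) -> Row:
    """Componentwise AND of two Boolean tuples."""
    return tuple(x & y for x, y in zip(a, b))


def is_horn_expressible(relation: Iterable[Row]) -> bool:
    """True iff the relation is closed under pairwise intersection,
    i.e. it is the model set of some Horn theory."""
    rel = set(relation)
    return all(intersect(a, b) in rel for a, b in combinations(rel, 2))


def horn_closure(relation: Iterable[Row]) -> Set[Row]:
    """Smallest intersection-closed superset of the relation
    (the models of its least Horn upper bound)."""
    closed = set(relation)
    changed = True
    while changed:
        changed = False
        for a, b in list(combinations(closed, 2)):  # snapshot before adding
            c = intersect(a, b)
            if c not in closed:
                closed.add(c)
                changed = True
    return closed
```

For example, {(1,1,0), (1,0,1)} is not Horn-expressible because their intersection (1,0,0) is missing. Adding it yields a closed set.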

eScholarship - University of California

Structure Selection from Streaming Relational Data

Author: Mihalkova Lilyana
Moustafa Walaa Eldin
Publication date: 01/01/2011

Statistical relational learning techniques have been successfully applied in a wide range of relational domains. In most of these applications, the human designers capitalized on their background knowledge by following a trial-and-error trajectory, where relational features are manually defined by a human engineer, parameters are learned for those features on the training data, the resulting model is validated, and the cycle repeats as the engineer adjusts the set of features. This paper seeks to streamline application development in large relational domains by introducing a light-weight approach that efficiently evaluates relational features on pieces of the relational graph that are streamed to it one at a time. We evaluate our approach on two social media tasks and demonstrate that it leads to more accurate models that are learned faster.
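The general pattern of evaluating features on graph pieces streamed one at a time can be sketched as a single pass that keeps only running counters. This is a generic illustration under assumed inputs (edge-list pieces, a node-label map, hypothetical feature predicates) and is not the authors' algorithm.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Edge = Tuple[str, str]
Feature = Callable[[str, str, Dict[str, int]], bool]


def stream_feature_rates(pieces: Iterable[List[Edge]],
                         labels: Dict[str, int],
                         features: Dict[str, Feature]) -> Dict[str, float]:
    """Fraction of streamed edges on which each candidate feature fires."""
    fired: Dict[str, int] = defaultdict(int)
    seen = 0
    for piece in pieces:  # one piece of the relational graph at a time
        for u, v in piece:
            seen += 1
            for name, feature in features.items():
                if feature(u, v, labels):
                    fired[name] += 1
        # the piece is not kept after this point; only counters persist
    return {name: (fired[name] / seen if seen else 0.0) for name in features}


# Hypothetical candidate features over linked pairs.
FEATURES: Dict[str, Feature] = {
    "same_label": lambda u, v, lab: lab[u] == lab[v],
    "both_positive": lambda u, v, lab: lab[u] == 1 and lab[v] == 1,
}
```

Because `pieces` may be a generator, memory use stays proportional to one piece plus the counters, not to the whole graph.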

arXiv.org e-Print Archive

CiteSeerX

The Skip Quadtree: A Simple Dynamic Data Structure for Multidimensional Data

Author: Eppstein David
Goodrich Michael T.
Sun Jonathan Z.
Publication date: 01/01/2005

We present a new multi-dimensional data structure, which we call the skip quadtree (for point data in R^2) or the skip octree (for point data in R^d, with constant d>2). Our data structure combines the best features of two well-known data structures, in that it has the well-defined "box"-shaped regions of region quadtrees and the logarithmic-height search and update hierarchical structure of skip lists. Indeed, the bottom level of our structure is exactly a region quadtree (or octree for higher dimensional data). We describe efficient algorithms for inserting and deleting points in a skip quadtree, as well as fast methods for performing point location and approximate range queries. Comment: 12 pages, 3 figures. A preliminary version of this paper appeared in the 21st ACM Symp. Comp. Geom., Pisa, 2005, pp. 296-30
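The two ingredients named in the abstract can be sketched separately: a region quadtree over the unit square (the bottom level) and skip-list-style levels, where each level keeps a random half of the points from the level below. The cross-level pointers that give the real skip quadtree its logarithmic search are deliberately omitted, so this sketch answers point location only on a single level. All names are hypothetical, and points are assumed to lie in [0, 1)^2.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class QuadNode:
    """A square box [x0, x0+size) x [y0, y0+size) of a region quadtree."""

    def __init__(self, x0: float, y0: float, size: float):
        self.x0, self.y0, self.size = x0, y0, size
        self.point: Optional[Point] = None
        self.children: Optional[List["QuadNode"]] = None

    def _child_index(self, p: Point) -> int:
        half = self.size / 2
        return int(p[0] >= self.x0 + half) + 2 * int(p[1] >= self.y0 + half)

    def _split(self) -> None:
        half = self.size / 2
        # list order matches _child_index: index = dx + 2 * dy
        self.children = [QuadNode(self.x0 + dx * half, self.y0 + dy * half, half)
                         for dy in (0, 1) for dx in (0, 1)]

    def insert(self, p: Point) -> None:
        node = self
        while True:
            if node.children is None:
                if node.point is None:
                    node.point = p
                    return
                if node.point == p:
                    return  # already stored
                old, node.point = node.point, None
                node._split()  # split the box and push the old point down
                node.children[node._child_index(old)].point = old
            node = node.children[node._child_index(p)]

    def locate(self, p: Point) -> "QuadNode":
        """Smallest box of the tree whose region contains p."""
        node = self
        while node.children is not None:
            node = node.children[node._child_index(p)]
        return node

    def contains(self, p: Point) -> bool:
        return self.locate(p).point == p


def build_levels(points: List[Point], seed: int = 0) -> List[QuadNode]:
    """Level 0 holds every point; each higher level keeps each point of the
    level below with probability 1/2, as in a skip list."""
    rng = random.Random(seed)
    levels: List[QuadNode] = []
    current = list(points)
    while current:
        tree = QuadNode(0.0, 0.0, 1.0)
        for p in current:
            tree.insert(p)
        levels.append(tree)
        current = [p for p in current if rng.random() < 0.5]
    return levels
```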

arXiv.org e-Print Archive

CiteSeerX

Static Data Structure Lower Bounds Imply Rigidity

Author: Agarwal Pankaj K.
Miltersen Peter Bro
Valiant Leslie G.
van Emde Boas Peter
Pătraşcu Mihai
Publication date: 13/02/2019

We show that static data structure lower bounds in the group (linear) model imply semi-explicit lower bounds on matrix rigidity. In particular, we prove that an explicit lower bound of $t \geq \omega(\log^2 n)$ on the cell-probe complexity of linear data structures in the group model, even against arbitrarily small linear space $(s = (1+\varepsilon)n)$, would already imply a semi-explicit ($\mathbf{P}^{\mathbf{NP}}$) construction of rigid matrices with significantly better parameters than the current state of the art (Alon, Panigrahy and Yekhanin, 2009). Our results further assert that polynomial ($t \geq n^{\delta}$) data structure lower bounds against near-optimal space would imply super-linear circuit lower bounds for log-depth linear circuits (a four-decade open question). In the succinct space regime $(s = n + o(n))$, we show that any improvement on current cell-probe lower bounds in the linear model would also imply new rigidity bounds. Our results rely on a new connection between the "inner" and "outer" dimensions of a matrix (Paturi and Pudlak, 2006), and on a new reduction from worst-case to average-case rigidity, which is of independent interest.
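For reference, the rigidity of a matrix $A$ at rank $r$ is the minimum number of entries of $A$ that must be changed to bring its rank down to at most $r$. The brute-force sketch below computes it over GF(2), where the only possible change to an entry is a flip. This is only a definition check. It takes exponential time, works over GF(2) rather than the general fields the paper considers, and its function names are hypothetical.

```python
from itertools import combinations
from typing import List


def gf2_rank(rows: List[int]) -> int:
    """Rank over GF(2) of a matrix whose rows are integer bitmasks."""
    basis: List[int] = []
    for r in rows:
        for b in basis:
            # r ^ b < r exactly when r has b's leading bit set, so this
            # eliminates that leading bit from r
            r = min(r, r ^ b)
        if r:
            basis.append(r)  # r now has a leading bit no basis row has
    return len(basis)


def rigidity(matrix: List[List[int]], r: int) -> int:
    """Fewest entry flips making the GF(2) rank at most r (brute force)."""
    cells = [(i, j) for i in range(len(matrix)) for j in range(len(matrix[0]))]
    for k in range(len(cells) + 1):
        for flips in combinations(cells, k):
            m = [row[:] for row in matrix]
            for i, j in flips:
                m[i][j] ^= 1
            masks = [int("".join(map(str, row)), 2) for row in m]
            if gf2_rank(masks) <= r:
                return k
    return len(cells)  # unreachable: flipping every 1 yields the zero matrix
```

For the 3×3 identity, one flip reduces the rank to 2, but reaching rank 1 needs two flips, because changing a single entry alters the rank by at most one.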

arXiv.org e-Print Archive

Princeton University Open Access Repository

Crossref

H1 Diffractive Structure Functions Measurement from new data

Author: Sauvan E.
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 01/01/2006

New measurements of the reduced cross section $\sigma_r^{D(3)}$ for the diffractive process $ep \to eXY$ in the kinematic domain $12 \leq Q^2 \leq 90$ GeV$^2$, $0.01 \leq \beta \leq 0.65$ and $x_{I\!P} < 0.1$ are presented. Data events recorded by the H1 detector during the years 1999–2000 and 2004 have been used, corresponding to a total integrated luminosity of 68 pb$^{-1}$. The measurements are derived in the same range as previous H1 data, namely $M_Y < 1.6$ GeV and $|t| < 1.0$ GeV$^2$. Two different analysis methods, rapidity gap and $M_X$, are used, and similar results are obtained in the kinematic domain of overlap. Finally, together with previous data, the diffractive structure function measurements are analysed with a model based on the dipole formulation of diffractive scattering, which is found to give a very good description of the data over the whole kinematic range. Comment: 4 pages, 4 figures; to appear in the proceedings of the 14th International Workshop on Deep Inelastic Scattering (DIS 2006), Tsukuba, Japan, 20-24 Apr 200
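The quoted kinematic range can be written as a simple membership check, together with the standard diffractive relation $x = \beta\, x_{I\!P}$ linking $\beta$, $x_{I\!P}$ and Bjorken $x$. This sketch only restates the numbers in the abstract. A real analysis involves event reconstruction, acceptance and migration corrections that are not modelled here, and the field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DiffractivePoint:
    q2: float     # photon virtuality Q^2 [GeV^2]
    beta: float   # momentum fraction of the struck parton w.r.t. the exchange
    x_pom: float  # x_IP, momentum fraction carried by the colourless exchange
    m_y: float    # mass of the proton-side system Y [GeV]
    t: float      # squared four-momentum transfer at the proton vertex [GeV^2]


def in_quoted_range(p: DiffractivePoint) -> bool:
    """True if the point lies in the kinematic range quoted in the abstract."""
    return (12.0 <= p.q2 <= 90.0
            and 0.01 <= p.beta <= 0.65
            and p.x_pom < 0.1
            and p.m_y < 1.6
            and abs(p.t) < 1.0)


def bjorken_x(beta: float, x_pom: float) -> float:
    """Bjorken x from the diffractive variables, using x = beta * x_IP."""
    return beta * x_pom
```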

arXiv.org e-Print Archive

An Optimized Data Structure for High Throughput 3D Proteomics Data: mzRTree

Author: Aebersold
Andrea Pietracaprina
Barbara Di Camillo
Deutsch
Francesco Silvestri
Francesco Tisiot
Gianna Maria Toffolo
Guttman
Hartler
Khan
Kyriacos
Lin
Martens
Orchard
Sara Nasso
Schulz-Trieglaff
Taylor
Vitter
Publication venue: 'Elsevier BV'
Publication date: 01/01/2010

As an emerging field, MS-based proteomics still requires software tools for efficiently storing and accessing experimental data. In this work, we focus on the management of LC-MS data, which are typically made available in standard XML-based portable formats. The structures that are currently employed to manage these data can be highly inefficient, especially when dealing with high-throughput profile data. LC-MS datasets are usually accessed through 2D range queries. Optimizing this type of operation could dramatically reduce the complexity of data analysis. We propose a novel data structure for LC-MS datasets, called mzRTree, which embodies a scalable index based on the R-tree data structure. mzRTree can be efficiently created from the XML-based data formats and it is suitable for handling very large datasets. We experimentally show that, on all range queries, mzRTree outperforms other known structures used for LC-MS data, even on those queries these structures are optimized for. mzRTree is also more space-efficient. As a result, mzRTree reduces data analysis computational costs for very large profile datasets. Comment: Paper details: 10 pages, 7 figures, 2 tables. To be published in Journal of Proteomics. Source code available at http://www.dei.unipd.it/mzrtre
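The reason an R-tree helps with 2D range queries is that every node stores the minimum bounding rectangle (MBR) of its subtree, so a query can skip whole subtrees whose MBR misses the query box. The sketch below is a static, bulk-loaded (packed) R-tree over 2D points, assumed here to be (retention time, m/z) pairs. It illustrates the general R-tree technique, not mzRTree's actual layout or on-disk format, and its names are hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]               # assumed (retention_time, mz)
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Node = Tuple[Rect, list, bool]            # (MBR, children or points, is_leaf)


def mbr(rects: List[Rect]) -> Rect:
    """Minimum bounding rectangle of a non-empty list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class PackedRTree:
    def __init__(self, points: List[Point], fanout: int = 4):
        pts = sorted(points)  # sort by x, then y, so leaves are spatially local
        level: List[Node] = []
        for i in range(0, len(pts), fanout):
            chunk = pts[i:i + fanout]
            level.append((mbr([(x, y, x, y) for x, y in chunk]), chunk, True))
        # group consecutive nodes until a single root remains
        while len(level) > 1:
            parents: List[Node] = []
            for i in range(0, len(level), fanout):
                group = level[i:i + fanout]
                parents.append((mbr([n[0] for n in group]), group, False))
            level = parents
        self.root: Optional[Node] = level[0] if level else None

    def range_query(self, q: Rect) -> List[Point]:
        """All stored points inside the closed query rectangle q."""
        out: List[Point] = []
        stack = [self.root] if self.root is not None else []
        while stack:
            rect, children, is_leaf = stack.pop()
            if not intersects(rect, q):
                continue  # prune the whole subtree
            if is_leaf:
                out.extend(p for p in children
                           if q[0] <= p[0] <= q[2] and q[1] <= p[1] <= q[3])
            else:
                stack.extend(children)
        return out
```

Sorting and chunking is the simplest bulk-loading strategy. Schemes such as Sort-Tile-Recursive usually produce tighter MBRs and therefore prune more.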

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università di Padova