
    Darwinian Data Structure Selection

    Data structure selection and tuning is laborious but can vastly improve an application's performance and memory footprint. Some data structures share a common interface and enjoy multiple implementations. We call them Darwinian Data Structures (DDS), since we can subject their implementations to survival of the fittest. We introduce ARTEMIS, a multi-objective, cloud-based, search-based optimisation framework that automatically finds optimal, tuned DDS modulo a test suite, then changes an application to use that DDS. ARTEMIS achieves substantial performance improvements for \emph{every} project in 5 Java projects from the DaCapo benchmark, 8 popular projects and 30 uniformly sampled projects from GitHub. For execution time, CPU usage, and memory consumption, ARTEMIS finds at least one solution that improves \emph{all} measures for 86\% (37/43) of the projects. The median improvement across the best solutions is 4.8\%, 10.1\%, 5.1\% for runtime, memory and CPU usage. These aggregate results understate ARTEMIS's potential impact. Some of the benchmarks it improves are libraries or utility functions. Two examples are gson, a ubiquitous Java serialization framework, and xalan, Apache's XML transformation tool. ARTEMIS improves gson by 16.5\%, 1\% and 2.2\% for memory, runtime, and CPU; ARTEMIS improves xalan's memory consumption by 23.5\%. \emph{Every} client of these projects will benefit from these performance improvements. Comment: 11 pages
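
    ARTEMIS itself operates on Java collections and drives a multi-objective search over a project's test suite, but the core idea is easy to illustrate: run interchangeable implementations of the same interface against the same workload and keep the fittest. The sketch below is a minimal Python illustration of that idea only; the workload, the candidate containers, and the crude lexicographic tie-break are assumptions made for the example, not the paper's tooling.

```python
import time
import tracemalloc
from collections import deque

# Hypothetical "Darwinian" candidates: containers sharing the operations
# the workload needs (append to the back, remove from the front).
CANDIDATES = {"list": list, "deque": deque}

def workload(container_factory, n=50_000):
    """Illustrative test workload standing in for a project's test suite."""
    c = container_factory()
    for i in range(n):
        c.append(i)
    while c:
        # Same interface-level intent, very different costs per implementation.
        c.popleft() if hasattr(c, "popleft") else c.pop(0)

def measure(container_factory):
    """Return (wall-clock seconds, peak traced bytes) for one candidate."""
    tracemalloc.start()
    t0 = time.perf_counter()
    workload(container_factory)
    elapsed = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed, peak

if __name__ == "__main__":
    results = {name: measure(f) for name, f in CANDIDATES.items()}
    for name, (secs, peak) in results.items():
        print(f"{name:>6}: {secs:.3f} s, peak {peak / 1024:.0f} KiB")
    # Crude lexicographic pick (time first, then memory); ARTEMIS searches the
    # multi-objective trade-off instead of collapsing it like this.
    best = min(results, key=lambda k: results[k])
    print("fittest candidate for this workload:", best)
```

    In the paper's setting the analogous final step is source-level: the application is changed to instantiate the winning, tuned data structure.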

    Structure Selection from Streaming Relational Data

    Statistical relational learning techniques have been successfully applied in a wide range of relational domains. In most of these applications, the human designers capitalized on their background knowledge by following a trial-and-error trajectory, where relational features are manually defined by a human engineer, parameters are learned for those features on the training data, the resulting model is validated, and the cycle repeats as the engineer adjusts the set of features. This paper seeks to streamline application development in large relational domains by introducing a light-weight approach that efficiently evaluates relational features on pieces of the relational graph that are streamed to it one at a time. We evaluate our approach on two social media tasks and demonstrate that it leads to more accurate models that are learned faster.
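
    As a rough illustration of what "evaluating relational features on streamed pieces of the graph" can look like, the sketch below keeps running feature/label agreement counts and updates them one graph piece at a time. The feature predicates, the social-media-style records, and the agreement score are invented for the example and are not the paper's features or metric.

```python
from collections import defaultdict

# Hypothetical relational features: each is a predicate over one streamed
# "piece" of the relational graph (here, a dict describing a user together
# with the posts and links local to that user).
FEATURES = {
    "has_many_followers": lambda g: len(g.get("followers", [])) > 100,
    "posts_links":        lambda g: any(p.get("has_link") for p in g.get("posts", [])),
    "replies_often":      lambda g: sum(p.get("is_reply", False) for p in g.get("posts", [])) > 3,
}

class StreamingFeatureScorer:
    """Keep running feature/label co-occurrence counts, one graph piece at a time."""

    def __init__(self, features):
        self.features = features
        self.counts = defaultdict(lambda: [0, 0, 0, 0])  # [TP, FP, FN, TN] per feature

    def update(self, graph_piece, label):
        for name, predicate in self.features.items():
            fired = bool(predicate(graph_piece))
            tp, fp, fn, tn = self.counts[name]
            if fired and label:       tp += 1
            elif fired and not label: fp += 1
            elif not fired and label: fn += 1
            else:                     tn += 1
            self.counts[name] = [tp, fp, fn, tn]

    def scores(self):
        """Crude score: how often 'feature fired' agrees with the label."""
        out = {}
        for name, (tp, fp, fn, tn) in self.counts.items():
            total = tp + fp + fn + tn
            out[name] = (tp + tn) / total if total else 0.0
        return out

if __name__ == "__main__":
    scorer = StreamingFeatureScorer(FEATURES)
    stream = [
        ({"followers": range(500), "posts": [{"has_link": True}]}, True),
        ({"followers": range(3),   "posts": [{"is_reply": True}] * 5}, False),
    ]
    for piece, label in stream:
        scorer.update(piece, label)   # each piece is seen exactly once
    print(scorer.scores())
```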

    Massively-Parallel Feature Selection for Big Data

    We present the Parallel, Forward-Backward with Pruning (PFBP) algorithm for feature selection (FS) in Big Data settings (high dimensionality and/or sample size). To tackle the challenges of Big Data FS, PFBP partitions the data matrix both in terms of rows (samples, training examples) as well as columns (features). By employing the concepts of p-values of conditional independence tests and meta-analysis techniques, PFBP manages to rely only on computations local to a partition while minimizing communication costs. Then, it employs powerful and safe (asymptotically sound) heuristics to make early, approximate decisions, such as Early Dropping of features from consideration in subsequent iterations, Early Stopping of consideration of features within the same iteration, or Early Return of the winner in each iteration. PFBP provides asymptotic guarantees of optimality for data distributions faithfully representable by a causal network (Bayesian network or maximal ancestral graph). Our empirical analysis confirms a super-linear speedup of the algorithm with increasing sample size and linear scalability with respect to the number of features and processing cores, while dominating other competitive algorithms in its class.
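
    A minimal sketch of two of the ingredients named above, under simplifying assumptions: the data matrix is split over rows (samples), each partition computes a local p-value, and the partitions' p-values are combined by meta-analysis before an Early-Dropping decision. The sketch uses an unconditional correlation test and Fisher's method as stand-ins; PFBP proper uses conditional independence tests given the currently selected features, also partitions over columns, and adds Early Stopping and Early Return.

```python
import numpy as np
from scipy.stats import pearsonr, combine_pvalues

rng = np.random.default_rng(0)

# Toy data: 3 features, only feature 0 is (linearly) related to the target.
n, partitions = 3000, 4
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] + rng.normal(size=n)

# Row-wise partitioning, standing in for the split over samples.
row_blocks = np.array_split(np.arange(n), partitions)

def local_pvalue(j, rows):
    """p-value of a simple correlation test on one partition.
    PFBP uses conditional independence tests given already-selected features;
    this plain test stands in for that local computation."""
    _, p = pearsonr(X[rows, j], y[rows])
    return p

ALPHA_DROP = 0.10              # illustrative Early-Dropping threshold
alive = set(range(X.shape[1]))

for j in sorted(alive):
    local_ps = [local_pvalue(j, rows) for rows in row_blocks]
    # Meta-analysis step: combine per-partition p-values (Fisher's method),
    # so no partition ever has to see the whole data matrix.
    _, combined_p = combine_pvalues(local_ps, method="fisher")
    if combined_p > ALPHA_DROP:
        alive.discard(j)       # Early Dropping: ignore j in later iterations
    print(f"feature {j}: combined p = {combined_p:.3g}, kept = {j in alive}")

print("surviving features:", sorted(alive))
```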

    Information-based objective functions for active data selection

    Learning can be made more efficient if we can actively select particularly salient data points. Within a Bayesian learning framework, objective functions are discussed that measure the expected informativeness of candidate measurements. Three alternative specifications of what we want to gain information about lead to three different criteria for data selection. All these criteria depend on the assumption that the hypothesis space is correct, which may prove to be their main weakness.
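
    For a concrete instance of such a criterion, consider a Bayesian linear-Gaussian model: the expected information gain (reduction in posterior entropy over the weights) from measuring at a candidate input is 0.5 * ln(1 + marginal variance / noise variance), so the most informative point is the one whose prediction the current posterior is least sure about. The basis functions, posterior covariance, and noise level below are assumed purely for illustration; the sketch is not claimed to reproduce the paper's exact derivations.

```python
import numpy as np

def features(x):
    """Assumed basis functions; purely illustrative."""
    return np.array([1.0, x, x * x])

noise_var = 0.25
# Assumed current posterior covariance of the weights (one per basis function).
Sigma_w = np.diag([1.0, 0.5, 2.0])

def expected_info_gain(x):
    phi = features(x)
    marginal_var = phi @ Sigma_w @ phi        # variance of y(x) due to weight uncertainty
    return 0.5 * np.log(1.0 + marginal_var / noise_var)

candidates = np.linspace(-2.0, 2.0, 9)
gains = {float(x): expected_info_gain(x) for x in candidates}
for x, g in gains.items():
    print(f"x = {x:+.2f}: expected gain = {g:.3f} nats")
print("query next at x =", max(gains, key=gains.get))
```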

    Feature subset selection and ranking for data dimensionality reduction

    A new unsupervised forward orthogonal search (FOS) algorithm is introduced for feature selection and ranking. In the new algorithm, features are selected in a stepwise way, one at a time, by estimating the capability of each specified candidate feature subset to represent the overall features in the measurement space. A squared correlation function is employed as the criterion to measure the dependency between features and this makes the new algorithm easy to implement. The forward orthogonalization strategy, which combines good effectiveness with high efficiency, enables the new algorithm to produce efficient feature subsets with a clear physical interpretation.
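
    A sketch of the stepwise idea, under assumed details: at each step the sum of a candidate column's squared correlations with all original features measures how well it represents the measurement space, the best column is selected, and the remaining columns are orthogonalised against it so later steps reward only new information. The normalisation, stopping rule, and toy data below are common-practice choices for illustration, not necessarily the paper's.

```python
import numpy as np

def forward_orthogonal_search(X, k):
    """Select k feature indices from X (n samples x d features), stepwise."""
    Xc = X - X.mean(axis=0)                 # centre so correlation = cosine
    targets = Xc.copy()                     # the "overall features" to represent
    residual = Xc.copy()                    # candidate columns, deflated as we go
    selected = []
    for _ in range(k):
        scores = np.full(X.shape[1], -np.inf)
        for j in range(X.shape[1]):
            if j in selected:
                continue
            r = residual[:, j]
            rr = r @ r
            if rr < 1e-12:                  # column already fully explained
                continue
            num = (targets.T @ r) ** 2      # (t_i . r)^2 for every original feature i
            den = (targets ** 2).sum(axis=0) * rr
            scores[j] = np.sum(num / np.maximum(den, 1e-12))
        j_best = int(np.argmax(scores))
        selected.append(j_best)
        # Forward orthogonalization: remove j_best's direction from the
        # remaining candidate columns so later scores measure new information.
        q = residual[:, j_best] / np.linalg.norm(residual[:, j_best])
        residual = residual - np.outer(q, q @ residual)
    return selected

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    a = rng.normal(size=(200, 1))
    b = rng.normal(size=(200, 1))
    # Column 1 is nearly a copy of column 0; column 2 is independent.
    X = np.hstack([a, 0.9 * a + 0.1 * rng.normal(size=(200, 1)), b])
    print(forward_orthogonal_search(X, 2))  # expect one of {0, 1} plus column 2
```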