
16,037 research outputs found

Evolutionary improvement of programs

Author: Arcuri A.
Clark J.A.
White D.R.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/08/2011

Most applications of genetic programming (GP) involve the creation of an entirely new function, program or expression to solve a specific problem. In this paper, we propose a new approach that applies GP to improve existing software by optimizing its non-functional properties such as execution time, memory usage, or power consumption. In general, satisfying non-functional requirements is a difficult task and often achieved in part by optimizing compilers. However, modern compilers are in general not always able to produce semantically equivalent alternatives that optimize non-functional properties, even if such alternatives are known to exist: this is usually due to the limited local nature of such optimizations. In this paper, we discuss how best to combine and extend the existing evolutionary methods of GP, multiobjective optimization, and coevolution in order to improve existing software. Given as input the implementation of a function, we attempt to evolve a semantically equivalent version, in this case optimized to reduce execution time subject to a given probability distribution of inputs. We demonstrate that our framework is able to produce non-obvious optimizations that compilers are not yet able to generate on eight example functions. We employ a coevolved population of test cases to encourage the preservation of the function's semantics. We exploit the original program both through seeding of the population in order to focus the search, and as an oracle for testing purposes. As well as discussing the issues that arise when attempting to improve software, we employ a rigorous experimental method to provide interesting and practical insights to suggest how to address these issues.

Enlighten

Large Scale Clustering with Variational EM for Gaussian Mixture Models

Author: Forster Dennis
Hirschberger Florian
Lücke Jörg
Publication date: 07/06/2019

How can we efficiently find large numbers of clusters in large data sets with high-dimensional data points? Our aim is to explore the current efficiency and large-scale limits in fitting a parametric model for clustering to data distributions. To do so, we combine recent lines of research which have previously focused on separate specific methods for complexity reduction. We first show theoretically how the clustering objective of variational EM (which reduces complexity for many clusters) can be combined with coreset objectives (which reduce complexity for many data points). Secondly, we realize a concrete highly efficient iterative procedure which combines and translates the theoretical complexity gains of truncated variational EM and coresets into a practical algorithm. For very large scales, the high efficiency of parameter updates then requires (A) highly efficient coreset construction and (B) highly efficient initialization procedures (seeding) in order to avoid computational bottlenecks. Fortunately very efficient coreset construction has become available in the form of light-weight coresets, and very efficient initialization has become available in the form of AFK-MC$^2$ seeding. The resulting algorithm features balanced computational costs across all constituting components. In applications to standard large-scale benchmarks for clustering, we investigate the algorithm's efficiency/quality trade-off. Compared to the best recent approaches, we observe speedups of up to one order of magnitude, and up to two orders of magnitude compared to the $k$-means++ baseline. To demonstrate that the observed efficiency enables previously considered unfeasible applications, we cluster the entire and unscaled 80 Mio. Tiny Images dataset into up to 32,000 clusters. To the knowledge of the authors, this represents the largest scale fit of a parametric data model for clustering reported so far.

arXiv.org e-Print Archive

A tight lower bound instance for k-means++ in constant dimension

Author: A. Aggarwal
B. Bahmani
D. Arthur
M. Agarwal
M.R. Ackermann
R. Jaiswal
Publication date: 01/01/2014

The k-means++ seeding algorithm is one of the most popular algorithms that is used for finding the initial $k$ centers when using the k-means heuristic. The algorithm is a simple sampling procedure and can be described as follows: Pick the first center randomly from the given points. For $i > 1$, pick a point to be the $i^{th}$ center with probability proportional to the square of the Euclidean distance of this point to the closest previously $(i-1)$ chosen centers. The k-means++ seeding algorithm is not only simple and fast but also gives an $O(\log{k})$ approximation in expectation as shown by Arthur and Vassilvitskii. There are datasets on which this seeding algorithm gives an approximation factor of $\Omega(\log{k})$ in expectation. However, it is not clear from these results if the algorithm achieves good approximation factor with reasonably high probability (say $1/poly(k)$). Brunsch and Röglin gave a dataset where the k-means++ seeding algorithm achieves an $O(\log{k})$ approximation ratio with probability that is exponentially small in $k$. However, this and all other known lower-bound examples are high dimensional. So, an open problem was to understand the behavior of the algorithm on low dimensional datasets. In this work, we give a simple two dimensional dataset on which the seeding algorithm achieves an $O(\log{k})$ approximation ratio with probability exponentially small in $k$. This solves open problems posed by Mahajan et al. and by Brunsch and Röglin.

Comment: To appear in TAMC 2014. arXiv admin note: text overlap with arXiv:1306.420
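The sampling procedure described in this abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names `sq_dist` and `kmeanspp_seed` are hypothetical, points are assumed to be tuples of floats, and the uniform fallback for the degenerate case where every point already coincides with a chosen center is an added assumption.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points, k, rng=None):
    """Pick k initial centers from `points` by D^2 sampling (illustrative sketch)."""
    rng = rng or random.Random()
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    # First center: chosen uniformly at random from the given points.
    centers = [rng.choice(points)]
    # d2[j] is the squared distance from points[j] to its closest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]

    for _ in range(1, k):
        if sum(d2) == 0:
            # Assumption: if all points coincide with chosen centers, sample uniformly.
            nxt = rng.choice(points)
        else:
            # Sample a point with probability proportional to its squared distance.
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        # Update each point's distance to the closest center chosen so far.
        d2 = [min(old, sq_dist(p, nxt)) for old, p in zip(d2, points)]
    return centers


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeanspp_seed(data, 2, random.Random(0)))
```

Keeping a running minimum in `d2` avoids recomputing distances to every earlier center, so each round costs one pass over the points.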

arXiv.org e-Print Archive

CiteSeerX

Crossref

Multi-objective improvement of software using co-evolution and smart seeding

Author: Abbass H
Arcuri Andrea
Branke JR
Ciesielski V
Clark J
Deb K
Green D
Hendtlass T
Kirley M
Li X
Michalewicz Z
Shi Y
Tan KC
White DR
Yao Xin
Zhang M
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008

Optimising non-functional properties of software is an important part of the implementation process. One such property is execution time, and compilers target a reduction in execution time using a variety of optimisation techniques. Compiler optimisation is not always able to produce semantically equivalent alternatives that improve execution times, even if such alternatives are known to exist. Often, this is due to the local nature of such optimisations. In this paper we present a novel framework for optimising existing software using a hybrid of evolutionary optimisation techniques. Given as input the implementation of a program or function, we use Genetic Programming to evolve a new semantically equivalent version, optimised to reduce execution time subject to a given probability distribution of inputs. We employ a co-evolved population of test cases to encourage the preservation of the program’s semantics, and exploit the original program through seeding of the population in order to focus the search. We carry out experiments to identify the important factors in maximising efficiency gains. Although in this work we have optimised execution time, other non-functional criteria could be optimised in a similar manner.

CiteSeerX

University of Birmingham Research Portal

Enlighten

Knowledge management support for enterprise distributed systems

Author: Chen-Burger Jessica
Kalfoglou Yannis
Publication venue: Information Science Reference
Publication date: 01/01/2008

The explosion of information and increasing demands on semantic processing web applications have pushed software systems to their limits. To address the problem, we propose a semantic-based formal framework (ADP) that makes use of promising technologies to enable knowledge generation and retrieval. We argue that this approach is cost-effective, as it reuses and builds on existing knowledge and structure. It is also a good starting point for creating an organisational memory and providing knowledge management functions.

Southampton (e-Prints Soton)