Evolutionary improvement of programs
Most applications of genetic programming (GP) involve the creation of an entirely new function, program, or expression to solve a specific problem. In this paper, we propose a new approach that applies GP to improve existing software by optimizing its non-functional properties such as execution time, memory usage, or power consumption. In general, satisfying non-functional requirements is a difficult task and is often achieved in part by optimizing compilers. However, modern compilers are not always able to produce semantically equivalent alternatives that optimize non-functional properties, even if such alternatives are known to exist: this is usually due to the limited local nature of such optimizations. In this paper, we discuss how best to combine and extend the existing evolutionary methods of GP, multiobjective optimization, and coevolution in order to improve existing software. Given as input the implementation of a function, we attempt to evolve a semantically equivalent version, in this case optimized to reduce execution time subject to a given probability distribution of inputs. We demonstrate on eight example functions that our framework is able to produce non-obvious optimizations that compilers are not yet able to generate. We employ a coevolved population of test cases to encourage the preservation of the function's semantics. We exploit the original program both through seeding of the population in order to focus the search, and as an oracle for testing purposes. As well as discussing the issues that arise when attempting to improve software, we employ a rigorous experimental method to provide interesting and practical insights to suggest how to address these issues.
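The idea of using the original program as a testing oracle can be illustrated with a minimal sketch. It is not the paper's framework: random sampling from an input distribution stands in for the coevolved test population, and the names `disagreement_rate` and `sample_input` are hypothetical. A rate of zero only suggests equivalence on the sampled inputs; it does not prove it.

```python
import random


def disagreement_rate(original, candidate, sample_input, trials=1000, rng=None):
    """Estimate how often `candidate` disagrees with `original`.

    `original` acts as the oracle: its output is taken as the expected
    result for every sampled input. `sample_input(rng)` is a hypothetical
    callable that draws one input from the assumed input distribution.
    """
    rng = rng or random.Random()
    failures = 0
    for _ in range(trials):
        x = sample_input(rng)
        expected = original(x)
        try:
            ok = candidate(x) == expected
        except Exception:
            # A candidate that crashes counts as a semantic failure.
            ok = False
        if not ok:
            failures += 1
    return failures / trials


if __name__ == "__main__":
    draw = lambda r: r.randint(-100, 100)
    # An equivalent rewrite (x + x instead of x * 2) never disagrees.
    print(disagreement_rate(lambda x: x * 2, lambda x: x + x, draw))
```

In an evolutionary setting, a score like this could serve as one objective alongside a cost measure such as execution time.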
Large Scale Clustering with Variational EM for Gaussian Mixture Models
How can we efficiently find large numbers of clusters in large data sets with
high-dimensional data points? Our aim is to explore the current efficiency and
large-scale limits in fitting a parametric model for clustering to data
distributions. To do so, we combine recent lines of research which have
previously focused on separate specific methods for complexity reduction. We
first show theoretically how the clustering objective of variational EM (which
reduces complexity for many clusters) can be combined with coreset objectives
(which reduce complexity for many data points). Secondly, we realize a concrete
highly efficient iterative procedure which combines and translates the
theoretical complexity gains of truncated variational EM and coresets into a
practical algorithm. For very large scales, the high efficiency of parameter
updates then requires (A) highly efficient coreset construction and (B) highly
efficient initialization procedures (seeding) in order to avoid computational
bottlenecks. Fortunately, very efficient coreset construction has become
available in the form of light-weight coresets, and very efficient
initialization has become available in the form of AFK-MC² seeding. The
resulting algorithm features balanced computational costs across all
constituting components. In applications to standard large-scale benchmarks for
clustering, we investigate the algorithm's efficiency/quality trade-off.
Compared to the best recent approaches, we observe speedups of up to one order
of magnitude, and up to two orders of magnitude compared to the k-means++
baseline. To demonstrate that the observed efficiency enables applications
previously considered infeasible, we cluster the entire, unscaled 80 Million
Tiny Images dataset into up to 32,000 clusters. To the knowledge of the
authors, this represents the largest-scale fit of a parametric data model for
clustering reported so far.
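As a rough illustration of the light-weight coresets mentioned in this abstract, the sketch below uses the sampling distribution commonly given for them in the k-means setting: half uniform, half proportional to the squared distance from the data mean. This is an assumption about the construction, not code from the paper, and the function name `lightweight_coreset` is hypothetical.

```python
import random


def lightweight_coreset(points, m, rng=None):
    """Sample a weighted subset of `points` of size m.

    Assumed sampling distribution (not taken from the abstract):
        q(x) = 0.5 / n + 0.5 * d(x, mean)^2 / sum_x' d(x', mean)^2
    Each sampled point gets weight 1 / (m * q(x)).
    """
    rng = rng or random.Random()
    n = len(points)
    dim = len(points[0])
    # Mean of the data, computed coordinate by coordinate.
    mean = [sum(p[i] for p in points) / n for i in range(dim)]
    # Squared Euclidean distance of every point to the mean.
    d2 = [sum((p[i] - mean[i]) ** 2 for i in range(dim)) for p in points]
    total = sum(d2)
    if total == 0:
        # All points coincide with the mean: fall back to uniform sampling.
        q = [1.0 / n] * n
    else:
        q = [0.5 / n + 0.5 * d / total for d in d2]
    # Sample m indices with replacement according to q.
    idx = rng.choices(range(n), weights=q, k=m)
    return [(points[j], 1.0 / (m * q[j])) for j in idx]


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
    for point, weight in lightweight_coreset(data, 3, random.Random(0)):
        print(point, weight)
```

The construction needs only two passes over the data (one for the mean, one for the distances), which is why it suits very large datasets.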
A tight lower bound instance for k-means++ in constant dimension
The k-means++ seeding algorithm is one of the most popular algorithms that is
used for finding the initial centers when using the k-means heuristic. The
algorithm is a simple sampling procedure and can be described as follows: Pick
the first center randomly from the given points. For $i > 1$, pick a point to
be the $i$-th center with probability proportional to the square of the
Euclidean distance of this point to the closest of the previously chosen
centers.
The k-means++ seeding algorithm is not only simple and fast but also gives an
$O(\log k)$ approximation in expectation, as shown by Arthur and Vassilvitskii.
There are datasets on which this seeding algorithm gives an approximation
factor of $\Omega(\log k)$ in expectation. However, it is not clear from these
results whether the algorithm achieves a good approximation factor with reasonably
high probability (say $1/\mathrm{poly}(k)$). Brunsch and Röglin gave a dataset where
the k-means++ seeding algorithm achieves an $O(\log k)$ approximation ratio
with probability that is exponentially small in $k$. However, this and all
other known lower-bound examples are high dimensional. So, an open problem was
to understand the behavior of the algorithm on low dimensional datasets. In
this work, we give a simple two dimensional dataset on which the seeding
algorithm achieves an $O(\log k)$ approximation ratio with probability
exponentially small in $k$. This solves open problems posed by Mahajan et al.
and by Brunsch and Röglin.
Comment: To appear in TAMC 2014. arXiv admin note: text overlap with
arXiv:1306.420
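The seeding procedure described in this abstract can be sketched in a few lines. This is a minimal illustration assuming points are given as tuples of numbers; the helper names `sq_dist` and `kmeanspp_seed` are hypothetical and not taken from the paper.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points, k, rng=None):
    """Pick k initial centers from `points` by squared-distance sampling."""
    rng = rng or random.Random()
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    # First center: chosen uniformly at random.
    centers = [rng.choice(points)]
    # d2[j] = squared distance from points[j] to its closest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a center; sample uniformly instead.
            nxt = rng.choice(points)
        else:
            # Sample with probability proportional to the squared distance.
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        # Update each point's distance to its closest center.
        d2 = [min(old, sq_dist(p, nxt)) for old, p in zip(d2, points)]
    return centers


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(kmeanspp_seed(data, 2, random.Random(0)))
```

Points already chosen as centers have squared distance zero, so they are never picked again unless all remaining distances are zero.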
Multi-objective improvement of software using co-evolution and smart seeding
Optimising non-functional properties of software is an important part of the implementation process. One such property is execution time, and compilers target a reduction in execution time using a variety of optimisation techniques. Compiler optimisation is not always able to produce semantically equivalent alternatives that improve execution times, even if such alternatives are known to exist. Often, this is due to the local nature of such optimisations. In this paper we present a novel framework for optimising existing software using a hybrid of evolutionary optimisation techniques. Given as input the implementation of a program or function, we use Genetic Programming to evolve a new semantically equivalent version, optimised to reduce execution time subject to a given probability distribution of inputs. We employ a co-evolved population of test cases to encourage the preservation of the program’s semantics, and exploit the original program through seeding of the population in order to focus the search. We carry out experiments to identify the important factors in maximising efficiency gains. Although in this work we have optimised execution time, other non-functional criteria could be optimised in a similar manner.
Knowledge management support for enterprise distributed systems
The explosion of information and increasing demands on semantic-processing web applications have pushed software systems to their limits. To address the problem, we propose a semantic-based formal framework (ADP) that makes use of promising technologies to enable knowledge generation and retrieval. We argue that this approach is cost-effective, as it reuses and builds on existing knowledge and structure. It is also a good starting point for creating an organisational memory and providing knowledge management functions.