6 research outputs found
Automated Software Transplantation
Automated program repair has excited researchers for more than a decade, yet it has yet to find full scale deployment in industry. We report our experience with SAPFIX: the first deployment of automated end-to-end fault fixing, from test case design through to deployed repairs in production code. We have used SAPFIX at Facebook to repair 6 production systems, each consisting of tens of millions of lines of code, and which are collectively used by hundreds of millions of people worldwide. In its first three months of operation, SAPFIX produced 55 repair candidates for 57 crashes reported to SAPFIX, of which 27 have been deem as correct by developers and 14 have been landed into production automatically by SAPFIX. SAPFIX has thus demonstrated the potential of the search-based repair research agenda by deploying, to hundreds of millions of users worldwide, software systems that have been automatically tested and repaired. Automated software transplantation (autotransplantation) is a form of automated software engineering, where we use search based software engineering to be able to automatically move a functionality of interest from a âdonorâ program that implements it into a âhostâ program that lacks it. Autotransplantation is a kind of automated program repair where we repair the âhostâ program by augmenting it with the missing functionality. Automated software transplantation would open many exciting avenues for software development: suppose we could autotransplant code from one system into another, entirely unrelated, system, potentially written in a different programming language. Being able to do so might greatly enhance the software engineering practice, while reducing the costs. Automated software transplantation manifests in two different flavors: monolingual, when the languages of the host and donor programs is the same, or multilingual when the languages differ. This thesis introduces a theory of automated software transplantation, and two algorithms implemented in two tools that achieve this: ”SCALPEL for monolingual software transplantation and ÏSCALPEL for multilingual software transplantation. Leveraging lightweight annotation, program analysis identifies an organ (interesting behavior to transplant); testing validates that the organ exhibits the desired behavior during its extraction and after its implantation into a host. We report encouraging results: in 14 of 17 monolingual transplantation experiments involving 6 donors and 4 hosts, popular real-world systems, we successfully autotransplanted 6 new functionalities; and in 10 out of 10 multlingual transplantation experiments involving 10 donors and 10 hosts, popular real-world systems written in 4 different programming languages, we successfully autotransplanted 10 new functionalities. That is, we have passed all the test suites that validates the new functionalities behaviour and the fact that the initial program behaviour is preserved. Additionally, we have manually checked the behaviour exercised by the organ. Autotransplantation is also very useful: in just 26 hours computation time we successfully autotransplanted the H.264 video encoding functionality from the x264 system to the VLC media player, a task that is currently done manually by the developers of VLC, since 12 years ago. We autotransplanted call graph generation and indentation for C programs into Kate, (a popular KDE based test editor used as an IDE by a lot of C developers) two features currently missing from Kate, but requested by the users of Kate. Autotransplantation is also efficient: the total runtime across 15 monolingual transplants is 5 hours and a half; the total runtime across 10 multilingual transplants is 33 hours
Efficient clustering techniques for big data
Clustering is an essential data mining technique that divides observations into
groups where each group contains similar observations. K-Means is one of the
most popular and widely used clustering algorithms that has been used for over
fifty years. The majority of the running time in the original K-Means algorithm
(known as Lloydâs algorithm) is spent on computing distances from each data
point to all cluster centres to find the closest centre to each data point. Due to
the current exponential growth of the data, it became a necessity to improve KMeans
even further to cope with large-scale datasets, known as Big Data. Hence,
the main aim of this thesis is to improve the efficiency and scalability of Lloydâs
K-Means.
One of the most efficient techniques to accelerate K-Means is to use triangle
inequality. Implementing such efficient techniques on a reliable distributed model
creates a powerful combination. This combination can lead to an efficient and
highly scalable parallel version of K-Means that offers a practical solution to the
problem of clustering Big Data.
MapReduce, and its popular open-source implementation known as Hadoop,
provides a distributed computing framework that efficiently stores, manages, and
processes large-scale datasets over a large cluster of commodity machines. Many
studies introduced a parallel implementation of Lloydâs K-Means on Hadoop in
order to improve the algorithmâs scalability. This research examines methods
based on triangle inequality to achieve further improvements on the efficiency of
the parallel Lloydâs K-Means on Hadoop.
Variants of K-Means that use triangle inequality usually require extra information,
such as distance bounds and cluster assignments, from the previous iteration
to work efficiently. This is a challenging task to achieve on Hadoop for two reasons:
1) Hadoop does not directly support iterative algorithms; and 2) Hadoop does not
allow information to be exchanged between two consecutive iterations. Hence, two
techniques are proposed to give Hadoop the ability to pass information from an
iteration to the next. The first technique uses a data structure referred to as an
Extended Vector (EV), that appends the extra information to the original data
vector. The second technique stores the extra information on files where each file
is referred to as a Bounds File (BF).
To evaluate the two proposed techniques, two K-Means variants are implemented
on Hadoop using the two techniques. Each variant is tested against variable
number of clusters, dimensions, data points, and mappers. Furthermore, the
performance of various implementations of K-Means on Hadoop and Spark is investigated.
The results show a significant improvement on the efficiency of the
new implementations compared to the Lloydâs K-Means on Hadoop with real and
artificial datasets
Recommended from our members
Federal Register
Daily publication of the U.S. Office of the Federal Register contains rules and regulations, proposed legislation and rule changes, and other notices, including "Presidential proclamations and Executive Orders, Federal agency documents having general applicability and legal effect, documents required to be published by act of Congress, and other Federal agency documents of public interest" (p. ii). Table of Contents starts on page iii
Proceedings of the WABER 2017 Conference
The scientific information published in peer-reviewed outlets carries special status, and
confers unique responsibilities on editors and authors. We must protect the integrity of the
scientific process by publishing only manuscripts that have been properly peer-reviewed by
scientific reviewers and confirmed by editors to be of sufficient quality.
I confirm that all papers in the WABER 2017 Conference Proceedings have been through a
peer review process involving initial screening of abstracts, review of full papers by at least
two referees, reporting of comments to authors, revision of papers by authors, and reevaluation
of re-submitted papers to ensure quality of content.
It is the policy of the West Africa Built Environment Research (WABER) Conference that all
papers must go through a systematic peer review process involving examination by at least
two referees who are knowledgeable on the subject. A paper is only accepted for publication
in the conference proceedings based on the recommendation of the reviewers and decision of
the editors.
The names and affiliation of members of the Scientific Committee & Review Panel for
WABER 2017 Conference are published in the Conference Proceedings and on our website
www.waberconference.com
Papers in the WABER Conference Proceedings are published open access on the conference
website www.waberconference.com to facilitate public access to the research papers and
wider dissemination of the scientific knowledge