626 research outputs found
An agent-driven semantical identifier using radial basis neural networks and reinforcement learning
Due to the huge availability of documents in digital form, and the deception
possibility raise bound to the essence of digital documents and the way they
are spread, the authorship attribution problem has constantly increased its
relevance. Nowadays, authorship attribution,for both information retrieval and
analysis, has gained great importance in the context of security, trust and
copyright preservation. This work proposes an innovative multi-agent driven
machine learning technique that has been developed for authorship attribution.
By means of a preprocessing for word-grouping and time-period related analysis
of the common lexicon, we determine a bias reference level for the recurrence
frequency of the words within analysed texts, and then train a Radial Basis
Neural Networks (RBPNN)-based classifier to identify the correct author. The
main advantage of the proposed approach lies in the generality of the semantic
analysis, which can be applied to different contexts and lexical domains,
without requiring any modification. Moreover, the proposed system is able to
incorporate an external input, meant to tune the classifier, and then
self-adjust by means of continuous learning reinforcement.Comment: Published on: Proceedings of the XV Workshop "Dagli Oggetti agli
Agenti" (WOA 2014), Catania, Italy, Sepember. 25-26, 201
Using Clustering Techniques to Guide Refactoring of Object-Oriented Classes
Much of the cost of software development is maintenance. Well structured
software tends to be cheaper to maintain than poorly structured software, because
it is easier to analyze and modify. The research described in this thesis concentrates
on determining how to improve the structure of object-oriented classes, the
fundamental unit of organization for object-oriented programs.
Some refactoring tools can mechanically restructure object-oriented classes,
given the appropriate inputs regarding what attributes and methods belong in the
revised classes. We address the research question of determining what belongs in
those classes, i.e., determining which methods and attributes most belong together
and how those methods and attributes can be organized into classes. Clustering
techniques can be useful for grouping entities that belong together; however, doing
so requires matching an appropriate algorithm to the domain task and choosing
appropriate inputs.
This thesis identifies clustering techniques suitable for determining the
redistribution of existing attributes and methods among object-oriented classes,
and discusses the strengths and weaknesses of these techniques. It then describes
experiments using these techniques as the basis for refactoring open source Java
classes and the changes in the class quality metrics that resulted. Based on these
results and on others reported in the literature, it recommends particular clustering
techniques for particular refactoring problems.
These clustering techniques have been incorporated into an open source
refactoring tool that provides low-cost assistance to programmers maintaining
object-oriented classes. Such maintenance can reduce the total cost of software
development
30 Years of Software Refactoring Research: A Systematic Literature Review
Peer Reviewedhttps://deepblue.lib.umich.edu/bitstream/2027.42/155872/4/30YRefactoring.pd
30 Years of Software Refactoring Research:A Systematic Literature Review
Due to the growing complexity of software systems, there has been a dramatic
increase and industry demand for tools and techniques on software refactoring
in the last ten years, defined traditionally as a set of program
transformations intended to improve the system design while preserving the
behavior. Refactoring studies are expanded beyond code-level restructuring to
be applied at different levels (architecture, model, requirements, etc.),
adopted in many domains beyond the object-oriented paradigm (cloud computing,
mobile, web, etc.), used in industrial settings and considered objectives
beyond improving the design to include other non-functional requirements (e.g.,
improve performance, security, etc.). Thus, challenges to be addressed by
refactoring work are, nowadays, beyond code transformation to include, but not
limited to, scheduling the opportune time to carry refactoring, recommendations
of specific refactoring activities, detection of refactoring opportunities, and
testing the correctness of applied refactorings. Therefore, the refactoring
research efforts are fragmented over several research communities, various
domains, and objectives. To structure the field and existing research results,
this paper provides a systematic literature review and analyzes the results of
3183 research papers on refactoring covering the last three decades to offer
the most scalable and comprehensive literature review of existing refactoring
research studies. Based on this survey, we created a taxonomy to classify the
existing research, identified research trends, and highlighted gaps in the
literature and avenues for further research.Comment: 23 page
A Model Driven Approach for Refactoring Heterogeneous Software Artefacts
Refactoring is the process of transforming a software system to improve its overall structure while
preserving its observable behaviour. Refactoring engines are normally used to perform these transformations
for efficiency and in order to avoid introducing behavioural changes into the program
due to human error. Although these engines do not verify that behaviour is preserved, it is widely
accepted that automated transformations are less likely to introduce errors in comparison to manual
refactoring. Despite the advantages provided by refactoring engines they fall foul of certain
weaknesses.
Here we hypothesise that Model Driven Engineering can be used to produce improved refactoring
engines that are less vulnerable to those weaknesses. We develop a Domain Specific Transformation
Language for defining new composite refactorings from a set of built–in primitives and
to script their application. We also develop an interpreter for the language, effectively providing
an operational semantics, in the guise of an extensible transformation framework. We evaluate our
approach with a case study examining the correlation between actual and predicted measurements
of the Coupling Between Objects metric for classes that undergo the extract class refactoring. The
results show that our approach is promising
Class-Level Refactoring Prediction by Ensemble Learning with Various Feature Selection Techniques
Background: Refactoring is changing a software system without affecting the software functionality. The current researchers aim i to identify the appropriate method(s) or class(s) that needs to be refactored in object-oriented software. Ensemble learning helps to reduce prediction errors by amalgamating different classifiers and their respective performances over the original feature data. Other motives are added in this paper regarding several ensemble learners, errors, sampling techniques, and feature selection techniques for refactoring prediction at the class level. Objective: This work aims to develop an ensemble-based refactoring prediction model with structural identification of source code metrics using different feature selection techniques and data sampling techniques to distribute the data uniformly. Our model finds the best classifier after achieving fewer errors during refactoring prediction at the class level. Methodology: At first, our proposed model extracts a total of 125 software metrics computed from object-oriented software systems processed for a robust multi-phased feature selection method encompassing Wilcoxon significant text, Pearson correlation test, and principal component analysis (PCA). The proposed multi-phased feature selection method retains the optimal features characterizing inheritance, size, coupling, cohesion, and complexity. After obtaining the optimal set of software metrics, a novel heterogeneous ensemble classifier is developed using techniques such as ANN-Gradient Descent, ANN-Levenberg Marquardt, ANN-GDX, ANN-Radial Basis Function; support vector machine with different kernel functions such as LSSVM-Linear, LSSVM-Polynomial, LSSVM-RBF, Decision Tree algorithm, Logistic Regression algorithm and extreme learning machine (ELM) model are used as the base classifier. In our paper, we have calculated four different errors i.e., Mean Absolute Error (MAE), Mean magnitude of Relative Error (MORE), Root Mean Square Error (RMSE), and Standard Error of Mean (SEM). Result: In our proposed model, the maximum voting ensemble (MVE) achieves better accuracy, recall, precision, and F-measure values (99.76, 99.93, 98.96, 98.44) as compared to the base trained ensemble (BTE) and it experiences less errors (MAE = 0.0057, MORE = 0.0701, RMSE = 0.0068, and SEM = 0.0107) during its implementation to develop the refactoring model. Conclusions: Our experimental result recommends that MVE with upsampling can be implemented to improve the performance of the refactoring prediction model at the class level. Furthermore, the performance of our model with different data sampling techniques and feature selection techniques has been shown in the form boxplot diagram of accuracy, F-measure, precision, recall, and area under the curve (AUC) parameters.publishedVersio
Dealing with clones in software : a practical approach from detection towards management
Despite the fact that duplicated fragments of code also called code clones are considered one of the prominent code smells that may exist in software, cloning is widely practiced in industrial development. The larger the system, the more people involved in its development and the more parts developed by different teams result in an increased possibility of having cloned code in the system. While there are particular benefits of code cloning in software development, research shows that it might be a source of various troubles in evolving software. Therefore, investigating and understanding clones in a software system is important to manage the clones efficiently. However, when the system is fairly large, it is challenging to identify and manage those clones properly. Among the various types of clones that may exist in software, research shows detection of near-miss clones where there might be minor to significant differences (e.g., renaming of identifiers and additions/deletions/modifications of statements) among the cloned fragments is costly in terms of time and memory. Thus, there is a great demand of state-of-the-art technologies in dealing with clones in software.
Over the years, several tools have been developed to detect and visualize exact and similar clones. However, usually the tools are standalone and do not integrate well with a software developer's workflow. In this thesis, first, a study is presented on the effectiveness of a fingerprint based data similarity measurement technique named 'simhash' in detecting clones in large scale code-base. Based on the positive outcome of the study, a time efficient detection approach is proposed to find exact and near-miss clones in software, especially in large scale software systems. The novel detection approach has been made available as a highly configurable and fully fledged standalone clone detection tool named 'SimCad', which can be configured for detection of clones in both source code and non-source code based data. Second, we show a robust use of the clone detection approach studied earlier by assembling its detection service as a portable library named 'SimLib'. This library can provide tightly coupled (integrated) clone detection functionality to other applications as opposed to loosely coupled service provided by a typical standalone tool. Because of being highly configurable and easily extensible, this library allows the user to customize its clone detection process for detecting clones in data having diverse characteristics. We performed a user study to get some feedback on installation and use of the 'SimLib' API (Application Programming Interface) and to uncover its potential use as a third-party clone detection library. Third, we investigated on what tools and techniques are currently in use to detect and manage clones and understand their evolution. The goal was to find how those tools and techniques can be made available to a developer's own software development platform for convenient identification, tracking and management of clones in the software. Based on that, we developed a clone-aware software development platform named 'SimEclipse' to promote the practical use of code clone research and to provide better support for clone management in software. Finally, we evaluated 'SimEclipse' by conducting a user study on its effectiveness, usability and information management. We believe that both researchers and developers would enjoy and utilize the benefit of using these tools in different aspect of code clone research and manage cloned code in software systems
- …