RELIABLE COGNITIVE DIMENSIONAL DOCUMENT RANKING BY WEIGHTED STANDARD CAUCHY DISTRIBUTION
Categorization of cognitively uniform and consistent documents, such as university question papers, is in demand among e-learners. The literature indicates that the standard Cauchy distribution and values derived from it are extensively used for checking the uniformity and consistency of documents. This paper applies that technique to categorize question papers according to four selected cognitive dimensions. For this purpose, cognitive-dimensional keyword sets for these four categories (also termed portrayal concepts) are assumed, and an automatic procedure is developed to quantify these dimensions in question papers. The categorization is relatively accurate when checked against manual methods. Hence, the simple and well-established term frequency / inverse document frequency (tf-idf) technique is used to automate the categorization process. After the documents are categorized, the standard Cauchy formula is applied to rank the documents with the smallest differences among their Cauchy values, so as to obtain consistent and uniform documents in ranked order. For the experiments and a social survey, seven question papers (documents) were designed with varying degrees of consistency. To validate the proposed technique, the survey was administered to selected samples of e-learners in Tamil Nadu, India. The results are encouraging, and the conclusions drawn from the experiments will be useful to researchers in concept mining and concept-based document categorization. The findings also offer utility to designers of e-learning systems.
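As a rough illustration of the categorization step described above, the sketch below scores documents against per-dimension keyword sets with tf-idf; the dimension names and keywords are invented placeholders, not the paper's actual portrayal-concept vocabularies, and the standard Cauchy density used in the ranking step is included only for reference.

    import math
    from collections import Counter

    # Hypothetical keyword sets; the paper's actual cognitive-dimension
    # (portrayal-concept) vocabularies are not reproduced here.
    DIMENSION_KEYWORDS = {
        "knowledge":     {"define", "list", "state", "name"},
        "comprehension": {"explain", "summarize", "describe"},
        "application":   {"apply", "solve", "compute"},
        "analysis":      {"compare", "analyze", "classify"},
    }

    def dimension_scores(documents):
        """Score each document against each cognitive dimension via tf-idf."""
        n = len(documents)
        tokenized = [doc.lower().split() for doc in documents]
        # Document frequency of every term across the corpus.
        df = Counter(term for tokens in tokenized for term in set(tokens))
        scores = []
        for tokens in tokenized:
            tf = Counter(tokens)
            scores.append({
                dim: sum((tf[kw] / len(tokens)) * math.log(n / df[kw])
                         for kw in keywords if kw in tf)
                for dim, keywords in DIMENSION_KEYWORDS.items()
            })
        return scores

    def standard_cauchy_pdf(x):
        """Density of the standard Cauchy distribution used in the ranking step."""
        return 1.0 / (math.pi * (1.0 + x * x))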
Learning natural coding conventions
Coding conventions are ubiquitous in software engineering practice. Maintaining a uniform
coding style allows software development teams to communicate through code by
making the code clear and, thus, readable and maintainable, two important properties
of good code since developers spend the majority of their time maintaining software
systems. This dissertation introduces a set of probabilistic machine learning models
of source code that learn coding conventions directly from source code written in a
mostly conventional style. This alleviates the coding convention enforcement problem,
where conventions need to first be formulated clearly into unambiguous rules and then
be coded in order to be enforced, a tedious and costly process.
First, we introduce the problem of inferring a variable's name given its usage context
and address this problem by creating Naturalize, a machine learning framework
that learns to suggest conventional variable names. Two machine learning models, a
simple n-gram language model and a specialized neural log-bilinear context model, are
trained to understand the role and function of each variable and suggest new stylistically
consistent variable names. The neural log-bilinear model can even suggest previously
unseen names by composing them from subtokens (i.e. sub-components of code identifiers).
The suggestions of the models achieve 90% accuracy when suggesting variable
names at the top 20% most confident locations, rendering the suggestion system usable
in practice.
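To make the n-gram idea concrete, here is a deliberately tiny sketch: a bigram model trained on token streams that ranks candidate identifiers by how often they follow the surrounding context token. The corpus and candidate names are invented for the example; Naturalize itself scores full lexical contexts with a proper n-gram language model, not single bigrams.

    from collections import Counter, defaultdict

    def train_bigrams(token_streams):
        """Count which token follows which in a 'mostly conventional' corpus."""
        counts = defaultdict(Counter)
        for tokens in token_streams:
            for prev, cur in zip(tokens, tokens[1:]):
                counts[prev][cur] += 1
        return counts

    def suggest_name(bigrams, context_token, candidates):
        """Rank candidate identifiers by bigram probability after context_token."""
        following = bigrams[context_token]
        total = sum(following.values()) or 1
        return sorted(candidates, key=lambda c: following[c] / total, reverse=True)

    # Invented toy corpus of lexed code token streams.
    corpus = [["for", "i", "in", "range"], ["for", "i", "in", "items"],
              ["for", "idx", "in", "range"]]
    model = train_bigrams(corpus)
    print(suggest_name(model, "for", ["i", "idx", "temp"]))  # "i" ranks first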
We then turn our attention to the significantly harder method naming problem.
Learning to name methods by looking only at the code tokens within their bodies requires
a good understanding of the semantics of the code contained in a single method.
To achieve this, we introduce a novel neural convolutional attention network that learns
to generate the name of a method by sequentially predicting its subtokens. This is
achieved by focusing on different parts of the code and potentially directly using body
(sub)tokens even when they have never been seen before. This model achieves an F1
score of 51% on the top five suggestions when naming methods of real-world open-source
projects.
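A schematic of the attention step, not the dissertation's exact convolutional architecture: body-token embeddings are scored against a decoder state, normalized with a softmax, and the most-attended token can be copied out as a candidate name subtoken. All tokens, shapes and (random, untrained) embeddings below are placeholders.

    import numpy as np

    rng = np.random.default_rng(0)
    body_tokens = ["count", "items", "return", "len"]  # hypothetical method body
    d = 8                                              # placeholder embedding size
    E = np.stack([rng.normal(size=d) for _ in body_tokens])  # (tokens, d)

    state = rng.normal(size=d)            # decoder state after previous subtokens
    scores = E @ state                    # one attention score per body token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()              # softmax attention distribution
    context = weights @ E                 # weighted summary, fed to the next step

    # Copying the most-attended body token is how names containing previously
    # unseen (sub)tokens can still be produced.
    print(body_tokens[int(weights.argmax())])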
Learning naming conventions uses the syntactic structure of the code
to infer names that implicitly relate to code semantics. However, syntactic similarities
and differences obscure code semantics. Therefore, to capture features of semantic
operations with machine learning, we need methods that learn semantic continuous
logical representations. To achieve this ambitious goal, we focus our investigation on
logic and algebraic symbolic expressions and design a neural equivalence network architecture
that learns semantic vector representations of expressions in a syntax-driven
way, while retaining only semantics. We show that equivalence networks learn significantly
better semantic vector representations than other existing neural network architectures.
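For intuition about syntax-driven semantic encoding, the sketch below composes expression vectors bottom-up over the parse tree, with one weight matrix per operator, and normalizes each result so cosine similarity between expressions is meaningful. The weights here are random and untrained, and the real equivalence networks add training objectives and architectural details omitted from this sketch.

    import numpy as np

    rng = np.random.default_rng(1)
    d = 8                                       # placeholder embedding size
    leaf_vecs = {"a": rng.normal(size=d), "b": rng.normal(size=d)}
    # One combination matrix per binary operator (untrained placeholders).
    op_weights = {"and": rng.normal(size=(d, 2 * d)),
                  "or":  rng.normal(size=(d, 2 * d))}

    def encode(expr):
        """expr is a leaf name or a tuple (op, left, right); returns a unit vector."""
        if isinstance(expr, str):
            v = leaf_vecs[expr]
        else:
            op, left, right = expr
            v = np.tanh(op_weights[op] @ np.concatenate([encode(left), encode(right)]))
        return v / np.linalg.norm(v)

    # After training, equivalent expressions such as (a and b) and (b and a)
    # should map to nearby vectors; these untrained weights will not show that.
    print(encode(("and", "a", "b")) @ encode(("and", "b", "a")))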
Finally, we present an unsupervised machine learning model for mining syntactic
and semantic code idioms. Code idioms are conventional "mental chunks" of code that
serve a single semantic purpose and are commonly used by practitioners. To achieve
this, we employ Bayesian nonparametric inference on tree substitution grammars. We
present a wide range of evidence that the resulting syntactic idioms are meaningful,
demonstrating that they do indeed recur across software projects and that they occur
more frequently in illustrative code examples collected from a Q&A site. These syntactic
idioms can be used as a form of automatic documentation of coding practices
of a programming language or an API. We also mine semantic loop idioms, i.e. highly
abstracted but semantics-preserving idioms of loop operations. We show that semantic
idioms provide data-driven guidance during the creation of software engineering tools
by mining common semantic patterns, such as candidate refactoring locations. This
gives tool, API and language designers data-based evidence about general, domain-specific
and project-specific coding patterns; instead of relying solely on their intuition, they can
use semantic idioms to achieve greater coverage of their tool, new API or language
feature. We demonstrate this by creating a tool that suggests loop refactorings into
functional constructs in LINQ. Semantic loop idioms also provide data-driven evidence
for introducing new APIs or programming language features.
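For a flavor of the loop-to-functional transformation such a tool performs, here is the analogous rewrite expressed in Python rather than C#/LINQ, purely for illustration: an accumulate-over-filtered-elements loop and the functional form the idiom maps to.

    # Illustration only: the thesis's tool targets C# loops and LINQ, not Python.
    items = [3, -1, 4, -1, 5]

    # Loop form of a common semantic idiom: filter, map, then aggregate.
    total = 0
    for x in items:
        if x > 0:
            total += x * x

    # Functional form the idiom maps to; in C#/LINQ this would be roughly
    # items.Where(x => x > 0).Select(x => x * x).Sum().
    total_functional = sum(x * x for x in items if x > 0)
    assert total == total_functional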
Search-Based Software Maintenance and Testing
2012 - 2013
In software engineering there are many expensive tasks that are performed during development
and maintenance activities. Therefore, there has been a lot of effort to automate these
tasks in order to significantly reduce the development and maintenance cost of software, since
automation requires fewer human resources. One of the most widely used ways to achieve such
automation is Search-Based Software Engineering (SBSE), which reformulates traditional
software engineering tasks as search problems. In SBSE, the set of all candidate solutions to the
problem defines the search space, while a fitness function differentiates between candidate solutions,
providing guidance to the optimization process. After the reformulation of software engineering
tasks as optimization problems, search algorithms are used to solve them. Several search algorithms
have been used in the literature, such as genetic algorithms, genetic programming, simulated annealing,
hill climbing (gradient descent), greedy algorithms, particle swarm and ant colony optimization.
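As a concrete instance of this reformulation, the sketch below runs a minimal genetic algorithm over bit strings: the encoding fixes the search space and the fitness function supplies the guidance mentioned above. The objective used here (the count of 1-bits) is a trivial stand-in for a real software engineering objective.

    import random

    random.seed(0)
    GENOME_LEN, POP_SIZE, GENERATIONS = 20, 30, 50

    def fitness(genome):
        """Placeholder objective: a real SBSE fitness would measure, e.g.,
        code coverage or prediction accuracy."""
        return sum(genome)

    def evolve():
        pop = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
               for _ in range(POP_SIZE)]
        for _ in range(GENERATIONS):
            pop.sort(key=fitness, reverse=True)
            survivors = pop[: POP_SIZE // 2]      # elitist selection
            children = []
            while len(survivors) + len(children) < POP_SIZE:
                a, b = random.sample(survivors, 2)
                cut = random.randrange(1, GENOME_LEN)   # one-point crossover
                child = a[:cut] + b[cut:]
                i = random.randrange(GENOME_LEN)        # point mutation
                child[i] ^= 1
                children.append(child)
            pop = survivors + children
        return max(pop, key=fitness)

    print(fitness(evolve()))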
This thesis investigates and proposes the use of search-based approaches to reduce the effort
of software maintenance and software testing, with particular attention to four main activities: (i)
program comprehension; (ii) defect prediction; (iii) test data generation; and (iv) test suite optimization
for regression testing. For program comprehension and defect prediction, this thesis provides
their first formulations as optimization problems and then proposes the use of genetic algorithms
to solve them. More precisely, this thesis investigates the peculiarities of source code compared with
textual documents written in natural language and proposes the use of Genetic Algorithms (GAs)
to calibrate and assemble IR techniques for different software engineering tasks. This thesis
also investigates and proposes the use of Multi-Objective Genetic Algorithms (MOGAs) in order
to build multi-objective defect prediction models that identify defect-prone software
components by taking into account multiple, practical software engineering criteria.
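Since the MOGA formulation rests on Pareto dominance, a minimal sketch of that core notion may help; the two objectives used here (inspection cost versus missed defects) are invented placeholders, not the thesis's actual criteria.

    # A minimal Pareto-dominance check: a MOGA keeps the solutions that no
    # other solution dominates across all objectives (assumed minimized).
    def dominates(a, b):
        """True if a is at least as good as b everywhere and strictly better once."""
        return all(x <= y for x, y in zip(a, b)) and \
               any(x < y for x, y in zip(a, b))

    def pareto_front(solutions):
        return [s for s in solutions
                if not any(dominates(o, s) for o in solutions if o is not s)]

    # Each tuple: (inspection cost, missed defects) for a candidate model.
    candidates = [(3, 9), (5, 4), (7, 4), (9, 1)]
    print(pareto_front(candidates))  # (7, 4) is dominated by (5, 4)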
Test data generation and test suite optimization have been extensively investigated as search-based
problems in the literature. However, despite the huge body of work on search algorithms
applied to software testing, both (i) automatic test data generation and (ii) test suite optimization
present several limitations and do not always produce satisfying results. The success of evolutionary
software testing techniques in general, and GAs in particular, depends on several factors. One of
these factors is the level of diversity among the individuals in the population, which directly affects
the exploration ability of the search. For example, evolutionary test case generation techniques that
employ GAs can be severely affected by genetic drift, i.e., a loss of diversity between solutions,
which leads to premature convergence of GAs towards local optima. For these reasons,
this thesis investigates the role played by diversity-preserving mechanisms in the performance of
GAs and proposes a novel diversity mechanism based on Singular Value Decomposition and linear
algebra. This mechanism has been integrated within standard GAs and evaluated for
evolutionary test data generation. It has also been integrated within MOGAs and empirically
evaluated for regression testing.
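One plausible way to read an SVD-based diversity signal, sketched here without claiming to reproduce the thesis's actual mechanism: arrange the population as a matrix (one row per individual) and take the entropy of its normalized singular-value spectrum, the "effective rank", as a measure of how many independent directions the individuals still span.

    import numpy as np

    def svd_diversity(population):
        """population: (individuals, genes) array; higher value = more diversity.
        An illustrative proxy, not the thesis's exact formulation."""
        s = np.linalg.svd(np.asarray(population, dtype=float), compute_uv=False)
        p = s / s.sum()                     # normalized singular-value spectrum
        p = p[p > 0]
        return float(np.exp(-(p * np.log(p)).sum()))  # effective rank

    diverse   = np.random.default_rng(0).integers(0, 2, size=(10, 20))
    converged = np.tile(diverse[0], (10, 1))  # genetic drift: identical individuals
    print(svd_diversity(diverse), svd_diversity(converged))  # diverse scores higher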
Positioning and power in academic publishing: players, agents and agendas
The field of electronic publishing has grown exponentially in the last two decades, but we are still in the middle of this digital transformation. With technologies coming and going for all kinds of reasons, the distribution of economic, technological and discursive power continues to be negotiated.
This book presents the proceedings of the 20th Conference on Electronic Publishing (Elpub), held in Göttingen, Germany, in June 2016. This year's conference explores issues of positioning and power in academic publishing, bringing together world-leading stakeholders such as academics, practitioners, policymakers, students and entrepreneurs from a wide variety of fields to exchange information, discuss the advent of innovations in electronic publishing, and reflect on developments in the field over the last 20 years. Topics covered in the papers include how to maintain the quality of electronic publications, modeling processes, the increasingly prevalent issue of open access, and new systems, database repositories and datasets.
This overview of the field will be of interest to all those who work in or make use of electronic publishing.
Proceedings of the 2nd Conference on Production Systems and Logistics (CPSL 2021)
Proceedings of the CPSL 2021