Improving Change Prediction Models with Code Smell-Related Information
Code smells represent sub-optimal implementation choices applied by
developers when evolving software systems. The negative impact of code smells
has been widely investigated in the past: besides developers' productivity and
ability to comprehend source code, researchers empirically showed that the
presence of code smells heavily impacts the change-proneness of the affected
classes. On the basis of these findings, in this paper we conjecture that code
smell-related information can be effectively exploited to improve the
performance of change prediction models, i.e., models that indicate to
developers which classes are more likely to change in the future,
so that they may apply preventive maintenance actions. Specifically, we exploit
the so-called intensity index - a previously defined metric that captures the
severity of a code smell - and evaluate its contribution when added as an
additional feature to three state-of-the-art change prediction models based on
product, process, and developer-based features. We also compare
the performance achieved by the proposed model with that of an alternative
technique based on the previously defined antipattern metrics, namely a
set of indicators computed from the history of code smells in files. Our
results report that (i) the prediction performance of the intensity-including
models is statistically better than that of the baselines and (ii) the
intensity index is a more powerful metric than the alternative smell-related
ones.
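To make the setup concrete, here is a minimal sketch (not the authors' pipeline; all feature values are synthetic placeholders) of appending a smell-intensity column to a baseline feature set and comparing the two models:

```python
# A hedged sketch: synthetic stand-ins for product/process features plus a
# hypothetical smell-intensity column; not the paper's actual pipeline.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200
X_base = rng.random((n, 3))      # stand-ins for product/process features
intensity = rng.random((n, 1))   # hypothetical smell-intensity feature
X_plus = np.hstack([X_base, intensity])
y = rng.integers(0, 2, n)        # 1 = class changed in the next release

for label, X in (("baseline", X_base), ("baseline+intensity", X_plus)):
    f1 = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                         cv=5, scoring="f1").mean()
    print(f"{label}: mean F1 = {f1:.2f}")
```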
Modelling Complexity: the case of Climate Science
We briefly review some of the scientific challenges and epistemological
issues related to climate science. We discuss the formulation and testing of
theories and numerical models, which, given the presence of unavoidable
uncertainties in observational data, the non-repeatability of
world-experiments, and the fact that relevant processes occur on a large
variety of spatial and temporal scales, require a rather different approach
than in other scientific contexts. A brief discussion of the intrinsic
limitations of geo-engineering solutions to global warming is presented, and a
framework of investigation based upon non-equilibrium thermodynamics is
proposed. We also critically discuss recently proposed perspectives on the
development of climate science based purely upon the massive use of
supercomputers and the centralized planning of scientific priorities.
Comment: 17 pages, 7 figs, Proceedings of the Conference "Modelling Complexity:
the case of Climate Science", Hamburg, 201
Studying Collective Human Decision Making and Creativity with Evolutionary Computation
We report a summary of our interdisciplinary research project "Evolutionary
Perspective on Collective Decision Making" that was conducted through close
collaboration between computational, organizational and social scientists at
Binghamton University. We redefined collective human decision making and
creativity as evolution of ecologies of ideas, where populations of ideas
evolve via continual applications of evolutionary operators such as
reproduction, recombination, mutation, selection, and migration of ideas, each
conducted by participating humans. Based on this evolutionary perspective, we
generated hypotheses about collective human decision making using agent-based
computer simulations. The hypotheses were then tested through several
experiments with real human subjects. Throughout this project, we utilized
evolutionary computation (EC) in non-traditional ways---(1) as a theoretical
framework for reinterpreting the dynamics of idea generation and selection, (2)
as a computational simulation model of collective human decision making
processes, and (3) as a research tool for collecting high-resolution
experimental data of actual collaborative design and decision making from human
subjects. We believe our work demonstrates the untapped potential of EC for
interdisciplinary research involving human and social dynamics.
Comment: 20 pages, 7 figures, 1 table (Supplemental materials not included)
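As an illustration of this evolutionary framing, the following minimal sketch (ours, not the project's code) evolves a population of bit-string "ideas" via recombination, mutation, and selection; the fitness function is a stand-in for human evaluation:

```python
# A hedged sketch of ideas-as-evolving-population; fitness is hypothetical.
import random

random.seed(0)
IDEA_LEN, POP, GENS = 16, 30, 40

def fitness(idea):            # hypothetical utility: count of favourable traits
    return sum(idea)

def recombine(a, b):          # one-point crossover of two parent ideas
    cut = random.randrange(1, IDEA_LEN)
    return a[:cut] + b[cut:]

def mutate(idea, rate=0.05):  # small random modifications to an idea
    return [bit ^ (random.random() < rate) for bit in idea]

pop = [[random.randint(0, 1) for _ in range(IDEA_LEN)] for _ in range(POP)]
for _ in range(GENS):
    # selection: keep the better half, refill via recombination + mutation
    pop.sort(key=fitness, reverse=True)
    parents = pop[:POP // 2]
    children = [mutate(recombine(*random.sample(parents, 2)))
                for _ in range(POP - len(parents))]
    pop = parents + children
print("best idea fitness:", max(fitness(i) for i in pop))
```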
Review on Graph Feature Learning and Feature Extraction Techniques for Link Prediction
The problem of link prediction has recently attracted considerable attention
from the research community. Given a graph, which is an abstraction of the
relationships among entities, the task of link prediction is to anticipate
future connections among entities in the graph, given its current state.
Extensive studies have examined this problem from different aspects and
proposed various methods, some of which might work very well for a specific
application but not as a global solution. This work presents an extensive
review of state-of-the-art methods and algorithms proposed on this subject and
categorizes them into four main categories: similarity-based methods,
probabilistic methods, relational models, and learning-based methods.
Additionally, a collection of network data sets has been presented in this
paper, which can be used to study link prediction. To the best of our
knowledge, this survey is the first comprehensive study that considers all of
the mentioned challenges and solutions for link prediction in graphs,
including unsupervised and supervised techniques and their evolution in
recent years.
Comment: 31 pages, 7 figures
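As a taste of the first category, the sketch below (illustrative only) scores candidate links with the Jaccard similarity of node neighbourhoods, using networkx and a standard toy network:

```python
# A minimal sketch of a similarity-based method: rank non-edges by the
# Jaccard coefficient of their neighbourhoods; toy network for illustration.
import networkx as nx

G = nx.karate_club_graph()                 # small illustrative social network
candidates = list(nx.non_edges(G))         # node pairs not currently linked
scored = nx.jaccard_coefficient(G, candidates)
# The highest-scoring non-edges are predicted as the most likely future links.
for u, v, score in sorted(scored, key=lambda t: -t[2])[:5]:
    print(u, v, round(score, 3))
```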
On The Effectiveness of Kolmogorov Complexity Estimation to Discriminate Semantic Types
We present progress on the experimental validation of a fundamental and
universally applicable vulnerability analysis framework that is capable of
identifying new types of vulnerabilities before attackers innovate attacks.
This new framework proactively identifies system components that are vulnerable
based upon their Kolmogorov Complexity estimates and it facilitates prediction
of previously unknown vulnerabilities that are likely to be exploited by future
attack methods. A tool that utilizes a growing library of complexity estimators
is presented. This work is an incremental step towards validation of the
concept of complexity-based vulnerability analysis. In particular, results
indicate that data types (semantic types) can be identified by estimates of
their complexity. Thus, a map of complexity can identify suspicious types, such
as executable data embedded within passive data types, without resorting to
predefined headers, signatures, or other limiting a priori information.
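A common proxy for Kolmogorov complexity is compressed size; the following minimal sketch (not the paper's tool, which draws on a library of estimators) shows how different semantic types separate under a single zlib-based estimate:

```python
# A hedged sketch: compressed-size ratio as one Kolmogorov complexity proxy.
import os
import zlib

samples = {
    "english text": b"the quick brown fox jumps over the lazy dog " * 20,
    "repetitive":   b"AAAA" * 200,
    "random bytes": os.urandom(800),  # stand-in for packed/encrypted content
}
for name, data in samples.items():
    ratio = len(zlib.compress(data, 9)) / len(data)  # lower = simpler
    print(f"{name:13s} estimated complexity ~ {ratio:.2f}")
```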
Nestedness in complex networks: Observation, emergence, and implications
The observed architecture of ecological and socio-economic networks differs
significantly from that of random networks. From a network science standpoint,
non-random structural patterns observed in real networks call for an
explanation of their emergence and an understanding of their potential systemic
consequences. This article focuses on one of these patterns: nestedness. Given
a network of interacting nodes, nestedness can be described as the tendency for
nodes to interact with subsets of the interaction partners of better-connected
nodes. Long known in biogeography, nestedness has been
found in systems as diverse as ecological mutualistic organizations, world
trade, inter-organizational relations, among many others. This review article
focuses on three main pillars: the existing methodologies to observe nestedness
in networks; the main theoretical mechanisms conceived to explain the emergence
of nestedness in ecological and socio-economic networks; the implications of a
nested topology of interactions for the stability and feasibility of a given
interacting system. We survey results from variegated disciplines, including
statistical physics, graph theory, ecology, and theoretical economics.
Nestedness was found to emerge both in bipartite networks and, more recently,
in unipartite ones; this review is the first comprehensive attempt to unify
both streams of studies, usually disconnected from each other. We believe that
the truly interdisciplinary endeavour -- while rooted in a complex systems
perspective -- may inspire new models and algorithms whose realm of application
will undoubtedly transcend disciplinary boundaries.
Comment: In press. 140 pages, 34 figures
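The subset idea at the heart of nestedness can be checked directly on a toy interaction matrix, as in this minimal sketch (hypothetical data):

```python
# A hedged sketch: in a perfectly nested graph, each node's neighbourhood is
# contained in that of any better-connected node. Toy matrix, not real data.
import itertools

# rows = e.g. plants, columns = pollinators; 1 marks an observed interaction
M = [
    [1, 1, 1, 1],   # generalist
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],   # specialist
]
neigh = [{j for j, x in enumerate(row) if x} for row in M]
nested = all(neigh[a] <= neigh[b] or neigh[b] <= neigh[a]
             for a, b in itertools.combinations(range(len(M)), 2))
print("perfectly nested:", nested)  # True for this toy matrix
```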
Combining complex networks and data mining: why and how
The increasing power of computer technology does not dispense with the need
to extract meaningful information out of data sets of ever-growing size, and
indeed typically exacerbates the complexity of this task. To tackle this
general problem, two methods have emerged, at chronologically different times,
that are now commonly used in the scientific community: data mining and complex
network theory. Not only do complex network analysis and data mining share the
same general goal, that of extracting information from complex systems to
ultimately create a new compact quantifiable representation, but they also
often address similar problems. In the face of that, a surprisingly low
number of researchers turn out to resort to both methodologies. One may then be
tempted to conclude that these two fields are either largely redundant or
totally antithetic. The starting point of this review is that this state of
affairs should be put down to contingent rather than conceptual differences,
and that these two fields can in fact advantageously be used in a synergistic
manner. An overview of both fields is first provided, some fundamental concepts
of which are illustrated. A variety of contexts in which complex network theory
and data mining have been used in a synergistic manner are then presented.
Contexts in which the appropriate integration of complex network metrics can
lead to improved classification rates with respect to classical data mining
algorithms and, conversely, contexts in which data mining can be used to tackle
important issues in complex network theory applications are illustrated.
Finally, ways to achieve a tighter integration between complex networks and
data mining, and open lines of research are discussed.
Comment: 58 pages, 19 figures
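One direction of this synergy, network metrics feeding a standard data mining classifier, can be sketched as follows (synthetic graphs and labels; illustrative only):

```python
# A hedged sketch: topological metrics become classifier features.
import networkx as nx
from sklearn.linear_model import LogisticRegression

def graph_features(G):
    # a compact, quantifiable representation of the network's structure
    return [nx.density(G), nx.average_clustering(G)]

graphs = ([nx.erdos_renyi_graph(30, 0.1, seed=s) for s in range(20)]
          + [nx.barabasi_albert_graph(30, 3, seed=s) for s in range(20)])
labels = [0] * 20 + [1] * 20  # 0 = random graph, 1 = scale-free graph
X = [graph_features(G) for G in graphs]
clf = LogisticRegression().fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```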
A Study on Software Metrics and its Impact on Software Quality
Software metrics offer a quantitative basis for predicting the software
development process and can thus help improve software quality. Software
quality should be achieved by satisfying the customer while decreasing
software cost and improving the reliability of the software product. In this
research, we discuss how software metrics affect the quality of the software
and at which stages of its development software metrics are applied. We
discuss the different software metrics and how these metrics impact software
quality and reliability. These techniques have been used to improve the
quality of software and increase revenue.
Comment: 14 pages, 10 figures, 11 tables
Predicting Good Configurations for GitHub and Stack Overflow Topic Models
Software repositories contain large amounts of textual data, ranging from
source code comments and issue descriptions to questions, answers, and comments
on Stack Overflow. To make sense of this textual data, topic modelling is
frequently used as a text-mining tool for the discovery of hidden semantic
structures in text bodies. Latent Dirichlet allocation (LDA) is a commonly used
topic model that aims to explain the structure of a corpus by grouping texts.
LDA requires multiple parameters to work well, and there are only rough and
sometimes conflicting guidelines available on how these parameters should be
set. In this paper, we contribute (i) a broad study of parameters to arrive at
good local optima for GitHub and Stack Overflow text corpora, (ii) an
a-posteriori characterisation of text corpora related to eight programming
languages, and (iii) an analysis of corpus feature importance via per-corpus
LDA configuration. We find that (1) popular rules of thumb for topic modelling
parameter configuration are not applicable to the corpora used in our
experiments, (2) corpora sampled from GitHub and Stack Overflow have different
characteristics and require different configurations to achieve good model fit,
and (3) we can predict good configurations for unseen corpora reliably. These
findings support researchers and practitioners in efficiently determining
suitable configurations for topic modelling when analysing textual data
contained in software repositories.
Comment: to appear as full paper at MSR 2019, the 16th International
Conference on Mining Software Repositories
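For reference, the LDA parameters at stake can be exposed via scikit-learn, as in this minimal sketch (toy corpus, not the paper's experimental setup):

```python
# A hedged sketch of the knobs for which rules of thumb proved unreliable:
# the number of topics and the alpha/beta priors.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "merge pull request fix null pointer exception",
    "how to parse json in python stack overflow",
    "update readme add build badge",
    "regex to match url in javascript",
]
X = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2,        # number of topics
                                doc_topic_prior=0.1,   # alpha
                                topic_word_prior=0.01, # beta
                                random_state=0).fit(X)
print(lda.transform(X).round(2))  # per-document topic mixtures
```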
Sensory Metrics of Neuromechanical Trust
Today digital sources supply an unprecedented component of human sensorimotor
data, the consumption of which is correlated with poorly understood maladies
such as Internet Addiction Disorder and Internet Gaming Disorder. This paper
offers a mathematical understanding of human sensorimotor processing as
multiscale, continuous-time vibratory interaction. We quantify human
informational needs using the signal processing metrics of entropy, noise,
dimensionality, continuity, latency, and bandwidth. Using these metrics, we
define the trust humans experience as a primitive statistical algorithm
processing finely grained sensorimotor data from neuromechanical interaction.
This definition of neuromechanical trust implies that artificial sensorimotor
inputs and interactions that attract low-level attention through frequent
discontinuities and enhanced coherence will decalibrate a brain's
representation of its world over the long term by violating the implicit
statistical contract for which self-calibration evolved. This approach allows
us to model addiction in general as the result of homeostatic regulation gone
awry in novel environments and digital dependency as a sub-case in which the
decalibration caused by digital sensorimotor data spurs yet more consumption of
them. We predict that institutions can use these sensorimotor metrics to
quantify media richness to improve employee well-being; that dyads and
family-size groups will bond and heal best through low-latency, high-resolution
multisensory interaction such as shared meals and reciprocated touch; and that
individuals can improve sensory and sociosensory resolution through deliberate
sensory reintegration practices. We conclude that we humans are the victims of
our own success, our hands so skilled they fill the world with captivating
things, our eyes so innocent they follow eagerly.
Comment: 59 pages, 14 figures
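As one concrete example of the signal processing metrics listed above, the Shannon entropy of a quantised signal can be computed as in this minimal sketch (toy signals; illustrative only):

```python
# A hedged sketch: entropy in bits per sample for two toy signals.
import math
from collections import Counter

def entropy_bits(samples):
    counts = Counter(samples)
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

smooth = [round(math.sin(t / 10), 1) for t in range(1000)]  # continuous-ish
jumpy = [(t * 7919) % 13 for t in range(1000)]              # discontinuous
print(entropy_bits(smooth), entropy_bits(jumpy))
```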