16,553 research outputs found

    Improving Change Prediction Models with Code Smell-Related Information

    Code smells represent sub-optimal implementation choices applied by developers when evolving software systems. The negative impact of code smells has been widely investigated in the past: besides harming developers' productivity and their ability to comprehend source code, researchers have empirically shown that the presence of code smells heavily impacts the change-proneness of the affected classes. On the basis of these findings, in this paper we conjecture that code smell-related information can be effectively exploited to improve the performance of change prediction models, i.e., models whose goal is to indicate to developers which classes are more likely to change in the future, so that they may apply preventive maintenance actions. Specifically, we exploit the so-called intensity index, a previously defined metric that captures the severity of a code smell, and evaluate its contribution when added as an additional feature to three state-of-the-art change prediction models based on product, process, and developer-based features. We also compare the performance of the proposed model with that of an alternative technique that considers the previously defined antipattern metrics, namely a set of indicators computed from the history of code smells in files. Our results show that (i) the prediction performance of the intensity-including models is statistically better than that of the baselines and (ii) the intensity is a more powerful metric than the alternative smell-related ones.
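
    A rough sketch of the kind of experiment this abstract describes: appending a smell-intensity column to baseline product, process, and developer metrics and comparing change-prediction performance. All column names and data below are illustrative assumptions, not the paper's dataset or pipeline.

```python
# Hypothetical sketch: does adding a "smell_intensity" feature help a
# change prediction model? Data and column names are synthetic.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200                                             # hypothetical classes of a system
df = pd.DataFrame({
    "loc": rng.integers(50, 2000, n),               # product metric
    "churn": rng.integers(0, 300, n),               # process metric
    "n_authors": rng.integers(1, 10, n),            # developer-based metric
    "smell_intensity": rng.uniform(0, 10, n),       # severity of the detected smell
})
# Synthetic label: classes with high churn and severe smells change more often.
y = (df["churn"] + 20 * df["smell_intensity"] + rng.normal(0, 50, n) > 150).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
base = cross_val_score(clf, df[["loc", "churn", "n_authors"]], y, cv=5, scoring="roc_auc")
full = cross_val_score(clf, df, y, cv=5, scoring="roc_auc")
print("baseline AUC %.2f  vs  +intensity AUC %.2f" % (base.mean(), full.mean()))
```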

    Modelling Complexity: the case of Climate Science

    We briefly review some of the scientific challenges and epistemological issues related to climate science. We discuss the formulation and testing of theories and numerical models, which, given the presence of unavoidable uncertainties in observational data, the non-repeatability of real-world experiments, and the fact that relevant processes occur across a large variety of spatial and temporal scales, require a rather different approach than in other scientific contexts. A brief discussion of the intrinsic limitations of geo-engineering solutions to global warming is presented, and a framework of investigation based upon non-equilibrium thermodynamics is proposed. We also critically discuss recently proposed perspectives for the development of climate science based purely upon the massive use of supercomputers and the centralized planning of scientific priorities. Comment: 17 pages, 7 figures, Proceedings of the Conference "Modelling Complexity: the case of Climate Science", Hamburg, 201

    Studying Collective Human Decision Making and Creativity with Evolutionary Computation

    We report a summary of our interdisciplinary research project "Evolutionary Perspective on Collective Decision Making", conducted through close collaboration between computational, organizational, and social scientists at Binghamton University. We redefined collective human decision making and creativity as the evolution of ecologies of ideas, where populations of ideas evolve via continual applications of evolutionary operators such as reproduction, recombination, mutation, selection, and migration of ideas, each carried out by participating humans. Based on this evolutionary perspective, we generated hypotheses about collective human decision making using agent-based computer simulations. The hypotheses were then tested through several experiments with real human subjects. Throughout this project, we utilized evolutionary computation (EC) in non-traditional ways: (1) as a theoretical framework for reinterpreting the dynamics of idea generation and selection, (2) as a computational simulation model of collective human decision-making processes, and (3) as a research tool for collecting high-resolution experimental data on actual collaborative design and decision making from human subjects. We believe our work demonstrates the untapped potential of EC for interdisciplinary research involving human and social dynamics. Comment: 20 pages, 7 figures, 1 table (Supplemental materials not included)
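
    As a toy illustration of the evolutionary framing described above, the following sketch evolves a population of "ideas" (here, real-valued vectors) via selection, recombination, and mutation. The encoding and fitness function are assumptions for demonstration only; in the project itself these operators are carried out by human participants.

```python
# Toy evolutionary loop over a population of "ideas" (illustrative only).
import random

def fitness(idea):                      # stand-in for human evaluation of an idea
    return -sum((x - 0.5) ** 2 for x in idea)

def recombine(a, b):                    # uniform crossover of two parent ideas
    return [random.choice(pair) for pair in zip(a, b)]

def mutate(idea, rate=0.1):             # small random perturbations
    return [x + random.gauss(0, 0.05) if random.random() < rate else x for x in idea]

population = [[random.random() for _ in range(5)] for _ in range(20)]
for generation in range(50):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]                        # truncation selection
    children = [mutate(recombine(random.choice(parents), random.choice(parents)))
                for _ in range(10)]
    population = parents + children

print("best idea:", max(population, key=fitness))
```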

    Review on Graph Feature Learning and Feature Extraction Techniques for Link Prediction

    The problem of link prediction has recently attracted considerable attention from the research community. Given a graph, which is an abstraction of the relationships among entities, the task of link prediction is to anticipate future connections among entities in the graph, given its current state. Extensive studies have examined this problem from different aspects and proposed various methods, some of which may work very well for a specific application but not as a general solution. This work presents an extensive review of state-of-the-art methods and algorithms proposed on this subject and categorizes them into four main categories: similarity-based methods, probabilistic methods, relational models, and learning-based methods. Additionally, a collection of network data sets that can be used to study link prediction is presented. To the best of our knowledge, this survey is the first comprehensive study that considers all of the mentioned challenges and solutions for link prediction in graphs, including recent improvements, unsupervised and supervised techniques, and their evolution over recent years. Comment: 31 pages, 7 figures
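
    For the similarity-based family mentioned above, a minimal illustration using NetworkX's built-in indices on a toy graph; the node pairs scored here are arbitrary examples, not part of the survey.

```python
# Score candidate node pairs with two classic similarity-based link predictors.
import networkx as nx

G = nx.karate_club_graph()                       # small example graph
candidate_pairs = [(0, 33), (5, 24), (2, 30)]    # illustrative pairs to score

for u, v, score in nx.jaccard_coefficient(G, candidate_pairs):
    print(f"Jaccard({u},{v}) = {score:.3f}")

for u, v, score in nx.adamic_adar_index(G, candidate_pairs):
    print(f"Adamic-Adar({u},{v}) = {score:.3f}")
```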

    On The Effectiveness of Kolmogorov Complexity Estimation to Discriminate Semantic Types

    We present progress on the experimental validation of a fundamental and universally applicable vulnerability analysis framework that is capable of identifying new types of vulnerabilities before attackers innovate attacks. This new framework proactively identifies system components that are vulnerable based upon their Kolmogorov Complexity estimates and it facilitates prediction of previously unknown vulnerabilities that are likely to be exploited by future attack methods. A tool that utilizes a growing library of complexity estimators is presented. This work is an incremental step towards validation of the concept of complexity-based vulnerability analysis. In particular, results indicate that data types (semantic types) can be identified by estimates of their complexity. Thus, a map of complexity can identify suspicious types, such as executable data embedded within passive data types, without resorting to predefined headers, signatures, or other limiting a priori information.
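
    The core idea, approximating Kolmogorov complexity with a compressor and using the resulting ratio to separate data types, can be sketched as follows; zlib here is only a stand-in for the paper's library of complexity estimators.

```python
# Compression ratio as a crude Kolmogorov complexity estimate for two payloads.
import os
import zlib

def complexity_estimate(data: bytes) -> float:
    # Ratio of compressed to original size; closer to 1.0 means less compressible.
    return len(zlib.compress(data, 9)) / max(len(data), 1)

text_like   = ("the quick brown fox jumps over the lazy dog " * 50).encode()
random_like = os.urandom(2000)          # high-entropy payload, e.g. packed/executable data

print("text-like payload  :", round(complexity_estimate(text_like), 2))    # low ratio
print("random-like payload:", round(complexity_estimate(random_like), 2))  # near 1.0
```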

    Nestedness in complex networks: Observation, emergence, and implications

    The observed architecture of ecological and socio-economic networks differs significantly from that of random networks. From a network science standpoint, non-random structural patterns observed in real networks call for an explanation of their emergence and an understanding of their potential systemic consequences. This article focuses on one of these patterns: nestedness. Given a network of interacting nodes, nestedness can be described as the tendency for nodes to interact with subsets of the interaction partners of better-connected nodes. Known in biogeography for more than 80 years, nestedness has been found in systems as diverse as ecological mutualistic organizations, world trade, and inter-organizational relations, among many others. This review article focuses on three main pillars: the existing methodologies to observe nestedness in networks; the main theoretical mechanisms conceived to explain the emergence of nestedness in ecological and socio-economic networks; and the implications of a nested topology of interactions for the stability and feasibility of a given interacting system. We survey results from variegated disciplines, including statistical physics, graph theory, ecology, and theoretical economics. Nestedness has been found to emerge both in bipartite networks and, more recently, in unipartite ones; this review is the first comprehensive attempt to unify both streams of studies, usually disconnected from each other. We believe that this truly interdisciplinary endeavour, while rooted in a complex systems perspective, may inspire new models and algorithms whose realm of application will undoubtedly transcend disciplinary boundaries. Comment: In press. 140 pages, 34 figures
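
    A simplified sketch of a NODF-style nestedness score for a binary bipartite matrix is shown below; it illustrates the pairwise-overlap idea only and is not one of the exact estimators surveyed in the article.

```python
# Simplified NODF-style nestedness for a binary bipartite matrix
# (rows and columns could be, e.g., sites and species).
import numpy as np

def nodf(M: np.ndarray) -> float:
    def axis_scores(A):
        deg = A.sum(axis=1)
        scores = []
        for i in range(len(A)):
            for j in range(i + 1, len(A)):
                hi, lo = (i, j) if deg[i] > deg[j] else (j, i)
                if deg[hi] == deg[lo] or deg[lo] == 0:
                    scores.append(0.0)                 # decreasing fill is required
                else:
                    overlap = np.logical_and(A[hi], A[lo]).sum()
                    scores.append(overlap / deg[lo])   # paired overlap
        return scores
    return 100 * np.mean(axis_scores(M) + axis_scores(M.T))

M = np.array([[1, 1, 1, 1],
              [1, 1, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]])
print("NODF ~", round(nodf(M), 1))                     # perfectly nested matrix -> 100
```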

    Combining complex networks and data mining: why and how

    The increasing power of computer technology does not dispense with the need to extract meaningful information out of data sets of ever growing size, and indeed typically exacerbates the complexity of this task. To tackle this general problem, two methods have emerged, at chronologically different times, that are now commonly used in the scientific community: data mining and complex network theory. Not only do complex network analysis and data mining share the same general goal, that of extracting information from complex systems to ultimately create a new compact quantifiable representation, but they also often address similar problems. Despite this, surprisingly few researchers resort to both methodologies. One may then be tempted to conclude that these two fields are either largely redundant or totally antithetic. The starting point of this review is that this state of affairs should be put down to contingent rather than conceptual differences, and that the two fields can in fact advantageously be used in a synergistic manner. An overview of both fields is first provided, and some of their fundamental concepts are illustrated. A variety of contexts in which complex network theory and data mining have been used in a synergistic manner are then presented: contexts in which the appropriate integration of complex network metrics can lead to improved classification rates with respect to classical data mining algorithms and, conversely, contexts in which data mining can be used to tackle important issues in complex network theory applications. Finally, ways to achieve a tighter integration between complex networks and data mining, and open lines of research, are discussed. Comment: 58 pages, 19 figures
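
    One of the synergies described above, using complex-network metrics as features for a standard data-mining classifier, can be sketched roughly as follows; the graphs, labels, and metric choices are illustrative assumptions, not examples from the review.

```python
# Use NetworkX metrics as features for a scikit-learn classifier on toy graphs.
import networkx as nx
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def graph_features(G):
    # Three common complex-network metrics used here as classification features.
    return [nx.density(G), nx.average_clustering(G), nx.transitivity(G)]

graphs = [nx.erdos_renyi_graph(60, 0.08, seed=s) for s in range(20)] + \
         [nx.barabasi_albert_graph(60, 3, seed=s) for s in range(20)]
labels = [0] * 20 + [1] * 20                      # two hypothetical classes of systems

X = np.array([graph_features(G) for G in graphs])
clf = RandomForestClassifier(random_state=0).fit(X, labels)
print(clf.predict([graph_features(nx.barabasi_albert_graph(60, 3, seed=99))]))
```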

    A Study on Software Metrics and its Impact on Software Quality

    Software metrics offer a quantitative basis for assessing and predicting the software development process and, in this way, for improving software quality. Software quality should be achieved in order to satisfy the customer while decreasing software cost and improving the reliability of the software product. In this research, we discuss how software metrics affect the quality of software and at which stages of development they are applied. We discuss the different software metrics and how these metrics impact software quality and reliability. These techniques have been used to improve the quality of software and increase revenue. Comment: 14 pages, 10 figures, 11 tables

    Predicting Good Configurations for GitHub and Stack Overflow Topic Models

    Software repositories contain large amounts of textual data, ranging from source code comments and issue descriptions to questions, answers, and comments on Stack Overflow. To make sense of this textual data, topic modelling is frequently used as a text-mining tool for the discovery of hidden semantic structures in text bodies. Latent Dirichlet allocation (LDA) is a commonly used topic model that aims to explain the structure of a corpus by grouping texts. LDA requires multiple parameters to work well, and there are only rough and sometimes conflicting guidelines available on how these parameters should be set. In this paper, we contribute (i) a broad study of parameters to arrive at good local optima for GitHub and Stack Overflow text corpora, (ii) an a posteriori characterisation of text corpora related to eight programming languages, and (iii) an analysis of corpus feature importance via per-corpus LDA configuration. We find that (1) popular rules of thumb for topic modelling parameter configuration are not applicable to the corpora used in our experiments, (2) corpora sampled from GitHub and Stack Overflow have different characteristics and require different configurations to achieve good model fit, and (3) we can reliably predict good configurations for unseen corpora. These findings support researchers and practitioners in efficiently determining suitable configurations for topic modelling when analysing textual data contained in software repositories. Comment: to appear as a full paper at MSR 2019, the 16th International Conference on Mining Software Repositories
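
    The kind of LDA configuration the study tunes can be sketched with gensim; the toy corpus and parameter values below are illustrative assumptions, not the configurations the paper recommends.

```python
# Fit a small gensim LDA model and expose the parameters that typically need tuning.
from gensim import corpora
from gensim.models import LdaModel

docs = [["null", "pointer", "exception", "java"],
        ["git", "merge", "conflict", "branch"],
        ["pointer", "segfault", "memory", "c"]]
dictionary = corpora.Dictionary(docs)
bow = [dictionary.doc2bow(d) for d in docs]

lda = LdaModel(bow, id2word=dictionary,
               num_topics=2,              # a key parameter with conflicting rules of thumb
               alpha="auto", eta="auto",  # document-topic / topic-word priors
               passes=10, random_state=0)

for topic_id, words in lda.show_topics(num_topics=2, num_words=4, formatted=False):
    print(topic_id, [w for w, _ in words])
```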

    Sensory Metrics of Neuromechanical Trust

    Today, digital sources supply an unprecedented component of human sensorimotor data, and the consumption of this data is correlated with poorly understood maladies such as Internet Addiction Disorder and Internet Gaming Disorder. This paper offers a mathematical understanding of human sensorimotor processing as multiscale, continuous-time vibratory interaction. We quantify human informational needs using the signal-processing metrics of entropy, noise, dimensionality, continuity, latency, and bandwidth. Using these metrics, we define the trust humans experience as a primitive statistical algorithm processing fine-grained sensorimotor data from neuromechanical interaction. This definition of neuromechanical trust implies that artificial sensorimotor inputs and interactions that attract low-level attention through frequent discontinuities and enhanced coherence will decalibrate a brain's representation of its world over the long term by violating the implicit statistical contract for which self-calibration evolved. This approach allows us to model addiction in general as the result of homeostatic regulation gone awry in novel environments, and digital dependency as a sub-case in which the decalibration caused by digital sensorimotor data spurs yet more consumption of that data. We predict that institutions can use these sensorimotor metrics to quantify media richness and improve employee well-being; that dyads and family-size groups will bond and heal best through low-latency, high-resolution multisensory interaction such as shared meals and reciprocated touch; and that individuals can improve sensory and sociosensory resolution through deliberate sensory reintegration practices. We conclude that we humans are the victims of our own success, our hands so skilled they fill the world with captivating things, our eyes so innocent they follow eagerly. Comment: 59 pages, 14 figures