70 research outputs found

    Using Machine Learning and Graph Mining Approaches to Improve Software Requirements Quality: An Empirical Investigation

    Get PDF
    Software development is prone to software faults due to the involvement of multiple stakeholders especially during the fuzzy phases (requirements and design). Software inspections are commonly used in industry to detect and fix problems in requirements and design artifacts, thereby mitigating the fault propagation to later phases where the same faults are harder to find and fix. The output of an inspection process is list of faults that are present in software requirements specification document (SRS). The artifact author must manually read through the reviews and differentiate between true-faults and false-positives before fixing the faults. The first goal of this research is to automate the detection of useful vs. non-useful reviews. Next, post-inspection, requirements author has to manually extract key problematic topics from useful reviews that can be mapped to individual requirements in an SRS to identify fault-prone requirements. The second goal of this research is to automate this mapping by employing Key phrase extraction (KPE) algorithms and semantic analysis (SA) approaches to identify fault-prone requirements. During fault-fixations, the author has to manually verify the requirements that could have been impacted by a fix. The third goal of my research is to assist the authors post-inspection to handle change impact analysis (CIA) during fault fixation using NL processing with semantic analysis and mining solutions from graph theory. The selection of quality inspectors during inspections is pertinent to be able to carry out post-inspection tasks accurately. The fourth goal of this research is to identify skilled inspectors using various classification and feature selection approaches. The dissertation has led to the development of automated solution that can identify useful reviews, help identify skilled inspectors, extract most prominent topics/keyphrases from fault logs; and help RE author during the fault-fixation post inspection

    Visual analytics for relationships in scientific data

    Get PDF
    Domain scientists hope to address grand scientific challenges by exploring the abundance of data generated and made available through modern high-throughput techniques. Typical scientific investigations can make use of novel visualization tools that enable dynamic formulation and fine-tuning of hypotheses to aid the process of evaluating sensitivity of key parameters. These general tools should be applicable to many disciplines: allowing biologists to develop an intuitive understanding of the structure of coexpression networks and discover genes that reside in critical positions of biological pathways, intelligence analysts to decompose social networks, and climate scientists to model extrapolate future climate conditions. By using a graph as a universal data representation of correlation, our novel visualization tool employs several techniques that when used in an integrated manner provide innovative analytical capabilities. Our tool integrates techniques such as graph layout, qualitative subgraph extraction through a novel 2D user interface, quantitative subgraph extraction using graph-theoretic algorithms or by querying an optimized B-tree, dynamic level-of-detail graph abstraction, and template-based fuzzy classification using neural networks. We demonstrate our system using real-world workflows from several large-scale studies. Parallel coordinates has proven to be a scalable visualization and navigation framework for multivariate data. However, when data with thousands of variables are at hand, we do not have a comprehensive solution to select the right set of variables and order them to uncover important or potentially insightful patterns. We present algorithms to rank axes based upon the importance of bivariate relationships among the variables and showcase the efficacy of the proposed system by demonstrating autonomous detection of patterns in a modern large-scale dataset of time-varying climate simulation

    Community detection in graphs

    Full text link
    The modern science of networks has brought significant advances to our understanding of complex systems. One of the most relevant features of graphs representing real systems is community structure, or clustering, i. e. the organization of vertices in clusters, with many edges joining vertices of the same cluster and comparatively few edges joining vertices of different clusters. Such clusters, or communities, can be considered as fairly independent compartments of a graph, playing a similar role like, e. g., the tissues or the organs in the human body. Detecting communities is of great importance in sociology, biology and computer science, disciplines where systems are often represented as graphs. This problem is very hard and not yet satisfactorily solved, despite the huge effort of a large interdisciplinary community of scientists working on it over the past few years. We will attempt a thorough exposition of the topic, from the definition of the main elements of the problem, to the presentation of most methods developed, with a special focus on techniques designed by statistical physicists, from the discussion of crucial issues like the significance of clustering and how methods should be tested and compared against each other, to the description of applications to real networks.Comment: Review article. 103 pages, 42 figures, 2 tables. Two sections expanded + minor modifications. Three figures + one table + references added. Final version published in Physics Report

    Graph ambiguity

    Get PDF
    In this paper, we propose a rigorous way to define the concept of ambiguity in the domain of graphs. In past studies, the classical definition of ambiguity has been derived starting from fuzzy set and fuzzy information theories. Our aim is to show that also in the domain of the graphs it is possible to derive a formulation able to capture the same semantic and mathematical concept. To strengthen the theoretical results, we discuss the application of the graph ambiguity concept to the graph classification setting, conceiving a new kind of inexact graph matching procedure. The results prove that the graph ambiguity concept is a characterizing and discriminative property of graphs. (C) 2013 Elsevier B.V. All rights reserved

    The modular structure of brain functional connectivity networks: a graph theoretical approach

    Get PDF
    Complex networks theory offers a framework for the analysis of brain functional connectivity as measured by magnetic resonance imaging. Within this approach the brain is represented as a graph comprising nodes connected by links, with nodes corresponding to brain regions and the links to measures of inter-regional interaction. A number of graph theoretical methods have been proposed to analyze the modular structure of these networks. The most widely used metric is Newman's Modularity, which identifies modules within which links are more abundant than expected on the basis of a random network. However, Modularity is limited in its ability to detect relatively small communities, a problem known as ``resolution limit''. As a consequence, unambiguously identifiable modules, like complete sub-graphs, may be unduly merged into larger communities when they are too small compared to the size of the network. This limit, first demonstrated for Newman's Modularity, is quite general and affects, to a different extent, all methods that seek to identify the community structure of a network through the optimization of a global quality function. Hence, the resolution limit may represent a critical shortcoming for the study of brain networks, and is likely to have affected many of the studies reported in the literature. This work pioneers the use of Surprise and Asymptotical Surprise, two quality functions rooted in probability theory that aims at overcoming the resolution limit for both binary and weighted networks. Hereby, heuristics for their optimization are developed and tested, showing that the resulting optimal partitioning can highlight anatomically and functionally plausible modules from brain connectivity datasets, on binary and weighted networks. This novel approach is applied to the partitioning of two different human brain networks that have been extensively characterized in the literature, to address the resolution-limit issue in the study of the brain modular structure. Surprise maximization in human resting state networks revealed the presence of a rich structure of modules with heterogeneous size distribution undetectable by current methods. Moreover, Surprise led to different, more accurate classification of the network's connector hubs, the elements that integrate the brain modules into a cohesive structure. In synthetic networks, Asymptotical Surprise showed high sensitivity and specificity in the detection of ground-truth structures, particularly in the presence of noise and variability such as those observed in experimental functional MRI data. Finally, the methodological advances hereby introduced are shown to be a helpful tool to better discern differences between the modular organization of functional connectivity of healthy subjects and schizophrenic patients. Importantly, these differences may point to new clinical hypotheses on the etiology of schizophrenia, and they would have gone unnoticed with resolution-limited methods. This may call for a revisitation of some of the current models of the modular organization of the healthy and diseased brain

    Physical Models in Community Detection with Applications to Identifying Structure in Complex Amorphous Systems

    Get PDF
    We present an exceptionally accurate spin-glass-type Potts model for the graph theoretic problem of community detection. With a simple algorithm, we find that our approach is exceptionally accurate, robust to the effects of noise, and competitive with the best currently available algorithms in terms of speed and the size of solvable systems. Being a local measure of community structure, our Potts model is free from a resolution limit that hinders community solutions for some popular community detection models. It further remains a local measure on weighted and directed graphs. We apply our community detection method to accurately and quantitatively evaluate the multi-scale: multiresolution ) structure of a graph. Our multiresolution algorithm calculates correlations among multiple copies: replicas ) of the same graph over a range of resolutions. Significant multiresolution structures are identified by strongly correlated replicas. The average normalized mutual information and variation of information give a quantitative estimate of the best resolutions and indicate the relative strength of the structures in the graph. We further investigate a phase transition effect in community detection, and we elaborate on its relation to analogous physical phase transitions. Finally, we apply our community detection methods to ascertain the most natural complex amorphous structures in two model glasses in an unbiased manner. We construct a model graph for the physical systems using the potential energy to generate weighted edge relationships for all pairs of atoms. We then solve for the communities within the model network and associate the best communities with the natural structures in the physical systems

    Algorithms and Software for the Analysis of Large Complex Networks

    Get PDF
    The work presented intersects three main areas, namely graph algorithmics, network science and applied software engineering. Each computational method discussed relates to one of the main tasks of data analysis: to extract structural features from network data, such as methods for community detection; or to transform network data, such as methods to sparsify a network and reduce its size while keeping essential properties; or to realistically model networks through generative models
    • 

    corecore