29 research outputs found

    Community structure of complex software systems: Analysis and applications

    Full text link
    Due to notable discoveries in the fast evolving field of complex networks, recent research in software engineering has also focused on representing software systems with networks. Previous work has observed that these networks follow scale-free degree distributions and reveal small-world phenomena, while we here explore another property commonly found in different complex networks, i.e. community structure. We adopt class dependency networks, where nodes represent software classes and edges represent dependencies among them, and show that these networks reveal a significant community structure, characterized by similar properties as observed in other complex networks. However, although intuitive and anticipated by different phenomena, identified communities do not exactly correspond to software packages. We empirically confirm our observations on several networks constructed from Java and various third party libraries, and propose different applications of community detection to software engineering

    Supporting feature-level software maintenance

    Get PDF
    Software maintenance is the process of modifying a software system to fix defects, improve performance, add new functionality, or adapt the system to a new environment. A maintenance task is often initiated by a bug report or a request for new functionality. Bug reports typically describe problems with incorrect behaviors or functionalities. These behaviors or functionalities are known as features. Even in very well-designed systems, the source code that implements features is often not completely modularized. The delocalized nature of features makes maintaining them challenging. Since maintenance tasks are expressed in terms of features, the goal of this dissertation is to support software maintenance at the feature-level. We focus on two tasks in particular: feature location and impact analysis via feature coupling.;Feature location is the process of identifying the source code that implements a feature, and it is an essential first step to any maintenance task. There are many existing techniques for feature location that incorporate various types of analyses such as static, dynamic, and textual. In this dissertation, we recognize the advantages of leveraging several types of analyses and introduce a new approach to feature location based on combining dynamic analysis, textual analysis, and web mining algorithms applied to software. The use of web mining for feature location is a novel contribution, and we show that our new techniques based on web mining are significantly more effective than the current state of the art.;After using feature location to identify a feature\u27s source code, maintenance can be completed on that feature. Impact analysis should then be performed to revalidate the system and determine which other features may have been affected by the modifications. We define three feature coupling metrics that capture the relationship between features based on structural information, textual information, and their combination. Our novel feature coupling metrics can be used for impact analysis to quantify the strength of coupling between pairs of features. We performed three empirical studies on open-source software systems to assess the feature coupling metrics and established three major results. First, there is a moderate to strong statistically significant correlation between feature coupling and faults. Second, feature coupling can be used to correctly determine about half of the other features that would be affected by a change to a given feature. Finally, we found that the metrics align with developers\u27 opinions about pairs of features that are actually coupled

    Mining temporal rules for software maintenance

    Get PDF
    10.1002/smr.375Journal of Software Maintenance and Evolution204227-247JSME

    Classes-Chave em sistemas orientados a objetos: detecção e uso

    Get PDF
    Several object-oriented systems, such as Lucene, Tomcat, Javac have their respective design documented using key-classes, defined as important/central classes to understand the object-oriented design. Considering this fact, and considering that, in general, software architecture is not formally documented to help developers understanding and assessing software design, Keecle is proposed as an approach based on dynamic and static analysis for detection of key classes in a semi-automatic way. The application of filtering mechanisms on the search space of the dynamic data is proposed in order to obtain a reduced set of key classes. The approach is evaluated with fourteen proprietary and open source systems in order to verify that the found classes correspond to the key classes of the ground-truth, which is defined from the documentation or defined by the developers. The results were analyzed in terms of precision and recall, and have shown to be superior to the state-of-the-art approach. The role of key classes in assessing design has also been investigated. The organization of the key classes in a dependency graph, which highlights explicit dependency relations in the source code, was evaluated to be adequate for design comprehension and assessment. Key classes were evaluated whether they are more prone to bad smells, and whether specific types of bad smells are associated with different levels of cohesion and coupling metrics. In addition, the ownership of key classes was shown to be more concentrated in a reduced set of developers. Finally, we conducted an experimental study with students and a survey with developers to evaluate documentation based on key classes. The results indicate that the documentation based on key classes are a feasible alternative for use as complementary documentation to the existing one, or for use as main documentation in environments where documentation is not available.FAPEG - Fundação de Amparo à Pesquisa do Estado de GoiásTese (Doutorado)Vários sistemas orientados a objetos, tais como Lucene, Tomcat, Javac tem seus respectivos projetos (designs) documentados usando classes-chave, definidas como sendo classes importantes/centrais para compreender o projeto de sistemas orientados a objetos. Considerando este fato, e considerando que geralmente a arquitetura não é formalmente documentada para auxiliar os desenvolvedores a entenderem e avaliarem o projeto do software, é proposta Keecle, uma abordagem baseada em análise dinâmica e estática para detecção de classes-chave de maneira semi-automática. É proposta a aplicação de mecanismos de filtragem no espaço de busca dos dados dinâmicos, para obter um conjunto reduzido de classes-chave. A abordagem é avaliada com quatorze sistemas de código aberto e proprietários, a fim de verificar se as classes encontradas correspondem às classes-chave definidas na documentação ou definidas pelos desenvolvedores. Os resultados foram analisados em termos de precisão e recall e são superiores às abordagens da literatura. O papel das classes-chave para avaliar o projeto também foi investigado. Foi avaliado se a organização das classes-chave em um grafo de dependências, o qual destaca relações de dependência explícitas no código fonte, é um mecanismo adequado para avaliar o design. Foi analisado estatisticamente, se classes-chave são mais propensas a bad smells, e se tipos específicos de bad smells estão associados a diferentes níveis de métricas de coesão e acoplamento. Além disso, a propriedade (ownership) das classes-chave foi analisada, indicando concentração em um conjunto reduzido de desenvolvedores. Por fim, foram conduzidos um estudo experimental com estudantes e um survey com desenvolvedores para avaliar a documentação baseada em classes-chave. Os resultados demonstram que a documentação baseada em classes-chave apresenta resultados que indicam a viabilidade de uso como documentação complementar à existente ou como documentação principal em ambientes onde a documentação não está disponível

    Supporting Source Code Feature Analysis Using Execution Trace Mining

    Get PDF
    Software maintenance is a significant phase of a software life-cycle. Once a system is developed the main focus shifts to maintenance to keep the system up to date. A system may be changed for various reasons such as fulfilling customer requirements, fixing bugs or optimizing existing code. Code needs to be studied and understood before any modification is done to it. Understanding code is a time intensive and often complicated part of software maintenance that is supported by documentation and various tools such as profilers, debuggers and source code analysis techniques. However, most of the tools fail to assist in locating the portions of the code that implement the functionality the software developer is focusing. Mining execution traces can help developers identify parts of the source code specific to the functionality of interest and at the same time help them understand the behaviour of the code. We propose a use-driven hybrid framework of static and dynamic analyses to mine and manage execution traces to support software developers in understanding how the system's functionality is implemented through feature analysis. We express a system's use as a set of tests. In our approach, we develop a set of uses that represents how a system is used or how a user uses some specific functionality. Each use set describes a user's interaction with the system. To manage large and complex traces we organize them by system use and segment them by user interface events. The segmented traces are also clustered based on internal and external method types. The clusters are further categorized into groups based on application programming interfaces and active clones. To further support comprehension we propose a taxonomy of metrics which are used to quantify the trace. To validate the framework we built a tool called TrAM that implements trace mining and provides visualization features. It can quantify the trace method information, mine similar code fragments called active clones, cluster methods based on types, categorise them based on groups and quantify their behavioural aspects using a set of metrics. The tool also lets the users visualize the design and implementation of a system using images, filtering, grouping, event and system use, and present them with values calculated using trace, group, clone and method metrics. We also conducted a case study on five different subject systems using the tool to determine the dynamic properties of the source code clones at runtime and answer three research questions using our findings. We compared our tool with trace mining tools and profilers in terms of features, and scenarios. Finally, we evaluated TrAM by conducting a user study on its effectiveness, usability and information management

    Regression test selection for distributed Java RMI programs by means of formal concept analysis

    Get PDF
    Software maintenance is the process of modifying an existing system to ensure that it meets current and future requirements. As a result, performing regression testing becomes an essential but time consuming aspect of any maintenance activity. Regression testing is initiated after a programmer has made changes to a program that may have inadvertently introduced errors. It is a quality control approach to ensure that the newly modified code still complies with its specified requirements and that unmodified code has not been affected by the maintenance activity. In the literature various types of test selection techniques have been proposed to reduce the effort associated with re-executing the required test cases. However, the majority of these approach has been focusing only on sequential programs, and provide no or only very limited support for distributed programs or database-driven applications. The thesis presents a lightweight methodology, which applies Formal Concept Analysis to support a regression test selection analysis, in combination with execution trace collection and external data sharing analysis, for distributed Java RMI programs. Two Eclipse plug-ins were developed to automate the regression test selection process and to evaluate our methodology

    Network analysis of large scale object oriented software systems

    Get PDF
    PhD ThesisThe evolution of software engineering knowledge, technology, tools, and practices has seen progressive adoption of new design paradigms. Currently, the predominant design paradigm is object oriented design. Despite the advocated and demonstrated benefits of object oriented design, there are known limitations of static software analysis techniques for object oriented systems, and there are many current and legacy object oriented software systems that are difficult to maintain using the existing reverse engineering techniques and tools. Consequently, there is renewed interest in dynamic analysis of object oriented systems, and the emergence of large and highly interconnected systems has fuelled research into the development of new scalable techniques and tools to aid program comprehension and software testing. In dynamic analysis, a key research problem is efficient interpretation and analysis of large volumes of precise program execution data to facilitate efficient handling of software engineering tasks. Some of the techniques, employed to improve the efficiency of analysis, are inspired by empirical approaches developed in other fields of science and engineering that face comparable data analysis challenges. This research is focused on application of empirical network analysis measures to dynamic analysis data of object oriented software. The premise of this research is that the methods that contribute significantly to the object collaboration network's structural integrity are also important for delivery of the software system’s function. This thesis makes two key contributions. First, a definition is proposed for the concept of the functional importance of methods of object oriented software. Second, the thesis proposes and validates a conceptual link between object collaboration networks and the properties of a network model with power law connectivity distribution. Results from empirical software engineering experiments on JHotdraw and Google Chrome are presented. The results indicate that five considered standard centrality based network measures can be used to predict functionally important methods with a significant level of accuracy. The search for functional importance of software elements is an essential starting point for program comprehension and software testing activities. The proposed definition and application of network analysis has the potential to improve the efficiency of post release phase software engineering activities by facilitating rapid identification of potentially functionally important methods in object oriented software. These results, with some refinement, could be used to perform change impact prediction and a host of other potentially beneficial applications to improve software engineering techniques

    A unified framework for the comprehension of software's time dimension

    Get PDF
    Les logiciels sont de plus en plus complexes et leur développement est souvent fait par des équipes dispersées et changeantes. Par ailleurs, de nos jours, la majorité des logiciels sont recyclés au lieu d’être développés à partir de zéro. La tâche de compréhension, inhérente aux tâches de maintenance, consiste à analyser plusieurs dimensions du logiciel en parallèle. La dimension temps intervient à deux niveaux dans le logiciel : il change durant son évolution et durant son exécution. Ces changements prennent un sens particulier quand ils sont analysés avec d’autres dimensions du logiciel. L’analyse de données multidimensionnelles est un problème difficile à résoudre. Cependant, certaines méthodes permettent de contourner cette difficulté. Ainsi, les approches semi-automatiques, comme la visualisation du logiciel, permettent à l’usager d’intervenir durant l’analyse pour explorer et guider la recherche d’informations. Dans une première étape de la thèse, nous appliquons des techniques de visualisation pour mieux comprendre la dynamique des logiciels pendant l’évolution et l’exécution. Les changements dans le temps sont représentés par des heat maps. Ainsi, nous utilisons la même représentation graphique pour visualiser les changements pendant l’évolution et ceux pendant l’exécution. Une autre catégorie d’approches, qui permettent de comprendre certains aspects dynamiques du logiciel, concerne l’utilisation d’heuristiques. Dans une seconde étape de la thèse, nous nous intéressons à l’identification des phases pendant l’évolution ou pendant l’exécution en utilisant la même approche. Dans ce contexte, la prémisse est qu’il existe une cohérence inhérente dans les évènements, qui permet d’isoler des sous-ensembles comme des phases. Cette hypothèse de cohérence est ensuite définie spécifiquement pour les évènements de changements de code (évolution) ou de changements d’état (exécution). L’objectif de la thèse est d’étudier l’unification de ces deux dimensions du temps que sont l’évolution et l’exécution. Ceci s’inscrit dans notre volonté de rapprocher les deux domaines de recherche qui s’intéressent à une même catégorie de problèmes, mais selon deux perspectives différentes.Software systems are getting more and more complex and are developed by teams that are constantly changing and not necessarily working in the same location. Moreover, most software systems, nowadays, are recycled rather than being developed from scratch. A comprehension task is crucial when performing maintenance tasks; it consists of analyzing multiple software dimensions concurrently. Time is one of these dimensions, as software changes its state with time in two manners: during their execution and during their evolution. These changes make sense only when analyzed within the context of other software dimensions, such as structure or bug information. Multidimensional analysis is a difficult problem to solve. However, there are certain methods that bypass this difficulty, such as semi-automatic approaches. Software visualization is one of them, as it allows being part of the analysis by exploring and guiding information search. The first stage of the thesis consists of applying visualization techniques to better understand software dynamicity during execution and evolution. Changes over time are represented by heat maps. Hence, we utilize the same graphical representation to visualize both change types over time. Other approaches that permit the analysis of a program’s dynamic behavior over time involve the use of heuristics. In the thesis’ second stage, we are interested in the identification of the programs’ execution phases and evolution patterns using the same approach, i.e. search-based optimisation. In this context, the premise is the existence of internal cohesion between change events that allow the clustering in phases. This hypothesis of cohesion is defined specifically for change events in the code during software evolution and state changes during program execution. This thesis’ main objective is to study the unification of these two time dimensions, evolution and execution, in an attempt to bring together two research domains that work on the same set of problems, but from two different perspectives
    corecore