724 research outputs found

    Structural Complexity and Decay in FLOSS Systems: An Inter-Repository Study

    Get PDF
    Past software engineering literature has firmly established that software architectures and the associated code decay over time. Architectural decay is, potentially, a major issue in Free/Libre/Open Source Software (FLOSS) projects, since developers sporadically joining FLOSS projects do not always have a clear understanding of the underlying architecture, and may break the overall conceptual structure by several small changes to the code base. This paper investigates whether the structure of a FLOSS system and its decay can also be influenced by the repository in which it is retained: specifically, two FLOSS repositories are studied to understand whether the complexity of the software structure in the sampled projects is comparable, or one repository hosts more complex systems than the other. It is also studied whether the effort to counteract this complexity is dependent on the repository, and the governance it gives to the hosted projects. The results of the paper are two-fold: on one side, it is shown that the repository hosting larger and more active projects presents more complex structures. On the other side, these larger and more complex systems benefit from more anti-regressive work to reduce this complexity

    IDENTIFICATION AND QUANTIFICATION OF VARIABILITY MEASURES AFFECTING CODE REUSABILITY IN OPEN SOURCE ENVIRONMENT

    Get PDF
    Open source software (OSS) is one of the emerging areas in software engineering, and is gaining the interest of the software development community. OSS was started as a movement, and for many years software developers contributed to it as their hobby (non commercial purpose). Now, OSS components are being reused in CBSD (commercial purpose). However, recently, the use of OSS in SPL is envisioned recently by software engineering researchers, thus bringing it into a new arena. Being an emerging research area, it demands exploratory study to explore the dimensions of this phenomenon. Furthermore, there is a need to assess the reusability of OSS which is the focal point of these disciplines (CBSE, SPL, and OSS). In this research, a mixed method based approach is employed which is specifically 'partially mixed sequential dominant study'. It involves both qualitative (interviews) and quantitative phases (survey and experiment). During the qualitative phase seven respondents were involved, sample size of survey was 396, and three experiments were conducted. The main contribution of this study is results of exploration of the phenomenon 'reuse of OSS in reuse intensive software development'. The findings include 7 categories and 39 dimensions. One of the dimension factors affecting reusability was carried to the quantitative phase (survey and experiment). On basis of the findings, proposal for reusability attribute model was presented at class and package level. Variability is one of the newly identified attribute of reusability. A comprehensive theoretical analysis of variability implementation mechanisms is conducted to propose metrics for its assessment. The reusability attribute model is validated by statistical analysis of I 03 classes and 77 packages. An evolutionary reusability analysis of two open source software was conducted, where different versions of software are analyzed for their reusability. The results show a positive correlation between variability and reusability at package level and validate the other identified attributes. The results would be helpful to conduct further studies in this area

    Identification-method research for open-source software ecosystems

    Get PDF
    In recent years, open-source software (OSS) development has grown, with many developers around the world working on different OSS projects. A variety of open-source software ecosystems have emerged, for instance, GitHub, StackOverflow, and SourceForge. One of the most typical social-programming and code-hosting sites, GitHub, has amassed numerous open-source-software projects and developers in the same virtual collaboration platform. Since GitHub itself is a large open-source community, it hosts a collection of software projects that are developed together and coevolve. The great challenge here is how to identify the relationship between these projects, i.e., project relevance. Software-ecosystem identification is the basis of other studies in the ecosystem. Therefore, how to extract useful information in GitHub and identify software ecosystems is particularly important, and it is also a research area in symmetry. In this paper, a Topic-based Project Knowledge Metrics Framework (TPKMF) is proposed. By collecting the multisource dataset of an open-source ecosystem, project-relevance analysis of the open-source software is carried out on the basis of software-ecosystem identification. Then, we used our Spectral Clustering algorithm based on Core Project (CP-SC) to identify software-ecosystem projects and further identify software ecosystems. We verified that most software ecosystems usually contain a core software project, and most other projects are associated with it. Furthermore, we analyzed the characteristics of the ecosystem, and we also found that interactive information has greater impact on project relevance. Finally, we summarize the Topic-based Project Knowledge Metrics Framework

    An Empirical Study on the Joint Impact of Feature Selection and Data Re-sampling on Imbalance Classification

    Get PDF
    In predictive tasks, real-world datasets often present di erent degrees of imbalanced (i.e., long-tailed or skewed) distributions. While the majority (the head or the most frequent) classes have su cient samples, the minority (the tail or the less frequent or rare) classes can be under-represented by a rather limited number of samples. Data pre-processing has been shown to be very e ective in dealing with such problems. On one hand, data re-sampling is a common approach to tackling class imbalance. On the other hand, dimension reduction, which reduces the feature space, is a conventional technique for reducing noise and inconsistencies in a dataset. However, the possible synergy between feature selection and data re-sampling for high-performance imbalance classification has rarely been investigated before. To address this issue, we carry out a comprehensive empirical study on the joint influence of feature selection and re-sampling on two-class imbalance classification. Specifically, we study the performance of two opposite pipelines for imbalance classification by applying feature selection before or after data re-sampling. We conduct a large number of experiments, with a total of 9225 tests, on 52 publicly available datasets, using 9 feature selection methods, 6 resampling approaches for class imbalance learning, and 3 well-known classification algorithms. Experimental results show that there is no constant winner between the two pipelines; thus both of them should be considered to derive the best performing model for imbalance classification. We find that the performance of an imbalance classification model not only depends on the classifier adopted and the ratio between the number of majority and minority samples, but also depends on the ratio between the number of samples and features. Overall, this study should provide new reference value for researchers and practitioners in imbalance learning.TIN2017-89517-

    Successful Reuse of Software Components: A Report from the Open Source Perspective

    Get PDF
    A promising way of software reuse is Component-Based Software Development (CBSD). There is an increasing number of OSS products available that can be freely used in product development. How- ever, OSS communities themselves have not yet taken full advantage of the “reuse mechanism”. Many OSS projects duplicate effort and code, even when sharing the same application domain and topic. One suc- cessful counter-example is the FFMpeg multimedia project, since several of its components are widely and consistently reused into other OSS projects. This paper documents the history of the libavcodec library of components from the FFMpeg project, which at present is reused in more than 140 OSS projects. Most of the recipients use it as a black- box component, although a number of OSS projects keep a copy of it in their repositories, and modify it as such. In both cases, we argue that libavcodec is a successful example of reusable OSS library of compo- nents

    An empirical study of package coupling in Java open-source

    Get PDF
    Excessive coupling between object-oriented classes in systems is generally acknowledged as harmful and is recognised as a maintenance problem that can result in a higher propensity for faults in systems and a „stored up‟ future problem. Characterisation and understanding coupling at different levels of abstraction is therefore important for both the project manager and developer both of whom have a vested interest in software quality. In this Thesis, coupling trends are empirically investigated over multiple versions of seven Java open-source systems (OSS). The first investigation explores the trends in longitudinal changes to open-source systems given by six coupling metrics. Coupling trends are then explored from the perspective of: the relationship between removed classes and their coupling with other classes in the same package; the relationships between coupling and 'warnings’ in packages and the time interval between versions in Java OSS; the relationship between some of these coupling metrics are also explored. Finally, the existence of an 80/20 rule for the coupling metrics is inspected. Results suggest that developer activity comprises a set of high and low periods (peak and trough‟ effect) evident as a system evolves. Findings also demonstrate that addition of coupling may have beneficial effects on a system, particularly if they are added as new functionality through the package Java feature. The fan-in and fan-out coupling metrics reveal particular features and exhibited a wide range of traits in the classes depending on their high or low values; finally, we revealed that one metric (fan-in) is the only metric that appears consistently to exhibit an 80/20 (Pareto) relationship.EThOS - Electronic Theses Online ServiceGBUnited Kingdo

    Structural Complexity and Decay in FLOSS Systems: An Inter-Repository Study

    Get PDF
    Past software engineering literature has firmly established that software architectures and the associated code decay over time. Architectural decay is, potentially, a major issue in Free/Libre/Open Source Software (FLOSS) projects, since developers sporadically joining FLOSSprojects do not always have a clear understanding of the underlying architecture, and may break the overall conceptual structure by several small changes to the code base. This paper investigates whether the structure of a FLOSS system and its decay can also be influenced by the repository in which it is retained: specifically, two FLOSS repositories are studied to understand whether the complexity of the software structure in the sampled projects is comparable, or one repository hosts more complex systems than the other. It is also studied whether the effort to counteract this complexity is dependent on the repository, and the governance it gives to the hosted projects. The results of the paper are two-fold: on one side, it is shown that the repository hosting larger and more active projects presents more complex structures. On the other side, these larger and more complex systems benefit from more anti-regressive work to reduce this complexity
    corecore