724 research outputs found
Recommended from our members
Similarities, challenges and opportunities of wikipedia content and open source projects
Copyright @ 2012 John Wiley & Sons, Ltd.Several years of research and evidence have demonstrated that Open Source Software (OSS) portals often contain a large amount of software projects that simply do not evolve, developed by relatively small communities, struggling to attract a sustained number of contributors. These portals have started to
increasingly act as a storage for abandoned projects, and researchers and practitioners should try and point out how to take advantage of such content. Similarly, other online content portals (like Wikipedia) could be harvested for valuable content. In this paper we argue that, even with differences in the requested expertise, many projects reliant on content and contributions by users undergo a similar evolution, and follow similar patterns: when a project fails to attract contributors, it appears to be not evolving, or abandoned. Far from a negative finding, even those projects could provide valuable content that should be harvested and identified based on common characteristics: by using the attributes of “usefulness” and “modularity” we isolate valuable content in both Wikipedia pages and OSS projects
Structural Complexity and Decay in FLOSS Systems: An Inter-Repository Study
Past software engineering literature has firmly established that software architectures and the associated code decay over time. Architectural decay is, potentially, a major issue in Free/Libre/Open Source Software (FLOSS) projects, since developers sporadically joining FLOSS projects do not always have a clear understanding of the underlying architecture, and may break the overall conceptual structure by several small changes to the code base.
This paper investigates whether the structure of a FLOSS system and its decay can also be influenced by the repository in which it is retained: specifically,
two FLOSS repositories are studied to understand whether the complexity of the software structure in the sampled projects is comparable, or one repository hosts more complex systems than the other. It is also studied
whether the effort to counteract this complexity is dependent on the repository, and the governance it gives to the hosted projects.
The results of the paper are two-fold: on one side, it is shown that the repository hosting larger and more active projects presents more complex structures. On the other side, these larger and more complex systems benefit
from more anti-regressive work to reduce this complexity
Recommended from our members
An empirical study of package coupling in Java open-source
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.Excessive coupling between object-oriented classes in systems is generally acknowledged as harmful and is recognised as a maintenance problem that can result in a higher propensity for faults in systems and a „stored up‟ future problem. Characterisation and understanding coupling at different levels of abstraction is therefore important for both the project manager and developer both of whom have a vested interest in software quality. In this Thesis, coupling trends are empirically investigated over multiple versions of seven Java open-source systems (OSS). The first investigation explores the trends in longitudinal changes to open-source systems given by six coupling metrics. Coupling trends are then explored from the perspective of: the relationship between removed classes and their coupling with other classes in the same package; the relationships between coupling and 'warnings’ in packages and the time interval between versions in Java OSS; the relationship between some of these coupling metrics are also explored. Finally, the existence of an 80/20 rule for the coupling metrics is inspected. Results suggest that developer activity comprises a set of high and low periods (peak and trough‟ effect) evident as a system evolves. Findings also demonstrate that addition of coupling may have beneficial effects on a system, particularly if they are added as new functionality through the package Java feature. The fan-in and fan-out coupling metrics reveal particular features and exhibited a wide range of traits in the classes depending on their high or low values; finally, we revealed that one metric (fan-in) is the only metric that appears consistently to exhibit an 80/20 (Pareto) relationship
IDENTIFICATION AND QUANTIFICATION OF VARIABILITY MEASURES AFFECTING CODE REUSABILITY IN OPEN SOURCE ENVIRONMENT
Open source software (OSS) is one of the emerging areas in software engineering, and
is gaining the interest of the software development community. OSS was started as a
movement, and for many years software developers contributed to it as their hobby
(non commercial purpose). Now, OSS components are being reused in CBSD
(commercial purpose). However, recently, the use of OSS in SPL is envisioned
recently by software engineering researchers, thus bringing it into a new arena. Being
an emerging research area, it demands exploratory study to explore the dimensions of
this phenomenon. Furthermore, there is a need to assess the reusability of OSS which
is the focal point of these disciplines (CBSE, SPL, and OSS). In this research, a mixed
method based approach is employed which is specifically 'partially mixed sequential
dominant study'. It involves both qualitative (interviews) and quantitative phases
(survey and experiment). During the qualitative phase seven respondents were
involved, sample size of survey was 396, and three experiments were conducted. The
main contribution of this study is results of exploration of the phenomenon 'reuse of
OSS in reuse intensive software development'. The findings include 7 categories and
39 dimensions. One of the dimension factors affecting reusability was carried to the
quantitative phase (survey and experiment). On basis of the findings, proposal for
reusability attribute model was presented at class and package level. Variability is one
of the newly identified attribute of reusability. A comprehensive theoretical analysis
of variability implementation mechanisms is conducted to propose metrics for its
assessment. The reusability attribute model is validated by statistical analysis of I 03
classes and 77 packages. An evolutionary reusability analysis of two open source
software was conducted, where different versions of software are analyzed for their
reusability. The results show a positive correlation between variability and reusability
at package level and validate the other identified attributes. The results would be
helpful to conduct further studies in this area
Identification-method research for open-source software ecosystems
In recent years, open-source software (OSS) development has grown, with many developers around the world working on different OSS projects. A variety of open-source software ecosystems have emerged, for instance, GitHub, StackOverflow, and SourceForge. One of the most typical social-programming and code-hosting sites, GitHub, has amassed numerous open-source-software projects and developers in the same virtual collaboration platform. Since GitHub itself is a large open-source community, it hosts a collection of software projects that are developed together and coevolve. The great challenge here is how to identify the relationship between these projects, i.e., project relevance. Software-ecosystem identification is the basis of other studies in the ecosystem. Therefore, how to extract useful information in GitHub and identify software ecosystems is particularly important, and it is also a research area in symmetry. In this paper, a Topic-based Project Knowledge Metrics Framework (TPKMF) is proposed. By collecting the multisource dataset of an open-source ecosystem, project-relevance analysis of the open-source software is carried out on the basis of software-ecosystem identification. Then, we used our Spectral Clustering algorithm based on Core Project (CP-SC) to identify software-ecosystem projects and further identify software ecosystems. We verified that most software ecosystems usually contain a core software project, and most other projects are associated with it. Furthermore, we analyzed the characteristics of the ecosystem, and we also found that interactive information has greater impact on project relevance. Finally, we summarize the Topic-based Project Knowledge Metrics Framework
An Empirical Study on the Joint Impact of Feature Selection and Data Re-sampling on Imbalance Classification
In predictive tasks, real-world datasets often present di erent degrees of imbalanced (i.e., long-tailed or skewed) distributions.
While the majority (the head or the most frequent) classes have su cient samples, the minority (the tail or
the less frequent or rare) classes can be under-represented by a rather limited number of samples. Data pre-processing
has been shown to be very e ective in dealing with such problems. On one hand, data re-sampling is a common
approach to tackling class imbalance. On the other hand, dimension reduction, which reduces the feature space, is a
conventional technique for reducing noise and inconsistencies in a dataset. However, the possible synergy between
feature selection and data re-sampling for high-performance imbalance classification has rarely been investigated before.
To address this issue, we carry out a comprehensive empirical study on the joint influence of feature selection and
re-sampling on two-class imbalance classification. Specifically, we study the performance of two opposite pipelines
for imbalance classification by applying feature selection before or after data re-sampling. We conduct a large number
of experiments, with a total of 9225 tests, on 52 publicly available datasets, using 9 feature selection methods, 6 resampling
approaches for class imbalance learning, and 3 well-known classification algorithms. Experimental results
show that there is no constant winner between the two pipelines; thus both of them should be considered to derive
the best performing model for imbalance classification. We find that the performance of an imbalance classification
model not only depends on the classifier adopted and the ratio between the number of majority and minority samples,
but also depends on the ratio between the number of samples and features. Overall, this study should provide new
reference value for researchers and practitioners in imbalance learning.TIN2017-89517-
Successful Reuse of Software Components: A Report from the Open Source Perspective
A promising way of software reuse is Component-Based
Software Development (CBSD). There is an increasing number of OSS
products available that can be freely used in product development. How-
ever, OSS communities themselves have not yet taken full advantage of
the “reuse mechanism”. Many OSS projects duplicate effort and code,
even when sharing the same application domain and topic. One suc-
cessful counter-example is the FFMpeg multimedia project, since several
of its components are widely and consistently reused into other OSS
projects. This paper documents the history of the libavcodec library
of components from the FFMpeg project, which at present is reused in
more than 140 OSS projects. Most of the recipients use it as a black-
box component, although a number of OSS projects keep a copy of it in
their repositories, and modify it as such. In both cases, we argue that
libavcodec is a successful example of reusable OSS library of compo-
nents
An empirical study of package coupling in Java open-source
Excessive coupling between object-oriented classes in systems is generally acknowledged as harmful and is recognised as a maintenance problem that can result in a higher propensity for faults in systems and a „stored up‟ future problem. Characterisation and understanding coupling at different levels of abstraction is therefore important for both the project manager and developer both of whom have a vested interest in software quality. In this Thesis, coupling trends are empirically investigated over multiple versions of seven Java open-source systems (OSS). The first investigation explores the trends in longitudinal changes to open-source systems given by six coupling metrics. Coupling trends are then explored from the perspective of: the relationship between removed classes and their coupling with other classes in the same package; the relationships between coupling and 'warnings’ in packages and the time interval between versions in Java OSS; the relationship between some of these coupling metrics are also explored. Finally, the existence of an 80/20 rule for the coupling metrics is inspected. Results suggest that developer activity comprises a set of high and low periods (peak and trough‟ effect) evident as a system evolves. Findings also demonstrate that addition of coupling may have beneficial effects on a system, particularly if they are added as new functionality through the package Java feature. The fan-in and fan-out coupling metrics reveal particular features and exhibited a wide range of traits in the classes depending on their high or low values; finally, we revealed that one metric (fan-in) is the only metric that appears consistently to exhibit an 80/20 (Pareto) relationship.EThOS - Electronic Theses Online ServiceGBUnited Kingdo
Structural Complexity and Decay in FLOSS Systems: An Inter-Repository Study
Past software engineering literature has firmly established that software architectures and the associated code decay over time. Architectural decay is, potentially, a major issue in Free/Libre/Open Source Software (FLOSS) projects, since developers sporadically joining FLOSSprojects do not always have a clear understanding of the underlying architecture, and may break the overall conceptual structure by several small changes to the code base.
This paper investigates whether the structure of a FLOSS system and its decay can also be influenced by the repository
in which it is retained: specifically, two FLOSS repositories are studied to understand whether the complexity of the software structure in the sampled projects is comparable, or one repository hosts more complex systems than the other. It is also studied whether the effort to counteract this
complexity is dependent on the repository, and the governance it gives to the hosted projects.
The results of the paper are two-fold: on one side, it is shown that the repository hosting larger and more active projects presents more complex structures. On the other side, these larger and more complex systems benefit from more anti-regressive work to reduce this complexity
- …