Faults in Linux 2.6
In August 2011, Linux entered its third decade. Ten years before, Chou et al.
published a study of faults found by applying a static analyzer to Linux
versions 1.0 through 2.4.1. A major result of their work was that the drivers
directory contained up to 7 times more of certain kinds of faults than other
directories. This result inspired numerous efforts on improving the reliability
of driver code. Today, Linux is used in a wider range of environments, provides
a wider range of services, and has adopted a new development and release model.
What has been the impact of these changes on code quality? To answer this
question, we have replicated Chou et al.'s experiments on all versions of
Linux 2.6, released between 2003 and 2011. We find that Linux has more than
doubled in size during this period, but the number of faults per line of code
has been decreasing. Moreover, the fault rate of drivers is now below that of
other directories, such as arch. These results can guide further development
and research efforts for the decade to come. To allow updating these results as
Linux evolves, we define our experimental protocol and make our checkers
available.
Assessing Code Authorship: The Case of the Linux Kernel
Code authorship is a key piece of information in large-scale open source systems.
Among others, it allows maintainers to assess division of work and identify key
collaborators. Interestingly, open-source communities lack guidelines on how to
manage authorship. This could be mitigated by building an empirical
body of knowledge on how authorship-related measures evolve in successful
open-source communities. Towards that direction, we perform a case study on the
Linux kernel. Our results show that: (a) only a small portion of developers (26%)
makes significant contributions to the code base; (b) the distribution of
the number of files per author is highly skewed: a small group of top
authors (3%) is responsible for hundreds of files, while most authors (75%)
are responsible for at most 11 files; (c) most authors (62%) have a specialist
profile; (d) authors with a high number of co-authorship connections tend to
collaborate with others with fewer connections. Comment: Accepted at 13th International Conference on Open Source Systems
(OSS). 12 pages
Effort estimation of FLOSS projects: A study of the Linux kernel
This is the post-print version of the article. Copyright © 2011 Springer.
Empirical research on Free/Libre/Open Source Software (FLOSS) has shown that developers tend to cluster around two main roles: "core" contributors differ from "peripheral" developers in having a larger number of responsibilities and a higher productivity pattern. A further, cross-cutting characterization of developers could be achieved by associating developers with "time slots", and different patterns of activity and effort could be associated with such slots. Such analysis, if replicated, could be used not only to compare different FLOSS communities and evaluate their stability and maturity, but also to determine, within projects, how effort is distributed in a given period, and to estimate future needs with respect to key points in the software life-cycle (e.g., major releases). This study analyses the activity patterns within the Linux kernel project, first focusing on the overall distribution of effort and activity within weeks and days, then dividing each day into three 8-hour time slots and focusing on effort and activity around major releases. These analyses aim to evaluate effort, productivity and types of activity, both globally and around major releases. They enable a comparison of these releases and patterns of effort and activity with traditional software products and processes and, in turn, the identification of company-driven projects (i.e., those working mainly during office hours) among FLOSS endeavors. The results of this research show that, overall, the effort within the Linux kernel community is constant (albeit at different levels) throughout the week, signalling the need for updated estimation models, different from those used in traditional 9am–5pm, Monday to Friday commercial companies.
It also becomes evident that the activity before a release is vastly different from that after a release, and that the changes show an increase in code complexity in specific time slots (notably in the late night hours), which will later require additional maintenance effort.
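A minimal sketch of the weekly and 8-hour time-slot binning described in this abstract, assuming commit timestamps are already available as timezone-aware `datetime` values carrying each committer's own UTC offset. The slot cut-offs (00:00, 08:00, 16:00), the helper names, and the Monday to Friday 09:00-16:59 "office hours" test are hypothetical choices for illustration, not the study's definitions.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Hypothetical slot boundaries: the study divides each day into three
# 8-hour slots, but these exact cut-off times are an assumption.
SLOTS = ("00:00-07:59", "08:00-15:59", "16:00-23:59")
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def time_slot(ts: datetime) -> str:
    """Return the 8-hour slot of a timestamp, using the hour in the
    timestamp's own timezone (assumed to be the committer's local time)."""
    return SLOTS[ts.hour // 8]


def activity_by_slot(timestamps) -> dict[str, int]:
    """Count commits per 8-hour slot, listing empty slots as 0."""
    counts = Counter(time_slot(ts) for ts in timestamps)
    return {slot: counts[slot] for slot in SLOTS}


def activity_by_weekday(timestamps) -> dict[str, int]:
    """Count commits per day of the week, listing empty days as 0."""
    counts = Counter(WEEKDAYS[ts.weekday()] for ts in timestamps)
    return {day: counts[day] for day in WEEKDAYS}


def office_hours_share(timestamps) -> float:
    """Hypothetical '9am-5pm, Monday to Friday' test: the fraction of
    commits made on a weekday between 09:00 and 16:59 local time."""
    stamps = list(timestamps)
    if not stamps:
        return 0.0
    in_office = sum(1 for ts in stamps if ts.weekday() < 5 and 9 <= ts.hour < 17)
    return in_office / len(stamps)


if __name__ == "__main__":
    cet = timezone(timedelta(hours=1))  # example committer UTC offset
    commits = [
        datetime(2011, 1, 3, 2, 15, tzinfo=cet),   # Monday, first slot
        datetime(2011, 1, 3, 10, 0, tzinfo=cet),   # Monday, office hours
        datetime(2011, 1, 8, 23, 30, tzinfo=cet),  # Saturday, late evening
    ]
    print(activity_by_slot(commits))
    print(activity_by_weekday(commits))
    print(office_hours_share(commits))
```

Binning on the timestamp's own offset rather than converting to UTC keeps "late night" meaning late night for the person who committed, which is what a slot-based view of developer effort needs; if only UTC times were available, the slots would mix contributors from different time zones.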
On Benchmarking Embedded Linux Flash File Systems
Due to its attractive characteristics in terms of performance, weight and
power consumption, NAND flash memory has become the main non-volatile memory
(NVM) in embedded systems. These NVMs also present specific
characteristics and constraints: good but asymmetric I/O performance, limited
lifetime, asymmetric write/erase granularity, etc. These peculiarities are
managed either in hardware, for flash disks (SSDs, SD cards, USB sticks, etc.),
or in software, for raw embedded flash chips. When managed in software, flash
algorithms and structures are implemented in a specific flash file system
(FFS). In this paper, we present a performance study of the most widely used
FFSs in embedded Linux: JFFS2, UBIFS, and YAFFS. We show some very particular
behaviors and large performance disparities for the tested FFS operations, such as
mounting, copying and searching file trees, compression, etc. Comment: Embed With Linux, Lorient, France (2012)
Empirical studies of open source evolution
Copyright © 2008 Springer-Verlag
This chapter presents a sample of empirical studies of Open Source Software (OSS) evolution. According to these studies, the classical results from studies of proprietary software evolution, such as Lehman's laws of software evolution, might need to be revised, at least in part if not fully, to account for the OSS observations. The chapter also summarises what appears to be the empirical
status of each of Lehman's laws with respect to OSS and highlights the threats to
validity that frequently emerge in these empirical studies. Finally, it discusses
related topics for further research.
Source File Set Search for Clone-and-Own Reuse Analysis
The clone-and-own approach is a natural way for software developers to reuse
source code. To assess how known bugs and security vulnerabilities of a cloned
component affect an application, developers and security analysts need to
identify an original version of the component and understand how the cloned
component is different from the original one. Although developers may record
the original version information in a version control system and/or directory
names, such information is often either unavailable or incomplete. In this
research, we propose a code search method that takes a set of source files as
input and extracts, from a software ecosystem (i.e., a collection of existing
versions of software packages), all the components that include similar files. Our
method employs an efficient file similarity computation using the b-bit minwise
hashing technique. We use an aggregated file similarity to rank components.
To evaluate the effectiveness of this tool, we analyzed 75 cloned components in
Firefox and Android source code. The tool took about two hours to report the
original components from 10 million files in Debian GNU/Linux packages. Recall
of the top-five components in the extracted lists is 0.907, while recall of a
baseline using SHA-1 file hash is 0.773, according to the ground truth recorded
in the source code repositories. Comment: 14th International Conference on Mining Software Repositories
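The file-similarity step can be illustrated with a small sketch of b-bit minwise hashing. It is not the authors' tool: it assumes each file is reduced to the set of its distinct non-blank lines, uses seeded BLAKE2b hashes in place of random permutations, and picks k = 128 and b = 1 arbitrarily. `features_of`, `component_score` and the other helpers are hypothetical names, and `component_score` is only one possible aggregate, not necessarily the aggregated similarity used in the study.

```python
import hashlib

NUM_HASHES = 128  # k: number of minimum hash values per file (assumed)
B_BITS = 1        # b: low-order bits kept from each minimum (assumed)


def features_of(source: str) -> set[str]:
    """Hypothetical feature set: distinct non-blank lines, stripped of surrounding whitespace."""
    return {line.strip() for line in source.splitlines() if line.strip()}


def _seeded_hash(seed: int, feature: str) -> int:
    # A seeded 64-bit hash plays the role of one random permutation.
    digest = hashlib.blake2b(f"{seed}:{feature}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def b_bit_signature(features: set[str], k: int = NUM_HASHES, b: int = B_BITS) -> list[int]:
    """Compute k minimum hash values and keep only the lowest b bits of each."""
    if not features:
        raise ValueError("cannot sign a file with no features")
    mask = (1 << b) - 1
    return [min(_seeded_hash(seed, f) for f in features) & mask for seed in range(k)]


def estimate_jaccard(sig1: list[int], sig2: list[int], b: int = B_BITS) -> float:
    """Estimate Jaccard similarity from two b-bit signatures.

    Unrelated files still agree on roughly 2**-b of the positions by chance, so
    the raw match rate P is corrected as (P - 2**-b) / (1 - 2**-b), an
    approximation that assumes each feature set is tiny compared with the hash
    range, and the result is clamped to [0, 1].
    """
    if len(sig1) != len(sig2) or not sig1:
        raise ValueError("signatures must be non-empty and of equal length")
    p = sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)
    chance = 2.0 ** -b
    return min(1.0, max(0.0, (p - chance) / (1.0 - chance)))


def component_score(query_sigs, component_sigs, b: int = B_BITS) -> float:
    """Hypothetical aggregate: for each query file, take its best estimated
    similarity to any file of the component, then average over query files."""
    best = [max(estimate_jaccard(q, c, b) for c in component_sigs) for q in query_sigs]
    return sum(best) / len(best)
```

Keeping only b bits per minimum shrinks each signature to k·b bits, which is what makes comparing a query against millions of files affordable; the price is the chance agreement of about 2^-b per position, which the correction in `estimate_jaccard` removes on average, while a larger k reduces the variance of the estimate.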