548 research outputs found
Transkingdom Networks: A Systems Biology Approach to Identify Causal Members of Host-Microbiota Interactions
Improvements in sequencing technologies and reduced experimental costs have
resulted in a vast number of studies generating high-throughput data. Although
the number of methods to analyze these "omics" data has also increased,
computational complexity and lack of documentation hinder researchers from
analyzing their high-throughput data to its true potential. In this chapter we
detail our data-driven, transkingdom network (TransNet) analysis protocol to
integrate and interrogate multi-omics data. This systems biology approach has
allowed us to successfully identify important causal relationships between
different taxonomic kingdoms (e.g. mammals and microbes) using diverse types of
data
Automatic Detection of Public Development Projects in Large Open Source Ecosystems: An Exploratory Study on GitHub
Hosting over 10 million of software projects, GitHub is one of the most
important data sources to study behavior of developers and software projects.
However, with the increase of the size of open source datasets, the potential
threats to mining these datasets have also grown. As the dataset grows, it
becomes gradually unrealistic for human to confirm quality of all samples. Some
studies have investigated this problem and provided solutions to avoid threats
in sample selection, but some of these solutions (e.g., finding development
projects) require human intervention. When the amount of data to be processed
increases, these semi-automatic solutions become less useful since the effort
in need for human intervention is far beyond affordable. To solve this problem,
we investigated the GHTorrent dataset and proposed a method to detect public
development projects. The results show that our method can effectively improve
the sample selection process in two ways: (1) We provide a simple model to
automatically select samples (with 0.827 precision and 0.947 recall); (2) We
also offer a complex model to help researchers carefully screen samples (with
63.2% less effort than manually confirming all samples, and can achieve 0.926
precision and 0.959 recall).Comment: Accepted by the SEKE2018 Conferenc
A Quantitative Study of Java Software Buildability
Researchers, students and practitioners often encounter a situation when the
build process of a third-party software system fails. In this paper, we aim to
confirm this observation present mainly as anecdotal evidence so far. Using a
virtual environment simulating a programmer's one, we try to fully
automatically build target archives from the source code of over 7,200 open
source Java projects. We found that more than 38% of builds ended in failure.
Build log analysis reveals the largest portion of errors are
dependency-related. We also conduct an association study of factors affecting
build success
Evaluating Maintainability Prejudices with a Large-Scale Study of Open-Source Projects
Exaggeration or context changes can render maintainability experience into
prejudice. For example, JavaScript is often seen as least elegant language and
hence of lowest maintainability. Such prejudice should not guide decisions
without prior empirical validation. We formulated 10 hypotheses about
maintainability based on prejudices and test them in a large set of open-source
projects (6,897 GitHub repositories, 402 million lines, 5 programming
languages). We operationalize maintainability with five static analysis
metrics. We found that JavaScript code is not worse than other code, Java code
shows higher maintainability than C# code and C code has longer methods than
other code. The quality of interface documentation is better in Java code than
in other code. Code developed by teams is not of higher and large code bases
not of lower maintainability. Projects with high maintainability are not more
popular or more often forked. Overall, most hypotheses are not supported by
open-source data.Comment: 20 page
Recommended from our members
A Large-Scale Study of Modern Code Review and Security in Open Source Projects.
- …