Coordination Implications of Software Coupling in Open Source Projects
The effect of software coupling on software quality has been studied widely since Parnas's seminal paper on software modularity [1]. However, the effect of increasing software coupling on the coordination of developers has received far less attention. In commercial software development environments, coordination mechanisms are normally in place to manage the coordination requirements that arise from software dependencies. In Open Source software, however, such mechanisms are harder to implement, as developers tend to rely solely on electronic means of communication. An understanding of changing coordination requirements is therefore essential to the management of an Open Source project. In this paper we study the effect of changes in software coupling on coordination requirements through a case study of JBoss, a popular Open Source project.
Robots that can adapt like animals
As robots leave the controlled environments of factories to autonomously function in more complex, natural environments, they will have to respond to the inevitable fact that they will become damaged. However, while animals can quickly adapt to a wide variety of injuries, current robots cannot "think outside the box" to find a compensatory behavior when damaged: they are limited to their pre-specified self-sensing abilities, can diagnose only anticipated failure modes, and require a pre-programmed contingency plan for every type of potential damage, an impracticality for complex robots. Here we introduce an intelligent trial-and-error algorithm that allows robots to adapt to damage in less than two minutes, without requiring self-diagnosis or pre-specified contingency plans. Before deployment, a robot exploits a novel algorithm to create a detailed map of the space of high-performing behaviors: this map represents the robot's intuitions about what behaviors it can perform and their value. If the robot is damaged, it uses these intuitions to guide a trial-and-error learning algorithm that conducts intelligent experiments to rapidly discover a compensatory behavior that works in spite of the damage. Experiments reveal successful adaptations for a legged robot injured in five different ways, including damaged, broken, and missing legs, and for a robotic arm with joints broken in 14 different ways. This new technique will enable more robust, effective, autonomous robots, and suggests principles that animals may use to adapt to injury.
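The abstract describes, at a high level, a two-phase procedure: a behavior-performance map computed before deployment, then map-guided trial-and-error after damage. A minimal sketch of that loop is given below, assuming a toy behavior space, a synthetic prior map, a simple Gaussian-process correction of the map from observed trials, and an upper-confidence-bound selection rule; none of these choices are taken from the paper itself.

import numpy as np

rng = np.random.default_rng(0)
behaviors = rng.uniform(0.0, 1.0, size=(200, 2))              # candidate behaviors
prior_perf = np.sin(3.0 * behaviors[:, 0]) + behaviors[:, 1]  # toy pre-computed map

def trial_on_damaged_robot(i):
    # Stand-in for a physical trial; damage shifts true performance away
    # from what the pre-computed map predicts.
    return prior_perf[i] - 0.8 * behaviors[i, 0] + 0.05 * rng.normal()

def rbf(A, B, length=0.2):
    # Squared-exponential kernel between two sets of behavior descriptors.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * length ** 2))

tried, observed = [], []
for _ in range(10):                                   # a handful of physical trials
    if tried:
        X = behaviors[tried]
        K = rbf(X, X) + 1e-3 * np.eye(len(tried))
        k_star = rbf(behaviors, X)
        resid = np.array(observed) - prior_perf[tried]
        # GP posterior that corrects the prior map using the trials so far.
        mean = prior_perf + k_star @ np.linalg.solve(K, resid)
        var = 1.0 - np.einsum("ij,ji->i", k_star, np.linalg.solve(K, k_star.T))
    else:
        mean, var = prior_perf.copy(), np.ones(len(behaviors))
    ucb = mean + 2.0 * np.sqrt(np.clip(var, 0.0, None))  # optimistic selection
    ucb[tried] = -np.inf                                  # do not repeat a trial
    i = int(np.argmax(ucb))
    tried.append(i)
    observed.append(trial_on_damaged_robot(i))

best = tried[int(np.argmax(observed))]
print("best compensatory behavior found:", behaviors[best])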
Understanding the Sources of Variation in Software Inspections
In a previous experiment, we determined how various changes in three structural elements of the software inspection process (team size, and number and sequencing of sessions) altered effectiveness and interval. Our results showed that such changes did not significantly influence the defect detection rate, but that certain combinations of changes dramatically increased the inspection interval.

We also observed a large amount of unexplained variance in the data, indicating that other factors must be affecting inspection performance. The nature and extent of these other factors now have to be determined to ensure that they have not biased our earlier results. Also, identifying these other factors might suggest additional ways to improve the efficiency of inspections.

Acting on the hypothesis that the "inputs" into the inspection process (reviewers, authors, and code units) were significant sources of variation, we modeled their effects on inspection performance. We found that they were responsible for much more variation in defect detection than was process structure. This leads us to conclude that better defect detection techniques, not better process structures, are the key to improving inspection effectiveness.

The combined effects of process inputs and process structure accounted for only a small percentage of the variance in inspection interval; therefore, other factors that influence the interval remain to be identified.

(Also cross-referenced as UMIACS-TR-97-22)
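To illustrate the kind of analysis summarized above (attributing variance in defect detection to process inputs versus process structure), the sketch below fits a linear model with both sets of factors and examines an ANOVA table. The file name, column names, and model form are hypothetical illustrations, not the study's actual data or analysis.

# Hedged sketch: variance attribution for inspection data (hypothetical columns).
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical input: one row per inspection, with the outcome and the
# structural and input factors recorded for that inspection.
df = pd.read_csv("inspections.csv")

model = smf.ols(
    "defect_rate ~ C(team_size) + C(sessions)"       # process structure
    " + C(reviewer) + C(author) + C(code_unit)",      # process inputs
    data=df,
).fit()

# Type-II ANOVA: compare sums of squares attributable to inputs vs. structure.
print(anova_lm(model, typ=2))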
Sequential design of computer experiments for the estimation of a probability of failure
This paper deals with the problem of estimating the volume of the excursion set of a function above a given threshold, under a probability measure on the input space that is assumed to be known. In the industrial world, this corresponds to the problem of estimating a probability of failure of a system. When only an expensive-to-simulate model of the system is available, the budget for simulations is usually severely limited and therefore classical Monte Carlo methods ought to be avoided. One of the main contributions of this article is to derive SUR (stepwise uncertainty reduction) strategies from a Bayesian decision-theoretic formulation of the problem of estimating a probability of failure. These sequential strategies use a Gaussian process model of the function and aim at performing its evaluations as efficiently as possible to infer the value of the probability of failure. We compare these strategies to other strategies, also based on a Gaussian process model, for estimating a probability of failure.
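For reference, and with notation introduced here purely for illustration, the quantity of interest and the classical Monte Carlo estimator that the sequential strategies are designed to outperform can be written as:

% f: expensive-to-simulate model, u: threshold, P_X: known probability
% measure on the input space X (illustrative notation).
\alpha = \mathbb{P}_X\bigl(f(X) > u\bigr)
       = \int_{\mathbb{X}} \mathbf{1}_{\{f(x) > u\}} \,\mathrm{d}\mathbb{P}_X(x),
\qquad
\widehat{\alpha}_{\mathrm{MC}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{\{f(X_i) > u\}},
\quad X_i \overset{\text{iid}}{\sim} \mathbb{P}_X .

Each evaluation of f above is one run of the expensive simulator, which is why a plain Monte Carlo estimate is impractical under a tight simulation budget.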
Bayesian optimization for materials design
We introduce Bayesian optimization, a technique developed for optimizing time-consuming engineering simulations and for fitting machine learning models on large datasets. Bayesian optimization guides the choice of experiments during materials design and discovery to find good material designs in as few experiments as possible. We focus on the case when materials designs are parameterized by a low-dimensional vector. Bayesian optimization is built on a statistical technique called Gaussian process regression, which allows predicting the performance of a new design based on previously tested designs. After providing a detailed introduction to Gaussian process regression, we introduce two Bayesian optimization methods: expected improvement, for design problems with noise-free evaluations; and the knowledge-gradient method, which generalizes expected improvement and may be used in design problems with noisy evaluations. Both methods are derived using a value-of-information analysis, and enjoy one-step Bayes-optimality.
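For reference, the expected-improvement criterion mentioned above has a standard closed form under a Gaussian process posterior; the notation below is introduced for illustration. With posterior mean \mu(x), posterior standard deviation \sigma(x), and best value observed so far f^* in a maximization problem:

% Expected improvement with noise-free observations (standard closed form).
\mathrm{EI}(x) = \mathbb{E}\bigl[(f(x) - f^{*})^{+}\bigr]
              = \sigma(x)\,\bigl( z\,\Phi(z) + \varphi(z) \bigr),
\qquad z = \frac{\mu(x) - f^{*}}{\sigma(x)},

where \Phi and \varphi denote the standard normal distribution function and density. The next experiment is chosen by maximizing EI over the design space.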
Effort estimation of FLOSS projects: A study of the Linux kernel
Empirical research on Free/Libre/Open Source Software (FLOSS) has shown that developers tend to cluster around two main roles: "core" contributors differ from "peripheral" developers in terms of a larger number of responsibilities and a higher productivity pattern. A further, cross-cutting characterization of developers could be achieved by associating developers with "time slots", and different patterns of activity and effort could be associated to such slots. Such analysis, if replicated, could be used not only to compare different FLOSS communities and to evaluate their stability and maturity, but also to determine, within projects, how the effort is distributed in a given period, and to estimate future needs with respect to key points in the software life-cycle (e.g., major releases). This study analyses the activity patterns within the Linux kernel project, at first focusing on the overall distribution of effort and activity within weeks and days, then dividing each day into three 8-hour time slots and focusing on effort and activity around major releases. These analyses have the objective of evaluating effort, productivity and types of activity globally and around major releases. They enable a comparison of these releases and patterns of effort and activities with traditional software products and processes, and in turn, the identification of company-driven projects (i.e., working mainly during office hours) among FLOSS endeavors. The results of this research show that, overall, the effort within the Linux kernel community is constant (albeit at different levels) throughout the week, signalling the need for updated estimation models, different from those used in traditional 9am–5pm, Monday-to-Friday commercial companies. It also becomes evident that the activity before a release is vastly different from that after a release, and that the changes show an increase in code complexity in specific time slots (notably in the late night hours), which will later require additional maintenance effort.
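The weekday and 8-hour time-slot bucketing described above can be sketched in a few lines; the file name and column names are assumptions for illustration, not the study's data set.

# Hedged sketch: bucketing commit activity into weekdays and 8-hour slots.
import pandas as pd

# Hypothetical input: one row per commit with an author timestamp.
commits = pd.read_csv("kernel_commits.csv", parse_dates=["author_date"])

commits["weekday"] = commits["author_date"].dt.day_name()
# Three 8-hour slots per day: 00-08 (late night), 08-16 (office hours), 16-24.
commits["slot"] = pd.cut(
    commits["author_date"].dt.hour,
    bins=[0, 8, 16, 24],
    labels=["00-08", "08-16", "16-24"],
    right=False,
)

# Activity per weekday and slot, e.g. to compare effort before and after a release.
activity = commits.groupby(["weekday", "slot"]).size().unstack("slot")
print(activity)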
Bayesian Optimization Approaches for Massively Multi-modal Problems
The optimization of massively multi-modal functions is a challenging task, particularly for problems where the search space can lead the optimization process to local optima. While evolutionary algorithms have been extensively investigated for these optimization problems, Bayesian Optimization algorithms have not been explored to the same extent. In this paper, we study the behavior of Bayesian Optimization as part of a hybrid approach for solving several massively multi-modal functions. We use well-known benchmarks and metrics to evaluate how different variants of Bayesian Optimization deal with multi-modality.
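As a concrete example of a massively multi-modal benchmark of the kind referred to above (this particular function is our illustrative choice and is not named in the abstract), the Rastrigin function has a regular lattice of local optima and a single global minimum at the origin:

import numpy as np

def rastrigin(x):
    # Massively multi-modal benchmark: global minimum of 0 at x = 0,
    # with local minima near every integer lattice point of the domain.
    x = np.asarray(x, dtype=float)
    return 10.0 * x.size + np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x))

print(rastrigin(np.zeros(5)))  # 0.0 at the global optimum
print(rastrigin(np.ones(5)))   # 5.0, near a local optimum away from the global one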
Cisplatin-resistant triple-negative breast cancer subtypes: multiple mechanisms of resistance.
BACKGROUND: Understanding mechanisms underlying specific chemotherapeutic responses in subtypes of cancer may improve identification of treatment strategies most likely to benefit particular patients. For example, triple-negative breast cancer (TNBC) patients have variable response to the chemotherapeutic agent cisplatin. Understanding the basis of treatment response in cancer subtypes will lead to more informed decisions about selection of treatment strategies.
METHODS: In this study we used an integrative functional genomics approach to investigate the molecular mechanisms underlying known cisplatin-response differences among subtypes of TNBC. To identify changes in gene expression that could explain mechanisms of resistance, we examined 102 evolutionarily conserved cisplatin-associated genes, evaluating their differential expression in the cisplatin-sensitive, basal-like 1 (BL1) and basal-like 2 (BL2) subtypes, and the two cisplatin-resistant, luminal androgen receptor (LAR) and mesenchymal (M) subtypes of TNBC.
RESULTS: We found 20 genes that were differentially expressed in at least one subtype. Fifteen of the 20 genes are associated with cell death and are distributed among all TNBC subtypes. The less cisplatin-responsive LAR and M TNBC subtypes show different regulation of 13 genes compared to the more sensitive BL1 and BL2 subtypes. These 13 genes identify a variety of cisplatin-resistance mechanisms including increased transport and detoxification of cisplatin, and mis-regulation of the epithelial to mesenchymal transition.
CONCLUSIONS: We identified gene signatures in resistant TNBC subtypes indicative of mechanisms of cisplatin resistance. Our results indicate that response to cisplatin in TNBC has a complex foundation based on the impact of treatment on distinct cellular pathways. We find that examination of expression data in the context of heterogeneous data, such as drug-gene interactions, leads to a better understanding of the mechanisms at work in cancer therapy response.
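A minimal sketch of the subtype-wise differential-expression comparison described in the METHODS above is given below; the input files, column names, and choice of test (Mann-Whitney with Benjamini-Hochberg correction) are illustrative assumptions, not the study's actual pipeline.

# Hedged sketch: per-gene differential expression between TNBC subtype groups.
import pandas as pd
from scipy.stats import mannwhitneyu
from statsmodels.stats.multitest import multipletests

# Hypothetical inputs: expression matrix (samples x genes) and one subtype
# label per sample, indexed by the same sample identifiers.
expr = pd.read_csv("tnbc_expression.csv", index_col=0)
subtype = pd.read_csv("tnbc_subtypes.csv", index_col=0)["subtype"]

sensitive = expr[subtype.isin(["BL1", "BL2"])]   # cisplatin-sensitive subtypes
resistant = expr[subtype.isin(["LAR", "M"])]     # cisplatin-resistant subtypes

pvals = pd.Series({
    gene: mannwhitneyu(sensitive[gene], resistant[gene]).pvalue
    for gene in expr.columns
})

# Benjamini-Hochberg correction across the tested genes.
reject, qvals, _, _ = multipletests(pvals.values, method="fdr_bh")
print(f"{int(reject.sum())} genes differentially expressed at FDR < 0.05")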
The Comparative Toxicogenomics Database: update 2011
The Comparative Toxicogenomics Database (CTD) is a public resource that promotes understanding about the interaction of environmental chemicals with gene products, and their effects on human health. Biocurators at CTD manually curate a triad of chemical–gene, chemical–disease and gene–disease relationships from the literature. These core data are then integrated to construct chemical–gene–disease networks and to predict many novel relationships using different types of associated data. Since 2009, we dramatically increased the content of CTD to 1.4 million chemical–gene–disease data points and added many features, statistical analyses and analytical tools, including GeneComps and ChemComps (to find comparable genes and chemicals that share toxicogenomic profiles), enriched Gene Ontology terms associated with chemicals, statistically ranked chemical–disease inferences, Venn diagram tools to discover overlapping and unique attributes of any set of chemicals, genes or diseases, and enhanced gene pathway data content, among other features. Together, this wealth of expanded chemical–gene–disease data continues to help users generate testable hypotheses about the molecular mechanisms of environmental diseases. CTD is freely available at http://ctd.mdibl.org.