Evolution of statistical analysis in empirical software engineering research: Current state and steps forward
Software engineering research is evolving and papers are increasingly based
on empirical data from a multitude of sources, using statistical tests to
determine if and to what degree empirical evidence supports their hypotheses.
To investigate the practices and trends of statistical analysis in empirical
software engineering (ESE), this paper presents a review of a large pool of
papers from top-ranked software engineering journals. First, we manually
reviewed 161 papers; in the second phase of our method, we conducted a more
extensive semi-automatic classification of 5,196 papers spanning the years
2001--2015. Results from both review steps were used to: i) identify and
analyze the predominant practices in ESE (e.g., using t-test or ANOVA), as well
as relevant trends in usage of specific statistical methods (e.g.,
nonparametric tests and effect size measures) and, ii) develop a conceptual
model for a statistical analysis workflow with suggestions on how to apply
different statistical methods as well as guidelines to avoid pitfalls. Lastly,
we confirm existing claims that current ESE practices lack a standard to report
practical significance of results. We illustrate how practical significance can
be discussed in terms of both the statistical analysis and in the
practitioner's context.
Comment: journal submission, 34 pages, 8 figures
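To make the point about practical significance concrete, the following minimal sketch reports an effect size (Cohen's d) next to a Welch t statistic for two samples. It is an illustration only, not the workflow proposed in the paper; the function names and the sample data are hypothetical.

```python
import math
import statistics


def cohens_d(a, b):
    """Cohen's d based on the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd


def welch_t(a, b):
    """Welch's t statistic (does not assume equal variances)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    standard_error = math.sqrt(va / len(a) + vb / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


if __name__ == "__main__":
    # Hypothetical measurements for two treatments (made-up numbers).
    treatment = [2.0, 4.0, 6.0]
    control = [1.0, 3.0, 5.0]
    print("Welch t:", welch_t(treatment, control))
    print("Cohen's d:", cohens_d(treatment, control))
```

Reporting both values separates the question of whether a difference is detectable from the question of how large it is, which is the distinction the abstract draws between statistical and practical significance.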
In and out of Madagascar : dispersal to peripheral islands, insular speciation and diversification of Indian Ocean daisy trees (Psiadia, Asteraceae)
This study was supported by the European Union’s HOTSPOTS Training Network (MEST-2005-020561). Madagascar is surrounded by archipelagos varying widely in origin, age and structure. Although small and geologically young, these archipelagos have accumulated disproportionate numbers of unique lineages in comparison to Madagascar, highlighting the role of waif-dispersal and rapid in situ diversification processes in generating endemic biodiversity. We reconstruct the evolutionary and biogeographical history of the genus Psiadia (Asteraceae), a plant genus with near equal numbers of species in Madagascar and surrounding islands. Analyzing patterns and processes of diversification, we explain species accumulation on peripheral islands and aim to offer new insights on the origin and potential causes for diversification in the Madagascar and Indian Ocean Islands biodiversity hotspot. Our results provide support for an African origin of the group, with strong support for non-monophyly. Colonization of the Mascarenes took place by two evolutionarily distinct lineages from Madagascar, via two independent dispersal events, each unique in its spatial and temporal properties. Significant shifts in diversification rate followed regional expansion, resulting in co-occurring and phenotypically convergent species on high-elevation volcanic slopes. Like other endemic island lineages, Psiadia have been highly successful in dispersing to and radiating on isolated oceanic islands, typified by high habitat diversity and dynamic ecosystems fuelled by continued geological activity. Results stress the important biogeographical role for Rodrigues in serving as an outlying stepping stone from which regional colonization took place. We discuss how isolated volcanic islands contribute to regional diversity by generating substantial numbers of endemic species on short temporal scales.
Factors pertaining to the mode and tempo of archipelago formation and its geographical isolation strongly govern evolutionary pathways available for species diversification, and the potential for successful diversification of dispersed lineages, therefore, appears highly dependent on the timing of arrival, as habitat and resource properties change dramatically over the course of oceanic island evolution.
Learning Large-Scale Bayesian Networks with the sparsebn Package
Learning graphical models from data is an important problem with wide
applications, ranging from genomics to the social sciences. Nowadays datasets
often have upwards of thousands---sometimes tens or hundreds of thousands---of
variables and far fewer samples. To meet this challenge, we have developed a
new R package called sparsebn for learning the structure of large, sparse
graphical models with a focus on Bayesian networks. While there are many
existing software packages for this task, this package focuses on the unique
setting of learning large networks from high-dimensional data, possibly with
interventions. As such, the methods provided place a premium on scalability and
consistency in a high-dimensional setting. Furthermore, in the presence of
interventions, the methods implemented here achieve the goal of learning a
causal network from data. Additionally, the sparsebn package is fully
compatible with existing software packages for network analysis.
Comment: To appear in the Journal of Statistical Software, 39 pages, 7 figures
General Semiparametric Shared Frailty Model Estimation and Simulation with frailtySurv
The R package frailtySurv for simulating and fitting semi-parametric shared
frailty models is introduced. Package frailtySurv implements semi-parametric
consistent estimators for a variety of frailty distributions, including gamma,
log-normal, inverse Gaussian and power variance function, and provides
consistent estimators of the standard errors of the parameters' estimators. The
parameters' estimators are asymptotically normally distributed, and therefore
statistical inference based on the results of this package, such as hypothesis
testing and confidence intervals, can be performed using the normal
distribution. Extensive simulations demonstrate the flexibility and correct
implementation of the estimator. Two case studies performed with publicly
available datasets demonstrate applicability of the package. In the Diabetic
Retinopathy Study, the onset of blindness is clustered by patient, and in a
large hard drive failure dataset, failure times are thought to be clustered by
the hard drive manufacturer and model.
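The clustering described above can be illustrated with a small simulation of a shared gamma frailty model: every member of a cluster shares one random frailty that multiplies its hazard. This is a hedged sketch in plain Python, not the frailtySurv API; it assumes a constant baseline hazard, a single binary covariate, and no censoring, and all names and parameter values are hypothetical.

```python
import math
import random


def simulate_shared_gamma_frailty(n_clusters, cluster_size, theta,
                                  base_hazard, beta, seed=0):
    """Simulate clustered failure times under a shared gamma frailty.

    Assumptions (illustrative only): constant baseline hazard, one binary
    covariate with log hazard ratio `beta`, and no censoring.
    Returns a list of (cluster_id, covariate, failure_time) tuples.
    """
    rng = random.Random(seed)
    rows = []
    for cluster in range(n_clusters):
        # Gamma frailty with shape 1/theta and scale theta:
        # mean 1 and variance theta, shared by the whole cluster.
        frailty = rng.gammavariate(1.0 / theta, theta)
        for _ in range(cluster_size):
            x = rng.choice([0, 1])
            hazard = frailty * base_hazard * math.exp(beta * x)
            # With a constant hazard, failure times are exponential.
            rows.append((cluster, x, rng.expovariate(hazard)))
    return rows


if __name__ == "__main__":
    data = simulate_shared_gamma_frailty(
        n_clusters=5, cluster_size=2, theta=0.5, base_hazard=0.1, beta=0.7)
    for row in data[:4]:
        print(row)
```

A larger `theta` makes frailties vary more between clusters, so failure times within the same cluster become more strongly correlated.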
ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R
We introduce the C++ application and R package ranger. The software is a fast
implementation of random forests for high dimensional data. Ensembles of
classification, regression and survival trees are supported. We describe the
implementation, provide examples, validate the package with a reference
implementation, and compare runtime and memory usage with other
implementations. The new software proves to scale best with the number of
features, samples, trees, and features tried for splitting. Finally, we show
that ranger is the fastest and most memory efficient implementation of random
forests to analyze data on the scale of a genome-wide association study.
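The quantities named above (samples, trees, and features tried for splitting) correspond to the two sources of randomness in a random forest. The sketch below shows only those two sampling steps in plain Python; it is not ranger's implementation, the function names are hypothetical, and the square-root default for the number of features tried is a common convention assumed here for illustration.

```python
import math
import random


def bootstrap_sample(n_samples, rng):
    """Row indices for one tree: n_samples draws with replacement."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def features_to_try(n_features, rng, mtry=None):
    """Feature indices considered at one split: mtry draws without replacement."""
    if mtry is None:
        # Assumed default: floor of the square root of the feature count.
        mtry = max(1, math.isqrt(n_features))
    return sorted(rng.sample(range(n_features), mtry))


if __name__ == "__main__":
    rng = random.Random(1)
    print("rows for this tree:", bootstrap_sample(10, rng))
    print("features tried at this split:", features_to_try(16, rng))
```

Each tree would repeat the first step once and the second step at every split, which is why runtime depends on all four quantities listed in the abstract.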
Purposeful Searching for Citations of Scholarly Publications
Citation data contains the citations among scholarly publications. The data can be used to find relevant sources during research, identify emerging trends and research areas, compute metrics for comparing authors or journals, or for thematic clustering. Manual administration of citation data is limited due to the large number of publications. In this work, we hence lay the foundations for the automatic search for scientific citations. The distinguishing characteristic is a purposeful search for citations of a specified set of publications (e.g., those of an author or an institute). To this end, search strategies are developed and evaluated in order to reduce the cost of analyzing documents that contain no citations to the given set of publications. In our experiments, for authors with more than 100 publications, about 75% of the citations were found. The purposeful strategy thereby examined only 1.5% of the 120 million publications in the data set used.
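To put the reported percentages in absolute terms, the arithmetic can be worked through as follows. The figures (120 million publications, 1.5 % examined) are taken from the abstract; the function and variable names are hypothetical.

```python
def documents_examined(corpus_size, fraction_examined):
    """Number of documents a strategy inspects, rounded to a whole document."""
    return round(corpus_size * fraction_examined)


CORPUS_SIZE = 120_000_000   # publications in the data set (from the abstract)
FRACTION_EXAMINED = 0.015   # 1.5 % examined by the purposeful strategy

examined = documents_examined(CORPUS_SIZE, FRACTION_EXAMINED)
skipped = CORPUS_SIZE - examined

if __name__ == "__main__":
    print("examined:", examined)
    print("skipped:", skipped)
```

Under these figures the strategy inspects about 1.8 million documents and skips roughly 118.2 million, while still recovering about three quarters of the citations for prolific authors.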
Using R-based VOStat as a low resolution spectrum analysis tool
We describe here an online software suite VOStat written mainly for the Virtual Observatory, a novel structure in which astronomers share terabyte scale data. Written mostly in the public-domain statistical computing language and environment R, it can do a variety of statistical analyses on multidimensional, multi-epoch data with errors.
Included are techniques which allow astronomers to start with multi-color data in the form of low-resolution spectra and select special kinds of sources in a variety of ways including color outliers. Here we describe the tool and demonstrate it with an example from Palomar-QUEST, a synoptic sky survey.
Early Term Effects of rhBMP-2 on Pedicle Screw Fixation in a Sheep Model: Histomorphometric and Biomechanical Analyses
Background: The effects of recombinant human bone morphogenetic protein-2 (rhBMP-2) on pedicle screw pullout force and its potential to improve spinal fixation have not previously been investigated. rhBMP-2 on an absorbable collagen sponge (ACS) carrier was delivered in and around cannulated and fenestrated pedicle screws in a sheep lumbar spine instability model. Two control groups (empty screw and ACS with buffer) were also evaluated. We hypothesized that rhBMP-2 could stimulate bone growth in and around the cannulated and fenestrated pedicle screws to improve early bone purchase.
Methods: Eight skeletally mature sheep underwent destabilizing laminectomies at L2–L3 and L4–L5 followed by stabilization with pedicle screw and rod constructs. An ACS carrier was used to deliver 0.15 mg of rhBMP-2 within and around the cannulated and fenestrated titanium pedicle screws. Biomechanics and histomorphometry were used to evaluate the early term results at 6 and 12 postoperative weeks.
Results: rhBMP-2 was unable to improve bony purchase of the cannulated and fenestrated pedicle screws compared to both control groups. Although rhBMP-2 groups had pullout forces that were less than both control groups, both rhBMP-2 groups had pullout force values exceeding 2,000 N, which was comparable to previously published results for unmodified pedicle screws. Significant differences in the percentages of bone in peri-screw tissues were not observed amongst the four treatment groups. Microradiography and quantitative histomorphometry showed that at 6 weeks, rhBMP-2 induced peri-screw remodeling regions containing peri-implant bone which was hypodense with respect to surrounding native trabeculae. A moderate correlation between biomechanical pullout variables and histomorphometry data was observed.
Conclusions: The design of the cannulated and fenestrated pedicle screw was able to facilitate new bone formation to achieve high pullout forces. However, delivery of rhBMP-2 should be carefully controlled to prevent excessive bone remodeling which could cause early screw loosening.