3,390 research outputs found

    Evolution of statistical analysis in empirical software engineering research: Current state and steps forward

    Software engineering research is evolving and papers are increasingly based on empirical data from a multitude of sources, using statistical tests to determine if and to what degree empirical evidence supports their hypotheses. To investigate the practices and trends of statistical analysis in empirical software engineering (ESE), this paper presents a review of a large pool of papers from top-ranked software engineering journals. First, we manually reviewed 161 papers and, in the second phase of our method, we conducted a more extensive semi-automatic classification of 5,196 papers spanning the years 2001--2015. Results from both review steps were used to: i) identify and analyze the predominant practices in ESE (e.g., using t-test or ANOVA), as well as relevant trends in usage of specific statistical methods (e.g., nonparametric tests and effect size measures) and, ii) develop a conceptual model for a statistical analysis workflow with suggestions on how to apply different statistical methods as well as guidelines to avoid pitfalls. Lastly, we confirm existing claims that current ESE practices lack a standard to report practical significance of results. We illustrate how practical significance can be discussed in terms of both the statistical analysis and the practitioner's context. Comment: journal submission, 34 pages, 8 figures.

    In and out of Madagascar : dispersal to peripheral islands, insular speciation and diversification of Indian Ocean daisy trees (Psiadia, Asteraceae)

    This study was supported by the European Union’s HOTSPOTS Training Network (MEST-2005-020561). Madagascar is surrounded by archipelagos varying widely in origin, age and structure. Although small and geologically young, these archipelagos have accumulated disproportionate numbers of unique lineages in comparison to Madagascar, highlighting the role of waif-dispersal and rapid in situ diversification processes in generating endemic biodiversity. We reconstruct the evolutionary and biogeographical history of the genus Psiadia (Asteraceae), a plant genus with near equal numbers of species in Madagascar and surrounding islands. Analyzing patterns and processes of diversification, we explain species accumulation on peripheral islands and aim to offer new insights on the origin and potential causes for diversification in the Madagascar and Indian Ocean Islands biodiversity hotspot. Our results provide support for an African origin of the group, with strong support for non-monophyly. Colonization of the Mascarenes took place by two evolutionarily distinct lineages from Madagascar, via two independent dispersal events, each unique in its spatial and temporal properties. Significant shifts in diversification rate followed regional expansion, resulting in co-occurring and phenotypically convergent species on high-elevation volcanic slopes. Like other endemic island lineages, Psiadia have been highly successful in dispersing to and radiating on isolated oceanic islands, typified by high habitat diversity and dynamic ecosystems fuelled by continued geological activity. Results stress the important biogeographical role for Rodrigues in serving as an outlying stepping stone from which regional colonization took place. We discuss how isolated volcanic islands contribute to regional diversity by generating substantial numbers of endemic species on short temporal scales. Factors pertaining to the mode and tempo of archipelago formation and its geographical isolation strongly govern evolutionary pathways available for species diversification, and the potential for successful diversification of dispersed lineages, therefore, appears highly dependent on the timing of arrival, as habitat and resource properties change dramatically over the course of oceanic island evolution.

    Learning Large-Scale Bayesian Networks with the sparsebn Package

    Learning graphical models from data is an important problem with wide applications, ranging from genomics to the social sciences. Nowadays datasets often have upwards of thousands---sometimes tens or hundreds of thousands---of variables and far fewer samples. To meet this challenge, we have developed a new R package called sparsebn for learning the structure of large, sparse graphical models with a focus on Bayesian networks. While there are many existing software packages for this task, this package focuses on the unique setting of learning large networks from high-dimensional data, possibly with interventions. As such, the methods provided place a premium on scalability and consistency in a high-dimensional setting. Furthermore, in the presence of interventions, the methods implemented here achieve the goal of learning a causal network from data. Additionally, the sparsebn package is fully compatible with existing software packages for network analysis. Comment: To appear in the Journal of Statistical Software, 39 pages, 7 figures.

    General Semiparametric Shared Frailty Model Estimation and Simulation with frailtySurv

    The R package frailtySurv for simulating and fitting semi-parametric shared frailty models is introduced. Package frailtySurv implements semi-parametric consistent estimators for a variety of frailty distributions, including gamma, log-normal, inverse Gaussian and power variance function, and provides consistent estimators of the standard errors of the parameters' estimators. The parameters' estimators are asymptotically normally distributed, and therefore statistical inference based on the results of this package, such as hypothesis testing and confidence intervals, can be performed using the normal distribution. Extensive simulations demonstrate the flexibility and correct implementation of the estimator. Two case studies performed with publicly available datasets demonstrate applicability of the package. In the Diabetic Retinopathy Study, the onset of blindness is clustered by patient, and in a large hard drive failure dataset, failure times are thought to be clustered by the hard drive manufacturer and model.

    ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R

    We introduce the C++ application and R package ranger. The software is a fast implementation of random forests for high dimensional data. Ensembles of classification, regression and survival trees are supported. We describe the implementation, provide examples, validate the package with a reference implementation, and compare runtime and memory usage with other implementations. The new software proves to scale best with the number of features, samples, trees, and features tried for splitting. Finally, we show that ranger is the fastest and most memory efficient implementation of random forests to analyze data on the scale of a genome-wide association study.

    Purposeful Searching for Citations of Scholarly Publications

    Citation data records the citations among scholarly publications. The data can be used to find relevant sources during research, identify emerging trends and research areas, compute metrics for comparing authors or journals, or for thematic clustering. Manual administration of citation data is limited due to the large number of publications. In this work, we hence lay the foundations for the automatic search for scientific citations. The distinguishing feature is a purposeful search for citations of a specified set of publications (e.g., those of an author or an institute). To this end, search strategies are developed and evaluated in order to reduce the cost of analyzing documents that do not cite the given set of publications. In our experiments, for authors with more than 100 publications about 75 % of the citations were found. The purposeful strategy thereby examined only 1.5 % of the 120 million publications in the data set used.

    Using R-based VOStat as a low resolution spectrum analysis tool

    We describe here an online software suite, VOStat, written mainly for the Virtual Observatory, a novel structure in which astronomers share terabyte scale data. Written mostly in the public-domain statistical computing language and environment R, it can perform a variety of statistical analyses on multidimensional, multi-epoch data with errors. Included are techniques which allow astronomers to start with multi-color data in the form of low-resolution spectra and select special kinds of sources in a variety of ways, including color outliers. Here we describe the tool and demonstrate it with an example from Palomar-QUEST, a synoptic sky survey.

    Early Term Effects of rhBMP-2 on Pedicle Screw Fixation in a Sheep Model: Histomorphometric and Biomechanical Analyses

    Background: The effects of recombinant human bone morphogenetic protein-2 (rhBMP-2) on pedicle screw pullout force and its potential to improve spinal fixation have not previously been investigated. rhBMP-2 on an absorbable collagen sponge (ACS) carrier was delivered in and around cannulated and fenestrated pedicle screws in a sheep lumbar spine instability model. Two control groups (empty screw and ACS with buffer) were also evaluated. We hypothesized that rhBMP-2 could stimulate bone growth in and around the cannulated and fenestrated pedicle screws to improve early bone purchase. Methods: Eight skeletally mature sheep underwent destabilizing laminectomies at L2–L3 and L4–L5 followed by stabilization with pedicle screw and rod constructs. An ACS carrier was used to deliver 0.15 mg of rhBMP-2 within and around the cannulated and fenestrated titanium pedicle screws. Biomechanics and histomorphometry were used to evaluate the early term results at 6 and 12 postoperative weeks. Results: rhBMP-2 was unable to improve bony purchase of the cannulated and fenestrated pedicle screws compared to both control groups. Although rhBMP-2 groups had pullout forces that were less than both control groups, both rhBMP-2 groups had pullout force values exceeding 2,000 N, which was comparable to previously published results for unmodified pedicle screws. Significant differences in the percentages of bone in peri-screw tissues were not observed amongst the four treatment groups. Microradiography and quantitative histomorphometry showed that at 6 weeks, rhBMP-2 induced peri-screw remodeling regions containing peri-implant bone which was hypodense with respect to surrounding native trabeculae. A moderate correlation between biomechanical pullout variables and histomorphometry data was observed. Conclusions: The design of the cannulated and fenestrated pedicle screw was able to facilitate new bone formation to achieve high pullout forces. However, delivery of rhBMP-2 should be carefully controlled to prevent excessive bone remodeling which could cause early screw loosening.

    Facility Layout Planning and Job Shop Scheduling – A survey
