
    Evolution of statistical analysis in empirical software engineering research: Current state and steps forward

    Software engineering research is evolving, and papers are increasingly based on empirical data from a multitude of sources, using statistical tests to determine if, and to what degree, empirical evidence supports their hypotheses. To investigate the practices and trends of statistical analysis in empirical software engineering (ESE), this paper presents a review of a large pool of papers from top-ranked software engineering journals. First, we manually reviewed 161 papers; in the second phase of our method, we conducted a more extensive semi-automatic classification of 5,196 papers spanning the years 2001–2015. Results from both review steps were used to: i) identify and analyze the predominant practices in ESE (e.g., using the t-test or ANOVA), as well as relevant trends in the usage of specific statistical methods (e.g., nonparametric tests and effect size measures), and ii) develop a conceptual model for a statistical analysis workflow, with suggestions on how to apply different statistical methods and guidelines for avoiding pitfalls. Lastly, we confirm existing claims that current ESE practices lack a standard for reporting the practical significance of results. We illustrate how practical significance can be discussed in terms of both the statistical analysis and the practitioner's context.
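    The recommended workflow pairs a significance test with an effect-size measure. Below is a minimal sketch of that pairing, not the paper's own conceptual model: a nonparametric Mann-Whitney U test combined with the Vargha-Delaney A12 effect size, run on synthetic measurements that stand in for, e.g., runtimes under two techniques.

```python
# Minimal sketch: nonparametric test + effect size, as often recommended
# in ESE guidelines. The data are synthetic placeholders.
import numpy as np
from scipy import stats

def vargha_delaney_a12(x, y):
    """A12 = P(X > Y) + 0.5 * P(X == Y); 0.5 indicates no effect."""
    x, y = np.asarray(x), np.asarray(y)
    greater = (x[:, None] > y[None, :]).sum()
    ties = (x[:, None] == y[None, :]).sum()
    return (greater + 0.5 * ties) / (len(x) * len(y))

rng = np.random.default_rng(42)
technique_a = rng.normal(1.2, 1.0, 30)  # e.g., runtimes under technique A
technique_b = rng.normal(1.0, 1.0, 30)  # e.g., runtimes under technique B

u_stat, p_value = stats.mannwhitneyu(technique_a, technique_b,
                                     alternative="two-sided")
a12 = vargha_delaney_a12(technique_a, technique_b)
print(f"U = {u_stat:.1f}, p = {p_value:.3f}, A12 = {a12:.2f}")
```

    Reporting A12 alongside p addresses the practical-significance gap the paper identifies: an A12 near 0.5 flags a difference that is statistically detectable but practically negligible.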

    Data-driven linear decision rule approach for distributionally robust optimization of on-line signal control

    We propose a two-stage, on-line signal control strategy for dynamic networks using a linear decision rule (LDR) approach and a distributionally robust optimization (DRO) technique. The first (off-line) stage formulates an LDR that maps real-time traffic data to optimal signal control policies. A DRO problem is solved to optimize the on-line performance of the LDR in the presence of uncertainties associated with the observed traffic states and ambiguity in their underlying distribution functions. We employ a data-driven calibration of the uncertainty set, which takes historical traffic data into account. The second (on-line) stage implements a very efficient linear decision rule whose performance is guaranteed by the off-line computation. We test the proposed signal control procedure in a simulation environment informed by actual traffic data obtained in Glasgow, and demonstrate its full potential in on-line operation and deployability on realistic networks, as well as its effectiveness in improving traffic conditions.
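    To make the two-stage idea concrete, here is a sketch of the on-line stage only: an affine (linear) decision rule mapping observed queue lengths to green-time splits. The gain matrix A and offset b are hypothetical placeholders standing in for the output of the off-line DRO stage, which is not implemented here.

```python
# Sketch of the on-line stage of an LDR controller. A and b would come
# from the off-line DRO computation; here they are hypothetical values.
import numpy as np

def ldr_green_times(A, b, queues, g_min=5.0, g_max=60.0):
    """Affine rule g = A @ q + b, clipped to feasible green-time bounds."""
    return np.clip(A @ queues + b, g_min, g_max)

# Hypothetical gains for a two-phase intersection fed by two detectors.
A = np.array([[0.8, -0.2],
              [-0.2, 0.8]])
b = np.array([20.0, 20.0])

queues = np.array([12.0, 4.0])        # vehicles observed in real time
print(ldr_green_times(A, b, queues))  # green seconds per phase
```

    Because the on-line step reduces to a matrix-vector product and a clip, it is cheap enough for real-time deployment, which is what motivates pushing the hard optimization off-line.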

    Evaluation of Reliability Indices for Gas Turbines Based on the Johnson SB Distribution: Towards Practical Development

    Recent advancements in computer engineering have provided effective solutions for processing and analyzing complex systems and big data. Consequently, the adjustment and standardization of these data play a crucial role in addressing issues related to the monitoring of industrial systems. In this study, we propose a reliability approach for gas turbines to identify and characterize their degradation using operational data. We introduce a method for adjusting turbine reliability data, which resolves the challenges associated with the nature of these operating data. This enables us to determine a mathematical function that models the relationships between turbine reliability parameters and to evaluate the impact of reliability practices in terms of availability. Additionally, we determine the survival function and employ it as a lifespan distribution model by estimating the parameters of the Johnson SB function. Furthermore, we calculate the failure rates and mean time between good operations for this rotating machine under different operating conditions. The obtained results allow us to estimate the parameters of the distribution that best fits the turbine reliability data, which are validated through statistical and graphical tests. We assess the goodness of fit using the mean square error and reliability tests such as the Kolmogorov-Smirnov test.
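    As a minimal sketch of the fitting and validation steps described above, assuming synthetic time-between-failure data in place of the turbine records: fit a Johnson SB distribution, check the fit with a Kolmogorov-Smirnov test, and evaluate the survival (reliability) function.

```python
# Sketch: Johnson SB fit, KS goodness-of-fit, and survival function R(t).
# The time-between-failure data are synthetic, not turbine records.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
tbf = rng.weibull(1.5, 200) * 1000.0  # hypothetical operating hours

a, b, loc, scale = stats.johnsonsb.fit(tbf)
ks_stat, p_value = stats.kstest(tbf, "johnsonsb", args=(a, b, loc, scale))
print(f"KS statistic = {ks_stat:.3f}, p = {p_value:.3f}")

t = np.linspace(100.0, 2000.0, 5)
reliability = stats.johnsonsb.sf(t, a, b, loc, scale)  # R(t) = 1 - F(t)
for ti, ri in zip(t, reliability):
    print(f"R({ti:.0f} h) = {ri:.3f}")
```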

    Multiple populations in Omega Centauri: a cluster analysis of spectroscopic data

    Omega Cen is composed of several stellar populations. Their history might allow us to reconstruct the evolution of this complex object. We performed a statistical cluster analysis on the large data set provided by Johnson and Pilachowski (2010). Stars in Omega Cen divide into three main groups. The metal-poor group includes about a third of the total. It shows a moderate O-Na anticorrelation and, as in other clusters, its O-poor second-generation stars are more centrally concentrated than the O-rich first-generation ones. This whole population is La-poor, with a pattern of abundances for n-capture elements that is very close to a scaled r-process one. The metal-intermediate group includes the majority of the cluster stars. This is a much more complex population, with an internal spread in the abundances of most elements. It shows an extreme O-Na anticorrelation, with a very numerous population of extremely O-poor and He-rich second-generation stars. This second generation is very centrally concentrated. This whole population is La-rich, with a pattern of abundances of n-capture elements that shows a strong contribution by the s-process. The spread in metallicity within this metal-intermediate population is not very large, and we might attribute it either to non-uniformities of an originally very extended star-forming region, or to some ability to retain a fraction of the ejecta of the core-collapse SNe that exploded first, or both. As previously noticed, the metal-rich group shows an Na-O correlation rather than an anticorrelation. There is evidence for the contribution of both massive stars ending their lives as core-collapse SNe and intermediate/low-mass stars producing the s-process elements. The kinematics of this population suggest that it formed within the cluster rather than being accreted.
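    For readers unfamiliar with the technique, the sketch below shows the kind of cluster analysis the paper applies, with mock abundance vectors ([Fe/H], [O/Fe], [Na/Fe], [La/Fe]) in place of the Johnson and Pilachowski (2010) measurements and k-means as a stand-in for the authors' statistical clustering method.

```python
# Illustrative clustering of stars in chemical-abundance space.
# Mock data only; three offset populations echo the groups described.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
abundances = np.vstack([
    rng.normal([-1.7, 0.3, 0.1, -0.2], 0.1, (60, 4)),   # metal-poor
    rng.normal([-1.4, -0.1, 0.4, 0.3], 0.1, (120, 4)),  # metal-intermediate
    rng.normal([-0.9, 0.2, 0.5, 0.4], 0.1, (30, 4)),    # metal-rich
])

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(abundances))
for k in range(3):
    members = abundances[labels == k]
    print(f"group {k}: n = {len(members)}, "
          f"mean [Fe/H] = {members[:, 0].mean():.2f}")
```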

    Detection of atrial fibrillation episodes in long-term heart rhythm signals using a support vector machine

    Atrial fibrillation (AF) is a serious heart arrhythmia leading to a significant increase in the risk of ischemic stroke. Clinically, an AF episode is recognized in an electrocardiogram. However, detection of asymptomatic AF, which requires long-term monitoring, is more efficient when based on the irregularity of beat-to-beat intervals estimated by heart rate (HR) features. Automated classification of heartbeats into AF and non-AF by means of the Lagrangian Support Vector Machine (LSVM) has been proposed. The classifier input vector consists of sixteen features, including four coefficients that are very sensitive to beat-to-beat heart changes, taken from fetal heart rate analysis in perinatal medicine. The effectiveness of the proposed classifier has been verified on the MIT-BIH Atrial Fibrillation Database. Designing the LSVM classifier using a very large number of feature vectors requires extreme computational effort. Therefore, an original approach has been proposed to determine a training set of the smallest possible size that still guarantees high-quality AF detection. It makes it possible to obtain satisfactory results using only 1.39% of all heartbeats as training data. A post-processing stage based on the aggregation of classified heartbeats into AF episodes has been applied to provide more reliable information on patient risk. Results obtained during the testing phase showed a sensitivity of 98.94%, a positive predictive value of 98.39%, and a classification accuracy of 98.86%.
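    A sketch of the classification stage only, with scikit-learn's SVC standing in for the Lagrangian SVM and two mock beat-to-beat irregularity features replacing the paper's sixteen-feature vector; the tiny training fraction mirrors the paper's reduced-training-set idea.

```python
# Sketch: SVM-based AF/non-AF heartbeat classification on mock features,
# trained on a deliberately small subset, in the spirit of the paper.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score, precision_score

rng = np.random.default_rng(7)
n = 2000
# Mock features: RR-interval variability and an irregularity index.
af = np.column_stack([rng.normal(0.15, 0.04, n), rng.normal(0.7, 0.1, n)])
non_af = np.column_stack([rng.normal(0.05, 0.02, n), rng.normal(0.3, 0.1, n)])
X = np.vstack([af, non_af])
y = np.r_[np.ones(n), np.zeros(n)].astype(int)

# Train on ~2% of the heartbeats, echoing the paper's 1.39% training set.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.02,
                                          random_state=0, stratify=y)
pred = SVC(kernel="rbf").fit(X_tr, y_tr).predict(X_te)
print(f"sensitivity = {recall_score(y_te, pred):.3f}, "
      f"PPV = {precision_score(y_te, pred):.3f}")
```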