6,746 research outputs found

    Evolution of statistical analysis in empirical software engineering research: Current state and steps forward

    Software engineering research is evolving and papers are increasingly based on empirical data from a multitude of sources, using statistical tests to determine if and to what degree empirical evidence supports their hypotheses. To investigate the practices and trends of statistical analysis in empirical software engineering (ESE), this paper presents a review of a large pool of papers from top-ranked software engineering journals. First, we manually reviewed 161 papers; in the second phase of our method, we conducted a more extensive semi-automatic classification of 5,196 papers spanning the years 2001-2015. Results from both review steps were used to: i) identify and analyze the predominant practices in ESE (e.g., using the t-test or ANOVA), as well as relevant trends in the usage of specific statistical methods (e.g., nonparametric tests and effect size measures), and ii) develop a conceptual model for a statistical analysis workflow, with suggestions on how to apply different statistical methods as well as guidelines to avoid pitfalls. Lastly, we confirm existing claims that current ESE practices lack a standard for reporting the practical significance of results. We illustrate how practical significance can be discussed in terms of both the statistical analysis and the practitioner's context.
    Comment: journal submission, 34 pages, 8 figures
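    As a concrete illustration of the reporting pattern the review advocates (a nonparametric test paired with an effect size, so statistical and practical significance can both be discussed), here is a minimal Python sketch; the two "technique" samples are invented for illustration and do not come from the paper:

        # Nonparametric comparison of two hypothetical techniques plus a
        # Vargha-Delaney A12 effect size. Data are made up for illustration.
        import numpy as np
        from scipy.stats import mannwhitneyu

        rng = np.random.default_rng(0)
        technique_a = rng.normal(10.0, 2.0, size=40)   # e.g. defect counts
        technique_b = rng.normal(8.5, 2.0, size=40)

        stat, p = mannwhitneyu(technique_a, technique_b, alternative="two-sided")

        # A12: probability that a random draw from A exceeds one from B,
        # counting ties as half.
        greater = (technique_a[:, None] > technique_b[None, :]).mean()
        ties = (technique_a[:, None] == technique_b[None, :]).mean()
        a12 = greater + 0.5 * ties

        print(f"Mann-Whitney U = {stat:.1f}, p = {p:.4f}, A12 = {a12:.2f}")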

    Empirical Likelihood for Regression Discontinuity Design

    This paper proposes empirical likelihood based inference methods for causal effects identified from regression discontinuity designs. We consider both the sharp and fuzzy regression discontinuity designs and treat the regression functions as nonparametric. The proposed inference procedures do not require asymptotic variance estimation and the confidence sets have natural shapes, unlike the conventional Wald-type method. These features are illustrated by simulations and an empirical example which evaluates the effect of class size on pupils' scholastic achievements. Bandwidth selection methods, higher-order properties, and extensions to incorporate additional covariates and parametric functional forms are also discussed.
    Keywords: empirical likelihood, nonparametric methods, regression discontinuity design, treatment effect
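    The paper's empirical likelihood procedure is not reproduced here, but the conventional Wald-type baseline it is contrasted with can be sketched: a local linear fit with a triangular kernel on each side of the cutoff, with simulated data, bandwidth, and jump size all chosen purely for illustration:

        # Conventional sharp-RD point estimate via one-sided local linear fits.
        import numpy as np

        def local_linear_at_cutoff(x, y, cutoff, h, side):
            """Triangular-kernel local linear fit; intercept at the cutoff."""
            mask = (x >= cutoff) if side == "right" else (x < cutoff)
            xc, yc = x[mask] - cutoff, y[mask]
            w = np.maximum(1 - np.abs(xc) / h, 0.0)
            X = np.column_stack([np.ones_like(xc), xc])
            XtW = X.T * w
            beta = np.linalg.solve(XtW @ X, XtW @ yc)
            return beta[0]

        rng = np.random.default_rng(1)
        x = rng.uniform(-1, 1, 500)          # running variable
        tau = 0.4                            # true jump at the cutoff
        y = 1 + 0.8 * x + tau * (x >= 0) + rng.normal(0, 0.3, 500)

        h = 0.25                             # illustrative bandwidth
        effect = (local_linear_at_cutoff(x, y, 0.0, h, "right")
                  - local_linear_at_cutoff(x, y, 0.0, h, "left"))
        print(f"estimated discontinuity: {effect:.3f}")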

    Short-term load forecasting based on a semi-parametric additive model

    Short-term load forecasting is an essential instrument in power system planning, operation and control. Many operating decisions are based on load forecasts, such as dispatch scheduling of generating capacity, reliability analysis, and maintenance planning for the generators. Overestimation of electricity demand will cause a conservative operation, which leads to the start-up of too many units or excessive energy purchases, thereby supplying an unnecessary level of reserve. Conversely, underestimation may result in a risky operation, with insufficient spinning reserve, leaving the system vulnerable to disturbances. In this paper, semi-parametric additive models are proposed to estimate the relationships between demand and the driver variables. Specifically, the inputs for these models are calendar variables, lagged actual demand observations, and historical and forecast temperature traces for one or more sites in the target power system. In addition to point forecasts, prediction intervals are also estimated using a modified bootstrap method suitable for the complex seasonality seen in electricity demand data. The proposed methodology has been used to forecast the half-hourly electricity demand for up to seven days ahead for power systems in the Australian National Electricity Market. The performance of the methodology is validated via out-of-sample experiments with real data from the power system, as well as through on-site implementation by the system operator.
    Keywords: short-term load forecasting, additive model, time series, forecast distribution
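    A heavily simplified flavor of this kind of additive specification can be sketched in Python: calendar dummies, lagged demand, and a nonlinear temperature effect (here a crude piecewise-linear hinge basis rather than the authors' smoothers), fit jointly by least squares. The data, knots, and coefficients are invented for illustration:

        # Toy additive demand model: calendar + lag + temperature hinges.
        import numpy as np

        rng = np.random.default_rng(2)
        n = 48 * 60                               # ~60 days of half-hourly data
        tod = np.tile(np.arange(48), n // 48)     # time-of-day index
        temp = 18 + 8 * np.sin(np.arange(n) / 48 * 2 * np.pi) + rng.normal(0, 1, n)
        demand = (50 + 5 * np.sin(tod / 48 * 2 * np.pi)
                  + 0.3 * np.maximum(temp - 22, 0) ** 2 + rng.normal(0, 1, n))

        lag1 = np.roll(demand, 1)                 # previous half-hour's demand
        knots = [15, 20, 25]                      # illustrative temperature knots
        hinges = [np.maximum(temp - k, 0) for k in knots]
        tod_dummies = (tod[:, None] == np.arange(48)).astype(float)[:, 1:]

        X = np.column_stack([np.ones(n), lag1, temp, *hinges, tod_dummies])[1:]
        y = demand[1:]                            # drop first row (invalid lag)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        print("in-sample RMSE:", np.sqrt(np.mean((X @ beta - y) ** 2)).round(3))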

    Regression Discontinuity Designs Using Covariates

    We study regression discontinuity designs when covariates are included in the estimation. We examine local polynomial estimators that include discrete or continuous covariates in an additive separable way, but without imposing any parametric restrictions on the underlying population regression functions. We recommend a covariate-adjustment approach that retains consistency under intuitive conditions, and characterize the potential for estimation and inference improvements. We also present new covariate-adjusted mean squared error expansions and robust bias-corrected inference procedures, with heteroskedasticity-consistent and cluster-robust standard errors. An empirical illustration and an extensive simulation study are presented. All methods are implemented in R and Stata software packages.
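    The authors' R/Stata packages implement the full method; purely as a bare-bones illustration of the additive-separable covariate adjustment described above, one can run a single pooled local linear regression with a treatment dummy, its interaction with the running variable, and the covariate entering linearly (simulated data and bandwidth are assumptions):

        # Covariate-adjusted sharp-RD sketch via pooled weighted least squares.
        import numpy as np

        rng = np.random.default_rng(3)
        n = 800
        x = rng.uniform(-1, 1, n)                 # running variable
        z = rng.normal(0, 1, n)                   # pre-treatment covariate
        d = (x >= 0).astype(float)                # sharp treatment assignment
        y = 1 + 0.8 * x + 0.5 * d + 0.6 * z + rng.normal(0, 0.3, n)

        h = 0.3
        w = np.maximum(1 - np.abs(x) / h, 0.0)    # triangular kernel weights
        X = np.column_stack([np.ones(n), x, d, d * x, z])
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ y)
        print(f"covariate-adjusted RD estimate: {beta[2]:.3f}")  # coeff. on d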

    Hybrid model using logit and nonparametric methods for predicting micro-entity failure

    Following calls from the literature on bankruptcy, a parsimonious hybrid bankruptcy model is developed in this paper by combining parametric and non-parametric approaches. To this end, the variables with the highest predictive power to detect bankruptcy are selected using logistic regression (LR). Subsequently, alternative non-parametric methods (Multilayer Perceptron, Rough Set, and Classification-Regression Trees) are applied, in turn, to firms classified as either “bankrupt” or “not bankrupt”. Our findings show that hybrid models, particularly those combining LR and Multilayer Perceptron, offer better accuracy and interpretability, and converge faster than each method implemented in isolation. Moreover, the authors demonstrate that the introduction of non-financial and macroeconomic variables complements financial ratios for bankruptcy prediction.
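    A hedged sketch of the hybrid idea, using scikit-learn rather than the paper's exact pipeline: logistic regression screens the predictors, then a multilayer perceptron is trained on the retained ones. The synthetic features stand in for the financial, non-financial, and macroeconomic variables:

        # LR-based feature selection feeding an MLP classifier.
        from sklearn.datasets import make_classification
        from sklearn.feature_selection import SelectFromModel
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split
        from sklearn.neural_network import MLPClassifier
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        X, y = make_classification(n_samples=2000, n_features=30,
                                   n_informative=8, random_state=0)  # 1 = "bankrupt"
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        selector = SelectFromModel(
            LogisticRegression(penalty="l1", solver="liblinear", C=0.5))
        model = make_pipeline(StandardScaler(), selector,
                              MLPClassifier(hidden_layer_sizes=(16,),
                                            max_iter=500, random_state=0))
        model.fit(X_tr, y_tr)
        print("selected features:", selector.get_support().sum(),
              "| test accuracy:", round(model.score(X_te, y_te), 3))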

    Recent advances in directional statistics

    Mainstream statistical methodology is generally applicable to data observed in Euclidean space. There are, however, numerous contexts of considerable scientific interest in which the natural supports for the data under consideration are Riemannian manifolds like the unit circle, torus, sphere and their extensions. Typically, such data can be represented using one or more directions, and directional statistics is the branch of statistics that deals with their analysis. In this paper we provide a review of the many recent developments in the field since the publication of Mardia and Jupp (1999), still the most comprehensive text on directional statistics. Many of those developments have been stimulated by interesting applications in fields as diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics, image analysis, text mining, environmetrics, and machine learning. We begin by considering developments for the exploratory analysis of directional data before progressing to distributional models, general approaches to inference, hypothesis testing, regression, nonparametric curve estimation, methods for dimension reduction, classification and clustering, and the modelling of time series, spatial and spatio-temporal data. An overview of currently available software for analysing directional data is also provided, and potential future developments are discussed.
    Comment: 61 pages
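    Some elementary directional-statistics computations of the kind surveyed above can be done in a few lines of Python: the circular mean direction and mean resultant length of a sample of angles, plus the standard moment-based approximation of the von Mises concentration parameter (the angles are simulated; this is an illustration, not code from the paper):

        # Circular summary statistics and a von Mises concentration estimate.
        import numpy as np

        rng = np.random.default_rng(4)
        theta = rng.vonmises(mu=np.pi / 3, kappa=4.0, size=500)  # radians

        C, S = np.cos(theta).mean(), np.sin(theta).mean()
        mean_dir = np.arctan2(S, C)          # circular mean direction
        R = np.hypot(C, S)                   # mean resultant length in [0, 1]

        # Standard piecewise approximation for kappa given R (as in common
        # directional-statistics texts).
        if R < 0.53:
            kappa = 2 * R + R**3 + 5 * R**5 / 6
        elif R < 0.85:
            kappa = -0.4 + 1.39 * R + 0.43 / (1 - R)
        else:
            kappa = 1 / (R**3 - 4 * R**2 + 3 * R)

        print(f"mean direction = {mean_dir:.3f} rad, R = {R:.3f}, "
              f"kappa ~ {kappa:.2f}")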

    Comparison of different classification algorithms for fault detection and fault isolation in complex systems

    Due to the lack of sufficient results in the literature, feature extraction and classification for hydraulic systems appear to be somewhat challenging. This paper compares the performance of three classifiers (namely linear support vector machine (SVM), distance-weighted k-nearest neighbor (WKNN), and decision tree (DT)) using data from optimized and non-optimized sensor set solutions. The algorithms are trained with known data and then tested with unknown data for different scenarios characterizing faults with different degrees of severity. This investigation is based solely on a data-driven approach and relies on data sets taken from experiments on the fuel system. The system used throughout this study is a typical fuel delivery system consisting of standard components such as a filter, pump, valve, nozzle, pipes, and two tanks. Running representative tests on a fuel system is problematic because of the time, cost, and reproduction constraints involved in capturing any significant degradation: simulating significant degradation requires running the system over a considerable period, which cannot be reproduced quickly and is costly.
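    A minimal version of the comparison described above can be sketched with scikit-learn: a linear SVM, a distance-weighted kNN, and a decision tree scored on a held-out split. Synthetic multi-class features stand in for the fuel-system sensor data and fault classes:

        # Compare three classifiers on a synthetic fault-classification task.
        from sklearn.datasets import make_classification
        from sklearn.model_selection import train_test_split
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC
        from sklearn.tree import DecisionTreeClassifier

        X, y = make_classification(n_samples=1500, n_features=12, n_classes=4,
                                   n_informative=6, random_state=0)  # 4 fault classes
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y,
                                                  random_state=0)

        classifiers = {
            "linear SVM": make_pipeline(StandardScaler(), SVC(kernel="linear")),
            "weighted kNN": make_pipeline(StandardScaler(),
                                          KNeighborsClassifier(weights="distance")),
            "decision tree": DecisionTreeClassifier(random_state=0),
        }
        for name, clf in classifiers.items():
            clf.fit(X_tr, y_tr)
            print(f"{name}: test accuracy = {clf.score(X_te, y_te):.3f}")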