90,082 research outputs found

    Prediction of remaining life of power transformers based on left truncated and right censored lifetime data

    Get PDF
    Prediction of the remaining life of high-voltage power transformers is an important issue for energy companies because of the need for planning maintenance and capital expenditures. Lifetime data for such transformers are complicated because transformer lifetimes can extend over many decades and transformer designs and manufacturing practices have evolved. We were asked to develop statistically based predictions for the lifetimes of an energy company's fleet of high-voltage transmission and distribution transformers. The company's data records begin in 1980, providing information on installation and failure dates of transformers. Although the dataset contains many units that were installed before 1980, there is no information about units that were installed and failed before 1980. Thus, the data are left truncated and right censored. We use a parametric lifetime model to describe the lifetime distribution of individual transformers. We develop a statistical procedure, based on age-adjusted life distributions, for computing a prediction interval for the remaining life of individual transformers now in service. We then extend these ideas to provide predictions and prediction intervals for the cumulative number of failures, over a range of time, for the overall fleet of transformers. Comment: Published at http://dx.doi.org/10.1214/00-AOAS231 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
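
    The left-truncated, right-censored likelihood at the heart of such an analysis can be sketched for a Weibull lifetime model as follows; the function and argument names are illustrative, not the paper's notation:

```python
import math

def weibull_loglik(beta, eta, age, failed, trunc_age):
    """Log-likelihood contribution of one transformer under a Weibull
    lifetime model. Left truncation: a unit installed before the record
    window is only observed because it survived to trunc_age, so we
    condition on that survival. Right censoring: a unit still in
    service contributes its survival probability, not a density.
    beta: Weibull shape, eta: Weibull scale, age: observed age,
    failed: True if a failure was observed, trunc_age: age at the
    start of the data window (0 if the unit was installed inside it)."""
    def log_S(t):  # log survival function of the Weibull
        return -((t / eta) ** beta)
    def log_f(t):  # log density of the Weibull
        return (math.log(beta / eta) + (beta - 1) * math.log(t / eta)
                - (t / eta) ** beta)
    contrib = log_f(age) if failed else log_S(age)
    # condition on survival past the truncation age
    return contrib - log_S(trunc_age)
```

    Summing these contributions over the fleet and maximizing over (beta, eta) gives the parametric fit from which age-adjusted remaining-life distributions follow.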

    A Tutorial on Estimating Time-Varying Vector Autoregressive Models

    Get PDF
    Time series of individual subjects have become a common data type in psychological research. These data allow one to estimate models of within-subject dynamics, thereby avoiding the notorious problem of making within-subject inferences from between-subjects data, and naturally address heterogeneity between subjects. A popular model for such data is the Vector Autoregressive (VAR) model, in which each variable is predicted as a linear function of all variables at previous time points. A key assumption of this model is that its parameters are constant (or stationary) across time. However, in many areas of psychological research, time-varying parameters are plausible or even the subject of study. In this tutorial paper, we introduce methods to estimate time-varying VAR models based on splines and kernel smoothing, with and without regularization. We use simulations to evaluate the relative performance of all methods in scenarios typical of applied research and discuss their strengths and weaknesses. Finally, we provide a step-by-step tutorial showing how to apply the discussed methods to an openly available time series of mood-related measurements.
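
    The kernel-smoothing idea can be sketched in a few lines: the VAR coefficients at a given time point are estimated by weighted least squares, with Gaussian kernel weights concentrating on nearby observations. This is a minimal sketch of the principle, not the tutorial's estimator or its regularized variants:

```python
import numpy as np

def tv_var1_at(data, t_est, bandwidth):
    """Kernel-smoothed estimate of time-varying VAR(1) coefficients at
    time point t_est. data: (T, p) array of observations; returns a
    (p, p) matrix B such that x_t is approximately B @ x_{t-1} for
    times near t_est."""
    T, p = data.shape
    X, Y = data[:-1], data[1:]          # lagged predictors / outcomes
    times = np.arange(1, T)
    # Gaussian kernel weights: observations close to t_est count more
    w = np.exp(-0.5 * ((times - t_est) / bandwidth) ** 2)
    W = np.diag(w)
    # weighted least squares: solve (X'WX) B' = X'WY
    B = np.linalg.solve(X.T @ W @ X, X.T @ W @ Y).T
    return B
```

    Repeating this at a grid of time points traces out the parameter trajectories; a small bandwidth tracks fast change at the cost of higher variance.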

    Reducing bias in auditory duration reproduction by integrating the reproduced signal

    Get PDF
    Duration estimation is known to be far from veridical and to differ for sensory estimates and motor reproduction. To investigate how these differential estimates are integrated for estimating or reproducing a duration, and to examine sensorimotor biases in duration comparison and reproduction tasks, we compared estimation biases and variances among three different duration estimation tasks: perceptual comparison, motor reproduction, and auditory reproduction (i.e., a combined perceptual-motor task). We found consistent overestimation in both motor and perceptual-motor auditory reproduction tasks, and the least overestimation in the comparison task. More interestingly, compared to pure motor reproduction, the overestimation bias was reduced in the auditory reproduction task, due to the additional reproduced auditory signal. We further manipulated the signal-to-noise ratio (SNR) in the feedback/comparison tones to examine the changes in estimation biases and variances. Considering perceptual and motor biases as two independent components, we applied the reliability-based model, which successfully predicted the biases in auditory reproduction. Our findings thus provide behavioral evidence of how the brain combines motor and perceptual information to reduce duration estimation biases and improve estimation reliability.
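
    Treating the perceptual and motor estimates as independent cues, a reliability-based combination follows the standard inverse-variance weighting rule. A minimal sketch (the numbers in the usage example below are illustrative, not the paper's data):

```python
def combine_estimates(mu_a, var_a, mu_b, var_b):
    """Reliability-weighted fusion of two independent duration
    estimates: each cue is weighted by its reliability (inverse
    variance), and the fused estimate has lower variance than either
    cue alone. This is the generic cue-combination rule underlying
    reliability-based models."""
    w_a = 1.0 / var_a
    w_b = 1.0 / var_b
    mu = (w_a * mu_a + w_b * mu_b) / (w_a + w_b)
    var = 1.0 / (w_a + w_b)
    return mu, var
```

    For example, fusing a biased motor estimate (1.2 s, variance 0.04) with a more reliable perceptual estimate (1.0 s, variance 0.01) pulls the combined estimate toward the reliable cue, mirroring the reduced overestimation seen in auditory reproduction.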

    Differential expression analysis with global network adjustment

    Get PDF
    Background: Large-scale chromosomal deletions or other non-specific perturbations of the transcriptome can alter the expression of hundreds or thousands of genes, and it is of biological interest to understand which genes are most profoundly affected. We present a method for predicting a gene's expression as a function of other genes, thereby accounting for the effect of transcriptional regulation that confounds the identification of genes differentially expressed relative to a regulatory network. The challenge in constructing such models is that the number of possible regulator transcripts within a global network is on the order of thousands, whereas the number of biological samples is typically on the order of 10. Nevertheless, there are large gene expression databases that can be used to construct networks that could be helpful in modeling transcriptional regulation in smaller experiments.

    Results: We demonstrate a type of penalized regression model that can be estimated from large gene expression databases and then applied to smaller experiments. The ridge parameter is selected by minimizing the cross-validation error of the predictions in the independent out-of-sample data. This tends to increase model stability and leads to a much greater degree of parameter shrinkage, but the resulting biased estimation is mitigated by a second round of regression. The proposed computationally efficient "over-shrinkage" method outperforms previously used LASSO-based techniques. In two independent datasets, we find that the median proportion of explained variability in expression is approximately 25%; this results in a substantial increase in the signal-to-noise ratio, allowing more powerful inferences on differential gene expression and leading to biologically intuitive findings. We also show that a large proportion of gene dependencies are conditional on the biological state, a finding that would be impossible to obtain with standard differential expression methods.

    Conclusions: By adjusting for the effects of the global network on individual genes, both the sensitivity and reliability of differential expression measures are greatly improved.
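
    The two-stage "over-shrinkage" idea can be sketched as follows, under the assumption that stage one is a heavily shrunk ridge fit on a large reference dataset and stage two is a one-parameter rescaling regression in the smaller experiment. This is an illustrative reading of the abstract, not the authors' exact procedure:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients (no intercept, for brevity)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def overshrinkage_predict(X_train, y_train, X_new, y_new, lam):
    """Stage 1: heavily shrunk ridge predictions learned on the large
    reference data. Stage 2: a single-parameter regression of the new
    observations on those predictions rescales them, mitigating the
    bias introduced by strong shrinkage."""
    beta = ridge_fit(X_train, y_train, lam)
    pred = X_new @ beta                      # stage 1: shrunk prediction
    # stage 2: optimal scalar rescaling of the prediction
    scale = (pred @ y_new) / (pred @ pred)
    return scale * pred
```

    Because the rescaling factor is chosen by least squares, the rescaled predictions can never fit worse than the raw shrunk predictions, which is the sense in which the second round mitigates the shrinkage bias.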

    Verbal Autopsy Methods with Multiple Causes of Death

    Get PDF
    Verbal autopsy procedures are widely used for estimating cause-specific mortality in areas without medical death certification. Data on symptoms reported by caregivers, along with the cause of death, are collected from a medical facility, and the cause-of-death distribution is estimated in the population where only symptom data are available. Current approaches analyze only one cause at a time, involve assumptions judged difficult or impossible to satisfy, and require expensive, time-consuming, or unreliable physician reviews, expert algorithms, or parametric statistical models. By generalizing current approaches to analyze multiple causes, we show how most of the difficult assumptions underlying existing methods can be dropped. These generalizations also make physician review, expert algorithms, and parametric statistical assumptions unnecessary. With theoretical results, and empirical analyses of data from China and Tanzania, we illustrate the accuracy of this approach. While no method of analyzing verbal autopsy data, including the more computationally intensive approach offered here, can give accurate estimates in all circumstances, the procedure offered is conceptually simpler, less expensive, more general, as or more replicable, and easier to use in practice than existing approaches. We also show how our focus on estimating aggregate proportions, which are the quantities of primary interest in verbal autopsy studies, may also greatly reduce the assumptions necessary for, and thus improve the performance of, many individual classifiers in this and other areas. As a companion to this paper, we also offer easy-to-use software that implements the methods discussed herein. Comment: Published at http://dx.doi.org/10.1214/07-STS247 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).
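
    The core identity behind this family of methods is that the symptom-profile distribution in the target population is a mixture of the profile distributions given each cause, b = A pi, which can be inverted for the cause fractions pi. A toy two-cause sketch, with a grid search standing in for the general constrained estimator:

```python
import numpy as np

def cause_fractions_2(A, b, grid=10001):
    """Back-calculate two cause-of-death fractions pi from b = A @ pi,
    where b holds symptom-profile frequencies observed in the target
    population and the columns of A hold profile frequencies given
    each cause (learned at a medical facility). A grid search over the
    2-simplex minimizes the squared residual; this is an illustrative
    sketch, not the paper's estimator."""
    best, best_err = None, np.inf
    for p in np.linspace(0.0, 1.0, grid):
        pi = np.array([p, 1.0 - p])
        err = np.sum((b - A @ pi) ** 2)
        if err < best_err:
            best, best_err = pi, err
    return best
```

    Note that only the aggregate fractions pi are estimated; no death is individually classified, which is why the usual classifier assumptions can be dropped.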

    Software defect prediction: do different classifiers find the same defects?

    Get PDF
    Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. During the last 10 years, hundreds of different defect prediction models have been published. The performance of the classifiers used in these models is reported to be similar, with models rarely performing above the predictive performance ceiling of about 80% recall. We investigate the individual defects that four classifiers predict and analyse the level of prediction uncertainty produced by these classifiers. We perform a sensitivity analysis to compare the performance of Random Forest, Naïve Bayes, RPart and SVM classifiers when predicting defects in NASA, open source and commercial datasets. The defect predictions that each classifier makes are captured in a confusion matrix, and the prediction uncertainty of each classifier is compared. Despite similar predictive performance values for these four classifiers, each detects different sets of defects. Some classifiers are more consistent in predicting defects than others. Our results confirm that a unique subset of defects can be detected by specific classifiers. However, while some classifiers are consistent in the predictions they make, other classifiers vary in their predictions. Given our results, we conclude that classifier ensembles with decision-making strategies not based on majority voting are likely to perform best in defect prediction.
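
    Capturing each classifier's predictions in a confusion matrix and then intersecting the defect sets can be sketched as follows; this is a pure-Python toy, not the study's experimental pipeline:

```python
def confusion(pred, truth):
    """2x2 confusion counts for boolean defect predictions:
    returns (TP, FP, FN, TN)."""
    tp = sum(p and t for p, t in zip(pred, truth))
    fp = sum(p and not t for p, t in zip(pred, truth))
    fn = sum(not p and t for p, t in zip(pred, truth))
    tn = sum(not p and not t for p, t in zip(pred, truth))
    return tp, fp, fn, tn

def unique_defects(pred_a, pred_b, truth):
    """Indices of truly defective modules found by classifier A but
    missed by classifier B: identical recall can hide disjoint sets."""
    return [i for i, t in enumerate(truth)
            if t and pred_a[i] and not pred_b[i]]
```

    Two classifiers with the same TP count can still find different defects, which is the observation that motivates non-majority-vote ensembles.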

    A Bayesian framework for verification and recalibration of ensemble forecasts: How uncertain is NAO predictability?

    Get PDF
    Predictability estimates of ensemble prediction systems are uncertain due to limited numbers of past forecasts and observations. To account for such uncertainty, this paper proposes a Bayesian inferential framework that provides a simple 6-parameter representation of ensemble forecasting systems and the corresponding observations. The framework is probabilistic, and thus allows for quantifying uncertainty in predictability measures such as correlation skill and signal-to-noise ratios. It also provides a natural way to produce recalibrated probabilistic predictions from uncalibrated ensemble forecasts. The framework is used to address important questions concerning the skill of winter hindcasts of the North Atlantic Oscillation for 1992-2011 issued by the Met Office GloSea5 climate prediction system. Although there is much uncertainty in the correlation between ensemble mean and observations, there is strong evidence of skill: the 95% credible interval of the correlation coefficient of [0.19,0.68] does not overlap zero. There is also strong evidence that the forecasts are not exchangeable with the observations: with over 99% certainty, the signal-to-noise ratio of the forecasts is smaller than the signal-to-noise ratio of the observations, which suggests that raw forecasts should not be taken as representative scenarios of the observations. Forecast recalibration is thus required, which can be coherently addressed within the proposed framework. Comment: 36 pages, 10 figures.
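
    To convey why 20 winters of hindcasts leave the correlation so uncertain, a quick frequentist stand-in for the paper's Bayesian posterior is the Fisher z-transform interval. This is illustrative only; the [0.19, 0.68] interval in the abstract comes from the full 6-parameter Bayesian model, not from this approximation:

```python
import math

def fisher_z_interval(r, n, z_crit=1.959963984540054):
    """Approximate 95% interval for a correlation coefficient via the
    Fisher z-transform: z = atanh(r) is roughly normal with standard
    error 1/sqrt(n - 3), so the interval is computed on the z scale
    and mapped back with tanh. r: sample correlation, n: sample size."""
    z = 0.5 * math.log((1 + r) / (1 - r))     # atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    lo, hi = z - z_crit * se, z + z_crit * se
    return math.tanh(lo), math.tanh(hi)
```

    With r = 0.6 and n = 20 the interval is wide but excludes zero, qualitatively matching the paper's conclusion of skilful yet uncertain hindcasts.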

    Improving statistical inference on pathogen densities estimated by quantitative molecular methods: malaria gametocytaemia as a case study

    Get PDF
    BACKGROUND: Quantitative molecular methods (QMMs) such as quantitative real-time polymerase chain reaction (q-PCR), reverse-transcriptase PCR (qRT-PCR) and quantitative nucleic acid sequence-based amplification (QT-NASBA) are increasingly used to estimate pathogen density in a variety of clinical and epidemiological contexts. These methods are often classified as semi-quantitative, yet estimates of reliability or sensitivity are seldom reported. Here, a statistical framework is developed for assessing the reliability (uncertainty) of pathogen densities estimated using QMMs and the associated diagnostic sensitivity. The method is illustrated with quantification of Plasmodium falciparum gametocytaemia by QT-NASBA. RESULTS: The reliability of pathogen (e.g. gametocyte) densities, and the accompanying diagnostic sensitivity, estimated by two contrasting statistical calibration techniques are compared: a traditional method and a Bayesian mixed model approach. The latter accounts for statistical dependence of QMM assays run under identical laboratory protocols and permits structural modelling of experimental measurements, allowing precision to vary with pathogen density. Traditional calibration cannot account for inter-assay variability arising from imperfect QMMs and generates estimates of pathogen density that have poor reliability, are variable among assays and inaccurately reflect diagnostic sensitivity. The Bayesian mixed model approach assimilates information from replicate QMM assays, improving reliability and inter-assay homogeneity, providing an accurate appraisal of quantitative and diagnostic performance. CONCLUSIONS: Bayesian mixed model statistical calibration supersedes traditional techniques in the context of QMM-derived estimates of pathogen density, offering the potential to improve substantially the depth and quality of clinical and epidemiological inference for a wide variety of pathogens.
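
    The "traditional" calibration baseline that the Bayesian mixed model improves on is a per-assay standard-curve fit followed by inverse prediction. A minimal sketch (function names and toy values are illustrative):

```python
def fit_standard_curve(log10_density, signal):
    """Ordinary least squares fit of signal = a + b * log10(density)
    to a dilution series of known standards: the per-assay standard
    curve used in traditional QMM calibration."""
    n = len(signal)
    mx = sum(log10_density) / n
    my = sum(signal) / n
    sxx = sum((x - mx) ** 2 for x in log10_density)
    sxy = sum((x - mx) * (y - my)
              for x, y in zip(log10_density, signal))
    b = sxy / sxx
    a = my - b * mx
    return a, b

def invert(a, b, signal_obs):
    """Inverse prediction: estimated pathogen density for a new
    observed signal, by solving the standard curve for density."""
    return 10 ** ((signal_obs - a) / b)
```

    Fitting each assay separately like this ignores inter-assay dependence, which is precisely the limitation the Bayesian mixed model addresses by pooling replicate assays.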

    A practical degradation based method to predict long-term moisture incursion and colour change in high power LEDs

    Get PDF
    The effect of relative humidity on LEDs, and how moisture incursion is associated with colour shift, is studied. This paper proposes a different approach to describe the lumen degradation of LEDs due to the long-term effects of humidity. Using the lumen degradation data of different types of LEDs under varying conditions of relative humidity, a humidity-based degradation model (HBDM) is developed. A practical estimation method from the degradation behaviour is proposed to quantitatively gauge the effect of moisture incursion by means of a humidity index. This index demonstrates a high correlation with the colour shift indicated by the LED's yellow-to-blue output intensity ratio. Physical analyses of the LEDs provide a qualitative validation of the model, which provides good accuracy with longer periods of moisture exposure. The results demonstrate that the HBDM is an effective indicator to predict the extent of the long-term impact of humidity and the associated relative colour shift.
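
    The abstract does not give the HBDM's functional form. As a hedged illustration only: lumen maintenance in LED reliability work is commonly modelled as exponential decay, L(t) = exp(-alpha * t), and the decay rate for one humidity condition can be fitted in log space:

```python
import math

def fit_decay_rate(times, lumen_frac):
    """Least-squares estimate of alpha in L(t) = exp(-alpha * t),
    a generic exponential lumen-decay form (the paper's HBDM makes
    the degradation depend on relative humidity; that dependence is
    not reproduced here). log L(t) = -alpha * t is linear through the
    origin, so alpha has a closed-form least-squares solution."""
    num = -sum(t * math.log(l) for t, l in zip(times, lumen_frac))
    den = sum(t * t for t in times)
    return num / den
```

    Fitting alpha separately at several humidity levels, and regressing the fitted rates on humidity, is one simple way a humidity index of the kind described could be constructed.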