Prediction of remaining life of power transformers based on left truncated and right censored lifetime data
Prediction of the remaining life of high-voltage power transformers is an
important issue for energy companies because of the need for planning
maintenance and capital expenditures. Lifetime data for such transformers are
complicated because transformer lifetimes can extend over many decades and
transformer designs and manufacturing practices have evolved. We were asked to
develop statistically-based predictions for the lifetimes of an energy
company's fleet of high-voltage transmission and distribution transformers. The
company's data records begin in 1980, providing information on installation and
failure dates of transformers. Although the dataset contains many units that
were installed before 1980, there is no information about units that were
installed and failed before 1980. Thus, the data are left truncated and right
censored. We use a parametric lifetime model to describe the lifetime
distribution of individual transformers. We develop a statistical procedure,
based on age-adjusted life distributions, for computing a prediction interval
for remaining life for individual transformers now in service. We then extend
these ideas to provide predictions and prediction intervals for the cumulative
number of failures, over a range of time, for the overall fleet of
transformers.

Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org/), at http://dx.doi.org/10.1214/00-AOAS231.
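The left-truncated, right-censored likelihood the abstract describes can be illustrated with a small sketch: units installed before the records begin are observed only if they survived to the truncation age, so each unit's likelihood contribution is divided by its survival probability at that age. The Weibull model, simulated data, and all parameter values below are hypothetical, not taken from the paper:

```python
import numpy as np
from scipy.optimize import minimize

def negloglik(logparams, t, delta, tau):
    """Weibull negative log-likelihood with left truncation at age tau
    and right censoring (delta = 1 for failures, 0 for censored units)."""
    beta, eta = np.exp(logparams)            # optimise on log scale for positivity
    z = (t / eta) ** beta
    log_f = np.log(beta / eta) + (beta - 1) * np.log(t / eta) - z
    log_S = -z
    log_S_tau = -((tau / eta) ** beta)       # truncation correction: P(T > tau)
    return -np.sum(delta * log_f + (1 - delta) * log_S - log_S_tau)

rng = np.random.default_rng(0)
life = 40.0 * rng.weibull(2.0, 5000)         # true shape beta = 2, scale eta = 40
tau = rng.uniform(0.0, 20.0, 5000)           # age when record-keeping begins
seen = life > tau                            # units failing earlier are never recorded
life, tau = life[seen], tau[seen]
end = tau + 30.0                             # 30-year observation window
delta = (life <= end).astype(float)
t_obs = np.minimum(life, end)

res = minimize(negloglik, x0=np.log([1.0, 30.0]),
               args=(t_obs, delta, tau), method="Nelder-Mead")
beta_hat, eta_hat = np.exp(res.x)
print(beta_hat, eta_hat)                     # close to the true (2.0, 40.0)
```

Ignoring the `log_S_tau` term would bias the fit toward long lifetimes, because short-lived pre-1980 units are systematically missing from the data.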
A Tutorial on Estimating Time-Varying Vector Autoregressive Models
Time series of individual subjects have become a common data type in
psychological research. These data allow one to estimate models of
within-subject dynamics, and thereby avoid the notorious problem of making
within-subjects inferences from between-subjects data, and naturally address
heterogeneity between subjects. A popular model for these data is the Vector
Autoregressive (VAR) model, in which each variable is predicted as a linear
function of all variables at previous time points. A key assumption of this
model is that its parameters are constant (or stationary) across time. However,
in many areas of psychological research time-varying parameters are plausible
or even the subject of study. In this tutorial paper, we introduce methods to
estimate time-varying VAR models based on splines and kernel-smoothing
with/without regularization. We use simulations to evaluate the relative
performance of all methods in scenarios typical in applied research, and
discuss their strengths and weaknesses. Finally, we provide a step-by-step
tutorial showing how to apply the discussed methods to an openly available time
series of mood-related measurements.
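A kernel-smoothing estimator of the kind the tutorial describes can be sketched as follows: at each target time point a VAR(1) regression is fit by weighted least squares, with Gaussian weights that decay with distance in time. This is a minimal illustration with a single drifting AR(1) coefficient, not the authors' implementation; the bandwidth and data are invented:

```python
import numpy as np

def tv_var1(X, bandwidth=0.1):
    """Kernel-smoothed time-varying VAR(1): at each time point, fit weighted
    least squares with Gaussian weights over normalised time.
    Returns an array of shape (T-1, p, p+1): intercept plus lag coefficients."""
    T, p = X.shape
    times = np.linspace(0, 1, T - 1)
    Y, Z = X[1:], np.column_stack([np.ones(T - 1), X[:-1]])
    coefs = np.empty((T - 1, p, p + 1))
    for i, t0 in enumerate(times):
        w = np.sqrt(np.exp(-0.5 * ((times - t0) / bandwidth) ** 2))[:, None]
        B = np.linalg.lstsq(Z * w, Y * w, rcond=None)[0]  # weighted LS
        coefs[i] = B.T
    return coefs

# toy data: one autoregressive coefficient drifting from 0.1 to 0.8
rng = np.random.default_rng(1)
T = 600
phi = np.linspace(0.1, 0.8, T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = phi[t] * x[t - 1] + rng.normal()

coefs = tv_var1(x[:, None], bandwidth=0.15)
print(coefs[30, 0, 1], coefs[-30, 0, 1])   # early vs late lag-1 coefficient
```

A stationary VAR fit to these data would return a single averaged coefficient and miss the drift entirely, which is the tutorial's motivating point.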
Reducing bias in auditory duration reproduction by integrating the reproduced signal
Duration estimation is known to be far from veridical and to differ between sensory estimates and motor reproduction. To investigate how these differential estimates are integrated when estimating or reproducing a duration, and to examine sensorimotor biases in duration comparison and reproduction tasks, we compared estimation biases and variances among three duration estimation tasks: perceptual comparison, motor reproduction, and auditory reproduction (i.e., a combined perceptual-motor task). We found consistent overestimation in both the motor and the perceptual-motor auditory reproduction tasks, and the least overestimation in the comparison task. More interestingly, compared to pure motor reproduction, the overestimation bias was reduced in the auditory reproduction task, owing to the additional reproduced auditory signal. We further manipulated the signal-to-noise ratio (SNR) of the feedback/comparison tones to examine the changes in estimation biases and variances. Treating perceptual and motor biases as two independent components, we applied a reliability-based model, which successfully predicted the biases in auditory reproduction. Our findings thus provide behavioral evidence of how the brain combines motor and perceptual information to reduce duration estimation biases and improve estimation reliability.
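The reliability-based model referred to above amounts to inverse-variance weighting of the motor and perceptual components: the less variable cue gets more weight, pulling the combined estimate toward it. A minimal sketch, with hypothetical estimates and variances (not the paper's fitted values):

```python
import numpy as np

def combine(est_motor, var_motor, est_perc, var_perc):
    """Reliability-weighted (inverse-variance) combination of a motor and a
    perceptual duration estimate; returns the combined estimate and variance."""
    w_m = (1 / var_motor) / (1 / var_motor + 1 / var_perc)
    est = w_m * est_motor + (1 - w_m) * est_perc
    var = 1 / (1 / var_motor + 1 / var_perc)   # combined variance shrinks
    return est, var

# motor reproduction overestimates a 1.0 s tone; the reproduced auditory
# signal is less biased and more reliable, so it dominates the combination
est, var = combine(est_motor=1.25, var_motor=0.04, est_perc=1.05, var_perc=0.02)
print(round(est, 3), round(var, 4))   # 1.117 0.0133
```

The combined bias (0.117 s) is smaller than the pure motor bias (0.25 s) and the combined variance is below either component's, mirroring the reported reduction in overestimation.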
Differential expression analysis with global network adjustment
Background: Large-scale chromosomal deletions or other non-specific perturbations of the transcriptome can alter the expression of hundreds or thousands of genes, and it is of biological interest to understand which genes are most profoundly affected. We present a method for predicting a gene's expression as a function of other genes, thereby accounting for the effect of transcriptional regulation that confounds the identification of genes differentially expressed relative to a regulatory network. The challenge in constructing such models is that the number of possible regulator transcripts within a global network is on the order of thousands, while the number of biological samples is typically on the order of 10. Nevertheless, there are large gene expression databases that can be used to construct networks helpful for modeling transcriptional regulation in smaller experiments.

Results: We demonstrate a type of penalized regression model that can be estimated from large gene expression databases and then applied to smaller experiments. The ridge parameter is selected by minimizing the cross-validation error of the predictions in the independent out-sample. This tends to increase model stability and leads to a much greater degree of parameter shrinkage, but the resulting biased estimation is mitigated by a second round of regression. The proposed computationally efficient “over-shrinkage” method outperforms previously used LASSO-based techniques. In two independent datasets, we find that the median proportion of explained variability in expression is approximately 25%, which results in a substantial increase in the signal-to-noise ratio, allowing more powerful inferences on differential gene expression and leading to biologically intuitive findings. We also show that a large proportion of gene dependencies are conditional on the biological state, which would be impossible to detect with standard differential expression methods.

Conclusions: By adjusting for the effects of the global network on individual genes, both the sensitivity and the reliability of differential expression measures are greatly improved.
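The two-stage “over-shrinkage” idea can be sketched on simulated data: a heavily shrunken ridge fit is learned on a large database, and a second regression in the small experiment rescales the shrunken predictions. A fixed penalty below stands in for the cross-validated choice described in the abstract; all sizes and values are illustrative:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression coefficients."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(2)
n_db, n_exp, p = 2000, 40, 50               # large database vs small experiment
beta = rng.normal(0.0, 1.0, p)
X_db = rng.normal(size=(n_db, p))
y_db = X_db @ beta + rng.normal(0.0, 3.0, n_db)

# stage 1: heavily shrunken ridge fit estimated once from the large database
b = ridge_fit(X_db, y_db, lam=500.0)        # large lambda -> strong shrinkage

# stage 2: in the small experiment, regress observed values on the shrunken
# predictions; the fitted slope rescales away the shrinkage bias
X_exp = rng.normal(size=(n_exp, p))
y_exp = X_exp @ beta + rng.normal(0.0, 3.0, n_exp)
yhat = X_exp @ b
slope, intercept = np.polyfit(yhat, y_exp, 1)
print(slope)   # around 1 / (shrinkage factor), i.e. above 1
```

Only two parameters (slope and intercept) are estimated from the small experiment, which is why the approach remains stable with roughly 10 to 40 samples.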
Verbal Autopsy Methods with Multiple Causes of Death
Verbal autopsy procedures are widely used for estimating cause-specific
mortality in areas without medical death certification. Data on symptoms
reported by caregivers along with the cause of death are collected from a
medical facility, and the cause-of-death distribution is estimated in the
population where only symptom data are available. Current approaches analyze
only one cause at a time, involve assumptions judged difficult or impossible to
satisfy, and require expensive, time-consuming, or unreliable physician
reviews, expert algorithms, or parametric statistical models. By generalizing
current approaches to analyze multiple causes, we show how most of the
difficult assumptions underlying existing methods can be dropped. These
generalizations also make physician review, expert algorithms and parametric
statistical assumptions unnecessary. With theoretical results, and empirical
analyses in data from China and Tanzania, we illustrate the accuracy of this
approach. While no method of analyzing verbal autopsy data, including the more
computationally intensive approach offered here, can give accurate estimates in
all circumstances, the procedure offered is conceptually simpler, less
expensive, more general, as or more replicable, and easier to use in practice
than existing approaches. We also show how our focus on estimating aggregate
proportions, which are the quantities of primary interest in verbal autopsy
studies, may also greatly reduce the assumptions necessary for, and thus
improve the performance of, many individual classifiers in this and other
areas. As a companion to this paper, we also offer easy-to-use software that
implements the methods discussed herein.

Comment: Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org/), at http://dx.doi.org/10.1214/07-STS247.
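The core idea of the multiple-cause generalization is that the population distribution of symptom profiles is a mixture of cause-specific profiles (learned from hospital data) weighted by the unknown cause fractions, which can then be recovered directly without classifying individual deaths. A minimal sketch with simulated profiles, not the companion software:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(3)
n_causes, n_profiles = 4, 40

# hospital data give P(symptom profile | cause of death), one column per cause
P_s_given_d = rng.dirichlet(np.ones(n_profiles), size=n_causes).T  # (profiles, causes)

# population symptom-profile distribution is the mixture under true fractions
pi_true = np.array([0.5, 0.3, 0.15, 0.05])
p_s = P_s_given_d @ pi_true

# recover cause fractions by non-negative least squares, then renormalise
pi_hat, _ = nnls(P_s_given_d, p_s)
pi_hat /= pi_hat.sum()
print(np.round(pi_hat, 3))   # recovers [0.5, 0.3, 0.15, 0.05]
```

Because only the aggregate proportions are estimated, the difficult step of assigning a cause to each individual death, and with it a physician review or expert algorithm, drops out of the procedure.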
Software defect prediction: do different classifiers find the same defects?
During the last 10 years, hundreds of different defect prediction models have been published. The performance of the classifiers used in these models is reported to be similar, with models rarely performing above the predictive performance ceiling of about 80% recall. We investigate the individual defects that four classifiers predict and analyse the level of prediction uncertainty produced by these classifiers. We perform a sensitivity analysis to compare the performance of Random Forest, Naïve Bayes, RPart and SVM classifiers when predicting defects in NASA, open source and commercial datasets. The defect predictions that each classifier makes are captured in a confusion matrix, and the prediction uncertainty of each classifier is compared. Despite similar predictive performance values for these four classifiers, each detects different sets of defects. Some classifiers are more consistent in predicting defects than others. Our results confirm that a unique subset of defects can be detected by specific classifiers. However, while some classifiers are consistent in the predictions they make, other classifiers vary in their predictions. Given our results, we conclude that classifier ensembles with decision-making strategies not based on majority voting are likely to perform best in defect prediction.
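The set-level comparison behind the paper's headline finding can be sketched as follows: instead of summarizing each classifier by a single recall figure, compare which individual defects each one correctly flags. The predictions below are hypothetical, standing in for trained classifier outputs:

```python
import numpy as np

def defect_sets(y_true, preds):
    """For each classifier, return the set of defect indices it correctly
    flags (true positives), so per-defect overlap can be inspected."""
    defects = set(np.flatnonzero(y_true == 1))
    return {name: defects & set(np.flatnonzero(p == 1))
            for name, p in preds.items()}

y_true = np.array([1, 1, 1, 0, 0, 1, 0, 1])   # 1 = module is defective
preds = {                                      # hypothetical classifier outputs
    "rf":  np.array([1, 1, 0, 0, 0, 1, 0, 1]),
    "nb":  np.array([1, 0, 1, 1, 0, 1, 0, 0]),
    "svm": np.array([0, 1, 1, 0, 0, 1, 1, 1]),
}

tp = defect_sets(y_true, preds)
union = set().union(*tp.values())              # defects found by any classifier
common = set.intersection(*tp.values())        # defects found by every classifier
print(sorted(union), sorted(common))           # [0, 1, 2, 5, 7] [5]
```

Here each classifier has similar recall, yet only one defect is found by all three, which is why an ensemble that does not rely on majority voting can recover defects a vote would discard.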
A Bayesian framework for verification and recalibration of ensemble forecasts: How uncertain is NAO predictability?
Predictability estimates of ensemble prediction systems are uncertain due to
limited numbers of past forecasts and observations. To account for such
uncertainty, this paper proposes a Bayesian inferential framework that provides
a simple 6-parameter representation of ensemble forecasting systems and the
corresponding observations. The framework is probabilistic, and thus allows for
quantifying uncertainty in predictability measures such as correlation skill
and signal-to-noise ratios. It also provides a natural way to produce
recalibrated probabilistic predictions from uncalibrated ensemble forecasts.
The framework is used to address important questions concerning the skill of
winter hindcasts of the North Atlantic Oscillation for 1992-2011 issued by the
Met Office GloSea5 climate prediction system. Although there is much
uncertainty in the correlation between ensemble mean and observations, there is
strong evidence of skill: the 95% credible interval of the correlation
coefficient of [0.19,0.68] does not overlap zero. There is also strong evidence
that the forecasts are not exchangeable with the observations: With over 99%
certainty, the signal-to-noise ratio of the forecasts is smaller than the
signal-to-noise ratio of the observations, which suggests that raw forecasts
should not be taken as representative scenarios of the observations. Forecast
recalibration is thus required, which can be coherently addressed within the
proposed framework.

Comment: 36 pages, 10 figures.
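The headline point, that a 20-year hindcast record leaves the correlation skill very uncertain, can be illustrated by simulating a toy signal-plus-noise model (a stand-in for the paper's 6-parameter framework, with invented values):

```python
import numpy as np

rng = np.random.default_rng(4)
n_years, true_r, n_sims = 20, 0.5, 4000

# each simulation draws one possible 20-year hindcast record in which the
# observations correlate with the ensemble-mean signal at true_r
rs = np.empty(n_sims)
for i in range(n_sims):
    s = rng.normal(size=n_years)                                  # ensemble mean
    x = true_r * s + np.sqrt(1 - true_r**2) * rng.normal(size=n_years)
    rs[i] = np.corrcoef(s, x)[0, 1]

lo, hi = np.quantile(rs, [0.025, 0.975])
print(round(lo, 2), round(hi, 2))   # a wide interval from only 20 years
```

Even with a true correlation of 0.5, the sampling interval spans most of the positive range, which is why a fully probabilistic treatment of skill measures, rather than a single point estimate, is needed.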
Improving statistical inference on pathogen densities estimated by quantitative molecular methods: malaria gametocytaemia as a case study
BACKGROUND: Quantitative molecular methods (QMMs) such as quantitative real-time polymerase chain reaction (q-PCR), reverse-transcriptase PCR (qRT-PCR) and quantitative nucleic acid sequence-based amplification (QT-NASBA) are increasingly used to estimate pathogen density in a variety of clinical and epidemiological contexts. These methods are often classified as semi-quantitative, yet estimates of reliability or sensitivity are seldom reported. Here, a statistical framework is developed for assessing the reliability (uncertainty) of pathogen densities estimated using QMMs and the associated diagnostic sensitivity. The method is illustrated with quantification of Plasmodium falciparum gametocytaemia by QT-NASBA.

RESULTS: The reliability of pathogen (e.g. gametocyte) densities, and the accompanying diagnostic sensitivity, estimated by two contrasting statistical calibration techniques are compared: a traditional method and a Bayesian mixed-model approach. The latter accounts for statistical dependence of QMM assays run under identical laboratory protocols and permits structural modelling of experimental measurements, allowing precision to vary with pathogen density. Traditional calibration cannot account for inter-assay variability arising from imperfect QMMs and generates estimates of pathogen density that have poor reliability, are variable among assays and inaccurately reflect diagnostic sensitivity. The Bayesian mixed-model approach assimilates information from replica QMM assays, improving reliability and inter-assay homogeneity and providing an accurate appraisal of quantitative and diagnostic performance.

CONCLUSIONS: Bayesian mixed-model statistical calibration supersedes traditional techniques in the context of QMM-derived estimates of pathogen density, offering the potential to improve substantially the depth and quality of clinical and epidemiological inference for a wide variety of pathogens.
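The traditional calibration the abstract critiques is a standard-curve inversion: fit assay signal against log density on known standards, then invert the line for unknown samples. A minimal sketch with hypothetical standards (the Bayesian mixed-model alternative involves assay-level random effects and is beyond a few lines):

```python
import numpy as np

# hypothetical standard curve: assay signal (e.g. a q-PCR Ct value) declines
# roughly linearly with log10 pathogen density
log10_density = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
signal = np.array([36.1, 32.8, 29.7, 26.4, 23.2, 19.9])

slope, intercept = np.polyfit(log10_density, signal, 1)

def estimate_density(obs_signal):
    """Invert the fitted standard curve to estimate pathogen density."""
    return 10 ** ((obs_signal - intercept) / slope)

print(round(estimate_density(28.0), 1))   # density per unit volume
```

This single-curve inversion is exactly what breaks down when replicate assays disagree: each assay would produce its own curve and its own density estimate, which is the inter-assay variability the mixed-model calibration is designed to absorb.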
A practical degradation based method to predict long-term moisture incursion and colour change in high power LEDs
We study the effect of relative humidity on LEDs and how moisture incursion is associated with colour shift. This paper proposes a different approach to describing the lumen degradation of LEDs due to the long-term effects of humidity. Using lumen degradation data from different types of LEDs under varying relative-humidity conditions, a humidity-based degradation model (HBDM) is developed. A practical estimation method based on the degradation behaviour is proposed to quantitatively gauge the effect of moisture incursion by means of a humidity index. This index demonstrates a high correlation with the colour shift indicated by the LED's yellow-to-blue output intensity ratio. Physical analyses of the LEDs provide a qualitative validation of the model, which shows good accuracy over longer periods of moisture exposure. The results demonstrate that the HBDM is an effective indicator for predicting the extent of the long-term impact of humidity and the associated relative colour shift.
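A degradation model of this general kind can be sketched as a humidity-dependent exponential decay of lumen maintenance, with the decay rate per humidity condition recovered from the data. The rates and times below are hypothetical and this toy form is not the paper's HBDM:

```python
import numpy as np

# hypothetical lumen-maintenance data at two relative-humidity levels;
# a simple exponential decay L(t) = exp(-alpha * t) stands in for the HBDM
t = np.array([0.0, 500.0, 1000.0, 2000.0, 4000.0])   # exposure time, hours
lumen_rh60 = np.exp(-2e-5 * t)                        # milder condition
lumen_rh85 = np.exp(-6e-5 * t)                        # harsher condition

def decay_rate(t, lumen):
    """Estimate alpha by linear regression on log lumen maintenance."""
    return -np.polyfit(t, np.log(lumen), 1)[0]

a60, a85 = decay_rate(t, lumen_rh60), decay_rate(t, lumen_rh85)
print(a60 < a85)   # higher humidity -> faster degradation: True
```

A humidity index in the spirit of the paper could then be built from how alpha scales across conditions, and correlated against the measured yellow-to-blue intensity ratio.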