
    Function estimation with locally adaptive dynamic models

    We present a nonparametric Bayesian method for fitting unsmooth and highly oscillating functions, based on a locally adaptive hierarchical extension of standard dynamic or state space models. The main idea is to introduce locally varying variances in the state equations and to add a further smoothness prior for this variance function. Estimation is fully Bayesian and carried out by recent MCMC techniques. The whole approach can be understood as an alternative to other nonparametric function estimators, such as local or penalized regression with variable bandwidth or smoothing parameter selection. Performance is illustrated with simulated data, including unsmooth examples constructed for wavelet shrinkage, and by an application to sales data. Although the approach is developed for classical Gaussian nonparametric regression, it can be extended to more complex regression problems.
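One way to write down the hierarchy described in this abstract is sketched below. It assumes a first-order random walk for both the function values and the log-variances; the symbols $f_t$, $h_t$, $\sigma^2$ and $\tau^2$ are notation chosen here for illustration and need not match the authors' specification.

```latex
\begin{align}
y_t &= f_t + \varepsilon_t, & \varepsilon_t &\sim N(0, \sigma^2), \\
f_t &= f_{t-1} + u_t,       & u_t &\sim N\bigl(0, \exp(h_t)\bigr), \\
h_t &= h_{t-1} + v_t,       & v_t &\sim N(0, \tau^2).
\end{align}
```

Under these assumptions, a large $h_t$ lets the function change abruptly near $t$, while the random walk on $h_t$ plays the role of the smoothness prior on the variance function.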

    Selection of the number of frequencies using bootstrap techniques in log-periodogram regression

    The choice of the bandwidth in local log-periodogram regression is of crucial importance for estimating the memory parameter of a long memory time series. Different choices may give rise to completely different estimates, which may lead to contradictory conclusions, for example about the stationarity of the series. We propose a data-driven bandwidth selection strategy based on minimizing a bootstrap approximation of the mean squared error, and compare its performance with existing techniques for optimal bandwidth selection in a mean squared error sense, revealing its better performance in a wider class of models. The empirical applicability of the proposed strategy is shown with two examples: the Nile river annual minimum levels, widely analyzed in the long memory context, and the input gas rate series of Box and Jenkins.
    Keywords: bootstrap, long memory, log-periodogram regression, bandwidth selection
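To illustrate why the bandwidth matters, the following is a minimal sketch of the standard log-periodogram regression estimator of the memory parameter d, using only the Python standard library. It is not the bootstrap selection strategy of the paper; the function names, the simulated white-noise series and the candidate values of `m` are assumptions made for illustration.

```python
import cmath
import math
import random


def periodogram(x, j):
    """Periodogram ordinate of x at the j-th Fourier frequency 2*pi*j/n."""
    n = len(x)
    lam = 2.0 * math.pi * j / n
    dft = sum(x[t] * cmath.exp(-1j * lam * t) for t in range(n))
    return abs(dft) ** 2 / (2.0 * math.pi * n)


def log_periodogram_estimate(x, m):
    """Estimate d by regressing log I(lam_j) on -log(4 sin^2(lam_j / 2)), j = 1..m.

    m is the bandwidth: the number of low frequencies used in the regression.
    """
    n = len(x)
    if not 2 <= m <= (n - 1) // 2:
        raise ValueError("m must satisfy 2 <= m <= (n - 1) // 2")
    regressor, response = [], []
    for j in range(1, m + 1):
        lam = 2.0 * math.pi * j / n
        regressor.append(-math.log(4.0 * math.sin(lam / 2.0) ** 2))
        response.append(math.log(periodogram(x, j)))
    mean_r = sum(regressor) / m
    mean_y = sum(response) / m
    sxy = sum((r - mean_r) * (y - mean_y) for r, y in zip(regressor, response))
    sxx = sum((r - mean_r) ** 2 for r in regressor)
    return sxy / sxx  # OLS slope, which is the estimate of d


# Hypothetical example: Gaussian white noise, whose true memory parameter is 0.
random.seed(0)
series = [random.gauss(0.0, 1.0) for _ in range(512)]
for m in (16, 32, 64):
    print(m, log_periodogram_estimate(series, m))
```

The same series yields a different estimate for each `m`, which is the sensitivity a data-driven bandwidth rule is meant to address.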

    Drift-Aware Ensemble Regression

    Regression models are often required for controlling production processes by predicting parameter values. However, the implicit assumption of standard regression techniques that the data set used for parameter estimation comes from a stationary joint distribution may not hold in this context because manufacturing processes are subject to physical changes like wear and aging, denoted as process drift. This can cause the estimated model to deviate significantly from the current state of the modeled system. In this paper, we discuss the problem of estimating regression models from drifting processes and we present ensemble regression, an approach that maintains a set of regression models, estimated from different ranges of the data set, according to their predictive performance. We extensively evaluate our approach on synthetic and real-world data.
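The general idea of combining models fitted to different data ranges can be sketched as follows. This is a deliberately simplified illustration, not the algorithm of the paper: the trailing-window scheme, the hold-out of the most recent points, the inverse-error weighting and all names are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    slope, intercept = model
    return intercept + slope * x


def ensemble_predict(xs, ys, x_new, windows=(15, 55), holdout=5, eps=1e-9):
    """Weight models fitted on trailing windows by their recent hold-out error."""
    train_x, train_y = xs[:-holdout], ys[:-holdout]
    test_x, test_y = xs[-holdout:], ys[-holdout:]
    weighted_sum = 0.0
    total_weight = 0.0
    for w in windows:
        # Each ensemble member sees only the last w training points.
        model = fit_line(train_x[-w:], train_y[-w:])
        mse = sum((predict(model, x) - y) ** 2
                  for x, y in zip(test_x, test_y)) / holdout
        weight = 1.0 / (mse + eps)  # assumed weighting: inverse hold-out error
        weighted_sum += weight * predict(model, x_new)
        total_weight += weight
    return weighted_sum / total_weight


# Hypothetical drifting process: the slope changes from 1 to 3 at x = 40.
xs = [float(x) for x in range(60)]
ys = [x if x < 40 else 3.0 * x - 80.0 for x in xs]
full_data_prediction = predict(fit_line(xs[:55], ys[:55]), 60.0)
ensemble_prediction = ensemble_predict(xs, ys, 60.0)
print(full_data_prediction, ensemble_prediction)
```

In this toy setting the model fitted to all data averages over both regimes, while the weighting shifts almost all mass to the member fitted after the drift.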

    Partial mixture model for tight clustering of gene expression time-course

    Background: Tight clustering arose recently from a desire to obtain tighter and potentially more informative clusters in gene expression studies. Scattered genes with relatively loose correlations should be excluded from the clusters. However, in the literature there is little work dedicated to this area of research. On the other hand, there has been extensive use of maximum likelihood techniques for model parameter estimation. By contrast, the minimum distance estimator has been largely ignored. Results: In this paper we show the inherent robustness of the minimum distance estimator that makes it a powerful tool for parameter estimation in model-based time-course clustering. To apply minimum distance estimation, a partial mixture model that can naturally incorporate replicate information and allow scattered genes is formulated. We provide experimental results of simulated data fitting, where the minimum distance estimator demonstrates superior performance to the maximum likelihood estimator. Both biological and statistical validations are conducted on a simulated dataset and two real gene expression datasets. Our proposed partial regression clustering algorithm scores top in Gene Ontology driven evaluation, in comparison with four other popular clustering algorithms. Conclusion: For the first time, the partial mixture model is successfully extended to time-course data analysis. The robustness of our partial regression clustering algorithm proves the suitability of the combination of the partial mixture model and the minimum distance estimator in this field. We show that tight clustering not only is capable of generating a more profound understanding of the dataset under study, well in accordance with established biological knowledge, but also presents interesting new hypotheses during interpretation of clustering results. In particular, we provide biological evidence that scattered genes can be relevant and are interesting subjects for study, in contrast to prevailing opinion.

    An Intraseason Forecasting System for Commercial Marine Fisheries

    The reliability of an intraseason yield estimation technique which is commonly used by Pacific salmon harvest managers is evaluated for applicability to a variety of commercial finfish and crustacean fisheries. The estimation technique is known as the average timing or the average performance model. The method is not easily related to standard statistical models, but does show some similarity to both a single parameter linear regression model and the ratio estimator of sampling theory. A comparison of these models, a two parameter linear model, and a regression estimator is made to determine if the precision of forecasts of performance can be improved. Forecasts by all methods are calculated on each successive time interval of the season. For a yield estimate by the average timing estimator, the cumulative catch of the current year is divided by the corresponding expected cumulative proportion of total yield. The time series of expected proportions is calculated from historical data. The linear model regresses annual yield on cumulative catch. Forecasts of period catches, by similar methods, have also been presented. Use of the estimation techniques has been extended to other measures of fishery performance, including catch per unit of effort (CPUE) data and abundance data. Stratification of historical data, performed on the basis of statistical criteria, is used to select annual data series that have patterns similar to the current year. Such stratification is done in conjunction with the ratio estimator. Six different estimators of annual performance were applied to fifty-six years of data from six different commercial fisheries. Two methods of forecasting performance for each time interval within a season were also used. The estimators were evaluated on the basis of the mean absolute percentage deviation (MAPD), where percentage deviation is the forecasting error expressed as a percentage of the forecast.
    A simple linear regression model of annual performance versus cumulative performance for each time interval of the season proved to be more accurate than all other methods. In general, estimates improve as the season progresses, but for all methods except the linear regression model they are unreliable prior to the midpoint of the season. The overall precision of the linear regression forecasts is correlated with the variability of annual performance. Fisheries which exhibit conservative seasonal patterns of performance are well suited for this type of forecasting regime.
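The arithmetic of the average timing estimator and of the MAPD criterion, as described in this abstract, can be sketched as follows. The catch figures, function names and the use of a plain average across years for the expected proportions are assumptions for illustration, not data or code from the study.

```python
def expected_cumulative_proportions(historical_seasons):
    """Average, over past years, of the cumulative share of the annual total
    reached by the end of each time interval."""
    n_intervals = len(historical_seasons[0])
    sums = [0.0] * n_intervals
    for season in historical_seasons:
        total = sum(season)
        running = 0.0
        for i, catch in enumerate(season):
            running += catch
            sums[i] += running / total
    return [s / len(historical_seasons) for s in sums]


def average_timing_forecast(cumulative_catch, expected_proportion):
    """Annual yield forecast: cumulative catch so far divided by the expected
    cumulative proportion of total yield at this point in the season."""
    if not 0.0 < expected_proportion <= 1.0:
        raise ValueError("expected_proportion must be in (0, 1]")
    return cumulative_catch / expected_proportion


def mapd(actuals, forecasts):
    """Mean absolute percentage deviation, with each error expressed as a
    percentage of the forecast (as in the abstract), not of the actual."""
    deviations = [abs(a - f) / f * 100.0 for a, f in zip(actuals, forecasts)]
    return sum(deviations) / len(deviations)


# Hypothetical catches per interval for two past seasons.
history = [[10.0, 30.0, 40.0, 20.0], [20.0, 20.0, 40.0, 20.0]]
proportions = expected_cumulative_proportions(history)

# Current season: 50 units caught by the end of the second interval.
forecast = average_timing_forecast(50.0, proportions[1])
print(proportions, forecast)
```

Early in the season the expected proportion is small, so small errors in it translate into large errors in the forecast, which is consistent with the abstract's remark that most estimators are unreliable before the midpoint.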

    Comparison of ridge and other shrinkage estimation techniques

    Includes bibliographical references. Shrinkage estimation is an increasingly popular class of biased parameter estimation techniques, vital when the columns of the matrix of independent variables X exhibit dependencies or near dependencies. These dependencies often lead to serious problems in least squares estimation: inflated variances and mean squared errors of estimates, unstable coefficients, imprecision and improper estimation. Shrinkage methods allow for a little bias and at the same time introduce smaller mean squared errors and variances for the biased estimators, compared to those of unbiased estimators. However, shrinkage methods are based on the shrinkage factor, whose estimation depends on unknown values that are often computed from the OLS solution. We argue that the instability of OLS estimates may have an adverse effect on the performance of shrinkage estimators. Hence a new method for estimating the shrinkage factors is proposed and applied to ridge and generalized ridge regression. We propose that the new shrinkage factors should be based on the principal components instead of the unstable OLS estimates.
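For orientation, the textbook form of the ridge shrinkage factors along the principal components can be sketched as follows. This shows standard ridge and generalized ridge shrinkage only, not the new estimator proposed in the thesis; the function name and the example eigenvalues are assumptions.

```python
def ridge_shrinkage_factors(eigenvalues, k):
    """Shrinkage factor lambda_j / (lambda_j + k_j) for each principal component.

    eigenvalues: eigenvalues lambda_j of X'X (or of the correlation matrix).
    k: a single ridge constant (ordinary ridge) or one constant per
       component (generalized ridge).
    """
    ks = [k] * len(eigenvalues) if isinstance(k, (int, float)) else list(k)
    if len(ks) != len(eigenvalues):
        raise ValueError("need one ridge constant per eigenvalue")
    if any(value < 0 for value in ks):
        raise ValueError("ridge constants must be non-negative")
    return [lam / (lam + kj) for lam, kj in zip(eigenvalues, ks)]


# Hypothetical example: two standardized predictors with correlation 0.9.
# Their correlation matrix has eigenvalues 1 + 0.9 and 1 - 0.9.
eigenvalues = [1.9, 0.1]
factors = ridge_shrinkage_factors(eigenvalues, 0.1)
print(factors)
```

In principal-component coordinates each ridge coefficient equals its OLS counterpart multiplied by the corresponding factor, so the direction with the small eigenvalue, where OLS is least stable, is shrunk the most.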