
    On Non-Parametric Confidence Intervals for Density and Hazard Rate Functions & Trends in Daily Snow Depths in the United States and Canada

    Get PDF
    Nonparametric confidence intervals for an unknown function are a useful tool in statistical inference, and a wide body of literature exists on the topic. The primary issues are selecting the smoothing parameter by an appropriate criterion and then assessing the coverage probability and length of the associated confidence interval. Here our focus is on the interval length in general and, in particular, on the variability in the lengths of nonparametric intervals for probability density and hazard rate functions. We start with the analysis of a nonparametric confidence interval for a probability density function, noting that the length of the interval is directly proportional to the square root of the density. That is, the variability of the interval length is driven by the variance of the estimator used to estimate the square root of the density function. We therefore propose and use a kernel-based constant-variance estimator of the square root of a density function. The performance of the resulting confidence intervals is studied through simulations. The methodology is then extended to nonparametric confidence intervals for the hazard rate function. Changing direction somewhat, the second part of this thesis presents a statistical study of daily snow depth trends in the United States and Canada from 1960 to 2009. A storage-model balance equation with periodic features is used to describe the daily snow depth process. Changepoints (inhomogeneity features) are permitted in the model in the form of mean level shifts. The results show that snow depths are mostly declining in the United States. In contrast, snow depths seem to be increasing in Canada, especially in north-western areas of the country. On the whole, more grids are estimated to have an increasing snow trend than a decreasing trend. The changepoint component in the model serves to lessen the overall magnitude of the trends in most locations.
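    The proportionality between interval length and the square root of the density can be made concrete with the standard pointwise asymptotic interval for a kernel density estimate, where Var(f̂(x)) ≈ f(x)R(K)/(nh) with R(K) = ∫K². A minimal sketch in Python, assuming a Gaussian kernel and ignoring smoothing bias; it illustrates the motivation above, not the thesis's constant-variance estimator:

    ```python
    # Pointwise asymptotic confidence interval for a kernel density estimate.
    # Var(f_hat(x)) ~ f(x) * R(K) / (n h), so the interval length at x is
    # proportional to sqrt(f(x)) -- the source of the length variability.
    import numpy as np
    from scipy.stats import norm

    def kde_ci(data, grid, h, alpha=0.05):
        n = len(data)
        u = (grid[:, None] - data[None, :]) / h
        f_hat = norm.pdf(u).mean(axis=1) / h      # Gaussian-kernel KDE
        rk = 1.0 / (2.0 * np.sqrt(np.pi))          # R(K) for the Gaussian kernel
        se = np.sqrt(f_hat * rk / (n * h))         # length scales with sqrt(f_hat)
        z = norm.ppf(1 - alpha / 2)
        return f_hat, f_hat - z * se, f_hat + z * se

    rng = np.random.default_rng(0)
    x = rng.normal(size=500)
    grid = np.linspace(-3, 3, 61)
    f_hat, lo, hi = kde_ci(x, grid, h=0.35)
    ```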

    Change-point Problem and Regression: An Annotated Bibliography

    Get PDF
    The problems of identifying changes at unknown times and of estimating the location of changes in stochastic processes are referred to as the change-point problem or, in the Eastern literature, as "disorder". The change-point problem, first introduced in the quality control context, has since developed into a fundamental problem in the areas of statistical control theory, stationarity of a stochastic process, estimation of the current position of a time series, testing and estimation of change in the patterns of a regression model, and, most recently, the comparison and matching of DNA sequences in microarray data analysis. Numerous methodological approaches have been implemented in examining change-point models. Maximum-likelihood estimation, Bayesian estimation, isotonic regression, piecewise regression, quasi-likelihood, and nonparametric regression are among the methods that have been applied to change-point problems. Grid-search approaches have also been used to examine the change-point problem. Statistical analysis of change-point problems depends on the method of data collection. If data collection is ongoing until some random time, the appropriate statistical procedure is called sequential. If, however, a large finite set of data is collected with the purpose of determining whether at least one change point occurred, the procedure may be referred to as non-sequential. Not surprisingly, both settings have a rich literature, with much of the earlier work focusing on sequential methods inspired by applications in quality control for industrial processes. In the regression literature, the change-point model is also referred to as two- or multiple-phase regression, switching regression, segmented regression, two-stage least squares (Shaban, 1980), or broken-line regression. The change-point problem has been the subject of intensive research over the past half-century. The subject has evolved considerably and found applications in many different areas. Since it seems impossible to summarize all of the research carried out over the past 50 years, we have confined ourselves to those articles on change-point problems which pertain to regression. The important branch of sequential procedures in change-point problems has been left out entirely; we refer readers to the seminal review papers by Lai (1995, 2001). The so-called structural change models, which occupy a considerable portion of the research in this area, particularly among econometricians, have not been fully considered; we refer the reader to Perron (2005) for an updated review. Articles on change-point problems in time series are considered only if the methodologies presented pertain to regression analysis.
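    As a concrete instance of the grid-search approach mentioned above, here is a minimal sketch of least-squares estimation of a single change point in two-phase (broken-line) regression; the function name, segment-length floor, and simulated data are illustrative, not drawn from any of the surveyed papers:

    ```python
    # Grid search for one change point in two-phase linear regression: fit
    # separate least-squares lines before and after each candidate split and
    # keep the split with the smallest total residual sum of squares.
    import numpy as np

    def two_phase_fit(x, y, min_seg=5):
        n = len(x)
        best_tau, best_rss = None, np.inf
        for tau in range(min_seg, n - min_seg):
            rss = 0.0
            for idx in (slice(0, tau), slice(tau, n)):
                coef = np.polyfit(x[idx], y[idx], 1)
                resid = y[idx] - np.polyval(coef, x[idx])
                rss += resid @ resid
            if rss < best_rss:
                best_tau, best_rss = tau, rss
        return best_tau, best_rss

    # Example: the slope changes at x = 0.6
    rng = np.random.default_rng(1)
    x = np.linspace(0, 1, 120)
    y = np.where(x < 0.6, x, 0.6 + 3.0 * (x - 0.6)) + 0.05 * rng.normal(size=120)
    tau_hat, _ = two_phase_fit(x, y)
    print("estimated change point near x =", x[tau_hat])
    ```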

    Sequential Cross-Validated Bandwidth Selection Under Dependence and Anscombe-Type Extensions to Random Time Horizons

    Full text link
    To detect changes in the mean of a time series, one may use previsible detection procedures based on nonparametric kernel prediction smoothers, which cover various classic detection statistics as special cases. Bandwidth selection, particularly in a data-adaptive way, is a serious issue and not well studied for detection problems. To ensure data adaptation, we select the bandwidth by cross-validation, but in a sequential way, leading to a functional estimation approach. This article provides the asymptotic theory for the method under fairly weak assumptions on the dependence structure of the error terms, which cover, e.g., GARCH(p,q) processes, by establishing (sequential) functional central limit theorems for the cross-validation objective function and the associated bandwidth selector. It turns out that the proof can be based in a neat way on the results of Kurtz and Protter (1996) on the weak convergence of Itô integrals and a diagonal argument. Our gradual change-point model covers multiple change points in that it allows for a nonlinear regression function after the first change point, possibly with further jumps, that is Lipschitz continuous between those discontinuities. In applications, the latest time horizon at which monitoring stops is often determined by a random experiment, e.g. a first-exit stopping time applied to a cumulated cost process or a risk measure, possibly stochastically dependent on the monitored time series. Thus, we also study that case and establish related limit theorems in the spirit of Anscombe's (1952) result. The result has various applications, including statistical parameter estimation and the monitoring of financial investment strategies with risk-controlled early termination, which are briefly discussed.
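    To make the setting concrete, the following sketch implements a previsible Nadaraya-Watson-type prediction smoother and a sequential cross-validation objective over a bandwidth grid. The Gaussian kernel, the burn-in length, and the simulated gradual change are assumptions for illustration, not the article's exact construction:

    ```python
    # Previsible kernel prediction smoother: at time t, predict y_t from
    # y_0..y_{t-1} only, then score candidate bandwidths by the cumulated
    # squared one-step prediction errors (a sequential CV objective).
    import numpy as np

    def one_sided_nw(y, t, h):
        """Nadaraya-Watson prediction of y[t] from y[:t], Gaussian kernel."""
        s = np.arange(t)
        w = np.exp(-0.5 * ((t - s) / h) ** 2)
        return np.sum(w * y[:t]) / np.sum(w)

    def sequential_cv(y, bandwidths, burn_in=10):
        """Cumulated CV objective for each candidate bandwidth."""
        n = len(y)
        return np.array([
            sum((y[t] - one_sided_nw(y, t, h)) ** 2 for t in range(burn_in, n))
            for h in bandwidths
        ])

    # Simulated gradual mean change starting at t = 150
    rng = np.random.default_rng(2)
    n = 300
    mu = np.maximum(0.0, (np.arange(n) - 150) / 150.0)
    y = mu + 0.3 * rng.normal(size=n)
    hs = np.array([2.0, 5.0, 10.0, 20.0])
    h_star = hs[np.argmin(sequential_cv(y, hs))]
    ```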

    Autocovariance estimation in regression with a discontinuous signal and m-dependent errors: A difference-based approach

    Full text link
    We discuss a class of difference-based estimators for the autocovariance in nonparametric regression when the signal is discontinuous (change-point regression), possibly highly fluctuating, and the errors form a stationary m-dependent process. These estimators circumvent the explicit pre-estimation of the unknown regression function, a task which is particularly challenging for such signals. We provide explicit expressions for their mean squared errors when the signal function is piecewise constant (segment regression) and the errors are Gaussian. Based on this, we derive bias-optimized estimates which do not depend on the particular (unknown) autocovariance structure. Notably, for positively correlated errors, the part of the variance of our estimators which depends on the signal is minimal as well. Further, we provide sufficient conditions for √n-consistency; this result is extended to piecewise Hölder regression with non-Gaussian errors. We combine our bias-optimized autocovariance estimates with a projection-based approach and derive covariance matrix estimates, a method which is of independent interest. Several simulation studies as well as an application to biophysical measurements complement this paper.
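    To illustrate the flavor of a difference-based estimator (a simplified sketch, not the paper's bias-optimized construction): for a piecewise-constant signal with few jumps, half the mean squared lag-k difference estimates γ(0) − γ(k) up to a small jump-induced bias, and any lag beyond m recovers γ(0) since γ(k) = 0 for k > m:

    ```python
    # Difference-based autocovariance estimation for y_i = f(i) + e_i with
    # piecewise-constant f and m-dependent errors e. For lag k,
    #   E[(y_{i+k} - y_i)^2] / 2 = gamma(0) - gamma(k)
    # except at the few pairs straddling a jump, and gamma(k) = 0 for k > m.
    import numpy as np

    def diff_autocov(y, m):
        half_msd = np.array([np.mean((y[k:] - y[:-k]) ** 2) / 2.0
                             for k in range(1, m + 2)])
        gamma0 = half_msd[-1]            # lag m+1: gamma(m+1) = 0
        gamma = gamma0 - half_msd[:-1]   # gamma(k) = gamma(0) - half MSD(k)
        return np.concatenate([[gamma0], gamma])

    # MA(1) errors (m = 1) plus a two-jump piecewise-constant signal
    rng = np.random.default_rng(3)
    z = rng.normal(size=2001)
    e = z[1:] + 0.5 * z[:-1]                     # gamma(0)=1.25, gamma(1)=0.5
    f = np.repeat([0.0, 4.0, 1.0], [700, 600, 700])
    print(diff_autocov(f + e, m=1))              # approx [1.25, 0.5]
    ```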

    Sequential Data-Adaptive Bandwidth Selection by Cross-Validation for Nonparametric Prediction

    Full text link
    We consider the problem of bandwidth selection by cross-validation from a sequential point of view in a nonparametric regression model. Having in mind that in applications one often aims at estimation, prediction, and change detection simultaneously, we investigate this approach for sequential kernel smoothers in order to base these tasks on a single statistic. We provide uniform weak laws of large numbers and weak consistency results for the cross-validated bandwidth. Extensions to weakly dependent error terms are discussed as well. The errors may be α-mixing or L2-near-epoch dependent, which guarantees that the uniform convergence of the cross-validation sum and the consistency of the cross-validated bandwidth hold for a large class of time series. The method is illustrated by analyzing photovoltaic data.
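    The cross-validation sum at the core of such an analysis can be written down directly. A minimal sketch of the classical leave-one-out objective for a Nadaraya-Watson smoother with a Gaussian kernel; the bandwidth grid and data are illustrative:

    ```python
    # Leave-one-out cross-validation sum for Nadaraya-Watson regression:
    # CV(h) = sum_i (y_i - m_hat_{-i}(x_i))^2, minimized over a bandwidth grid.
    import numpy as np

    def loo_cv(x, y, h):
        K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
        np.fill_diagonal(K, 0.0)          # leave-one-out: drop own observation
        m_loo = K @ y / K.sum(axis=1)
        return np.sum((y - m_loo) ** 2)

    rng = np.random.default_rng(4)
    x = np.sort(rng.uniform(0, 1, 200))
    y = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=200)
    hs = np.geomspace(0.01, 0.5, 25)
    h_star = hs[np.argmin([loo_cv(x, y, h) for h in hs])]
    ```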

    seq2R: an R package to detect change points in DNA sequences

    Get PDF
    Identifying the mutational processes that shape the nucleotide composition of the mitochondrial genome (mtDNA) is fundamental to better understanding how these genomes evolve. Several methods have been proposed to analyze DNA sequence nucleotide composition and skewness, but most of them lack any measurement of statistical support or were not developed taking into account the specificities of mitochondrial genomes. A new methodology is presented, developed specifically for mtDNA, to detect compositional changes or asymmetries (AT and CG skews) based on nonparametric regression models and their derivatives. The proposed method also includes the construction of confidence intervals, which are built using bootstrap techniques. This paper introduces an R package, known as seq2R, that implements the proposed methodology. Moreover, an illustration of the use of seq2R is provided using real data, specifically two publicly available complete mtDNAs: the human (Homo sapiens) sequence and a nematode (Radopholus similis) mitogenome sequence.

    Ministerio de Ciencia e Innovación | Ref. MTM2011-23204
    Ministerio de Ciencia e Innovación | Ref. PID2020-118101GB-I00
    Xunta de Galicia | Ref. 10PXIB 300 068 P
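    The underlying idea can be sketched outside R as well. The following mimics the flavor of the method — smoothing a skew measure along the sequence and locating sign changes of its estimated derivative — and is not the seq2R API; the window spacing, bandwidth, and simulated sequence are assumptions:

    ```python
    # Cumulative AT skew along a DNA sequence, smoothed by kernel regression;
    # positions where the estimated derivative changes sign are candidate
    # compositional change points. Methodological sketch only, not seq2R.
    import numpy as np

    def cumulative_at_skew(seq):
        a = np.cumsum([c == "A" for c in seq])
        t = np.cumsum([c == "T" for c in seq])
        return (a - t) / np.maximum(a + t, 1)

    def smooth_derivative(pos, skew, h):
        """Derivative of a Gaussian-kernel local-constant smoother."""
        d = (pos[:, None] - pos[None, :]) / h
        K = np.exp(-0.5 * d ** 2)
        dK = -d / h * K                    # derivative of the kernel weights
        num, den = K @ skew, K.sum(axis=1)
        return (dK @ skew * den - num * dK.sum(axis=1)) / den ** 2

    # Simulated sequence whose AT composition shifts halfway through
    rng = np.random.default_rng(5)
    seq = rng.choice(list("ACGT"), p=[0.4, 0.1, 0.1, 0.4], size=3000)
    seq[1500:] = rng.choice(list("ACGT"), p=[0.2, 0.1, 0.1, 0.6], size=1500)
    pos = np.arange(0, 3000, 20)
    deriv = smooth_derivative(pos.astype(float),
                              cumulative_at_skew(seq)[pos], h=200.0)
    flips = pos[1:][np.sign(deriv[1:]) != np.sign(deriv[:-1])]
    ```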

    On parameter estimation for locally stationary long-memory processes

    Get PDF
    We consider parameter estimation for time-dependent locally stationary long-memory processes. The asymptotic distribution of an estimator based on the local infinite autoregressive representation is derived, and asymptotic formulas for the mean squared error of the estimator and for the asymptotically optimal bandwidth are obtained. In spite of long memory, the optimal bandwidth turns out to be of the order n^(-1/5) and inversely proportional to the square of the second derivative of d. In this sense, local estimation of d is comparable to regression smoothing with iid residuals.

    Keywords: long memory, fractional ARIMA process, local stationarity, bandwidth selection
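    A hedged sketch of the local-estimation idea, using a GPH-type log-periodogram regression on moving windows as a stand-in for the paper's local autoregressive-representation estimator; note that a bandwidth of order n^(-1/5) corresponds to a window length of order n^(4/5):

    ```python
    # Local estimation of a time-varying memory parameter d(t): a GPH-type
    # log-periodogram regression applied on moving windows. Stand-in sketch;
    # the paper's estimator uses the local infinite AR representation.
    import numpy as np

    def gph(x):
        """Log-periodogram regression estimate of d on one window."""
        n = len(x)
        m = int(n ** 0.5)                     # number of Fourier frequencies
        lam = 2 * np.pi * np.arange(1, m + 1) / n
        I = np.abs(np.fft.fft(x - x.mean())[1:m + 1]) ** 2 / (2 * np.pi * n)
        X = -2 * np.log(2 * np.sin(lam / 2))  # regressor; the slope equals d
        return np.polyfit(X, np.log(I), 1)[0]

    def local_d(x, window):
        """Windowed d estimates; window ~ n^(4/5) matches bandwidth n^(-1/5)."""
        half = window // 2
        return [(t, gph(x[t - half:t + half]))
                for t in range(half, len(x) - half, half)]

    rng = np.random.default_rng(6)
    x = rng.normal(size=2000)                 # white noise: true d(t) = 0
    print(local_d(x, window=int(len(x) ** 0.8)))
    ```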