192 research outputs found

    Estimation of a distribution function by an indirect sample

    The problem of estimating a distribution function is considered in the case where the observer has access only to a part of the indicator random values. Some basic asymptotic properties of the constructed estimates are studied. Limit theorems are proved for continuous functionals related to the estimate F_n(x) in the space C[a, 1 - a], 0 < a < 1/2.

    Integral Functionals of the Gasser–Muller Regression Function

    For integral functionals of the Gasser–Muller regression function and its derivatives, we consider the plug-in estimator. The consistency and asymptotic normality of the estimator are shown.

    Nonparametric estimation of conditional transition probabilities in a non-Markov illness-death model

    One important goal in multi-state modeling is the estimation of transition probabilities. In longitudinal medical studies these quantities are particularly of interest since they allow for long-term predictions of the process. In recent years significant contributions have been made regarding this topic. However, most of the approaches assume independent censoring and do not account for the influence of covariates. The goal of the paper is to introduce feasible estimation methods for the transition probabilities in an illness-death model conditionally on current or past covariate measures. All approaches are evaluated through a simulation study, leading to a comparison of two different estimators. The proposed methods are illustrated using a real colon cancer data set. This research was financed by FEDER Funds through Programa Operacional Factores de Competitividade COMPETE and by Portuguese Funds through FCT - Fundação para a Ciência e a Tecnologia, within Projects Est-C/MAT/UI0013/2011 and PTDC/MAT/104879/2008. We also acknowledge financial support from the project Grants MTM2008-03129 and MTM2011-23204 (FEDER support included) of the Spanish Ministerio de Ciencia e Innovación and 10PXIB300068PR of the Xunta de Galicia. Partial support from a grant from the US National Security Agency (H98230-11-1-0168) is greatly appreciated.

    PAC learning using Nadaraya-Watson estimator based on orthonormal systems

    Regression or function classes of Euclidean type with compact support and certain smoothness properties are shown to be PAC learnable by the Nadaraya-Watson estimator based on complete orthonormal systems. While requiring more smoothness properties than typical PAC formulations, this estimator is computationally efficient, easy to implement, and known to perform well in a number of practical applications. The sample sizes necessary for PAC learning of regressions or functions under sup norm cost are derived for a general orthonormal system. The result covers the widely used estimators based on Haar wavelets, trigonometric functions, and Daubechies wavelets.
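
    As a rough illustration of the Haar case mentioned in this abstract, here is a minimal sketch, not the paper's construction. It assumes inputs scaled to [0, 1) and uses only the Haar scaling functions at one resolution level, in which case the Nadaraya-Watson ratio reduces to averaging the responses that fall in the same dyadic bin as the query point. The function name `haar_nw_estimate` and the `level` parameter are hypothetical.

```python
def haar_nw_estimate(xs, ys, x, level):
    """Regression estimate at x in [0, 1) from samples (xs, ys).

    Assumption: Haar scaling functions at resolution `level`, so the
    induced kernel is constant on each of the 2**level dyadic bins and
    the estimate is the mean response within the bin containing x.
    """
    bins = 2 ** level
    target = min(int(x * bins), bins - 1)  # index of the bin holding x
    total, count = 0.0, 0
    for xi, yi in zip(xs, ys):
        if min(int(xi * bins), bins - 1) == target:
            total += yi
            count += 1
    # Convention chosen here: return 0.0 when the bin has no samples.
    return total / count if count else 0.0


if __name__ == "__main__":
    xs = [0.1, 0.2, 0.6, 0.9]
    ys = [1.0, 3.0, 10.0, 20.0]
    print(haar_nw_estimate(xs, ys, 0.25, level=1))
```

    A higher `level` gives narrower bins, which lowers bias but leaves fewer samples per bin; the sample-size bounds in the paper concern this kind of trade-off for general orthonormal systems.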

    Kernel bandwidth optimization in spike rate estimation

    The kernel smoother and the time-histogram are classical tools for estimating an instantaneous rate of spike occurrences. We recently established a method for selecting the bin width of the time-histogram, based on the principle of minimizing the mean integrated square error (MISE) between the estimated rate and the unknown underlying rate. Here we apply the same optimization principle to kernel density estimation in selecting the width or “bandwidth” of the kernel, and further extend the algorithm to allow a variable bandwidth that adapts to the data. The variable kernel has the potential to accurately capture non-stationary phenomena, such as abrupt changes in the firing rate, which we often encounter in neuroscience. To avoid the overfitting that excessive freedom may cause, we introduce a stiffness constant for bandwidth variability. Our method automatically adjusts the stiffness constant, thereby adapting to the entire set of spike data. The results show that the classical kernel smoother may exhibit goodness-of-fit comparable to, or even better than, that of modern sophisticated rate estimation methods, provided that the bandwidth is selected properly for a given set of spike data, according to the optimization methods presented here.
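
    The idea of choosing a fixed kernel bandwidth by minimizing an estimate of the integrated squared error can be sketched as follows. This is not the algorithm from the paper: it swaps in ordinary least-squares cross-validation with a Gaussian kernel, a standard data-driven stand-in for the same criterion, and it does not cover the variable-bandwidth extension. All function names are hypothetical.

```python
import math


def gauss(u, s):
    """Normal density with mean 0 and standard deviation s, evaluated at u."""
    return math.exp(-0.5 * (u / s) ** 2) / (s * math.sqrt(2.0 * math.pi))


def rate_estimate(spikes, t, h):
    """Kernel-smoothed spike rate at time t (sum of Gaussian bumps of width h)."""
    return sum(gauss(t - s, h) for s in spikes)


def lscv_score(spikes, h):
    """Least-squares cross-validation score for bandwidth h (lower is better).

    Equals the integral of the squared density estimate minus twice the
    mean leave-one-out density at the observed spike times.
    """
    n = len(spikes)
    sq_term = sum(
        gauss(a - b, math.sqrt(2.0) * h) for a in spikes for b in spikes
    ) / n ** 2
    loo_term = sum(
        gauss(a - b, h)
        for i, a in enumerate(spikes)
        for j, b in enumerate(spikes)
        if i != j
    ) / (n * (n - 1))
    return sq_term - 2.0 * loo_term


def select_bandwidth(spikes, candidates):
    """Pick the candidate bandwidth with the smallest cross-validation score."""
    return min(candidates, key=lambda h: lscv_score(spikes, h))


if __name__ == "__main__":
    spikes = [0.10, 0.12, 0.15, 0.50, 0.52, 0.55, 0.90, 0.93]  # made-up times
    h = select_bandwidth(spikes, [0.01, 0.05, 0.2])
    print(h, rate_estimate(spikes, 0.5, h))
```

    The double sums cost O(n²) per candidate, which is fine for a sketch but would need binning or an FFT-based convolution for long spike trains.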

    The Statistics of Bulk Segregant Analysis Using Next Generation Sequencing

    We describe a statistical framework for QTL mapping using bulk segregant analysis (BSA) based on high throughput, short-read sequencing. Our proposed approach is based on a smoothed version of the standard statistic, and takes into account variation in allele frequency estimates due to sampling of segregants to form bulks as well as variation introduced during the sequencing of bulks. Using simulation, we explore the impact of key experimental variables such as bulk size and sequencing coverage on the ability to detect QTLs. Counterintuitively, we find that relatively large bulks maximize the power to detect QTLs even though this implies weaker selection and less extreme allele frequency differences. Our simulation studies suggest that with large bulks and sufficient sequencing depth, the methods we propose can be used to detect even weak effect QTLs and we demonstrate the utility of this framework by application to a BSA experiment in the budding yeast Saccharomyces cerevisiae

    Genetic Association Studies of Copy-Number Variation: Should Assignment of Copy Number States Precede Testing?

    Recently, structural variation in the genome has been implicated in many complex diseases. Using genomewide single nucleotide polymorphism (SNP) arrays, researchers are able to investigate the impact not only of SNP variation, but also of copy-number variants (CNVs) on the phenotype. The most common analytic approach involves estimating, at the level of the individual genome, the underlying number of copies present at each location. Once this is completed, tests are performed to determine the association between copy number state and phenotype. An alternative approach is to carry out association testing first, between phenotype and raw intensities from the SNP array at the level of the individual marker, and then aggregate neighboring test results to identify CNVs associated with the phenotype. Here, we explore the strengths and weaknesses of these two approaches using both simulations and real data from a pharmacogenomic study of the chemotherapeutic agent gemcitabine. Our results indicate that pooled marker-level testing is capable of offering a dramatic increase in power (-fold) over CNV-level testing, particularly for small CNVs. However, CNV-level testing is superior when CNVs are large and rare; understanding these tradeoffs is an important consideration in conducting association studies of structural variation

    Distribution Dynamics in the US. A spatial perspective.

    It is quite common in cross-sectional convergence analyses that data exhibit strong spatial dependence. While the literature adopting the regression approach is now fully aware that neglecting this feature may lead to inaccurate results and has therefore suggested a number of statistical tools for addressing the issue, research is only at a very initial stage within the distribution dynamics approach. In particular, in the continuous state-space framework, a few authors opted for spatial pre-filtering the data in order to guarantee the statistical properties of the estimates. In this paper, we follow an alternative route that starts from the idea that spatial dependence is not just noise but can be a substantive element of the data generating process. In particular, we develop a tool that, building on a mean-bias adjustment procedure established in the literature, explicitly allows for spatial dependence in distribution dynamics analysis thus eliminating the need for pre-filtering. Using this tool, we then reconsider the evidence on convergence across US states