    Rate Optimal Semiparametric Estimation of the Memory Parameter of the Gaussian Time Series with Long Range Dependence

    There exist several estimators of the memory parameter in long-memory time series models with mean μ and the spectrum specified only locally near zero frequency. In this paper we give a lower bound for the rate of convergence of any estimator of the memory parameter as a function of the degree of local smoothness of the spectral density at zero. The lower bound allows one to evaluate and compare different estimators by their asymptotic behavior, and to claim rate optimality for any estimator attaining the bound. A log-periodogram regression estimator, analysed by Robinson (1992), is then shown to attain the lower bound, and is thus rate optimal.
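
    As a concrete illustration of the class of estimators discussed above, the following Python sketch implements a plain log-periodogram regression of GPH type. It is a minimal sketch, not the paper's procedure: the bandwidth m = sqrt(n) is just a common rule of thumb, not the rate-optimal choice analysed there.

        import numpy as np

        def gph_memory_estimate(x, m=None):
            """Log-periodogram regression estimate of the memory parameter d (a sketch)."""
            x = np.asarray(x, dtype=float)
            n = len(x)
            if m is None:
                m = int(np.sqrt(n))  # rule-of-thumb bandwidth, an assumption
            # Periodogram at the first m Fourier frequencies lambda_j = 2*pi*j/n
            j = np.arange(1, m + 1)
            lam = 2.0 * np.pi * j / n
            dft = np.fft.fft(x - x.mean())[1:m + 1]
            periodogram = (np.abs(dft) ** 2) / (2.0 * np.pi * n)
            # Regress log I(lambda_j) on -log(4*sin^2(lambda_j/2)); the slope estimates d
            reg = -np.log(4.0 * np.sin(lam / 2.0) ** 2)
            slope = np.polyfit(reg, np.log(periodogram), 1)[0]
            return slope

        # White noise has memory parameter d = 0, so the estimate should be near zero
        rng = np.random.default_rng(0)
        print(gph_memory_estimate(rng.standard_normal(2048)))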

    Component identification and estimation in nonlinear high-dimensional regression models by structural adaptation

    This article proposes a new method of analysis of a partially linear model whose nonlinear component is completely unknown. The target of the analysis is identification of the set of regressors which enter the model function in a nonlinear way, and the complete estimation of the model, including the slope coefficients of the linear component and the link function of the nonlinear component. The procedure also allows for selecting the significant regression variables. As a by-product, we develop a test of a linear hypothesis against a partially linear alternative, or, more generally, a test that the nonlinear component is M-dimensional for M = 0, 1, 2, .... The approach proposed in this article is fully adaptive to the unknown model structure and applies under mild conditions on the model. The only important assumption is that the dimensionality of the nonlinear component is relatively small. The theoretical results indicate that the procedure provides a prescribed level of the identification error and estimates the linear component with accuracy of order n^(-1/2). A numerical study demonstrates very good performance of the method even for small or moderate sample sizes.
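
    For orientation, the partially linear model itself and a generic double-residual (Robinson-type) estimate of its slope coefficients can be sketched in a few lines of Python. This is only an illustration of the model class under simple assumptions, not the structural adaptation procedure of the article.

        import numpy as np

        def nw_smooth(z, v, h=0.3):
            """Nadaraya-Watson smoother of v on a scalar covariate z (Gaussian kernel)."""
            w = np.exp(-0.5 * ((z[:, None] - z[None, :]) / h) ** 2)
            w /= w.sum(axis=1, keepdims=True)
            return w @ v  # works for v of shape (n,) or (n, p)

        # Simulated partially linear model: y = x'beta + g(z) + noise
        rng = np.random.default_rng(1)
        n, beta = 500, np.array([1.5, -2.0])
        x = rng.standard_normal((n, 2))    # regressors entering linearly
        z = rng.uniform(-2, 2, n)          # regressor entering through an unknown link
        y = x @ beta + np.sin(np.pi * z) + 0.3 * rng.standard_normal(n)

        # Double residuals: remove conditional means given z, then ordinary least squares
        y_res = y - nw_smooth(z, y)
        x_res = x - nw_smooth(z, x)
        beta_hat, *_ = np.linalg.lstsq(x_res, y_res, rcond=None)
        print(beta_hat)  # close to (1.5, -2.0); the n^(-1/2) accuracy refers to this component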

    Noisy Independent Factor Analysis Model for Density Estimation and Classification

    We consider the problem of multivariate density estimation when the unknown density is assumed to follow a particular form of dimensionality reduction, a noisy independent factor analysis (IFA) model. In this model the data are generated by a number of latent independent components having unknown distributions and are observed in Gaussian noise. We do not assume that either the number of components or the matrix mixing the components is known. We show that densities of this form can be estimated at a fast rate. Using the mirror averaging aggregation algorithm, we construct a density estimator which achieves a nearly parametric rate (log^(1/4) n)/√n, independent of the dimensionality of the data, as the sample size n tends to infinity. This estimator is adaptive to the number of components, their distributions and the mixing matrix. We then apply this density estimator to construct nonparametric plug-in classifiers and show that they achieve the best obtainable rate of the excess Bayes risk, to within a logarithmic factor independent of the dimension of the data. Applications of this classifier to simulated data sets and to real data from a remote sensing experiment show promising results. Financial support from the IAP research network of the Belgian government (Belgian Federal Science Policy) is gratefully acknowledged. Research of A. Samarov was partially supported by NSF grant DMS-0505561 and by a grant from the Singapore-MIT Alliance (CSB). Research of A.B. Tsybakov was partially supported by the grant ANR-06-BLAN-0194 and by the PASCAL Network of Excellence.
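
    The plug-in step itself is easy to illustrate: estimate each class-conditional density and the class priors, then classify by the largest estimated posterior. In the sketch below a plain Gaussian kernel density estimate stands in for the noisy-IFA / mirror-averaging estimator constructed in the paper, so this is an illustration of the classification rule only.

        import numpy as np
        from scipy.stats import gaussian_kde

        def fit_plugin_classifier(X, y):
            """Fit class priors and class-conditional KDEs (stand-in density estimator)."""
            classes = np.unique(y)
            priors = {c: np.mean(y == c) for c in classes}
            densities = {c: gaussian_kde(X[y == c].T) for c in classes}
            return classes, priors, densities

        def predict(X, classes, priors, densities):
            # Plug-in rule: argmax over classes of prior * estimated density
            scores = np.column_stack([priors[c] * densities[c](X.T) for c in classes])
            return classes[np.argmax(scores, axis=1)]

        # Toy example with two Gaussian classes in R^3
        rng = np.random.default_rng(2)
        X = np.vstack([rng.standard_normal((200, 3)),
                       rng.standard_normal((200, 3)) + 1.5])
        y = np.repeat([0, 1], 200)
        model = fit_plugin_classifier(X, y)
        print((predict(X, *model) == y).mean())  # training accuracy of the plug-in rule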

    The analysis and advanced extensions of canonical correlation analysis

    Drug discovery is the process of identifying compounds which have potentially meaningful biological activity. A problem that arises is that the number of compounds to search over can be quite large, sometimes numbering in the millions, making experimental testing intractable. For this reason computational methods are employed to filter out those compounds which do not exhibit strong biological activity. This filtering step, also called virtual screening, reduces the search space, allowing the remaining compounds to be experimentally tested. In this dissertation I will provide an approach to the problem of virtual screening based on Canonical Correlation Analysis (CCA) and several extensions which use kernel and spectral learning ideas. Specifically, these methods will be applied to the protein-ligand matching problem. Additionally, theoretical results analyzing the behavior of CCA in the High Dimension Low Sample Size (HDLSS) setting will be provided.
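
    As a point of reference, a minimal linear CCA run on two synthetic "views" is shown below, using scikit-learn's CCA rather than anything from the dissertation; the kernel and spectral extensions studied there build on exactly this object, paired projections of the two views with maximal correlation.

        import numpy as np
        from sklearn.cross_decomposition import CCA

        # Two synthetic views sharing a two-dimensional latent structure
        rng = np.random.default_rng(3)
        latent = rng.standard_normal((300, 2))
        X = latent @ rng.standard_normal((2, 10)) + 0.5 * rng.standard_normal((300, 10))
        Y = latent @ rng.standard_normal((2, 8)) + 0.5 * rng.standard_normal((300, 8))

        cca = CCA(n_components=2)
        X_c, Y_c = cca.fit_transform(X, Y)

        # Canonical correlations of the paired projections (close to 1 in this toy setup)
        for k in range(2):
            print(np.corrcoef(X_c[:, k], Y_c[:, k])[0, 1])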

    Liquid-liquid equilibrium in quaternary systems ethanol – ethylpropanoate – choline chloride – glycerol, propanol – propylpropanoate – choline chloride – glycerol, butanol – butylpropanoate – choline chloride – glycerol

    This study was supported by the Russian Foundation for Basic Research (project № 16-33-60128 mol_a_dk). The experimental work was facilitated by the equipment of the Magnetic Resonance Research Centre at St. Petersburg State University.