Estimation of a distribution function by an indirect sample
The problem of estimation of a distribution function is considered in the case where the observer has access only to a part of the indicator random values. Some basic asymptotic properties of the constructed estimates are studied. The limit theorems are proved for continuous functionals related to the estimation of F^n(x) in the space C[a, 1 - a], 0 < a < 1/2.
About Testing the Hypothesis of Equality of Two Bernoulli Regression Curves
The limiting distribution of an integral square deviation between two kernel type estimators of Bernoulli regression functions is established in the case of two independent samples. A test criterion is constructed for both simple and composite hypotheses of equality of two Bernoulli regression functions. The question of consistency is studied. The asymptotic behavior of the power of the test is investigated for some close alternatives. Keywords: Bernoulli Regression Function, Power of Test, Consistency, Composite Hypothesis
Integral Functionals of the Gasser–Muller Regression Function
For integral functionals of the Gasser–Muller regression function and its derivatives, we consider the plug-in estimator. The consistency and asymptotic normality of the estimator are shown.
Conditional stochastic dominance tests in dynamic settings
This paper proposes nonparametric consistent tests of conditional stochastic dominance of arbitrary order in a dynamic setting. The novelty of these tests lies in the nonparametric manner of incorporating the information set. The test allows for general forms of unknown serial and mutual dependence between random variables, and has an asymptotic distribution that can be easily approximated by simulation. This method has good finite-sample performance. These tests are applied to determine investment efficiency between US industry portfolios conditional on the dynamics of the market portfolio. The empirical analysis suggests that telecommunications dominates the other sectoral portfolios under risk aversion
Local generalised method of moments: an application to point process-based rainfall models
Long series of simulated rainfall are required at point locations for a range of applications, including hydrological studies. Clustered point process-based rainfall models have been used for generating such simulations for many decades. These models suffer from a major limitation, however: their stationarity. Although seasonality can be allowed for by fitting separate models for each calendar month or season, the models are unsuitable in their basic form for climate impact studies. In this paper, we develop new methodology to address this limitation. We extend the current fitting approach by allowing the discrete covariate, calendar month, to be replaced or supplemented with continuous covariates that are more directly related to the incidence and nature of rainfall. The covariate-dependent model parameters are estimated for each time interval using a kernel-based nonparametric approach within a generalised method-of-moments framework. An empirical study demonstrates the new methodology using a time series of 5-min rainfall data. The study considers both local mean and local linear approaches. While asymptotic results are included, the focus is on developing useable methodology for a complex model that can only be solved numerically. Issues including the choice of weighting matrix, estimation of parameter uncertainty, and bandwidth and model selection are considered from this perspective
Non-Redundant Spectral Dimensionality Reduction
Spectral dimensionality reduction algorithms are widely used in numerous
domains, including for recognition, segmentation, tracking and visualization.
However, despite their popularity, these algorithms suffer from a major
limitation known as the "repeated Eigen-directions" phenomenon. That is, many
of the embedding coordinates they produce typically capture the same direction
along the data manifold. This leads to redundant and inefficient
representations that do not reveal the true intrinsic dimensionality of the
data. In this paper, we propose a general method for avoiding redundancy in
spectral algorithms. Our approach relies on replacing the orthogonality
constraints underlying those methods by unpredictability constraints.
Specifically, we require that each embedding coordinate be unpredictable (in
the statistical sense) from all previous ones. We prove that these constraints
necessarily prevent redundancy, and provide a simple technique to incorporate
them into existing methods. As we illustrate on challenging high-dimensional
scenarios, our approach produces significantly more informative and compact
representations, which improve visualization and classification tasks
The distribution of exoplanet masses
The present study derives the distribution of secondary masses M2 for the 67
exoplanets and very low-mass brown dwarf companions of solar-type stars, known
as of April 4, 2001. This distribution is related to the distribution of M2 sin
i through an integral equation of Abel's type. Although a formal solution
exists for this equation, it is known to be ill-behaved, and thus very
sensitive to the statistical noise present in the input M2 sin i distribution.
To overcome that difficulty, we present two robust, independent approaches: (i)
the formal solution of the integral equation is numerically computed after
performing an optimal smoothing of the input distribution, (ii) the
Lucy-Richardson algorithm is used to invert the integral equation. Both
approaches give consistent results. The resulting statistical distribution of
exoplanet true masses reveals that there is no reason to ascribe the transition
between giant planets and brown dwarfs to the threshold mass for deuterium
ignition (about 13 MJ). The M2 distribution shows instead that all the objects
have M2 < 10 MJ, except the heavier candidates which cluster around 15 MJ. Comment: Accepted by Astronomy & Astrophysics (7 pages, 4 figures)
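As a rough illustration of approach (ii), the multiplicative Lucy-Richardson update can be sketched on a generic discretised linear problem. This is not the authors' code: the function name `richardson_lucy`, the matrix `kernel`, and the flat starting guess are assumptions made for the sketch, and the kernel is assumed non-negative with columns summing to 1 rather than being the actual Abel-type kernel of the paper.

```python
def richardson_lucy(observed, kernel, iterations=200):
    """Iteratively invert observed[i] ~ sum_j kernel[i][j] * truth[j].

    Assumes (hypothetically) a non-negative kernel whose columns sum to 1,
    so that each iteration conserves the total of `observed`.
    """
    n_obs = len(observed)
    n_true = len(kernel[0])
    # Flat, strictly positive starting guess with the same total as the data.
    estimate = [sum(observed) / n_true] * n_true
    for _ in range(iterations):
        # Forward model: what the current estimate would look like if observed.
        predicted = [
            sum(kernel[i][j] * estimate[j] for j in range(n_true))
            for i in range(n_obs)
        ]
        # Ratio of data to prediction; bins with zero prediction contribute nothing.
        ratio = [
            observed[i] / predicted[i] if predicted[i] > 0 else 0.0
            for i in range(n_obs)
        ]
        # Multiplicative update: back-project the ratio through the kernel.
        estimate = [
            estimate[j] * sum(kernel[i][j] * ratio[i] for i in range(n_obs))
            for j in range(n_true)
        ]
    return estimate
```

Because the update is multiplicative, a positive starting guess stays non-negative, which is one reason the scheme is less sensitive to noise than a direct formal inversion.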
Simultaneous interval regression for K-nearest neighbor
In some regression problems, it may be more reasonable to predict intervals rather than precise values. We are interested in finding intervals which simultaneously for all input instances x ∈ X contain a β proportion of the response values. We name this problem simultaneous interval regression. This is similar to simultaneous tolerance intervals for regression with a high confidence level γ ≈ 1, and several authors have already treated this problem for linear regression. Such intervals could be seen as a form of confidence envelope for the prediction variable given any value of predictor variables in their domain. Tolerance intervals and simultaneous tolerance intervals have not yet been treated for the K-nearest neighbor (KNN) regression method. The goal of this paper is to consider the simultaneous interval regression problem for KNN, and this is done without the homoscedasticity assumption. In this scope, we propose a new interval regression method based on KNN which takes advantage of tolerance intervals in order to choose, for each instance, the value of the hyper-parameter K which will be a good trade-off between the precision and the uncertainty due to the limited sample size of the neighborhood around each instance. In the experiment part, our proposed interval construction method is compared with a more conventional interval approximation method on six benchmark regression data sets
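To make the idea of a KNN-based interval concrete, here is a minimal sketch that takes the K nearest training points of a one-dimensional query and returns a central β-range of their responses. It is a plain empirical-quantile heuristic, not the tolerance-interval method proposed in the paper; the function name `knn_interval`, the fixed `k`, and the index rounding rule are all assumptions for illustration.

```python
import math


def knn_interval(train_x, train_y, query, k, beta):
    """Central beta-range of the responses of the k nearest neighbours.

    Hypothetical helper: 1-D inputs, absolute-distance neighbours, and a
    conservative rounding of the quantile indices (outwards).
    """
    k = min(k, len(train_x))
    # Sort training pairs by distance to the query and keep the k closest.
    neighbours = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - query))[:k]
    responses = sorted(y for _, y in neighbours)
    tail = (1.0 - beta) / 2.0
    # Round the lower index down and the upper index up so the interval widens.
    lo_idx = int(math.floor(tail * (k - 1)))
    hi_idx = int(math.ceil((1.0 - tail) * (k - 1)))
    return responses[lo_idx], responses[hi_idx]
```

A small K gives a local but noisy interval, while a large K is more stable but less local, which is the trade-off the abstract describes when it chooses K per instance.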
Non-linear regression models for Approximate Bayesian Computation
Approximate Bayesian inference on the basis of summary statistics is
well-suited to complex problems for which the likelihood is either
mathematically or computationally intractable. However the methods that use
rejection suffer from the curse of dimensionality when the number of summary
statistics is increased. Here we propose a machine-learning approach to the
estimation of the posterior density by introducing two innovations. The new
method fits a nonlinear conditional heteroscedastic regression of the parameter
on the summary statistics, and then adaptively improves estimation using
importance sampling. The new algorithm is compared to the state-of-the-art
approximate Bayesian methods, and achieves considerable reduction of the
computational burden in two examples of inference in statistical genetics and
in a queueing model. Comment: 4 figures; version 3 minor changes; to appear in Statistics and
Computing
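For context, the rejection scheme that the abstract takes as its starting point can be sketched as follows. This is only the basic rejection step, not the nonlinear regression adjustment or the importance sampling stage of the paper; the toy model (normal data with unit variance, uniform prior on [-5, 5], sample mean as the single summary statistic) and the name `rejection_abc` are assumptions for illustration.

```python
import random


def rejection_abc(observed_summary, n_obs, n_draws, tolerance, seed=0):
    """Basic rejection ABC for the mean of a unit-variance normal (toy model).

    Returns the accepted (theta, simulated_summary) pairs. The prior range,
    the model and the summary statistic are hypothetical choices.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        # 1. Draw a candidate parameter from the prior.
        theta = rng.uniform(-5.0, 5.0)
        # 2. Simulate a data set under that parameter and summarise it.
        sample = [rng.gauss(theta, 1.0) for _ in range(n_obs)]
        summary = sum(sample) / n_obs
        # 3. Keep the candidate only if its summary is close to the observed one.
        if abs(summary - observed_summary) <= tolerance:
            accepted.append((theta, summary))
    return accepted
```

With several summary statistics the acceptance region shrinks quickly for a fixed tolerance, which is the curse of dimensionality the abstract refers to; regression on the retained (theta, summary) pairs is one way to use a looser tolerance.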