Money and Market in the Countryside of the Helvetian civitas
A comparison of minimum distance and maximum likelihood techniques for proportion estimation
The estimation of mixing proportions p_1, p_2, ..., p_m in the mixture density f(x) = \sum_{i=1}^{m} p_i f_i(x) is often encountered in agricultural remote sensing problems, in which case the p_i's usually represent crop proportions. In these remote sensing applications, the component densities f_i(x) have typically been assumed to be normally distributed, and parameter estimation has been accomplished using maximum likelihood (ML) techniques. Minimum distance (MD) estimation is examined as an alternative to ML where, in this investigation, both procedures are based upon normal components. Results indicate that ML techniques are superior to MD when the component distributions actually are normal, while MD estimation provides better estimates than ML under symmetric departures from normality. When the component distributions are not symmetric, however, neither of these normal-based techniques provides satisfactory results.
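The two estimators compared in this abstract can be sketched for the simplest case, a two-component normal mixture with known components and a single unknown proportion p. This is an illustrative sketch, not the paper's exact procedure: ML maximizes the log-likelihood, while MD here minimizes a Cramer-von Mises-type distance between the model CDF and the empirical CDF.

```python
# Sketch: estimating p in f(x) = p*f1(x) + (1-p)*f2(x) with known normal
# components, by maximum likelihood (ML) and minimum distance (MD).
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(0)
p_true, n = 0.3, 2000
# Known components (illustrative choice): f1 = N(0,1), f2 = N(3,1)
z = rng.random(n) < p_true
x = np.where(z, rng.normal(0.0, 1.0, n), rng.normal(3.0, 1.0, n))

def mix_pdf(x, p):
    return p * stats.norm.pdf(x, 0, 1) + (1 - p) * stats.norm.pdf(x, 3, 1)

def mix_cdf(x, p):
    return p * stats.norm.cdf(x, 0, 1) + (1 - p) * stats.norm.cdf(x, 3, 1)

# ML: maximize the log-likelihood over p in (0, 1)
neg_loglik = lambda p: -np.sum(np.log(mix_pdf(x, p)))
p_ml = optimize.minimize_scalar(neg_loglik, bounds=(1e-6, 1 - 1e-6),
                                method="bounded").x

# MD: minimize the Cramer-von Mises distance between the model CDF
# and the empirical CDF
xs = np.sort(x)
ecdf = (np.arange(1, n + 1) - 0.5) / n
cvm = lambda p: np.sum((mix_cdf(xs, p) - ecdf) ** 2)
p_md = optimize.minimize_scalar(cvm, bounds=(1e-6, 1 - 1e-6),
                                method="bounded").x

print(f"true p = {p_true}, ML = {p_ml:.3f}, MD = {p_md:.3f}")
```

With truly normal components, as here, both estimators land close to the true proportion; the abstract's point is that their relative performance diverges once the components depart from normality.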
Bias in parametric estimation: reduction and useful side-effects
The bias of an estimator is defined as the difference between its expected value and the parameter to be estimated, where the expectation is taken with respect to the model. Loosely speaking, small bias reflects the desire that if an experiment is repeated indefinitely, then the average of all the resulting estimates will be close to the parameter value being estimated. The current paper is a review of the still-expanding repository of methods that have been developed to reduce bias in the estimation of parametric models. The review provides a unifying framework in which all those methods are seen as attempts to approximate the solution of a simple estimating equation. Of particular focus is the maximum likelihood estimator, which, despite being asymptotically unbiased under the usual regularity conditions, has finite-sample bias that can result in significant loss of performance of standard inferential procedures. An informal comparison of the methods reveals some useful practical side-effects in the estimation of popular models, including: i) shrinkage of the estimators in binomial and multinomial regression models, which guarantees finiteness even in cases of data separation where the maximum likelihood estimator is infinite, and ii) inferential benefits for models that require the estimation of dispersion or precision parameters.
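The finite-sample bias of the MLE described above can be made concrete with a textbook example (not taken from the review itself): for X_1, ..., X_n ~ Exponential(rate λ), the MLE λ̂ = 1/x̄ has expectation nλ/(n-1), so its bias is λ/(n-1), and rescaling by (n-1)/n removes it exactly.

```python
# Simulation check of the finite-sample bias of the MLE of an exponential
# rate, and of the exact multiplicative bias correction (n-1)/n.
import numpy as np

rng = np.random.default_rng(1)
lam, n, reps = 2.0, 10, 200000
samples = rng.exponential(scale=1.0 / lam, size=(reps, n))

mle = 1.0 / samples.mean(axis=1)       # maximum likelihood estimator 1/x-bar
corrected = (n - 1) / n * mle          # bias-corrected estimator

print(f"mean MLE       = {mle.mean():.4f}  (theory: {n * lam / (n - 1):.4f})")
print(f"mean corrected = {corrected.mean():.4f}  (theory: {lam:.4f})")
```

At n = 10 the raw MLE overshoots λ = 2 by about 11% on average; most of the methods surveyed in the review target exactly this kind of O(1/n) bias, but for models where no closed-form correction exists.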
Constructing a bivariate distribution function with given marginals and correlation: application to the galaxy luminosity function
We show an analytic method to construct a bivariate distribution function (DF) with given marginal distributions and correlation coefficient. We introduce a convenient mathematical tool, called a copula, to connect two DFs with any prescribed dependence structure. If the correlation between the two variables is weak, the Farlie-Gumbel-Morgenstern (FGM) copula provides an intuitive and natural way to construct such a bivariate DF. When the linear correlation is stronger, the FGM copula no longer works. In this case, we propose to use a Gaussian copula, which connects two given marginals and is directly related to the linear correlation coefficient between the two variables. Using these copulas, we construct bivariate luminosity functions (BLFs) and discuss their statistical properties. In particular, we focus on the FUV-FIR BLF, since these two luminosities are related to star formation (SF) activity. Though both the FUV and FIR are related to the SF activity, the univariate LFs have very different functional forms: the former is well described by the Schechter function, whilst the latter has a much more extended, power-law-like luminous end. We construct the FUV-FIR BLFs by the FGM and Gaussian copulas with different strengths of correlation and examine their statistical properties. We then discuss some further possible applications of the BLF: the problem of multiband flux-limited sample selection, the construction of the SF rate (SFR) function, and the construction of the stellar mass--specific SFR relation of galaxies. The copulas turn out to be a very useful tool for investigating all these issues, especially for including the complicated selection effects. Comment: 14 pages, 5 figures, accepted for publication in MNRAS
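The Gaussian-copula construction described in this abstract can be sketched in a few lines. This is an illustrative sketch, not the paper's pipeline: the marginals used here (a lognormal and a gamma) are stand-ins for the two luminosity functions, and the recipe is the standard one of drawing correlated normals, mapping them to uniforms, and pushing the uniforms through the inverse CDFs of the desired marginals.

```python
# Gaussian copula: correlated uniforms from a bivariate normal, then
# arbitrary marginals imposed via inverse CDFs.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
rho, n = 0.7, 100000

# Step 1: bivariate normal with linear correlation rho
cov = [[1.0, rho], [rho, 1.0]]
z = rng.multivariate_normal([0.0, 0.0], cov, size=n)

# Step 2: the Gaussian copula -- marginally uniform, dependence preserved
u = stats.norm.cdf(z)

# Step 3: impose the prescribed marginals via their inverse CDFs
# (lognormal and gamma here, as placeholders for two luminosity functions)
x = stats.lognorm.ppf(u[:, 0], s=0.5)
y = stats.gamma.ppf(u[:, 1], a=2.0)

print(f"sample correlation: {np.corrcoef(x, y)[0, 1]:.3f}")
```

Because the inverse-CDF step is a monotone transform, the rank dependence set by rho survives regardless of how different the two marginal shapes are, which is exactly what makes this construction suitable for marginals as dissimilar as a Schechter function and a power-law-tailed FIR LF.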
Normal approximation for strong demimartingales
We consider a sequence of strong demimartingales. For these random objects, a central limit theorem is obtained by utilizing Zolotarev's ideal metric and the fact that a sequence of strong demimartingales is ordered, via the convex order, with the sequence of its independent duplicates. The CLT can also be applied to demimartingale sequences with constant mean. Newman (1984) conjectured a central limit theorem for demimartingales, but this problem remains open. Although the result obtained in this paper does not provide a solution to Newman's conjecture, it is the first CLT for demimartingales available in the literature.
Advice on testing the null hypothesis that a sample is drawn from a Normal distribution.
The Normal distribution remains the most widely-used statistical model, so it is only natural that researchers will frequently be required to consider whether a sample of data appears to have been drawn from a Normal distribution. Commonly-used statistical packages offer a range of alternative formal statistical tests of the null hypothesis of Normality, with inference being drawn on the basis of a calculated p-value. Here we aim to review the statistical literature on the performance of these tests, and briefly survey current usage of them in recently-published papers, with a view to offering advice on good practice. We find that authors in animal behaviour seem to be using such testing most commonly in situations where it is inadvisable (or at best unnecessary), namely pre-testing to select parametric or non-parametric analyses, and making little use of it in model-fitting situations where it might be of value. Of the many alternative tests, we recommend the routine use of either the Shapiro-Wilk or Chen-Shapiro tests; these are almost always superior to commonly-used alternatives like the Kolmogorov-Smirnov test, often by a substantial margin. We describe how both our recommended tests can be implemented. In contrast to current practice as indicated by our survey, we recommend that the results of these tests are reported in more detail (providing both the calculated sample statistic and the associated p-value). Finally, we emphasize that even the higher-performing tests of Normality have low power (generally below 0.5 and often much lower) when sample sizes are less than 50, as is often the case in our field.
Keywords: Gaussian distribution, parametric statistics, Shapiro-Wilk test, statistics, statistical power
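The Shapiro-Wilk test recommended above is available in scipy (the Chen-Shapiro test is not); the snippet below runs it on a genuinely normal sample and on a skewed one, and reports both the sample statistic W and the p-value, as the abstract advises.

```python
# Shapiro-Wilk normality test via scipy, reporting W and p for each sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
samples = {
    "normal": rng.normal(size=200),       # drawn from a Normal distribution
    "skewed": rng.exponential(size=200),  # clearly non-Normal
}

results = {}
for name, sample in samples.items():
    w, p = stats.shapiro(sample)
    results[name] = (w, p)
    # Report both the calculated sample statistic and the p-value
    print(f"{name}: W = {w:.3f}, p = {p:.4f}")
```

Consistent with the power warning in the abstract, the skewed sample at n = 200 is rejected decisively, but rerunning with n < 50 will often fail to reject even for strongly non-Normal data.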
The Statistics of Bulk Segregant Analysis Using Next Generation Sequencing
We describe a statistical framework for QTL mapping using bulk segregant analysis (BSA) based on high-throughput, short-read sequencing. Our proposed approach is based on a smoothed version of the standard statistic, and takes into account variation in allele frequency estimates due to sampling of segregants to form bulks, as well as variation introduced during the sequencing of bulks. Using simulation, we explore the impact of key experimental variables such as bulk size and sequencing coverage on the ability to detect QTLs. Counterintuitively, we find that relatively large bulks maximize the power to detect QTLs, even though this implies weaker selection and less extreme allele frequency differences. Our simulation studies suggest that with large bulks and sufficient sequencing depth, the methods we propose can be used to detect even weak-effect QTLs, and we demonstrate the utility of this framework by application to a BSA experiment in the budding yeast Saccharomyces cerevisiae.
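The core BSA idea, smoothing a per-marker signal along the chromosome so a QTL emerges as a coherent regional peak rather than single-marker noise, can be sketched as follows. This is a hedged illustration, not the authors' exact statistic: it uses a raw allele-frequency difference between bulks and a tricube-kernel smoother, with simulated read counts and a simulated QTL position.

```python
# Toy BSA scan: binomially sampled read counts in two bulks, per-marker
# allele-frequency difference, tricube smoothing, peak localization.
import numpy as np

rng = np.random.default_rng(4)
n_markers, depth = 500, 100
pos = np.arange(n_markers)

# Simulated true allele-frequency difference: one QTL around marker 250
true_delta = 0.3 * np.exp(-0.5 * ((pos - 250) / 30.0) ** 2)

# Sequencing of each bulk: binomial read sampling at the given coverage
high = rng.binomial(depth, 0.5 + true_delta / 2) / depth
low = rng.binomial(depth, 0.5 - true_delta / 2) / depth
delta = high - low                      # raw per-marker difference

def smooth(values, positions, half_width=25.0):
    """Tricube-kernel smoothing over a fixed physical window."""
    out = np.empty_like(values)
    for i, p in enumerate(positions):
        d = np.abs(positions - p) / half_width
        w = np.where(d < 1, (1 - d**3) ** 3, 0.0)
        out[i] = np.sum(w * values) / np.sum(w)
    return out

delta_s = smooth(delta, pos.astype(float))
peak = int(pos[np.argmax(delta_s)])
print(f"smoothed peak at marker {peak} (QTL simulated at 250)")
```

The binomial sampling step is where sequencing depth enters: halving `depth` inflates the per-marker noise, which is the trade-off against bulk size that the abstract's simulations explore.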