Testing linear hypotheses in high-dimensional regressions
For a multivariate linear model, Wilks' likelihood ratio test (LRT)
constitutes one of the cornerstone tools. However, computing its
quantiles under the null or the alternative requires complex analytic
approximations and, more importantly, these distributional approximations are
feasible only for a moderate dimension of the dependent variable.
On the other hand, assuming that the data dimension as well as the number
of regression variables are fixed while the sample size grows, several
asymptotic approximations have been proposed in the literature for Wilks' $\Lambda$,
including the widely used chi-square approximation. In this paper, we consider
necessary modifications to Wilks' test in a high-dimensional context,
specifically assuming a high data dimension and a large sample size.
Based on recent random matrix theory, the correction we propose to Wilks' test
is asymptotically Gaussian under the null, and simulations demonstrate that the
corrected LRT has very satisfactory size and power, certainly when both the data
dimension and the sample size are large, but also for moderately large data
dimensions. As a byproduct, we explain why the standard chi-square
approximation fails for high-dimensional data. We also introduce a new
procedure for the classical multiple-sample significance test in MANOVA that
is valid for high-dimensional data.
Comment: Accepted 02/2012 for publication in "Statistics". 20 pages, 2 pages
and 2 tables
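For readers unfamiliar with the classical test that the abstract above modifies, here is a minimal NumPy sketch of Wilks' $\Lambda = \det E / \det(E + H)$ for the k-sample MANOVA case, together with Bartlett's textbook chi-square approximation. It illustrates only the low-dimensional procedure with p held fixed, not the authors' random-matrix correction; the function name `wilks_manova` is our own.

```python
import numpy as np


def wilks_manova(groups):
    """Classical k-sample MANOVA: Wilks' Lambda plus Bartlett's chi-square approximation.

    groups: list of array-likes, each of shape (n_i, p), one per sample.
    Returns (Lambda, chi2_statistic, degrees_of_freedom).
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    p = groups[0].shape[1]
    n = sum(g.shape[0] for g in groups)
    grand_mean = np.vstack(groups).mean(axis=0)

    # Within-group (error) SSCP matrix E, with n - k degrees of freedom.
    E = np.zeros((p, p))
    # Between-group (hypothesis) SSCP matrix H, with k - 1 degrees of freedom.
    H = np.zeros((p, p))
    for g in groups:
        centred = g - g.mean(axis=0)
        E += centred.T @ centred
        d = g.mean(axis=0) - grand_mean
        H += g.shape[0] * np.outer(d, d)

    # Lambda = det(E) / det(E + H); log-determinants avoid under/overflow.
    sign_e, logdet_e = np.linalg.slogdet(E)
    sign_eh, logdet_eh = np.linalg.slogdet(E + H)
    if sign_e <= 0 or sign_eh <= 0:
        raise ValueError("E is singular (e.g. p > n - k); Wilks' Lambda is undefined")
    log_lambda = logdet_e - logdet_eh

    # Bartlett: -(n - 1 - (p + k) / 2) * log(Lambda) is approximately chi-square
    # with p * (k - 1) degrees of freedom when p and k are fixed and n grows.
    chi2_stat = -(n - 1 - (p + k) / 2) * log_lambda
    return float(np.exp(log_lambda)), float(chi2_stat), p * (k - 1)
```

Even when E is invertible (p ≤ n − k), this chi-square approximation treats p as fixed; the abstract reports that it breaks down for high-dimensional data, which is the regime the proposed correction targets.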
Preface: Advances in post-processing and blending of deterministic and ensemble forecasts
The special issue on advances in post-processing and blending of deterministic and ensemble forecasts is the outcome of several successful, successive sessions organized at the General Assembly of the European Geosciences Union. Statistical post-processing and blending of forecasts are currently receiving considerable attention and development in many countries as a means to produce optimal forecasts. Ten contributions were received, covering key current concerns in statistical post-processing: the restoration of inter-variable dependences, the impact of model changes on the statistical relationships and how to cope with it, operational implementation at forecasting centers, the development of appropriate metrics for forecast verification, and, finally, two specific applications to snow forecasts and to seasonal forecasts of the North Atlantic Oscillation.
On the Largest Singular Values of Random Matrices with Independent Cauchy Entries
We apply the method of determinants to study the distribution of the largest
singular values of large real $m \times n$ rectangular random matrices with
independent Cauchy entries. We show that statistical properties of the
(rescaled by a factor of $\frac{1}{m^2 n^2}$) largest singular values agree in
the limit with the statistics of the inhomogeneous Poisson random point process
with the intensity $\frac{1}{\pi} x^{-3/2}$ and, therefore, are different
from the Tracy-Widom law. Among other corollaries of our method, we show an
interesting connection between the mathematical expectations of the
determinants of the complex rectangular standard Wishart ensemble
and the real rectangular standard Wishart ensemble.
Comment: We have shown in the revised version that the statistics of the
largest eigenvalues of a sample covariance random matrix with i.i.d. Cauchy
entries agree in the limit with the statistics of the inhomogeneous Poisson
random point process with the intensity $\frac{1}{\pi} x^{-3/2}$.
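The Poisson limit above has a directly checkable consequence. For a Poisson process with intensity $\rho(x) = \frac{1}{\pi} x^{-3/2}$, the expected number of points above x is $\int_x^\infty \frac{1}{\pi} t^{-3/2}\,dt = \frac{2}{\pi\sqrt{x}}$, so the largest point satisfies $P(\lambda_{\max} \le x) = \exp\left(-\frac{2}{\pi\sqrt{x}}\right)$. The NumPy Monte Carlo sketch below compares this with simulated largest eigenvalues of $A A^T$ rescaled by $\frac{1}{m^2 n^2}$, assuming (as the comment describes) that the rescaling applies to eigenvalues of the sample covariance matrix, i.e. to squared singular values. The 50 × 50 size, trial count, seed and function names are arbitrary choices of ours, and this is a numerical illustration, not the authors' determinant method.

```python
import numpy as np


def rescaled_top_eigenvalues(m, n, trials, seed=0):
    """Largest eigenvalue of A A^T divided by (m n)^2, for A with i.i.d. standard Cauchy entries."""
    rng = np.random.default_rng(seed)
    out = np.empty(trials)
    for t in range(trials):
        a = rng.standard_cauchy((m, n))
        # Singular values are returned in descending order; the largest one
        # squared is the largest eigenvalue of A A^T.
        s_max = np.linalg.svd(a, compute_uv=False)[0]
        out[t] = s_max**2 / (m * n) ** 2
    return out


def poisson_max_cdf(x):
    """P(no point above x) for a Poisson process with intensity x^(-3/2) / pi."""
    return np.exp(-2.0 / (np.pi * np.sqrt(x)))


samples = rescaled_top_eigenvalues(50, 50, trials=2000)
for x in (0.25, 1.0, 4.0):
    print(f"x={x}: empirical {np.mean(samples <= x):.3f}, limit {poisson_max_cdf(x):.3f}")
```

Heuristically, the agreement comes from the heaviest single entry: $P(|a| > y) \approx \frac{2}{\pi y}$ for a standard Cauchy variable, so among mn entries the expected number with $a^2 > m^2 n^2 x$ is again about $\frac{2}{\pi\sqrt{x}}$.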
Assessing the contribution of shallow and deep knowledge sources for word sense disambiguation
Corpus-based techniques have proved to be very beneficial in the development of efficient and accurate approaches to word sense disambiguation (WSD), despite the fact that they generally represent relatively shallow knowledge. It has always been thought, however, that WSD could also benefit from deeper knowledge sources. We describe a novel approach to WSD that uses inductive logic programming to learn theories from first-order logic representations, allowing corpus-based evidence to be combined with any kind of background knowledge. This approach has been shown to be effective over several disambiguation tasks using a combination of deep and shallow knowledge sources. It is important to understand the contribution of the various knowledge sources used in such a system. This paper investigates the contribution of nine knowledge sources to the performance of the disambiguation models produced for the SemEval-2007 English lexical sample task. The outcome of this analysis will help future work on WSD concentrate on the most useful knowledge sources.
Optical quenching and recovery of photoconductivity in single-crystal diamond
We study the photocurrent induced by pulsed-light illumination (pulse
duration of several nanoseconds) of single-crystal diamond containing nitrogen
impurities. Applying additional continuous-wave light of the same
wavelength quenches the pulsed photocurrent. Characterizing the optically
quenched photocurrent and its recovery is important for the development of
diamond-based electronics and sensing.
Antimicrobial, mechanical and thermal studies of silver particle-loaded polyurethane
Silver-particle-incorporated polyurethane films were evaluated for antimicrobial activity against two bacteria: Escherichia coli (E. coli) and Staphylococcus aureus (S. aureus). Distributed silver particles sourced from silver nitrate, silver lactate and preformed silver nanoparticles were mixed with polyurethane (PU) and characterized by field emission scanning electron microscopy (FESEM), Fourier transform infrared (FTIR) spectroscopy, X-ray diffraction (XRD) and contact angle measurement. Antibacterial activity against E. coli was confirmed for films loaded with 10% (w/w) AgNO3, 1% and 10% (w/w) Ag lactate, and preformed Ag nanoparticles. All films were active against S. aureus, although PU loaded with Ag nanoparticles had only a minor effect. Zone-of-inhibition measurements indicated that Ag lactate-loaded PU performed better than the other Ag ion-loaded films, most likely because of its porous PU structure. FESEM and FTIR indicated direct interaction of silver with the PU backbone, and XRD patterns confirmed the presence of face-centred cubic silver, characteristic of Ag metal. The Young's modulus, tensile strength and hardness of the silver-containing PU films were not adversely affected and possibly increased marginally with silver incorporation. Dynamic mechanical analysis (DMA) indicated greater thermal stability.
Microscopic mechanism for mechanical polishing of diamond (110) surfaces
Mechanically induced degradation of diamond, as occurs during polishing, is
studied using total-energy pseudopotential calculations. The strong asymmetry
in the rate of polishing between different directions on the diamond (110)
surface is explained in terms of an atomistic mechanism for nano-groove
formation. The post-polishing surface morphology and the nature of the
polishing residue predicted by this mechanism are consistent with experimental
evidence.
Comment: 4 pages, 5 figures
Measures of Model Performance Based On the Log Accuracy Ratio
Quantitative assessment of modeling and forecasting of continuous quantities uses a variety of approaches. We review existing literature describing metrics for forecast accuracy and bias, concentrating on those based on relative errors and percentage errors. Of these accuracy metrics, the mean absolute percentage error (MAPE) is one of the most common across many fields and has been widely applied in recent space science literature; we highlight the benefits and drawbacks of MAPE and of proposed alternatives. We then introduce the log accuracy ratio and derive from it two metrics: the median symmetric accuracy and the symmetric signed percentage bias. Robust methods for estimating the spread of a multiplicative linear model using the log accuracy ratio are also presented. The developed metrics are shown to be easy to interpret and robust, and to mitigate the key drawbacks of their more widely used counterparts based on relative errors and percentage errors. Their use is illustrated with radiation belt electron flux modeling examples.
Peer reviewed
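The metrics named in this abstract can be written in a few lines of plain Python. This sketch assumes the definitions as we understand them from the paper: log accuracy ratio $Q_i = \ln(\text{predicted}_i / \text{observed}_i)$, median symmetric accuracy $100\,(\exp(\operatorname{median}|Q|) - 1)$, and symmetric signed percentage bias $100\,\operatorname{sgn}(\operatorname{median} Q)\,(\exp|\operatorname{median} Q| - 1)$. Check these against the paper before relying on them; the function names are our own.

```python
import math
import statistics


def log_accuracy_ratios(predicted, observed):
    """Q_i = ln(predicted_i / observed_i); all values must be strictly positive."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    return [math.log(p / o) for p, o in zip(predicted, observed)]


def mape(predicted, observed):
    """Mean absolute percentage error, relative to the observations."""
    return 100.0 * statistics.fmean(abs((o - p) / o) for p, o in zip(predicted, observed))


def median_symmetric_accuracy(predicted, observed):
    """100 * (exp(median |Q|) - 1)."""
    q = log_accuracy_ratios(predicted, observed)
    return 100.0 * (math.exp(statistics.median([abs(v) for v in q])) - 1.0)


def symmetric_signed_percentage_bias(predicted, observed):
    """100 * sign(median Q) * (exp(|median Q|) - 1)."""
    m = statistics.median(log_accuracy_ratios(predicted, observed))
    return 100.0 * math.copysign(math.exp(abs(m)) - 1.0, m)


observed = [1.0, 2.0, 4.0]
for factor in (2.0, 0.5):
    predicted = [factor * o for o in observed]
    print(factor, mape(predicted, observed),
          median_symmetric_accuracy(predicted, observed),
          symmetric_signed_percentage_bias(predicted, observed))
```

The loop shows the asymmetry that motivates the log accuracy ratio: over-predicting every observation by a factor of 2 gives a MAPE of 100%, while under-predicting by the same factor gives only 50%. Both cases give a median symmetric accuracy of 100%, and the signed bias is +100% and −100% respectively.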
A New Technique for Finding Needles in Haystacks: A Geometric Approach to Distinguishing Between a New Source and Random Fluctuations
We propose a new test statistic based on a score process for determining the
statistical significance of a putative signal that may be a small perturbation
to a noisy experimental background. We derive the reference distribution for
this score test statistic; it has an elegant geometrical interpretation as well
as broad applicability. We illustrate the technique in the context of a model
problem from high-energy particle physics. Monte Carlo experimental results
confirm that the score test results in a significantly improved rate of signal
detection.
Comment: 5 pages, 4 figures
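The abstract does not spell out its score process, so the following is only a textbook building block, not the authors' statistic: a one-sided score test of $H_0: \mu = 0$ in a binned counting model $n_i \sim \text{Poisson}(b_i + \mu s_i)$ with known background $b_i$ and a known signal shape $s_i$ at a fixed location. Differentiating the log-likelihood at $\mu = 0$ gives the score $U = \sum_i s_i (n_i - b_i)/b_i$ with Fisher information $I = \sum_i s_i^2 / b_i$, and $U/\sqrt{I}$ is approximately standard normal under $H_0$. A score process would presumably evaluate such a quantity over candidate signal locations; the paper's contribution, the reference distribution of its statistic, is not reproduced here. All names are hypothetical.

```python
import math


def poisson_score_z(counts, background, signal_shape):
    """Standardized score U / sqrt(I) for H0: mu = 0 in counts_i ~ Poisson(b_i + mu * s_i)."""
    if not (len(counts) == len(background) == len(signal_shape)):
        raise ValueError("inputs must have equal length")
    # Score: derivative of the Poisson log-likelihood with respect to mu, at mu = 0.
    score = sum(s * (n - b) / b for n, b, s in zip(counts, background, signal_shape))
    # Expected Fisher information at mu = 0.
    info = sum(s * s / b for b, s in zip(background, signal_shape))
    return score / math.sqrt(info)


def upper_tail_p(z):
    """One-sided p-value P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


z = poisson_score_z(counts=[10, 16, 10], background=[10.0, 10.0, 10.0], signal_shape=[0.0, 1.0, 0.0])
print(f"z = {z:.3f}, one-sided p = {upper_tail_p(z):.4f}")
```

In this toy case the background is 10 in each of three bins and the signal sits entirely in the middle bin, so observing 16 counts there gives $z = 6/\sqrt{10} \approx 1.90$.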