Goodness of fit of prediction models and two step prediction
Given a second-order stationary time series, it can be shown that there exists an optimum linear predictor of $X_k$, say $X_k^*$, constructed from $\{X_t,\ t = 0, -1, -2, \ldots\}$, the mean square error of prediction being $e_k = E\left[|X_k - X_k^*|^2\right]$.
In some cases, however, a series can be considered to have started at a point in the past, and an attempt is made to see how well the optimum linear form of the predictor behaves in this case.
Using the fundamental result due to Kolmogorov relating the prediction error $e_1$ to the power spectrum $f(\omega)$,
$$e_1 = \exp\left\{\frac{1}{2\pi}\int_{-\pi}^{\pi}\log 2\pi f(\omega)\, d\omega\right\},$$
estimates of $e_1$ are constructed using the periodogram and power spectrum estimates. As is argued in some detail, $e_1$ is a natural quantity to consider in prediction and estimation problems, and the estimates obtained are non-parametric.
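To make the construction concrete, here is a minimal numerical sketch of a periodogram-based estimate of $e_1$. It assumes one common periodogram normalisation and applies the standard Euler-Mascheroni bias correction for log-periodogram ordinates; the thesis's own estimator may differ in detail.

```python
import numpy as np

def estimate_e1(x):
    """Nonparametric estimate of the one-step prediction error e1 via
    Kolmogorov's formula, replacing f(w) by the raw periodogram at the
    Fourier frequencies and correcting the -gamma bias of the
    log-periodogram."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    x = x - x.mean()
    # Periodogram I(w_j) = |sum_t x_t exp(-i t w_j)|^2 / (2 pi n), j >= 1
    I = np.abs(np.fft.rfft(x)[1:]) ** 2 / (2.0 * np.pi * n)
    gamma = 0.5772156649015329  # Euler-Mascheroni constant
    # The mean of log(2 pi I(w_j)) estimates (1/2pi) * integral of
    # log(2 pi f(w)), up to the -gamma bias of log periodogram ordinates
    return np.exp(np.mean(np.log(2.0 * np.pi * I)) + gamma)

# For white noise of unit variance, f(w) = 1/(2 pi), so e1 should be near 1
rng = np.random.default_rng(0)
print(estimate_e1(rng.standard_normal(4096)))
```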
The characteristic functions of these estimates are obtained, and it is shown that their distributions are asymptotically normal. The rate of convergence to normality is also investigated.
A previous author has used a similar estimate as the basis of a test for white noise; those published results are extended, and in the light of the simulation results obtained, some modifications are suggested.
To increase the value of the estimates of $e_1$, their small-sample distribution is approximated and extensive tables of percentage points are provided. Using these approximations one can construct a more powerful and versatile test for white noise, and simulation results confirm that the theoretical approximations work well.
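A hedged sketch of the kind of white-noise test such an estimate enables: under the null hypothesis the spectrum is flat, so $e_1$ equals the process variance and the gap between the log of the mean periodogram ordinate and the bias-corrected mean log ordinate should be near zero; by Jensen's inequality it is positive for any non-flat spectrum. The asymptotic variance $(\pi^2/6 - 1)/m$ used below is the standard one for this statistic; the small-sample refinements and percentage-point tables described above are not reproduced here.

```python
import numpy as np
from scipy import stats

def white_noise_test(x):
    """Asymptotic test of white noise: compares log(mean periodogram)
    with the bias-corrected mean log periodogram; returns a z-score
    (N(0,1) under H0) and a two-sided p-value."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    x = x - x.mean()
    I = np.abs(np.fft.rfft(x)[1:]) ** 2 / (2.0 * np.pi * n)  # periodogram
    m = len(I)
    gamma = 0.5772156649015329
    T = np.log(I.mean()) - (np.log(I).mean() + gamma)
    z = T / np.sqrt((np.pi ** 2 / 6.0 - 1.0) / m)
    return z, 2.0 * stats.norm.sf(abs(z))

rng = np.random.default_rng(1)
print(white_noise_test(rng.standard_normal(2048)))  # H0 true: large p-value
```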
The same approximation technique is used to derive the small-sample distribution of some new estimates of the coefficients in the model generating $\{X_t\}$; these estimates are also based on the power spectrum. While small-sample theory is shown to be of limited use in this situation, the asymptotic results are interesting and useful.
Several suggestions are made as to further fields of investigation, in both the univariate and multivariate cases.
Background-free detection of trapped ions
We demonstrate a Doppler cooling and detection scheme for ions with low-lying D levels which almost entirely suppresses scattered laser light background, while retaining a high fluorescence signal and efficient cooling. We cool a single ion with a laser on the 2S1/2 to 2P1/2 transition as usual, but repump via the 2P3/2 level. By filtering out light on the cooling transition and detecting only the fluorescence from the 2P3/2 to 2S1/2 decays, we suppress the scattered laser light background count rate to 1 per second while maintaining a signal of 29000 per second with moderate saturation of the cooling transition. This scheme will be particularly useful for experiments where ions are trapped in close proximity to surfaces, such as the trap electrodes in microfabricated ion traps, which leads to high background scatter from the cooling beam.
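For a rough sense of what these rates buy, a Poisson-threshold calculation of state-discrimination errors is sketched below; the 1 ms detection window and 5-count threshold are assumptions for illustration, not values from the paper.

```python
from scipy.stats import poisson

# Rates quoted in the abstract; window and threshold are assumed
signal_rate, background_rate = 29000.0, 1.0      # counts per second
t = 1e-3                                         # detection window: 1 ms (assumed)
mu_bright = (signal_rate + background_rate) * t  # mean counts, ion fluorescing
mu_dark = background_rate * t                    # mean counts, ion dark

threshold = 5  # declare "bright" if counts >= threshold (assumed)
p_dark_as_bright = poisson.sf(threshold - 1, mu_dark)     # false positive
p_bright_as_dark = poisson.cdf(threshold - 1, mu_bright)  # false negative
print(f"error probabilities: {p_dark_as_bright:.2e}, {p_bright_as_dark:.2e}")
```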
Heating rate and electrode charging measurements in a scalable, microfabricated, surface-electrode ion trap
We characterise the performance of a surface-electrode ion "chip" trap fabricated using established semiconductor integrated circuit and micro-electro-mechanical-system (MEMS) microfabrication processes, which are in principle scalable to much larger ion trap arrays, as proposed for implementing ion trap quantum information processing. We measure rf ion micromotion parallel and perpendicular to the plane of the trap electrodes, and find that on-package capacitors reduce this to <~ 10 nm in amplitude. We also measure ion trapping lifetime, charging effects due to laser light incident on the trap electrodes, and the heating rate for a single trapped ion. The performance of this trap is found to be comparable with others of the same size scale.
Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches
We investigate the accuracy of different similarity approaches for clustering over two million biomedical documents. Clustering large sets of text documents is important for a variety of information needs and applications, such as collection management and navigation, summary and analysis. The few comparisons of clustering results from different similarity approaches have focused on small literature sets and have given conflicting results. Our study was designed to seek a robust answer to the question of which similarity approach would generate the most coherent clusters of a biomedical literature set of over two million documents.

We used a corpus of 2.15 million recent (2004-2008) records from MEDLINE and generated nine different document-document similarity matrices from information extracted from their bibliographic records, including titles, abstracts and subject headings. The nine approaches comprised five different analytical techniques applied to two data sources. The five analytical techniques were cosine similarity using term frequency-inverse document frequency vectors (tf-idf cosine), latent semantic analysis (LSA), topic modeling, and two Poisson-based language models, BM25 and PMRA (PubMed Related Articles). The two data sources were (a) MeSH subject headings and (b) words from titles and abstracts. Each similarity matrix was filtered to keep the top-n highest similarities per document and then clustered using a combination of graph layout and average-link clustering. Cluster results from the nine similarity approaches were compared using (1) within-cluster textual coherence based on the Jensen-Shannon divergence, and (2) two concentration measures based on grant-to-article linkages indexed in MEDLINE.

PubMed's own related article approach (PMRA) generated the most coherent and most concentrated cluster solution of the nine text-based similarity approaches tested, followed closely by the BM25 approach using titles and abstracts. Approaches using only MeSH subject headings were not competitive with those based on titles and abstracts.
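As a toy illustration of one of the nine approaches (tf-idf cosine on titles and abstracts) together with the top-n filtering step: the corpus, the value of n, and the downstream graph-layout and average-link clustering are stand-ins here, not the study's actual configuration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus standing in for the 2.15 million MEDLINE records
docs = [
    "gene expression profiling in breast cancer",
    "breast cancer gene expression signatures",
    "ion channel dynamics in cardiac myocytes",
]
tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
sim = cosine_similarity(tfidf)  # document-document similarity matrix
np.fill_diagonal(sim, 0.0)      # ignore self-similarity

top_n = 1                       # keep the top-n similarities per document
keep = np.argsort(sim, axis=1)[:, -top_n:]
filtered = np.zeros_like(sim)
rows = np.arange(sim.shape[0])[:, None]
filtered[rows, keep] = sim[rows, keep]
print(filtered)  # would feed graph layout + average-link clustering
```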
On allometric equations for predicting body mass of dinosaurs
Packard and colleagues investigate the prediction of the body mass of dinosaurs using allometric models, advocating parameter estimation via direct optimization of a least-squares criterion on arithmetic axes rather than the conventional approach based on linear least-squares regression on logarithmic axes. We examine the statistical assumptions underpinning each approach and find the method of Packard to be conceptually unsatisfactory, as it assumes absolute rather than relative variability in body mass for a given long-bone circumference, which is biologically implausible. Their proposed approach is thus unduly sensitive to small relative errors for large mammals; as the largest (the elephant) is comparatively light for its long-bone circumference, the resulting model grossly overestimates the body mass of small mammals and is likely to substantially underestimate the body mass of dinosaurs. It is also important to note, however, that the error bars for the conventional model already indicate substantial uncertainty in body mass, such that, for example, the body mass of Apatosaurus louisae may be as high as 63 metric tonnes or as low as 23 metric tonnes, with a modal value of 38 metric tonnes.
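A brief sketch contrasting the two estimation strategies on synthetic data with multiplicative (relative) error, the error structure the abstract argues is biologically plausible; the constants, sample size and noise level are illustrative only.

```python
import numpy as np
from scipy.optimize import curve_fit

# Synthetic data for the allometric model M = a * C**b with
# multiplicative (lognormal) scatter in body mass
rng = np.random.default_rng(0)
C = np.linspace(50.0, 800.0, 40)                            # bone circumference, mm
M = 1e-4 * C ** 2.7 * np.exp(rng.normal(0.0, 0.2, C.size))  # body mass, kg

# Conventional approach: linear least squares on logarithmic axes
# (assumes relative scatter in mass for a given circumference)
b_log, log_a = np.polyfit(np.log(C), np.log(M), 1)

# Packard-style approach: direct least squares on arithmetic axes
# (assumes absolute scatter, so the heaviest animals dominate the fit)
(a_arith, b_arith), _ = curve_fit(lambda c, a, b: a * c ** b, C, M,
                                  p0=(np.exp(log_a), b_log))

print(f"log-log fit:    a={np.exp(log_a):.2e}, b={b_log:.3f}")
print(f"arithmetic fit: a={a_arith:.2e}, b={b_arith:.3f}")
```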