Model selection in High-Dimensions: A Quadratic-risk based approach
In this article we propose a general class of risk measures which can be used
for data based evaluation of parametric models. The loss function is defined as
generalized quadratic distance between the true density and the proposed model.
These distances are characterized by a simple quadratic form structure that is
adaptable through the choice of a nonnegative definite kernel and a bandwidth
parameter. Using asymptotic results for the quadratic distances we build a
quick-to-compute approximation for the risk function. Its derivation is
analogous to the Akaike Information Criterion (AIC), but unlike AIC, the
quadratic risk is a global comparison tool. The method does not require
resampling, a great advantage when point estimators are expensive to compute.
The method is illustrated using the problem of selecting the number of
components in a mixture model, where it is shown that, by using an appropriate
kernel, the method is computationally straightforward in arbitrarily high data
dimensions. In this same context it is shown that the method has some clear
advantages over AIC and BIC. Comment: Updated with reviewer suggestion
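The AIC/BIC comparison the abstract alludes to can be made concrete. The following Python sketch scores candidate mixture orders with both criteria; the log-likelihood values are hypothetical, not data from the paper. Note how BIC's heavier penalty selects a smaller model:

```python
import math

def aic(loglik, n_params):
    # Akaike Information Criterion: 2p - 2 log L.
    return 2 * n_params - 2 * loglik

def bic(loglik, n_params, n_obs):
    # Bayesian Information Criterion: p log(n) - 2 log L.
    return n_params * math.log(n_obs) - 2 * loglik

def n_mixture_params(k, dim):
    # Free parameters of a k-component Gaussian mixture with full
    # covariances: (k-1) weights, k*dim means, and k*dim*(dim+1)/2
    # covariance entries.
    return (k - 1) + k * dim + k * dim * (dim + 1) // 2

# Hypothetical log-likelihoods from fits with k = 1..4 components to
# n = 500 two-dimensional observations (illustrative numbers only).
logliks = {1: -2150.0, 2: -1980.0, 3: -1972.0, 4: -1969.5}
n, dim = 500, 2

best_aic = min(logliks, key=lambda k: aic(logliks[k], n_mixture_params(k, dim)))
best_bic = min(logliks, key=lambda k: bic(logliks[k], n_mixture_params(k, dim), n))
```

With these numbers AIC picks three components while BIC picks two, illustrating why a global comparison tool with a different penalty structure can disagree with both.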
Prediction and Generation of Binary Markov Processes: Can a Finite-State Fox Catch a Markov Mouse?
Understanding the generative mechanism of a natural system is a vital
component of the scientific method. Here, we investigate one of the fundamental
steps toward this goal by presenting the minimal generator of an arbitrary
binary Markov process. This is a class of processes whose predictive model is
well known. Surprisingly, the generative model requires three distinct
topologies for different regions of parameter space. We show that a previously
proposed generator for a particular set of binary Markov processes is, in fact,
not minimal. Our results shed the first quantitative light on the relative
(minimal) costs of prediction and generation. We find, for instance, that the
difference between prediction and generation is maximized when the process is
approximately independent and identically distributed. Comment: 12 pages, 12
figures; http://csc.ucdavis.edu/~cmg/compmech/pubs/gmc.ht
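As a minimal illustration of the binary Markov processes discussed here, the following Python sketch computes the stationary distribution and entropy rate (textbook formulas, not the paper's generator construction) and shows the parameter choice that makes the process i.i.d.:

```python
import math

def stationary_dist(p01, p10):
    # Stationary distribution of a binary Markov chain with
    # P(next=1 | current=0) = p01 and P(next=0 | current=1) = p10.
    pi1 = p01 / (p01 + p10)
    return 1.0 - pi1, pi1

def entropy_rate(p01, p10):
    # Shannon entropy rate in bits per symbol: the stationary average
    # of the per-state transition entropies.
    def h(p):
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    pi0, pi1 = stationary_dist(p01, p10)
    return pi0 * h(p01) + pi1 * h(p10)

# When p01 = 1 - p10 the next symbol's distribution is the same from
# either state, so the process is i.i.d.
pi0, pi1 = stationary_dist(0.3, 0.7)  # i.i.d. case: 0.3 = 1 - 0.7
rate_iid = entropy_rate(0.3, 0.7)
rate_dep = entropy_rate(0.1, 0.2)     # genuinely history-dependent chain
```

In the i.i.d. case the entropy rate reduces to the entropy of a single coin flip, the regime the abstract identifies as maximizing the gap between prediction and generation.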
Dynamic fluctuations in the superconductivity of NbN films from microwave conductivity measurements
We have measured the frequency and temperature dependences of complex ac
conductivity, \sigma(\omega)=\sigma_1(\omega)-i\sigma_2(\omega), of NbN films
in zero magnetic field between 0.1 and 10 GHz using a broadband microwave
technique. In the vicinity of the superconducting critical temperature, Tc, both
\sigma_1(\omega) and \sigma_2(\omega) showed a rapid increase in the low
frequency limit owing to the fluctuation effect of superconductivity. For the
films thinner than 300 nm, the frequency and temperature dependences of the
fluctuation conductivity, \sigma(\omega,T), were successfully scaled onto one scaling
function, which was consistent with the Aslamazov and Larkin model for two
dimensional (2D) cases. For thicker films, \sigma(\omega,T) data could not be
scaled, but indicated that the dimensional crossover from three dimensions (3D)
to 2D occurred as the temperature approached Tc from above. This provides a
good reference for the ac fluctuation conductivity of more exotic
superconductors of current interest. Comment: 8 pages, 7 Figures, 1 Table, Accepted for publication in PR
Adaptive density estimation for stationary processes
We propose an algorithm to estimate the common density of a stationary
process. We suppose that the process is either \beta- or \tau-mixing. We
provide a model selection procedure based on a generalization of Mallows' C_p
and we prove oracle inequalities for the selected estimator under a few prior
assumptions on the collection of models and on the mixing coefficients. We
prove that our estimator is adaptive over a class of Besov spaces, namely, we
prove that it achieves the same rates of convergence as in the i.i.d.
framework.
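A Mallows-style penalized contrast can be sketched for the simplest case, regular histograms on [0, 1]; the penalty constant and the step-density example below are illustrative assumptions, not the paper's procedure:

```python
import random

def histogram_criterion(data, n_bins, c=4.0):
    # Penalized least-squares contrast for a regular histogram on [0, 1]:
    # contrast = -D * sum_j phat_j**2, penalty = c * D / n (Mallows-style).
    # The constant c = 4 is an illustrative choice, not from the paper.
    n = len(data)
    counts = [0] * n_bins
    for x in data:
        counts[min(int(x * n_bins), n_bins - 1)] += 1
    contrast = -n_bins * sum((cnt / n) ** 2 for cnt in counts)
    return contrast + c * n_bins / n

rng = random.Random(42)
# Step density on [0, 1]: mass 0.75 uniformly on [0, 0.5), 0.25 on [0.5, 1).
data = [rng.random() * 0.5 if rng.random() < 0.75 else 0.5 + rng.random() * 0.5
        for _ in range(2000)]

# The criterion should favor a coarse partition aligned with the step.
best_bins = min(range(1, 51), key=lambda d: histogram_criterion(data, d))
```

The selected bin count balances fit against the penalty, which is the same bias-variance trade-off the oracle inequalities in the paper quantify for dependent data.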
Analyzing the House Fly's Exploratory Behavior with Autoregression Methods
This paper presents a detailed characterization of the trajectory of a single
housefly given free range of a square cage. The trajectory of the fly was
recorded and transformed into a time series, which was then analyzed using an
autoregressive model; such a model describes a stationary time series as a
linear regression on prior state values plus white noise. The main discovery was
that the fly switched styles of motion from a low dimensional regular pattern
to a higher dimensional disordered pattern. This exploratory behavior is
characterized by anomalous diffusion, irrespective of the presence of
food. Comment: 20 pages, 9 figures, 1 table, full paper
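A minimal version of the autoregressive fitting step can be sketched in Python; the AR(1) estimator and the synthetic series below are illustrative assumptions, not the paper's actual model order or data:

```python
import random

def fit_ar1(x):
    # Least-squares estimate of phi in x_t = phi * x_{t-1} + noise.
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den

# Synthetic stand-in for one coordinate of a trajectory: an AR(1)
# series with phi = 0.8, seeded for reproducibility.
rng = random.Random(0)
x = [0.0]
for _ in range(5000):
    x.append(0.8 * x[-1] + rng.gauss(0, 1))

phi_hat = fit_ar1(x)
```

Comparing fitted coefficients (and model orders) across segments of a real trajectory is one way a switch between low- and high-dimensional motion styles could show up.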
Uncovering predictability in the evolution of the WTI oil futures curve
Accurately forecasting the price of oil, the world's most actively traded
commodity, is of great importance to both academics and practitioners. We
contribute by proposing a functional time series based method to model and
forecast oil futures. Our approach offers a number of theoretical and
practical advantages, including the ability to exploit underlying process
dynamics missed by classical discrete approaches. We evaluate its finite-sample performance
against established benchmarks using a model confidence set test. A realistic
out-of-sample exercise provides strong support for our approach, which resides
in the superior set of models in all instances considered. Comment: 28 pages, 4 figures, to appear in European Financial Management
Large-scale structure of time evolving citation networks
In this paper we examine a number of methods for probing and understanding
the large-scale structure of networks that evolve over time. We focus in
particular on citation networks, networks of references between documents such
as papers, patents, or court cases. We describe three different methods of
analysis, one based on an expectation-maximization algorithm, one based on
modularity optimization, and one based on eigenvector centrality. Using the
network of citations between opinions of the United States Supreme Court as an
example, we demonstrate how each of these methods can reveal significant
structural divisions in the network, and how, ultimately, the combination of
all three can help us develop a coherent overall picture of the network's
shape. Comment: 10 pages, 6 figures; journal names for 4 references fixed
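Of the three methods, eigenvector centrality is the easiest to sketch. The toy graph and power iteration below are illustrative assumptions (the paper's citation networks are directed and far larger):

```python
def eigenvector_centrality(adj, iters=200):
    # Power iteration on A + I (the +x[v] shift avoids oscillation on
    # bipartite graphs); adj maps each node to its neighbor list.
    nodes = sorted(adj)
    x = {v: 1.0 for v in nodes}
    for _ in range(iters):
        x_new = {v: x[v] + sum(x[u] for u in adj[v]) for v in nodes}
        norm = max(x_new.values())
        x = {v: val / norm for v, val in x_new.items()}
    return x

# Toy undirected graph standing in for a small citation structure;
# node "C" is the most connected and should score highest.
graph = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}
scores = eigenvector_centrality(graph)
most_central = max(scores, key=scores.get)
```

On a real citation network the same iteration, suitably adapted for directed acyclic structure, surfaces the most authoritative opinions.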
Detecting periodicity in experimental data using linear modeling techniques
Fourier spectral estimates and, to a lesser extent, the autocorrelation
function are the primary tools to detect periodicities in experimental data in
the physical and biological sciences. We propose a new method that is more
reliable than traditional techniques and can clearly identify periodic
behavior where traditional techniques fail. This technique is based on an
information-theoretic reduction of linear (autoregressive) models
so that only the essential features of an autoregressive model are retained.
We call these models reduced autoregressive models (RARM). The essential
features of reduced autoregressive models include any periodicity present in
the data. We provide theoretical arguments and numerical evidence, from both
experimental and artificial data, to demonstrate that this technique reliably
detects periodicities if and only if they are present in the data. There are strong
information theoretic arguments to support the statement that RARM detects
periodicities if they are present. Surrogate data techniques are used to ensure
the converse. Furthermore, our calculations demonstrate that RARM is more
robust, more accurate, and more sensitive, than traditional spectral
techniques. Comment: 10 pages (revtex) and 6 figures. To appear in Phys Rev E. Modified style
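The lag-selection idea behind reduced autoregressive models can be illustrated with a deliberately simplified single-lag search; the scoring function and test signal below are assumptions, not the RARM procedure itself. An information criterion picks out lags tied to the signal's period (for a pure sinusoid, lags at multiples of the half-period fit equally well):

```python
import math, random

def single_lag_aic(x, k):
    # Fit x_t = a * x_{t-k} by least squares, then score with an AIC
    # based on the Gaussian residual variance (illustrative, not RARM).
    pairs = [(x[t - k], x[t]) for t in range(k, len(x))]
    num = sum(u * v for u, v in pairs)
    den = sum(u * u for u, _ in pairs)
    a = num / den
    n = len(pairs)
    rss = sum((v - a * u) ** 2 for u, v in pairs)
    return n * math.log(rss / n) + 2  # AIC up to an additive constant

rng = random.Random(1)
# Period-10 sinusoid with weak observational noise.
x = [math.sin(2 * math.pi * t / 10) + 0.05 * rng.gauss(0, 1)
     for t in range(400)]

# Lags at multiples of the half-period (5, 10, 15) fit almost perfectly.
best_lag = min(range(1, 16), key=lambda k: single_lag_aic(x, k))
```

The retained lag directly exposes the periodicity, which is the sense in which the essential features of a reduced model include any periodic structure in the data.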
An approximate Bayesian marginal likelihood approach for estimating finite mixtures
Estimation of finite mixture models when the mixing distribution support is
unknown is an important problem. This paper gives a new approach based on a
marginal likelihood for the unknown support. Motivated by a Bayesian Dirichlet
prior model, a computationally efficient stochastic approximation version of
the marginal likelihood is proposed and large-sample theory is presented. By
restricting the support to a finite grid, a simulated annealing method is
employed to maximize the marginal likelihood and estimate the support. Real and
simulated data examples show that this novel stochastic
approximation--simulated annealing procedure compares favorably to existing
methods. Comment: 16 pages, 1 figure, 3 tables
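The annealing step can be sketched generically: simulated annealing over a finite grid, with a stand-in unimodal objective in place of the paper's stochastic marginal-likelihood approximation (the cooling schedule and neighborhood structure are assumptions):

```python
import math, random

def anneal(objective, grid, steps=3000, t0=1.0, seed=0):
    # Simulated annealing over a finite grid: propose a neighboring
    # grid point, always accept uphill moves, accept downhill moves
    # with probability exp(delta / temperature), and track the best
    # point visited. The t0 / step cooling schedule is illustrative.
    rng = random.Random(seed)
    i = rng.randrange(len(grid))
    best = i
    for step in range(1, steps + 1):
        temp = t0 / step
        j = max(0, min(len(grid) - 1, i + rng.choice((-1, 1))))
        delta = objective(grid[j]) - objective(grid[i])
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            i = j
        if objective(grid[i]) > objective(grid[best]):
            best = i
    return grid[best]

# Stand-in objective with a single peak at 0.6; the real method would
# plug in the stochastic marginal-likelihood approximation instead.
grid = [k / 100 for k in range(101)]
peak = anneal(lambda s: -(s - 0.6) ** 2, grid)
```

Restricting the support to a grid is what makes this search tractable: the annealer only ever evaluates the objective at finitely many candidate support points.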