Characterization of the frequency of extreme events by the Generalized Pareto Distribution
Based on recent results in extreme value theory, we use a new technique for
the statistical estimation of distribution tails. Specifically, we use the
Gnedenko-Pickands-Balkema-de Haan theorem, which gives a natural limit law for
peak-over-threshold values in the form of the Generalized Pareto Distribution
(GPD). The GPD is useful in finance, insurance, and hydrology; here we
investigate the earthquake energy distribution described by the
Gutenberg-Richter seismic moment-frequency law and analyze shallow
earthquakes (depth h < 70 km) in the
Harvard catalog over the period 1977-2000 in 18 seismic zones. The GPD is
found to approximate the tails of the seismic moment distributions quite well
for moment magnitudes above mW = 5.3, and no statistically significant
regional difference is found between subduction and transform seismic zones.
We confirm with very high statistical confidence that the b-value in
mid-ocean ridges differs markedly from that in other zones (b = 1.50 ± 0.09
versus b = 1.00 ± 0.05, corresponding to a power-law exponent close to 1
versus 2/3). We
propose a physical mechanism for this, contrasting slow healing ruptures in
mid-ocean ridges with fast healing ruptures in other zones. Deviations from the
GPD at the very end of the tail are detected in the sample containing
earthquakes from all major subduction zones (sample size of 4985 events). We
propose a new statistical test of significance of such deviations based on the
bootstrap method. The number of events deviating from the tail of the GPD in
the studied data sets (at most 15-20) is not sufficient to determine the
functional form of those deviations. Thus, it is practically impossible to
give preference to one of the previously suggested parametric families
describing the ends of the tails of seismic moment distributions.
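The peaks-over-threshold fit described above can be sketched in miniature; the snippet below is an illustrative reconstruction using scipy's `genpareto` on synthetic data, not the authors' actual pipeline or catalog:

```python
import numpy as np
from scipy.stats import genpareto

# Synthetic heavy-tailed sample standing in for seismic moments (illustrative).
rng = np.random.default_rng(42)
sample = genpareto.rvs(c=0.6, scale=1.0, size=20_000, random_state=rng)

# Peaks-over-threshold: keep exceedances above a high quantile.
threshold = np.quantile(sample, 0.95)
excesses = sample[sample > threshold] - threshold

# Fit the GPD to the excesses, pinning the location parameter at zero.
shape, _, scale = genpareto.fit(excesses, floc=0)
print(f"shape={shape:.2f}, scale={scale:.2f}")
```

By the Gnedenko-Pickands-Balkema-de Haan theorem, the excesses over a high threshold are themselves approximately GPD with the same shape parameter, which is what the fit recovers here.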
Computational statistics using the Bayesian Inference Engine
This paper introduces the Bayesian Inference Engine (BIE), a general
parallel, optimised software package for parameter inference and model
selection. This package is motivated by the analysis needs of modern
astronomical surveys and the need to organise and reuse expensive derived data.
The BIE is the first platform for computational statistics designed explicitly
to enable Bayesian update and model comparison for astronomical problems.
Bayesian update is based on the representation of high-dimensional posterior
distributions using metric-ball-tree based kernel density estimation. Among its
algorithmic offerings, the BIE emphasises hybrid tempered MCMC schemes that
robustly sample multimodal posterior distributions in high-dimensional
parameter spaces. Moreover, the BIE implements a full persistence or
serialisation system that stores the full byte-level image of the running
inference and previously characterised posterior distributions for later use.
Two new algorithms to compute the marginal likelihood from the posterior
distribution, developed for and implemented in the BIE, enable model comparison
for complex models and data sets. Finally, the BIE was designed to be a
collaborative platform for applying Bayesian methodology to astronomy. It
includes an extensible object-oriented framework that implements every aspect
of Bayesian inference. By providing a variety of
statistical algorithms for all phases of the inference problem, a scientist may
explore a variety of approaches with a single model and data implementation.
Additional technical details and download details are available from
http://www.astro.umass.edu/bie. The BIE is distributed under the GNU GPL.
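The kernel-density representation of posterior samples that the BIE uses for Bayesian update can be illustrated in spirit with an ordinary Gaussian KDE (scipy assumed here; the BIE's own metric-ball-tree implementation is more scalable):

```python
import numpy as np
from scipy.stats import gaussian_kde

# Mock 2-D "posterior" sample: a correlated Gaussian, for illustration only.
rng = np.random.default_rng(0)
cov = [[1.0, 0.6], [0.6, 1.0]]
posterior_draws = rng.multivariate_normal([0.0, 0.0], cov, size=4000).T

# The KDE acts as a reusable representation of the posterior: it can be
# evaluated later, e.g. as the prior of a subsequent Bayesian update.
kde = gaussian_kde(posterior_draws)
density_at_mode = kde([[0.0], [0.0]])[0]
density_far_out = kde([[5.0], [5.0]])[0]
print(density_at_mode, density_far_out)
```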
Mixed causal-noncausal autoregressive models: bimodality problems in estimation and unit root testing
This paper stresses the bimodality of the widely used Student's t likelihood function applied in modelling Mixed causal-noncausal AutoRegressions (MAR). It first shows that a local maximum is very often found in addition to the global Maximum Likelihood Estimator (MLE), and that standard estimation algorithms can end up in this local maximum. It then shows that the issue becomes more salient as the causal root of the process approaches unity from below. The consequences are important, as the roots estimated at the local maximum are typically interchanged, attributing the noncausal one to the causal component and vice versa, which severely changes the interpretation of the results. The properties of unit root tests based on this Student's t MLE of the backward root are obviously affected as well. To circumvent these issues, this paper proposes an estimation strategy which i) noticeably increases the probability of reaching the global MLE and ii) retains the maximum relevant for the unit root test against a stationary MAR alternative. An application to Brent crude oil prices illustrates the relevance of the proposed approach. Keywords: mixed autoregression, noncausal autoregression, maximum likelihood estimation, unit root test, Brent crude oil price
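The multi-start idea behind such an estimation strategy can be sketched on a toy bimodal objective; the function below stands in for the Student's t MAR likelihood and is not the paper's exact procedure:

```python
import numpy as np
from scipy.optimize import minimize

# Toy bimodal negative log-likelihood: the global minimum is near x = 1,
# a shallower local one near x = -2 (mimicking the global/local MLE pair).
def neg_loglik(v):
    x = v[0]
    return -(np.exp(-(x - 1.0) ** 2) + 0.7 * np.exp(-((x + 2.0) ** 2) / 0.5))

# Multi-start strategy: launch the optimiser from several points and keep
# the best result, raising the chance of reaching the global MLE.
starts = [-3.0, -1.0, 0.0, 2.0]
results = [minimize(neg_loglik, x0=[s]) for s in starts]
best = min(results, key=lambda r: r.fun)
print(best.x[0])
```

A single-start optimiser launched at -3 or -1 converges to the local minimum near -2; only the multi-start sweep reliably finds the global one.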
Statistical mechanics of transcription-factor binding site discovery using Hidden Markov Models
Hidden Markov Models (HMMs) are a commonly used tool for inference of
transcription factor (TF) binding sites from DNA sequence data. We exploit the
mathematical equivalence between HMMs for TF binding and the "inverse"
statistical mechanics of hard rods in a one-dimensional disordered potential to
investigate learning in HMMs. We derive analytic expressions for the Fisher
information, a commonly employed measure of confidence in learned parameters,
in the biologically relevant limit where the density of binding sites is low.
We then use techniques from statistical mechanics to derive a scaling principle
relating the specificity (binding energy) of a TF to the minimum amount of
training data necessary to learn it.
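As a toy numeric companion (not the paper's derivation), the Fisher information for a Bernoulli site-occupancy parameter, a crude stand-in for the low-density limit, can be checked against its closed form:

```python
import numpy as np

# Toy stand-in for the low-density limit: treat site occupancy as a
# Bernoulli(p) variable, whose Fisher information is 1 / (p (1 - p)).
def fisher_information_bernoulli(p):
    # E[(d/dp log L)^2], summed over the two outcomes x in {0, 1}.
    score_sq = {1: (1.0 / p) ** 2, 0: (1.0 / (1.0 - p)) ** 2}
    return p * score_sq[1] + (1.0 - p) * score_sq[0]

p = 0.05  # low binding-site density
numeric = fisher_information_bernoulli(p)
analytic = 1.0 / (p * (1.0 - p))
print(numeric, analytic)
```

Since the standard error of the MLE scales like sqrt(p(1-p)/n), the relative uncertainty in a small density p grows like 1/sqrt(n p), which hints at why rarer (more specific) sites demand more training data.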
Segmentation of Fault Networks Determined from Spatial Clustering of Earthquakes
We present a new method of data clustering applied to earthquake catalogs,
with the goal of reconstructing the seismically active part of fault networks.
We first use an original method to separate clustered events from uncorrelated
seismicity using the distribution of volumes of tetrahedra defined by closest
neighbor events in the original and randomized seismic catalogs. The spatial
disorder of the complex geometry of fault networks is then taken into account
by defining faults as probabilistic anisotropic kernels, whose structures are
motivated by properties of discontinuous tectonic deformation and previous
empirical observations of the geometry of faults and of earthquake clusters at
many spatial and temporal scales. Combining this a priori knowledge with
information theoretical arguments, we propose the Gaussian mixture approach
implemented in an Expectation-Maximization (EM) procedure. A cross-validation
scheme then determines the number of kernels needed to provide an optimal
data clustering of the catalog. This
three-step approach is applied to a high-quality relocated catalog of the
seismicity following the 1986 Mount Lewis event in California, and it reveals
that events cluster along planar patches about 2 km in extent, i.e.
comparable to the size of the main event. The finite thickness of those
clusters (about 290 m) suggests that events do not occur on well-defined
Euclidean fault core surfaces, but rather that the damage zone surrounding
faults may be seismically active at depth. Finally, we propose a connection
between our methodology and multi-scale spatial analysis, based on the
derivation of a spatial fractal dimension of about 1.8 for the set of
hypocenters in the Mount Lewis area, consistent with recent observations on
relocated catalogs.
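The cross-validated choice of the number of Gaussian kernels can be sketched with scikit-learn (assumed here) on a synthetic two-patch catalog; this mirrors the EM-plus-cross-validation step in miniature, not the authors' full anisotropic-kernel implementation:

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

# Synthetic "catalog": events scattered around two elongated patches.
rng = np.random.default_rng(1)
t = rng.uniform(-1, 1, size=(400, 1))
patch_a = np.hstack([t, 0.05 * rng.standard_normal((400, 1))])
patch_b = np.hstack([0.05 * rng.standard_normal((400, 1)), t]) + [3.0, 0.0]
events = np.vstack([patch_a, patch_b])

# Cross-validation: score each number of (anisotropic, full-covariance)
# Gaussian kernels by its held-out log-likelihood, as in the EM step.
train, test = train_test_split(events, test_size=0.3, random_state=0)
heldout = {}
for n in range(1, 5):
    gm = GaussianMixture(n_components=n, n_init=5, random_state=0).fit(train)
    heldout[n] = gm.score(test)  # mean held-out log-likelihood
best_n = max(heldout, key=heldout.get)
print(best_n, heldout)
```

The held-out score jumps sharply once the number of kernels reaches the number of distinct patches, which is the signal the cross-validation scheme exploits.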
On the modelling of speculative prices by stable Paretian distributions and regularly varying tails
Earlier studies which applied the family of stable Paretian distributions to financial data are inconclusive and contradictory. In this article I estimate the parameters of the model by the Feuerverger-McDunnough method, which enables the application of maximum likelihood methods. Based on inferential statistics, stable Paretian distributions can be rejected with monthly data. In order to confirm this result, the model is extended to the family of distributions with regularly varying tails. The result that stable Paretian distributions are not applicable is indeed confirmed by estimating the coefficient of regular variation.
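Estimating a coefficient of regular variation is commonly done with the Hill estimator; the sketch below applies it to synthetic Pareto data (illustrative only, the article's estimation method may differ):

```python
import numpy as np

# Hill estimator of the tail index of regular variation, applied to a
# synthetic Pareto sample whose true index is alpha = 1.5.
rng = np.random.default_rng(7)
alpha_true = 1.5
x = rng.pareto(alpha_true, size=10_000) + 1.0  # classical Pareto, x_min = 1

k = 500  # number of upper order statistics used
order = np.sort(x)[::-1]  # descending order statistics
hill = k / np.sum(np.log(order[:k] / order[k]))
print(hill)
```

The choice of k trades bias against variance; in practice the estimate is examined over a range of k (a "Hill plot") rather than at a single value.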
Gaussian Process applied to modeling the dynamics of a deformable material
In this thesis, we establish the theoretical basis of dimensionality reduction algorithms such as the GPLVM, their application to reproducing a time series of observables with the GPDM, and its generalization with control, the CGPDM. Finally, we introduce a new, more computationally efficient model, the MoCGPDM, which applies a mixture of experts. The final section consists of fine-tuning the model and comparing it to the previous one.
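The core idea of GPDM-style dynamics, learning a transition map with Gaussian process regression, can be sketched without the latent-space step using scikit-learn's `GaussianProcessRegressor` (assumed here; the thesis's models are richer):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Toy 1-D trajectory of a "deformable" system: a damped oscillation.
t = np.linspace(0.0, 6.0 * np.pi, 200)
x = np.exp(-0.05 * t) * np.sin(t)

# GPDM-like dynamics: learn the transition map (x_{t-1}, x_t) -> x_{t+1}.
state = np.column_stack([x[1:-1], x[:-2]])
target = x[2:]
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-6)
gp.fit(state, target)

# One-step-ahead predictions along the training trajectory.
pred = gp.predict(state)
rmse = float(np.sqrt(np.mean((pred - target) ** 2)))
print(f"one-step RMSE: {rmse:.2e}")
```

A damped sinusoid obeys an exact two-lag linear recurrence, so the GP recovers the dynamics almost perfectly; the GPDM additionally learns such dynamics in a low-dimensional latent space inferred by the GPLVM.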