
    Characterization of the frequency of extreme events by the Generalized Pareto Distribution

    Based on recent results in extreme value theory, we use a new technique for the statistical estimation of distribution tails. Specifically, we use the Gnedenko-Pickands-Balkema-de Haan theorem, which gives a natural limit law for peak-over-threshold values in the form of the Generalized Pareto Distribution (GPD). This approach has proved useful in finance, insurance, and hydrology; here we investigate the earthquake energy distribution described by the Gutenberg-Richter seismic moment-frequency law, analyzing shallow earthquakes (depth h < 70 km) in the Harvard catalog over the period 1977-2000 in 18 seismic zones. The GPD is found to approximate the tails of the seismic moment distributions quite well for moment-magnitudes larger than m_W = 5.3, and no statistically significant regional difference is found between subduction and transform seismic zones. We confirm with very high statistical confidence that the b-value is very different in mid-ocean ridges compared to other zones (b = 1.50 ± 0.09 versus b = 1.00 ± 0.05, corresponding to a power-law exponent close to 1 versus 2/3). We propose a physical mechanism for this, contrasting slow healing ruptures in mid-ocean ridges with fast healing ruptures in other zones. Deviations from the GPD at the very end of the tail are detected in the sample containing earthquakes from all major subduction zones (sample size of 4985 events). We propose a new statistical test of the significance of such deviations based on the bootstrap method. The number of events deviating from the tail of the GPD in the studied data sets (15-20 at most) is not sufficient for determining the functional form of those deviations. Thus, it is practically impossible to give preference to one of the previously suggested parametric families describing the ends of tails of seismic moment distributions.
    Comment: pdf document of 21 pages + 2 tables + 20 figures (ps format) + one file giving the regionalization
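    As a rough illustration of the peaks-over-threshold procedure described above, the sketch below fits a GPD to exceedances of a synthetic heavy-tailed sample with scipy. The data and threshold choice are illustrative assumptions, not the paper's catalog or its m_W = 5.3 cutoff.

```python
# Minimal peaks-over-threshold sketch: fit a Generalized Pareto
# Distribution (GPD) to exceedances above a chosen threshold.
# Synthetic heavy-tailed data stands in for seismic moments.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = stats.pareto.rvs(b=1.0, size=5000, random_state=rng)  # toy heavy-tailed data

threshold = np.quantile(sample, 0.95)            # keep the top 5% as "peaks"
excesses = sample[sample > threshold] - threshold

# Fit the GPD to the excesses; floc=0 pins the location at the threshold.
shape, loc, scale = stats.genpareto.fit(excesses, floc=0)
print(f"GPD shape (xi) = {shape:.3f}, scale = {scale:.3f}")

# Tail probability implied by the fit: P(X > x) for x above the threshold.
x = 2 * threshold
p_tail = (excesses.size / sample.size) * stats.genpareto.sf(x - threshold, shape, loc=0, scale=scale)
print(f"P(X > {x:.2f}) ~ {p_tail:.4f}")
```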

    Computational statistics using the Bayesian Inference Engine

    This paper introduces the Bayesian Inference Engine (BIE), a general, parallel, optimised software package for parameter inference and model selection. This package is motivated by the analysis needs of modern astronomical surveys and by the need to organise and reuse expensive derived data. The BIE is the first platform for computational statistics designed explicitly to enable Bayesian update and model comparison for astronomical problems. Bayesian update is based on the representation of high-dimensional posterior distributions using metric-ball-tree based kernel density estimation. Among its algorithmic offerings, the BIE emphasises hybrid tempered MCMC schemes that robustly sample multimodal posterior distributions in high-dimensional parameter spaces. Moreover, the BIE implements a full persistence, or serialisation, system that stores the full byte-level image of the running inference and of previously characterised posterior distributions for later use. Two new algorithms to compute the marginal likelihood from the posterior distribution, developed for and implemented in the BIE, enable model comparison for complex models and data sets. Finally, the BIE was designed to be a collaborative platform for applying Bayesian methodology to astronomy. It includes an extensible, object-oriented framework that implements every aspect of Bayesian inference. By providing a variety of statistical algorithms for all phases of the inference problem, the BIE lets a scientist explore a variety of approaches with a single model and data implementation. Additional technical details and download instructions are available from http://www.astro.umass.edu/bie. The BIE is distributed under the GNU GPL.
    Comment: Resubmitted version.
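    The BIE's code is not reproduced here, but a toy parallel-tempering (replica exchange) sketch conveys the idea behind tempered MCMC on a multimodal posterior. The temperature ladder, proposal scale, and bimodal target are illustrative assumptions, not the BIE's API.

```python
# Minimal parallel-tempering Metropolis sketch for a bimodal 1-D posterior.
# Hot chains (small beta) see a flattened posterior and cross between modes;
# swaps propagate those moves down to the cold (beta = 1) chain.
import numpy as np

rng = np.random.default_rng(1)

def log_post(x):
    # Two well-separated Gaussian modes at -3 and +3 (unnormalised).
    return np.logaddexp(-0.5 * (x + 3.0) ** 2, -0.5 * (x - 3.0) ** 2)

betas = np.array([1.0, 0.5, 0.25, 0.1])    # inverse-temperature ladder
chains = np.zeros(len(betas))
samples = []

for step in range(20000):
    # Within-chain Metropolis update at each temperature.
    for i, beta in enumerate(betas):
        prop = chains[i] + rng.normal(scale=1.0)
        if np.log(rng.uniform()) < beta * (log_post(prop) - log_post(chains[i])):
            chains[i] = prop
    # Propose a swap between a random pair of neighbouring temperatures.
    j = rng.integers(len(betas) - 1)
    d_beta = betas[j] - betas[j + 1]
    d_logp = log_post(chains[j + 1]) - log_post(chains[j])
    if np.log(rng.uniform()) < d_beta * d_logp:
        chains[j], chains[j + 1] = chains[j + 1], chains[j]
    samples.append(chains[0])               # keep only the cold chain

samples = np.asarray(samples[5000:])        # discard burn-in
print(f"fraction of cold-chain samples in the right mode: {(samples > 0).mean():.2f}")
```

    A plain Metropolis chain started in one mode would almost never visit the other; the swap moves are what make the cold chain mix between modes.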

    Mixed Causal-Noncausal Autoregressive Models: Bimodality Problems for Estimation and Unit Root Testing

    This paper stresses the bimodality of the widely used Student's t likelihood function applied in modelling Mixed causal-noncausal AutoRegressions (MAR). It first shows that a local maximum is very often found in addition to the global Maximum Likelihood Estimator (MLE), and that standard estimation algorithms can end up in this local maximum. It then shows that the issue becomes more salient as the causal root of the process approaches unity from below. The consequences are important, as the roots estimated at the local maximum are typically interchanged, attributing the noncausal root to the causal component and vice versa, which severely changes the interpretation of the results. The properties of unit root tests based on this Student's t MLE of the backward root are obviously affected as well. To circumvent these issues, this paper proposes an estimation strategy which i) noticeably increases the probability of ending up at the global MLE and ii) retains the maximum relevant for the unit root test against a stationary MAR alternative. An application to Brent crude oil prices illustrates the relevance of the proposed approach. Keywords: Mixed autoregression, non-causal autoregression, maximum likelihood estimation, unit root test, Brent crude oil price
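    As a hedged illustration of why multi-start estimation helps with a bimodal likelihood, the sketch below runs a local optimiser from a grid of starting values on a toy bimodal negative log-likelihood and keeps the best optimum. The objective is a stand-in, not the Student's t MAR likelihood or the paper's actual strategy.

```python
# Generic multi-start maximum-likelihood sketch: a single local
# optimisation on a bimodal surface can stall at the wrong mode;
# starting from a grid and keeping the best optimum is a simple guard.
import numpy as np
from scipy import optimize

def neg_log_lik(theta):
    x = theta[0]
    # Two optima: a shallow local one near -2, the global one near +2.
    return -np.logaddexp(-0.5 * (x + 2.0) ** 2,
                         np.log(3.0) - 0.5 * (x - 2.0) ** 2)

starts = np.linspace(-4, 4, 9)              # coarse grid of starting points
fits = [optimize.minimize(neg_log_lik, np.array([s])) for s in starts]
best = min(fits, key=lambda r: r.fun)       # keep the lowest negative log-likelihood

print(f"global optimum near x = {best.x[0]:.3f} (nll = {best.fun:.3f})")
print("distinct optima found:", sorted({round(r.x[0], 2) for r in fits}))
```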

    Statistical mechanics of transcription-factor binding site discovery using Hidden Markov Models

    Hidden Markov Models (HMMs) are a commonly used tool for inference of transcription factor (TF) binding sites from DNA sequence data. We exploit the mathematical equivalence between HMMs for TF binding and the "inverse" statistical mechanics of hard rods in a one-dimensional disordered potential to investigate learning in HMMs. We derive analytic expressions for the Fisher information, a commonly employed measure of confidence in learned parameters, in the biologically relevant limit where the density of binding sites is low. We then use techniques from statistical mechanics to derive a scaling principle relating the specificity (binding energy) of a TF to the minimum amount of training data necessary to learn it.
    Comment: 25 pages, 2 figures, 1 table. V2: typos fixed and new references added.
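    A minimal numerical counterpart to the analytic results above: the sketch simulates a toy two-state HMM (background vs. bound site, binary emissions) and estimates the Fisher information of one emission parameter from the curvature of the forward-algorithm log-likelihood. All parameters are illustrative assumptions, not the paper's model.

```python
# Numerical Fisher information for one emission parameter of a toy
# two-state HMM with rare "bound" states (low binding-site density).
# Fisher info ~ E[-d^2 log L / d p^2], approximated by a finite
# difference averaged over simulated sequences.
import numpy as np

rng = np.random.default_rng(2)
A = np.array([[0.99, 0.01],        # sparse sites: rare background -> bound transitions
              [0.20, 0.80]])
p_true = np.array([0.5, 0.9])      # P(emit 1 | state); p[1] is the parameter of interest

def simulate(T):
    s, obs = 0, np.empty(T, dtype=int)
    for t in range(T):
        obs[t] = rng.random() < p_true[s]
        s = rng.choice(2, p=A[s])
    return obs

def log_lik(obs, p1):
    p = np.array([p_true[0], p1])
    alpha = np.array([0.5, 0.5])    # flat initial state distribution
    ll = 0.0
    for o in obs:                   # forward algorithm with per-step rescaling
        e = np.where(o == 1, p, 1.0 - p)
        alpha = e * (alpha @ A)
        c = alpha.sum()
        ll += np.log(c)
        alpha /= c
    return ll

h, T, n_rep = 1e-3, 2000, 20
curv = []
for _ in range(n_rep):
    obs = simulate(T)
    d2 = (log_lik(obs, 0.9 + h) - 2 * log_lik(obs, 0.9) + log_lik(obs, 0.9 - h)) / h**2
    curv.append(-d2)

print(f"Fisher information per sequence ~ {np.mean(curv):.1f}")
```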

    Segmentation of Fault Networks Determined from Spatial Clustering of Earthquakes

    We present a new method of data clustering applied to earthquake catalogs, with the goal of reconstructing the seismically active part of fault networks. We first use an original method to separate clustered events from uncorrelated seismicity, using the distribution of volumes of tetrahedra defined by closest-neighbor events in the original and randomized seismic catalogs. The spatial disorder of the complex geometry of fault networks is then taken into account by defining faults as probabilistic anisotropic kernels, whose structures are motivated by properties of discontinuous tectonic deformation and by previous empirical observations of the geometry of faults and of earthquake clusters at many spatial and temporal scales. Combining this a priori knowledge with information-theoretical arguments, we propose the Gaussian mixture approach implemented in an Expectation-Maximization (EM) procedure. A cross-validation scheme is then used to determine the number of kernels that provides an optimal data clustering of the catalog. This three-step approach is applied to a high-quality relocated catalog of the seismicity following the 1986 Mount Lewis (M_l = 5.7) event in California and reveals that events cluster along planar patches of about 2 km^2, i.e. comparable to the size of the main event. The finite thickness of those clusters (about 290 m) suggests that events do not occur on well-defined Euclidean fault core surfaces, but rather that the damage zone surrounding faults may be seismically active at depth. Finally, we propose a connection between our methodology and multi-scale spatial analysis, based on the derivation of a spatial fractal dimension of about 1.8 for the set of hypocenters in the Mount Lewis area, consistent with recent observations on relocated catalogs.
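    A minimal sketch of the mixture-fitting and cross-validation steps, assuming synthetic planar clusters in place of the relocated catalog and sklearn's GaussianMixture in place of the paper's anisotropic kernels:

```python
# Fit Gaussian mixtures with an increasing number of kernels to 3-D
# "hypocentre" coordinates and pick the number by held-out log-likelihood,
# a simple stand-in for the paper's cross-validation scheme.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
# Two synthetic, nearly planar clusters (km coordinates; thin z extent).
plane1 = rng.normal([0, 0, 5], [1.0, 1.0, 0.15], size=(500, 3))
plane2 = rng.normal([4, 2, 7], [1.2, 0.8, 0.15], size=(500, 3))
xyz = np.vstack([plane1, plane2])

train, test = train_test_split(xyz, test_size=0.3, random_state=0)

best_k, best_score = None, -np.inf
for k in range(1, 7):
    gm = GaussianMixture(n_components=k, covariance_type="full",
                         n_init=5, random_state=0).fit(train)
    score = gm.score(test)                # mean held-out log-likelihood per event
    if score > best_score:
        best_k, best_score = k, score

print(f"selected {best_k} kernels (held-out log-lik per event = {best_score:.2f})")
```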

    On the modelling of speculative prices by stable Paretian distributions and regularly varying tails

    Earlier studies that applied the family of stable Paretian distributions to financial data are inconclusive and contradictory. In this article I estimate the parameters of the model by the Feuerverger-McDunnough method, which enables the application of maximum likelihood methods. Based on inferential statistics, stable Paretian distributions can be rejected for monthly data. In order to confirm this result, the model is extended to the family of distributions with regularly varying tails. The result that stable Paretian distributions are not applicable is indeed confirmed by estimating the coefficient of regular variation.
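    As a sketch of estimating the coefficient of regular variation, the Hill estimator below is applied to synthetic Student's t "returns". The data and the choices of k are illustrative assumptions; the article's Feuerverger-McDunnough characteristic-function fit is not reproduced here. A tail index well above 2 is incompatible with stable Paretian distributions, whose index is below 2.

```python
# Hill estimator of the tail index (coefficient of regular variation)
# from the k largest order statistics of |returns|.
import numpy as np

rng = np.random.default_rng(4)
returns = rng.standard_t(df=4, size=5000)   # toy returns, true tail index = 4

def hill(data, k):
    """Hill estimate of the tail index from the k largest absolute values."""
    x = np.sort(np.abs(data))[::-1]          # descending order statistics
    logs = np.log(x[:k]) - np.log(x[k])      # log-spacings above the k-th value
    return 1.0 / logs.mean()

for k in (50, 100, 200):
    print(f"k = {k:4d}: tail index ~ {hill(returns, k):.2f}")
```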

    Gaussian Process applied to modeling the dynamics of a deformable material

    In this thesis, we establish the theoretical basis of dimensionality reduction algorithms such as the GPLVM and their application to the reproduction of a time series of observable data with the GPDM and its generalization with control, the CGPDM. Finally, we introduce a new, more computationally efficient model, the MoCGPDM, which applies a mixture of experts. The final section fine-tunes this model and compares it to the previous one.
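    A minimal sketch of the Gaussian-process building block underlying GPLVM/GPDM-style models: one-step GP regression of toy 1-D dynamics with an RBF kernel. Everything here is an illustrative assumption, not the thesis's MoCGPDM.

```python
# GP posterior mean prediction of a one-step dynamics map x_{t+1} = f(x_t)
# from noisy training pairs, using an RBF kernel.
import numpy as np

def rbf(a, b, ell=1.0, var=1.0):
    d2 = (a[:, None] - b[None, :]) ** 2
    return var * np.exp(-0.5 * d2 / ell**2)

rng = np.random.default_rng(5)
x = rng.uniform(-2, 2, size=40)             # current states x_t
y = np.sin(x) + 0.05 * rng.normal(size=40)  # noisy next states f(x_t) + noise

K = rbf(x, x) + 0.05**2 * np.eye(len(x))    # kernel matrix + noise variance
alpha = np.linalg.solve(K, y)

x_star = np.linspace(-2, 2, 5)
mean = rbf(x_star, x) @ alpha               # GP posterior mean of f at x_star
for xs, m in zip(x_star, mean):
    print(f"f({xs:+.2f}) ~ {m:+.3f}  (true {np.sin(xs):+.3f})")
```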