    The minimum description length principle for probability density estimation by regular histograms

    The minimum description length principle is a general methodology for statistical modeling and inference that selects the best explanation of observed data as the one allowing the shortest description of them. Application of this principle to the important task of probability density estimation by histograms was previously proposed. We review this approach and provide additional illustrative examples and an application to real-world data, with a presentation emphasizing intuition and concrete arguments. We also consider alternative ways of measuring the description lengths, which may be better suited in this context. We explicitly exhibit, analyze, and compare the complete forms of the description lengths, with formulas involving the information entropy and redundancy of the data that are not given elsewhere. Histogram estimation as performed here naturally extends to multidimensional data, for which it offers flexible and optimal subquantization schemes. The framework can be very useful for modeling and reducing the complexity of observed data, based on a general principle from statistical information theory and placed within a unifying informational perspective.
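    The bin selection described above can be sketched numerically. The following is a minimal sketch, not the paper's exact codelength: it scores each candidate number of regular bins by a crude two-part MDL criterion (negative log-likelihood of the histogram density plus a (K/2)·log₂ n parameter cost) and keeps the shortest description. The function name `mdl_histogram_bins` and the exact penalty form are illustrative assumptions.

```python
import math

def mdl_histogram_bins(data, max_bins=50):
    """Pick the number of regular histogram bins by a simplified
    two-part MDL score (an illustrative stand-in, not the paper's
    exact codelength): negative log-likelihood of the histogram
    density estimate plus a (K/2) * log2(n) parameter cost."""
    n = len(data)
    lo, hi = min(data), max(data)
    best_k, best_len = 1, float("inf")
    for k in range(1, max_bins + 1):
        width = (hi - lo) / k
        counts = [0] * k
        for x in data:
            # clip the right edge into the last bin
            j = min(int((x - lo) / width), k - 1)
            counts[j] += 1
        # data codelength: -sum over bins of n_j * log2(density_j)
        data_len = 0.0
        for c in counts:
            if c > 0:
                data_len -= c * math.log2(c / (n * width))
        model_len = 0.5 * k * math.log2(n)  # parameter cost
        total = data_len + model_len
        if total < best_len:
            best_k, best_len = k, total
    return best_k
```

On strongly clustered data this score favors several bins, while on featureless data the parameter cost pushes the choice back toward a single bin.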

    Unsupervised Discretization by Two-dimensional MDL-based Histogram

    Unsupervised discretization is a crucial step in many knowledge discovery tasks. The state-of-the-art method for one-dimensional data infers locally adaptive histograms using the minimum description length (MDL) principle, but the multi-dimensional case is far less studied: current methods consider the dimensions one at a time (if not independently), which results in discretizations based on rectangular cells of adaptive size. Unfortunately, this approach is unable to adequately characterize dependencies among dimensions and/or results in discretizations consisting of more cells (or bins) than is desirable. To address this problem, we propose an expressive model class that allows for far more flexible partitions of two-dimensional data. We extend the state of the art for the one-dimensional case to obtain a model selection problem based on the normalised maximum likelihood, a form of refined MDL. As the flexibility of our model class comes at the cost of a vast search space, we introduce a heuristic algorithm, named PALM, which partitions each dimension alternately and then merges neighbouring regions, all using the MDL principle. Experiments on synthetic data show that PALM 1) accurately reveals ground-truth partitions that are within the model class (i.e., the search space), given a large enough sample size; 2) approximates well a wide range of partitions outside the model class; 3) converges, in contrast to its closest competitor IPD; and 4) is self-adaptive with regard to both sample size and the local density structure of the data, despite being parameter-free. Finally, we apply our algorithm to two geographic datasets to demonstrate its real-world potential.
    Comment: 30 pages, 9 figures
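    PALM's merge step can be illustrated in a toy one-dimensional setting. The sketch below greedily merges neighbouring histogram bins whenever doing so shortens a crude two-part codelength; the actual algorithm works in 2D with normalised-maximum-likelihood codelengths, so the score and the name `merge_bins_mdl` are stand-in assumptions.

```python
import math

def merge_bins_mdl(counts, widths):
    """Greedily merge neighbouring histogram bins while an MDL-like
    score improves: a toy 1D analogue of PALM's merge step (the paper
    uses NML codelengths in 2D; a crude two-part score stands in)."""
    n = sum(counts)

    def score(cs, ws):
        # parameter cost + data codelength of the piecewise density
        length = 0.5 * len(cs) * math.log2(n)
        for c, w in zip(cs, ws):
            if c > 0:
                length -= c * math.log2(c / (n * w))
        return length

    improved = True
    while improved and len(counts) > 1:
        improved = False
        best = score(counts, widths)
        for i in range(len(counts) - 1):
            # candidate: merge bins i and i+1
            cs = counts[:i] + [counts[i] + counts[i + 1]] + counts[i + 2:]
            ws = widths[:i] + [widths[i] + widths[i + 1]] + widths[i + 2:]
            if score(cs, ws) < best:
                counts, widths, improved = cs, ws, True
                break
    return counts, widths
```

On a locally uniform region the data codelength is unchanged by merging, so the parameter cost drives neighbouring bins to coalesce, which is the intuition behind the merge phase.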

    The order of the metal to superconductor transition

    We present results from large-scale Monte Carlo simulations of the full Ginzburg-Landau (GL) model, including fluctuations in the amplitude and the phase of the matter field, as well as fluctuations of the non-compact gauge field of the theory. From this we obtain a precise critical value of the GL parameter κ_c separating a first-order metal-to-superconductor transition from a second-order one, κ_c = (0.76 ± 0.04)/√2. This agrees surprisingly well with earlier analytical results based on a disorder theory of the superconductor-to-metal transition, where the value κ_c = 0.798/√2 was obtained. To achieve this, we have performed careful infinite-volume and continuum-limit extrapolations. In addition, we offer a novel interpretation of κ_c, namely that it is also the value separating type-I and type-II behaviour.
    Comment: Minor corrections, present version accepted for publication in PR

    A theoretical framework for trading experiments

    A general framework is suggested to describe human decision making in a certain class of experiments performed in a trading laboratory. We are in particular interested in discerning between two different moods, or states, of the investors, corresponding to investors using fundamental investment strategies and technical-analysis investment strategies, respectively. Our framework accounts for two opposite situations already encountered in experimental setups: i) the rational-expectations case, and ii) the case of pure speculation. We consider new experimental conditions which allow both elements to be present in the decision-making process of the traders, thereby creating a dilemma in terms of investment strategy. Our theoretical framework allows us to predict the outcome of this type of trading experiment, depending on such variables as the number of people trading, the liquidity of the market, the amount of information used in technical-analysis strategies, and the dividends attributed to an asset. We find that it is possible to give a qualitative prediction of trading behavior depending on a ratio that quantifies the fluctuations in the model.

    Symbolic Time Series Analysis in Economics

    In this paper I describe and apply the methods of Symbolic Time Series Analysis (STSA) to an experimental framework. The idea behind Symbolic Time Series Analysis is simple: the values of a given time series are transformed into a finite set of symbols, yielding a finite string. We can then process the symbolic sequence using tools from information theory and symbolic dynamics. I discuss data symbolization as a tool for identifying temporal patterns in experimental data and use symbol-sequence statistics in a modeling strategy. To explain these applications, I describe methods to select the symbolization of the data (Section 2), and I introduce the symbolic sequence histograms and some tools to characterize and compare these histograms (Section 3). I show that the methods of symbolic time series analysis can be a good tool to describe and recognize time patterns in complex dynamical processes and to extract dynamical information about such systems. In particular, the method gives us a language in which to express and analyze these time patterns. In Section 4 I report some applications of STSA to study the evolution of different economies. In these applications data symbolization is based on economic criteria using the notion of economic regime introduced earlier in this thesis. I use STSA methods to describe the dynamical behavior of these economies and to do comparative analysis of their regime dynamics. In Section 5 I use STSA to reconstruct a model of a dynamical system from measured time series data. In particular, I show how the observed symbol-sequence statistics can be used as a target for measuring the goodness of fit of proposed models.
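    The symbolization-and-histogram pipeline described above is easy to sketch. In the minimal version below, a series is mapped to a binary string by a simple threshold rule (the paper symbolizes on economic-regime criteria instead; a median split is a generic stand-in), overlapping words are counted into a symbol-sequence histogram, and the histogram is summarized by its Shannon entropy. All function names are illustrative.

```python
import math
from collections import Counter

def symbolize(series, threshold=None):
    """Binary symbolization: '1' if the value exceeds the threshold
    (default: median), else '0'. A generic stand-in for the paper's
    economic-regime-based symbolization."""
    if threshold is None:
        s = sorted(series)
        threshold = s[len(s) // 2]
    return "".join("1" if x > threshold else "0" for x in series)

def word_histogram(symbols, word_len=3):
    """Symbol-sequence histogram: counts of overlapping words of
    length word_len in the symbol string."""
    return Counter(symbols[i:i + word_len]
                   for i in range(len(symbols) - word_len + 1))

def shannon_entropy(hist):
    """Shannon entropy (in bits) of the word distribution."""
    total = sum(hist.values())
    return -sum((c / total) * math.log2(c / total)
                for c in hist.values())
```

Comparing the word histograms of two series (or of a model run against data) is then a direct way to compare their regime dynamics, in the spirit of the goodness-of-fit use described in Section 5.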

    MDL Denoising Revisited

    We refine and extend an earlier MDL criterion for wavelet-based denoising. We start by showing that the denoising problem can be reformulated as a clustering problem, where the goal is to obtain separate clusters for informative and non-informative wavelet coefficients, respectively. This suggests two refinements: adding a code length for the model index, and extending the model to account for subband-dependent coefficient distributions. A third refinement is the derivation of a soft-thresholding rule inspired by predictive universal coding with weighted mixtures. We propose a practical method incorporating all three refinements, which is shown to achieve good performance and robustness in denoising both artificial and natural signals.
    Comment: Submitted to IEEE Transactions on Information Theory, June 200
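    Soft thresholding itself is a standard shrinkage operation, sketched below. The paper derives the threshold and mixture weights from MDL considerations, which are not reproduced here, so the fixed threshold `t` is an assumption of this sketch.

```python
def soft_threshold(coeffs, t):
    """Soft thresholding of wavelet coefficients: shrink each
    coefficient toward zero by t, zeroing those with |c| <= t.
    Coefficients surviving the threshold are the 'informative'
    cluster; the rest are treated as noise."""
    return [max(abs(c) - t, 0.0) * (1 if c >= 0 else -1)
            for c in coeffs]
```

In a full denoising pipeline this step sits between a forward wavelet transform and its inverse; only the choice of `t` (here fixed, in the paper MDL-derived) distinguishes the methods.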

    Detecting Current Noise with a Josephson Junction in the Macroscopic Quantum Tunneling Regime

    We discuss the use of a hysteretic Josephson junction to detect current fluctuations with frequencies below the plasma frequency of the junction. These adiabatic fluctuations are probed by switching measurements that observe the noise-affected average rate of macroscopic quantum tunneling of the detector junction out of its zero-voltage state. In a proposed experimental scheme, the frequencies of the noise are limited by an on-chip filtering circuit. The third cumulant of the current fluctuations at the detector is related to an asymmetry of the switching rates.
    Comment: 26 pages, 10 figures. To appear in Journal of Low Temperature Physics in the proceedings of the ULTI conference organized in Lammi, Finland (2006)

    The Electroweak Phase Transition: A Non-Perturbative Analysis

    We study on the lattice the 3d SU(2)+Higgs model, which is an effective theory of a large class of 4d high-temperature gauge theories. Using the exact constant-physics curve, continuum-limit (V → ∞, a → 0) results for the properties of the phase transition (critical temperature, latent heat, interface tension) are given. The 3-loop correction to the effective potential of the scalar field is determined. The masses of scalar and vector excitations are determined and found to be larger in the symmetric than in the broken phase. The vector mass is considerably larger than the scalar one, which suggests a further simplification to a scalar effective theory at large Higgs masses. The use of consistent 1-loop relations between 3d parameters and 4d physics permits one to convert the 3d simulation results into quantitatively accurate numbers for different physical theories, such as the Standard Model -- excluding possible nonperturbative effects of the U(1) subgroup -- for Higgs masses up to about 70 GeV. The applications of our results to cosmology are discussed.
    Comment: 69 pages, 48 figures as uuencoded compressed postscript