7,836 research outputs found

    The IBMAP approach for Markov networks structure learning

    In this work we consider the problem of learning the structure of Markov networks from data. We present an approach for tackling this problem, called IBMAP, together with an efficient instantiation of it: the IBMAP-HC algorithm, designed to avoid important limitations of existing independence-based algorithms. These algorithms proceed by performing statistical independence tests on data and trusting the outcome of each test completely. In practice, tests may be incorrect, resulting in potential cascading errors and a consequent reduction in the quality of the learned structures. IBMAP accounts for this uncertainty in the outcome of the tests through a probabilistic maximum-a-posteriori approach. The approach is instantiated in the IBMAP-HC algorithm, a structure selection strategy that performs a polynomial-time heuristic local search in the space of possible structures. We present an extensive empirical evaluation on synthetic and real data, showing that our algorithm significantly outperforms current independence-based algorithms in terms of data efficiency and quality of the learned structures, with equivalent computational complexity. We also show the performance of IBMAP-HC in a real-world knowledge-discovery application: EDAs, evolutionary algorithms that use structure learning at each generation to model the distribution of the population. The experiments show that when IBMAP-HC is used to learn the structure, EDAs improve their convergence to the optimum.
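
    To illustrate the general flavour only (this is not the authors' IBMAP-HC implementation), the sketch below scores a candidate structure by summing log-posteriors of the independence and dependence assertions it implies, instead of trusting hard test outcomes, and hill-climbs by toggling one edge at a time. The conditioning sets, the edge-toggle neighbourhood, and the p_independent callback are simplifying assumptions introduced here.

        import itertools
        import math
        import random

        def structure_score(edges, n_vars, p_independent):
            # Sum of log-posteriors of the (in)dependence assertions a structure
            # implies: non-adjacent pairs are asserted conditionally independent,
            # adjacent pairs dependent.  p_independent(i, j, cond) should return an
            # estimate of P(i independent of j given cond | data) rather than a hard
            # 0/1 test outcome; the conditioning set used here is a simplification.
            adj = {v: set() for v in range(n_vars)}
            for a, b in edges:
                adj[a].add(b)
                adj[b].add(a)
            score = 0.0
            for i, j in itertools.combinations(range(n_vars), 2):
                cond = frozenset(adj[i] - {j})
                p = min(max(p_independent(i, j, cond), 1e-12), 1.0 - 1e-12)
                score += math.log(p) if j not in adj[i] else math.log(1.0 - p)
            return score

        def hill_climb(n_vars, p_independent, max_passes=50, seed=0):
            # Greedy local search: toggle one edge at a time, keep any change that
            # improves the score, stop when a full pass brings no improvement.
            rng = random.Random(seed)
            edges, pairs = set(), list(itertools.combinations(range(n_vars), 2))
            best = structure_score(edges, n_vars, p_independent)
            for _ in range(max_passes):
                improved = False
                rng.shuffle(pairs)
                for e in pairs:
                    candidate = edges ^ {e}          # add or remove this edge
                    s = structure_score(candidate, n_vars, p_independent)
                    if s > best:
                        edges, best, improved = candidate, s, True
            if not improved:
                break
            return edges, best

        # Toy usage with made-up posteriors: pairs (0,1) and (2,3) look dependent.
        def toy_p_independent(i, j, cond):
            return 0.05 if {i, j} in ({0, 1}, {2, 3}) else 0.95

        print(hill_climb(4, toy_p_independent))   # recovers edges {(0, 1), (2, 3)}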

    Describing the complexity of systems: multi-variable "set complexity" and the information basis of systems biology

    Context dependence is central to the description of complexity. Keying on the pairwise definition of "set complexity", we use an information-theoretic approach to formulate general measures of systems complexity. We examine the properties of multi-variable dependency, starting with the concept of interaction information. We then present a new measure for unbiased detection of multi-variable dependency, "differential interaction information." For two variables this quantity reduces to the pairwise "set complexity" previously proposed as a context-dependent measure of information in biological systems. We generalize it here to an arbitrary number of variables. Critical limiting properties of the "differential interaction information" are key to the generalization. This measure extends previous ideas about biological information and provides a more sophisticated basis for the study of complexity. The properties of "differential interaction information" also suggest new approaches to data analysis. Given a data set of system measurements, differential interaction information can provide a measure of collective dependence, which can be represented in hypergraphs describing complex system interaction patterns. We investigate this kind of analysis using simulated data sets. The conjoining of a generalized set complexity measure, multi-variable dependency analysis, and hypergraphs is our central result. While our focus is on complex biological systems, our results are applicable to any complex system. Comment: 44 pages, 12 figures; revised after peer review.
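
    For concreteness, the following is a minimal sketch of the classical three-variable interaction information estimated from an empirical joint distribution. It is not the paper's "differential interaction information", and the sign convention used is one of several found in the literature.

        from collections import Counter
        from math import log2
        import random

        def entropy(samples, idx):
            # Shannon entropy (in bits) of the empirical marginal over the
            # variables at positions idx.
            counts = Counter(tuple(s[i] for i in idx) for s in samples)
            n = len(samples)
            return -sum((c / n) * log2(c / n) for c in counts.values())

        def interaction_information(samples):
            # Three-variable interaction information for columns (0, 1, 2) via the
            # inclusion-exclusion form
            #   I(X;Y;Z) = H(X)+H(Y)+H(Z) - H(X,Y)-H(X,Z)-H(Y,Z) + H(X,Y,Z).
            def H(*idx):
                return entropy(samples, idx)
            return (H(0) + H(1) + H(2)
                    - H(0, 1) - H(0, 2) - H(1, 2)
                    + H(0, 1, 2))

        # Toy check: Z = X XOR Y is pairwise independent but strongly three-way
        # dependent, the kind of dependency that purely pairwise measures miss.
        random.seed(1)
        data = [(x, y, x ^ y)
                for x, y in ((random.randint(0, 1), random.randint(0, 1))
                             for _ in range(5000))]
        print(interaction_information(data))   # close to -1 bit in this convention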

    Testing for Collusion in Asymmetric First-Price Auctions

    This paper proposes fully nonparametric tests to detect possible collusion in first-price procurement auctions. The aim is to detect collusion before it is known whether bidders are colluding; the tests therefore do not rely on data from anti-competitive hearings and are in that sense 'ex ante'. We propose a two-step (model selection) procedure. First, we use a reduced-form test of independence and symmetry to shortlist bidders whose bidding behavior is at odds with competitive bidding. Second, the recovered (latent) costs for these bidders must be higher under collusion than under competition, because collusion suppresses competition; detecting collusion therefore reduces to testing whether the estimated cost distribution under collusion first-order stochastically dominates that under competition. We propose rank-based and Kolmogorov-Smirnov (K-S) tests. We implement the tests on highway procurement data from California and conclude that there is no evidence of collusion, even though the reduced-form test supports collusion.
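
    A minimal sketch of the flavour of the second step, assuming recovered cost samples are already in hand: a one-sided two-sample Kolmogorov-Smirnov comparison of the two estimated cost distributions using scipy. The sample data and this particular test call are placeholders, not the paper's estimator or test statistic.

        import numpy as np
        from scipy.stats import ks_2samp

        rng = np.random.default_rng(0)

        # Placeholder "recovered" cost samples; in the paper these would come from
        # inverting equilibrium bidding under the competitive and collusive models.
        costs_competitive = rng.lognormal(mean=1.0, sigma=0.3, size=400)
        costs_collusive = rng.lognormal(mean=1.2, sigma=0.3, size=400)

        # If costs recovered under collusion first-order stochastically dominate
        # those recovered under competition, the collusive CDF lies below the
        # competitive one.  With alternative="less", the null hypothesis is that
        # F_collusive(x) >= F_competitive(x) for all x, so a small p-value is
        # evidence in the direction of that dominance ordering.
        stat, pval = ks_2samp(costs_collusive, costs_competitive, alternative="less")
        print(f"one-sided K-S statistic = {stat:.3f}, p-value = {pval:.4f}")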

    Learning Material-Aware Local Descriptors for 3D Shapes

    Material understanding is critical for design, geometric modeling, and analysis of functional objects. We enable material-aware 3D shape analysis by employing a projective convolutional neural network architecture to learn material-aware descriptors from view-based representations of 3D points, for point-wise material classification or material-aware retrieval. Unfortunately, only a small fraction of shapes in 3D repositories are labeled with physical materials, posing a challenge for learning methods. To address this challenge, we crowdsource a dataset of 3080 3D shapes with part-wise material labels. We focus on furniture models, which exhibit interesting structure and material variability. In addition, we contribute a high-quality expert-labeled benchmark of 115 shapes from Herman Miller and IKEA for evaluation. We further apply a mesh-aware conditional random field, which incorporates rotational and reflective symmetries, to smooth our local material predictions across neighboring surface patches. We demonstrate the effectiveness of our learned descriptors for automatic texturing, material-aware retrieval, and physical simulation. The dataset and code will be publicly available. Comment: 3DV 2018.
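
    As a rough stand-in for the mesh-aware CRF smoothing step (not the paper's model), the sketch below shows the basic idea of propagating per-element material predictions over an adjacency graph by iterated neighbourhood voting; the graph, labels, and material names are toy placeholders.

        from collections import Counter

        def smooth_labels(adjacency, labels, iterations=3):
            # Simple iterative smoothing over a face/point adjacency graph: each
            # element takes the majority material label among itself and its
            # neighbours.  (A crude stand-in for a CRF over the mesh.)
            labels = dict(labels)
            for _ in range(iterations):
                updated = {}
                for node, nbrs in adjacency.items():
                    votes = Counter([labels[node]] + [labels[n] for n in nbrs])
                    updated[node] = votes.most_common(1)[0][0]
                labels = updated
            return labels

        # Toy example: a chain of 6 faces where face 2 got a noisy prediction.
        adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
        noisy = {0: "wood", 1: "wood", 2: "metal", 3: "wood", 4: "wood", 5: "wood"}
        print(smooth_labels(adjacency, noisy))   # face 2 flips back to "wood"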

    Normalized Information Distance

    The normalized information distance is a universal distance measure for objects of all kinds. It is based on Kolmogorov complexity and is thus uncomputable, but there are ways to utilize it. First, compression algorithms can be used to approximate the Kolmogorov complexity if the objects have a string representation. Second, for names and abstract concepts, page count statistics from the World Wide Web can be used. These practical realizations of the normalized information distance can then be applied to machine learning tasks, especially clustering, to perform feature-free and parameter-free data mining. This chapter discusses the theoretical foundations of the normalized information distance and both practical realizations. It presents numerous examples of successful real-world applications based on these distance measures, ranging from bioinformatics to music clustering to machine translation. Comment: 33 pages, 12 figures; in: Information Theory and Statistical Learning, Eds. M. Dehmer, F. Emmert-Streib, Springer-Verlag, New York, to appear.
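
    A minimal sketch of the compression-based realization, assuming zlib as the compressor: the normalized compression distance approximates the (uncomputable) normalized information distance by replacing Kolmogorov complexity with compressed length.

        import zlib

        def C(data):
            # Compressed length as a practical stand-in for the (uncomputable)
            # Kolmogorov complexity K(data).
            return len(zlib.compress(data, 9))

        def ncd(x, y):
            # Normalized compression distance:
            #   NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
            cx, cy, cxy = C(x), C(y), C(x + y)
            return (cxy - min(cx, cy)) / max(cx, cy)

        text_a = b"the quick brown fox jumps over the lazy dog " * 20
        text_b = b"the quick brown fox leaps over the lazy cat " * 20
        noise = bytes(range(256)) * 4
        print(ncd(text_a, text_b))   # small: the texts share most of their structure
        print(ncd(text_a, noise))    # larger: little shared structure

    Any off-the-shelf compressor can stand in for zlib here; stronger compressors generally give approximations closer to the ideal distance, at higher cost.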