612 research outputs found

    Learning Latent Tree Graphical Models

    We study the problem of learning a latent tree graphical model where samples are available only from a subset of variables. We propose two consistent and computationally efficient algorithms for learning minimal latent trees, that is, trees without any redundant hidden nodes. Unlike many existing methods, the observed nodes (or variables) are not constrained to be leaf nodes. Our first algorithm, recursive grouping, builds the latent tree recursively by identifying sibling groups using so-called information distances. One of the main contributions of this work is our second algorithm, which we refer to as CLGrouping. CLGrouping starts with a pre-processing procedure in which a tree over the observed variables is constructed. This global step groups the observed nodes that are likely to be close to each other in the true latent tree, thereby guiding subsequent recursive grouping (or equivalent procedures) on much smaller subsets of variables. This results in more accurate and efficient learning of latent trees. We also present regularized versions of our algorithms that learn latent tree approximations of arbitrary distributions. We compare the proposed algorithms to other methods by performing extensive numerical experiments on various latent tree graphical models such as hidden Markov models and star graphs. In addition, we demonstrate the applicability of our methods on real-world datasets by modeling the dependency structure of monthly stock returns in the S&P index and of the words in the 20 newsgroups dataset.
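The abstract above does not define "information distances". As a minimal sketch, assuming the common Gaussian-case definition d(i, j) = -log|rho(i, j)| (an assumption here, not something the abstract states), the snippet below shows why such a distance is convenient for tree learning: it adds up along paths. The three-variable chain and its correlation values are hypothetical.

```python
import math


def information_distance(rho: float) -> float:
    """Assumed Gaussian-case information distance: d = -log|rho|."""
    if rho == 0 or abs(rho) > 1:
        raise ValueError("correlation must be non-zero and lie in [-1, 1]")
    return -math.log(abs(rho))


# Hypothetical Gaussian Markov chain X1 - X2 - X3.
rho_12 = 0.8
rho_23 = 0.5
# In a Gaussian tree model, correlations multiply along a path.
rho_13 = rho_12 * rho_23

d_12 = information_distance(rho_12)
d_23 = information_distance(rho_23)
d_13 = information_distance(rho_13)

# Multiplicative correlations become additive distances.
is_additive = math.isclose(d_13, d_12 + d_23)
```

Additivity is what lets distances between observed nodes reveal sibling or parent-child relations without observing the hidden nodes directly.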

    Phase transition in the sample complexity of likelihood-based phylogeny inference

    Reconstructing evolutionary trees from molecular sequence data is a fundamental problem in computational biology. Stochastic models of sequence evolution are closely related to spin systems that have been extensively studied in statistical physics, and that connection has led to important insights on the theoretical properties of phylogenetic reconstruction algorithms as well as the development of new inference methods. Here, we study maximum likelihood, a classical statistical technique which is perhaps the most widely used in phylogenetic practice because of its superior empirical accuracy. At the theoretical level, except for its consistency, that is, the guarantee of eventual correct reconstruction as the size of the input data grows, much remains to be understood about the statistical properties of maximum likelihood in this context. In particular, the best bounds on the sample complexity or sequence-length requirement of maximum likelihood, that is, the amount of data required for correct reconstruction, are exponential in the number, n, of tips, far from known lower bounds based on information-theoretic arguments. Here we close the gap by proving a new upper bound on the sequence-length requirement of maximum likelihood that matches up to constants the known lower bound for some standard models of evolution. More specifically, for the r-state symmetric model of sequence evolution on a binary phylogeny with bounded edge lengths, we show that the sequence-length requirement behaves logarithmically in n when the expected amount of mutation per edge is below what is known as the Kesten-Stigum threshold. In general, the sequence-length requirement is polynomial in n. Our results imply moreover that the maximum likelihood estimator can be computed efficiently on randomly generated data provided sequences are as above. Comment: To appear in Probability Theory and Related Fields

    The economic importance of being educated

    Educating people may not sound terribly urgent during difficult economic times, but when it comes to creating jobs and finding people who have the skills to fill them, nothing is more important than education. The latest issue of Forefront presents a package of articles focused on the compelling returns to education. Education - Economic aspects

    Deep Image Retrieval: A Survey

    In recent years a vast amount of visual content has been generated and shared from various fields, such as social media platforms, medical images, and robotics. This abundance of content creation and sharing has introduced new challenges. In particular, searching databases for similar content, i.e., content-based image retrieval (CBIR), is a long-established research area, and more efficient and accurate methods are needed for real-time retrieval. Artificial intelligence has made progress in CBIR and has significantly facilitated the process of intelligent search. In this survey we organize and review recent CBIR works that are developed based on deep learning algorithms and techniques, including insights and techniques from recent papers. We identify and present the commonly used benchmarks and evaluation methods in the field. We collect common challenges and propose promising future directions. More specifically, we focus on image retrieval with deep learning and organize the state-of-the-art methods according to the types of deep network structure, deep features, feature enhancement methods, and network fine-tuning strategies. Our survey considers a wide variety of recent methods, aiming to promote a global view of the field of instance-based CBIR. Comment: 20 pages, 11 figures

    Advanced Soil Moisture Predictive Methodology in the Maize Cultivation Region using Hybrid Machine Learning Algorithms

    The moisture level in the soil in which maize is grown is crucial to the plant's health and production. Over 60% of India's maize cultivation comes from the states of South India. Therefore, forecasting the soil moisture of maize fields is a crucial factor in regulating the cultivation of maize crops with optimal irrigation. In light of this, this research provides a unique Improved Hybridized Machine Learning (IHML) model, which combines and optimizes several ML models (base learners, BL). The convergence rate of all the considered BL approaches and the preciseness of the proposed approach significantly enhance the process of determining the appropriate parameters to attain the desirable outcome. Consequently, IHML contributes to an improvement in the accuracy of the overall forecast. To develop the model, this research collects data from districts in South India that are primarily committed to maize agriculture. The correlation evaluations served as the basis for the model's framework and the parameter selection. This research compares the outcomes of BL models to the IHML model in depth to ensure the model's accuracy. Results reveal that the IHML performs exceptionally well in forecasting soil moisture, achieving a Correlation Coefficient (R²) of 0.9762, Root Mean Square Error (RMSE) of 0.293, and Mean Absolute Error (MAE) of 0.731 at a depth of 10 cm. Conceptual IHML models could be used to substantially improve smart farming and precise irrigation.
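The abstract above reports R², RMSE, and MAE without saying how they were computed. The sketch below shows the textbook definitions of these three scores, under the assumption that R² means the coefficient of determination (the abstract calls it a correlation coefficient, so the authors may have used a different quantity). The function name `regression_metrics` and the sample readings are hypothetical and are not data from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return textbook R^2, RMSE and MAE for paired observations and forecasts."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observations are equal")
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
    }


# Hypothetical soil-moisture readings and forecasts (illustrative only).
observed = [1.0, 2.0, 3.0, 4.0]
forecast = [1.0, 2.0, 3.0, 6.0]
scores = regression_metrics(observed, forecast)
```

Under these definitions, RMSE is never smaller than MAE on the same set of errors, which is a useful sanity check when reading reported scores.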