Learning Latent Tree Graphical Models
We study the problem of learning a latent tree graphical model where samples
are available only from a subset of variables. We propose two consistent and
computationally efficient algorithms for learning minimal latent trees, that
is, trees without any redundant hidden nodes. Unlike many existing methods, the
observed nodes (or variables) are not constrained to be leaf nodes. Our first
algorithm, recursive grouping, builds the latent tree recursively by
identifying sibling groups using so-called information distances. One of the
main contributions of this work is our second algorithm, which we refer to as
CLGrouping. CLGrouping starts with a pre-processing procedure in which a tree
over the observed variables is constructed. This global step groups the
observed nodes that are likely to be close to each other in the true latent
tree, thereby guiding subsequent recursive grouping (or equivalent procedures)
on much smaller subsets of variables. This results in more accurate and
efficient learning of latent trees. We also present regularized versions of our
algorithms that learn latent tree approximations of arbitrary distributions. We
compare the proposed algorithms to other methods by performing extensive
numerical experiments on various latent tree graphical models such as hidden
Markov models and star graphs. In addition, we demonstrate the applicability of
our methods on real-world datasets by modeling the dependency structure of
monthly stock returns in the S&P index and of the words in the 20 newsgroups
dataset.
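
The pre-processing idea described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes Gaussian variables, for which the information distance between two variables is commonly taken as the negative log of the absolute correlation coefficient, and it assumes the tree over the observed variables is obtained as a minimum spanning tree over those distances. The function names and the example distance matrix are hypothetical.

```python
import math


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def information_distance(x, y):
    """Assumed Gaussian form d = -log|rho|; undefined when rho is exactly 0."""
    return -math.log(abs(correlation(x, y)))


def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense symmetric distance matrix.

    Returns the tree edges as (i, j) pairs in the order they are added,
    starting from node 0.
    """
    n = len(dist)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Pick the cheapest edge leaving the current tree.
        i, j = min(
            ((a, b) for a in in_tree for b in range(n) if b not in in_tree),
            key=lambda e: dist[e[0]][e[1]],
        )
        edges.append((i, j))
        in_tree.add(j)
    return edges


# Hypothetical distances that are additive along the chain 0 - 1 - 2 - 3.
chain = [
    [0.0, 0.5, 0.8, 1.5],
    [0.5, 0.0, 0.3, 1.0],
    [0.8, 0.3, 0.0, 0.7],
    [1.5, 1.0, 0.7, 0.0],
]
print(minimum_spanning_tree(chain))  # [(0, 1), (1, 2), (2, 3)]
```

Because the example distances add up along the chain, the spanning tree recovers the chain itself; in practice the distances would be estimated from samples, for example with `information_distance`.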
Phase transition in the sample complexity of likelihood-based phylogeny inference
Reconstructing evolutionary trees from molecular sequence data is a
fundamental problem in computational biology. Stochastic models of sequence
evolution are closely related to spin systems that have been extensively
studied in statistical physics and that connection has led to important
insights on the theoretical properties of phylogenetic reconstruction
algorithms as well as the development of new inference methods. Here, we study
maximum likelihood, a classical statistical technique which is perhaps the most
widely used in phylogenetic practice because of its superior empirical
accuracy.
At the theoretical level, except for its consistency, that is, the guarantee
of eventual correct reconstruction as the size of the input data grows, much
remains to be understood about the statistical properties of maximum likelihood
in this context. In particular, the best bounds on the sample complexity or
sequence-length requirement of maximum likelihood, that is, the amount of data
required for correct reconstruction, are exponential in the number, n, of
tips---far from known lower bounds based on information-theoretic arguments.
Here we close the gap by proving a new upper bound on the sequence-length
requirement of maximum likelihood that matches up to constants the known lower
bound for some standard models of evolution.
More specifically, for the r-state symmetric model of sequence evolution on
a binary phylogeny with bounded edge lengths, we show that the sequence-length
requirement behaves logarithmically in n when the expected amount of mutation
per edge is below what is known as the Kesten-Stigum threshold. In general, the
sequence-length requirement is polynomial in n. Our results imply moreover
that the maximum likelihood estimator can be computed efficiently on randomly
generated data provided sequences are as above.
Comment: To appear in Probability Theory and Related Fields
The economic importance of being educated
Educating people may not sound terribly urgent during difficult economic times, but when it comes to creating jobs and finding people who have the skills to fill them, nothing is more important than education. The latest issue of Forefront presents a package of articles focused on the compelling returns to education.
Education - Economic aspects
Deep Image Retrieval: A Survey
In recent years a vast amount of visual content has been generated and shared
from various fields, such as social media platforms, medical images, and
robotics. This abundance of content creation and sharing has introduced new
challenges. In particular, searching databases for similar content, i.e.,
content-based image retrieval (CBIR), is a long-established research area, and more
efficient and accurate methods are needed for real time retrieval. Artificial
intelligence has made progress in CBIR and has significantly facilitated the
process of intelligent search. In this survey we organize and review recent
CBIR works that are developed based on deep learning algorithms and techniques,
including insights and techniques from recent papers. We identify and present
the benchmarks and evaluation methods commonly used in the field. We
collect common challenges and propose promising future directions. More
specifically, we focus on image retrieval with deep learning and organize the
state-of-the-art methods according to the types of deep network structure, deep
features, feature enhancement methods, and network fine-tuning strategies. Our
survey considers a wide variety of recent methods, aiming to promote a global
view of the field of instance-based CBIR.
Comment: 20 pages, 11 figures
Advanced Soil Moisture Predictive Methodology in the Maize Cultivation Region using Hybrid Machine Learning Algorithms
The moisture level of the soil in which maize is grown is crucial to the plant's health and yield. Over 60% of India's maize cultivation comes from the states of South India. Forecasting soil moisture in maize fields is therefore a crucial factor in regulating cultivation with optimal irrigation. To this end, this research presents an Improved Hybridized Machine Learning (IHML) model, which combines and optimizes several ML models (base learners, BL). The convergence rate of all the considered BL approaches and the preciseness of the proposed approach significantly enhance the process of determining the appropriate parameters to attain the desired outcome. Consequently, IHML improves the accuracy of the overall forecast. To develop the model, this research collects data from districts in South India that are primarily devoted to maize agriculture. Correlation evaluations served as the basis for the model's framework and the parameter selection. The outcomes of the BL models are compared in depth with those of the IHML model to assess the model's accuracy. Results reveal that IHML performs exceptionally well in forecasting soil moisture, with a Correlation Coefficient (R2) of 0.9762, a Root Mean Square Error (RMSE) of 0.293, and a Mean Absolute Error (MAE) of 0.731 at a depth of 10 cm. Conceptual IHML models could be used to substantially improve smart farming and precision irrigation.
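
For readers unfamiliar with the three evaluation metrics named in this abstract, the sketch below shows how they are commonly computed. It is illustrative only: it assumes that R2 denotes the coefficient of determination (the abstract calls it a correlation coefficient, so the authors may have used a different definition), and the observed and predicted values are made-up numbers, not data from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error: square root of the mean squared residual."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def mae(y_true, y_pred):
    """Mean absolute error: mean of the absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (an assumed reading of R2)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical soil-moisture readings and model forecasts (illustrative only).
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 17.0]

print("RMSE:", rmse(observed, predicted))
print("MAE: ", mae(observed, predicted))
print("R2:  ", r_squared(observed, predicted))
```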