    High-Dimensional Gaussian Graphical Model Selection: Walk Summability and Local Separation Criterion

    We consider the problem of high-dimensional Gaussian graphical model selection. We identify a set of graphs for which an efficient estimation algorithm exists, and this algorithm is based on thresholding of empirical conditional covariances. Under a set of transparent conditions, we establish structural consistency (or sparsistency) for the proposed algorithm, when the number of samples n=omega(J_{min}^{-2} log p), where p is the number of variables and J_{min} is the minimum (absolute) edge potential of the graphical model. The sufficient conditions for sparsistency are based on the notion of walk-summability of the model and the presence of sparse local vertex separators in the underlying graph. We also derive novel non-asymptotic necessary conditions on the number of samples required for sparsistency
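The thresholding idea in the abstract above can be sketched roughly as follows. This is an illustrative simplification, not the paper's algorithm as stated: the function names, the threshold value, and the restriction to separator sets of size at most one are all assumptions made here. In practice `cov` would be the empirical covariance matrix computed from the n samples.

```python
from itertools import combinations


def conditional_covariance(cov, i, j, k=None):
    """Covariance of variables i and j given variable k (or unconditioned if k is None)."""
    if k is None:
        return cov[i][j]
    # Schur complement for a single conditioning variable.
    return cov[i][j] - cov[i][k] * cov[k][j] / cov[k][k]


def threshold_graph(cov, threshold):
    """Keep edge (i, j) only if no candidate separator makes the
    conditional covariance fall to or below the threshold in magnitude.

    Assumption: separators are limited to the empty set or a single node.
    """
    p = len(cov)
    edges = []
    for i, j in combinations(range(p), 2):
        candidates = [None] + [k for k in range(p) if k not in (i, j)]
        smallest = min(abs(conditional_covariance(cov, i, j, k)) for k in candidates)
        if smallest > threshold:
            edges.append((i, j))
    return edges
```

For a three-node chain 0 - 1 - 2, conditioning on node 1 drives the covariance between nodes 0 and 2 to zero, so that pair is rejected while the two true edges survive.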

    Learning High-Dimensional Markov Forest Distributions: Analysis of Error Rates

    The problem of learning forest-structured discrete graphical models from i.i.d. samples is considered. An algorithm based on pruning of the Chow-Liu tree through adaptive thresholding is proposed. It is shown that this algorithm is both structurally consistent and risk consistent and the error probability of structure learning decays faster than any polynomial in the number of samples under fixed model size. For the high-dimensional scenario where the size of the model d and the number of edges k scale with the number of samples n, sufficient conditions on (n,d,k) are given for the algorithm to satisfy structural and risk consistencies. In addition, the extremal structures for learning are identified; we prove that the independent (resp. tree) model is the hardest (resp. easiest) to learn using the proposed algorithm in terms of error rates for structure learning. Comment: Accepted to the Journal of Machine Learning Research (Feb 2011)

    Learning Latent Tree Graphical Models

    We study the problem of learning a latent tree graphical model where samples are available only from a subset of variables. We propose two consistent and computationally efficient algorithms for learning minimal latent trees, that is, trees without any redundant hidden nodes. Unlike many existing methods, the observed nodes (or variables) are not constrained to be leaf nodes. Our first algorithm, recursive grouping, builds the latent tree recursively by identifying sibling groups using so-called information distances. One of the main contributions of this work is our second algorithm, which we refer to as CLGrouping. CLGrouping starts with a pre-processing procedure in which a tree over the observed variables is constructed. This global step groups the observed nodes that are likely to be close to each other in the true latent tree, thereby guiding subsequent recursive grouping (or equivalent procedures) on much smaller subsets of variables. This results in more accurate and efficient learning of latent trees. We also present regularized versions of our algorithms that learn latent tree approximations of arbitrary distributions. We compare the proposed algorithms to other methods by performing extensive numerical experiments on various latent tree graphical models such as hidden Markov models and star graphs. In addition, we demonstrate the applicability of our methods on real-world datasets by modeling the dependency structure of monthly stock returns in the S&P index and of the words in the 20 newsgroups dataset

    Methyl 2-(4-ferrocenylbenzamido)thiophene-3-carboxylate and ethyl 2-(4-ferrocenylbenzamido)-1,3-thiazole-4-acetate, a unique ferrocen

    The conformations and hydrogen bonding in the thiophene and thiazole title compounds, [Fe(C₅H₅)(C₂₀H₁₄NO₃S)], (I), and [Fe(C₅H₅)(C₁₉H₁₇N₂O₃S)], (II), are discussed. The sequence (C₅H₄)-(C₆H₄)-(CONH)-(C₄H₂S)-(CO₂Me) of rings and moieties in (I) is close to being planar; all consecutive interplanar angles are less than 10°. An intramolecular N-H...O=Cester hydrogen bond [graph set S(6), N...O = 2.768 (2) Å and N-H...O = 134 (2)°] effects the molecular planarity, and aggregation occurs via hydrogen-bonded chains formed from intermolecular Car-H...O=Cester/amide interactions along [010], with C...O distances ranging from 3.401 (3) to 3.577 (2) Å. The thiazole system in (II) crystallizes with two molecules in the asymmetric unit; these differ in the conformation along their long molecular axes; for example, the interplanar angle between the phenylene (C₆H₄) and thiazole (C₃NS) rings is 8.1 (2)° in one molecule and 27.66 (14)° in the other. Intermolecular N-H...O=Cester hydrogen bonds [N...O = 2.972 (4) and 2.971 (3) Å], each augmented by a Cphenylene-H...O=Cester interaction [3.184 (5) and 3.395 (4) Å], form motifs with graph set R¹₂(7) and generate chains along [100]. The amide C=O groups do not participate in hydrogen bonding. Compound (II) is the first reported ferrocenyl-containing thiazole structure

    Synthesis and characterisation of novel ferrocenyl thienyl and thiazolyl systems

    Ferrocenyl derivatives are currently under investigation by our group and several series containing both amidothienyl and amidothiazolyl systems have been synthesised and characterised. The incorporation of thienyl/thiazolyl groups into a ferrocenyl- or ferrocenylphenyl system greatly enhances the number of potential donor atoms for coordination with metal fragments e.g. PtII, PdII with a view to platinum anti-cancer studies and/or interaction with guest molecules through suitable hydrogen bonding interactions. In nature, thiazole has been found to be vital in certain natural products: examples include the antibiotic bacitracin and the siderophore yersiniabactin. In therapeutic studies the antitumour compound epothilone A and myxothiazole (inhibitor) have been extensively studied

    High-Dimensional Graphical Model Selection: Tractable Graph Families and Necessary Conditions

    We consider the problem of Ising and Gaussian graphical model selection given n i.i.d. samples from the model. We propose an efficient threshold-based algorithm for structure estimation known as the conditional mutual information test. This simple local algorithm requires only low-order statistics of the data and decides whether two nodes are neighbors in the unknown graph. Under some transparent assumptions, we establish that the proposed algorithm is structurally consistent (or sparsistent) when the number of samples scales as n = Omega(J_{min}^{-4} log p), where p is the number of nodes and J_{min} is the minimum edge potential. We also prove novel non-asymptotic necessary conditions for graphical model selection. United States. Air Force Office of Scientific Research (Grant FA9550-08-1-1080)

    High-dimensional structure estimation in Ising models: Local separation criterion

    We consider the problem of high-dimensional Ising (graphical) model selection. We propose a simple algorithm for structure estimation based on the thresholding of the empirical conditional variation distances. We introduce a novel criterion for tractable graph families, where this method is efficient, based on the presence of sparse local separators between node pairs in the underlying graph. For such graphs, the proposed algorithm has a sample complexity of n=\Omega(J_{\min}^{-2}\log p), where p is the number of variables, and J_{\min} is the minimum (absolute) edge potential in the model. We also establish nonasymptotic necessary and sufficient conditions for structure estimation. Comment: Published at http://dx.doi.org/10.1214/12-AOS1009 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    A Large-Deviation Analysis of the Maximum-Likelihood Learning of Markov Tree Structures

    The problem of maximum-likelihood (ML) estimation of discrete tree-structured distributions is considered. Chow and Liu established that ML-estimation reduces to the construction of a maximum-weight spanning tree using the empirical mutual information quantities as the edge weights. Using the theory of large-deviations, we analyze the exponent associated with the error probability of the event that the ML-estimate of the Markov tree structure differs from the true tree structure, given a set of independently drawn samples. By exploiting the fact that the output of ML-estimation is a tree, we establish that the error exponent is equal to the exponential rate of decay of a single dominant crossover event. We prove that in this dominant crossover event, a non-neighbor node pair replaces a true edge of the distribution that is along the path of edges in the true tree graph connecting the nodes in the non-neighbor pair. Using ideas from Euclidean information theory, we then analyze the scenario of ML-estimation in the very noisy learning regime and show that the error exponent can be approximated as a ratio, which is interpreted as the signal-to-noise ratio (SNR) for learning tree distributions. We show via numerical experiments that in this regime, our SNR approximation is accurate. Comment: Accepted to the IEEE Transactions on Information Theory on Nov 18, 201
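The Chow-Liu reduction mentioned in the abstract above (ML tree estimation as a maximum-weight spanning tree over empirical mutual information) can be illustrated with a minimal sketch. The function names and the choice of Kruskal's algorithm with a union-find structure are assumptions made for this example; the abstract does not prescribe a particular spanning-tree routine.

```python
from collections import Counter
from itertools import combinations
from math import log


def empirical_mutual_information(samples, i, j):
    """Plug-in estimate of I(X_i; X_j) in nats from a list of discrete sample tuples."""
    n = len(samples)
    joint = Counter((s[i], s[j]) for s in samples)
    marg_i = Counter(s[i] for s in samples)
    marg_j = Counter(s[j] for s in samples)
    # p(a,b) * log( p(a,b) / (p(a) p(b)) ), written in terms of counts.
    return sum(
        (c / n) * log(c * n / (marg_i[a] * marg_j[b]))
        for (a, b), c in joint.items()
    )


def chow_liu_tree(samples):
    """Return the edges of a maximum-weight spanning tree with
    empirical mutual information as edge weights (Kruskal's algorithm)."""
    d = len(samples[0])
    weighted = sorted(
        ((empirical_mutual_information(samples, i, j), i, j)
         for i, j in combinations(range(d), 2)),
        reverse=True,
    )
    parent = list(range(d))

    def find(u):
        # Union-find root lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    edges = []
    for _, i, j in weighted:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # adding (i, j) does not create a cycle
            parent[root_i] = root_j
            edges.append((i, j))
    return edges
```

A crossover event of the kind analyzed in the abstract would correspond to a non-neighbor pair receiving a larger empirical weight than a true edge on the path between its endpoints, so that the sort order above picks the wrong pair first.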

    The L2L System for Second Language Learning Using Visualised Zoom Calls Among Students

    An important part of second language learning is conversation, which is best practised with speakers whose native language is the language being learned. We facilitate this by pairing students from different countries learning each other's native language. Mixed groups of students have Zoom calls, half in one language and half in the other, in order to practise and improve their conversation skills. We use Zoom video recordings with audio transcripts enabled, which generate recognised speech from which we extract timestamped utterances and calculate and visualise conversation metrics on a dashboard. A timeline highlights each utterance, colour coded per student, with links to the video in a playback window. L2L was deployed for a semester and recorded almost 250 hours of Zoom meetings. The conversation metrics visualised on the dashboard are a beneficial asset for both students and lecturers. Comment: 16th European Conference on Technology-Enhanced Learning (EC-TEL), Bozen-Bolzano, Italy (online), September 202

    Word matching using single closed contours for indexing handwritten historical documents

    Effective indexing is crucial for providing convenient access to scanned versions of large collections of historically valuable handwritten manuscripts. Since traditional handwriting recognizers based on optical character recognition (OCR) do not perform well on historical documents, a holistic word recognition approach has recently gained popularity as an attractive and more straightforward solution (Lavrenko et al. in Proc. Document Image Analysis for Libraries (DIAL’04), pp. 278–287, 2004). Such techniques attempt to recognize words based on scalar and profile-based features extracted from whole word images. In this paper, we propose a new approach to holistic word recognition for historical handwritten manuscripts based on matching word contours instead of whole images or word profiles. The new method consists of robust extraction of closed word contours and the application of an elastic contour matching technique proposed originally for general shapes (Adamek and O’Connor in IEEE Trans Circuits Syst Video Technol 5:2004). We demonstrate that multiscale contour-based descriptors can effectively capture intrinsic word features, avoiding any segmentation of words into smaller subunits. Our experiments show a recognition accuracy of 83%, which considerably exceeds the performance of other systems reported in the literature