High-Dimensional Gaussian Graphical Model Selection: Walk Summability and Local Separation Criterion
We consider the problem of high-dimensional Gaussian graphical model
selection. We identify a set of graphs for which an efficient estimation
algorithm exists, and this algorithm is based on thresholding of empirical
conditional covariances. Under a set of transparent conditions, we establish
structural consistency (or sparsistency) for the proposed algorithm, when the
number of samples n=omega(J_{min}^{-2} log p), where p is the number of
variables and J_{min} is the minimum (absolute) edge potential of the graphical
model. The sufficient conditions for sparsistency are based on the notion of
walk-summability of the model and the presence of sparse local vertex
separators in the underlying graph. We also derive novel non-asymptotic
necessary conditions on the number of samples required for sparsistency.
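As a rough illustration of thresholding empirical conditional covariances, the sketch below declares an edge between two variables when the smallest absolute conditional covariance, taken over all conditioning sets of at most `eta` other variables, exceeds a threshold `xi`. The names `eta`, `xi`, `conditional_covariance` and `threshold_graph`, the threshold value, and the chain-structured demo data are illustrative assumptions, not the paper's notation or implementation.

```python
from itertools import combinations

import numpy as np


def conditional_covariance(cov, i, j, S):
    """Sigma(i, j | S) = Sigma_ij - Sigma_iS Sigma_SS^{-1} Sigma_Sj."""
    if not S:
        return cov[i, j]
    S = list(S)
    return cov[i, j] - cov[i, S] @ np.linalg.solve(cov[np.ix_(S, S)], cov[S, j])


def threshold_graph(samples, eta, xi):
    """Keep edge (i, j) if min over |S| <= eta of |Sigma(i, j | S)| exceeds xi."""
    p = samples.shape[1]
    cov = np.cov(samples, rowvar=False)  # empirical covariance matrix
    edges = set()
    for i, j in combinations(range(p), 2):
        others = [k for k in range(p) if k not in (i, j)]
        min_abs = min(
            abs(conditional_covariance(cov, i, j, S))
            for size in range(eta + 1)
            for S in combinations(others, size)
        )
        if min_abs > xi:
            edges.add((i, j))
    return edges


# Hypothetical demo: a Gaussian chain x0 - x1 - x2 (values chosen for illustration).
rng = np.random.default_rng(0)
n = 5000
x0 = rng.standard_normal(n)
x1 = 0.8 * x0 + rng.standard_normal(n)
x2 = 0.8 * x1 + rng.standard_normal(n)
samples = np.column_stack([x0, x1, x2])
edges = threshold_graph(samples, eta=1, xi=0.2)
```

The search over conditioning sets grows combinatorially in `eta`, which is why a method of this kind is only practical when small local separators exist.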
Learning High-Dimensional Markov Forest Distributions: Analysis of Error Rates
The problem of learning forest-structured discrete graphical models from
i.i.d. samples is considered. An algorithm based on pruning of the Chow-Liu
tree through adaptive thresholding is proposed. It is shown that this algorithm
is both structurally consistent and risk consistent and the error probability
of structure learning decays faster than any polynomial in the number of
samples under fixed model size. For the high-dimensional scenario where the
size of the model d and the number of edges k scale with the number of samples
n, sufficient conditions on (n,d,k) are given for the algorithm to satisfy
structural and risk consistencies. In addition, the extremal structures for
learning are identified; we prove that the independent (resp. tree) model is
the hardest (resp. easiest) to learn using the proposed algorithm in terms of
error rates for structure learning.
Comment: Accepted to the Journal of Machine Learning Research (Feb 2011)
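The pruning step can be sketched as follows, assuming the Chow-Liu tree edges and their empirical mutual information values have already been computed. The function name `prune_to_forest`, the threshold form `n ** (-beta)`, and the sample values are assumptions made for illustration; the abstract itself only states that the threshold is adaptive.

```python
def prune_to_forest(tree_edges, n, beta=0.5):
    """Drop tree edges whose empirical mutual information is below a threshold.

    tree_edges: list of (i, j, mutual_information) triples.
    n: number of samples; the threshold shrinks as n grows (assumed form).
    """
    eps = n ** (-beta)
    return [(i, j) for i, j, mi in tree_edges if mi >= eps]


# Hypothetical input: the edge (2, 3) carries almost no information.
tree_edges = [(0, 1, 0.30), (1, 2, 0.25), (2, 3, 0.001)]
forest = prune_to_forest(tree_edges, n=1000)
```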
Learning Latent Tree Graphical Models
We study the problem of learning a latent tree graphical model where samples
are available only from a subset of variables. We propose two consistent and
computationally efficient algorithms for learning minimal latent trees, that
is, trees without any redundant hidden nodes. Unlike many existing methods, the
observed nodes (or variables) are not constrained to be leaf nodes. Our first
algorithm, recursive grouping, builds the latent tree recursively by
identifying sibling groups using so-called information distances. One of the
main contributions of this work is our second algorithm, which we refer to as
CLGrouping. CLGrouping starts with a pre-processing procedure in which a tree
over the observed variables is constructed. This global step groups the
observed nodes that are likely to be close to each other in the true latent
tree, thereby guiding subsequent recursive grouping (or equivalent procedures)
on much smaller subsets of variables. This results in more accurate and
efficient learning of latent trees. We also present regularized versions of our
algorithms that learn latent tree approximations of arbitrary distributions. We
compare the proposed algorithms to other methods by performing extensive
numerical experiments on various latent tree graphical models such as hidden
Markov models and star graphs. In addition, we demonstrate the applicability of
our methods on real-world datasets by modeling the dependency structure of
monthly stock returns in the S&P index and of the words in the 20 newsgroups
dataset.
Methyl 2-(4-ferrocenylbenzamido)thiophene-3-carboxylate and ethyl 2-(4-ferrocenylbenzamido)-1,3-thiazole-4-acetate, a unique ferrocen
The conformations and hydrogen bonding in the thiophene and thiazole title compounds, [Fe(C₅H₅)(C₂₀H₁₄NO₃S)], (I), and [Fe(C₅H₅)(C₁₉H₁₇N₂O₃S)], (II), are discussed. The sequence (C₅H₄)-(C₆H₄)-(CONH)-(C₄H₂S)-(CO₂Me) of rings and moieties in (I) is close to being planar; all consecutive interplanar angles are less than 10°. An intramolecular N-H...O=Cester hydrogen bond [graph set S(6), N...O = 2.768 (2) Å and N-H...O = 134 (2)°] effects the molecular planarity, and aggregation occurs via hydrogen-bonded chains formed from intermolecular Car-H...O=Cester/amide interactions along [010], with C...O distances ranging from 3.401 (3) to 3.577 (2) Å. The thiazole system in (II) crystallizes with two molecules in the asymmetric unit; these differ in the conformation along their long molecular axes; for example, the interplanar angle between the phenylene (C₆H₄) and thiazole (C₃NS) rings is 8.1 (2)° in one molecule and 27.66 (14)° in the other. Intermolecular N-H...O=Cester hydrogen bonds [N...O = 2.972 (4) and 2.971 (3) Å], each augmented by a Cphenylene-H...O=Cester interaction [3.184 (5) and 3.395 (4) Å], form motifs with graph set R¹₂(7) and generate chains along [100]. The amide C=O groups do not participate in hydrogen bonding. Compound (II) is the first reported ferrocenyl-containing thiazole structure.
Synthesis and characterisation of novel ferrocenyl thienyl and thiazolyl systems
Ferrocenyl derivatives are currently under investigation by our group and several series containing both amidothienyl and amidothiazolyl systems have been synthesised and characterised. The incorporation of thienyl/thiazolyl groups into a ferrocenyl- or ferrocenylphenyl system greatly enhances the number of potential donor atoms for coordination with metal fragments e.g. PtII, PdII with a view to platinum anti-cancer studies and/or interaction with guest molecules through suitable hydrogen bonding interactions.
In nature, thiazole has been found to be vital in certain natural products: examples include the antibiotic bacitracin and the siderophore yersiniabactin. In therapeutic studies the antitumour compound epothilone A and myxothiazole (inhibitor) have been extensively studied.
High-Dimensional Graphical Model Selection: Tractable Graph Families and Necessary Conditions
We consider the problem of Ising and Gaussian graphical model selection given n i.i.d. samples from the model. We propose an efficient threshold-based algorithm for structure estimation known as the conditional mutual information test. This simple local algorithm requires only low-order statistics of the data and decides whether two nodes are neighbors in the unknown graph. Under some transparent assumptions, we establish that the proposed algorithm is structurally consistent (or sparsistent) when the number of samples scales as n = Omega(J_{min}^{-4} log p), where p is the number of nodes and J_{min} is the minimum edge potential. We also prove novel non-asymptotic necessary conditions for graphical model selection. United States. Air Force Office of Scientific Research (Grant FA9550-08-1-1080)
High-dimensional structure estimation in Ising models: Local separation criterion
We consider the problem of high-dimensional Ising (graphical) model
selection. We propose a simple algorithm for structure estimation based on the
thresholding of the empirical conditional variation distances. We introduce a
novel criterion for tractable graph families, where this method is efficient,
based on the presence of sparse local separators between node pairs in the
underlying graph. For such graphs, the proposed algorithm has a sample
complexity of n = Omega(J_{min}^{-2} log p), where p is the number of
variables, and J_{min} is the minimum (absolute) edge potential in the
model. We also establish nonasymptotic necessary and sufficient conditions for
structure estimation.
Comment: Published at http://dx.doi.org/10.1214/12-AOS1009 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
A Large-Deviation Analysis of the Maximum-Likelihood Learning of Markov Tree Structures
The problem of maximum-likelihood (ML) estimation of discrete tree-structured
distributions is considered. Chow and Liu established that ML-estimation
reduces to the construction of a maximum-weight spanning tree using the
empirical mutual information quantities as the edge weights. Using the theory
of large-deviations, we analyze the exponent associated with the error
probability of the event that the ML-estimate of the Markov tree structure
differs from the true tree structure, given a set of independently drawn
samples. By exploiting the fact that the output of ML-estimation is a tree, we
establish that the error exponent is equal to the exponential rate of decay of
a single dominant crossover event. We prove that in this dominant crossover
event, a non-neighbor node pair replaces a true edge of the distribution that
is along the path of edges in the true tree graph connecting the nodes in the
non-neighbor pair. Using ideas from Euclidean information theory, we then
analyze the scenario of ML-estimation in the very noisy learning regime and
show that the error exponent can be approximated as a ratio, which is
interpreted as the signal-to-noise ratio (SNR) for learning tree distributions.
We show via numerical experiments that in this regime, our SNR approximation is
accurate.
Comment: Accepted to the IEEE Transactions on Information Theory on Nov 18,
201
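The Chow-Liu reduction described above can be sketched as follows: compute the empirical mutual information of every variable pair, then build a maximum-weight spanning tree over those weights (here with Kruskal's algorithm and a union-find structure). The function names and the binary Markov-chain demo data are illustrative assumptions, not taken from the paper.

```python
import random
from collections import Counter
from itertools import combinations
from math import log


def empirical_mi(xs, ys):
    """Empirical mutual information (in nats) of two discrete sample columns."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def chow_liu_tree(samples):
    """Maximum-weight spanning tree with empirical mutual information as weights."""
    d = len(samples[0])
    cols = list(zip(*samples))
    weights = sorted(
        ((empirical_mi(cols[i], cols[j]), i, j) for i, j in combinations(range(d), 2)),
        reverse=True,
    )
    parent = list(range(d))

    def find(u):
        # Union-find root lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    tree = []
    for _, i, j in weights:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) does not create a cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree


# Hypothetical demo: a binary Markov chain 0 - 1 - 2 with 10% flip noise.
rng = random.Random(0)


def flip(bit, p=0.1):
    return bit ^ (rng.random() < p)


samples = []
for _ in range(2000):
    a = rng.randrange(2)
    b = flip(a)
    c = flip(b)
    samples.append((a, b, c))

tree = chow_liu_tree(samples)
```

A crossover event of the kind analysed in the abstract would correspond to the non-neighbor pair (0, 2) receiving a larger empirical weight than a true edge, so that it enters the tree instead.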
The L2L System for Second Language Learning Using Visualised Zoom Calls Among Students
An important part of second language learning is conversation which is best
practised with speakers whose native language is the language being learned. We
facilitate this by pairing students from different countries who are learning
each other's native language. Mixed groups of students have Zoom calls, half in
one language and half in the other, in order to practise and improve their
conversation skills. We use Zoom video recordings with audio transcripts
enabled, which generate recognised speech from which we extract timestamped
utterances; we then calculate conversation metrics and visualise them on a
dashboard. A
timeline highlights each utterance, colour coded per student, with links to the
video in a playback window. L2L was deployed for a semester and recorded almost
250 hours of Zoom meetings. The conversation metrics visualised on the
dashboard are a beneficial asset for both students and lecturers.
Comment: 16th European Conference on Technology-Enhanced Learning (EC-TEL),
Bozen-Bolzano, Italy (online), September 202
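To make the idea of computing conversation metrics from timestamped utterances concrete, here is a minimal sketch. The abstract does not say which metrics L2L computes, so the `Utterance` fields and the three metrics shown (talk time, number of turns, share of talk time) are assumptions chosen purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str
    start: float  # seconds from the start of the recording
    end: float
    text: str


def conversation_metrics(utterances):
    """Per-speaker talk time, number of turns, and share of total talk time."""
    talk_time = defaultdict(float)
    turns = defaultdict(int)
    previous = None
    for u in sorted(utterances, key=lambda u: u.start):
        talk_time[u.speaker] += u.end - u.start
        if u.speaker != previous:  # a new turn starts when the speaker changes
            turns[u.speaker] += 1
        previous = u.speaker
    total = sum(talk_time.values())
    return {
        s: {"talk_seconds": t, "turns": turns[s], "share": t / total if total else 0.0}
        for s, t in talk_time.items()
    }


# Hypothetical transcript fragment.
utterances = [
    Utterance("Ana", 0.0, 4.0, "Hola, ¿qué tal?"),
    Utterance("Ben", 4.0, 6.0, "Muy bien."),
    Utterance("Ana", 6.0, 8.0, "Me alegro."),
    Utterance("Ana", 8.0, 10.0, "Shall we switch to English?"),
]
metrics = conversation_metrics(utterances)
```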
Word matching using single closed contours for indexing handwritten historical documents
Effective indexing is crucial for providing convenient access to scanned versions of large collections of historically valuable handwritten manuscripts. Since traditional handwriting recognizers based on optical character recognition (OCR) do not perform well on historical documents, recently a holistic word recognition approach has gained in popularity as an attractive and more straightforward solution (Lavrenko et al. in Proc. Document Image Analysis for Libraries (DIAL’04), pp. 278–287, 2004). Such techniques attempt to recognize words based on scalar and profile-based features extracted from whole word images. In this paper, we propose a new approach to holistic word recognition for historical handwritten manuscripts based on matching word contours instead of whole images or word profiles. The new method consists of robust extraction of closed word contours and the application of an elastic contour matching technique proposed originally for general shapes (Adamek and O’Connor in IEEE Trans Circuits Syst Video Technol 5:2004). We demonstrate that multiscale contour-based descriptors can effectively capture intrinsic word features avoiding any segmentation of words into smaller subunits. Our experiments show a recognition accuracy of 83%, which considerably exceeds the performance of other systems reported in the literature.