Maximum Likelihood Estimation in Gaussian Chain Graph Models under the Alternative Markov Property
The AMP Markov property is a recently proposed alternative Markov property
for chain graphs. In the case of continuous variables with a joint multivariate
Gaussian distribution, it is the AMP rather than the earlier introduced LWF
Markov property that is coherent with data-generation by natural
block-recursive regressions. In this paper, we show that maximum likelihood
estimates in Gaussian AMP chain graph models can be obtained by combining
generalized least squares and iterative proportional fitting into an iterative
algorithm. In an appendix, we give useful convergence results for iterative
partial maximization algorithms that apply in particular to the described
algorithm. Comment: 15 pages; article will appear in Scandinavian Journal of Statistics.
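The abstract names iterative proportional fitting as one of the two building blocks. As a rough illustration of that building block alone, the sketch below fits an undirected Gaussian graphical model by cycling through the cliques and matching each clique's marginal covariance to the sample covariance. It is not the paper's combined algorithm for AMP chain graphs: the function name `gaussian_ipf`, the use of NumPy, the identity starting value, and the stopping rule are all assumptions made for this example.

```python
import numpy as np


def gaussian_ipf(S, cliques, n_sweeps=200, tol=1e-10):
    """Hypothetical helper: fit a concentration matrix K whose entries are
    zero outside the given cliques, so that inv(K) matches S on each clique."""
    p = S.shape[0]
    K = np.eye(p)  # the identity already satisfies every zero constraint
    for _ in range(n_sweeps):
        K_old = K.copy()
        for clique in cliques:
            c = np.array(clique)
            r = np.array([i for i in range(p) if i not in clique], dtype=int)
            S_cc_inv = np.linalg.inv(S[np.ix_(c, c)])
            if r.size == 0:
                # The clique covers all variables: the fit is saturated.
                K[np.ix_(c, c)] = S_cc_inv
                continue
            K_cr = K[np.ix_(c, r)]
            K_rr = K[np.ix_(r, r)]
            # Choose K_cc so that the implied marginal covariance of the
            # clique equals S_cc (Schur complement identity).
            K[np.ix_(c, c)] = S_cc_inv + K_cr @ np.linalg.solve(K_rr, K_cr.T)
        if np.max(np.abs(K - K_old)) < tol:
            break
    return K


# Example: path graph 0 - 1 - 2, so variables 0 and 2 share no edge.
S = np.array([[2.0, 0.8, 0.3],
              [0.8, 1.5, 0.5],
              [0.3, 0.5, 1.0]])
K_hat = gaussian_ipf(S, cliques=[(0, 1), (1, 2)])
Sigma_hat = np.linalg.inv(K_hat)
```

Each clique update changes only the block of K indexed by that clique, which is why entries for missing edges stay at zero throughout.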
Binary Models for Marginal Independence
Log-linear models are a classical tool for the analysis of contingency
tables. In particular, the subclass of graphical log-linear models provides a
general framework for modelling conditional independences. However, with the
exception of special structures, marginal independence hypotheses cannot be
accommodated by these traditional models. Focusing on binary variables, we
present a model class that provides a framework for modelling marginal
independences in contingency tables. The approach taken is graphical and draws
on analogies to multivariate Gaussian models for marginal independence. For the
graphical model representation we use bi-directed graphs, which are in the
tradition of path diagrams. We show how the models can be parameterized in a
simple fashion, and how maximum likelihood estimation can be performed using a
version of the Iterated Conditional Fitting algorithm. Finally, we consider
combining these models with symmetry restrictions.
Large expert-curated database for benchmarking document similarity detection in biomedical literature search
Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article(s). The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency–Inverse Document Frequency and PubMed Related Articles) performed similarly overall. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research.
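Of the three baseline methods named in the abstract, Term Frequency–Inverse Document Frequency is the simplest to sketch. The minimal example below is an illustration only, not the consortium's implementation: the whitespace tokenizer, the smoothed IDF formula, the toy documents, and the function names `tfidf_vectors` and `cosine` are all assumptions made here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer (assumption)
    n_docs = len(tokenized)
    # Document frequency: the number of documents in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append({
            # Relative term frequency times a smoothed inverse document frequency.
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy "seed" article (index 0) compared against two candidate articles.
docs = [
    "gene expression in cancer cells",
    "cancer cells and gene regulation",
    "graphical gaussian models",
]
vecs = tfidf_vectors(docs)
sim_related = cosine(vecs[0], vecs[1])
sim_unrelated = cosine(vecs[0], vecs[2])
```

Ranking candidate articles by their similarity to the seed article gives a simple recommendation list, which is the kind of output the benchmark annotations are meant to evaluate.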
Inference in graphical Gaussian models with edge and vertex symmetries with the gRc package for R
In this paper we present the R package gRc for statistical inference in graphical Gaussian models in which symmetry restrictions have been imposed on the concentration or partial correlation matrix. The models are represented by coloured graphs, where parameters associated with edges or vertices of the same colour are restricted to being identical. We describe algorithms for maximum likelihood estimation and discuss model selection issues. The paper illustrates the practical use of the gRc package.