
    Semantic models as metrics for kernel-based interaction identification

    Automatic detection of protein-protein interactions (PPIs) in biomedical publications is vital for efficient biological research. It also presents a host of new challenges for pattern recognition methodologies, some of which will be addressed by the research in this thesis. Proteins are the principal means of communication within a cell; hence, this area of research is strongly motivated by the needs of biologists investigating sub-cellular functions of organisms, diseases, and treatments. These researchers rely on the collaborative efforts of the entire field and communicate through experimental results published in peer-reviewed biomedical journals. The substantial number of interactions detected by automated large-scale PPI experiments, combined with the ease of access to digitised publications, has increased the number of results made available each day. The ultimate aim of this research is to provide tools and mechanisms to aid biologists and database curators in locating relevant information. As part of this objective, this thesis proposes, studies, and develops new methodologies that go some way towards meeting this grand challenge. Pattern recognition methodologies are one approach that can be used to locate PPI sentences; however, most accurate pattern recognition methods require a set of labelled examples to train on. For this particular task, the collection and labelling of training data is highly expensive. On the other hand, digital publications provide a plentiful source of unlabelled data. The unlabelled data is used, along with word co-occurrence models, to improve classification using Gaussian processes, a probabilistic alternative to the state-of-the-art support vector machines. This thesis presents and systematically assesses these novel methods of using the knowledge implicitly encoded in biomedical texts and shows an improvement over current approaches to PPI sentence detection.
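    As an illustration of the classification setup described above, the sketch below trains a Gaussian process classifier on simple bag-of-words sentence features using scikit-learn. It is a minimal example under stated assumptions: the sentences, labels, and TF-IDF features are stand-ins, and the thesis's co-occurrence-based semantic models are not reproduced here.

```python
# Minimal sketch: Gaussian process classification of PPI sentences.
# The example sentences, labels, and TF-IDF features are illustrative
# stand-ins; the thesis's co-occurrence-informed features are not shown.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

sentences = [
    "Protein A binds directly to protein B in vitro.",        # describes an interaction
    "The expression of gene X was measured by qPCR.",          # does not
    "Kinase Y phosphorylates substrate Z upon stimulation.",   # describes an interaction
    "Samples were incubated overnight at 4 degrees.",          # does not
]
labels = [1, 0, 1, 0]

# Bag-of-words features for each sentence.
X = TfidfVectorizer().fit_transform(sentences).toarray()

# GP classifier with an RBF (Gaussian) kernel: the probabilistic
# counterpart to an SVM trained with the same kernel.
gpc = GaussianProcessClassifier(kernel=RBF(length_scale=1.0))
gpc.fit(X, labels)

print(gpc.predict_proba(X))  # class probabilities rather than hard margins
```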

    Protein interaction sentence detection using multiple semantic kernels

    Background: Detection of sentences that describe protein-protein interactions (PPIs) in biomedical publications is a challenging and unresolved pattern recognition problem. Many state-of-the-art approaches to this task employ kernel classification methods, in particular support vector machines (SVMs). In this work we propose a novel data integration approach that utilises semantic kernels and a kernel classification method that is a probabilistic analogue to SVMs. Semantic kernels are created from statistical information gathered from large amounts of unlabelled text using lexical semantic models. Several semantic kernels are then fused into an overall composite classification space. In this initial study, we use simple features in order to examine whether combinations of kernels constructed from word-based semantic models can improve PPI sentence detection.
    Results: We show that combinations of semantic kernels lead to statistically significant improvements in recognition rates and receiver operating characteristic (ROC) scores over the plain Gaussian kernel, when applied to a well-known labelled collection of abstracts. The proposed kernel composition method also allows us to automatically infer the most discriminative kernels.
    Conclusions: The results of this paper indicate that using semantic information from unlabelled text, and combinations of such information, can be valuable for classification of short texts such as PPI sentences. This study, however, is only a first step in the evaluation of semantic kernels and probabilistic multiple kernel learning in the context of PPI detection. The method described herein is modular, and can be applied with a variety of feature types, kernels, and semantic models, in order to facilitate full extraction of interacting proteins.
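    The sketch below illustrates one way a set of semantic kernels can be fused into a composite Gram matrix: each kernel is a linear kernel smoothed by a word-similarity matrix, and the kernels are combined with a weighted sum. The similarity matrices and the mixture weights are random stand-ins; the paper infers the kernel weights with probabilistic multiple kernel learning rather than fixing them by hand.

```python
# Minimal sketch of fusing several "semantic" kernels into one composite
# kernel. The word-similarity matrices S1, S2 and the weights are
# illustrative assumptions, not the paper's learned quantities.
import numpy as np

rng = np.random.default_rng(0)
n_sentences, n_words = 6, 10
X = rng.random((n_sentences, n_words))          # bag-of-words sentence vectors

def semantic_kernel(X, S):
    """Linear kernel smoothed by a word-similarity matrix S: K = X S S^T X^T."""
    XS = X @ S
    return XS @ XS.T

# Two hypothetical word-similarity matrices, e.g. from different
# co-occurrence (lexical semantic) models built on unlabelled text.
S1 = rng.random((n_words, n_words))
S2 = rng.random((n_words, n_words))

kernels = [semantic_kernel(X, S1), semantic_kernel(X, S2)]
weights = np.array([0.7, 0.3])                  # stand-in for learned kernel weights

K_composite = sum(w * K for w, K in zip(weights, kernels))
print(K_composite.shape)                        # (6, 6) composite Gram matrix
```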

    Using Sentence Plausibility to Learn the Semantics of Transitive Verbs

    The functional approach to compositional distributional semantics considers transitive verbs to be linear maps that transform the distributional vectors representing nouns into a vector representing a sentence. We conduct an initial investigation that uses a matrix consisting of the parameters of a logistic regression classifier, trained on a plausibility task, as a transitive verb function. We compare our method to a commonly used corpus-based method for constructing a verb matrix and find that the plausibility training may be more effective for disambiguation tasks. Comment: full updated paper for the NIPS learning semantics workshop, with some minor errata fixed.
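    The sketch below illustrates the core idea under stated assumptions: a logistic regression classifier is trained to judge the plausibility of (subject, object) pairs for a single verb, and its weight vector, reshaped into a matrix, is reused as that verb's linear map. All vectors and labels here are random placeholders rather than real distributional data, and the paper's training setup is not reproduced.

```python
# Minimal sketch: reuse logistic regression weights, reshaped into a matrix,
# as a transitive verb function. Noun vectors and plausibility labels are
# random stand-ins for real distributional data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
dim = 20                                          # noun vector dimensionality

# Hypothetical subject/object vectors and plausibility labels for one verb.
subjects = rng.random((50, dim))
objects = rng.random((50, dim))
plausible = rng.integers(0, 2, size=50)

# Features: flattened outer product of each subject-object pair.
pairs = np.einsum('ni,nj->nij', subjects, objects).reshape(50, dim * dim)

clf = LogisticRegression(max_iter=1000).fit(pairs, plausible)

# Reshape the learned weights into the verb matrix V, so that
# s^T V o scores the plausibility of "subject verb object".
V = clf.coef_.reshape(dim, dim)
score = subjects[0] @ V @ objects[0]
print(score)
```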

    Domain-independent term extraction through domain modelling

    Conference paper. Extracting general or intermediate-level terms is a relevant problem that has not received much attention in the literature. Current approaches to term extraction rely on contrastive corpora to identify domain-specific terms, which makes them better suited to specialised terms that are rarely used outside the domain. In this work, we propose an alternative measure of domain specificity based on term coherence with an automatically constructed domain model. Although previous systems make use of domain-independent features, their performance varies across domains, while our approach displays more stable behaviour, with results comparable to, or better than, state-of-the-art methods. Funding: EU Grant No. 258191 (PROMISE project); Science Foundation Ireland Grant No. SFI/12/RC/228.
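    The sketch below shows how a coherence-based domain-specificity score might look: candidate terms are ranked by cosine similarity to the centroid of the domain's document vectors, which here stands in for the automatically constructed domain model. The documents, candidate terms, and the centroid-based model are illustrative assumptions, not the paper's actual construction.

```python
# Minimal sketch of scoring candidate terms by coherence with a domain model,
# approximated here as the centroid of the domain's TF-IDF document vectors.
# The paper's actual domain-model construction and coherence measure may differ.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

domain_docs = [
    "information retrieval evaluation with test collections",
    "query expansion improves retrieval effectiveness",
    "relevance judgements for retrieval experiments",
]
candidates = ["retrieval effectiveness", "test collections", "next week"]

vectorizer = TfidfVectorizer()
D = vectorizer.fit_transform(domain_docs)
domain_model = np.asarray(D.mean(axis=0))        # centroid as a simple domain model

# Represent each candidate term in the same space and rank by coherence.
T = vectorizer.transform(candidates)
coherence = cosine_similarity(T, domain_model).ravel()

for term, score in sorted(zip(candidates, coherence), key=lambda x: -x[1]):
    print(f"{score:.3f}  {term}")
```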