27 research outputs found
A High Accuracy Method for Semi-supervised Information Extraction
Customization to specific domains of dis-course and/or user requirements is one of the greatest challenges for today’s Information Extraction (IE) systems. While demonstrably effective, both rule-based and supervised machine learning approaches to IE customization pose too high a burden on the user. Semi-supervised learning approaches may in principle offer a more resource effective solution but are still insufficiently accurate to grant realistic application. We demonstrate that this limitation can be overcome by integrating fully-supervised learning techniques within a semi-supervised IE approach, without increasing resource requirements
Word Domain Disambiguation via Word Sense Disambiguation
Word subject domains have been widely used to improve the perform-ance of word sense disambiguation al-gorithms. However, comparatively little effort has been devoted so far to the disambiguation of word subject do-mains. The few existing approaches have focused on the development of al-gorithms specific to word domain dis-ambiguation. In this paper we explore an alternative approach where word domain disambiguation is achieved via word sense disambiguation. Our study shows that this approach yields very strong results, suggesting that word domain disambiguation can be ad-dressed in terms of word sense disam-biguation with no need for special purpose algorithms
Recommended from our members
Integrating Ontological Knowledge and Textual Evidence in Estimating Gene and Gene Product Similarity
With the rising influence of the Gene On-tology, new approaches have emerged where the similarity between genes or gene products is obtained by comparing Gene Ontology code annotations associ-ated with them. So far, these approaches have solely relied on the knowledge en-coded in the Gene Ontology and the gene annotations associated with the Gene On-tology database. The goal of this paper is to demonstrate that improvements to these approaches can be obtained by integrating textual evidence extracted from relevant biomedical literature
Recommended from our members
PNNL: A Supervised Maximum Entropy Approach to Word Sense Disambiguation
In this paper, we described the PNNL Word Sense Disambiguation system as applied to the English All-Word task in Se-mEval 2007. We use a supervised learning approach, employing a large number of features and using Information Gain for dimension reduction. Our Maximum Entropy approach combined with a rich set of features produced results that are significantly better than baseline and are the highest F-score for the fined-grained English All-Words subtask
Recommended from our members
Ontological Annotation with WordNet
Semantic Web applications require robust and accurate annotation tools that are capable of automating the assignment of ontological classes to words in naturally occurring text (ontological annotation). Most current ontologies do not include rich lexical databases and are therefore not easily integrated with word sense disambiguation algorithms that are needed to automate ontological annotation. WordNet provides a potentially ideal solution to this problem as it offers a highly structured lexical conceptual representation that has been extensively used to develop word sense disambiguation algorithms. However, WordNet has not been designed as an ontology, and while it can be easily turned into one, the result of doing this would present users with serious practical limitations due to the great number of concepts (synonym sets) it contains. Moreover, mapping WordNet to an existing ontology may be difficult and requires substantial labor. We propose to overcome these limitations by developing an analytical platform that (1) provides a WordNet-based ontology offering a manageable and yet comprehensive set of concept classes, (2) leverages the lexical richness of WordNet to give an extensive characterization of concept class in terms of lexical instances, and (3) integrates a class recognition algorithm that automates the assignment of concept classes to words in naturally occurring text. The ensuing framework makes available an ontological annotation platform that can be effectively integrated with intelligence analysis systems to facilitate evidence marshaling and sustain the creation and validation of inference models