    Using WordNet for Building WordNets

    This paper summarises a set of methodologies and techniques for the fast construction of multilingual WordNets. The English WordNet is used in this approach as a backbone for Catalan and Spanish WordNets and as a lexical knowledge resource for several subtasks. Comment: 8 pages, postscript file. In workshop on Usage of WordNet in NL
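
    Illustrative sketch (not from the paper): the general "expand" idea of attaching target-language words to English WordNet synsets through a bilingual dictionary. The toy dictionary below is invented, and the paper's actual criteria for choosing among candidate synsets are more elaborate; assumes NLTK with the WordNet corpus downloaded.

        # Sketch: seed a target-language wordnet by linking words to English
        # WordNet synsets through a bilingual dictionary (the "expand" approach).
        # Toy data; requires NLTK with the WordNet corpus downloaded.
        from nltk.corpus import wordnet as wn

        # Invented Spanish -> English dictionary; a real resource would hold
        # thousands of entries with several translations per word.
        bilingual = {
            "perro": ["dog"],
            "banco": ["bank", "bench"],  # polysemous: candidates multiply
        }

        def candidate_synsets(word_es):
            """English noun synsets reachable through the translations of
            word_es; polysemous translations need further filtering."""
            synsets = set()
            for word_en in bilingual.get(word_es, []):
                synsets.update(wn.synsets(word_en, pos=wn.NOUN))
            return synsets

        for word in bilingual:
            print(word, "->", sorted(s.name() for s in candidate_synsets(word)))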

    A transversal approach to predict gene product networks from ontology-based similarity

    Background: Interpretation of transcriptomic data is usually made through a "standard" approach, which consists in clustering the genes according to their expression patterns and exploiting Gene Ontology (GO) annotations within each expression cluster. This approach makes it difficult to underline functional relationships between gene products that belong to different expression clusters. To address this issue, we propose a transversal analysis that aims to predict functional networks based on a combination of GO processes and expression data.

    Results: The transversal approach presented in this paper consists in computing the semantic similarity between gene products in a Vector Space Model. Through a weighting scheme over the annotations, we take into account the representativity of the terms that annotate a gene product. Comparing annotation vectors results in a matrix of gene product similarities. Combined with expression data, the matrix is displayed as a set of functional gene networks. The transversal approach was applied to 186 genes related to the enterocyte differentiation stages. This approach resulted in 18 functional networks that proved to be biologically relevant. These results were compared with those obtained through a standard approach and with an approach based on information content similarity.

    Conclusion: Complementary to the standard approach, the transversal approach offers new insight into the cellular mechanisms and reveals new research hypotheses by combining gene product networks based on semantic similarity and expression data.
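
    Illustrative sketch (not the paper's code): the vector-space comparison the abstract outlines, with a plain IDF weight standing in for the paper's representativity scheme and invented toy annotation sets.

        # Sketch: gene-product similarity in a vector space over GO terms.
        # IDF stands in for the paper's representativity weighting.
        import math
        import numpy as np

        annotations = {  # gene product -> GO term IDs (invented toy data)
            "geneA": {"GO:1", "GO:2", "GO:3"},
            "geneB": {"GO:2", "GO:3", "GO:4"},
            "geneC": {"GO:5"},
        }
        terms = sorted({t for ann in annotations.values() for t in ann})
        n_products = len(annotations)

        # Rarer terms are treated as more representative of a product.
        df = {t: sum(t in ann for ann in annotations.values()) for t in terms}
        idf = {t: math.log(n_products / df[t]) + 1.0 for t in terms}

        def vector(ann):
            return np.array([idf[t] if t in ann else 0.0 for t in terms])

        genes = sorted(annotations)
        vecs = {g: vector(annotations[g]) for g in genes}

        # Pairwise cosine similarities form the matrix from which the
        # functional networks are read (edges above a chosen threshold).
        for i, g1 in enumerate(genes):
            for g2 in genes[i + 1:]:
                v1, v2 = vecs[g1], vecs[g2]
                sim = float(v1 @ v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
                print(f"{g1} ~ {g2}: {sim:.2f}")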

    The interaction of knowledge sources in word sense disambiguation

    Word sense disambiguation (WSD) is a computational linguistics task likely to benefit from the tradition of combining different knowledge sources in artificial intelligence research. An important step in the exploration of this hypothesis is to determine which linguistic knowledge sources are most useful and whether their combination leads to improved results. We present a sense tagger which uses several knowledge sources. Tested accuracy exceeds 94% on our evaluation corpus. Our system attempts to disambiguate all content words in running text rather than limiting itself to treating a restricted vocabulary of words. It is argued that this approach is more likely to assist the creation of practical systems.
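
    The abstract does not detail its knowledge sources; as a concrete instance of one classic dictionary-based source, here is the simplified Lesk gloss-overlap heuristic via NLTK (this is not the paper's tagger).

        # One classic lexical knowledge source for WSD: simplified Lesk,
        # which picks the WordNet sense whose gloss best overlaps the
        # context. Requires NLTK with the WordNet corpus downloaded.
        from nltk.wsd import lesk

        context = "I went to the bank to deposit my money".split()
        sense = lesk(context, "bank", pos="n")
        print(sense, "-", sense.definition() if sense else "no sense found")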

    Combining Knowledge- and Corpus-based Word-Sense-Disambiguation Methods

    In this paper we concentrate on the resolution of the lexical ambiguity that arises when a given word has several different meanings. This specific task is commonly referred to as word sense disambiguation (WSD). The task of WSD consists of assigning the correct sense to words using an electronic dictionary as the source of word definitions. We present two WSD methods based on the two main methodological approaches in this research area: a knowledge-based method and a corpus-based method. Our hypothesis is that word-sense disambiguation requires several knowledge sources in order to solve the semantic ambiguity of the words. These sources can be of different kinds: for example, syntagmatic, paradigmatic or statistical information. Our approach combines various sources of knowledge through combinations of the two WSD methods mentioned above. Mainly, the paper concentrates on how to combine these methods and sources of information in order to achieve good results in disambiguation. Finally, this paper presents a comprehensive study and experimental work on the evaluation of the methods and their combinations.
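
    Illustrative sketch (not the paper's scheme): the combination idea reduced to weighted voting between a knowledge-based and a corpus-based method, both stubbed out; the weights and tie-breaking are invented.

        # Sketch: combining a knowledge-based and a corpus-based WSD method
        # by weighted voting. Both methods are stubs; weights are invented.
        from collections import Counter

        def knowledge_based(word, context):
            """Stub for e.g. a gloss-overlap or conceptual-density method."""
            return "bank#1"

        def corpus_based(word, context):
            """Stub for e.g. a classifier trained on sense-tagged corpora."""
            return "bank#1"

        METHODS = [(knowledge_based, 1.0), (corpus_based, 1.5)]

        def combined(word, context):
            votes = Counter()
            for method, weight in METHODS:
                votes[method(word, context)] += weight
            sense, _ = votes.most_common(1)[0]
            return sense

        print(combined("bank", "deposit money at the bank".split()))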

    Multi Domain Semantic Information Retrieval Based on Topic Model

    Over the last decades, there have been remarkable shifts in the area of Information Retrieval (IR) as huge amounts of information are increasingly accumulated on the Web. The gigantic information explosion increases the need for discovering new tools that retrieve meaningful knowledge from various complex information sources. Thus, techniques primarily used to search and extract important information from numerous database sources have been a key challenge in current IR systems. Topic modeling is one of the most recent techniques that discover hidden thematic structures from large data collections without human supervision. Several topic models have been proposed in various fields of study and have been utilized extensively for many applications. Latent Dirichlet Allocation (LDA) is the most well-known topic model that generates topics from a large corpus of resources, such as text, images, and audio. It has been widely used in many areas of information retrieval and data mining, providing an efficient way of identifying latent topics among document collections. However, LDA has a drawback: topic cohesion within a concept is attenuated when estimating infrequently occurring words. Moreover, LDA seems not to consider the meaning of words, but rather to infer hidden topics based on a statistical approach. As a result, LDA can cause either a reduction in the quality of topic words or an increase in loose relations between topics. In order to solve these problems, we propose a domain-specific topic model that combines domain concepts with LDA. Two domain-specific algorithms are suggested for solving the difficulties associated with LDA. The main strength of our proposed model comes from the fact that it narrows semantic concepts from broad domain knowledge to a specific one, which solves the unknown domain problem. Our proposed model is extensively tested on various applications (query expansion, classification, and summarization) to demonstrate its effectiveness. Experimental results show that the proposed model significantly increases the performance of the applications.
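
    For reference, a plain LDA baseline of the kind the paper extends, run with gensim on invented toy documents; the proposed domain-specific algorithms are not reproduced here.

        # Baseline the paper extends: plain LDA over a toy corpus with
        # gensim. The proposed domain-specific algorithms are not shown.
        from gensim import corpora, models

        docs = [
            ["gene", "expression", "cell", "protein"],
            ["protein", "binding", "cell", "membrane"],
            ["query", "search", "index", "retrieval"],
            ["retrieval", "ranking", "query", "document"],
        ]
        dictionary = corpora.Dictionary(docs)
        bow = [dictionary.doc2bow(d) for d in docs]

        lda = models.LdaModel(bow, num_topics=2, id2word=dictionary,
                              passes=10, random_state=0)
        for topic_id in range(2):
            print(topic_id, lda.print_topic(topic_id, topn=4))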

    The Dutch Wordnet

    Combining Unsupervised Lexical Knowledge Methods for Word Sense Disambiguation

    This paper presents a method to combine a set of unsupervised algorithms that can accurately disambiguate word senses in a large, completely untagged corpus. Although most of the techniques for word sense resolution have been presented as stand-alone, it is our belief that full-fledged lexical ambiguity resolution should combine several information sources and techniques. The set of techniques has been applied in a combined way to disambiguate the genus terms of two machine-readable dictionaries (MRD), enabling us to construct complete taxonomies for Spanish and French. Tested accuracy is above 80% overall and 95% for two-way ambiguous genus terms, showing that taxonomy building is not limited to structured dictionaries such as LDOCE.
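
    Illustrative sketch (not the paper's method): the genus-term idea reduced to a naive head-noun extractor plus Lesk disambiguation against WordNet, with invented toy definitions; assumes NLTK with the WordNet corpus downloaded.

        # Sketch: taxonomy building from MRD definitions. Extract each
        # definition's genus term (head noun), disambiguate it against
        # WordNet, and use the chosen synset as the entry's hypernym.
        # The extractor is deliberately naive; the paper combines several
        # unsupervised methods. Requires NLTK with the WordNet corpus.
        from nltk.wsd import lesk

        definitions = {  # headword -> definition (invented toy data)
            "poodle": "a breed of dog with thick curly hair",
            "oak": "a large tree that produces acorns",
        }

        def genus_term(definition):
            """Naive genus extraction: skip determiners and empty heads;
            real systems parse the definition."""
            skip = {"a", "an", "the", "breed", "kind", "type", "of", "large"}
            for token in definition.split():
                if token not in skip:
                    return token

        for headword, definition in definitions.items():
            genus = genus_term(definition)
            synset = lesk(definition.split(), genus, pos="n")
            print(f"{headword} ISA {genus} -> {synset}")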