Using WordNet for Building WordNets
This paper summarises a set of methodologies and techniques for the fast
construction of multilingual WordNets. The English WordNet is used in this
approach as a backbone for Catalan and Spanish WordNets and as a lexical
knowledge resource for several subtasks.
Comment: 8 pages, postscript file. In workshop on Usage of WordNet in NL
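The backbone idea can be illustrated with a minimal sketch. It assumes that target-language (here Spanish) variants have already been attached to English synsets, for instance through a bilingual dictionary. The synset names, hypernym links and Spanish words are hypothetical toy data, and `build_target_wordnet` and `hypernym_chain` are illustrative helpers, not the authors' tools. The sketch only shows how the English hierarchy can be reused as the skeleton of the new wordnet.

```python
# Toy backbone data (hypothetical): English synset -> its English hypernym.
ENGLISH_HYPERNYMS = {
    "dog.n.01": "canine.n.02",
    "canine.n.02": "carnivore.n.01",
    "carnivore.n.01": None,
}

# Spanish variants attached to English synsets (hypothetical dictionary output).
SPANISH_VARIANTS = {
    "dog.n.01": ["perro"],
    "canine.n.02": ["cánido"],
    "carnivore.n.01": ["carnívoro"],
}


def build_target_wordnet(hypernyms, variants):
    """Reuse the English hierarchy as the skeleton of a target-language wordnet."""
    wordnet = {}
    for synset_id, words in variants.items():
        parent = hypernyms.get(synset_id)
        wordnet[synset_id] = {
            "variants": sorted(words),
            # Keep the inherited link only if the parent synset also received variants.
            "hypernym": parent if parent in variants else None,
        }
    return wordnet


def hypernym_chain(wordnet, synset_id):
    """Follow inherited hypernym links, returning the first variant of each synset."""
    chain = []
    while synset_id is not None:
        chain.append(wordnet[synset_id]["variants"][0])
        synset_id = wordnet[synset_id]["hypernym"]
    return chain


spanish = build_target_wordnet(ENGLISH_HYPERNYMS, SPANISH_VARIANTS)
print(hypernym_chain(spanish, "dog.n.01"))  # → ['perro', 'cánido', 'carnívoro']
```

Only the lexical layer is new here; the structure is taken over unchanged from English, which is what makes this construction fast.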
Indexing with WordNet synsets can improve Text Retrieval
The classical vector space model for text retrieval is shown to give better
results (up to 29% better in our experiments) if WordNet synsets are chosen as
the indexing space, instead of word forms. This result is obtained for a
manually disambiguated test collection (of queries and documents) derived from
the Semcor semantic concordance. The sensitivity of retrieval performance to
(automatic) disambiguation errors when indexing documents is also measured.
Finally, it is observed that if queries are not disambiguated, indexing by
synsets performs (at best) only as well as standard word indexing.
Comment: 7 pages, LaTeX2e, 3 eps figures, uses epsfig, colacl.st
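The comparison can be sketched as follows, assuming already disambiguated texts represented as (word form, synset) pairs and plain term-frequency vectors compared with cosine similarity. A full vector space system would normally add idf weighting, which is omitted here for brevity. The tokens and synset names are hypothetical; they only show how synset indexing treats synonymy (car/automobile) and polysemy (two senses of "car") differently from word-form indexing.

```python
import math
from collections import Counter


def index(tokens, by_synset):
    """Build a term-frequency vector from (word_form, synset_id) pairs."""
    return Counter(synset if by_synset else word for word, synset in tokens)


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical disambiguated query and documents.
query = [("car", "car.n.01")]                                  # car = automobile
doc_a = [("automobile", "car.n.01"), ("dealer", "dealer.n.01")]
doc_b = [("car", "car.n.02"), ("railway", "railway.n.01")]     # car = railway car

for by_synset in (False, True):
    q = index(query, by_synset)
    scores = [round(cosine(q, index(d, by_synset)), 3) for d in (doc_a, doc_b)]
    print("synsets" if by_synset else "words", scores)
```

With word-form indexing the query matches only `doc_b`, which uses the wrong sense of "car"; with synset indexing it matches only `doc_a`, which uses the synonym "automobile". The same toy setup also hints at why undisambiguated queries hurt: a wrong synset in the query removes the match entirely.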
Experiments on applying relaxation labeling to map multilingual hierarchies
This paper explores the automatic construction of a multilingual
Lexical Knowledge Base from preexisting lexical resources. It presents a new
approach for linking already existing hierarchies: the relaxation labeling
algorithm is used to select, among all the candidate connections proposed by a
bilingual dictionary, the correct connection for each node in the taxonomy.
Postprint (published version)
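The selection step can be sketched with the classic relaxation labeling update of Rosenfeld, Hummel and Zucker: each node keeps a probability for each candidate label, and labels that agree with the likely labels of neighbouring nodes gain weight at every iteration. The compatibility function below, which rewards a child/parent pair of candidates that is also a hypernym pair in English, and the toy taxonomy are assumptions for illustration; the constraints and weights actually used in the paper are not reproduced here.

```python
# Toy problem (hypothetical): the Spanish taxonomy says "perro" IS-A "animal",
# and a bilingual dictionary proposes two English candidates for each node.
CANDIDATES = {
    "perro": ["dog/animal", "dog/sausage"],
    "animal": ["animal/organism", "animal/brute"],
}
PARENT = {"perro": "animal"}
# Hypernym links known in the English hierarchy, as (child, parent) pairs.
ENGLISH_EDGES = {("dog/animal", "animal/organism")}
# Neighbours with weights; each node's weights sum to at most 1 so that
# 1 + support stays non-negative when compatibilities lie in [-1, 1].
NEIGHBORS = {"perro": [("animal", 1.0)], "animal": [("perro", 1.0)]}


def compat(node, label, other, other_label):
    """Compatibility in [-1, 1]: 1 if the taxonomy edge maps onto an English edge."""
    if PARENT.get(node) == other:
        return 1.0 if (label, other_label) in ENGLISH_EDGES else 0.0
    if PARENT.get(other) == node:
        return 1.0 if (other_label, label) in ENGLISH_EDGES else 0.0
    return 0.0


def relax(candidates, neighbors, compat, iterations=20):
    # Start from uniform label probabilities at every node.
    p = {n: {lab: 1.0 / len(labs) for lab in labs} for n, labs in candidates.items()}
    for _ in range(iterations):
        new_p = {}
        for n, labels in candidates.items():
            unnorm = {}
            for label in labels:
                # Support: weighted, compatibility-scaled probabilities of neighbours' labels.
                support = sum(
                    w * sum(compat(n, label, m, other) * p[m][other] for other in candidates[m])
                    for m, w in neighbors.get(n, [])
                )
                unnorm[label] = p[n][label] * (1.0 + support)
            z = sum(unnorm.values())
            # Renormalize; keep the old distribution if every label lost all support.
            new_p[n] = {lab: v / z for lab, v in unnorm.items()} if z > 0 else p[n]
        p = new_p
    return p


probs = relax(CANDIDATES, NEIGHBORS, compat)
best = {n: max(d, key=d.get) for n, d in probs.items()}
print(best)  # → {'perro': 'dog/animal', 'animal': 'animal/organism'}
```

Starting from uniform probabilities, the two mutually supporting candidates reinforce each other and end up with probability close to 1, while the unsupported senses fade out.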
Methodology and evaluation of the Galician WordNet expansion with the WN-Toolkit
In this paper, we present the methodology and a detailed evaluation of the results of the expansion of the Galician WordNet using the WN-Toolkit. This toolkit allows the creation and expansion of wordnets following the expand model. In our experiments we used methodologies based on dictionaries and on parallel corpora. The results were evaluated both automatically and manually, which allows a comparison of the precision values obtained with the two evaluation procedures. The manual evaluation also identifies the sources of the errors. This information has been very useful for improving the toolkit and for correcting some errors in the reference WordNet for Galician.
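As an illustration of the expand model with a dictionary-based strategy, the sketch below uses one simple heuristic, assumed here rather than taken from the WN-Toolkit documentation: translations are attached to an English synset only through English variants that are monosemous (belong to a single synset), since their translations cannot land on the wrong sense. It then computes an automatic precision against a reference wordnet, counting only synsets that the reference contains. All data, including the Galician words, are toy examples, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data.
ENGLISH_WN = {
    "dog.n.01": {"dog", "domestic_dog"},
    "frump.n.01": {"dog", "frump"},
    "cat.n.01": {"cat"},
}
EN_GL_DICT = {  # English -> Galician translations
    "dog": {"can"},
    "domestic_dog": {"can"},
    "cat": {"gato", "gata"},
}
REFERENCE_GL = {  # reference Galician wordnet used for automatic evaluation
    "dog.n.01": {"can"},
    "cat.n.01": {"gato"},
}


def expand_monosemous(english_wn, bilingual):
    """Attach translations of monosemous English variants to their synset."""
    synsets_of = defaultdict(set)
    for synset_id, variants in english_wn.items():
        for word in variants:
            synsets_of[word].add(synset_id)
    proposals = defaultdict(set)
    for word, synset_ids in synsets_of.items():
        translations = bilingual.get(word)
        # Polysemous words ("dog" here) are skipped: the intended sense is unknown.
        if len(synset_ids) == 1 and translations:
            (synset_id,) = synset_ids
            proposals[synset_id] |= translations
    return dict(proposals)


def automatic_precision(proposals, reference):
    """Share of proposed variants found in the reference, over evaluable synsets."""
    evaluated = correct = 0
    for synset_id, variants in proposals.items():
        if synset_id not in reference:
            continue  # no reference data: cannot be judged automatically
        for variant in variants:
            evaluated += 1
            correct += variant in reference[synset_id]
    return correct / evaluated if evaluated else 0.0


proposals = expand_monosemous(ENGLISH_WN, EN_GL_DICT)
print(proposals)
print(automatic_precision(proposals, REFERENCE_GL))
```

In this toy run "gata" is counted as an error only because the reference lacks it; a manual check may judge it acceptable, which is one reason automatic and manual precision can differ.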
Normalized Information Distance
The normalized information distance is a universal distance measure for
objects of all kinds. It is based on Kolmogorov complexity and thus
uncomputable, but there are ways to utilize it. First, compression algorithms
can be used to approximate the Kolmogorov complexity if the objects have a
string representation. Second, for names and abstract concepts, page count
statistics from the World Wide Web can be used. These practical realizations of
the normalized information distance can then be applied to machine learning
tasks, especially clustering, to perform feature-free and parameter-free data
mining. This chapter discusses the theoretical foundations of the normalized
information distance and both practical realizations. It presents numerous
examples of successful real-world applications based on these distance
measures, ranging from bioinformatics to music clustering to machine
translation.
Comment: 33 pages, 12 figures, pdf. Chapter "Normalized information distance", in:
Information Theory and Statistical Learning, Eds. M. Dehmer, F.
Emmert-Streib, Springer-Verlag, New York. To appear
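The two practical realizations can be sketched in a few lines. The normalized compression distance replaces Kolmogorov complexity with the length of a compressed string, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)); here `zlib` stands in for the compressor C, an arbitrary choice (a stronger compressor gives a closer approximation). The normalized Google distance works from page counts: f(x) and f(y) are the numbers of pages containing each term, f(x, y) the number containing both, and N the total number of indexed pages. The counts are passed in as plain numbers; no search-engine API is assumed, and the test strings are made up.

```python
import math
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Approximate C(x) by the zlib-compressed length (an arbitrary compressor choice)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized Google distance:
    (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y)))."""
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


text = b"the quick brown fox jumps over the lazy dog " * 20
variant = b"the quick brown fox jumps over the lazy cat " * 20
rng = random.Random(0)  # seeded, so the "unrelated" string is reproducible
noise = bytes(rng.randrange(256) for _ in range(len(text)))

print(ncd(text, variant), ncd(text, noise))  # small for similar strings, near 1 otherwise
print(ngd(1000, 1000, 1000, 1e9))            # terms that always co-occur: distance 0
```

Because both functions only need a compressor or a few counts, they can feed any distance-based clustering method without feature engineering, which is the "feature-free" point made above.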
Linking a domain thesaurus to WordNet and conversion to WordNet-LMF
We present a methodology to link domain thesauri to general-domain lexica. We
apply it in the framework of the KYOTO project to link the Species2000 thesaurus
to the synsets of the English WordNet. Moreover, we study the formalisation of
this thesaurus according to the ISO LMF standard and its dialect WordNet-LMF.
This conversion will allow Species2000 to communicate with the other resources
available in the KYOTO architecture.
Peer Reviewed. Postprint (published version)
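A minimal sketch of the conversion step, using Python's `xml.etree.ElementTree`. It assumes a thesaurus given as nodes with an identifier, a name, a parent and an optional link to an English WordNet synset. The element and attribute names (`LexicalEntry`, `Lemma`, `Sense`, `Synset`, `SynsetRelation`, `MonolingualExternalRef`) follow the WordNet-LMF format as understood here and should be checked against the official DTD; the `relType` values, synset names and the helper `to_wordnet_lmf` are illustrative placeholders, not the mapping produced in the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical thesaurus fragment; "wn" holds an illustrative English WordNet link.
NODES = [
    {"id": "1", "name": "Canis", "parent": None, "wn": "canis.n.01"},
    {"id": "2", "name": "Canis lupus", "parent": "1", "wn": "timber_wolf.n.01"},
]


def to_wordnet_lmf(nodes):
    """Serialize thesaurus nodes as WordNet-LMF-style lexical entries and synsets."""
    resource = ET.Element("LexicalResource")
    lexicon = ET.SubElement(resource, "Lexicon", label="Species2000", language="en")
    for node in nodes:
        entry = ET.SubElement(lexicon, "LexicalEntry", id=f"le-{node['id']}")
        ET.SubElement(entry, "Lemma", writtenForm=node["name"], partOfSpeech="n")
        ET.SubElement(entry, "Sense", id=f"s-{node['id']}", synset=f"syn-{node['id']}")
    for node in nodes:
        synset = ET.SubElement(lexicon, "Synset", id=f"syn-{node['id']}")
        if node["parent"] is not None:
            # The thesaurus hierarchy becomes a hyperonymy relation between synsets.
            relations = ET.SubElement(synset, "SynsetRelations")
            ET.SubElement(relations, "SynsetRelation",
                          target=f"syn-{node['parent']}", relType="has_hyperonym")
        if node.get("wn"):
            # Link to the general-domain lexicon (the relType value is a placeholder).
            refs = ET.SubElement(synset, "MonolingualExternalRefs")
            ET.SubElement(refs, "MonolingualExternalRef", externalSystem="WordNet",
                          externalReference=node["wn"], relType="eq_synonym")
    return resource


root = to_wordnet_lmf(NODES)
print(ET.tostring(root, encoding="unicode"))
```

Keeping the thesaurus hierarchy and the WordNet links as separate elements means the domain resource stays self-contained while still pointing into the general-domain lexicon.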
Methodology and evaluation of the Galician WordNet expansion with WN-Toolkit
In this paper, we present the methodology and a detailed evaluation of the results of the expansion of the Galician WordNet using the WN-Toolkit. This toolkit allows the creation and expansion of wordnets following the expand model. In our experiments we used methodologies based on dictionaries and on parallel corpora. The results were evaluated both automatically and manually, which allows a comparison of the precision values obtained with the two evaluation procedures. The manual evaluation also identifies the sources of the errors. This information has been very useful for improving the toolkit and for correcting some errors in the reference WordNet for Galician. This research has been carried out thanks to the Project SKATeR (TIN2012-38584-C06-01 and TIN2012-38584-C06-04), supported by the Ministry of Economy and Competitiveness of the Spanish Government.
- …