Search CORE

3,988 research outputs found

Corpus-based Ontology Learning for Word Sense Disambiguation

Author: Kang Sin-Jae
Publication venue: COLIPS PUBLICATIONS
Publication date: 01/01/2003
Field of study

MultiFarm: A benchmark for multilingual ontology matching

Author: Andrei Tamilin
Christian Meilicke
Cássia Trojahn
Elena Montiel-Ponsoda
Euzenat
Euzenat
Fred Freitas
Fu
García-Castro
Giunchiglia
Heiner Stuckenschmidt
Jung
Neches
Niepert
Ondřej Šváb-Zamazal
Raúl García-Castro
Ryan Ribeiro de Azevedo
Shenghui Wang
Vojtěch Svátek
Wang
Willem Robert van Hage
Publication venue: Facultad de Informática (UPM)
Publication date: 01/01/2012
Field of study

In this paper we present the MultiFarm dataset, which has been designed as a benchmark for multilingual ontology matching. The MultiFarm dataset is composed of a set of ontologies translated in different languages and the corresponding alignments between these ontologies. It is based on the OntoFarm dataset, which has been used successfully for several years in the Ontology Alignment Evaluation Initiative (OAEI). By translating the ontologies of the OntoFarm dataset into eight different languages – Chinese, Czech, Dutch, French, German, Portuguese, Russian, and Spanish – we created a comprehensive set of realistic test cases. Based on these test cases, it is possible to evaluate and compare the performance of matching approaches with a special focus on multilingualism

VU Research Portal

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

MAnnheim DOCument Server

Archivo Digital UPM

Natural language processing

Author: Adams
Amsler
Bangalore
Barker
Benoît
Bian
Bondale
Carrick
Ceric
Chandrasekar
Chang
Charniak
Chen
Chowdhury
Chowdhury
Costantino
Cowie
Craven
Craven
Craven
Dogru
Evans
Feldman
Fernandez
Gaizauskas
Glasgow
Haas
Hayes
Hayes
Hedlund
Herath
Ide
Isahara
Jelinek
Jeong
Jurafsky
Kazakov
Kehler
Khoo
Kim
King
Lange
Lee
Lehmam
Lehtokangas
Lewis
Liddy
Liddy
Lovis
Ma
Magnini
Mani
Manning
Marquez
Martinez
Martinez
McMurchie
Meyer
Mihalcea
Mock
Moens
Morin
Narita
Nerbonne
Oard
Ogura
Oudet
Owei
Paris
Pasero
Pedersen
Perez-Carballo
Petreley
Pirkola
Poesio
Rosenfield
Roux
Say
Scarlett
Schenker
Silber
Smeaton
Smeaton
Smith
Sokol
Song
Sparck Jones
Staab
Stock
Tolle
Trybula
Tsuda
Vickery
Waldrop
Warner
Weigard
Wilks
Wong
Yang
Yang
Zadrozny
Zweigenbaum
Publication venue: 'Wiley'
Publication date: 01/01/2003
Field of study

Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of www and digital libraries ; and (iv) evaluation of NLP systems

Crossref

University of Strathclyde Institutional Repository

OPUS - University of Technology Sydney

Basic Units of Lexicons and Ontologies: Words, Senses and Concepts

Author: Chae Hee-Rahk
Publication venue: Institute of Digital Enhancement of Cognitive Processing, Waseda University
Publication date: 01/01/2011
Field of study

Waseda University Repository

Translation of Pronominal Anaphora between English and Spanish: Discrepancies and Evaluation

Author: Ferrandez A.
Peral J.
Publication venue: 'AI Access Foundation'
Publication date: 23/06/2011
Field of study

This paper evaluates the different tasks carried out in the translation of pronominal anaphora in a machine translation (MT) system. The MT interlingua approach named AGIR (Anaphora Generation with an Interlingua Representation) improves upon other proposals presented to date because it is able to translate intersentential anaphors, detect co-reference chains, and translate Spanish zero pronouns into English---issues hardly considered by other systems. The paper presents the resolution and evaluation of these anaphora problems in AGIR with the use of different kinds of knowledge (lexical, morphological, syntactic, and semantic). The translation of English and Spanish anaphoric third-person personal pronouns (including Spanish zero pronouns) into the target language has been evaluated on unrestricted corpora. We have obtained a precision of 80.4% and 84.8% in the translation of Spanish and English pronouns, respectively. Although we have only studied the Spanish and English languages, our approach can be easily extended to other languages such as Portuguese, Italian, or Japanese

arXiv.org e-Print Archive

Crossref

Proceedings of the COLING 2004 Post Conference Workshop on Multilingual Linguistic Ressources MLR2004

Author: Armstrong Susan
Boitet Christian
Popescu-Belis Andrei
Sérasset Gilles
Tufis Dan
Publication venue: COLING
Publication date: 01/01/2004
Field of study

International audienceIn an ever expanding information society, most information systems are now facing the "multilingual challenge". Multilingual language resources play an essential role in modern information systems. Such resources need to provide information on many languages in a common framework and should be (re)usable in many applications (for automatic or human use). Many centres have been involved in national and international projects dedicated to building har- monised language resources and creating expertise in the maintenance and further development of standardised linguistic data. These resources include dictionaries, lexicons, thesauri, word-nets, and annotated corpora developed along the lines of best practices and recommendations. However, since the late 90's, most efforts in scaling up these resources remain the responsibility of the local authorities, usually, with very low funding (if any) and few opportunities for academic recognition of this work. Hence, it is not surprising that many of the resource holders and developers have become reluctant to give free access to the latest versions of their resources, and their actual status is therefore currently rather unclear. The goal of this workshop is to study problems involved in the development, management and reuse of lexical resources in a multilingual context. Moreover, this workshop provides a forum for reviewing the present state of language resources. The workshop is meant to bring to the international community qualitative and quantitative information about the most recent developments in the area of linguistic resources and their use in applications. The impressive number of submissions (38) to this workshop and in other workshops and conferences dedicated to similar topics proves that dealing with multilingual linguistic ressources has become a very hot problem in the Natural Language Processing community. To cope with the number of submissions, the workshop organising committee decided to accept 16 papers from 10 countries based on the reviewers' recommendations. Six of these papers will be presented in a poster session. The papers constitute a representative selection of current trends in research on Multilingual Language Resources, such as multilingual aligned corpora, bilingual and multilingual lexicons, and multilingual speech resources. The papers also represent a characteristic set of approaches to the development of multilingual language resources, such as automatic extraction of information from corpora, combination and re-use of existing resources, online collaborative development of multilingual lexicons, and use of the Web as a multilingual language resource. The development and management of multilingual language resources is a long-term activity in which collaboration among researchers is essential. We hope that this workshop will gather many researchers involved in such developments and will give them the opportunity to discuss, exchange, compare their approaches and strengthen their collaborations in the field. The organisation of this workshop would have been impossible without the hard work of the program committee who managed to provide accurate reviews on time, on a rather tight schedule. We would also like to thank the Coling 2004 organising committee that made this workshop possible. Finally, we hope that this workshop will yield fruitful results for all participants

Hal - Université Grenoble Alpes

Hal-Diderot

Ontologies across disciplines

Author: Nickles Matthias
Pease Adam
Schalley Andrea C.
Schalley Andrea C.
Zaefferer Dietmar
Zaefferer Dietmar
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 01/01/2007
Field of study

Open Access LMU

Imperial Genus: The Formation and Limits of the Human in Modern Korea and Japan

Author: Workman Travis
Publication venue: 'University of California Press'
Publication date: 01/04/2020
Field of study

"Imperial Genus begins with the turn to world culture and ideas of the generally human in Japan’s cultural policy in Korea in 1919. How were concepts of the human’s genus‑being operative in the discourses of the Japanese empire? How did they inform the imagination and representation of modernity in colonial Korea? Travis Workman delves into these questions through texts in philosophy, literature, and social science. Imperial Genus focuses on how notions of human generality mediated uncertainty between the transcendental and the empirical, the universal and the particular, and empire and colony. It shows how cosmopolitan cultural principles, the proletarian arts, and Pan‑Asian imperial nationalism converged with practices of colonial governmentality. It is a genealogy of the various articulations of the human’s genus‑being within modern humanist thinking in East Asia, as well as an exploration of the limits of the human as both concept and historical figure.

Directory of Open Access Books (DOAB)

Detecting Cross-Language Plagiarism using Open Knowledge Graphs

Author: Bauer-Marquart Fabian
Gipp Bela
Meuschke Norman
Ruas Terry
Schubotz Moritz
Stegmüller Johannes
Publication venue: Aachen, Germany : RWTH Aachen
Publication date: 06/08/2020
Field of study

Identifying cross-language plagiarism is challenging, especially for distant language pairs and sense-for-sense translations. We introduce the new multilingual retrieval model Cross-Language Ontology-Based Similarity Analysis (CL-OSA) for this task. CL-OSA represents documents as entity vectors obtained from the open knowledge graph Wikidata. Opposed to other methods, CL-OSA does not require computationally expensive machine translation, nor pre-training using comparable or parallel corpora. It reliably disambiguates homonyms and scales to allow its application toWebscale document collections. We show that CL-OSA outperforms state-of-the-art methods for retrieving candidate documents from five large, topically diverse test corpora that include distant language pairs like Japanese-English. For identifying cross-language plagiarism at the character level, CL-OSA primarily improves the detection of sense-for-sense translations. For these challenging cases, CL-OSA’s performance in terms of the well-established PlagDet score exceeds that of the best competitor by more than factor two. The code and data of our study are openly available

arXiv.org e-Print Archive

Repositorium für Naturwissenschaften und Technik

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY