Towards the Integration of Sign Languages Data in the Linguistic Linked Open Data Cloud
Purpose: In the field of electronic lexicography, there is growing interest in ways to represent and interlink lexical data originating from different modalities. This topic is discussed in particular within initiatives and projects concerned with representing lexical information in a Linked Data (LD) compliant format, so that it can be published in the Linguistic Linked Open Data (LLOD) cloud. In this context, we observe that Sign Language (SL) lexical data are not currently represented in any of the datasets included in the LLOD cloud. The “Overview of Datasets for the Sign Languages of Europe”, published by the “Easier” European project, likewise does not mention any dataset available in an LD-compliant format. We therefore investigate ways of representing SL data in the LLOD cloud and linking them to other types of language data already available in an LD-compliant format.
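The abstract does not say which vocabulary the authors use. As a hedged illustration only, the Python sketch below assumes the OntoLex-Lemon core vocabulary (namespace `http://www.w3.org/ns/lemon/ontolex#`), represents a sign by a gloss in `ontolex:writtenRep` (an assumption; a gloss is only a stand-in for the sign itself), and treats two entries from different lexicons as linked when their senses share an `ontolex:reference` concept. The `Entry` class, both function names and all `example.org` URIs are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations

ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"


@dataclass(frozen=True)
class Entry:
    uri: str      # hypothetical entry URI
    lexicon: str  # illustrative lexicon label, e.g. "DGS" (sign) or "de" (written)
    rep: str      # written form, or a gloss standing in for a sign (assumed quote-free)
    concept: str  # concept URI that the entry's sense points to via ontolex:reference


def to_turtle(e: Entry) -> str:
    """Serialize one entry as OntoLex triples: a canonical form and one sense."""
    return (
        f"<{e.uri}> a <{ONTOLEX}LexicalEntry> ;\n"
        f"    <{ONTOLEX}canonicalForm> [ <{ONTOLEX}writtenRep> \"{e.rep}\" ] ;\n"
        f"    <{ONTOLEX}sense> [ <{ONTOLEX}reference> <{e.concept}> ] .\n"
    )


def cross_lexicon_links(entries):
    """Pair entries from different lexicons whose senses refer to the same concept."""
    by_concept = defaultdict(list)
    for e in entries:
        by_concept[e.concept].append(e)
    links = []
    for group in by_concept.values():
        for a, b in combinations(group, 2):
            if a.lexicon != b.lexicon:  # only cross-lexicon (cross-modal) pairs
                links.append((a.uri, b.uri))
    return links
```

For example, a German Sign Language (DGS) entry glossed `HAUS` and a written German entry `Haus` that both refer to `http://example.org/concept/house` come back as one pair. Linking through a shared concept keeps each lexicon self-contained; a direct entry-to-entry property would be an alternative design.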
Natural Language Dialogue Service for Appointment Scheduling Agents
Appointment scheduling is a problem faced daily by many individuals and organizations. Cooperating agent systems have been developed to partially automate this task. To extend the circle of participants as far as possible, we advocate the use of natural language transmitted by e-mail. We describe COSMA, a fully implemented German-language server for existing appointment scheduling agent systems. COSMA can handle multiple dialogues in parallel and accounts for differences in dialogue behaviour between human and machine agents. Natural language (NL) coverage of the sublanguage is achieved through both corpus-based grammar development and the use of message extraction techniques.
Comment: 8 or 9 pages, LaTeX; uses aclap.sty, epsf.te
COST Action “European network for Web-centred linguistic data science” (NexusLinguarum)
We present the current state of the large “European network for Web-centred linguistic data science”. In its first phase, the network has set up several working groups to deal with specific topics. The network has also already carried out a first round of Short Term Scientific Missions (STSMs).
Work presented here was supported in part by the COST Action CA18209 – NexusLinguarum “European network for Web-centred linguistic data science”, the project Prêt-à-LLOD, under grant agreement no. 825182, and the ELEXIS project, under grant agreement no. 731015.
Considerations about Uniqueness and Unalterability for the Encoding of Biographical Data in Ontologies
This paper results from observations made while studying ontology-based and linked data-based approaches to encoding biographical data. Based on certain issues we discovered, which are described here, we call for collaborative work towards guidelines for modelling biographical data in the standard Semantic Web representation languages. The need for such guidelines became even clearer after reading an article describing various types of errors in biographical data encoding that resulted from an unsuitable use of the owl:sameAs property when referring to the linked data-based descriptions of the lives of two literary authors. In this context, there is also a need to agree on the core element of which a biographical description consists. More specifically, we aim to determine the “biographical unit” that should be modelled first and to which all related information should be linked by means of corresponding semantic properties. We also discuss the need to define and use synchronic versus diachronic properties associated with the modelled biographical unit. On this point, we conclude that, for the description of a biographical unit, there are probably no properties whose values remain unaltered over time. This is particularly true when provenance information is taken into account, since different sources can provide contrasting values that may each be correct from a different point of view.
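To make the owl:sameAs problem concrete, the following Python sketch uses hypothetical data (not taken from the article the paper refers to). It computes the owl:sameAs closure with a union-find structure and collects the values of one property per merged individual. Because owl:sameAs asserts that two URIs denote the same individual, two different birth years coming from two sources end up attached to one person; keeping the source next to each value (`keep_source=True`) turns the apparent contradiction into two attributed claims. The `ex:` names, the `ex:birthYear` property and the quad layout are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical quads (subject, property, value, source); all ex: names are invented.
QUADS = [
    ("ex:authorA", "ex:birthYear", "1797", "ex:sourceX"),
    ("ex:authorB", "ex:birthYear", "1799", "ex:sourceY"),
    ("ex:authorA", "owl:sameAs", "ex:authorB", "ex:sourceY"),
]


def canonicalizer(quads):
    """Union-find over owl:sameAs links; returns a function mapping a URI to its representative."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for s, p, o, _ in quads:
        if p == "owl:sameAs":
            parent[find(s)] = find(o)  # merge the two identity classes
    return find


def merged_values(quads, prop, keep_source=False):
    """Collect values of `prop` per sameAs-merged individual, optionally paired with their source."""
    find = canonicalizer(quads)
    values = defaultdict(set)
    for s, p, o, src in quads:
        if p == prop:
            values[find(s)].add((o, src) if keep_source else o)
    return dict(values)


print(merged_values(QUADS, "ex:birthYear"))  # one individual, two birth years
```

Here `merged_values(QUADS, "ex:birthYear")` returns `{"ex:authorB": {"1797", "1799"}}`: the merged individual appears to have two birth years, which is exactly the kind of artefact a careless owl:sameAs link produces.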
Ontologies for a Global Language Infrastructure
As human language technologies have matured considerably and a rapidly growing range of language data resources and natural language processing (NLP) tools and systems has become available, the need for a global language infrastructure (GLI) that ensures the reusability of these resources is becoming increasingly evident. A GLI is essentially an open, web-based software platform on which tailored language services can be efficiently composed, disseminated and consumed. Such an infrastructure is also expected to facilitate the further development of language data resources and NLP functionalities. The aims of this paper are twofold: (1) to discuss the necessity of ontologies for a GLI, and (2) to outline a high-level configuration of the ontologies, which are integrated into a comprehensive language service ontology. To these ends, the paper first explores the dimensions of a GLI and then draws a triangular view of a language service, from which the necessary ontologies are derived. It also examines relevant ongoing international standardization efforts such as LAF, MAF, SynAF, DCR and LMF, and discusses how these frameworks are incorporated into our comprehensive language service ontology. The paper concludes by stressing the need for international collaboration on the development of a standardized language service ontology.
Ontology Lexicalisation: The lemon Perspective
Ontologies (Guarino 1998) capture knowledge but fail to capture the structure and use of the terms that express and refer to this knowledge in natural language. The structure and use of terms is the concern of terminology as well as lexicology. In recent years, the relevance of terminology in knowledge representation has been recognized again (for example, with the advent of SKOS), but less consideration has been given to lexical and linguistic issues in knowledge representation (Buitelaar 2010).
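As an illustration of what a lexicalisation adds beyond a flat `rdfs:label` string, the Python sketch below emits Turtle for a noun entry in the lemon model: a part of speech, a canonical (singular) and a plural form, and a sense whose `lemon:reference` points at an ontology concept. It assumes the `http://lemon-model.net/lemon#` namespace and LexInfo 2.0 (`http://www.lexinfo.net/ontology/2.0/lexinfo#`) for grammatical categories; the function name `lexicalize` and the example URIs are hypothetical, and the output is a sketch rather than the exact modelling used in the paper.

```python
LEMON = "http://lemon-model.net/lemon#"
LEXINFO = "http://www.lexinfo.net/ontology/2.0/lexinfo#"


def lexicalize(entry_uri: str, lemma: str, plural: str, concept_uri: str, lang: str = "en") -> str:
    """Return Turtle for a noun entry with two forms and one sense referring to an ontology concept.

    Assumes `lemma` and `plural` contain no double quotes (no literal escaping is done).
    """
    return "\n".join([
        f"@prefix lemon: <{LEMON}> .",
        f"@prefix lexinfo: <{LEXINFO}> .",
        "",
        f"<{entry_uri}> a lemon:LexicalEntry ;",
        "    lexinfo:partOfSpeech lexinfo:noun ;",
        # canonical form: the lemma, marked as singular
        f'    lemon:canonicalForm [ lemon:writtenRep "{lemma}"@{lang} ; lexinfo:number lexinfo:singular ] ;',
        # a further inflected form, which a plain label cannot express
        f'    lemon:otherForm [ lemon:writtenRep "{plural}"@{lang} ; lexinfo:number lexinfo:plural ] ;',
        # the sense links the word to the ontology concept it lexicalises
        f"    lemon:sense [ lemon:reference <{concept_uri}> ] .",
    ])


print(lexicalize("http://example.org/lexicon/drug", "drug", "drugs", "http://example.org/onto#Drug"))
```

The ontology itself stays untouched: all linguistic information lives in the lexicon and reaches the concept only through the sense, which is the separation between knowledge and its lexical expression that the passage argues for.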
Ontology Lexicalization: The lemon perspective
Buitelaar P, Cimiano P, McCrae J, Montiel-Ponsoda E, Declerck T. Ontology Lexicalization: The lemon perspective. In: Proceedings of the Workshops - 9th International Conference on Terminology and Artificial Intelligence (TIA 2011). 2011: 33-36
The DDI corpus: An annotated corpus with pharmacological substances and drug-drug interactions
The management of drug-drug interactions (DDIs) is a critical issue because of the overwhelming amount of information available about them. Natural Language Processing (NLP) techniques can help reduce the time healthcare professionals spend reviewing the biomedical literature. However, NLP techniques rely heavily on the availability of annotated corpora. While there are several corpora annotated with biological entities and their relationships, corpora annotated with pharmacological substances and DDIs are lacking. Moreover, other work in this field has focused only on pharmacokinetic (PK) DDIs, not on pharmacodynamic (PD) DDIs. To address this problem, we have created a manually annotated corpus consisting of 792 texts selected from the DrugBank database and 233 additional Medline abstracts. This fine-grained corpus has been annotated with a total of 18,502 pharmacological substances and 5028 DDIs, including both PK and PD interactions. The quality and consistency of the annotation process were ensured by creating annotation guidelines and were evaluated by measuring the inter-annotator agreement between two annotators. The agreement was almost perfect (Kappa up to 0.96 and generally above 0.80), except for the DDIs in the texts from the Medline database (0.55-0.72). The DDI corpus has been used in the SemEval 2013 DDIExtraction challenge as a gold standard for evaluating information extraction techniques applied to recognizing pharmacological substances and detecting DDIs in biomedical texts. DDIExtraction 2013 attracted wide attention, with a total of 14 teams from 7 different countries. For the recognition and classification of pharmacological names, the best system achieved an F1 of 71.5%, while for the detection and classification of DDIs, the best result was an F1 of 65.1%.
Funding: This work was supported by the EU project TrendMiner [FP7-ICT287863], by the project MULTIMEDICA [TIN2010-20644-C03-01], and by the Research Network MA2VICMR [S2009/TIC-1542].
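The abstract reports agreement as Kappa and system results as F1 without giving the formulas. Assuming Cohen's kappa for the two-annotator case (the abstract does not name the variant), kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each annotator's label frequencies; F1 is the harmonic mean of precision and recall. The minimal Python sketch below computes both; the function names and the toy labels are illustrative.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # observed agreement: share of items with identical labels
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # chance agreement: product of each annotator's marginal label frequencies
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1:
        return 1.0  # degenerate case: both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


def f1(tp, fp, fn):
    """F1 from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


a = ["DDI", "DDI", "none", "none"]
b = ["DDI", "none", "none", "none"]
print(cohen_kappa(a, b))  # 0.5: p_o = 0.75, p_e = 0.5
```

With the toy labels, raw agreement is 75%, but half of that is expected by chance, so kappa is only 0.5. This correction is why kappa, rather than raw agreement, is the usual figure for comparing annotation quality across subsets such as the DrugBank and Medline texts.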