1,315 research outputs found

    Introduction: Modeling, Learning and Processing of Text-Technological Data Structures

    Get PDF
    Researchers in many disciplines, sometimes working in close cooperation, have been concerned with modeling textual data in order to account for texts as the prime information unit of written communication. The list of disciplines includes computer science and linguistics as well as more specialized disciplines like computational linguistics and text technology. What many of these efforts have in common is the aim to model textual data by means of abstract data types or data structures that support at least the semi-automatic processing of texts in any area of written communication

    A new method for interoperability between lexical resources using MDA approach

    Get PDF
    International audienceLexical resources are increasingly multiplatform due to the diverse needs of linguists. Merging, comparing, finding correspondences and deducing differences between these lexical resources remain difficult tasks. Thus, inte-roperability between these resources is hard even impossible to achieve. In this context, we establish a new method based on MDA approach to resolve interoperability between lexical resources. The proposed method consists of building common structure (OWL-DL ontology) for involved resources. This common structure has the ability to communicate involved resources. Hence, we may create a complex grid between involved resources allowing transformation from one format to another. We experiment our new built method on an LMF lexicon

    Multilingual language resources and interoperability

    Get PDF
    This article introduces the topic of ‘‘Multilingual language resources and interoperability’’. We start with a taxonomy and parameters for classifying language resources. Later we provide examples and issues of interoperatability, and resource architectures to solve such issues. Finally we discuss aspects of linguistic formalisms and interoperability

    A Pattern Based Approach for Re-engineering Non-Ontological Resources into Ontologies

    Get PDF
    With the goal of speeding up the ontology development process, ontology engineers are starting to reuse as much as possible available ontologies and non-ontological resources such as classification schemes, thesauri, lexicons and folksonomies, that already have some degree of consensus. The reuse of such non-ontological resources necessarily involves their re-engineering into ontologies. Non-ontological resources are highly heterogeneous in their data model and contents: they encode different types of knowledge, and they can be modeled and implemented in different ways. In this paper we present (1) a typology for non-ontological resources, (2) a pattern based approach for re-engineering non-ontological resources into ontologies, and (3) a use case of the proposed approach

    A Formal Framework for Linguistic Annotation

    Get PDF
    `Linguistic annotation' covers any descriptive or analytic notations applied to raw language data. The basic data may be in the form of time functions -- audio, video and/or physiological recordings -- or it may be textual. The added notations may include transcriptions of all sorts (from phonetic features to discourse structures), part-of-speech and sense tagging, syntactic analysis, `named entity' identification, co-reference annotation, and so on. While there are several ongoing efforts to provide formats and tools for such annotations and to publish annotated linguistic databases, the lack of widely accepted standards is becoming a critical problem. Proposed standards, to the extent they exist, have focussed on file formats. This paper focuses instead on the logical structure of linguistic annotations. We survey a wide variety of existing annotation formats and demonstrate a common conceptual core, the annotation graph. This provides a formal framework for constructing, maintaining and searching linguistic annotations, while remaining consistent with many alternative data structures and file formats.Comment: 49 page

    Conceptual design of an XML FACT repository for dispersed XML document warehouses and XML marts

    Get PDF
    Since the introduction of eXtensible Markup Language (XML), XML repositories have gained a foothold in many global (and government) organizations, where, e-Commerce and e-business models have maturated in handling daily transactional data among heterogeneous information systems in multi-data formats. Due to this, the amount of data available for enterprise decision-making process is increasing exponentially and are being stored and/or communicated in XML. This presents an interesting challenge to investigate models, frameworks and techniques for organizing and analyzing such voluminous, yet distributed XML documents for business intelligence in the form of XML warehouse repositories and XML marts. In this paper, we address such an issue, where we propose a view-driven approach for modelling and designing of a Global XML FACT (GxFACT) repository under the MDA initiatives. Here we propose the GxFACT using logically grouped, geographically dispersed, XML document warehouses and Document Marts in a global enterprise setting. To deal with organizations? evolving decision-making needs, we also provide three design strategies for building and managing of such GxFACT in the context of modelling of further hierarchical dimensions and/or global document warehouses

    Learner Corpora without Error Tagging

    Get PDF
    The article explores the possibility of adopting a form-to-function perspective when annotating learner corpora in order to get deeper insights about systematic features of interlanguage. A split between forms and functions (or categories) is desirable in order to avoid the "comparative fallacy" and because – especially in basic varieties – forms may precede functions (e.g., what resembles to a "noun" might have a different function or a function may show up in unexpected forms). In the computer-aided error analysis tradition, all items produced by learners are traced to a grid of error tags which is based on the categories of the target language. Differently, we believe it is possible to record and make retrievable both words and sequence of characters independently from their functional-grammatical label in the target language. For this purpose at the University of Pavia we adapted a probabilistic POS tagger designed for L1 on L2 data. Despite the criticism that this operation can raise, we found that it is better to work with "virtual categories" rather than with errors. The article outlines the theoretical background of the project and shows some examples in which some potential of SLA-oriented (non error-based) tagging will be possibly made clearer

    Natural language processing

    Get PDF
    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of www and digital libraries ; and (iv) evaluation of NLP systems

    Smart Environmental Data Infrastructures: Bridging the Gap between Earth Sciences and Citizens

    Get PDF
    The monitoring and forecasting of environmental conditions is a task to which much effort and resources are devoted by the scientific community and relevant authorities. Representative examples arise in meteorology, oceanography, and environmental engineering. As a consequence, high volumes of data are generated, which include data generated by earth observation systems and different kinds of models. Specific data models, formats, vocabularies and data access infrastructures have been developed and are currently being used by the scientific community. Due to this, discovering, accessing and analyzing environmental datasets requires very specific skills, which is an important barrier for their reuse in many other application domains. This paper reviews earth science data representation and access standards and technologies, and identifies the main challenges to overcome in order to enable their integration in semantic open data infrastructures. This would allow non-scientific information technology practitioners to devise new end-user solutions for citizen problems in new application domainsThis research was co-funded by (i) the TRAFAIR project (2017-EU-IA-0167), co-financed by the Connecting Europe Facility of the European Union, (ii) the RADAR-ON-RAIA project (0461_RADAR_ON_RAIA_1_E) co-financed by the European Regional Development Fund (ERDF) through the Iterreg V-A Spain-Portugal program (POCTEP) 2014-2020, and (iii) the Consellería de Educación, Universidade e Formación Profesional of the regional government of Galicia (Spain), through the support for research groups with growth potential (ED431B 2018/28)S
    corecore