1,490 research outputs found

    ATLAS: A flexible and extensible architecture for linguistic annotation

    We describe a formal model for annotating linguistic artifacts, from which we derive an application programming interface (API) to a suite of tools for manipulating these annotations. The abstract logical model provides for a range of storage formats and promotes the reuse of tools that interact through this API. We focus first on "Annotation Graphs," a graph model for annotations on linear signals (such as text and speech) indexed by intervals, for which efficient database storage and querying techniques are applicable. We note how a wide range of existing annotated corpora can be mapped to this annotation graph model. This model is then generalized to encompass a wider variety of linguistic "signals," including both naturally occurring phenomena (as recorded in images, video, multi-modal interactions, etc.), as well as the derived resources that are increasingly important to the engineering of natural language processing systems (such as word lists, dictionaries, aligned bilingual corpora, etc.). We conclude with a review of the current efforts towards implementing key pieces of this architecture. Comment: 8 pages, 9 figures

    Cross-Platform Text Mining and Natural Language Processing Interoperability - Proceedings of the LREC2016 conference

    No abstract available

    Learning Transfers over Several Programming Languages

    Large language models (LLMs) have recently become remarkably good at improving developer productivity for high-resource programming languages. These models use two kinds of data: large amounts of unlabeled code samples for pretraining and relatively smaller amounts of labeled code samples for fine-tuning or in-context learning. Unfortunately, many programming languages are low-resource, lacking labeled samples for most tasks and often even lacking unlabeled samples. Therefore, users of low-resource languages (e.g., legacy or new languages) miss out on the benefits of LLMs. Cross-lingual transfer learning uses data from a source language to improve model performance on a target language. It has been well-studied for natural languages, but has received little attention for programming languages. This paper reports extensive experiments on four tasks using a transformer-based LLM and 11 to 41 programming languages to explore the following questions. First, how well cross-lingual transfer works for a given task across different language pairs. Second, given a task and target language, how to best choose a source language. Third, the characteristics of a language pair that are predictive of transfer performance, and fourth, how that depends on the given task. Comment: 16 pages, 5 figures, 5 tables

    E(nhanced)-research and the future role and tasks of research libraries

    Presentation at the University of Tartu Library as part of the German-Estonian academic week Academica, 4 November 2008.

    Web services for distributed and interoperable hydro-information systems

    Web services support the integration and interoperability of Web-based applications and enable machine-to-machine interaction. The concepts of web services and open distributed architecture were applied to the development of T-DSS, a prototype customised for web-based hydro-information systems. T-DSS provides mapping services, database-related services and access to remote components, with special emphasis placed on output flexibility (e.g. multilingualism); SOAP web services are mainly used for communication. The remote components are represented above all by remote data and mapping services (e.g. meteorological predictions) and by modelling and analytical systems (currently HEC-HMS, MODFLOW and additional utilities), which support decision making in water management.

    AXMEDIS 2007 Conference Proceedings

    The AXMEDIS International Conference series was established in 2005 and focuses on research, developments and applications in the cross-media domain, exploring innovative technologies to meet the challenges of the sector. AXMEDIS2007 deals with all subjects and topics related to cross-media and digital-media content production, processing, management, standards, representation, sharing, interoperability, protection and rights management. It addresses the latest developments and future trends of the technologies and their applications, and their impact and exploitation within academic, business and industrial communities.

    Selected proceedings of the 50th Linguistic Symposium on Romance Languages

    Synopsis: The present volume presents a selection of the revised and peer-reviewed proceedings articles of the 50th Linguistic Symposium on Romance Languages (LSRL 50), which was hosted virtually by the faculty and students of the University of Texas at Austin. With contributions from rising and senior scholars from Europe and the Americas, the volume demonstrates the breadth of research in contemporary Romance linguistics with articles that apply corpus-based and laboratory methods, as well as theory, to explore the structure, use, and development of the Romance languages. The articles cover a wide range of fields including morphosyntax, semantics, language variation and change, sociophonetics, historical linguistics, language acquisition, and computational linguistics. In an introductory article, the editors document the sudden transition of LSRL 50 to a virtual format and acknowledge those who helped them to ensure the continuity of this annual scholarly meeting.