47,186 research outputs found

    GATE -- an Environment to Support Research and Development in Natural Language Engineering

    We describe a software environment to support research and development in natural language (NL) engineering. This environment -- GATE (General Architecture for Text Engineering) -- aims to advance research in the area of machine processing of natural languages by providing a software infrastructure on top of which heterogeneous NL component modules may be evaluated and refined individually or may be combined into larger application systems. Thus, GATE aims to support both researchers and developers working on component technologies (e.g. parsing, tagging, morphological analysis) and those working on developing end-user applications (e.g. information extraction, text summarisation, document generation, machine translation, and second language learning). GATE will promote reuse of component technology, permit specialisation and collaboration in large-scale projects, and allow for the comparison and evaluation of alternative technologies. The first release of GATE is now available.
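The component architecture described above can be illustrated with a minimal sketch. The interfaces below are invented for illustration (GATE itself is a Java system with a much richer plugin model); the point is only that independently developed components share a common document representation and can be swapped, evaluated in isolation, or chained into an application.

```python
# Hypothetical component interface: each component takes and returns a
# shared document dict, so heterogeneous modules compose into a pipeline.

def tokenizer(doc):
    # naive whitespace tokenizer standing in for a real component
    doc["tokens"] = doc["text"].split()
    return doc

def tagger(doc):
    # placeholder tagger: labels every token "NN" for demonstration only
    doc["tags"] = [(t, "NN") for t in doc["tokens"]]
    return doc

def pipeline(doc, components):
    # chain components; any one can be replaced to compare alternatives
    for component in components:
        doc = component(doc)
    return doc

result = pipeline({"text": "GATE supports reuse"}, [tokenizer, tagger])
print(result["tags"])
```

Because each stage only depends on the shared document structure, an alternative tagger can be benchmarked against this one without touching the rest of the chain, which is the reuse-and-comparison goal the abstract describes.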

    New Methods, Current Trends and Software Infrastructure for NLP

    The increasing use of `new methods' in NLP, which the NeMLaP conference series exemplifies, occurs in the context of a wider shift in the nature and concerns of the discipline. This paper begins with a short review of this context and significant trends in the field. The review motivates and leads to a set of requirements for support software of general utility for NLP research and development workers. A freely-available system designed to meet these requirements is described (called GATE - a General Architecture for Text Engineering). Information Extraction (IE), in the sense defined by the Message Understanding Conferences (ARPA \cite{Arp95}), is an NLP application in which many of the new methods have found a home (Hobbs \cite{Hob93}; Jacobs ed. \cite{Jac92}). An IE system based on GATE is also available for research purposes, and this is described. Lastly we review related work.
    Comment: 12 pages, LaTeX, uses nemlap.sty (included)

    Crowdsourcing Linked Data on listening experiences through reuse and enhancement of library data

    Research has approached the practice of musical reception in a multitude of ways, such as the analysis of professional critique, sales figures and psychological processes activated by the act of listening. Studies in the Humanities, on the other hand, have been hindered by the lack of structured evidence of actual experiences of listening as reported by the listeners themselves, a concern that has been voiced since the early Web era. It was however assumed that such evidence existed, albeit in pure textual form, but could not be leveraged until it was digitised and aggregated. The Listening Experience Database (LED) responds to this research need by providing a centralised hub for evidence of listening in the literature. Not only does LED support search and reuse across nearly 10,000 records, but it also provides machine-readable structured data of the knowledge around the contexts of listening. To take advantage of the mass of formal knowledge that already exists on the Web concerning these contexts, the entire framework adopts Linked Data principles and technologies. This also allows LED to directly reuse open data from the British Library for the source documentation that is already published. Reused data are re-published as open data with enhancements obtained by expanding over the model of the original data, such as the partitioning of published books and collections into individual stand-alone documents. The database was populated through crowdsourcing and seamlessly incorporates data reuse from the very early data entry phases. As the sources of the evidence often contain vague, fragmentary or uncertain information, facilities were put in place to generate structured data out of such fuzziness.
Alongside elaborating on these functionalities, this article provides insights into the most recent features of the latest instalment of the dataset and portal, such as the interlinking with the MusicBrainz database, the relaxation of geographical input constraints through text mining, and the plotting of key locations in an interactive geographical browser.
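The Linked Data reuse the abstract describes can be sketched in a few lines. The namespaces, property names, and identifiers below are invented for illustration (they are not the actual LED schema or British Library URIs); the sketch only shows how a crowdsourced record expressed as subject-predicate-object triples can point at an externally published source identifier instead of re-entering its metadata.

```python
# Hypothetical namespaces standing in for the LED vocabulary and the
# reused British Library resource identifiers.
LED = "http://example.org/led/"
BL = "http://example.org/bl/resource/"

# One listening experience as a small set of (subject, predicate, object)
# triples; the fuzzy date is kept as free text, mirroring the facilities
# for vague or uncertain evidence described above.
experience = [
    (LED + "experience/42", LED + "hasListener", LED + "person/jane-doe"),
    (LED + "experience/42", LED + "hasSource", BL + "book/0001"),
    (LED + "experience/42", LED + "date", "1823 (approximate)"),
]

# Reuse in action: the source document is identified by the external
# library URI, so its already-published metadata can be linked to.
sources = {o for s, p, o in experience if p.endswith("hasSource")}
print(sources)
```

In a real deployment the triples would be published with standard RDF tooling; the structural idea, however, is just this: shared URIs make the external data directly addressable from the crowdsourced records.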

    Safe to Be Open: Study on the Protection of Research Data and Recommendations for Access and Usage

    Openness has become a common concept in a growing number of scientific and academic fields. Expressions such as Open Access (OA) or Open Content (OC) are often employed for publications of papers and research results, or are contained as conditions in tenders issued by a number of funding agencies. More recently the concept of Open Data (OD) is of growing interest in some fields, particularly those that produce large amounts of data – which are not usually protected by standard legal tools such as copyright. However, a thorough understanding of the meaning of Openness – especially its legal implications – is usually lacking. Open Access, Public Access, Open Content, Open Data, Public Domain: all these terms are often employed to indicate that a given paper, repository or database does not fall under the traditional “closed” scheme of default copyright rules. However, the differences between these terms are often largely ignored or misrepresented, especially when the scientist in question is not familiar with the law generally and copyright in particular – a very common situation in all scientific fields. On 17 July 2012 the European Commission published its Communication to the European Parliament and the Council entitled “Towards better access to scientific information: Boosting the benefits of public investments in research”. As the Commission observes, “discussions of the scientific dissemination system have traditionally focused on access to scientific publications – journals and monographs. However, it is becoming increasingly important to improve access to research data (experimental results, observations and computer-generated information), which forms the basis for the quantitative analysis underpinning many scientific publications”.
The Commission believes that through more complete and wider access to scientific publications and data, the pace of innovation will accelerate and researchers will collaborate so that duplication of effort will be avoided. Moreover, open research data will allow other researchers to build on previous research results, and will allow the involvement of citizens and society in the scientific process. In the Communication the Commission makes explicit reference to open access models of publication and dissemination of research results, and the reference is not only to access and use but, most significantly, to reuse of publications as well as research data. The Communication marks an official new step on the road to open access to publicly funded research results in science and the humanities in Europe. Scientific publications are no longer the only elements of its open access policy: research data upon which publications are based should now also be made available to the public. As noble as the open access goal is, however, the expansion of the open access policy to publicly funded research data raises a number of legal and policy issues that are often distinct from those concerning the publication of scientific articles and monographs. Since open access to research data – rather than publications – is a relatively new policy objective, less attention has been paid to the specific features of research data. An analysis of the legal status of such data, and of how to make it available under the correct licence terms, is therefore the subject of the following sections.

    Hypotheses, evidence and relationships: The HypER approach for representing scientific knowledge claims

    Biological knowledge is increasingly represented as a collection of (entity-relationship-entity) triplets. These are queried, mined, appended to papers, and published. However, this representation ignores the argumentation contained within a paper and the relationships between hypotheses, claims and evidence put forth in the article. In this paper, we propose an alternative view of the research article as a network of 'hypotheses and evidence'. Our knowledge representation focuses on scientific discourse as a rhetorical activity, which leads to a different direction in the development of tools and processes for modeling this discourse. We propose to extract knowledge from the article to allow the construction of a system where a specific scientific claim is connected, through trails of meaningful relationships, to experimental evidence. We discuss some current efforts and future plans in this area.
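The contrast the abstract draws can be made concrete with a small sketch. The structures and names below are illustrative only (they are not HypER's actual representation): the first shows the flat triplet view, the second a claim node linked through typed relationships to its hypothesis and supporting evidence, so a trail can be followed from claim to experiment.

```python
# Flat view: isolated (entity, relationship, entity) triplets, with no
# record of the argumentation that produced them.
triplets = [("geneA", "activates", "geneB")]

# Network view (hypothetical schema): the same assertion as a claim node
# whose edges capture its place in the paper's argument.
claims = {
    "claim-1": {
        "statement": ("geneA", "activates", "geneB"),
        "supported_by": ["experiment-3"],   # trail to experimental evidence
        "refines": ["hypothesis-1"],        # relationship to a hypothesis
    }
}

# Following the trail of meaningful relationships from claim to evidence:
evidence = claims["claim-1"]["supported_by"]
print(evidence)
```

The triplet survives unchanged inside the claim node; what the network adds is exactly the argumentative context that the flat representation discards.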

    Ocean Governance

    There is a range of legal instruments, institutions, and organizations that collectively establish rules and policies for managing, conserving, and using the ocean. The United Nations Convention on the Law of the Sea (UNCLOS) provides the overarching legal framework for ocean governance and management on a global scale, but there are a number of other important ocean governance-related institutions, instruments and processes. This document provides a brief overview of those institutions and processes that are most relevant to multi-sectoral business and industry interests, with a particular emphasis on opportunities for industry to get involved in the policy-making process. It does not include policies, institutions, and processes that are primarily relevant to a single sector. After first reviewing key aspects of UNCLOS, this document discusses other key ocean policy and governance processes and bodies.

    Property and the Construction of the Information Economy: A Neo-Polanyian Ontology

    This chapter considers the changing roles and forms of information property within the political economy of informational capitalism. I begin with an overview of the principal methods used in law and in media and communications studies, respectively, to study information property, considering both what each disciplinary cluster traditionally has emphasized and newer, hybrid directions. Next, I develop a three-part framework for analyzing information property as a set of emergent institutional formations that both work to produce and are themselves produced by other evolving political-economic arrangements. The framework considers patterns of change in existing legal institutions for intellectual property, the ongoing dematerialization and datafication of both traditional and new inputs to economic production, and the emerging logics of economic organization within which information resources (and property rights) are mobilized. Finally, I consider the implications of that framing for two very different contemporary information property projects, one relating to data flows within platform-based business models and the other to information commons.

    A Factoid Question Answering System for Vietnamese

    In this paper, we describe the development of an end-to-end factoid question answering system for the Vietnamese language. This system combines both statistical models and ontology-based methods in a chain of processing modules to provide high-quality mappings from natural language text to entities. We present the challenges in the development of such an intelligent user interface for an isolating language like Vietnamese and show that techniques developed for inflectional languages cannot be applied "as is". Our question answering system can answer a wide range of general knowledge questions with promising accuracy on a test set.
    Comment: In the proceedings of the HQA'18 workshop, The Web Conference Companion, Lyon, France
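The chain-of-modules idea, and why segmentation matters for an isolating language, can be sketched in miniature. Everything below is invented for illustration (the paper's actual modules are statistical and far richer than this lookup): a naive segmenter feeds candidate spans to an entity-linking step over a toy ontology, and multi-syllable candidates are needed because Vietnamese words frequently span several whitespace-delimited syllables.

```python
# Toy knowledge base: surface form -> hypothetical entity identifier.
ontology = {"Hà Nội": "entity:Q1858"}

def segment(question):
    # naive whitespace segmentation; real Vietnamese word segmentation is
    # itself a hard problem, which is one reason inflectional-language
    # techniques cannot be applied "as is"
    return question.split()

def candidates(tokens):
    # unigrams plus bigrams, since entities like "Hà Nội" span two syllables
    return tokens + [" ".join(p) for p in zip(tokens, tokens[1:])]

def link_entity(question, kb):
    # final module in the chain: map candidate spans to known entities
    for c in candidates(segment(question)):
        if c in kb:
            return kb[c]
    return None

print(link_entity("Hà Nội thuộc nước nào", ontology))  # -> entity:Q1858
```

A purely unigram lookup would miss "Hà Nội" entirely here, which is a one-line illustration of why the pipeline must treat multi-syllable spans as first-class candidates.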