12 research outputs found

    Software Infrastructure for Natural Language Processing

    Full text link
    We classify and review current approaches to software infrastructure for research, development and delivery of NLP systems. The task is motivated by a discussion of current trends in the field of NLP and Language Engineering. We describe a system called GATE (a General Architecture for Text Engineering) that provides a software infrastructure on top of which heterogeneous NLP processing modules may be evaluated and refined individually, or may be combined into larger application systems. GATE aims to support both researchers and developers working on component technologies (e.g. parsing, tagging, morphological analysis) and those working on developing end-user applications (e.g. information extraction, text summarisation, document generation, machine translation, and second language learning). GATE promotes reuse of component technology, permits specialisation and collaboration in large-scale projects, and allows for the comparison and evaluation of alternative technologies. The first release of GATE is now available; see http://www.dcs.shef.ac.uk/research/groups/nlp/gate/
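    The component-based design the abstract describes — independent modules that can be evaluated individually or chained into larger applications — can be sketched as a minimal pipeline. All names here are illustrative and are not GATE's actual API.

```python
# Minimal sketch of a component pipeline in the spirit of GATE:
# each component reads and enriches a shared document record.
# The component names are invented, not GATE's real API.

def tokenize(doc):
    doc["tokens"] = doc["text"].split()
    return doc

def tag(doc):
    # Toy tagger: capitalised tokens get NNP, everything else NN.
    doc["tags"] = [("NNP" if t[0].isupper() else "NN") for t in doc["tokens"]]
    return doc

def run_pipeline(text, components):
    """Thread a document dict through a list of interchangeable components."""
    doc = {"text": text}
    for component in components:
        doc = component(doc)
    return doc

doc = run_pipeline("GATE supports component reuse", [tokenize, tag])
print(doc["tags"])  # → ['NNP', 'NN', 'NN', 'NN']
```

    Because every component shares the same document interface, an alternative tagger can be swapped in for comparison and evaluation without touching the rest of the pipeline.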

    LREP: A language repository exchange protocol

    No full text
    The recent increase in the number and complexity of the language resources available on the Internet has been accompanied by a similar increase in the number of available tools for linguistic analysis. Ideally, the user should not be confronted with the question of how to match tools with resources. If resource repositories and tool repositories offer adequate metadata and a suitable exchange protocol is developed, this matching process could be performed (semi-)automatically.
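    The (semi-)automatic matching the abstract envisions — pairing analysis tools with resources via repository metadata — might look like the following sketch. The metadata fields are invented for illustration and do not reflect LREP's actual schema.

```python
# Sketch of metadata-driven matching of analysis tools to language
# resources. The fields ("format", "language", ...) are invented and
# are not LREP's actual schema.

resources = [
    {"name": "corpus-de", "format": "TEI", "language": "de"},
    {"name": "corpus-en", "format": "plain", "language": "en"},
]

tools = [
    {"name": "tei-tagger", "accepts": {"TEI"}, "languages": {"de", "en"}},
    {"name": "plain-parser", "accepts": {"plain"}, "languages": {"en"}},
]

def match(resources, tools):
    """Pair each resource with every tool whose metadata is compatible."""
    pairs = []
    for r in resources:
        for t in tools:
            if r["format"] in t["accepts"] and r["language"] in t["languages"]:
                pairs.append((r["name"], t["name"]))
    return pairs

print(match(resources, tools))
# → [('corpus-de', 'tei-tagger'), ('corpus-en', 'plain-parser')]
```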

    Basic tasks of automatic text processing and approaches to their solution

    Get PDF
    Section 2: Intelligent Information Systems. This article analyses the main approaches to solving the automatic text processing tasks that arise in building high-technology intelligent systems intended to replace human labour in intellectual work that relies on the use of natural language.

    Implementing open access to language resources — results of the Silfide project and perspectives.

    Get PDF
    This paper presents the main technologies implemented within the Silfide project, which aim at providing distributed access to language resources. In particular, it shows the role of standardized formats (XML, TEI) and presents a mechanism for distributed queries across several servers.
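    The distributed-query mechanism could be sketched as fanning a query out to several servers and merging the answers. The in-memory "servers" below are stand-ins for networked Silfide nodes, not the project's actual protocol.

```python
# Sketch of a distributed query fanned out across several resource
# servers, with merged, de-duplicated results. The in-memory dict
# stands in for networked Silfide servers.

servers = {
    "server-a": ["le corpus A", "la ressource B"],
    "server-b": ["la ressource B", "le corpus C"],
}

def query_all(servers, term):
    """Send the same query to every server and merge unique hits."""
    seen, merged = set(), []
    for name, holdings in servers.items():
        for item in holdings:
            if term in item and item not in seen:
                seen.add(item)
                merged.append(item)
    return merged

print(query_all(servers, "ressource"))  # → ['la ressource B']
print(query_all(servers, "corpus"))     # → ['le corpus A', 'le corpus C']
```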

    LREP: A Language Repository Exchange Protocol

    Get PDF
    Refereed international conference paper. The recent increase in the number and complexity of the language resources available on the Internet has been accompanied by a similar increase in the number of available tools for linguistic analysis. Ideally, the user should not be confronted with the question of how to match tools with resources. If resource repositories and tool repositories offer adequate metadata and a suitable exchange protocol is developed, this matching process could be performed (semi-)automatically.

    The Mate Workbench - a tool for annotating XML corpora

    Get PDF
    This paper describes the design and implementation of the MATE workbench, a program which provides support for flexible display and editing of XML annotations and complex querying of a set of linked files. The workbench was designed to support the annotation of XML-coded linguistic corpora, but it could be used to annotate any kind of data, as it is not dependent on any particular annotation scheme. Rather than being a general-purpose XML-aware editor, it is a system for writing specialised editors tailored to a particular annotation task. A particular editor is defined using a transformation language, with suitable display formats and allowable editing operations. The workbench is written in Java, which means that it is platform-independent. This paper outlines the design of the workbench software and compares it with other annotation programs.
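    The idea of defining a specialised editor through a transformation — mapping each annotation element to a display form — can be illustrated as follows. The rule format here is invented and far simpler than MATE's actual transformation language.

```python
# Sketch of a display transformation over XML annotations, in the
# spirit of defining a specialised view per annotation task. The
# rule format is invented, much simpler than MATE's language.
import xml.etree.ElementTree as ET

XML = '<turn speaker="A"><w pos="UH">well</w> <w pos="PRP">I</w></turn>'

# Each rule maps an element tag to a rendering function that receives
# the element and its already-rendered body.
rules = {
    "turn": lambda e, body: f'{e.get("speaker")}: {body}',
    "w": lambda e, body: f'{body}/{e.get("pos")}',
}

def render(elem):
    """Recursively apply the display rules to an annotation tree."""
    body = (elem.text or "") + "".join(
        render(child) + (child.tail or "") for child in elem
    )
    return rules.get(elem.tag, lambda e, b: b)(elem, body)

print(render(ET.fromstring(XML)))  # → A: well/UH I/PRP
```

    Swapping in a different rule table yields a different specialised view of the same annotated data, which is the core of the editor-per-task approach.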

    Factors that Influence the Synergy between Development and IT Operations in a DevOps Environment

    Get PDF
    Software development processes have long been associated with severe conflicts between development and operations teams. These problems are worsened further when activities such as planning, testing, integration, and releases are performed only occasionally. Many emerging software development concepts attempt to address these challenges. For instance, continuous integration is a practice that has emerged to reduce disconnects between development and IT operational deployments. In a comparable thread, the current emphasis on DevOps acknowledges that the integration between software development and its operational deployment needs to be a continuous whole. Problems involving the integration of software development and operations require positive synergy within DevOps teams. Team synergy brings about team effectiveness and performance, and creates opportunities for innovation. The purpose of this study is to identify the factors that influence team synergy between the development and operations teams in a DevOps environment. The researcher conducted a case study at one of South Africa's leading information and communication technology services providers. Thirteen participants were interviewed to provide insight into the research questions. Interviews were conducted at the premises of the participating organization in Cape Town. The participants chose pseudonyms instead of their actual names to preserve anonymity. Interviews were transcribed and analysed using thematic analysis. During the analysis of the transcribed data, themes and categories were identified, and those that emerged from the data sources were aligned to the theoretical framework. The findings from this study describe enabling and inhibiting factors that influence the synergy between development and operations teams in a DevOps environment.
Recognizing that DevOps teams face several challenges, the factors identified in this study provide insights into how organizations can build and motivate their DevOps teams to achieve team synergy. The contribution to DevOps research is the application of a theoretical framework that highlights the importance of team social capital dimensions in the formation of team synergy. Based on its findings, this study recommends further investigation into, and improvement of, strategies to mitigate the factors that inhibit the dimensions of team social capital and prevent team synergy in a DevOps environment. The study also recommends a more detailed and practical demonstration to validate the value of the theoretical framework and to continue improving or extending it. This study revealed that DevOps teams operate in a complex and dynamic environment with many stakeholders and complex technical infrastructure. Based on this outcome, the study suggests that future work take a different approach and create a different perspective on the synergy within DevOps teams by focusing on the behavior of the actors and on complex problematic situations involving social activities.

    MULTI-MODAL TASK INSTRUCTIONS TO ROBOTS BY NAIVE USERS

    Get PDF
    This thesis presents a theoretical framework for the design of user-programmable robots. The objective of the work is to investigate multi-modal, unconstrained natural instructions given to robots in order to design a learning robot. A corpus-centred approach is used to design an agent that can reason, learn and interact with a human in a natural, unconstrained way. The corpus-centred design approach is formalised and developed in detail. It requires the developer to record a human during interaction and to analyse the recordings to find instruction primitives, which are then implemented in a robot. The focus of this work has been on how to combine speech and gesture using rules extracted from the analysis of a corpus. A multi-modal integration algorithm is presented that uses timing and semantics to group, match and unify gesture and language. The algorithm always achieves correct pairings on the corpus and initiates questions to the user in ambiguous cases or when information is missing. The domain of card games has been investigated because of its variety of games, which are rich in rules and contain sequences. A further focus of the work is the translation of rule-based instructions; most multi-modal interfaces to date have only considered sequential instructions. A combination of frame-based reasoning, a knowledge base organised as an ontology, and a problem-solver engine is used to store these rules. Understanding rule instructions, which contain conditional and imaginary situations, requires an agent with complex reasoning capabilities. A test system for the agent implementation is also described, along with tests that confirm the implementation by playing back the corpus. Furthermore, deployment test results with the implemented agent and human subjects are presented and discussed.
The tests showed that the rate of errors caused by sentences not being covered by the grammar does not decrease at an acceptable rate when new grammar rules are introduced. This was particularly the case for complex verbal rule instructions, which can be expressed in a large variety of ways.
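    The grouping-and-matching step the abstract describes — using timing and semantic compatibility to unify gesture and speech, and asking the user when a referent cannot be resolved — might be sketched as below. The event structures are invented for illustration and are not the thesis's actual representations.

```python
# Sketch of multi-modal integration: pair each deictic word in the
# speech stream with the gesture closest in time, and turn unresolved
# references into questions back to the user. The event structures
# are invented, not the thesis's actual representations.

speech = [
    {"word": "take", "t": 0.2},
    {"word": "this", "t": 0.5, "deictic": True},
    {"word": "card", "t": 0.8},
]
gestures = [{"kind": "point", "target": "card_7", "t": 0.6}]

def integrate(speech, gestures, max_gap=0.5):
    """Unify deictic words with temporally close gestures."""
    resolved, questions = [], []
    for token in speech:
        if not token.get("deictic"):
            continue
        near = [g for g in gestures if abs(g["t"] - token["t"]) <= max_gap]
        if near:
            best = min(near, key=lambda g: abs(g["t"] - token["t"]))
            resolved.append((token["word"], best["target"]))
        else:
            questions.append(f'Which object did you mean by "{token["word"]}"?')
    return resolved, questions

print(integrate(speech, gestures))
# → ([('this', 'card_7')], [])
```

    With no gesture inside the time window, the same call would return an empty pairing list and a clarification question, mirroring the algorithm's behaviour for missing information.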

    Semantic Routing in Peer-to-Peer Systems

    Get PDF
    Currently, search engines like Google, Yahoo and Excite are centralized, which means that all queries users post are sent to some large server (or server group) that handles them. In this way it is easy for the systems to relate IP addresses to the queries posted from them; clearly, privacy is a problem here. Censoring out information deemed not 'appropriate' is also simple, as recent examples have shown. To give users more privacy and make censoring information more difficult, Peer-to-Peer (P2P) systems are a good alternative to the centralized approach. In P2P systems the search functionality can be divided over a large group of autonomous computers (peers), where each computer holds only a very small piece of the information instead of everything. The problem in such a distributed system is to make the search process efficient in terms of bandwidth, storage, time and CPU usage. This Ph.D. thesis describes three approaches that try to reach the goal of finding short routes between seekers and providers with high efficiency. These routing algorithms are all applied to 'Semantic Overlay Networks' (SONs). In a SON, peers maintain pointers to semantically relevant peers based on content descriptions, which enables them to choose the relevant peers for a query instead of, for example, choosing random peers. This work tries to show that decentralized search algorithms based on semantic routing are a good alternative to centralized approaches. (Promotor: F.A.H. van Harmelen)
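    Routing in a Semantic Overlay Network — forwarding a query to the neighbours whose content descriptions are most similar to it, rather than to random peers — can be sketched with a simple set-overlap measure. This is a stand-in for the thesis's actual routing algorithms.

```python
# Sketch of semantic routing: forward a query to the neighbours whose
# content descriptions overlap most with the query terms (Jaccard
# similarity) instead of picking random peers. A stand-in for the
# thesis's actual algorithms.

neighbours = {
    "peer-1": {"jazz", "blues", "music"},
    "peer-2": {"cooking", "recipes"},
    "peer-3": {"music", "guitar"},
}

def jaccard(a, b):
    """Set-overlap similarity in [0, 1]."""
    return len(a & b) / len(a | b) if a | b else 0.0

def route(query_terms, neighbours, fanout=2):
    """Return the `fanout` semantically closest neighbours for a query."""
    ranked = sorted(
        neighbours,
        key=lambda p: jaccard(query_terms, neighbours[p]),
        reverse=True,
    )
    return ranked[:fanout]

print(route({"music", "guitar"}, neighbours))  # → ['peer-3', 'peer-1']
```

    Forwarding to only the top-ranked neighbours is what keeps bandwidth and CPU usage low compared with flooding every peer.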

    Integrating deep and shallow natural language processing components : representations and hybrid architectures

    Get PDF
    We describe basic concepts and software architectures for the integration of shallow and deep (linguistics-based, semantics-oriented) natural language processing (NLP) components. The main goal of this novel, hybrid integration paradigm is improving robustness of deep processing. After an introduction to constraint-based natural language parsing, we give an overview of typical shallow processing tasks. We introduce XML standoff markup as an additional abstraction layer that eases integration of NLP components, and propose the use of XSLT as a standardized and efficient transformation language for online NLP integration. In the main part of the thesis, we describe our contributions to three hybrid architecture frameworks that make use of these fundamentals. SProUT is a shallow system that uses elements of deep constraint-based processing, namely type hierarchy and typed feature structures. WHITEBOARD is the first hybrid architecture to integrate not only part-of-speech tagging, but also named entity recognition and topological parsing, with deep parsing. Finally, we present Heart of Gold, a middleware architecture that generalizes WHITEBOARD into various dimensions such as configurability, multilinguality and flexible processing strategies. We describe various applications that have been implemented using the hybrid frameworks such as structured named entity recognition, information extraction, creative document authoring support, deep question analysis, as well as evaluations. In WHITEBOARD, e.g., it could be shown that shallow pre-processing increases both coverage and efficiency of deep parsing by a factor of more than two. 
Heart of Gold not only forms the basis for applications that utilize semantics-oriented natural language analysis, but also constitutes a complex research instrument for experimenting with novel processing strategies combining deep and shallow methods, and eases replication and comparability of results.
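    The XML standoff markup the abstract introduces — annotations stored apart from the text and anchored by character offsets, so that several components can annotate the same base text without interfering — can be illustrated schematically. The layer names and offsets below are invented examples, not the thesis's concrete formats.

```python
# Sketch of standoff annotation: the base text stays untouched while
# each component contributes annotations that point into it by
# character offsets. Layer names and offsets are invented examples.

text = "Heart of Gold integrates deep and shallow NLP."

annotations = [
    {"layer": "ne", "type": "system", "start": 0, "end": 13},
    {"layer": "pos", "type": "VBZ", "start": 14, "end": 24},
]

def project(text, annotations, layer):
    """Recover the annotated spans of one layer from the base text."""
    return [
        (a["type"], text[a["start"]:a["end"]])
        for a in annotations
        if a["layer"] == layer
    ]

print(project(text, annotations, "ne"))  # → [('system', 'Heart of Gold')]
```

    Because each layer only references offsets, a shallow named-entity component and a deep parser can add annotations independently, which is the abstraction that eases integrating components in the hybrid architectures described above.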