
    Recent advances in Apertium, a free/open-source rule-based machine translation platform for low-resource languages

    This paper presents an overview of Apertium, a free and open-source rule-based machine translation platform. Translation in Apertium happens through a pipeline of modular tools, and the platform continues to be improved as more language pairs are added. Several advances have been implemented since the last publication, including some new optional modules: a module that allows rules to process recursive structures at the structural transfer stage, a module that deals with contiguous and discontiguous multi-word expressions, and a module that resolves anaphora to aid translation. Also highlighted is the hybridisation of Apertium through statistical modules that augment the pipeline, and statistical methods that augment existing modules. This includes morphological disambiguation, weighted structural transfer, and lexical selection modules that learn from limited data. The paper also discusses how a platform like Apertium can be a critical part of access to language technology for so-called low-resource languages, which might be ignored or deemed unapproachable by popular corpus-based translation technologies. Finally, the paper presents some of the released and unreleased language pairs, concluding with a brief look at some supplementary Apertium tools that prove valuable to users as well as language developers. All Apertium-related code, including language data, is free/open-source and available at https://github.com/apertium
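    The abstract describes translation as a pipeline of modular tools into which optional modules (for example, multi-word expression handling) can be added. In Apertium itself the stages are separate programs. The sketch below models only the general idea of composing interchangeable stages, in plain Python. Every stage here (`analyse`, `transfer`, `generate`, `join_mwe`) is a hypothetical toy and does not correspond to Apertium's actual modules or formats.

```python
from functools import reduce
from typing import Callable, List

# A stage maps a token stream to a new token stream.
Stage = Callable[[List[str]], List[str]]


def run_pipeline(stages: List[Stage], tokens: List[str]) -> List[str]:
    """Feed each stage's output into the next stage, left to right."""
    return reduce(lambda data, stage: stage(data), stages, tokens)


def insert_stage(stages: List[Stage], index: int, stage: Stage) -> List[Stage]:
    """Return a new pipeline with an optional module spliced in at `index`."""
    return stages[:index] + [stage] + stages[index:]


# --- Hypothetical toy stages (not Apertium's real modules) ---
def analyse(tokens: List[str]) -> List[str]:
    # Toy morphological analysis: tag every token as a noun.
    return [f"{t}<n>" for t in tokens]


def transfer(tokens: List[str]) -> List[str]:
    # Toy structural transfer: reverse the word order.
    return list(reversed(tokens))


def generate(tokens: List[str]) -> List[str]:
    # Toy generation: strip the analysis tags again.
    return [t.replace("<n>", "") for t in tokens]


def join_mwe(tokens: List[str]) -> List[str]:
    # Toy optional module: merge the contiguous MWE "New York" into one token.
    out: List[str] = []
    i = 0
    while i < len(tokens):
        if tokens[i:i + 2] == ["New", "York"]:
            out.append("New_York")
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


BASE = [analyse, transfer, generate]

print(run_pipeline(BASE, ["big", "apple"]))  # ['apple', 'big']
print(run_pipeline(insert_stage(BASE, 0, join_mwe), ["New", "York", "city"]))  # ['city', 'New_York']
```

    Because each stage only consumes and produces a token stream, adding an optional module requires no change to the existing stages.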

    References to graphical objects in interactive multimodal queries

    This thesis describes a computational model for interpreting natural language expressions in an interactive multimodal query system integrating both natural language text and graphic displays. The primary concern of the model is to interpret expressions that might involve graphical attributes, and expressions whose referents could be objects on the screen. Graphical objects on the screen are used to visualise entities in the application domain and their attributes (in short, domain entities and domain attributes). This is why graphical objects are treated in the literature as descriptions of those domain entities/attributes. However, graphical objects and their attributes are visible during the interaction, and are thus known by its participants. Therefore, they themselves should be part of the mutual knowledge of the interaction. This poses some interesting problems in language processing. As part of the mutual knowledge, graphical attributes could be used in expressions, and graphical objects could be referred to by expressions. Consequently, there could be ambiguities about whether an attribute in an expression belongs to a graphical object or to a domain entity. There could also be ambiguities about whether the referent of an expression is a graphical object or a domain entity. The main contributions of this thesis consist of analysing the above ambiguities, and designing, implementing and testing a computational model and a demonstration system for resolving them. Firstly, a structure and corresponding terminology are set up so that these ambiguities can be clarified as ambiguities derived from referring to different databases, the screen or the application domain (source ambiguities). Secondly, a meaning representation language is designed which explicitly represents the information about which database an attribute/entity comes from. Several linguistic regularities inside and among referring expressions are described so that they can be used as heuristics in the ambiguity resolution. Thirdly, a computational model based on constraint satisfaction is constructed to resolve some reference ambiguities and source ambiguities simultaneously. Then, a demonstration system integrating natural language text and graphics is implemented, with the computational model at its core. This thesis ends with an evaluation of the computational model, which provides some concrete evidence about the advantages and disadvantages of the above approach.
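    The abstract says reference ambiguities and source ambiguities are resolved simultaneously by constraint satisfaction. The sketch below illustrates that idea only. Each referring expression is a variable whose domain holds candidate (source, referent) pairs, and heuristics act as constraints. The solver is plain generate-and-test, the simplest constraint-satisfaction strategy, not necessarily the thesis's own. The dialogue, candidates and both constraints are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# A candidate interpretation: (source database, referent).
Candidate = Tuple[str, str]
Assignment = Dict[str, Candidate]
Constraint = Callable[[Assignment], bool]


def solve(domains: Dict[str, List[Candidate]],
          constraints: List[Constraint]) -> List[Assignment]:
    """Generate-and-test CSP: keep every joint assignment satisfying all constraints."""
    names = list(domains)
    solutions: List[Assignment] = []
    for combo in product(*(domains[n] for n in names)):
        assignment = dict(zip(names, combo))
        if all(check(assignment) for check in constraints):
            solutions.append(assignment)
    return solutions


# Hypothetical dialogue: "Show the red one. Where is it heading?"
# "red" may be the colour of a screen icon or of a ship in the domain.
domains = {
    "the red one": [("screen", "icon_3"), ("domain", "ship_A")],
    "it": [("screen", "icon_3"), ("domain", "ship_A"), ("domain", "ship_B")],
}
constraints: List[Constraint] = [
    # Hypothetical anaphora heuristic: "it" co-refers with "the red one".
    lambda a: a["it"] == a["the red one"],
    # "heading" is assumed to be a domain attribute, so "it" must come from the domain.
    lambda a: a["it"][0] == "domain",
]

print(solve(domains, constraints))
# [{'the red one': ('domain', 'ship_A'), 'it': ('domain', 'ship_A')}]
```

    Solving all expressions jointly lets a constraint on "it" also disambiguate the source of "red". Exhaustive enumeration grows exponentially with the number of expressions, so a practical system would more likely use backtracking with constraint propagation.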

    Semi-Supervised Named Entity Recognition: Learning to Recognize 100 Entity Types with Little Supervision

    Named Entity Recognition (NER) aims to extract and classify rigid designators in text, such as proper names, biological species, and temporal expressions. There has been growing interest in this field of research since the early 1990s. In this thesis, we document a trend moving away from handcrafted rules and towards machine learning approaches. Still, recent machine learning approaches suffer from the limited availability of annotated data, which is a serious shortcoming in building and maintaining large-scale NER systems. In this thesis, we present an NER system built with very little supervision: human supervision is limited to listing a few examples of each named entity (NE) type. First, we introduce a proof-of-concept semi-supervised system that can recognize four NE types. Then, we expand its capabilities by improving key technologies, and we apply the system to an entire hierarchy of 100 NE types. Our work makes the following contributions: the creation of a proof-of-concept semi-supervised NER system; the demonstration of an innovative noise filtering technique for generating NE lists; the validation of a strategy for learning disambiguation rules using automatically identified, unambiguous NEs; and finally, the development of an acronym detection algorithm, thus solving a rare but very difficult problem in alias resolution. We believe semi-supervised learning techniques are about to break new ground in the machine learning community. In this thesis, we show that limited supervision can build complete NER systems. On standard evaluation corpora, we report performance comparable to baseline supervised systems in the task of annotating NEs in texts.

    Proceedings of the international conference on cooperative multimodal communication CMC/95, Eindhoven, May 24-26, 1995: proceedings


    Natural language software registry (second edition)


    An Interoperable Access Control System based on Self-Sovereign Identities

    The extreme growth of the World Wide Web in the last decade, together with recent scandals related to the theft or abusive use of personal information, has left users unsatisfied with their digital identity providers and concerned about their online privacy. Self-Sovereign Identity (SSI) is a new identity management paradigm that gives control over personal information back to its rightful owner: the individual. However, adoption of SSI on the Web is complicated by high overhead costs for service providers, caused by the lack of interoperability among the various emerging SSI solutions. In this work, we propose an Access Control System based on Self-Sovereign Identities with a semantically modelled Access Control Logic. Our system relies on the Web Access Control authorization rules used in the Solid project and extends them to additionally express requirements on Verifiable Credentials, i.e., digital credentials adhering to a standardized data model. Moreover, the system achieves interoperability across multiple DID Methods and types of Verifiable Credentials, so that support for further SSI technologies can be added incrementally by design. A proof-of-concept prototype is implemented, and its performance as well as multiple system design choices are evaluated: the end-to-end latency of the authorization process is between 2 and 5 seconds, depending on the DID Methods used, and could theoretically be reduced further to 1.5 to 3 seconds. An evaluation of the potential interoperability achieved by the system shows that multiple DID Methods and different types of Verifiable Credentials can be supported. Lastly, multiple approaches for modelling required Verifiable Credentials are compared, and the suitability of the SHACL language for describing the RDF graphs represented by the required Linked Data credentials is shown.
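    The abstract describes Web Access Control rules extended with requirements on Verifiable Credentials and with support for several DID Methods. The sketch below shows the shape of such an authorization decision under heavy simplifications. A rule names a resource, the access modes it grants, a required credential type and a set of trusted issuer DIDs. Cryptographic proof verification and DID resolution are omitted entirely. The classes, `SUPPORTED_DID_METHODS` and the example DIDs are hypothetical and are not the thesis's data model or the Solid API.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class Credential:
    """Simplified Verifiable Credential: signature/proof checking is omitted."""
    types: Set[str]
    issuer: str  # issuer DID, "did:<method>:<method-specific-id>"
    claims: Dict[str, str] = field(default_factory=dict)


@dataclass
class Rule:
    """WAC-style rule extended with a credential requirement (hypothetical model)."""
    resource: str
    modes: Set[str]          # e.g. {"Read"}, loosely mirroring WAC access modes
    required_type: str       # credential type the requester must present
    trusted_issuers: Set[str]


# Hypothetical set of DID Methods the verifier is able to resolve.
SUPPORTED_DID_METHODS = {"example", "key"}


def did_method(did: str) -> str:
    """Extract the method name from a DID string."""
    parts = did.split(":", 2)
    if len(parts) != 3 or parts[0] != "did":
        raise ValueError(f"not a DID: {did}")
    return parts[1]


def is_authorized(rules: Iterable[Rule], resource: str, mode: str,
                  credentials: Iterable[Credential]) -> bool:
    """Grant access if some matching rule is satisfied by some presented credential."""
    creds = list(credentials)  # may be scanned once per rule
    for rule in rules:
        if rule.resource != resource or mode not in rule.modes:
            continue
        for vc in creds:
            if (rule.required_type in vc.types
                    and vc.issuer in rule.trusted_issuers
                    and did_method(vc.issuer) in SUPPORTED_DID_METHODS):
                return True
    return False


rules = [Rule("/data/report", {"Read"}, "StudentCredential", {"did:example:uni"})]
vc = Credential({"VerifiableCredential", "StudentCredential"}, "did:example:uni")
print(is_authorized(rules, "/data/report", "Read", [vc]))   # True
print(is_authorized(rules, "/data/report", "Write", [vc]))  # False
```

    Keeping the DID-Method check separate from the rule itself mirrors, in a simplified way, the abstract's point that new DID Methods and credential types can be supported incrementally without rewriting the rules.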

    Workshop proceedings of the 1st workshop on quality in modeling

    Quality assessment and assurance constitute an important part of software engineering. The issues of software quality management are widely researched and approached from multiple perspectives and viewpoints. The introduction of a new paradigm in software development, namely Model Driven Development (MDD) and its variations (e.g., MDA [Model Driven Architecture], MDE [Model Driven Engineering], MBD [Model Based Development], MIC [Model Integrated Computing]), raises new challenges in software quality management and should therefore be given special attention. In particular, early quality assessment based on models at a high level of abstraction, and building (or customizing existing) prediction models for software quality based on model metrics, are of central importance for the software engineering community. The workshop is a continuation of a series of workshops on consistency that have taken place during the annual UML conferences and, more recently, MDA-FA. The idea behind this workshop is to extend the scope of interests and address a wide spectrum of problems related to MDD. It is also in line with the overall initiative of shifting from UML to MoDELS. The goal of this workshop is to gather researchers and practitioners interested in the emerging issues of quality in the context of MDD. The workshop is intended to provide a premier forum for discussions related to software quality and MDD. The aims of the workshop are:
    - Presenting ongoing research related to quality in modeling in the context of MDD;
    - Defining and organizing issues related to quality in MDD.
    The format of the workshop consists of two parts: presentation and discussion. The presentation part is aimed at reporting research results related to quality aspects in modeling. Seven papers were selected for presentation out of 16 submissions; the selected papers are included in these proceedings. The discussion part is intended to be a forum for exchanging ideas on understanding quality and approaching it systematically.
