4,454 research outputs found

    The REVERE project:Experiments with the application of probabilistic NLP to systems engineering

    Get PDF
    Despite natural language’s well-documented shortcomings as a medium for precise technical description, its use in software-intensive systems engineering remains inescapable. This poses many problems for engineers who must derive problem understanding and synthesise precise solution descriptions from free text. This is true both for the largely unstructured textual descriptions from which system requirements are derived, and for more formal documents, such as standards, which impose requirements on system development processes. This paper describes experiments that we have carried out in the REVERE1 project to investigate the use of probabilistic natural language processing techniques to provide systems engineering support

    A Machine-Aided Approach to Intelligent Index Generation

    Get PDF
    Back-of-the-book indexing is the process of generating a list of relevant terms, sub-terms and cross-references from a corpus and providing the user with corresponding page references. Several cognitive tasks are necessary to produce a good index, and are performed primarily by the human indexer. Indexing has become somewhat automated through computer applications, which at best generate a concordance, and exist to reduce the mundane portions of the process. However, none of these tools determines which terms to index, nor do they capture context-sensitive information about terms and their relationships. Human indexers perform these time-consuming tasks. The challenge is to develop software that bridges the gap between computerized concordances and manual indexing. The prototype application described herein is unique in its ability to incorporate the intelligent portions of the process. Because of this, it provides a robust draft index that a human indexer can refine in a fraction of the time

    Methods for Amharic part-of-speech tagging

    Get PDF
    The paper describes a set of experiments involving the application of three state-of- the-art part-of-speech taggers to Ethiopian Amharic, using three different tagsets. The taggers showed worse performance than previously reported results for Eng- lish, in particular having problems with unknown words. The best results were obtained using a Maximum Entropy ap- proach, while HMM-based and SVM- based taggers got comparable results

    Semantic Heterogeneity Issues on the Web

    Full text link
    The Semantic Web is an extension of the traditional Web in which meaning of information is well defined, thus allowing a better interaction between people and computers. To accomplish its goals, mechanisms are required to make explicit the semantics of Web resources, to be automatically processed by software agents (this semantics being described by means of online ontologies). Nevertheless, issues arise caused by the semantic heterogeneity that naturally happens on the Web, namely redundancy and ambiguity. For tackling these issues, we present an approach to discover and represent, in a non-redundant way, the intended meaning of words in Web applications, while taking into account the (often unstructured) context in which they appear. To that end, we have developed novel ontology matching, clustering, and disambiguation techniques. Our work is intended to help bridge the gap between syntax and semantics for the Semantic Web construction

    Ontology-Aware Token Embeddings for Prepositional Phrase Attachment

    Full text link
    Type-level word embeddings use the same set of parameters to represent all instances of a word regardless of its context, ignoring the inherent lexical ambiguity in language. Instead, we embed semantic concepts (or synsets) as defined in WordNet and represent a word token in a particular context by estimating a distribution over relevant semantic concepts. We use the new, context-sensitive embeddings in a model for predicting prepositional phrase(PP) attachments and jointly learn the concept embeddings and model parameters. We show that using context-sensitive embeddings improves the accuracy of the PP attachment model by 5.4% absolute points, which amounts to a 34.4% relative reduction in errors.Comment: ACL 201

    UmobiTalk: Ubiquitous Mobile Speech Based Learning Language Translator for Sesotho Language

    Get PDF
    Published ThesisThe need to conserve the under-resourced languages is becoming more urgent as some of them are becoming extinct; natural language processing can be used to redress this. Currently, most initiatives around language processing technologies are focusing on western languages such as English and French, yet resources for such languages are already available. The Sesotho language is one of the under-resourced Bantu languages; it is mostly spoken in Free State province of South Africa and in Lesotho. Like other parts of South Africa, Free State has experienced high number of migrants and non-Sesotho speakers from neighboring provinces and countries; such people are faced with serious language barrier problems especially in the informal settlements where everyone tends to speak only Sesotho. Non-Sesotho speakers refers to the racial groups such as Xhosas, Zulus, Coloureds, Whites and more, in which Sesotho language is not their native language. As a solution to this, we developed a parallel corpus that has English as source and Sesotho as a target language and packaged it in UmobiTalk - Ubiquitous mobile speech based learning translator. UmobiTalk is a mobile-based tool for learning Sesotho for English speakers. The development of this tool was based on the combination of automatic speech recognition, machine translation and speech synthesis

    The pictures we like are our image: continuous mapping of favorite pictures into self-assessed and attributed personality traits

    Get PDF
    Flickr allows its users to tag the pictures they like as “favorite”. As a result, many users of the popular photo-sharing platform produce galleries of favorite pictures. This article proposes new approaches, based on Computational Aesthetics, capable to infer the personality traits of Flickr users from the galleries above. In particular, the approaches map low-level features extracted from the pictures into numerical scores corresponding to the Big-Five Traits, both self-assessed and attributed. The experiments were performed over 60,000 pictures tagged as favorite by 300 users (the PsychoFlickr Corpus). The results show that it is possible to predict beyond chance both self-assessed and attributed traits. In line with the state-of-the art of Personality Computing, these latter are predicted with higher effectiveness (correlation up to 0.68 between actual and predicted traits)
    • …
    corecore