The REVERE project: Experiments with the application of probabilistic NLP to systems engineering
Despite natural language’s well-documented shortcomings as a medium for precise technical description, its use in software-intensive systems engineering remains inescapable. This poses many problems for engineers who must derive problem understanding and synthesise precise solution descriptions from free text. This is true both for the largely unstructured textual descriptions from which system requirements are derived, and for more formal documents, such as standards, which impose requirements on system development processes. This paper describes experiments that we have carried out in the REVERE project to investigate the use of probabilistic natural language processing techniques to provide systems engineering support.
A Machine-Aided Approach to Intelligent Index Generation
Back-of-the-book indexing is the process of generating a list of relevant terms, sub-terms and cross-references from a corpus and providing the user with corresponding page references.
Several cognitive tasks are necessary to produce a good index, and are performed primarily by the human indexer. Indexing has become somewhat automated through computer applications, which at best generate a concordance, and exist to reduce the mundane portions of the process. However, none of these tools determines which terms to index, nor do they capture context-sensitive information about terms and their relationships. Human indexers perform these time-consuming tasks.
The challenge is to develop software that bridges the gap between computerized concordances and manual indexing. The prototype application described herein is unique in its ability to incorporate the intelligent portions of the process. Because of this, it provides a robust draft index that a human indexer can refine in a fraction of the time.
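To make the baseline concrete, here is a minimal sketch of the kind of concordance the abstract says existing tools produce at best: every word mapped to the pages on which it occurs, with no judgment about which terms deserve an entry. The function name `build_concordance` and the sample pages are hypothetical and are not taken from the prototype described above.

```python
import re
from collections import defaultdict


def build_concordance(pages):
    """Map each lowercased word to the sorted 1-based page numbers where it occurs.

    `pages` is a list of strings, one per page (an assumption made for this sketch).
    """
    pages_by_word = defaultdict(set)
    for page_number, text in enumerate(pages, start=1):
        # Treat every run of ASCII letters as a word; no term selection is attempted.
        for word in re.findall(r"[a-z]+", text.lower()):
            pages_by_word[word].add(page_number)
    return {word: sorted(numbers) for word, numbers in sorted(pages_by_word.items())}


# Hypothetical sample text, one string per page.
sample_pages = [
    "Indexing maps terms to pages.",
    "A concordance lists every word.",
    "Terms and pages appear again.",
]
concordance = build_concordance(sample_pages)
for word, numbers in concordance.items():
    print(word, numbers)
```

Such a listing includes function words like "a" and "to" alongside real index terms, which illustrates the gap between a concordance and an index: selecting terms, grouping sub-terms and adding cross-references are left to the human indexer.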
Methods for Amharic part-of-speech tagging
The paper describes a set of experiments involving the application of three state-of-the-art part-of-speech taggers to Ethiopian Amharic, using three different tagsets. The taggers showed worse performance than previously reported results for English, in particular having problems with unknown words. The best results were obtained using a Maximum Entropy approach, while HMM-based and SVM-based taggers got comparable results.
Semantic Heterogeneity Issues on the Web
The Semantic Web is an extension of the traditional Web in which the meaning of information is well defined, thus allowing a better interaction between people and computers. To accomplish its goals, mechanisms are required to make explicit the semantics of Web resources, so that they can be automatically processed by software agents (this semantics being described by means of online ontologies). Nevertheless, issues arise from the semantic heterogeneity that naturally occurs on the Web, namely redundancy and ambiguity. To tackle these issues, we present an approach to discover and represent, in a non-redundant way, the intended meaning of words in Web applications, while taking into account the (often unstructured) context in which they appear. To that end, we have developed novel ontology matching, clustering, and disambiguation techniques. Our work is intended to help bridge the gap between syntax and semantics for the Semantic Web construction.
Ontology-Aware Token Embeddings for Prepositional Phrase Attachment
Type-level word embeddings use the same set of parameters to represent all instances of a word regardless of its context, ignoring the inherent lexical ambiguity in language. Instead, we embed semantic concepts (or synsets) as defined in WordNet and represent a word token in a particular context by estimating a distribution over relevant semantic concepts. We use the new, context-sensitive embeddings in a model for predicting prepositional phrase (PP) attachments and jointly learn the concept embeddings and model parameters. We show that using context-sensitive embeddings improves the accuracy of the PP attachment model by 5.4% absolute points, which amounts to a 34.4% relative reduction in errors.
Comment: ACL 201
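The two reported figures can be related by simple arithmetic. The sketch below assumes that the relative error reduction equals the absolute accuracy gain divided by the baseline error rate; the baseline and improved accuracies it prints are derived under that assumption and are not stated in the abstract. The function name `implied_baseline_error` is hypothetical.

```python
def implied_baseline_error(absolute_gain, relative_reduction):
    """Return the baseline error rate (in percentage points) implied by the two figures.

    Assumes: relative_reduction = absolute_gain / baseline_error.
    """
    if relative_reduction <= 0:
        raise ValueError("relative_reduction must be positive")
    return absolute_gain / relative_reduction


# Figures reported in the abstract: 5.4 absolute points, 34.4% relative reduction.
absolute_gain = 5.4
relative_reduction = 0.344

baseline_error = implied_baseline_error(absolute_gain, relative_reduction)
baseline_accuracy = 100.0 - baseline_error
improved_accuracy = baseline_accuracy + absolute_gain

print(f"implied baseline error:    {baseline_error:.1f}%")
print(f"implied baseline accuracy: {baseline_accuracy:.1f}%")
print(f"implied improved accuracy: {improved_accuracy:.1f}%")
```

Under this reading, the baseline model would err on roughly 15.7% of attachments, and the context-sensitive embeddings would remove about a third of those errors.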
UmobiTalk: Ubiquitous Mobile Speech Based Learning Language Translator for Sesotho Language
Published Thesis
The need to conserve under-resourced languages is becoming more urgent as some of them are becoming extinct; natural language processing can be used to redress this. Currently, most initiatives around language processing technologies focus on western languages such as English and French, yet resources for such languages are already available. The Sesotho language is one of the under-resourced Bantu languages; it is mostly spoken in the Free State province of South Africa and in Lesotho. Like other parts of South Africa, the Free State has experienced a high number of migrants and non-Sesotho speakers from neighboring provinces and countries; such people face serious language barriers, especially in the informal settlements, where everyone tends to speak only Sesotho. Non-Sesotho speakers here refers to groups such as Xhosas, Zulus, Coloureds, Whites and others, for whom Sesotho is not a native language.
As a solution to this, we developed a parallel corpus with English as the source language and Sesotho as the target language, and packaged it in UmobiTalk - Ubiquitous mobile speech based learning translator. UmobiTalk is a mobile-based tool for learning Sesotho for English speakers. The development of this tool was based on the combination of automatic speech recognition, machine translation and speech synthesis.
The pictures we like are our image: continuous mapping of favorite pictures into self-assessed and attributed personality traits
Flickr allows its users to tag the pictures they like as “favorite”. As a result, many users of the popular photo-sharing platform produce galleries of favorite pictures. This article proposes new approaches, based on Computational Aesthetics, capable of inferring the personality traits of Flickr users from these galleries. In particular, the approaches map low-level features extracted from the pictures into numerical scores corresponding to the Big-Five Traits, both self-assessed and attributed. The experiments were performed over 60,000 pictures tagged as favorite by 300 users (the PsychoFlickr Corpus). The results show that it is possible to predict both self-assessed and attributed traits beyond chance. In line with the state of the art of Personality Computing, the latter are predicted with higher effectiveness (correlation up to 0.68 between actual and predicted traits).
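As an illustration of how a correlation between actual and predicted trait scores might be computed, the sketch below implements the Pearson coefficient. This is an assumption: the abstract does not say which correlation measure was used, and the scores below are made-up illustrative values, not data from the PsychoFlickr Corpus.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    if variance_x == 0 or variance_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(variance_x * variance_y)


# Hypothetical trait scores for five users (illustrative only).
actual_scores = [2.0, 3.5, 4.0, 1.5, 3.0]
predicted_scores = [2.4, 3.1, 3.6, 2.2, 2.9]

print(f"correlation: {pearson(actual_scores, predicted_scores):.2f}")
```

A value near 1 means the predicted scores rank and scale users much as the actual scores do, while a value near 0 means the predictions are no better than chance.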