    A usability evaluation of the prototype Afrikaanse idiome-woordeboek

    The Afrikaanse idiome-woordeboek is a prototype e-dictionary of Afrikaans fixed expressions, developed to test the functionalities of such an e-dictionary. The dictionary is based on the function theory of lexicography and makes use of various technologies. When digital tools are developed, it is important to consider their usability. A usability evaluation was therefore carried out on the Afrikaanse idiome-woordeboek to determine how successfully it can be used, employing discount usability methods, viz. heuristic evaluation and usability testing. This article reports on the findings from the usability tests, which are discussed under the categories of content, information architecture, navigation, access (searching and browsing), help, customisation, and the use of innovative technologies to manage data in e-dictionaries for search and display. The usability evaluation showed that users did not always use the e-dictionary as the designers intended. Various recommendations are made to the designers of the Afrikaanse idiome-woordeboek, as well as for the design of e-dictionaries in general; these concern usability evaluation, information architecture, searching in e-dictionaries, the data that can be included in e-dictionaries, and the training of users of e-dictionaries. Keywords: E-Dictionaries, Online Dictionaries, Electronic Dictionaries, Usability Evaluation, Usability Tests, Discount Usability, Dictionary Literacy

    Evaluating the word-expert approach for Named-Entity Disambiguation

    Named Entity Disambiguation (NED) is the task of linking a named-entity mention to an instance in a knowledge base, typically Wikipedia. This task is closely related to word-sense disambiguation (WSD), where the supervised word-expert approach has prevailed. In this work we present the results of applying the word-expert approach to NED, where one classifier is built for each target entity mention string. The resources necessary to build the system, a dictionary and a set of training instances, have been automatically derived from Wikipedia. We provide empirical evidence of the value of this approach, as well as a study of the differences between WSD and NED, including ambiguity and synonymy statistics.
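
    To make the word-expert setup concrete, here is a minimal Python sketch: one independent classifier per ambiguous mention string, trained on bag-of-words contexts. The tiny training set, the TF-IDF/linear-SVM pipeline, and the entity labels are illustrative assumptions, not the paper's actual Wikipedia-derived resources.

    ```python
    # One "word expert" per ambiguous mention string: each mention gets its
    # own supervised classifier, as in the word-expert approach.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    # Hypothetical stand-in for contexts automatically harvested from
    # Wikipedia: mention string -> (context sentence, target entity) pairs.
    TRAINING = {
        "Jordan": [
            ("he averaged 30 points per game for the Bulls", "Michael_Jordan"),
            ("he won six NBA championships in Chicago", "Michael_Jordan"),
            ("the river flows into the Dead Sea", "Jordan_River"),
            ("the kingdom borders Saudi Arabia and Iraq", "Jordan_(country)"),
        ],
    }

    # Train one expert per mention string.
    experts = {}
    for mention, pairs in TRAINING.items():
        contexts, labels = zip(*pairs)
        expert = make_pipeline(TfidfVectorizer(), LinearSVC())
        expert.fit(list(contexts), list(labels))
        experts[mention] = expert

    def disambiguate(mention, context):
        """Link a mention to an entity using that mention's dedicated expert."""
        return experts[mention].predict([context])[0]

    print(disambiguate("Jordan", "scored 40 points against the Knicks"))
    ```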

    Exploiting Lists of Names for Named Entity Identification of Financial Institutions from Unstructured Documents

    There is a wealth of information about financial systems that is embedded in document collections. In this paper, we focus on a specialized text extraction task for this domain. The objective is to extract mentions of names of financial institutions, or FI names, from financial prospectus documents, and to identify the corresponding real-world entities, e.g., by matching against a corpus of such entities. The tasks are Named Entity Recognition (NER) and Entity Resolution (ER); both are well studied in the literature. Our contribution is to develop a rule-based approach that exploits lists of FI names for both tasks; our solution is labeled Dict-based NER and Rank-based ER. Since FI names are typically represented by a root and a suffix that modifies the root, we use these lists of FI names to create specialized root and suffix dictionaries. To evaluate the effectiveness of our specialized solution for extracting FI names, we compare Dict-based NER with a general-purpose rule-based NER solution, ORG NER. Our evaluation highlights the benefits and limitations of specialized versus general-purpose approaches, and presents additional suggestions for tuning and customization for FI name extraction. To our knowledge, our proposed solutions, Dict-based NER and Rank-based ER, and the root and suffix dictionaries, are the first attempt to exploit specialized knowledge, i.e., lists of FI names, for rule-based NER and ER.
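
    A minimal sketch of the dictionary-driven matching idea: FI names are matched by a root dictionary and optionally extended by a suffix dictionary. The dictionaries, the greedy suffix rule, and the example text below are illustrative assumptions; the paper's Dict-based NER uses richer rules over curated name lists.

    ```python
    import re

    # Hypothetical root and suffix dictionaries; the paper builds these from
    # curated lists of financial-institution names.
    ROOTS = {"goldman sachs", "deutsche bank", "barclays"}
    SUFFIXES = ("& co.", "n.a.", "plc", "inc.", "ltd.")

    def dict_based_ner(text):
        """Find dictionary roots in the text and greedily absorb a trailing
        suffix, e.g. "Barclays" + "PLC" -> "Barclays PLC" (simplified rule)."""
        lowered = text.lower()
        mentions = []
        for root in ROOTS:
            for m in re.finditer(re.escape(root), lowered):
                start, end = m.span()
                rest = lowered[end:].lstrip()
                for suffix in SUFFIXES:
                    if rest.startswith(suffix):
                        gap = len(lowered[end:]) - len(rest)  # whitespace skipped
                        end += gap + len(suffix)
                        break
                mentions.append(text[start:end])
        return mentions

    print(dict_based_ner(
        "The notes are underwritten by Barclays PLC and Goldman Sachs & Co."))
    ```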

    Sentiment/Subjectivity Analysis Survey for Languages other than English

    Subjectivity and sentiment analysis have gained considerable attention recently. Most of the resources and systems built so far are for English, and the need for systems designed for other languages is growing. This paper surveys the different approaches used to build subjectivity and sentiment analysis systems for languages other than English. Three types of approaches are used. The first (and best performing) builds language-specific systems. The second reuses or transfers sentiment resources from English to the target language. The third relies on language-independent methods. The paper also devotes a separate section to Arabic sentiment analysis.
    Comment: This is an accepted version in the Social Network Analysis and Mining journal. The final publication will be available at Springer via http://dx.doi.org/10.1007/s13278-016-0381-

    Filling Knowledge Gaps in a Broad-Coverage Machine Translation System

    Knowledge-based machine translation (KBMT) techniques yield high quality in domains with detailed semantic models, limited vocabulary, and controlled input grammar. Scaling up along these dimensions means acquiring large knowledge resources. It also means behaving reasonably when definitive knowledge is not yet available. This paper describes how we can fill various KBMT knowledge gaps, often using robust statistical techniques. We describe quantitative and qualitative results from JAPANGLOSS, a broad-coverage Japanese-English MT system.
    Comment: 7 pages, compressed and uuencoded PostScript. To appear: IJCAI-95

    Towards Turkish ASR: Anatomy of a rule-based Turkish g2p

    This paper describes the architecture and implementation of a rule-based grapheme-to-phoneme converter for Turkish. The system accepts a surface form as input and outputs the SAMPA mapping of all parallel pronunciations, according to the morphological analysis, together with stress positions. The system has been implemented in Python.
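
    A toy illustration of rule-based g2p in Python, mapping Turkish graphemes to SAMPA symbols one-to-one. The table below covers only the unambiguous letters (treating soft g as vowel lengthening) and is an assumption for illustration; the actual system also resolves context-dependent pronunciations and stress from morphological analysis.

    ```python
    # Toy rule-based grapheme-to-SAMPA converter for Turkish. Simplified:
    # one symbol per letter, no context rules, no stress assignment.
    G2P = {
        "a": "a", "b": "b", "c": "dZ", "ç": "tS", "d": "d", "e": "e",
        "f": "f", "g": "g", "ğ": ":",  "h": "h", "ı": "1", "i": "i",
        "j": "Z", "k": "k", "l": "l",  "m": "m", "n": "n", "o": "o",
        "ö": "2", "p": "p", "r": "r",  "s": "s", "ş": "S", "t": "t",
        "u": "u", "ü": "y", "v": "v",  "y": "j", "z": "z",
    }

    def g2p(word):
        """Map each grapheme to its SAMPA symbol, left to right."""
        return " ".join(G2P[ch] for ch in word.lower())

    print(g2p("çiçek"))  # -> "tS i tS e k"
    ```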

    Efficient Call Path Detection for Android-OS Size of Huge Source Code

    Today most developers utilize source code written by other parties. Because that code is modified frequently, developers need to grasp the impact of each modification repeatedly. A call graph, and especially its special form, a call path, helps developers comprehend a modification. Source code written by other parties, however, has become too large for its parsed form to be held in memory for call graph or call path construction. This paper offers a bidirectional search algorithm for call graphs over source code too large to store all parse results in memory. The algorithm refers to the method definition in the source code corresponding to each visited node in the call graph. Its significant feature is that this referenced information is used not to select a prioritized node to visit next, but to select a node whose visit is postponed. The algorithm reduces path extraction time by 8% in a case where ordinary path search algorithms do not reduce the time.
    Comment: in the Sixth International Conference on Computer Science, Engineering and Applications (CCSEA 2016), Dubai, UAE, January 23-24, 2016
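
    The postponement idea can be sketched with a small search routine: nodes flagged as expensive are pushed onto a deferred queue and visited only after ordinary nodes are exhausted, rather than being given priority. The toy call graph and predicate are assumptions, and only one search direction is shown; the paper's algorithm is bidirectional and parses method definitions on demand.

    ```python
    from collections import deque

    # Toy call graph: caller -> callees. In the paper's setting the graph is
    # not materialized; edges are recovered by parsing each method on demand.
    CALLS = {
        "main": ["init", "run"],
        "run": ["update", "render"],
        "update": ["physics"],
        "physics": ["solve"],
    }

    def find_path(src, dst, postpone=lambda n: False):
        """BFS with a deferred queue: nodes flagged by `postpone` are visited
        only after all ordinary nodes are exhausted."""
        main_q, deferred_q = deque([[src]]), deque()
        seen = {src}
        while main_q or deferred_q:
            path = main_q.popleft() if main_q else deferred_q.popleft()
            node = path[-1]
            if node == dst:
                return path
            for callee in CALLS.get(node, []):
                if callee not in seen:
                    seen.add(callee)
                    target = deferred_q if postpone(callee) else main_q
                    target.append(path + [callee])
        return None

    # Postpone expensive-to-parse methods (illustrative predicate).
    print(find_path("main", "solve", postpone=lambda n: n == "render"))
    ```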

    Blind Source Separation with Optimal Transport Non-negative Matrix Factorization

    Optimal transport as a loss for machine learning optimization problems has recently gained a lot of attention. Building upon recent advances in computational optimal transport, we develop an optimal transport non-negative matrix factorization (NMF) algorithm for supervised speech blind source separation (BSS). Optimal transport allows us to design and leverage a cost between short-time Fourier transform (STFT) spectrogram frequencies, which takes into account how humans perceive sound. We give empirical evidence that using our proposed optimal transport NMF leads to perceptually better results than Euclidean NMF, for both isolated voice reconstruction and BSS tasks. Finally, we demonstrate how to use optimal transport for cross-domain sound processing tasks, where frequencies represented in the input spectrograms may be different from one spectrogram to another.
    Comment: 22 pages, 7 figures, 2 additional files
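
    The core ingredient can be sketched with the POT library: a ground cost between STFT frequency bins and an exact OT distance between spectra treated as histograms. The log-frequency squared cost is a perceptually motivated assumption for illustration, not necessarily the paper's exact cost, and the full OT-NMF algorithm is not shown.

    ```python
    import numpy as np
    import ot  # POT: Python Optimal Transport

    # Ground cost between STFT frequency bins: squared distance on a
    # log-frequency axis, so nearby pitches are cheap to move (assumption).
    freqs = np.linspace(50.0, 8000.0, 64)        # Hz, toy resolution
    logf = np.log(freqs)
    M = (logf[:, None] - logf[None, :]) ** 2     # 64 x 64 cost matrix

    def ot_loss(x, y):
        """Exact OT distance between two spectra viewed as histograms."""
        return ot.emd2(x / x.sum(), y / y.sum(), M)

    # Two unit-shaped spectra with the same energy, shifted in frequency.
    a = np.exp(-0.5 * ((freqs - 440.0) / 50.0) ** 2)
    b = np.exp(-0.5 * ((freqs - 523.0) / 50.0) ** 2)
    print(ot_loss(a, b))               # small: a short move in log-frequency
    print(ot_loss(a, np.roll(a, 30)))  # larger: energy moved much further
    ```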

    Lexical typology: a programmatic sketch

    The present paper is an attempt to lay the foundation for Lexical Typology as a new kind of linguistic typology. The goal of Lexical Typology is to investigate crosslinguistically significant patterns of interaction between lexicon and grammar.

    Sparse Dictionary-based Attributes for Action Recognition and Summarization

    We present an approach for dictionary learning of action attributes via information maximization. We unify the class distribution and appearance information into an objective function for learning a sparse dictionary of action attributes. The objective function maximizes the mutual information between what has been learned and what remains to be learned, in terms of appearance information and class distribution, for each dictionary atom. We propose a Gaussian Process (GP) model for sparse representation to optimize the dictionary objective function. The sparse coding property allows a kernel with compact support in GP to realize a very efficient dictionary learning process. Hence we can describe an action video by a set of compact and discriminative action attributes. More importantly, we can recognize modeled action categories in a sparse feature space, which can be generalized to unseen and unmodeled action categories. Experimental results demonstrate the effectiveness of our approach in action recognition and summarization.
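
    As a rough illustration of the representation (not the paper's information-maximization objective or its Gaussian Process model), here is plain sparse dictionary learning over stand-in video features with scikit-learn; the learned atoms play the role of attributes, and each video becomes a sparse code over them.

    ```python
    import numpy as np
    from sklearn.decomposition import DictionaryLearning

    # Stand-in for per-video appearance features (e.g., pooled descriptors);
    # random data is used here purely to show the shapes involved.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 64))           # 200 videos, 64-dim features

    learner = DictionaryLearning(n_components=16,  # 16 "attribute" atoms
                                 transform_algorithm="lasso_lars",
                                 transform_alpha=0.5,
                                 random_state=0)
    codes = learner.fit_transform(X)             # sparse attribute codes

    # Each video is now a short, mostly-zero vector over learned attributes;
    # recognition and summarization operate in this sparse code space.
    print(codes.shape, float((codes != 0).mean()))
    ```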