A usability evaluation of the prototype Afrikaanse idiome-woordeboek
The Afrikaanse idiome-woordeboek is a prototype e-dictionary of Afrikaans fixed expressions, developed to test the functionalities of the e-dictionary. The dictionary is based on the function theory of lexicography and makes use of various technologies. When digital tools are developed, it is important to consider their usability. A usability evaluation was carried out on the Afrikaanse idiome-woordeboek to determine how successfully it can be used. Discount usability methods, viz. heuristic evaluation and usability testing, were used. This article reports on the findings of the usability tests, which are discussed under the categories of content, information architecture, navigation, access (searching and browsing), help, customisation, and the use of innovative technologies to manage data in e-dictionaries for search and display. The usability evaluation showed that users did not always use the e-dictionary as the designers intended. Various recommendations are made to the designers of the Afrikaanse idiome-woordeboek, as well as for the design of e-dictionaries in general, regarding usability evaluation, information architecture, searching in e-dictionaries, the data that can be included in e-dictionaries, and the training of users of e-dictionaries.
Keywords: E-Dictionaries, Online Dictionaries, Electronic Dictionaries, Usability Evaluation, Usability Tests, Discount Usability, Dictionary Literacy
Evaluating the word-expert approach for Named-Entity Disambiguation
Named Entity Disambiguation (NED) is the task of linking a named-entity
mention to an instance in a knowledge-base, typically Wikipedia. This task is
closely related to word-sense disambiguation (WSD), where the supervised
word-expert approach has prevailed. In this work we present the results of the
word-expert approach to NED, where one classifier is built for each target
entity mention string. The resources necessary to build the system, a
dictionary and a set of training instances, have been automatically derived
from Wikipedia. We provide empirical evidence of the value of this approach, as
well as a study of the differences between WSD and NED, including ambiguity and
synonymy statistics.
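The word-expert setup described above can be sketched in a few lines: one "expert" is trained per mention string, over that mention's own candidate entities. The sketch below is a minimal illustration, with a simple bag-of-words overlap scorer standing in for the supervised classifiers, and hypothetical Wikipedia-style training instances:

```python
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

def train_word_experts(instances):
    """One 'expert' per mention string: a bag-of-words profile per
    candidate entity, built from that entity's training contexts."""
    experts = defaultdict(lambda: defaultdict(Counter))
    for mention, context, entity in instances:
        experts[mention][entity].update(tokenize(context))
    return experts

def disambiguate(experts, mention, context):
    """Pick the candidate entity whose profile overlaps most with the context."""
    candidates = experts.get(mention)
    if not candidates:
        return None
    ctx = Counter(tokenize(context))
    # Counter & Counter keeps the elementwise-minimum counts (multiset overlap)
    return max(candidates, key=lambda e: sum((ctx & candidates[e]).values()))

# Hypothetical training instances in the style of Wikipedia anchor texts
instances = [
    ("Java", "the island coffee plantations in Indonesia", "Java_(island)"),
    ("Java", "object oriented programming language class", "Java_(programming_language)"),
]
experts = train_word_experts(instances)
print(disambiguate(experts, "Java", "a programming language with class syntax"))
```

A real word-expert system would replace the overlap scorer with a trained classifier per mention, but the dictionary shape (mention → candidate entities → evidence) is the same.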
Exploiting Lists of Names for Named Entity Identification of Financial Institutions from Unstructured Documents
There is a wealth of information about financial systems that is embedded in
document collections. In this paper, we focus on a specialized text extraction
task for this domain. The objective is to extract mentions of names of
financial institutions, or FI names, from financial prospectus documents, and
to identify the corresponding real world entities, e.g., by matching against a
corpus of such entities. The tasks are Named Entity Recognition (NER) and
Entity Resolution (ER); both are well studied in the literature. Our
contribution is to develop a rule-based approach that will exploit lists of FI
names for both tasks; our solution is labeled Dict-based NER and Rank-based ER.
Since the FI names are typically represented by a root, and a suffix that
modifies the root, we use these lists of FI names to create specialized root
and suffix dictionaries. To evaluate the effectiveness of our specialized
solution for extracting FI names, we compare Dict-based NER with a general
purpose rule-based NER solution, ORG NER. Our evaluation highlights the
benefits and limitations of specialized versus general purpose approaches, and
presents additional suggestions for tuning and customization for FI name
extraction. To our knowledge, our proposed solutions, Dict-based NER and
Rank-based ER, and the root and suffix dictionaries, are the first attempt to
exploit specialized knowledge, i.e., lists of FI names, for rule-based NER and
ER.
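The root-plus-suffix structure of FI names can be sketched as a simple dictionary matcher: find a known root, then greedily absorb known suffix tokens. The root and suffix lists below are hypothetical illustrations, not the paper's dictionaries:

```python
# Hypothetical root and suffix dictionaries for illustration only
ROOTS = {"goldman sachs", "barclays", "deutsche"}
SUFFIXES = {"bank", "capital", "group", "llc", "inc"}

def dict_ner(text):
    """Match a known FI root (longest first), then absorb trailing
    suffix tokens to recover the full FI name span."""
    tokens = text.lower().split()
    spans = []
    i = 0
    while i < len(tokens):
        matched = 0
        for n in (2, 1):  # try two-token roots before one-token roots
            if " ".join(tokens[i:i + n]) in ROOTS:
                matched = n
                break
        if matched:
            j = i + matched
            while j < len(tokens) and tokens[j] in SUFFIXES:
                j += 1
            spans.append(" ".join(tokens[i:j]))
            i = j
        else:
            i += 1
    return spans

print(dict_ner("Loans issued by Deutsche Bank and Barclays Capital Inc today"))
```

Rank-based ER would then score each extracted span against the corpus of known entities; that matching step is not modeled here.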
Sentiment/Subjectivity Analysis Survey for Languages other than English
Subjective and sentiment analysis have gained considerable attention
recently. Most of the resources and systems built so far are done for English.
The need for designing systems for other languages is increasing. This paper
surveys different ways used for building systems for subjective and sentiment
analysis for languages other than English. Three types of approach are used to build such systems. The first (and the best) is language-specific systems. The second involves reusing or transferring sentiment resources from English to the target language. The third is based on language-independent methods. The paper presents a separate section devoted to Arabic sentiment analysis.
Comment: This is an accepted version in Social Network Analysis and Mining
journal. The final publication will be available at Springer via
http://dx.doi.org/10.1007/s13278-016-0381-
Filling Knowledge Gaps in a Broad-Coverage Machine Translation System
Knowledge-based machine translation (KBMT) techniques yield high quality in
domains with detailed semantic models, limited vocabulary, and controlled input
grammar. Scaling up along these dimensions means acquiring large knowledge
resources. It also means behaving reasonably when definitive knowledge is not
yet available. This paper describes how we can fill various KBMT knowledge
gaps, often using robust statistical techniques. We describe quantitative and
qualitative results from JAPANGLOSS, a broad-coverage Japanese-English MT
system.
Comment: 7 pages, Compressed and uuencoded postscript. To appear: IJCAI-9
Towards Turkish ASR: Anatomy of a rule-based Turkish g2p
This paper describes the architecture and implementation of a rule-based
grapheme-to-phoneme converter for Turkish. The system accepts a surface form as input and outputs the SAMPA mappings of all parallel pronunciations according to the morphological analysis, together with stress positions. The system has been implemented in Python.
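The core of a rule-based g2p can be illustrated with a minimal context-free letter-to-SAMPA table. The mapping below is an illustrative approximation, not the paper's rule set; it omits the morphological analysis, the context-sensitive rules, and the stress assignment the abstract describes (for example, "ğ" is crudely approximated as vowel lengthening):

```python
# Toy Turkish letter-to-SAMPA table (illustrative, context-free)
G2P = {
    "a": "a", "e": "e", "ı": "1", "i": "i", "o": "o", "ö": "2",
    "u": "u", "ü": "y", "b": "b", "c": "dZ", "ç": "tS", "d": "d",
    "f": "f", "g": "g", "ğ": ":", "h": "h", "j": "Z", "k": "k",
    "l": "l", "m": "m", "n": "n", "p": "p", "r": "r", "s": "s",
    "ş": "S", "t": "t", "v": "v", "y": "j", "z": "z",
}

def g2p(word):
    """Map each grapheme to its SAMPA symbol; unknown characters pass through."""
    return " ".join(G2P.get(ch, ch) for ch in word.lower())

print(g2p("çiçek"))  # tS i tS e k
```

A full system layers exception lexica and context-dependent rewrite rules on top of such a base table.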
Efficient Call Path Detection for Android-OS Size of Huge Source Code
Today most developers utilize source code written by other parties. Because the code is modified frequently, the developers need to grasp the impact of each modification repeatedly. A call graph, and especially its special form, a call path, helps developers comprehend a modification. Source code written by other parties, however, becomes too large for its parsed form to be held in memory when building a call graph or path. This paper offers a bidirectional search algorithm for call graphs over source code too large for all parse results to be stored in memory. The algorithm refers to the method definition in the source code corresponding to the node being visited in the call graph. Its significant feature is that this information is used not to select a prioritized node to visit next, but to select a node whose visit should be postponed. This reduces path-extraction time by 8% in a case where ordinary path-search algorithms do not reduce it.
Comment: in Sixth International Conference on Computer Science, Engineering and Applications (CCSEA 2016), Dubai, UAE, January 23~24, 2016
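The bidirectional part of the search can be sketched as two breadth-first frontiers, one expanding forward over callees and one backward over callers, meeting in the middle. The toy call graph is hypothetical, and the paper's distinguishing postponement heuristic (deferring nodes whose source must be re-parsed) is not modeled here:

```python
from collections import deque

# Hypothetical call graph: method -> methods it calls
CALLS = {
    "main": ["initUI", "loadConfig"],
    "initUI": ["drawMenu"],
    "loadConfig": ["parseXml"],
    "drawMenu": ["render"],
    "parseXml": ["render"],
}

def callers_of(graph):
    """Invert the call graph: method -> methods that call it."""
    rev = {}
    for src, dsts in graph.items():
        for d in dsts:
            rev.setdefault(d, []).append(src)
    return rev

def call_path(graph, start, goal):
    """Bidirectional BFS: expand forward from `start` and backward
    from `goal`, returning a call path when the frontiers meet."""
    rev = callers_of(graph)
    fwd = {start: [start]}          # node -> path from start
    bwd = {goal: [goal]}            # node -> path from goal (reversed)
    qf, qb = deque([start]), deque([goal])
    while qf and qb:
        node = qf.popleft()
        for nxt in graph.get(node, []):
            if nxt in bwd:
                return fwd[node] + bwd[nxt][::-1]
            if nxt not in fwd:
                fwd[nxt] = fwd[node] + [nxt]
                qf.append(nxt)
        node = qb.popleft()
        for prv in rev.get(node, []):
            if prv in fwd:
                return fwd[prv] + bwd[node][::-1]
            if prv not in bwd:
                bwd[prv] = bwd[node] + [prv]
                qb.append(prv)
    return None

print(call_path(CALLS, "main", "render"))
```

The paper's contribution sits on top of this: when a visited node's definition would force an expensive parse, its expansion is postponed rather than prioritized.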
Blind Source Separation with Optimal Transport Non-negative Matrix Factorization
Optimal transport as a loss for machine learning optimization problems has
recently gained a lot of attention. Building upon recent advances in
computational optimal transport, we develop an optimal transport non-negative
matrix factorization (NMF) algorithm for supervised speech blind source
separation (BSS). Optimal transport allows us to design and leverage a cost
between short-time Fourier transform (STFT) spectrogram frequencies, which
takes into account how humans perceive sound. We give empirical evidence that
using our proposed optimal transport NMF leads to perceptually better results
than Euclidean NMF, for both isolated voice reconstruction and BSS tasks.
Finally, we demonstrate how to use optimal transport for cross domain sound
processing tasks, where frequencies represented in the input spectrograms may
be different from one spectrogram to another.
Comment: 22 pages, 7 figures, 2 additional files
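For context, the Euclidean NMF baseline that the optimal transport variant is compared against can be sketched with the classical Lee-Seung multiplicative updates; the OT loss itself, and the perceptual frequency cost, are not reproduced here, and the toy "spectrogram" is random data, not real audio:

```python
import numpy as np

def nmf(V, k, iters=200, eps=1e-9):
    """Euclidean NMF via Lee-Seung multiplicative updates:
    V (m x n, nonnegative) is approximated by W (m x k) @ H (k x n)."""
    rng = np.random.default_rng(0)
    m, n = V.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update dictionary atoms
    return W, H

# Toy nonnegative "spectrogram" (random data, not real audio)
V = np.random.default_rng(1).random((16, 32))
W, H = nmf(V, k=4)
err = np.linalg.norm(V - W @ H)
print(err < np.linalg.norm(V))
```

The OT-NMF of the paper replaces the Euclidean (Frobenius) loss in these updates with an optimal transport distance whose ground cost between STFT frequencies reflects human sound perception.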
Lexical typology : a programmatic sketch
The present paper is an attempt to lay the foundation for Lexical Typology as a new kind of linguistic typology. The goal of Lexical Typology is to investigate crosslinguistically significant patterns of interaction between lexicon and grammar.
Sparse Dictionary-based Attributes for Action Recognition and Summarization
We present an approach for dictionary learning of action attributes via
information maximization. We unify the class distribution and appearance
information into an objective function for learning a sparse dictionary of
action attributes. The objective function maximizes the mutual information
between what has been learned and what remains to be learned in terms of
appearance information and class distribution for each dictionary atom. We
propose a Gaussian Process (GP) model for sparse representation to optimize the
dictionary objective function. The sparse coding property allows a kernel with
compact support in GP to realize a very efficient dictionary learning process.
Hence we can describe an action video by a set of compact and discriminative
action attributes. More importantly, we can recognize modeled action categories
in a sparse feature space, which can be generalized to unseen and unmodeled
action categories. Experimental results demonstrate the effectiveness of our
approach in action recognition and summarization.