    English-Hindi transliteration using context-informed PB-SMT: the DCU system for NEWS 2009

    This paper presents English-Hindi transliteration in the NEWS 2009 Machine Transliteration Shared Task, adding source context modeling to state-of-the-art log-linear phrase-based statistical machine translation (PB-SMT). Source context features enable us to exploit source similarity in addition to target similarity, as modelled by the language model. We use a memory-based classification framework that enables efficient estimation of these features while avoiding data sparseness problems. We carried out experiments at both the character and transliteration unit (TU) level. Position-dependent source context features produce significant improvements in terms of all evaluation metrics.

    Streaming Similarity Self-Join

    We introduce and study the problem of computing the similarity self-join in a streaming context (SSSJ), where the input is an unbounded stream of items arriving continuously. The goal is to find all pairs of items in the stream whose similarity is greater than a given threshold. The simplest formulation of the problem requires unbounded memory, and thus, it is intractable. To make the problem feasible, we introduce the notion of time-dependent similarity: the similarity of two items decreases with the difference in their arrival time. By leveraging the properties of this time-dependent similarity function, we design two algorithmic frameworks to solve the SSSJ problem. The first one, MiniBatch (MB), uses existing index-based filtering techniques for the static version of the problem, and combines them in a pipeline. The second framework, Streaming (STR), adds time filtering to the existing indexes, and integrates new time-based bounds deeply into the workings of the algorithms. We also introduce a new indexing technique (L2), which is based on an existing state-of-the-art indexing technique (L2AP), but is optimized for the streaming case. Extensive experiments show that the STR algorithm, when instantiated with the L2 index, is the most scalable option across a wide array of datasets and parameters.
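The idea of time-dependent similarity can be illustrated with a minimal sketch. The abstract does not specify the decay function, so the code below assumes an exponential decay, `sim(x, y) = dot(x, y) * exp(-lam * |t_x - t_y|)`, over unit-normalized sparse vectors; the names `streaming_self_join`, `theta`, and `lam` are hypothetical. Under that assumption, no pair further apart than `ln(1 / theta) / lam` can reach the threshold, so older items can be dropped and memory stays bounded. This is a brute-force scan over the live window, not the MB or STR framework, and it uses no index-based filtering such as L2AP or L2.

```python
import math
from collections import deque


def dot(x, y):
    """Dot product of two sparse vectors stored as {dimension: weight} dicts."""
    if len(x) > len(y):
        x, y = y, x  # iterate over the shorter vector
    return sum(w * y.get(d, 0.0) for d, w in x.items())


def streaming_self_join(stream, theta, lam):
    """Yield (older_id, newer_id, sim) for pairs with decayed similarity >= theta.

    stream: iterable of (item_id, timestamp, unit_vector) tuples, assumed to
            arrive in non-decreasing timestamp order.
    theta:  similarity threshold, 0 < theta <= 1.
    lam:    decay rate (> 0) of the ASSUMED exponential time decay.
    """
    # With unit vectors, dot <= 1, so exp(-lam * dt) < theta rules a pair out.
    horizon = math.log(1.0 / theta) / lam
    window = deque()  # items still young enough to match a new arrival

    for item_id, t, vec in stream:
        # Time filtering: forget items that can no longer reach the threshold.
        while window and t - window[0][1] > horizon:
            window.popleft()
        # Brute-force comparison against the live window (no index here).
        for other_id, other_t, other_vec in window:
            sim = dot(vec, other_vec) * math.exp(-lam * (t - other_t))
            if sim >= theta:
                yield other_id, item_id, sim
        window.append((item_id, t, vec))
```

With `theta = 0.5` and `lam = 0.1`, the horizon is about 6.93 time units, so two identical items arriving 1 unit apart are reported, while an identical item arriving 100 units later matches nothing because the earlier items have already been evicted.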

    Exploring cultural factors in human-robot interaction: A matter of personality?

    This paper proposes an experimental study to investigate the task dependence and cultural-background dependence of personality trait attribution to humanoid robots. In Human-Robot Interaction, as well as in Human-Agent Interaction research, the attribution of personality traits to intelligent agents has already been researched intensively in terms of the social similarity or complementary rule. These two rules imply that humans tend to prefer others with either similar or complementary personality traits. Even though state-of-the-art literature suggests that similarity attraction happens for virtual agents, and complementary attraction for robots, there are many contradictions in the findings. We assume that searching for the explanation of personality trait attribution in the similarity and complementary rule does not take into account important contextual factors. Just as people equate certain personality types with certain professions, we expect that people may have certain personality expectations depending on the context of the task the robot carries out. Because professions have different social meanings in different national cultures, we also expect that these task-dependent personality preferences differ across cultures. We therefore suggest an experiment that considers the task context and the cultural background of users.

    CoSimLex: A Resource for Evaluating Graded Word Similarity in Context

    State-of-the-art natural language processing tools are built on context-dependent word embeddings, but no direct method for evaluating these representations currently exists. Standard tasks and datasets for intrinsic evaluation of embeddings are based on judgements of similarity, but ignore context; standard tasks for word sense disambiguation take account of context but do not provide continuous measures of meaning similarity. This paper describes an effort to build a new dataset, CoSimLex, intended to fill this gap. Building on the standard pairwise similarity task of SimLex-999, it provides context-dependent similarity measures; covers not only discrete differences in word sense but more subtle, graded changes in meaning; and covers not only a well-resourced language (English) but a number of less-resourced languages. We define the task and evaluation metrics, outline the dataset collection methodology, and describe the status of the dataset so far.