114 research outputs found

    Investigating cross-language speech retrieval for a spontaneous conversational speech collection

    Get PDF
    Cross-language retrieval of spontaneous speech combines the challenges of working with noisy automated transcription and language translation. The CLEF 2005 Cross-Language Speech Retrieval (CL-SR) task provides a standard test collection to investigate these challenges. We show that we can improve retrieval performance: by careful selection of the term weighting scheme; by decomposing automated transcripts into phonetic substrings to help ameliorate transcription errors; and by combining automatic transcriptions with manually-assigned metadata. We further show that topic translation with online machine translation resources yields effective CL-SR

    A New Approach of Intelligent Data Retrieval Paradigm

    Get PDF
    What is a real time agent, how does it remedy ongoing daily frustrations for users, and how does it improve the retrieval performance in World Wide Web? These are the main question we focus on this manuscript. In many distributed information retrieval systems, information in agents should be ranked based on a combination of multiple criteria. Linear combination of ranks has been the dominant approach due to its simplicity and effectiveness. Such a combination scheme in distributed infrastructure requires that the ranks in resources or agents are comparable to each other before combined. The main challenge is transforming the raw rank values of different criteria appropriately to make them comparable before any combination. Different ways for ranking agents make this strategy difficult. In this research, we will demonstrate how to rank Web documents based on resource-provided information how to combine several resources raking schemas in one time. The proposed system was implemented specifically in data provided by agents to create a comparable combination for different attributes. The proposed approach was tested on the queries provided by Text Retrieval Conference (TREC). Experimental results showed that our approach is effective and robust compared with offline search platforms

    Text Segmentation Using Roget-Based Weighted Lexical Chains

    Get PDF
    In this article we present a new method for text segmentation. The method relies on the number of lexical chains (LCs) which end in a sentence, which begin in the following sentence and which traverse the two successive sentences. The lexical chains are based on Roget's thesaurus (the 1987 and the 1911 version). We evaluate the method on ten texts from the DUC 2002 conference and on twenty texts from the CAST project corpus, using a manual segmentation as gold standard

    Identification of translationese: a machine learning approach

    Get PDF
    This paper presents a machine learning approach to the study of translationese. The goal is to train a computer system to distinguish between translated and non-translated text, in order to determine the characteristic features that influence the classifiers. Several algorithms reach up to 97.62% success rate on a technical dataset. Moreover, the SVM classifier consistently reports a statistically significant improved accuracy when the learning system benefits from the addition of simplification features to the basic translational classifier system. Therefore, these findings may be considered an argument for the existence of the Simplification Universal

    Multi-task learning to detect suicide ideation and mental disorders among social media users

    Get PDF
    Mental disorders and suicide are considered global health problems faced by many countries worldwide. Even though advancements have been made to improve mental wellbeing through research, there is room for improvement. Using Artificial Intelligence to early detect individuals susceptible to mental illness and suicide ideation based on their social media postings is one way to start. This research investigates the effectiveness of using a shared representation to automatically extract features between the two different yet related tasks of mental illness and suicide ideation detection using data in parallel from social media platforms with different distributions. In addition to discovering the shared features between users with suicidal thoughts and users who self-declared a single mental disorder, we further investigate the impact of comorbidity on suicide ideation and use two datasets during inference to test the generalizability of the trained models and provide satisfactory evidence to validate the increased predictive accurateness of suicide risk when using data from users diagnosed with multiple mental disorders compared to a single mental disorder for the mental illness detection task. Our results also demonstrate different mental disorders' impact on suicidal risk and discover a noticeable impact when using data from users diagnosed with Post-Traumatic Stress Disorder. We use multi-task learning (MTL) with soft and hard parameter sharing to produce state-of-the-art results for detecting users with suicide ideation who require urgent attention. We further improve the predictability of the proposed model by demonstrating the effectiveness of cross-platform knowledge sharing and predefined auxiliary inputs

    Local-Global Vectors to Improve Unigram Terminology Extraction

    Get PDF
    The present paper explores a novel method that integrates efficient distributed representations with terminology extraction. We show that the information from a small number of observed instances can be combined with local and global word embeddings to remarkably improve the term extraction results on unigram terms. To do so, we pass the terms extracted by other tools to a filter made of the local-global embeddings and a classifier which in turn decides whether or not a term candidate is a term. The filter can also be used as a hub to merge different term extraction tools into a single higher-performing system. We compare filters that use the skipgram architecture and filters that employ the CBOW architecture for the task at hand
    corecore