8 research outputs found

    A unified framework to identify and extract uncertainty cues, holders, and scopes in one fell-swoop

    Uncertainty refers to the language aspects that express hypotheses and speculations where propositions are held as (un)certain, (im)probable, or (im)possible. Automatic uncertainty analysis is crucial for several Natural Language Processing (NLP) applications that need to distinguish between factual (i.e. certain) and nonfactual (i.e. negated or uncertain) information. Typically, a comprehensive automatic uncertainty analyzer has three machine learning models for uncertainty detection, attribution, and scope extraction. To date, and to the best of my knowledge, current research on automatic uncertainty analysis has focused only on uncertainty detection and scope extraction, and has typically tackled each task with a different machine learning approach. Furthermore, current research on automatic uncertainty analysis has been restricted to specific languages, particularly English, and to specific linguistic genres, including biomedical and newswire texts, Wikipedia articles, and product reviews. In this research project, I attempt to address the aforementioned limitations of current research on automatic uncertainty analysis. First, I develop a machine learning model for uncertainty attribution, the task typically neglected in automatic uncertainty analysis. Second, I propose a unified framework to identify and extract uncertainty cues, holders, and scopes in one fell swoop by casting each task as a supervised token sequence labeling problem. Third, I choose to work on the Arabic language, in contrast to English, the most commonly studied language in the literature of automatic uncertainty analysis. Finally, I work on the understudied linguistic genre of tweets. This research project results in a novel NLP tool, i.e., a comprehensive automatic uncertainty analyzer for Arabic tweets, with a practical impact on NLP applications that rely on automatic uncertainty analysis. The tool yields an F1 score of 0.759, averaged across its three machine learning models. Furthermore, through this research, the research community and I gain insights into (1) the challenges presented by Arabic as an agglutinative, morphologically rich language with a flexible word order, in contrast to English; (2) the challenges of the linguistic genre of tweets for automatic uncertainty analysis; and (3) the type of challenges that my proposed unified framework successfully addresses and boosts performance for.
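One way to picture "casting each task as a supervised token sequence labeling problem" is a per-token tag sequence for each of the three tasks. The sketch below is an illustration only: the BIO tag scheme, the label names (`CUE`, `HOLDER`, `SCOPE`), the helper `spans_to_bio`, and the English example sentence are all assumptions, not details taken from the abstract.

```python
from typing import List, Tuple

# (start index, exclusive end index, label); a hypothetical span format.
Span = Tuple[int, int, str]


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Convert labeled token spans into one BIO tag per token."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        if not 0 <= start < end <= n_tokens:
            raise ValueError(f"span ({start}, {end}) is out of range")
        tags[start] = f"B-{label}"          # first token of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the span
    return tags


# Invented example sentence; each task gets its own tag sequence
# over the same tokens, so one tagger architecture can serve all three.
tokens = ["Analysts", "think", "prices", "may", "rise"]
cue_tags = spans_to_bio(len(tokens), [(1, 2, "CUE")])
holder_tags = spans_to_bio(len(tokens), [(0, 1, "HOLDER")])
scope_tags = spans_to_bio(len(tokens), [(2, 5, "SCOPE")])

for token, cue, holder, scope in zip(tokens, cue_tags, holder_tags, scope_tags):
    print(f"{token:10s} {cue:8s} {holder:10s} {scope}")
```

Under this encoding, the three tasks share one input and one output format, which is what would let a single sequence labeler be trained per task.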

    Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018

    On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall “Cavallerizza Reale”. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after five years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.

    Tune your brown clustering, please

    Brown clustering, an unsupervised hierarchical clustering technique based on n-gram mutual information, has proven useful in many NLP applications. However, most uses of Brown clustering employ the same default configuration; the appropriateness of this configuration has gone predominantly unexplored. Accordingly, we present information for practitioners on the behaviour of Brown clustering in order to assist hyperparameter tuning, in the form of a theoretical model of Brown clustering utility. This model is then evaluated empirically in two sequence labelling tasks over two text types. We explore the dynamic between the input corpus size, chosen number of classes, and quality of the resulting clusters, which has implications for any approach using Brown clustering. In every scenario that we examine, our results reveal that the values most commonly used for the clustering are sub-optimal.
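To make "based on n-gram mutual information" concrete, the sketch below computes the average mutual information between adjacent word classes for a given word-to-class assignment, the bigram quantity that Brown clustering is usually described as trying to keep high when it merges classes. This is a minimal illustration, not the paper's model: the function name `average_mutual_information`, the toy corpus, and the class assignments are assumptions, and the merge procedure itself is not shown.

```python
import math
from collections import Counter
from typing import Dict, List


def average_mutual_information(tokens: List[str],
                               word_to_class: Dict[str, int]) -> float:
    """Mutual information (in bits) between the classes of adjacent tokens."""
    classes = [word_to_class[w] for w in tokens]
    bigrams = Counter(zip(classes, classes[1:]))
    total = sum(bigrams.values())
    if total == 0:
        return 0.0

    # Marginal counts of a class appearing on the left / right of a bigram.
    left: Counter = Counter()
    right: Counter = Counter()
    for (c1, c2), n in bigrams.items():
        left[c1] += n
        right[c2] += n

    ami = 0.0
    for (c1, c2), n in bigrams.items():
        p_joint = n / total
        p_left = left[c1] / total
        p_right = right[c2] / total
        ami += p_joint * math.log2(p_joint / (p_left * p_right))
    return ami


# Toy corpus (invented): two word types that strictly alternate.
corpus = ["a", "b", "a", "b", "a", "b", "a"]
two_classes = average_mutual_information(corpus, {"a": 0, "b": 1})
one_class = average_mutual_information(corpus, {"a": 0, "b": 0})
print(two_classes, one_class)
```

Collapsing everything into one class drives the score to zero, which illustrates why the chosen number of classes interacts with how much information the clusters retain.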

    A reception study of machine translated subtitles for MOOCs

    As MOOCs (Massive Open Online Courses) grow rapidly around the world, the language barrier is becoming a serious issue. Removing this obstacle by creating translated subtitles is an indispensable part of developing MOOCs and improving accessibility. Given the large quantity of MOOCs available worldwide and the considerable demand for them, machine translation (MT) appears to offer an alternative or complementary translation solution, thus providing the motivation for this research. The main goal of this research is to test the impact machine translated subtitles have on Chinese viewers’ reception of MOOC content. More specifically, the author is interested in whether there is any difference between viewers’ reception of raw machine translated subtitles as opposed to fully post-edited machine translated subtitles and human translated subtitles. Reception is operationalized by adapting Gambier's (2007) model, which divides ‘reception’ into ‘the three Rs’: (i) response, (ii) reaction and (iii) repercussion. Response refers to the initial physical response of a viewer to an audio-visual stimulus, in this case the subtitle and the rest of the image. Reaction involves the cognitive follow-on from initial response, and is linked to how much effort is involved in processing the subtitling stimulus and what is understood by the viewer. Repercussion refers to attitudinal and sociocultural dimensions of AVT consumption. The research contains a pilot study and a main experiment. Mixed methods of eye-tracking, questionnaires, translation quality assessment and frequency analysis were adopted. Over 60 native Chinese speakers were recruited as participants for this research. They were divided into three groups, those who read subtitles created by raw MT, post-edited MT (PE) and human translation (HT). Results show that most participants had a positive attitude towards the subtitles regardless of their type. 
Participants who were offered PE subtitles scored the best overall on the selected reception metrics. Participants who were offered HT subtitles performed the worst in some of the selected reception metrics.

    A Conflictive Triumvirate Construct of Epidemiologic Systems Failure

    Epidemiologic systems failure (ESF) is a major hurdle in minimizing the spread of infectious diseases during outbreaks. The reasons for ESF include the technical limitations of personnel handling epidemic crises, strictly defined health policies that limit the actions of epidemiologists, and personal-perspective reservations about the intentions of health agencies. The purpose of this triumvirate mixed-methods case study was to examine factors of infectious disease control mechanisms useful for determining ESF. Three juxtaposed pre-emptive factors (technical [T], organizational [O], and personal [P] perspectives) were used to determine how the multiple perspectives inquiring systems and fuzzy logic revealed factors causing ESF so that remedial tools may be constructed. The juxtaposed ESF-TOP model formed the research theoretical framework and allowed for clustering the ESF factors. Data sources were direct quotations from TOP-based secondary data of four well-publicized participants who had Ebola, HIV-AIDS, tuberculosis, or typhoid; in addition, randomized quantitative hypothetical TOP data sets were created with Microsoft Excel software and used to model an Ebola outbreak of 10 theoretical subjects. Data were analyzed using TOP guidelines from which T, O, and P perspective themes emerged. The findings indicated that a disjointed TOP perspective specifies a serious ESF, a strictly overlapped TOP indicates an effective containment of ESF, and the overall fuzzy set with T given O and P indicates the actual ESF. The findings may result in positive social change by helping epidemiologists identify critical outbreak control factors, which may minimize the outbreak impact.
