112 research outputs found

    Investigating Rumor News Using Agreement-Aware Search

    Recent years have witnessed a widespread increase in rumor news generated by both humans and machines. Tools for investigating rumor news have therefore become an urgent necessity. One useful function of such tools is to show how a specific topic or event is represented, by presenting different points of view from multiple sources. In this paper, we propose Maester, a novel agreement-aware search framework for investigating rumor news. Given an investigative question, Maester retrieves articles related to that question and displays the top articles from the agree, disagree, and discuss categories. Splitting the results into these three categories gives the user a holistic view of the investigative question. We build Maester on two key observations: (1) relatedness can commonly be determined by keywords and entities occurring in both questions and articles, and (2) the level of agreement between the investigative question and a related news article can often be decided by a few key sentences. Accordingly, we use gradient boosting tree models with keyword/entity matching features for relatedness detection, and a recurrent neural network to infer the level of agreement. Our experiments on the Fake News Challenge (FNC) dataset demonstrate up to an order-of-magnitude improvement of Maester over the original FNC winning solution for agreement-aware search.
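    The relatedness step in this abstract rests on keyword/entity overlap between question and article. A minimal sketch of such matching features (the feature names, tokenization, and example strings are illustrative assumptions, not the paper's actual feature set):

    ```python
    def relatedness_features(question: str, article: str) -> dict:
        """Keyword-overlap features between an investigative question and a
        candidate article (illustrative; the paper also uses entity matching)."""
        q = set(question.lower().split())
        a = set(article.lower().split())
        overlap = q & a
        return {
            "overlap_count": len(overlap),
            "jaccard": len(overlap) / len(q | a) if q | a else 0.0,
            "question_coverage": len(overlap) / len(q) if q else 0.0,
        }

    feats = relatedness_features(
        "did the vaccine cause autism",
        "new study finds no link between the vaccine and autism",
    )
    ```

    In the paper, features of this kind are consumed by gradient boosting tree models; any off-the-shelf implementation would take one such feature vector per question-article pair.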

    Holistic corpus-based dialectology

    This paper is concerned with sketching future directions for corpus-based dialectology. We advocate a holistic approach to the study of geographically conditioned linguistic variability, and we present a suitable methodology, 'corpus-based dialectometry', in exactly this spirit. Specifically, we argue that in order to live up to the potential of the corpus-based method, practitioners need to (i) abandon their exclusive focus on individual linguistic features in favor of the study of feature aggregates, (ii) draw on computationally advanced multivariate analysis techniques (such as multidimensional scaling, cluster analysis, and principal component analysis), and (iii) aid interpretation of empirical results by marshalling state-of-the-art data visualization techniques. To exemplify this line of analysis, we present a case study which explores joint frequency variability of 57 morphosyntax features in 34 dialects across Great Britain.
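    The aggregate approach described above starts from a dialect-by-feature frequency matrix; dialectometric techniques such as multidimensional scaling and cluster analysis then operate on pairwise distances between dialects. A toy sketch of that first step (the dialect names and frequencies are invented for illustration):

    ```python
    import math

    def dialect_distances(freqs: dict) -> dict:
        """Pairwise Euclidean distances between dialects, each represented as a
        vector of normalized feature frequencies. MDS, cluster analysis, or PCA
        would take a distance or frequency matrix like this as input."""
        return {
            (a, b): math.dist(freqs[a], freqs[b])
            for a in freqs for b in freqs if a < b
        }

    # Invented frequencies for three dialects over three morphosyntax features.
    freqs = {
        "North": [0.9, 0.1, 0.4],
        "Midlands": [0.8, 0.2, 0.5],
        "South": [0.1, 0.9, 0.6],
    }
    dists = dialect_distances(freqs)
    ```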

    Feature extraction and selection for Arabic tweets authorship authentication

    © 2017, Springer-Verlag Berlin Heidelberg. In tweet authentication, we are concerned with correctly attributing a tweet to its true author based on its textual content. The more general problem of authenticating long documents has been studied before, and the most common approach relies on the intuitive idea that each author has a unique style that can be captured using stylometric features (SF). Inspired by the success of modern automatic document classification, some researchers followed the Bag-Of-Words (BOW) approach for authenticating long documents. In this work, we consider both approaches and their application to authenticating tweets, which pose additional challenges due to their limited length. We focus on the Arabic language due to its importance and the scarcity of work related to it. We create different sets of features from both approaches and compare the performance of different classifiers using them. We experiment with various feature selection techniques in order to extract the most discriminating features. To the best of our knowledge, this is the first study of its kind to combine these different sets of features for authorship analysis of Arabic tweets. The results show that combining all the feature sets we compute yields the best results.
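    Stylometric features of the kind the abstract mentions can be computed even from very short texts; the particular feature set below is a hypothetical illustration, not the study's actual feature list:

    ```python
    def stylometric_features(tweet: str) -> dict:
        """A few length-robust stylometric features for a short text
        (illustrative; real SF sets are considerably larger)."""
        words = tweet.split()
        n_words, n_chars = len(words), len(tweet)
        return {
            "avg_word_len": sum(len(w) for w in words) / n_words if n_words else 0.0,
            "type_token_ratio": len(set(words)) / n_words if n_words else 0.0,
            "punct_ratio": sum(c in ".,!?;:" for c in tweet) / n_chars if n_chars else 0.0,
        }

    feats = stylometric_features("hello hello world!")
    ```

    Vectors like this can be concatenated with BOW counts and passed through a feature selector before classification, mirroring the combined-feature setup the abstract describes.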

    Hardship financing of healthcare among rural poor in Orissa, India

    BACKGROUND: This study examines health-related "hardship financing" in order to gain better insight into how poor households finance their out-of-pocket healthcare costs. We define hardship financing as having to borrow money with interest or to sell assets to pay out-of-pocket healthcare costs. METHODS: Using survey data on 5,383 low-income households in Orissa, one of the poorest states of India, we investigate factors influencing the risk of hardship financing using logistic regression. RESULTS: Overall, about 25% of the households (that had any healthcare cost) reported hardship financing during the year preceding the survey. Among households that experienced a hospitalization, this percentage was nearly 40%, but even among households with outpatient or maternity-related care around 25% experienced hardship financing. Hardship financing is explained not merely by the wealth of the household (measured by assets) or how much is spent out-of-pocket on healthcare costs, but also by when the payment occurs, its frequency, and its duration (e.g. more severe in cases of chronic illness). The location where a household resides remains a major predictor of the likelihood of hardship financing even with all other household features included in the model. CONCLUSIONS: Rural poor households are subjected to considerable and protracted financial hardship due to the indirect and longer-term deleterious effects of how they cope with out-of-pocket healthcare costs. The social network that households can access influences exposure to hardship financing. Our findings point to the need to develop a policy solution that would limit that exposure both in quantum and in time. We therefore conclude that policy interventions aiming to ensure health-related financial protection would have to demonstrate that they have reduced the frequency and the volume of hardship financing.
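    The risk model in this study is a logistic regression over household covariates. A minimal sketch of how such a model turns fitted coefficients into a predicted probability (the intercept, coefficients, and covariate values here are invented, not the study's estimates):

    ```python
    import math

    def hardship_probability(intercept: float, coefs: list, x: list) -> float:
        """Logistic model: P(hardship) = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
        z = intercept + sum(b * v for b, v in zip(coefs, x))
        return 1.0 / (1.0 + math.exp(-z))

    # Invented example: a hospitalization indicator and an asset index.
    p = hardship_probability(-1.5, [1.2, -0.8], [1.0, 0.5])
    ```

    In practice the coefficients would come from fitting the regression to the survey data; the sketch only shows how the linear predictor maps to a risk probability.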

    Lifestyle and diet in relation to risk of type 2 diabetes in Vietnam: a hospital-based case-control study.

    BACKGROUND: Lifestyle and diet are important determinants of type 2 diabetes (T2D). Their impact on T2D can be evaluated using clinical and epidemiological approaches. Randomised controlled trials are the most rigorous design but expensive to conduct, whereas prospective cohort studies are time-consuming and less powerful for populations with a low incidence of the disease. Case-control studies are considered appropriate in resource-limited settings. A hospital-based case-control study protocol has been developed to investigate the role of lifestyle and dietary factors in T2D aetiology for adults in Vietnam. METHODS: A total of 1100 patients aged 40-65 years (550 T2D cases and 550 controls) will be recruited from a tertiary hospital in Hanoi, the capital city of Vietnam. Cases and controls will be frequency-matched on age (±3 years), gender, and residential location. T2D will be diagnosed according to the 2006 World Health Organisation criteria. Habitual physical activity will be assessed by the Vietnamese version of the International Physical Activity Questionnaire-Short Form. Food and beverage consumption will be ascertained using a validated food frequency questionnaire developed specifically for the Vietnamese population. Information on demographic and other personal characteristics will be collected, together with anthropometric and blood pressure measurements. Descriptive statistics and unconditional logistic regression analyses will be performed to examine factors associated with T2D prevalence. DISCUSSION: The proposed study will elucidate the role of lifestyle and diet in T2D prevalence among Vietnamese adults. Findings concerning pertinent factors will provide epidemiological evidence for the development of focused interventions, and contribute to the formulation of national policies to prevent and control T2D in Vietnam.

    Discrimination in lexical decision.

    In this study we present a novel set of discrimination-based indicators of language processing derived from Naive Discriminative Learning (ndl) theory. We compare the effectiveness of these new measures with classical lexical-distributional measures (in particular, frequency counts and form-similarity measures) in predicting lexical decision latencies when a complete morphological segmentation of masked primes is or is not possible. Data derive from a re-analysis of a large subset of decision latencies from the English Lexicon Project, as well as from the results of two new masked priming studies. Results demonstrate the superiority of discrimination-based predictors over lexical-distributional predictors alone, across both the simple and primed lexical decision tasks. Comparable priming after masked corner- and cornea-type primes, across two experiments, fails to support early obligatory segmentation into morphemes as predicted by the morpho-orthographic account of reading. Results fit well with ndl theory, which, in conformity with Word and Paradigm theory, rejects the morpheme as a relevant unit of analysis. Furthermore, results indicate that readers with greater spelling proficiency and larger vocabularies make better use of orthographic priors and handle lexical competition more efficiently.
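    Naive Discriminative Learning derives its predictors from cue-to-outcome association weights trained with the Rescorla-Wagner equations. A bare-bones sketch of that update rule (the toy cues, outcomes, and learning rate are illustrative, not the study's actual setup):

    ```python
    def train_ndl(events, cues_all, outcomes_all, lr=0.1):
        """Rescorla-Wagner updates: each weight moves so that the summed
        activation of the present cues approaches 1 for present outcomes
        and 0 for absent ones."""
        w = {(c, o): 0.0 for c in cues_all for o in outcomes_all}
        for cues, outcomes in events:
            for o in outcomes_all:
                activation = sum(w[(c, o)] for c in cues)
                target = 1.0 if o in outcomes else 0.0
                for c in cues:
                    w[(c, o)] += lr * (target - activation)
        return w

    # Toy corpus: the letter-bigram cue "co" reliably predicts "corner".
    events = [({"co"}, {"corner"})] * 50
    w = train_ndl(events, {"co"}, {"corner", "cornea"})
    ```

    Measures such as the activation of a word's outcome given its orthographic cues, computed from weights like these, are the kind of discrimination-based predictor the study evaluates.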