61 research outputs found

    Feature extraction and selection for Arabic tweets authorship authentication

    © 2017, Springer-Verlag Berlin Heidelberg. In tweet authentication, we are concerned with correctly attributing a tweet to its true author based on its textual content. The more general problem of authenticating long documents has been studied before, and the most common approach relies on the intuitive idea that each author has a unique style that can be captured using stylometric features (SF). Inspired by the success of modern automatic document classification, some researchers have followed the Bag-Of-Words (BOW) approach for authenticating long documents. In this work, we consider both approaches and their application to authenticating tweets, which pose additional challenges because of their limited length. We focus on the Arabic language because of its importance and the scarcity of work on it. We create different sets of features from both approaches and compare the performance of different classifiers using them. We experiment with various feature selection techniques in order to extract the most discriminating features. To the best of our knowledge, this is the first study of its kind to combine these different sets of features for authorship analysis of Arabic tweets. The results show that combining all the feature sets we compute yields the best results.

    Hardship financing of healthcare among rural poor in Orissa, India

    BACKGROUND: This study examines health-related "hardship financing" to gain better insight into how poor households finance their out-of-pocket healthcare costs. We define hardship financing as having to borrow money with interest or to sell assets to pay out-of-pocket healthcare costs. METHODS: Using survey data from 5,383 low-income households in Orissa, one of the poorest states of India, we investigate factors influencing the risk of hardship financing using logistic regression. RESULTS: Overall, about 25% of the households (that had any healthcare cost) reported hardship financing during the year preceding the survey. Among households that experienced a hospitalization, this percentage was nearly 40%, but even among households with outpatient or maternity-related care around 25% experienced hardship financing. Hardship financing is explained not merely by the wealth of the household (measured by assets) or how much is spent out-of-pocket on healthcare costs, but also by when the payment occurs, its frequency and its duration (e.g. more severe in cases of chronic illnesses). The location where a household resides remains a major predictor of the likelihood of hardship financing despite all other household features included in the model. CONCLUSIONS: Rural poor households are subjected to considerable and protracted financial hardship due to the indirect and longer-term deleterious effects of how they cope with out-of-pocket healthcare costs. The social network that households can access influences exposure to hardship financing. Our findings point to the need to develop a policy solution that would limit that exposure both in quantum and in time. We therefore conclude that policy interventions aiming to ensure health-related financial protection would have to demonstrate that they have reduced the frequency and the volume of hardship financing.

    Lifestyle and diet in relation to risk of type 2 diabetes in Vietnam: a hospital-based case-control study.

    BACKGROUND: Lifestyle and diet are important determinants of type 2 diabetes (T2D). Their impact on T2D can be evaluated using clinical and epidemiological approaches. Randomised controlled trials are the most rigorous design but expensive to conduct, whereas prospective cohort studies are time-consuming and less powerful for populations with a low incidence of the disease. Case-control studies are considered appropriate in resource-limited settings. A hospital-based case-control study protocol has been developed to investigate the role of lifestyle and dietary factors in T2D aetiology for adults in Vietnam. METHODS: A total of 1100 patients aged 40-65 years (550 T2D cases and 550 controls) will be recruited from a tertiary hospital in Hanoi, the capital city of Vietnam. Cases and controls will be frequency-matched on age (±3 years), gender, and residential location. T2D will be diagnosed according to the 2006 World Health Organisation criteria. Habitual physical activity will be assessed by the Vietnamese version of the International Physical Activity Questionnaire-Short Form. Food and beverage consumption will be ascertained using a Validated Food Frequency Questionnaire, specifically developed for the Vietnamese population. Information on demographic and other personal characteristics will be collected, together with anthropometric and blood pressure measurements. Descriptive statistics and unconditional logistic regression analyses will be performed to examine factors associated with the T2D prevalence. DISCUSSION: The proposed study will elucidate the role of lifestyle and diet in T2D prevalence among Vietnamese adults. Findings concerning pertinent factors will provide epidemiological evidence for the development of focused interventions, and contribute to the formulation of national policies to prevent and control T2D in Vietnam.

    Discrimination in lexical decision.

    In this study we present a novel set of discrimination-based indicators of language processing derived from Naive Discriminative Learning (NDL) theory. We compare the effectiveness of these new measures with classical lexical-distributional measures (in particular, frequency counts and form similarity measures) in predicting lexical decision latencies when a complete morphological segmentation of masked primes is or is not possible. Data derive from a re-analysis of a large subset of decision latencies from the English Lexicon Project, as well as from the results of two new masked priming studies. Results demonstrate the superiority of discrimination-based predictors over lexical-distributional predictors alone, across both the simple and primed lexical decision tasks. Comparable priming after masked "corner"-type and "cornea"-type primes, across two experiments, fails to support early obligatory segmentation into morphemes as predicted by the morpho-orthographic account of reading. Results fit well with NDL theory, which, in conformity with Word and Paradigm theory, rejects the morpheme as a relevant unit of analysis. Furthermore, results indicate that readers with greater spelling proficiency and larger vocabularies make better use of orthographic priors and handle lexical competition more efficiently.

    A DEA Analysis of Risk, Cost, and Revenues in Insurance

    Insurance companies have to take risk and cost into account when pricing car insurance policies that cover the risk of private use of cars. In this paper we use data from 80 000 car insurance policies in order to assess, once risk and cost have been taken into account, the combinations of risk that generate the highest returns for the company under existing pricing practices. We use data envelopment analysis (DEA) and frame the study within an analysis of experiments context. The results of DEA are interpreted in a multivariate statistical analysis context using factor analysis and property fitting techniques. The impact of risk factors on efficiency is explored by means of regression analysis with dummy variables. There are consequences for the pricing policy of the company.