
    Semi-supervised latent variable models for sentence-level sentiment analysis

    We derive two variants of a semi-supervised model for fine-grained sentiment analysis. Both models leverage abundant natural supervision in the form of review ratings, as well as a small amount of manually crafted sentence labels, to learn sentence-level sentiment classifiers. The proposed model is a fusion of a fully supervised structured conditional model and its partially supervised counterpart. This allows for highly efficient estimation and inference algorithms with rich feature definitions. We describe the two variants as well as their component models and verify experimentally that both variants give significantly improved results for sentence-level sentiment analysis compared to all baselines.

    A Universal Part-of-Speech Tagset

    To facilitate future research in unsupervised induction of syntactic structure and to standardize best practices, we propose a tagset that consists of twelve universal part-of-speech categories. In addition to the tagset, we develop a mapping from 25 different treebank tagsets to this universal set. As a result, when combined with the original treebank data, this universal tagset and mapping produce a dataset consisting of common parts of speech for 22 different languages. We highlight the use of this resource via two experiments, including one that reports competitive accuracies for unsupervised grammar induction without gold-standard part-of-speech tags.
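    A tagset mapping of this kind is essentially a lookup table from fine-grained treebank tags to coarse categories. The sketch below uses the twelve universal categories (NOUN, VERB, ADJ, ADV, PRON, DET, ADP, NUM, CONJ, PRT, ".", X) and a small, illustrative subset of Penn Treebank tags; the subset, the name `to_universal`, and the fallback of unknown tags to X are assumptions for illustration, not the released mapping files.

```python
# The twelve universal part-of-speech categories.
UNIVERSAL_TAGS = {
    "NOUN", "VERB", "ADJ", "ADV", "PRON", "DET",
    "ADP", "NUM", "CONJ", "PRT", ".", "X",
}

# Illustrative subset (assumed) of a Penn Treebank -> universal mapping;
# a full mapping would cover every tag in the source treebank.
PTB_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN", "NNPS": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBZ": "VERB", "MD": "VERB",
    "JJ": "ADJ", "RB": "ADV", "PRP": "PRON", "DT": "DET",
    "IN": "ADP", "CD": "NUM", "CC": "CONJ", "RP": "PRT", "TO": "PRT",
    ".": ".", ",": ".",
}


def to_universal(tagged, mapping=PTB_TO_UNIVERSAL):
    """Map (word, fine_tag) pairs to (word, universal_tag).

    Tags missing from the mapping fall back to "X" (an assumption of this sketch).
    """
    return [(word, mapping.get(tag, "X")) for word, tag in tagged]


sentence = [("The", "DT"), ("dog", "NN"), ("barked", "VBD"), (".", ".")]
print(to_universal(sentence))
# → [('The', 'DET'), ('dog', 'NOUN'), ('barked', 'VERB'), ('.', '.')]
```

    Because each treebank gets its own table but all tables share one target set, corpora in different languages become directly comparable at the coarse level.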

    Cross-lingual Word Clusters for Direct Transfer of Linguistic Structure

    It has been established that incorporating word cluster features derived from large unlabeled corpora can significantly improve prediction of linguistic structure. While previous work has focused primarily on English, we extend these results to other languages along two dimensions. First, we show that these results hold true for a number of languages across families. Second, and more interestingly, we provide an algorithm for inducing cross-lingual clusters and we show that features derived from these clusters significantly improve the accuracy of cross-lingual structure prediction. Specifically, we show that by augmenting direct-transfer systems with cross-lingual cluster features, the relative error of delexicalized dependency parsers, trained on English treebanks and transferred to foreign languages, can be reduced by up to 13%. When applying the same method to direct transfer of named-entity recognizers, we observe relative improvements of up to 26%.
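    The idea behind direct transfer with cross-lingual clusters can be sketched at the feature level: a delexicalized model never sees word identities, so adding a cluster ID that is shared across languages gives it some lexical signal that still transfers. The sketch below is a minimal illustration under assumptions: `CLUSTERS` is a hypothetical toy clustering and `token_features` is a hypothetical feature function, not the paper's clustering algorithm or feature set.

```python
from typing import Dict, List

# Hypothetical cross-lingual clusters: English and German words share IDs.
CLUSTERS: Dict[str, int] = {"dog": 17, "Hund": 17, "house": 42, "Haus": 42}


def token_features(words: List[str], tags: List[str], i: int,
                   clusters: Dict[str, int]) -> List[str]:
    """Delexicalized features for token i, augmented with a cluster ID.

    No word strings appear in the output, so a model trained on
    source-language features can be applied unchanged to target-language input.
    """
    feats = [f"pos={tags[i]}"]
    feats.append(f"pos-1={tags[i - 1] if i > 0 else '<s>'}")
    cid = clusters.get(words[i])
    feats.append(f"cluster={cid if cid is not None else 'UNK'}")
    return feats


en = token_features(["the", "dog"], ["DET", "NOUN"], 1, CLUSTERS)
de = token_features(["der", "Hund"], ["DET", "NOUN"], 1, CLUSTERS)
print(en == de)  # → True
```

    The English and German tokens produce identical feature vectors, which is what lets a parser or recognizer trained only on English data benefit from the cluster features on the target language.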

    Cancer-related health behaviors and health service use among Inuit and other residents of Canada’s north

    Objective – To identify the extent to which differences between Inuit and other residents of Canada’s North in a set of health behaviors and health service use related to cancer incidence and diagnosis can be accounted for by demographic, socio-economic and geographic factors. Study Design – Data on residents aged 21-65 who live in Canada’s North are drawn from the 2000-01 and 2004-05 Canadian Community Health Surveys and the 2001 Aboriginal People’s Survey. Methods – Multivariate logistic regression analysis is applied to 1) a set of health behaviors including smoking, binge drinking and obesity, and 2) a set of basic health service use measures including consultations with a physician and with any medical professional, Pap smear testing and mammography. Results – Higher smoking and binge drinking rates and lower rates of female cancer screening among Inuit are not accounted for by differences in demographic characteristics, education, location of residence or distance from a hospital. Conclusions – Factors specific to Inuit individuals and communities may be contributing to negative health behaviors associated with increased cancer risk, and to a lower incidence of diagnostic cancer screening. Policy interventions to address these issues may need to be targeted specifically to Inuit Canadians. Keywords: Inuit, aboriginal, cancer screening, smoking, health

    A dynamic modelling environment for the evaluation of wide area protection systems

    This paper introduces the concept of dynamic modelling for wide area and adaptive power system protection. Although the approach is not limited to these types of protection schemes, they were chosen because of their potential role in solving a multitude of protection challenges facing future power systems. The dynamic modelling will be implemented using a bespoke simulation environment. This tool allows for a fully integrated testing methodology which enables the validation of protection solutions prior to their operational deployment. Furthermore, the paper suggests a distributed protection architecture which, when applied to existing and future protection schemes, has the potential to enhance their functionality and avoid mal-operation, an important benefit given that the safety and reliability of power systems are paramount. This architecture also provides a means to better understand the underlying dynamics of the aforementioned protection schemes and will be rigorously validated using the modelling environment.

    PMKNS for PIE: Parsed Morphological KATR Networks of Sanskrit for Proto-Indo-European

    In this thesis, I construct two computational networks for Sanskrit to test theories of nominal accentuation as a way of examining the simplicity of each theory. I examine the Paradigmatic Approach and the Compositional Approach to nominal accentuation. For the Paradigmatic Approach, nominals are categorized into mobile and static categories based on how the accent appears in the paradigm (Fortson 2010). For the Compositional Approach, accent mobility is a result of the combination of morphemes and their inherent accent states (Kiparsky 2010). To construct these networks, I use the KATR extension to the DATR language for lexical knowledge representation (Finkel et al. 2002). In Chapter 1, I give an overview of Proto-Indo-European (PIE) accentuation and KATR. Chapter 2 presents my methods and connects the hypothetical nature of PIE to the well-documented Indo-European (IE) language Sanskrit. In Chapters 3 and 4, I use guided derivations of a Sanskrit r-stem nominal pitr̥- and a Sanskrit a-stem nominal sukha- to walk through each step. Chapter 5 analyzes my results for the two networks from Chapters 3 and 4, presents the overall conclusions I have drawn from the project, and suggests further areas of expansion.

    Token and Type Constraints for Cross-Lingual Part-of-Speech Tagging

    We consider the construction of part-of-speech taggers for resource-poor languages. Recently, manually constructed tag dictionaries from Wiktionary and dictionaries projected via bitext have been used as type constraints to overcome the scarcity of annotated data in this setting. In this paper, we show that additional token constraints can be projected from a resource-rich source language to a resource-poor target language via word-aligned bitext. We present several models to this end, in particular a partially observed conditional random field model, where coupled token and type constraints provide a partial signal for training. Averaged across eight previously studied Indo-European languages, our model achieves a 25% relative error reduction over the prior state of the art. We further present successful results on seven additional languages from different families, empirically demonstrating the applicability of coupled token and type constraints across a diverse set of languages.
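    Coupling token and type constraints amounts to computing, for each token, a set of tags the tagger is allowed to consider; a partially observed model then trains over this pruned lattice instead of a single gold tag. The sketch below shows one simple coupling rule, assumed for illustration: use the dictionary entry as the type constraint (all tags for unknown words), and narrow it to the projected tag only when that tag agrees with the dictionary. The function name `allowed_tags` and the toy dictionary are hypothetical, and the paper's exact rule may differ.

```python
from typing import Dict, Optional, Set

ALL_TAGS = frozenset({
    "NOUN", "VERB", "ADJ", "ADV", "PRON", "DET",
    "ADP", "NUM", "CONJ", "PRT", ".", "X",
})


def allowed_tags(word: str, projected_tag: Optional[str],
                 tag_dict: Dict[str, Set[str]],
                 all_tags: frozenset = ALL_TAGS) -> Set[str]:
    """Return the tags permitted for one token under coupled constraints."""
    # Type constraint: the dictionary entry if the word is known, else every tag.
    type_set = set(tag_dict.get(word, all_tags))
    # Token constraint: keep the projected tag only if the dictionary allows it;
    # otherwise ignore the (likely noisy) projection.
    if projected_tag is not None and projected_tag in type_set:
        return {projected_tag}
    return type_set


# Hypothetical tag dictionary and projected tags (None = unaligned token).
tag_dict = {"the": {"DET"}, "run": {"NOUN", "VERB"}}
words = ["the", "run", "ends"]
projected = ["DET", "ADJ", None]
lattice = [allowed_tags(w, p, tag_dict) for w, p in zip(words, projected)]
print([len(s) for s in lattice])  # → [1, 2, 12]
```

    A conditional random field trained on such a lattice would marginalize over all tag sequences drawn from these sets, so ambiguous tokens still contribute a partial training signal.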