
    Word Embeddings for Entity-annotated Texts

    Learned vector representations of words are useful tools for many information retrieval and natural language processing tasks because they capture lexical semantics. However, although many such tasks involve or even rely on named entities as central components, popular word embedding models have so far failed to treat entities as first-class citizens. While it seems intuitive that annotating named entities in the training corpus should yield more informative word features for downstream tasks, performance issues arise when popular embedding approaches are naively applied to entity-annotated corpora. Not only are the resulting entity embeddings less useful than expected, but the non-entity word embeddings also perform worse than those trained on the raw, unannotated corpus. In this paper, we investigate approaches to jointly train word and entity embeddings on a large corpus with automatically annotated and linked entities. We discuss two distinct approaches to generating such embeddings: training state-of-the-art embedding models on raw-text and annotated versions of the corpus, and computing node embeddings of a co-occurrence graph representation of the annotated corpus. We compare the performance of annotated embeddings and classical word embeddings on a variety of word similarity, analogy, and clustering evaluation tasks, and investigate their performance in entity-specific tasks. Our findings show that training popular word embedding models on an annotated corpus is not enough to create entity embeddings with acceptable performance on common test cases. Based on these results, we discuss how and when node embeddings of the co-occurrence graph representation of the text can restore performance. Comment: This paper has been accepted at the 41st European Conference on Information Retrieval.
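
The abstract does not say how the co-occurrence graph of the annotated corpus is built. A minimal sketch, assuming that linked entity mentions have already been collapsed into single tokens (the `ENT:` prefix is a hypothetical convention) and that two distinct tokens are joined by an edge whenever they appear within a fixed window (the window size of 2 is an assumption). The function name `cooccurrence_edges` is illustrative, not taken from the paper.

```python
from collections import Counter


def cooccurrence_edges(tokens, window=2):
    """Weighted, undirected co-occurrence edges between tokens.

    Two distinct tokens are linked when they occur within `window` positions
    of each other; the weight counts how often that happens. Linked entity
    mentions are assumed to be single tokens (hypothetical "ENT:" prefix),
    so entities become graph nodes alongside ordinary words.
    """
    edges = Counter()
    for i, u in enumerate(tokens):
        # Look only forward so that each co-occurring pair is counted once.
        for v in tokens[i + 1 : i + 1 + window]:
            if u != v:
                # Sort the pair so (u, v) and (v, u) map to the same edge.
                edges[tuple(sorted((u, v)))] += 1
    return edges


tokens = ["ENT:Paris", "is", "in", "ENT:France"]
for (u, v), w in sorted(cooccurrence_edges(tokens).items()):
    print(u, v, w)
```

A node-embedding method would then be trained on this weighted edge list; which method the paper uses is not stated in the abstract, and larger windows or distance-weighted counts are equally plausible design choices.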

    Selective Breeding for a Behavioral Trait Changes Digit Ratio

    The ratio of the length of the second digit (index finger) to that of the fourth digit (ring finger) tends to be lower in men than in women. This 2D∶4D digit ratio is often used as a proxy for prenatal androgen exposure in studies of human health and behavior. For example, 2D∶4D is lower (i.e., more “masculinized”) in both men and women with greater physical fitness and/or sporting ability. Laboratory mice also show variation in 2D∶4D as a function of uterine environment, and mouse digit ratios also seem to correlate with behavioral traits, including daily activity levels. Selective breeding for increased rates of voluntary exercise (wheel running) in four lines of mice has caused correlated increases in aerobic exercise capacity, circulating corticosterone levels, and predatory aggression. Here, we show that this selection regime has also increased 2D∶4D. This apparent “feminization” in mice is opposite to the relationship seen between 2D∶4D and physical fitness in humans. These results are difficult to reconcile with the notion that 2D∶4D is an effective proxy for prenatal androgen exposure; instead, the ratio may more accurately reflect the effects of glucocorticoids or of other factors that regulate any of many genes.

    A Human Development Framework for CO2 Reductions

    Although developing countries are called on to participate in CO2 emission reduction efforts to avoid dangerous climate change, the implications of proposed reduction schemes for the human development standards of developing countries remain a matter of debate. We show that there is a positive, time-dependent correlation between the Human Development Index (HDI) and per capita CO2 emissions from fossil fuel combustion. Using this empirical relation, extrapolating the HDI, and applying three population scenarios, we assess the cumulative CO2 emissions that developing countries need to reach particular HDI thresholds under a Development As Usual (DAU) approach. If current demographic and development trends continue, we estimate that by 2050 around 85% of the world's population will live in countries with high HDI (above 0.8). In particular, we estimate that 300 Gt of cumulative CO2 emissions between 2000 and 2050 are necessary for the development of the 104 countries classified as developing in the year 2000. This amounts to between 20% and 30% of previously calculated CO2 budgets for limiting global warming to 2 °C. These constraints and results are incorporated into a CO2 reduction framework involving four domains of climate action for individual countries. The framework reserves a fair emission path for developing countries to proceed with their development by indexing country-dependent reduction rates proportionally to the HDI, so as to preserve the 2 °C target once a particular development threshold is reached. Under this approach, global cumulative emissions by 2050 are estimated to range from 850 to 1,100 Gt of CO2. These values fall within the uncertainty range of emissions that limit global warming to 2 °C. Comment: 14 pages, 7 figures, 1 table
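
The abstract does not give the formula that indexes reduction rates to the HDI. One minimal reading is sketched below under loudly stated assumptions: no mandated reduction while a country's HDI is below a development threshold (0.8 here, borrowed from the high-HDI cutoff mentioned above; the paper's threshold may differ), and a constant annual reduction rate equal to a hypothetical coefficient `k` times the HDI once the threshold is reached. HDI is held fixed along the path for simplicity, whereas the paper extrapolates it over time. Both function names are illustrative.

```python
def reduction_rate(hdi, threshold=0.8, k=0.05):
    """Hypothetical annual CO2 reduction rate for one country.

    Assumption: zero below the development threshold, proportional to HDI
    at or above it. `k` (fraction per year per HDI unit) is illustrative,
    not a value from the paper.
    """
    if not 0.0 <= hdi <= 1.0:
        raise ValueError("HDI must lie in [0, 1]")
    return k * hdi if hdi >= threshold else 0.0


def emissions_path(e0, hdi, years, **kwargs):
    """Per capita emissions over `years`, starting from e0, at a constant rate."""
    r = reduction_rate(hdi, **kwargs)
    # Compound the annual reduction: e_t = e0 * (1 - r) ** t.
    return [e0 * (1 - r) ** t for t in range(years + 1)]


print(emissions_path(10.0, hdi=0.9, years=3))  # reduces every year
print(emissions_path(5.0, hdi=0.6, years=3))   # below threshold: flat
```

Making the rate a function of HDI is what lets a country below the threshold keep its emission path unchanged while it develops; a time-varying HDI would simply recompute `reduction_rate` each year.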

    Dalhousie dyspnea scales: construct and content validity of pictorial scales for measuring dyspnea

    BACKGROUND: Because there are no child-friendly, validated, self-report measures of dyspnea or breathlessness, we developed, and provided initial validation of, three 7-item pictorial scales depicting three sub-constructs of dyspnea: throat closing, chest tightness, and effort. METHODS: We developed the three scales (Throat closing, Chest tightness, and Effort) using focus groups with 25 children. Subsequently, 79 children aged 6 to 18 years (29 with asthma, 30 with cystic fibrosis, and 20 who were healthy) rated each picture in each series on a 0–10 scale. In addition, each child placed each picture in each series on a 100-cm Visual Analogue Scale anchored by "not at all" and "a lot". RESULTS: Children aged 8 years or older rated the pictures of each scale in the correct order 75% to 98% of the time, but children younger than 8 years performed unreliably. The mean distances between consecutive items in each pictorial scale were equal. CONCLUSION: Preliminary results revealed that children aged 8 to 18 years understood and used the three scales measuring throat closing, chest tightness, and effort appropriately. The scales appear to measure the construct of breathlessness accurately, at least at an interval level. Additional research applying these scales to clinical situations is warranted.
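
Two of the checks described in this abstract, whether a child orders a scale's items correctly and whether consecutive items are evenly spaced on the 100-cm line, can be sketched as follows. This is a descriptive sketch only: the data layout (one list of positions per child, from first to last item) and both function names are assumptions, the example numbers are hypothetical rather than study data, and the authors' actual statistical test of equal spacing is not described in the abstract.

```python
from statistics import mean


def in_correct_order(values):
    """True if a child's ratings or VAS positions rise strictly across the items."""
    return all(a < b for a, b in zip(values, values[1:]))


def consecutive_gaps(placements):
    """Mean VAS position of each item across children, then neighbouring gaps.

    placements: one list per child of positions (cm on a 0-100 line),
    ordered from the first to the last item of a pictorial scale.
    """
    n_items = len(placements[0])
    item_means = [mean(child[i] for child in placements) for i in range(n_items)]
    return [b - a for a, b in zip(item_means, item_means[1:])]


# Hypothetical placements for two children on a 7-item scale (not study data).
placements = [[0, 15, 30, 45, 60, 75, 90], [10, 25, 40, 55, 70, 85, 100]]
print(consecutive_gaps(placements))
print(sum(in_correct_order(c) for c in placements) / len(placements))
```

Equal gaps between item means are what an interval-level scale requires; in real data the gaps would be compared statistically rather than checked for exact equality.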

    The direct healthcare costs associated with psychological distress and major depression: A population-based cohort study in Ontario, Canada

    The objective of our study was to estimate the direct healthcare costs incurred by a population-based sample of people with psychological distress or depression. We used the 2002 Canadian Community Health Survey on Mental Health and Well Being and categorized individuals as having psychological distress (using the Kessler-6) or major depressive disorder (MDD; using DSM-IV criteria), and defined a comparison group of participants who had neither MDD nor psychological distress. Costs in 2013 USD were estimated by linking individuals to health administrative databases and following them until March 31, 2013. Our sample consisted of 9,965 individuals, of whom 651 and 409 had psychological distress and MDD, respectively. Although the age- and sex-adjusted per-capita costs were similarly high among the psychologically distressed ($3,364; 95% CI: $2,791–$3,937) and those with MDD ($3,210; 95% CI: $2,413–$4,008) compared with the comparison group ($2,629; 95% CI: $2,312–$2,945), the population-wide excess costs for psychological distress ($441 million) were more than twice those for MDD ($210 million), because more people had psychological distress than depression. We found substantial healthcare costs associated with psychological distress and depression, suggesting that both carry a high cost burden and that there may be public health intervention opportunities to relieve distress. Further research examining how individuals with these conditions use the healthcare system may provide insight into allocating limited healthcare resources while maintaining high-quality care.
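
The abstract attributes the larger population-wide excess for psychological distress to its higher prevalence rather than to a higher cost per person. A quick check with the reported point estimates (per-capita costs of $3,364, $3,210, and $2,629; population-wide excess costs of $441 million and $210 million) makes this concrete. This sketch ignores the confidence intervals, which overlap, and the survey weighting used to scale results to the population, which the abstract does not describe.

```python
# Age- and sex-adjusted per-capita costs in 2013 USD, as reported in the abstract.
per_capita = {"distress": 3364, "mdd": 3210, "comparison": 2629}
# Population-wide excess costs in millions of 2013 USD, as reported.
population_excess = {"distress": 441, "mdd": 210}

# Per-capita excess of each group over the comparison group (point estimates only).
excess = {g: per_capita[g] - per_capita["comparison"] for g in ("distress", "mdd")}

# How much larger the distress excess is, per person versus population-wide.
per_capita_ratio = excess["distress"] / excess["mdd"]
population_ratio = population_excess["distress"] / population_excess["mdd"]

print(excess)
print(round(per_capita_ratio, 2), round(population_ratio, 2))
```

Per person, distress costs only about 1.27 times as much excess as MDD ($735 versus $581), yet the population-wide ratio is 2.1, so most of the difference must come from the larger number of people affected.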

    Exome Sequencing Reveals Comprehensive Genomic Alterations across Eight Cancer Cell Lines

    It is well established that genomic alterations play an essential role in oncogenesis, disease progression, and the response of tumors to therapeutic intervention. Advances in next-generation sequencing (NGS) technologies provide unprecedented capabilities to scan genomes for changes such as mutations, deletions, and alterations of chromosomal copy number. However, the cost of full-genome sequencing still prevents the routine application of NGS in many areas. Capturing and sequencing the coding exons of genes (the “exome”) can be a cost-effective approach to identifying changes that alter protein sequences. We applied an exome-sequencing technology (Roche NimbleGen capture paired with 454 sequencing) to identify sequence variations and mutations in eight commonly used cancer cell lines from a variety of tissue origins (A2780, A549, Colo205, GTL16, NCI-H661, MDA-MB468, PC3, and RD). We showed that this technology can accurately identify sequence variation, with ∼95% concordance with Affymetrix SNP Array 6.0 data obtained from the same cell lines. Furthermore, we detected 19 of the 21 mutations reported in the Sanger COSMIC database for these cell lines. We identified an average of 2,779 potential novel sequence variations/mutations per cell line, of which 1,904 were non-synonymous. Many non-synonymous changes were identified in kinases and known cancer-related genes. In addition, we confirmed that the read depth of exome sequence data can be used to estimate high-level gene amplifications and to identify homozygous deletions. In summary, we demonstrate that exome sequencing can be a reliable and cost-effective way to identify alterations in cancer genomes, and we have generated a comprehensive catalogue of genomic alterations in the coding regions of eight cancer cell lines. These findings could provide important insights into cancer pathways and mechanisms of resistance to anti-cancer therapies.
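
The abstract states that exome read depth can flag high-level amplifications and deletions but does not describe the procedure. A minimal sketch, assuming per-gene read counts for a tumor sample and a reference sample, normalization by each sample's total reads, and two illustrative cut-offs: a log2 depth ratio of at least 2 (about fourfold) for a high-level amplification, and a normalized ratio below 0.1 for a homozygous deletion. The function name and both thresholds are hypothetical, not the authors' method.

```python
import math


def call_copy_number(sample_depth, reference_depth, amp_log2=2.0, del_frac=0.1):
    """Classify genes by normalized read-depth ratio (tumor vs. reference).

    Depths are per-gene read counts; each is divided by its sample's total
    so that differences in sequencing depth cancel out. Thresholds are
    illustrative assumptions.
    """
    s_total = sum(sample_depth.values())
    r_total = sum(reference_depth.values())
    calls = {}
    for gene, r in reference_depth.items():
        if r == 0:
            continue  # no reference coverage: the ratio is undefined
        s = sample_depth.get(gene, 0)
        ratio = (s / s_total) / (r / r_total)
        if ratio < del_frac:
            calls[gene] = "homozygous deletion"  # almost no reads left
        elif math.log2(ratio) >= amp_log2:
            calls[gene] = "amplification"
        else:
            calls[gene] = "neutral"
    return calls


# Hypothetical counts: eight unchanged genes, one amplified, one lost.
reference = {**{f"g{i}": 100 for i in range(8)}, "AMP": 100, "DEL": 100}
sample = {**{f"g{i}": 100 for i in range(8)}, "AMP": 1000, "DEL": 0}
print(call_copy_number(sample, reference))
```

Total-read normalization is the simplest choice, but a strong amplification inflates the total and deflates every other gene's ratio; median-based normalization or segmentation across neighboring exons would be more robust in practice.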

    Characterization of pathogenic germline mutations in human Protein Kinases

    Background: Protein kinases are a superfamily of proteins involved in crucial cellular processes such as cell cycle regulation and signal transduction. Accordingly, they play an important role in cancer biology. To contribute to the study of the relation between kinases and disease, we compared pathogenic mutations with neutral mutations, extending our previous analysis of cancer somatic mutations. First, we analyzed native and mutant proteins in terms of amino acid composition. Second, we characterized mutations according to their potential structural effects. Finally, we assessed the location of the different classes of polymorphisms with respect to kinase-relevant positions in terms of subfamily specificity, conservation, accessibility, and functional sites. Results: Pathogenic protein kinase mutations perturb essential aspects of protein function, including substrate binding and/or effector recognition at family-specific positions. Interestingly, these mutations tend to avoid structurally relevant positions, which represents a significant difference from the average distribution of pathogenic mutations in other protein families. Conclusions: Disease-associated mutations differ clearly from neutral mutations: several amino acids are specific to each mutation type, different structural properties characterize each class, and the distribution of pathogenic mutations within the consensus structure of the protein kinase domain is substantially different from that of non-pathogenic mutations. This preferential distribution confirms previous observations about the functional and structural distribution of the controversial cancer driver and passenger somatic mutations and about their use as a proxy for studying the involvement of somatic mutations in cancer development.