1,081 research outputs found
Word Embeddings for Entity-annotated Texts
Learned vector representations of words are useful tools for many information
retrieval and natural language processing tasks due to their ability to capture
lexical semantics. However, while many such tasks involve or even rely on named
entities as central components, popular word embedding models have so far
failed to include entities as first-class citizens. While it seems intuitive
that annotating named entities in the training corpus should result in more
intelligent word features for downstream tasks, performance issues arise when
popular embedding approaches are naively applied to entity annotated corpora.
Not only are the resulting entity embeddings less useful than expected, but one
also finds that the performance of the non-entity word embeddings degrades in
comparison to those trained on the raw, unannotated corpus. In this paper, we
investigate approaches to jointly train word and entity embeddings on a large
corpus with automatically annotated and linked entities. We discuss two
distinct approaches to the generation of such embeddings, namely the training
of state-of-the-art embeddings on raw-text and annotated versions of the
corpus, as well as node embeddings of a co-occurrence graph representation of
the annotated corpus. We compare the performance of annotated embeddings and
classical word embeddings on a variety of word similarity, analogy, and
clustering evaluation tasks, and investigate their performance in
entity-specific tasks. Our findings show that it takes more than training
popular word embedding models on an annotated corpus to create entity
embeddings with acceptable performance on common test cases. Based on these
results, we discuss how and when node embeddings of the co-occurrence graph
representation of the text can restore the performance.
Comment: This paper has been accepted at the 41st European Conference on Information Retrieval
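The co-occurrence graph mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the `ENT:` token prefix, the window size of 2, and the toy sentences are assumptions made only for illustration. Each linked entity is treated as a single token, and edges count how often two tokens appear within the window.

```python
from collections import Counter


def cooccurrence_edges(sentences, window=2):
    """Count undirected co-occurrences of tokens at most `window` positions apart."""
    edges = Counter()
    for tokens in sentences:
        for i, left in enumerate(tokens):
            # Pair the token with the next `window` tokens to its right.
            for right in tokens[i + 1 : i + 1 + window]:
                if left != right:
                    edges[tuple(sorted((left, right)))] += 1
    return edges


# Hypothetical annotated corpus: a linked entity is one token with an "ENT:" prefix.
sentences = [
    ["ENT:Angela_Merkel", "visited", "ENT:Paris", "yesterday"],
    ["ENT:Angela_Merkel", "met", "ministers", "in", "ENT:Paris"],
]
edges = cooccurrence_edges(sentences)
```

The weighted edge list could then be passed to any node embedding method, so that words and entities share one vector space.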
Selective Breeding for a Behavioral Trait Changes Digit Ratio
The ratio of the length of the second digit (index finger) divided by the fourth digit (ring finger) tends to be lower in men than in women. This 2D∶4D digit ratio is often used as a proxy for prenatal androgen exposure in studies of human health and behavior. For example, 2D∶4D ratio is lower (i.e. more “masculinized”) in both men and women of greater physical fitness and/or sporting ability. Lab mice have also shown variation in 2D∶4D as a function of uterine environment, and mouse digit ratios seem also to correlate with behavioral traits, including daily activity levels. Selective breeding for increased rates of voluntary exercise (wheel running) in four lines of mice has caused correlated increases in aerobic exercise capacity, circulating corticosterone level, and predatory aggression. Here, we show that this selection regime has also increased 2D∶4D. This apparent “feminization” in mice is opposite to the relationship seen between 2D∶4D and physical fitness in human beings. The present results are difficult to reconcile with the notion that 2D∶4D is an effective proxy for prenatal androgen exposure; instead, it may more accurately reflect effects of glucocorticoids, or other factors that regulate any of many genes
A Human Development Framework for CO2 Reductions
Although developing countries are called to participate in CO2 emission
reduction efforts to avoid dangerous climate change, the implications of
proposed reduction schemes in human development standards of developing
countries remain a matter of debate. We show the existence of a positive and
time-dependent correlation between the Human Development Index (HDI) and per
capita CO2 emissions from fossil fuel combustion. Employing this empirical
relation, extrapolating the HDI, and using three population scenarios, the
cumulative CO2 emissions necessary for developing countries to achieve
particular HDI thresholds are assessed following a Development As Usual
approach (DAU). If current demographic and development trends are maintained,
we estimate that by 2050 around 85% of the world's population will live in
countries with high HDI (above 0.8). In particular, 300Gt of cumulative CO2
emissions between 2000 and 2050 are estimated to be necessary for the
development of 104 developing countries in the year 2000. This value represents
between 20% and 30% of previously calculated CO2 budgets limiting global warming
to 2°C. These constraints and results are incorporated into a CO2
reduction framework involving four domains of climate action for individual
countries. The framework reserves a fair emission path for developing countries
to proceed with their development by indexing country-dependent reduction rates
proportional to the HDI in order to preserve the 2°C target after a
particular development threshold is reached. Under this approach, global
cumulative emissions by 2050 are estimated to range from 850 up to 1100Gt of
CO2. These values are within the uncertainty range of emissions to limit global
temperatures to 2°C.
Comment: 14 pages, 7 figures, 1 table
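As a quick check on the figures above, the statement that 300 Gt is between 20% and 30% of previously calculated budgets implies a budget range that can be worked out directly. The sketch below is only arithmetic on the numbers quoted in the abstract. The implied budget values are derived here and are not stated by the authors, and the variable names are illustrative.

```python
# Figures quoted in the abstract.
development_emissions_gt = 300  # cumulative CO2, 2000 to 2050, for 104 developing countries
share_low_pct, share_high_pct = 20, 30  # stated share of previously calculated 2°C budgets

# Implied budgets (derived, not stated in the abstract): emissions / share.
implied_budget_high_gt = development_emissions_gt * 100 / share_low_pct
implied_budget_low_gt = development_emissions_gt * 100 / share_high_pct

print(f"Implied 2°C budgets: {implied_budget_low_gt:.0f} to {implied_budget_high_gt:.0f} Gt CO2")
```

Under these figures the implied budgets span roughly 1000 to 1500 Gt of CO2.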
Dalhousie dyspnea scales: construct and content validity of pictorial scales for measuring dyspnea
BACKGROUND: Because there are no child-friendly, validated, self-report measures of dyspnea or breathlessness, we developed and provided initial validation of three 7-item pictorial scales depicting three sub-constructs of dyspnea: throat closing, chest tightness, and effort. METHODS: We developed the three scales (Throat closing, Chest tightness, and Effort) using focus groups with 25 children. Subsequently, seventy-nine children (29 children with asthma, 30 children with cystic fibrosis, and 20 children who were healthy) aged 6 to 18 years rated each picture in each series, using a 0–10 scale. In addition, each child placed each picture in each series on a 100-cm long Visual Analogue Scale, with the anchors "not at all" and "a lot". RESULTS: Children aged eight years or older rated the scales in the correct order 75% to 98% of the time, but children less than 8 years of age performed unreliably. The mean distance between each consecutive item in each pictorial scale was equal. CONCLUSION: Preliminary results revealed that children aged 8 to 18 years understood and used these three scales measuring throat closing, chest tightness, and effort appropriately. The scales appear to accurately measure the construct of breathlessness, at least at an interval level. Additional research applying these scales to clinical situations is warranted
The direct healthcare costs associated with psychological distress and major depression : A population-based cohort study in Ontario, Canada
The objective of our study was to estimate direct healthcare costs incurred by a population-based sample of people with psychological distress or depression. We used the 2002 Canadian Community Health Survey on Mental Health and Well Being and categorized individuals as having psychological distress using the Kessler-6, major depressive disorder (MDD) using DSM-IV criteria and a comparison group of participants without MDD or psychological distress. Costs in 2013 USD were estimated by linking individuals to health administrative databases and following them until March 31, 2013. Our sample consisted of 9,965 individuals, of whom 651 and 409 had psychological distress and MDD, respectively. Although the age-and-sex adjusted per-capita costs were similarly high among the psychologically distressed (2,791, 3,210, 95% CI: 4,008) compared to the comparison group (2,312, 441 million) were more than twice that for MDD ($210 million) as there was a greater number of people with psychological distress than depression. We found substantial healthcare costs associated with psychological distress and depression, suggesting that psychological distress and MDD have a high cost burden and there may be public health intervention opportunities to relieve distress. Further research examining how individuals with these conditions use the healthcare system may provide insight into the allocation of limited healthcare resources while maintaining high quality care
Exome Sequencing Reveals Comprehensive Genomic Alterations across Eight Cancer Cell Lines
It is well established that genomic alterations play an essential role in oncogenesis, disease progression, and response of tumors to therapeutic intervention. The advances of next-generation sequencing technologies (NGS) provide unprecedented capabilities to scan genomes for changes such as mutations, deletions, and alterations of chromosomal copy number. However, the cost of full-genome sequencing still prevents the routine application of NGS in many areas. Capturing and sequencing the coding exons of genes (the “exome”) can be a cost-effective approach for identifying changes that result in alteration of protein sequences. We applied an exome-sequencing technology (Roche Nimblegen capture paired with 454 sequencing) to identify sequence variation and mutations in eight commonly used cancer cell lines from a variety of tissue origins (A2780, A549, Colo205, GTL16, NCI-H661, MDA-MB468, PC3, and RD). We showed that this technology can accurately identify sequence variation, providing ∼95% concordance with Affymetrix SNP Array 6.0 performed on the same cell lines. Furthermore, we detected 19 of the 21 mutations reported in Sanger COSMIC database for these cell lines. We identified an average of 2,779 potential novel sequence variations/mutations per cell line, of which 1,904 were non-synonymous. Many non-synonymous changes were identified in kinases and known cancer-related genes. In addition we confirmed that the read-depth of exome sequence data can be used to estimate high-level gene amplifications and identify homologous deletions. In summary, we demonstrate that exome sequencing can be a reliable and cost-effective way for identifying alterations in cancer genomes, and we have generated a comprehensive catalogue of genomic alterations in coding regions of eight cancer cell lines. These findings could provide important insights into cancer pathways and mechanisms of resistance to anti-cancer therapies
Characterization of pathogenic germline mutations in human Protein Kinases
Background
Protein Kinases are a superfamily of proteins involved in crucial cellular processes such as cell cycle regulation and signal transduction. Accordingly, they play an important role in cancer biology. To contribute to the study of the relation between kinases and disease, we compared pathogenic mutations to neutral mutations as an extension to our previous analysis of cancer somatic mutations. First, we analyzed native and mutant proteins in terms of amino acid composition. Second, mutations were characterized according to their potential structural effects. Finally, we assessed the location of the different classes of polymorphisms with respect to kinase-relevant positions in terms of subfamily specificity, conservation, accessibility and functional sites.
Results
Pathogenic Protein Kinase mutations perturb essential aspects of protein function, including disruption of substrate binding and/or effector recognition at family-specific positions. Interestingly, these mutations in Protein Kinases display a tendency to avoid structurally relevant positions, which represents a significant difference with respect to the average distribution of pathogenic mutations in other protein families.
Conclusions
Disease-associated mutations display sound differences with respect to neutral mutations: several amino acids are specific to each mutation type, different structural properties characterize each class, and the distribution of pathogenic mutations within the consensus structure of the Protein Kinase domain is substantially different from that for non-pathogenic mutations. This preferential distribution confirms previous observations about the functional and structural distribution of the controversial cancer driver and passenger somatic mutations and their use as a proxy for the study of the involvement of somatic mutations in cancer development.