    A Profile-Based Method for Authorship Verification

    Authorship verification is one of the most challenging tasks in style-based text categorization. Given a set of documents, all by the same author, and another document of unknown authorship, the question is whether or not the latter is also by that author. Recently, in the framework of the PAN-2013 evaluation lab, a competition in authorship verification was organized, and the vast majority of submitted approaches, including the best-performing models, followed the instance-based paradigm, where each text sample by one author is treated separately. In this paper, we show that the profile-based paradigm (where all samples by one author are treated cumulatively) can be very effective, surpassing the performance of the PAN-2013 winners without using any information from external sources. The proposed approach is fully trainable, and we demonstrate an appropriate tuning of parameter settings for the PAN-2013 corpora, achieving accurate answers especially when the cost of false negatives is high.

    Chat mining for gender prediction

    The aim of this paper is to investigate the feasibility of predicting the gender of a text document's author using linguistic evidence. For this purpose, term- and style-based classification techniques are evaluated over a large collection of chat messages. Prediction accuracies up to 84.2% are achieved, illustrating the applicability of these techniques to gender prediction. Moreover, the reverse problem is exploited, and the effect of gender on the writing style is discussed. © Springer-Verlag Berlin Heidelberg 2006

    Association of untargeted urinary metabolomics and lung cancer risk among never-smoking women in China

    Importance: Chinese women have the highest rate of lung cancer among female never-smokers in the world, and the etiology is poorly understood. Objective: To assess the association between metabolomics and lung cancer risk among never-smoking women. Design, Setting, and Participants: This nested case-control study included 275 never-smoking female patients with lung cancer and 289 never-smoking cancer-free control participants from the prospective Shanghai Women’s Health Study recruited from December 28, 1996, to May 23, 2000. Validated food frequency questionnaires were used for the collection of dietary information. Metabolomic analysis was conducted from November 13, 2015, to January 6, 2016. Data analysis was conducted from January 6, 2016, to November 29, 2018. Exposures: Untargeted ultra-high-performance liquid chromatography–tandem mass spectrometry and nuclear magnetic resonance metabolomic profiles were characterized using prediagnosis urine samples. A total of 39 416 metabolites were measured. Main Outcomes and Measures: Incident lung cancer. Results: Among the 564 women, those who developed lung cancer (275 participants; median [interquartile range] age, 61.0 [52-65] years) and those who did not develop lung cancer (289 participants; median [interquartile range] age, 62.0 [53-66] years) at follow-up (median [interquartile range] follow-up, 10.9 [9.0-11.7] years) were similar in terms of their secondhand smoke exposure, history of respiratory diseases, and body mass index. A peak metabolite, identified as 5-methyl-2-furoic acid, was significantly associated with lower lung cancer risk (odds ratio, 0.57 [95% CI, 0.46-0.72]; P < .001; false discovery rate = 0.039). Furthermore, this peak was weakly correlated with self-reported dietary soy intake (ρ = 0.21; P < .001). Increasing tertiles of this metabolite were associated with lower lung cancer risk (in comparison with first tertile, odds ratio for second tertile, 0.52 [95% CI, 0.34-0.80]; and odds ratio for third tertile, 0.46 [95% CI, 0.30-0.70]), and the association was consistent across different histological subtypes and follow-up times. Additionally, metabolic pathway analysis found several systemic biological alterations that were associated with lung cancer risk, including 1-carbon metabolism, nucleotide metabolism, oxidative stress, and inflammation. Conclusions and Relevance: This prospective study of the untargeted urinary metabolome and lung cancer among never-smoking women in China provides support for the hypothesis that soy-based metabolites are associated with lower lung cancer risk in never-smoking women and suggests that biological processes linked to air pollution may be associated with higher lung cancer risk in this population.

    Advances in ab-initio theory of Multiferroics. Materials and mechanisms: modelling and understanding

    Within the broad class of multiferroics (compounds showing a coexistence of magnetism and ferroelectricity), we focus on the subclass of "improper electronic ferroelectrics", i.e. correlated materials where electronic degrees of freedom (such as spin, charge or orbital) drive ferroelectricity. In particular, in spin-induced ferroelectrics, there is not only a coexistence of the two intriguing magnetic and dipolar orders; rather, there is such an intimate link that one drives the other, suggesting a giant magnetoelectric coupling. Via first-principles approaches based on density functional theory, we review the microscopic mechanisms at the basis of multiferroicity in several compounds, ranging from transition metal oxides to organic multiferroics (MFs) to organic-inorganic hybrids (i.e. metal-organic frameworks, MOFs).

    Genome-wide association and Mendelian randomisation analysis provide insights into the pathogenesis of heart failure

    Heart failure (HF) is a leading cause of morbidity and mortality worldwide. A small proportion of HF cases are attributable to monogenic cardiomyopathies, and existing genome-wide association studies (GWAS) have yielded only limited insights, leaving the observed heritability of HF largely unexplained. We report results from a GWAS meta-analysis of HF comprising 47,309 cases and 930,014 controls. Twelve independent variants at 11 genomic loci are associated with HF, all of which demonstrate one or more associations with coronary artery disease (CAD), atrial fibrillation, or reduced left ventricular function, suggesting shared genetic aetiology. Functional analysis of non-CAD-associated loci implicates genes involved in cardiac development (MYOZ1, SYNPO2L), protein homoeostasis (BAG3), and cellular senescence (CDKN1A). Mendelian randomisation analysis supports causal roles for several HF risk factors, and demonstrates CAD-independent effects for atrial fibrillation, body mass index, and hypertension. These findings extend our knowledge of the pathways underlying HF and may inform new therapeutic strategies.

    Novel genetic loci associated with hippocampal volume

    The hippocampal formation is a brain structure integrally involved in episodic memory, spatial navigation, cognition and stress responsiveness. Structural abnormalities in hippocampal volume and shape are found in several common neuropsychiatric disorders. To identify the genetic underpinnings of hippocampal structure, we perform a genome-wide association study (GWAS) of 33,536 individuals and discover six independent loci significantly associated with hippocampal volume, four of them novel. Of the novel loci, three lie within genes (ASTN2, DPP4 and MAST4) and one is found 200 kb upstream of SHH. A hippocampal subfield analysis indicates that a locus within the MSRB3 gene has a localized effect along the dentate gyrus, subiculum, CA1 and fissure. Further, we show that genetic variants associated with decreased hippocampal volume are also associated with increased risk for Alzheimer's disease (rg = −0.155). Our findings suggest novel biological pathways through which human genetic variation influences hippocampal volume and risk for neuropsychiatric illness.

    The genetic architecture of the human cerebral cortex

    INTRODUCTION: The cerebral cortex underlies our complex cognitive capabilities. Variations in human cortical surface area and thickness are associated with neurological, psychological, and behavioral traits and can be measured in vivo by magnetic resonance imaging (MRI). Studies in model organisms have identified genes that influence cortical structure, but little is known about common genetic variants that affect human cortical structure. RATIONALE: To identify genetic variants associated with human cortical structure at both global and regional levels, we conducted a genome-wide association meta-analysis of brain MRI data from 51,665 individuals across 60 cohorts. We analyzed the surface area and average thickness of the whole cortex and 34 cortical regions with known functional specializations. RESULTS: We identified 306 nominally genome-wide significant loci (P < 5 × 10⁻⁸) associated with cortical structure in a discovery sample of 33,992 participants of European ancestry. Of the 299 loci for which replication data were available, 241 loci influencing surface area and 14 influencing thickness remained significant after replication, with 199 loci passing multiple testing correction (P < 8.3 × 10⁻¹⁰; 187 influencing surface area and 12 influencing thickness). Common genetic variants explained 34% (SE = 3%) of the variation in total surface area and 26% (SE = 2%) in average thickness; surface area and thickness showed a negative genetic correlation (rG = −0.32, SE = 0.05, P = 6.5 × 10⁻¹²), which suggests that genetic influences have opposing effects on surface area and thickness. Bioinformatic analyses showed that total surface area is influenced by genetic variants that alter gene regulatory activity in neural progenitor cells during fetal development. By contrast, average thickness is influenced by active regulatory elements in adult brain samples, which may reflect processes that occur after mid-fetal development, such as myelination, branching, or pruning. When considered together, these results support the radial unit hypothesis that different developmental mechanisms promote surface area expansion and increases in thickness. To identify specific genetic influences on individual cortical regions, we controlled for global measures (total surface area or average thickness) in the regional analyses. After multiple testing correction, we identified 175 loci that influence regional surface area and 10 that influence regional thickness. Loci that affect regional surface area cluster near genes involved in the Wnt signaling pathway, which is known to influence areal identity. We observed significant positive genetic correlations and evidence of bidirectional causation of total surface area with both general cognitive functioning and educational attainment. We found additional positive genetic correlations between total surface area and Parkinson’s disease but did not find evidence of causation. Negative genetic correlations were evident between total surface area and insomnia, attention deficit hyperactivity disorder, depressive symptoms, major depressive disorder, and neuroticism. CONCLUSION: This large-scale collaborative work enhances our understanding of the genetic architecture of the human cerebral cortex and its regional patterning. The highly polygenic architecture of the cortex suggests that distinct genes are involved in the development of specific cortical areas. Moreover, we find evidence that brain structure is a key phenotype along the causal pathway that leads from genetic variation to differences in general cognitive function.