65 research outputs found

    Combined SVM-CRFs for Biological Named Entity Recognition with Maximal Bidirectional Squeezing

    Biological named entity recognition, the identification of biological terms in text, is essential for biomedical information extraction. Machine learning-based approaches have been widely applied in this area, but their recognition performance could still be improved. Our approach combines support vector machines (SVMs) and conditional random fields (CRFs), which complement each other. In the hybrid process, we first use an SVM to separate biological terms from non-biological terms, then use CRFs to determine the types of the biological terms; this exploits the strength of the SVM as a binary classifier and the data-labeling capacity of CRFs. We then merge the results of the SVM and the CRFs. To remove inconsistencies that might result from the merging, we develop an algorithm and apply two rules. To ensure that biological terms of maximum length are identified, we propose a maximal bidirectional squeezing approach that finds the longest term. We also add a positive gain to rare events to reinforce their probability and avoid bias. Our approach also gradually extends the context so that more contextual information can be included. We examined the performance of four approaches on the GENIA corpus and the JNLPBA04 data. The combination of SVM and CRFs improved performance: the macro-precision, macro-recall, and macro-F1 of the SVM-CRFs hybrid approach surpassed those of conventional SVM and CRFs. After applying the new algorithms, the macro-F1 reached 91.67% on the GENIA corpus and 84.04% on the JNLPBA04 data.
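
The abstract above reports macro-averaged precision, recall, and F1. As a rough illustration of what macro-averaging means, the sketch below computes the three scores from token-level labels. It is not the authors' evaluation code: the function name `macro_scores`, the token-level matching, the exclusion of the `O` (non-entity) label, and the choice to define macro-F1 as the mean of per-class F1 values are all assumptions made for this example.

```python
from collections import Counter


def macro_scores(gold, pred, ignore=("O",)):
    """Return (macro_precision, macro_recall, macro_f1) over entity classes.

    gold and pred are equal-length sequences of class labels, one per token.
    Labels listed in `ignore` (assumed here to be the non-entity label "O")
    are left out of the average.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[g] += 1  # gold class g was missed

    classes = sorted((set(gold) | set(pred)) - set(ignore))
    if not classes:
        return 0.0, 0.0, 0.0

    precisions, recalls, f1s = [], [], []
    for c in classes:
        p_den = tp[c] + fp[c]
        r_den = tp[c] + fn[c]
        prec = tp[c] / p_den if p_den else 0.0
        rec = tp[c] / r_den if r_den else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(classes)
    # Macro-averaging: every class counts equally, regardless of frequency.
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


if __name__ == "__main__":
    gold = ["protein", "protein", "DNA", "O"]
    pred = ["protein", "DNA", "DNA", "O"]
    print(macro_scores(gold, pred))
```

Because each class contributes equally to the average, rare entity types weigh as much as frequent ones, which is presumably why a macro-averaged score suits a method that explicitly boosts rare events.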

    Reuse of terminological resources for efficient ontological engineering in Life Sciences

    This paper explores how to use terminological resources for ontology engineering. There are now several biomedical ontologies describing overlapping domains, but there is no clear correspondence between concepts that are supposed to be equivalent or merely similar. These resources are valuable, but their integration and further development are expensive. Terminologies may support ontological development at several stages of the ontology lifecycle, e.g. ontology integration. In this paper we investigate the use of terminological resources during the ontology lifecycle. We claim that the proper creation and use of a shared thesaurus is a cornerstone for the successful application of Semantic Web technology within the life sciences. Moreover, we have applied our approach to a real scenario, the Health-e-Child (HeC) project, and we have evaluated the impact of filtering and re-organizing several resources. As a result, we have created a reference thesaurus for this project, named HeCTh.

    Genome-Wide Interaction Analysis with DASH Diet Score Identified Novel Loci for Systolic Blood Pressure.

    OBJECTIVE: We examined interactions between genotype and a Dietary Approaches to Stop Hypertension (DASH) diet score in relation to systolic blood pressure (SBP). METHODS: We analyzed up to 9,420,585 biallelic imputed single nucleotide polymorphisms (SNPs) in up to 127,282 individuals of six population groups (91% of European population) from the Cohorts for Heart and Aging Research in Genomic Epidemiology consortium (CHARGE; n=35,660) and UK Biobank (n=91,622) and performed European population-specific and cross-population meta-analyses. RESULTS: We identified three loci in European-specific analyses and an additional four loci in cross-population analyses at P for interaction < 5e-8. We observed a consistent interaction between rs117878928 at 15q25.1 (minor allele frequency = 0.03) and the DASH diet score (P for interaction = 4e-8; P for heterogeneity = 0.35) in the European population, where the interaction effect size was 0.42±0.09 mm Hg (P for interaction = 9.4e-7) in CHARGE and 0.20±0.06 mm Hg (P for interaction = 0.001) in UK Biobank. The 1 Mb region surrounding rs117878928 was enriched with cis-expression quantitative trait loci (eQTL) variants (P = 4e-273) and cis-DNA methylation quantitative trait loci (mQTL) variants (P = 1e-300). While the closest gene to rs117878928 is MTHFS, the highest narrow-sense heritability accounted for by SNPs potentially interacting with the DASH diet score in this locus was for gene ST20 at 15q25.1. CONCLUSION: We demonstrated gene-DASH diet score interaction effects on SBP in several loci. Studies with larger diverse populations are needed to validate our findings.

    Deafness mutation mining using regular expression based pattern matching

    Background: While keyword-based queries of databases such as Pubmed are frequently of great utility, the ability to use regular expressions in place of a keyword can often improve the results output by such databases. Regular expressions can allow for the identification of element types that cannot be readily specified by a single keyword and can allow for different words with similar character sequences to be distinguished. Results: A Perl-based utility was developed to allow the use of regular expressions in Pubmed searches, thereby improving the accuracy of the searches. Conclusion: This utility was then utilized to create a comprehensive listing of all DFN deafness mutations discussed in Pubmed records containing the keywords "human ear".
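
To illustrate the idea in the abstract above, the sketch below uses a regular expression to pick out DFN-style locus symbols from free text, something a single keyword cannot do. It is not the Perl utility the abstract describes: it is written in Python, it works on a plain string rather than on Pubmed records, and both the pattern `DFN_PATTERN` and the helper `find_dfn_loci` are hypothetical. The pattern assumes symbols of the form `DFN`, an optional class letter (A, B, X, Y, or M), a number, and an optional trailing letter.

```python
import re

# Assumed shape of a DFN locus symbol: "DFN", optional class letter,
# one or more digits, optional trailing uppercase letter (e.g. DFNA2A).
DFN_PATTERN = re.compile(r"\bDFN[ABXYM]?\d+[A-Z]?\b")


def find_dfn_loci(text):
    """Return the unique DFN-style symbols in text, in order of first appearance."""
    seen = []
    for match in DFN_PATTERN.finditer(text):
        symbol = match.group(0)
        if symbol not in seen:
            seen.append(symbol)
    return seen


if __name__ == "__main__":
    sample = (
        "Mutations at DFNB1 and DFNA2A were reported; DFNB1 is common. "
        "The bare string DFN does not match, but DFN3 does."
    )
    print(find_dfn_loci(sample))
```

The word boundaries and the required digits are what separate a locus symbol such as `DFNB1` from the bare prefix `DFN` or from unrelated words with a similar character sequence.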

    An Analysis of the Abstracts Presented at the Annual Meetings of the Society for Neuroscience from 2001 to 2006

    Annual meeting abstracts published by scientific societies often contain rich arrays of information that can be computationally mined and distilled to elucidate the state and dynamics of the subject field. We extracted and processed abstract data from the Society for Neuroscience (SFN) annual meeting abstracts for the period 2001–2006 in order to gain an objective view of contemporary neuroscience. An important first step was the application of data cleaning and disambiguation methods to construct a unified database, since the data were too noisy to be of full utility in the raw form initially available. Using natural language processing, text mining, and other data analysis techniques, we then examined the demographics and structure of the scientific collaboration network, the dynamics of the field over time, major research trends, and the structure of the sources of research funding. Findings include a high geographical concentration of neuroscience research in the northeastern United States, a surprisingly large transient population (66% of the authors appear in only one of the six studied years), the central role played by the study of neurodegenerative disorders in the neuroscience community, and an apparent growth of behavioral/systems neuroscience with a corresponding shrinkage of cellular/molecular neuroscience over the six-year period. The results from this work will prove useful for scientists, policy makers, and funding agencies seeking to gain a complete and unbiased picture of the community structure and body of knowledge encapsulated by a specific scientific domain.

    Interventions to Promote Fundamental Movement Skills in Childcare and Kindergarten: A Systematic Review and Meta-Analysis

    Plant-based diets are associated with a lower risk of incident cardiovascular disease, cardiovascular disease mortality, and all-cause mortality in a general population of middle-aged adults

    Background: Previous studies have documented the cardiometabolic health benefits of plant-based diets; however, these studies were conducted in selected study populations that had narrow generalizability. Methods and Results: We used data from a community-based cohort of middle-aged adults (n=12,168) in the ARIC (Atherosclerosis Risk in Communities) study who were followed up from 1987 through 2016. Participants' diet was classified using 4 diet indexes. In the overall plant-based diet index and provegetarian diet index, higher intakes of all or selected plant foods received higher scores; in the healthy plant-based diet index, higher intakes of only the healthy plant foods received higher scores; in the less healthy plant-based diet index, higher intakes of only the less healthy plant foods received higher scores. In all indexes, higher intakes of animal foods received lower scores. Results from Cox proportional hazards models showed that participants in the highest versus lowest quintile for adherence to overall plant-based diet index or provegetarian diet had a 16%, 31% to 32%, and 18% to 25% lower risk of cardiovascular disease, cardiovascular disease mortality, and all-cause mortality, respectively, after adjusting for important confounders (all P<0.05 for trend). Higher adherence to a healthy plant-based diet index was associated with a 19% and 11% lower risk of cardiovascular disease mortality and all-cause mortality, respectively, but not incident cardiovascular disease (P<0.05 for trend). No associations were observed between the less healthy plant-based diet index and the outcomes. Conclusions: Diets higher in plant foods and lower in animal foods were associated with a lower risk of cardiovascular morbidity and mortality in a general population.