
    OSIRISv1.2: A named entity recognition system for sequence variants of genes in biomedical literature

    Background: Single Nucleotide Polymorphisms, among other types of sequence variants, constitute key elements in genetic epidemiology and pharmacogenomics. While sequence data about genetic variation are found in databases such as dbSNP, clues about the functional and phenotypic consequences of the variations are generally found in the biomedical literature. The identification of the relevant documents and the extraction of information from them are hampered by the large size of literature databases and the lack of a widely accepted standard notation for biomedical entities. Thus, automatic systems for identifying citations of allelic variants of genes in biomedical texts are required. Results: Our group has previously reported the development of OSIRIS, a system aimed at the retrieval of literature about allelic variants of genes (http://ibi.imim.es/osirisform.html). Here we describe the development of a new version of OSIRIS (OSIRISv1.2, http://ibi.imim.es/OSIRISv1.2.html), which incorporates a new entity recognition module and is built on top of a local mirror of the MEDLINE collection and HgenetInfoDB, a database that collects data on human gene sequence variations. The new entity recognition module is based on a pattern-based search algorithm that identifies variation terms in the texts and maps them to dbSNP identifiers. The performance of OSIRISv1.2 was evaluated on a manually annotated corpus, resulting in 99% precision, 82% recall, and an F-score of 0.89. As an example, the application of the system to collecting literature citations for the allelic variants of genes related to the diseases intracranial aneurysm and breast cancer is presented. Conclusion: OSIRISv1.2 can be used to link literature references to dbSNP database entries with high accuracy, and is therefore suitable for collecting current knowledge on gene sequence variations and supporting the functional annotation of variation databases. The application of OSIRISv1.2 in combination with controlled vocabularies such as MeSH provides a way to identify associations of biomedical interest, such as those that relate SNPs to diseases.
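
    To give a rough feel for what a pattern-based search for variation terms can look like, here is a minimal Python sketch. The three regular expressions, the labels, and the function name `find_variant_mentions` are assumptions made for illustration; they are not the OSIRISv1.2 rules, and the sketch does not attempt the mapping to dbSNP identifiers that the abstract describes.

```python
import re

# Hypothetical patterns for illustration only; not the OSIRISv1.2 rule set.
VARIANT_PATTERNS = [
    # dbSNP-style reference identifiers, e.g. "rs1801133"
    ("rsid", re.compile(r"\brs\d+\b")),
    # Coding-DNA substitutions, e.g. "c.677C>T" or "c.262C > T"
    ("cdna_substitution", re.compile(r"\bc\.\d+[ACGT]\s*>\s*[ACGT]")),
    # Three-letter amino-acid substitutions, e.g. "Ala222Val"
    ("protein_substitution", re.compile(r"\b[A-Z][a-z]{2}\d+[A-Z][a-z]{2}\b")),
]


def find_variant_mentions(text):
    """Return (label, surface form) pairs in order of appearance in the text."""
    mentions = []
    for label, pattern in VARIANT_PATTERNS:
        for match in pattern.finditer(text):
            mentions.append((match.start(), label, match.group(0)))
    # Sort by character offset so the output follows reading order.
    return [(label, surface) for _, label, surface in sorted(mentions)]


if __name__ == "__main__":
    sentence = "The variant rs1801133 (c.677C>T, Ala222Val) in MTHFR was genotyped."
    for label, surface in find_variant_mentions(sentence):
        print(label, surface)
```

    A real system would need far more patterns and a disambiguation step; the point here is only that each notation style gets its own pattern and that matches are collected with their positions.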

    A cascaded approach to normalising gene mentions in biomedical literature

    Linking gene and protein names mentioned in the literature to unique identifiers in referent genomic databases is an essential step in accessing and integrating knowledge in the biomedical domain. However, it remains a challenging task due to lexical and terminological variation, and to the ambiguity of gene name mentions in documents. We present a generic and effective rule-based approach to link gene mentions in the literature to referent genomic databases, in which both the gene synonyms in the databases and the gene mentions in text are first pre-processed. The mapping method employs a cascaded approach, which combines exact, exact-like and token-based approximate matching by using flexible representations of a gene synonym dictionary and of the gene mentions generated during the pre-processing phase. We also consider multi-gene name mentions and permutation of components in gene names. A systematic evaluation of the suggested methods has identified steps that are beneficial for improving either precision or recall in gene name identification. Experiments on the BioCreAtIvE2 data sets (identification of human gene names) showed that our methods achieved highly encouraging results, with an F-measure of up to 81.20%.
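
    The cascade described in this abstract (exact, then exact-like, then token-based approximate matching) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the synonym dictionary and its `GENE:` identifiers are placeholders, "exact-like" is interpreted here as equality after lowercasing and stripping punctuation, and the approximate stage uses Jaccard overlap of tokens with an arbitrary threshold.

```python
import re

# Hypothetical synonym dictionary; the identifiers are placeholders,
# not real database accessions.
GENE_SYNONYMS = {
    "GENE:1": ["tumor protein p53", "TP53", "p53"],
    "GENE:2": ["breast cancer 1", "BRCA1"],
}


def normalise(name):
    """Assumed 'exact-like' form: lowercase with non-alphanumerics removed."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def tokens(name):
    """Split into letter runs and digit runs, ignoring order and case."""
    return frozenset(re.findall(r"[a-z]+|\d+", name.lower()))


def link_mention(mention, synonyms=GENE_SYNONYMS, min_overlap=0.6):
    """Return (gene id or None, name of the stage that produced the link)."""
    # Stage 1: exact string match against the dictionary.
    for gene_id, names in synonyms.items():
        if mention in names:
            return gene_id, "exact"
    # Stage 2: exact-like match on the normalised forms.
    key = normalise(mention)
    for gene_id, names in synonyms.items():
        if any(normalise(name) == key for name in names):
            return gene_id, "exact-like"
    # Stage 3: token-based approximate match (Jaccard overlap), which
    # also tolerates permutation of the components of a name.
    mention_tokens = tokens(mention)
    best_id, best_score = None, 0.0
    for gene_id, names in synonyms.items():
        for name in names:
            name_tokens = tokens(name)
            union = mention_tokens | name_tokens
            score = len(mention_tokens & name_tokens) / len(union) if union else 0.0
            if score > best_score:
                best_id, best_score = gene_id, score
    if best_score >= min_overlap:
        return best_id, "token-approximate"
    return None, "unmatched"


if __name__ == "__main__":
    for mention in ["BRCA1", "BRCA-1", "p53 tumor protein", "insulin"]:
        print(mention, link_mention(mention))
```

    Ordering the stages from strictest to loosest means a later, fuzzier stage is consulted only when every stricter stage has failed, which is one plausible way such a cascade trades precision against recall.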

    Histiocytoid cardiomyopathy and microphthalmia with linear skin defects syndrome: phenotypes linked by truncating variants in NDUFB11

    Variants in NDUFB11, which encodes a structural component of complex I of the mitochondrial respiratory chain (MRC), were recently independently reported to cause histiocytoid cardiomyopathy (histiocytoid CM) and microphthalmia with linear skin defects syndrome (MLS syndrome). Here we report an additional case of histiocytoid CM, which carries a de novo nonsense variant in NDUFB11 (ENST00000276062.8: c.262C > T; p.[Arg88*]) identified using whole-exome sequencing (WES) of a family trio. An identical variant has been previously reported in association with MLS syndrome. The case we describe here lacked the diagnostic features of MLS syndrome, but a detailed clinical comparison of the two cases revealed significant phenotypic overlap. Heterozygous variants in HCCS (which encodes an important mitochondrially targeted protein) and COX7B, which, like NDUFB11, encodes a protein of the MRC, have also previously been identified in MLS syndrome including a case with features of both MLS syndrome and histiocytoid CM. However, a systematic review of WES data from previously published histiocytoid CM cases, alongside four additional cases presented here for the first time, did not identify any variants in these genes. We conclude that NDUFB11 variants play a role in the pathogenesis of both histiocytoid CM and MLS and that these disorders are allelic (genetically related)

    Pediatric asthma and autism-genomic perspectives.

    High-throughput technologies, ranging from microarrays to NexGen sequencing of RNA and genomic DNA, have opened new avenues for exploration of the pathobiology of human disease. Comparisons of the architecture of the genome, identification of mutated or modified sequences, and the use of pre- and post-transcriptional regulation of gene expression as disease-specific biomarkers are revolutionizing our understanding of the causes of disease and are guiding the development of new therapies. There is enormous heterogeneity in the types of genomic variation that occur in human disease. Some are inherited, while others are the result of new somatic or germline mutations or errors in chromosomal replication. In this review, we provide examples of changes that occur in the human genome in two of the most common chronic pediatric disorders, autism and asthma. The incidence and economic burden of both of these disorders are increasing worldwide. Genomic variations have the potential to serve as biomarkers for personalization of therapy and prediction of outcomes.