351 research outputs found

    Looking at Cerebellar Malformations through Text-Mined Interactomes of Mice and Humans

    Get PDF
    WE HAVE GENERATED AND MADE PUBLICLY AVAILABLE TWO VERY LARGE NETWORKS OF MOLECULAR INTERACTIONS: 49,493 mouse-specific and 52,518 human-specific interactions. These networks were generated through automated analysis of 368,331 full-text research articles and 8,039,972 article abstracts from the PubMed database, using the GeneWays system. Our networks cover a wide spectrum of molecular interactions, such as bind, phosphorylate, glycosylate, and activate; 207 of these interaction types occur more than 1,000 times in our unfiltered, multi-species data set. Because mouse and human genes are linked through an orthological relationship, human and mouse networks are amenable to straightforward, joint computational analysis. Using our newly generated networks and known associations between mouse genes and cerebellar malformation phenotypes, we predicted a number of new associations between genes and five cerebellar phenotypes (small cerebellum, absent cerebellum, cerebellar degeneration, abnormal foliation, and abnormal vermis). Using a battery of statistical tests, we showed that genes that are associated with cerebellar phenotypes tend to form compact network clusters. Further, we observed that cerebellar malformation phenotypes tend to be associated with highly connected genes. This tendency was stronger for developmental phenotypes and weaker for cerebellar degeneration

    Disclosing ambiguous gene aliases by automatic literature profiling

    Get PDF
    Submitted by Nuzia Santos ([email protected]) on 2015-01-14T10:55:18Z No. of bitstreams: 1 Disclosing ambiguous gene aliases by automatic.pdf: 217573 bytes, checksum: ce54aa2c4ea49eb989f9e7308d827ce6 (MD5)Approved for entry into archive by Nuzia Santos ([email protected]) on 2015-01-14T10:55:25Z (GMT) No. of bitstreams: 1 Disclosing ambiguous gene aliases by automatic.pdf: 217573 bytes, checksum: ce54aa2c4ea49eb989f9e7308d827ce6 (MD5)Approved for entry into archive by Nuzia Santos ([email protected]) on 2015-01-14T11:01:59Z (GMT) No. of bitstreams: 1 Disclosing ambiguous gene aliases by automatic.pdf: 217573 bytes, checksum: ce54aa2c4ea49eb989f9e7308d827ce6 (MD5)Made available in DSpace on 2015-01-14T11:01:59Z (GMT). No. of bitstreams: 1 Disclosing ambiguous gene aliases by automatic.pdf: 217573 bytes, checksum: ce54aa2c4ea49eb989f9e7308d827ce6 (MD5) Previous issue date: 2010Fundação Oswaldo Cruz. Centro de Pesquisa René Rachou. Centro de Excelência em Bioinformática. Belo Horizonte, MG, Brasil/Fundação Oswaldo Cruz. Centro de Pesquisa René Rachou. Grupo de Genômica e Biologia Computacional. Belo Horizonte, MG, BrasilGlaxoSmithKline Moore Dr. Molecular Discovery Research. Research Triangle Park, NC, USAFundação Oswaldo Cruz. Centro de Pesquisa René Rachou. Centro de Excelência em Bioinformática. Belo Horizonte, MG, Brasil/Fundação Oswaldo Cruz. Centro de Pesquisa René Rachou. Grupo de Genômica e Biologia Computacional. Belo Horizonte, MG, BrasilBackground Retrieving pertinent information from biological scientific literature requires cutting-edge text mining methods which may be able to recognize the meaning of the very ambiguous names of biological entities. Aliases of a gene share a common vocabulary in their respective collections of PubMed abstracts. This may be true even when these aliases are not associated with the same subset of documents. This gene-specific vocabulary defines a unique fingerprint that can be used to disclose ambiguous aliases. The present work describes an original method for automatically assessing the ambiguity levels of gene aliases in large gene terminologies based exclusively in the content of their associated literature. The method can deal with the two major problems restricting the usage of current text mining tools: 1) different names associated with the same gene; and 2) one name associated with multiple genes, or even with non-gene entities. Important, this method does not require training examples. Results Aliases were considered “ambiguous” when their Jaccard distance to the respective official gene symbol was equal or greater than the smallest distance between the official gene symbol and one of the three internal controls (randomly picked unrelated official gene symbols). Otherwise, they were assigned the status of “synonyms”. We evaluated the coherence of the results by comparing the frequencies of the official gene symbols in the text corpora retrieved with their respective “synonyms” or “ambiguous” aliases. Official gene symbols were mentioned in the abstract collections of 42 % (70/165) of their respective synonyms. No official gene symbol occurred in the abstract collections of any of their respective ambiguous aliases. In overall, querying PubMed with official gene symbols and “synonym” aliases allowed a 3.6-fold increase in the number of unique documents retrieved. Conclusions These results confirm that this method is able to distinguish between synonyms and ambiguous gene aliases based exclusively on their vocabulary fingerprint. The approach we describe could be used to enhance the retrieval of relevant literature related to a gen

    Germline MC1R status influences somatic mutation burden in melanoma

    Get PDF
    The major genetic determinants of cutaneous melanoma risk in the general population are disruptive variants (R alleles) in the melanocortin 1 receptor (MC1R) gene. These alleles are also linked to red hair, freckling, and sun sensitivity, all of which are known melanoma phenotypic risk factors. Here we report that in melanomas and for somatic C>T mutations, a signature linked to sun exposure, the expected single-nucleotide variant count associated with the presence of an R allele is estimated to be 42% (95% CI, 15-76%) higher than that among persons without an R allele. This figure is comparable to the expected mutational burden associated with an additional 21 years of age. We also find significant and similar enrichment of non-C>T mutation classes supporting a role for additional mutagenic processes in melanoma development in individuals carrying R alleles

    RAC1(P29S) Induces a Mesenchymal Phenotypic Switch via Serum Response Factor to Promote Melanoma Development and Therapy Resistance

    Get PDF
    RAC1 P29 is the third most commonly mutated codon in human cutaneous melanoma, after BRAF V600 and NRAS Q61. Here, we study the role of RAC1P29S in melanoma development and reveal that RAC1P29S activates PAK, AKT, and a gene expression program initiated by the SRF/MRTF transcriptional pathway, which results in a melanocytic to mesenchymal phenotypic switch. Mice with ubiquitous expression of RAC1P29S from the endogenous locus develop lymphoma. When expressed only in melanocytes, RAC1P29S cooperates with oncogenic BRAF or with NF1-loss to promote tumorigenesis. RAC1P29S also drives resistance to BRAF inhibitors, which is reversed by SRF/MRTF inhibitors. These findings establish RAC1P29S as a promoter of melanoma initiation and mediator of therapy resistance, while identifying SRF/MRTF as a potential therapeutic target

    Normalizing biomedical terms by minimizing ambiguity and variability

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>One of the difficulties in mapping biomedical named entities, e.g. genes, proteins, chemicals and diseases, to their concept identifiers stems from the potential variability of the terms. Soft string matching is a possible solution to the problem, but its inherent heavy computational cost discourages its use when the dictionaries are large or when real time processing is required. A less computationally demanding approach is to normalize the terms by using heuristic rules, which enables us to look up a dictionary in a constant time regardless of its size. The development of good heuristic rules, however, requires extensive knowledge of the terminology in question and thus is the bottleneck of the normalization approach.</p> <p>Results</p> <p>We present a novel framework for discovering a list of normalization rules from a dictionary in a fully automated manner. The rules are discovered in such a way that they minimize the ambiguity and variability of the terms in the dictionary. We evaluated our algorithm using two large dictionaries: a human gene/protein name dictionary built from BioThesaurus and a disease name dictionary built from UMLS.</p> <p>Conclusions</p> <p>The experimental results showed that automatically discovered rules can perform comparably to carefully crafted heuristic rules in term mapping tasks, and the computational overhead of rule application is small enough that a very fast implementation is possible. This work will help improve the performance of term-concept mapping tasks in biomedical information extraction especially when good normalization heuristics for the target terminology are not fully known.</p

    Facilitating the development of controlled vocabularies for metabolomics technologies with text mining

    Get PDF
    BACKGROUND: Many bioinformatics applications rely on controlled vocabularies or ontologies to consistently interpret and seamlessly integrate information scattered across public resources. Experimental data sets from metabolomics studies need to be integrated with one another, but also with data produced by other types of omics studies in the spirit of systems biology, hence the pressing need for vocabularies and ontologies in metabolomics. However, it is time-consuming and non trivial to construct these resources manually. RESULTS: We describe a methodology for rapid development of controlled vocabularies, a study originally motivated by the needs for vocabularies describing metabolomics technologies. We present case studies involving two controlled vocabularies (for nuclear magnetic resonance spectroscopy and gas chromatography) whose development is currently underway as part of the Metabolomics Standards Initiative. The initial vocabularies were compiled manually, providing a total of 243 and 152 terms. A total of 5,699 and 2,612 new terms were acquired automatically from the literature. The analysis of the results showed that full-text articles (especially the Materials and Methods sections) are the major source of technology-specific terms as opposed to paper abstracts. CONCLUSIONS: We suggest a text mining method for efficient corpus-based term acquisition as a way of rapidly expanding a set of controlled vocabularies with the terms used in the scientific literature. We adopted an integrative approach, combining relatively generic software and data resources for time- and cost-effective development of a text mining tool for expansion of controlled vocabularies across various domains, as a practical alternative to both manual term collection and tailor-made named entity recognition methods
    corecore