54 research outputs found

    The strength of co-authorship in gene name disambiguation

    Abstract
    Background: A biomedical entity mention in articles and other free texts is often ambiguous. For example, 13% of the gene names (aliases) might refer to more than one gene. The task of Gene Symbol Disambiguation (GSD), a special case of Word Sense Disambiguation (WSD), is to assign a unique gene identifier to every gene name alias identified in biology-related articles. Supervised and unsupervised machine learning WSD techniques have been applied in the biomedical field with promising results. Here we examine how one special feature of biological articles, namely that the authors of each document are known, can be exploited for the GSD task through graph-based semi-supervised methods.
    Results: Our key hypothesis is that a biologist refers to each particular gene by a fixed gene alias and that this holds for the co-authors as well. To make use of the co-authorship information we built the inverse co-author graph on MedLine abstracts. The nodes of the inverse co-author graph are articles, and there is an edge between two nodes if and only if the two articles share an author. We introduce two methods that use graph-based distances between abstracts for the GSD task. We found that a disambiguation decision can be made in 85% of cases with an extremely high precision (99.5%) using only information obtained from the inverse co-author graph. We incorporated the co-authorship information into two GSD systems in order to attain full coverage, and in experiments our procedure achieved precision of 94.3%, 98.85%, 96.05% and 99.63% on the human, mouse, fly and yeast GSD evaluation sets, respectively.
    Conclusion: Based on the promising results obtained so far, we suggest that the co-authorship information and the circumstances of an article's release (such as the title of the journal and the year of publication) can be a crucial building block of any sophisticated similarity measure among biological articles. The methods introduced here should therefore be useful for other biomedical natural language processing tasks as well, such as organism or target disease detection.
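The inverse co-author graph described in this abstract can be illustrated with a minimal Python sketch. It assumes only what the abstract states (articles are nodes; two articles are linked if and only if they share an author) and uses plain hop count as the graph distance. The function names, the toy article identifiers and the author names are hypothetical, and the paper's two distance-based methods are not reproduced here.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_inverse_coauthor_graph(article_authors):
    """Return an adjacency map: articles are nodes, linked iff they share an author."""
    articles_by_author = defaultdict(set)
    for article, authors in article_authors.items():
        for author in authors:
            articles_by_author[author].add(article)

    graph = {article: set() for article in article_authors}
    for articles in articles_by_author.values():
        # Every pair of articles written by the same author gets an edge.
        for a, b in combinations(sorted(articles), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def graph_distance(graph, source, target):
    """Breadth-first search hop count; None if the articles are not connected."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph[node]:
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


# Hypothetical toy corpus: article id -> set of author names.
toy_corpus = {
    "pmid1": {"Smith", "Lee"},
    "pmid2": {"Lee", "Garcia"},
    "pmid3": {"Garcia"},
    "pmid4": {"Nowak"},
}
graph = build_inverse_coauthor_graph(toy_corpus)
```

In this toy corpus, `pmid1` and `pmid3` have no author in common but are two hops apart through `pmid2`, while `pmid4` is unreachable. A short distance to an abstract with a known gene identifier is the kind of evidence the abstract suggests can support a disambiguation decision.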

    Disclosing ambiguous gene aliases by automatic literature profiling

    Previous issue date: 2010. Affiliations: Fundação Oswaldo Cruz, Centro de Pesquisa René Rachou, Centro de Excelência em Bioinformática and Grupo de Genômica e Biologia Computacional, Belo Horizonte, MG, Brasil; GlaxoSmithKline, Moore Dr., Molecular Discovery Research, Research Triangle Park, NC, USA.
    Background: Retrieving pertinent information from the biological scientific literature requires cutting-edge text mining methods able to recognize the meaning of the highly ambiguous names of biological entities. Aliases of a gene share a common vocabulary in their respective collections of PubMed abstracts. This may be true even when these aliases are not associated with the same subset of documents. This gene-specific vocabulary defines a unique fingerprint that can be used to disclose ambiguous aliases. The present work describes an original method for automatically assessing the ambiguity levels of gene aliases in large gene terminologies based exclusively on the content of their associated literature. The method can deal with the two major problems restricting the usage of current text mining tools: 1) different names associated with the same gene; and 2) one name associated with multiple genes, or even with non-gene entities. Importantly, this method does not require training examples.
    Results: Aliases were considered “ambiguous” when their Jaccard distance to the respective official gene symbol was equal to or greater than the smallest distance between the official gene symbol and one of the three internal controls (randomly picked unrelated official gene symbols). Otherwise, they were assigned the status of “synonyms”. We evaluated the coherence of the results by comparing the frequencies of the official gene symbols in the text corpora retrieved with their respective “synonyms” or “ambiguous” aliases. Official gene symbols were mentioned in the abstract collections of 42% (70/165) of their respective synonyms. No official gene symbol occurred in the abstract collections of any of their respective ambiguous aliases. Overall, querying PubMed with official gene symbols and “synonym” aliases allowed a 3.6-fold increase in the number of unique documents retrieved.
    Conclusions: These results confirm that this method is able to distinguish between synonyms and ambiguous gene aliases based exclusively on their vocabulary fingerprint. The approach we describe could be used to enhance the retrieval of relevant literature related to a gen
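The decision rule stated in the Results can be sketched in Python as follows. The abstract does not say how each vocabulary fingerprint is built, so the sketch assumes it is simply a set of terms. The function names and all toy vocabularies are hypothetical.

```python
def jaccard_distance(vocab_a, vocab_b):
    """1 - |A ∩ B| / |A ∪ B|; two empty vocabularies are treated as identical."""
    a, b = set(vocab_a), set(vocab_b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def classify_alias(alias_vocab, official_vocab, control_vocabs):
    """Label an alias 'ambiguous' if it is at least as far from the official
    symbol as the closest internal control, otherwise 'synonym'."""
    threshold = min(jaccard_distance(official_vocab, c) for c in control_vocabs)
    distance = jaccard_distance(alias_vocab, official_vocab)
    return "ambiguous" if distance >= threshold else "synonym"


# Hypothetical vocabularies (sets of terms drawn from abstract collections).
official = {"kinase", "apoptosis", "tumor", "p53"}
controls = [
    {"kinase", "membrane", "ion"},
    {"ribosome", "rna"},
    {"tumor", "apoptosis", "lipid", "insulin"},
]
close_alias = {"kinase", "apoptosis", "tumor"}
distant_alias = {"membrane", "ion", "channel"}

close_label = classify_alias(close_alias, official, controls)
distant_label = classify_alias(distant_alias, official, controls)
```

Because the threshold comes from unrelated control symbols rather than from labelled data, the rule needs no training examples, which matches the claim made in the Background.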

    Information Discovery on Electronic Health Records Using Authority Flow Techniques

    Abstract
    Background: As the use of electronic health records (EHRs) becomes more widespread, so does the need to search and provide effective information discovery within them. Querying by keyword has emerged as one of the most effective paradigms for searching. Most work in this area is based on traditional Information Retrieval (IR) techniques, where each document is compared individually against the query. We compare the effectiveness of two fundamentally different techniques for keyword search of EHRs.
    Methods: We built two ranking systems. The traditional BM25 system exploits the EHRs' content without regard to the associations among the entities within them. The Clinical ObjectRank (CO) system exploits the entities' associations in EHRs using an authority-flow algorithm to discover the most relevant entities. BM25 and CO were deployed on an EHR dataset of the cardiovascular division of Miami Children's Hospital. Using sequences of keywords as queries, sensitivity and specificity were measured by two physicians for a set of 11 queries related to congenital cardiac disease.
    Results: Our pilot evaluation showed that CO outperforms BM25 in terms of sensitivity (65% vs. 38%) by 71% on average, while maintaining the specificity (64% vs. 61%). The evaluation was done by two physicians.
    Conclusions: Authority-flow techniques can greatly improve the detection of relevant information in EHRs and hence deserve further study.
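The metrics reported in this abstract can be made concrete with a short Python sketch. It uses the standard definitions of sensitivity and specificity, and it shows that the quoted 71% is consistent with the relative gain computed from the two aggregate sensitivities. Whether the authors instead averaged per-query gains is not stated, so that reading is an assumption, and the confusion-matrix counts below are hypothetical.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of relevant items that the system retrieved."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of non-relevant items that the system correctly left out."""
    return true_neg / (true_neg + false_pos)


def relative_gain(new, old):
    """Improvement of `new` over `old`, expressed as a fraction of `old`."""
    return (new - old) / old


# Hypothetical counts chosen only to reproduce the reported CO rates.
co_sensitivity = sensitivity(true_pos=13, false_neg=7)   # 13 / 20
co_specificity = specificity(true_neg=16, false_pos=9)   # 16 / 25

# Relative gain of CO over BM25 from the sensitivities quoted in the abstract.
gain = relative_gain(0.65, 0.38)
```

Here `gain` is 0.27 / 0.38, about 0.71, which matches the 71% quoted in the Results.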

    Physiological dynamics of chemosynthetic symbionts in hydrothermal vent snails

    © The Author(s), 2020. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in Breusing, C., Mitchell, J., Delaney, J., Sylva, S. P., Seewald, J. S., Girguis, P. R., & Beinart, R. A. Physiological dynamics of chemosynthetic symbionts in hydrothermal vent snails. ISME Journal, (2020), doi:10.1038/s41396-020-0707-2.
    Symbioses between invertebrate animals and chemosynthetic bacteria form the basis of hydrothermal vent ecosystems worldwide. In the Lau Basin, deep-sea vent snails of the genus Alviniconcha associate with either Gammaproteobacteria (A. kojimai, A. strummeri) or Campylobacteria (A. boucheti) that use sulfide and/or hydrogen as energy sources. While the A. boucheti host–symbiont combination (holobiont) dominates at vents with higher concentrations of sulfide and hydrogen, the A. kojimai and A. strummeri holobionts are more abundant at sites with lower concentrations of these reductants. We posit that adaptive differences in symbiont physiology and gene regulation might influence the observed niche partitioning between host taxa. To test this hypothesis, we used high-pressure respirometers to measure symbiont metabolic rates and examine changes in gene expression among holobionts exposed to in situ concentrations of hydrogen (H2: ~25 µM) or hydrogen sulfide (H2S: ~120 µM). The campylobacterial symbiont exhibited the lowest rate of H2S oxidation but the highest rate of H2 oxidation, with fewer transcriptional changes and less carbon fixation relative to the gammaproteobacterial symbionts under each experimental condition. These data reveal potential physiological adaptations among symbiont types, which may account for the observed net differences in metabolic activity and contribute to the observed niche segregation among holobionts.
    We thank the Schmidt Ocean Institute, the crew of the R/V Falkor and the pilots of the ROV ROPOS for facilitating the sample collections and shipboard experiments, and the Broad Institute Microbial ‘Omics Core for preparing and sequencing the transcriptomic libraries. This material is based in part upon work supported by the National Science Foundation under Grant Numbers NSF OCE-1536653 (to PRG), OCE-1536331 (to RAB and JSS), OCE-1819530 and OCE-1736932 (to RAB)

    Critical Transition in Tissue Homeostasis Accompanies Murine Lung Senescence

    BACKGROUND: Respiratory dysfunction is a major contributor to morbidity and mortality in aged populations. The susceptibility to pulmonary insults is attributed to "low pulmonary reserve", ostensibly reflecting a combination of age-related musculoskeletal, immunologic and intrinsic pulmonary dysfunction. METHODS/PRINCIPAL FINDINGS: Using a murine model of the aging lung, senescent DBA/2 mice, we correlated a longitudinal survey of airspace size and injury measures with a transcriptome from the aging lung at 2, 4, 8, 12, 16 and 20 months of age. Morphometric analysis demonstrated a nonlinear pattern of airspace caliber enlargement with a critical transition occurring between 8 and 12 months of age marked by an initial increase in oxidative stress, cell death and elastase activation which is soon followed by inflammatory cell infiltration, immune complex deposition and the onset of airspace enlargement. The temporally correlative transcriptome showed exuberant induction of immunoglobulin genes coincident with airspace enlargement. Immunohistochemistry, ELISA analysis and flow cytometry demonstrated increased immunoglobulin deposition in the lung associated with a contemporaneous increase in macrophages and in activated B-cells expressing high levels of TLR4 (toll-like receptor 4) and CD86 during midlife. These midlife changes culminate in progressive airspace enlargement during late life stages. CONCLUSION/SIGNIFICANCE: Our findings establish that a tissue-specific aging program is evident during a presenescent interval which involves early oxidative stress, cell death and elastase activation, followed by B lymphocyte and macrophage expansion/activation. This sequence heralds the progression to overt airspace enlargement in the aged lung. These signature events, during middle age, indicate that early stages of the aging immune system may have important correlates in the maintenance of tissue morphology. 
We further show that time-course analyses of aging models, when informed by structural surveys, can reveal nonintuitive signatures of organ-specific aging pathology

    Gene Characterization Index: Assessing the Depth of Gene Annotation

    We introduce the Gene Characterization Index, a bioinformatics method for scoring the extent to which a protein-encoding gene is functionally described. Inherently a reflection of human perception, the Gene Characterization Index is applied for assessing the characterization status of individual genes, thus serving the advancement of both genome annotation and applied genomics research by rapid and unbiased identification of groups of uncharacterized genes for diverse applications such as directed functional studies and delineation of novel drug targets. The scoring procedure is based on a global survey of researchers, who assigned characterization scores from 1 (poor) to 10 (extensive) for a sample of genes based on major online resources. By evaluating the survey as training data, we developed a bioinformatics procedure to assign gene characterization scores to all genes in the human genome. We analyzed snapshots of functional genome annotation over a period of 6 years to assess temporal changes reflected by the increase of the average Gene Characterization Index. Applying the Gene Characterization Index to genes within pharmaceutically relevant classes, we confirmed known drug targets as high-scoring genes and revealed potentially interesting novel targets with low characterization indexes. Removing known drug targets and genes linked to sequence-related patent filings from the entirety of indexed genes, we identified sets of low-scoring genes particularly suited for further experimental investigation. The Gene Characterization Index is intended to serve as a tool to the scientific community and granting agencies for focusing resources and efforts on unexplored areas of the genome. The Gene Characterization Index is available from http://cisreg.ca/gci/

    Predicting Concentrations of Organic Chemicals in Fish by Using Toxicokinetic Models

    Quantification of chemical toxicity continues to be generally based on measured external concentrations. Yet, internal chemical concentrations have been suggested to be a more suitable parameter. To better understand the relationship between the external and internal concentrations of chemicals in fish, and to quantify internal concentrations, we compared three toxicokinetic (TK) models with each other and with literature data of measured concentrations of 39 chemicals. Two one-compartment models, together with the physiologically based toxicokinetic (PBTK) model, in which we improved the treatment of lipids, were used to predict concentrations of organic chemicals in two fish species: rainbow trout (Oncorhynchus mykiss) and fathead minnow (Pimephales promelas). All models predicted the measured internal concentrations in fish within 1 order of magnitude for at least 68% of the chemicals. Furthermore, the PBTK model outperformed the one-compartment models with respect to simulating chemical concentrations in the whole body (at least 88% of internal concentrations were predicted within 1 order of magnitude using the PBTK model). All the models can be used to predict concentrations in different fish species without additional experiments. However, further development of TK models is required for polar, ionizable, and easily biotransformed compounds
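As a rough illustration of what a one-compartment toxicokinetic model computes, the Python sketch below uses the textbook first-order form dC/dt = k_uptake * C_water - k_elim * C with zero initial internal concentration. The abstract does not give the equations or parameters of the models it compares, so this form, the function names and the rate constants are all assumptions. The second helper expresses the "within 1 order of magnitude" criterion used in the abstract.

```python
import math


def one_compartment_conc(c_water, k_uptake, k_elim, t):
    """Internal concentration at time t for constant water exposure.

    Closed-form solution of dC/dt = k_uptake * c_water - k_elim * C, C(0) = 0.
    """
    return (k_uptake / k_elim) * c_water * (1.0 - math.exp(-k_elim * t))


def within_one_order(predicted, measured):
    """True if the prediction is within a factor of 10 of the measurement."""
    return abs(math.log10(predicted / measured)) <= 1.0


# Hypothetical parameters: water concentration 1.0, uptake rate constant 100,
# elimination rate constant 0.5 (arbitrary but mutually consistent units).
early = one_compartment_conc(1.0, 100.0, 0.5, t=1.0)
steady = one_compartment_conc(1.0, 100.0, 0.5, t=1000.0)
```

At long exposure times the prediction approaches the steady-state value (k_uptake / k_elim) * c_water, here 200. A measured value of 50 would count as a hit under the one-order-of-magnitude criterion, while a measured value of 10 would not.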

    Short-Lived Trace Gases in the Surface Ocean and the Atmosphere

    The two-way exchange of trace gases between the ocean and the atmosphere is important for both the chemistry and physics of the atmosphere and the biogeochemistry of the oceans, including the global cycling of elements. Here we review these exchanges and their importance for a range of gases whose lifetimes are generally short compared to the main greenhouse gases and which are, in most cases, more reactive than them. Gases considered include sulphur and related compounds, organohalogens, non-methane hydrocarbons, ozone, ammonia and related compounds, hydrogen and carbon monoxide. Finally, we stress the interactivity of the system, the importance of process understanding for modeling, the need for more extensive field measurements and their better seasonal coverage, the importance of inter-calibration exercises and finally the need to show the importance of air-sea exchanges for global cycling and how the field fits into the broader context of Earth System Science