208 research outputs found

    Long-term quality assurance of fMRI and MRS on a 3.0T clinical scanner

    Functional MRI (fMRI) and Magnetic Resonance Spectroscopy (MRS) are increasingly used in clinical protocols. Consequently, it is crucial to develop a routine quality assurance (QA) protocol for both techniques. This work describes a long-term variability study, as part of the QA of fMRI and MRS on our institution's clinical 3.0 T MR scanner.

    EnvMine: A text-mining system for the automatic extraction of contextual information

    Background: For ecological studies, it is crucial to count on adequate descriptions of the environments and samples being studied. Such a description must be done in terms of their physicochemical characteristics, allowing a direct comparison between different environments that would be difficult to do otherwise. Also the characterization must include the precise geographical location, to make possible the study of geographical distributions and biogeographical patterns. Currently, there is no schema for annotating these environmental features, and these data have to be extracted from textual sources (published articles). So far, this had to be performed by manual inspection of the corresponding documents. To facilitate this task, we have developed EnvMine, a set of text-mining tools devoted to retrieve contextual information (physicochemical variables and geographical locations) from textual sources of any kind. Results: EnvMine is capable of retrieving the physicochemical variables cited in the text, by means of the accurate identification of their associated units of measurement. In this task, the system achieves a recall (percentage of items retrieved) of 92% with less than 1% error. Also a Bayesian classifier was tested for distinguishing parts of the text describing environmental characteristics from others dealing with, for instance, experimental settings. Regarding the identification of geographical locations, the system takes advantage of existing databases such as GeoNames to achieve 86% recall with 92% precision. The identification of a location includes also the determination of its exact coordinates (latitude and longitude), thus allowing the calculation of distance between the individual locations. Conclusion: EnvMine is a very efficient method for extracting contextual information from different text sources, like published articles or web pages. This tool can help in determining the precise location and physicochemical variables of sampling sites, thus facilitating the performance of ecological analyses. EnvMine can also help in the development of standards for the annotation of environmental features.
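The idea of locating physicochemical variables through their units of measurement can be illustrated with a minimal sketch. This is not EnvMine's implementation: the unit list, the regular expression, and the function name `extract_measurements` are assumptions made for illustration only.

```python
import re

# Hypothetical, deliberately small unit list; a real system would need a far larger one.
UNITS = ["°C", "mg/L", "g/L", "mM", "psu", "m", "%"]

# Try longer units first so that "mg/L" is attempted before "m".
_unit_alt = "|".join(re.escape(u) for u in sorted(UNITS, key=len, reverse=True))

# A number, an optional space, a known unit, and no letter or slash right after it.
MEASUREMENT = re.compile(rf"(-?\d+(?:\.\d+)?)\s?({_unit_alt})(?![A-Za-z/])")


def extract_measurements(text):
    """Return (value, unit) pairs for every number followed by a known unit."""
    return [(float(value), unit) for value, unit in MEASUREMENT.findall(text)]


sample = "Samples were taken at 35 m depth, at 12.5 °C and a salinity of 38 psu."
print(extract_measurements(sample))
```

Anchoring on the unit rather than on the variable name means a value such as a temperature can be picked up even when the sentence never uses the word "temperature".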

    Pharmspresso: a text mining tool for extraction of pharmacogenomic concepts and relationships from full text

    Background: Pharmacogenomics studies the relationship between genetic variation and the variation in drug response phenotypes. The field is rapidly gaining importance: it promises drugs targeted to particular subpopulations based on genetic background. The pharmacogenomics literature has expanded rapidly, but is dispersed in many journals. It is challenging, therefore, to identify important associations between drugs and molecular entities – particularly genes and gene variants, and thus these critical connections are often lost. Text mining techniques can allow us to convert the free-style text to a computable, searchable format in which pharmacogenomic concepts (such as genes, drugs, polymorphisms, and diseases) are identified, and important links between these concepts are recorded. Availability of full text articles as input into text mining engines is key, as literature abstracts often do not contain sufficient information to identify these pharmacogenomic associations. Results: Thus, building on a tool called Textpresso, we have created the Pharmspresso tool to assist in identifying important pharmacogenomic facts in full text articles. Pharmspresso parses text to find references to human genes, polymorphisms, drugs and diseases and their relationships. It presents these as a series of marked-up text fragments, in which key concepts are visually highlighted. To evaluate Pharmspresso, we used a gold standard of 45 human-curated articles. Pharmspresso identified 78%, 61%, and 74% of target gene, polymorphism, and drug concepts, respectively. Conclusion: Pharmspresso is a text analysis tool that extracts pharmacogenomic concepts from the literature automatically and thus captures our current understanding of gene-drug interactions in a computable form. We have made Pharmspresso available at http://pharmspresso.stanford.edu.

    Clustering of gene expression data: performance and similarity analysis

    BACKGROUND: DNA Microarray technology is an innovative methodology in experimental molecular biology, which has produced huge amounts of valuable gene expression profile data. Many clustering algorithms have been proposed to analyze gene expression data, but little guidance is available to help choose among them. The evaluation of feasible and applicable clustering algorithms is becoming an important issue in today's bioinformatics research. RESULTS: In this paper we first experimentally study three major clustering algorithms: Hierarchical Clustering (HC), Self-Organizing Map (SOM), and Self Organizing Tree Algorithm (SOTA) using Yeast Saccharomyces cerevisiae gene expression data, and compare their performance. We then introduce Cluster Diff, a new data mining tool, to conduct the similarity analysis of clusters generated by different algorithms. The performance study shows that SOTA is more efficient than SOM while HC is the least efficient. The results of similarity analysis show that, given a target cluster, Cluster Diff can efficiently determine the closest match from a set of clusters. Therefore, it is an effective approach for evaluating different clustering algorithms. CONCLUSION: HC methods allow a visual, convenient representation of genes. However, they are neither robust nor efficient. The SOM is more robust against noise. A disadvantage of SOM is that the number of clusters has to be fixed beforehand. The SOTA combines the advantages of both hierarchical and SOM clustering. It allows a visual representation of the clusters and their structure and is not sensitive to noise. The SOTA is also more flexible than the other two clustering methods. By using our data mining tool, Cluster Diff, it is possible to analyze the similarity of clusters generated by different algorithms and thereby enable comparisons of different clustering methods.

    Defining functional distances over Gene Ontology

    Background: A fundamental problem when trying to define the functional relationships between proteins is the difficulty in quantifying functional similarities, even when well-structured ontologies exist regarding the activity of proteins (e.g. the Gene Ontology, GO). However, functional metrics can overcome the problems in comparing and evaluating functional assignments and predictions. As a reference of proximity, previous approaches to compare GO terms considered linkage in terms of ontology weighted by a probability distribution that balances the non-uniform 'richness' of different parts of the Directed Acyclic Graph. Here, we have followed a different approach to quantify functional similarities between GO terms. Results: We propose a new method to derive 'functional distances' between GO terms that is based on the simultaneous occurrence of terms in the same set of Interpro entries, instead of relying on the structure of the GO. The coincidence of GO terms reveals natural biological links between the GO functions and defines a distance model D_f which fulfils the properties of a Metric Space. The distances obtained in this way can be represented as a hierarchical 'Functional Tree'. Conclusion: The method proposed provides a new definition of distance that enables the similarity between GO terms to be quantified. Additionally, the 'Functional Tree' defines groups with biological meaning, enhancing its utility for protein function comparison and prediction. Finally, this approach could be used for function-based protein searches in databases, and for analysing the gene clusters produced by DNA array experiments.
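A co-occurrence-based distance of this kind can be sketched as follows. The abstract does not give the formula for the authors' distance, so the Jaccard distance is swapped in here as a stand-in; it is a well-known metric on sets. The annotation table and its identifiers are placeholders, not real InterPro or GO accessions.

```python
from itertools import combinations

# Hypothetical annotations: InterPro-like entry -> set of GO-like terms attached to it.
ENTRY_TO_GO = {
    "IPR_A": {"GO:1", "GO:2"},
    "IPR_B": {"GO:1", "GO:2", "GO:3"},
    "IPR_C": {"GO:3"},
    "IPR_D": {"GO:4"},
}


def entries_per_term(entry_to_go):
    """Invert the mapping: term -> set of entries in which the term occurs."""
    inverted = {}
    for entry, terms in entry_to_go.items():
        for term in terms:
            inverted.setdefault(term, set()).add(entry)
    return inverted


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|: 0 for identical sets, 1 for disjoint sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


term_entries = entries_per_term(ENTRY_TO_GO)
for t1, t2 in combinations(sorted(term_entries), 2):
    print(t1, t2, round(jaccard_distance(term_entries[t1], term_entries[t2]), 3))
```

Terms that always appear in the same entries end up at distance 0, and terms that never co-occur end up at distance 1. A pairwise distance table like this one could then be passed to any hierarchical clustering routine to obtain a tree.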

    El poder dels productors primaris unicel·lulars

    3 pages, 1 figure. Marine phytoplankton, including cyanobacteria and microalgae, dominates primary production across two thirds of the earth’s surface, sustaining virtually all marine life and exerting a fundamental control over global climate through carbon sequestration into the deep ocean. These unicellular photoautotrophs are responsible for roughly 50% of global net primary production, which is equivalent to producing 50 gigatons of organic carbon (C) per year (about 140 million t per day). […] The ideas embodied in this essay are part of the objectives of the PRODIGIO project “Developing early warning systems for improved microalgae PROduction and anaerobic DIGestIOn”. The PRODIGIO project has received funding from the European Union’s Horizon 2020 Research and Innovation programme under grant agreement no. 101007006. Peer reviewed.
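The daily figure quoted in the abstract follows from the annual one, assuming a 365-day year and 1 Gt = 10^9 t:

```latex
\[
\frac{50\ \text{Gt C}}{1\ \text{yr}}
  = \frac{50 \times 10^{9}\ \text{t C}}{365\ \text{d}}
  \approx 1.37 \times 10^{8}\ \text{t C d}^{-1}
  \approx 140\ \text{million t C per day}
\]
```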

    Identification of conserved gene clusters in multiple genomes based on synteny and homology

    Background: Uncovering the relationship between the conserved chromosomal segments and the functional relatedness of elements within these segments is an important question in computational genomics. We build upon the series of works on gene teams and homology teams. Results: Our primary contribution is a local sliding-window SYNS (SYNtenic teamS) algorithm that refines an existing family structure into orthologous sub-families by analyzing the neighborhoods around the members of a given family with a locally sliding window. The neighborhood analysis is done by computing conserved gene clusters. We evaluate our algorithm on the existing homologous families from the Genolevures database over five genomes of the Hemiascomycete phylum. Conclusions: The result is an efficient algorithm that works on multiple genomes, considers paralogous copies of genes and is able to uncover orthologous clusters even in distant genomes. Resulting orthologous clusters are comparable to those obtained by manual curation.
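The intuition behind using a gene's neighborhood to tell orthologs from paralogous copies can be sketched as below. This is not the SYNS algorithm, whose details the abstract does not give. It is a much simpler window comparison, and the toy genomes, the window radius, and the function names are all assumptions made for illustration.

```python
# Hypothetical toy genomes: each is an ordered list of (gene_id, family_id).
GENOME_1 = [("a1", "F1"), ("a2", "F2"), ("a3", "F3"), ("a4", "F4"), ("a5", "F9")]
GENOME_2 = [("b1", "F7"), ("b2", "F2"), ("b3", "F3"),
            ("b4", "F4"), ("b5", "F3"), ("b6", "F8")]


def neighborhood(genome, index, radius):
    """Families within `radius` positions of genome[index], excluding that gene itself."""
    lo = max(0, index - radius)
    hi = min(len(genome), index + radius + 1)
    return {fam for i, (_, fam) in enumerate(genome[lo:hi], start=lo) if i != index}


def best_syntenic_partner(genome_a, index_a, genome_b, radius=2):
    """Among same-family genes in genome_b, return the one whose window shares
    the most families with the window around genome_a[index_a]."""
    family = genome_a[index_a][1]
    around_a = neighborhood(genome_a, index_a, radius)
    best = None
    for j, (gene_b, fam_b) in enumerate(genome_b):
        if fam_b != family:
            continue
        shared = len(around_a & neighborhood(genome_b, j, radius))
        if best is None or shared > best[1]:
            best = (gene_b, shared)
    return best


# Gene a3 belongs to family F3, which has two copies (b3 and b5) in GENOME_2.
print(best_syntenic_partner(GENOME_1, 2, GENOME_2))
```

In this toy case the copy whose neighbors belong to the same families as the query's neighbors is preferred over the paralogous copy elsewhere on the chromosome.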

    pubmed2ensembl: A Resource for Mining the Biological Literature on Genes

    The last two decades have witnessed a dramatic acceleration in the production of genomic sequence information and publication of biomedical articles. Despite the fact that genome sequence data and publications are two of the most heavily relied-upon sources of information for many biologists, very little effort has been made to systematically integrate data from genomic sequences directly with the biological literature. For a limited number of model organisms dedicated teams manually curate publications about genes; however for species with no such dedicated staff many thousands of articles are never mapped to genes or genomic regions. To overcome the lack of integration between genomic data and biological literature, we have developed pubmed2ensembl (http://www.pubmed2ensembl.org), an extension to the BioMart system that links over 2,000,000 articles in PubMed to nearly 150,000 genes in Ensembl from 50 species. We use several sources of curated (e.g., Entrez Gene) and automatically generated (e.g., gene names extracted through text-mining on MEDLINE records) sources of gene-publication links, allowing users to filter and combine different data sources to suit their individual needs for information extraction and biological discovery. In addition to extending the Ensembl BioMart database to include published information on genes, we also implemented a scripting language for automated BioMart construction and a novel BioMart interface that allows text-based queries to be performed against PubMed and PubMed Central documents in conjunction with constraints on genomic features. Finally, we illustrate the potential of pubmed2ensembl through typical use cases that involve integrated queries across the biomedical literature and genomic data. By allowing biologists to find the relevant literature on specific genomic regions or sets of functionally related genes more easily, pubmed2ensembl offers a much-needed genome informatics inspired solution to accessing the ever-increasing biomedical literature.