    Clustering the annotation space of proteins

    BACKGROUND: Current protein clustering methods rely on either sequence or functional similarities between proteins, thereby limiting inferences to one of these areas. RESULTS: Here we report a new approach, named CLAN, which clusters proteins according to both annotation and sequence similarity. This approach is extremely fast, clustering the complete SwissProt database within minutes. It is also accurate, recovering consistent protein families that agree on average by more than 97% with sequence-based protein families from Pfam. Discrepancies between sequence- and annotation-based clusters were scrutinized and their causes reported. We present examples of each of these cases and discuss in detail an example of a propagated error in SwissProt: a vacuolar ATPase subunit M9.2 erroneously annotated as vacuolar ATP synthase subunit H. The CLAN algorithm is available from the authors and the CLAN database is accessible at CONCLUSIONS: CLAN creates refined function- and sequence-specific protein families that can be used to identify and annotate unknown family members. It also allows easy identification of erroneous annotations by spotting inconsistencies between similarities at the annotation and sequence levels.
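The idea of grouping proteins only when annotation and sequence agree, and of flagging pairs where they disagree, can be illustrated with a minimal Python sketch. It is not the CLAN algorithm, whose details are not given here. It assumes that two proteins are linked when their normalized annotation strings are identical and their pairwise sequence identity reaches a hypothetical cut-off (`min_identity`), that clusters are the connected components of those links (found with union-find), and that all names and identity values are toy data.

```python
from itertools import combinations


def normalize(annotation):
    """Lower-case and collapse whitespace so trivially different strings match."""
    return " ".join(annotation.lower().split())


def cluster(proteins, identity, min_identity=0.4):
    """Group proteins linked by BOTH identical annotation and sequence similarity.

    proteins: dict of protein id -> annotation string
    identity: dict of frozenset({id_a, id_b}) -> pairwise sequence identity (0-1)
    min_identity: hypothetical cut-off for a sequence-level link
    """
    parent = {p: p for p in proteins}

    def find(p):
        # Union-find root lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(proteins, 2):
        same_annotation = normalize(proteins[a]) == normalize(proteins[b])
        similar_sequence = identity.get(frozenset((a, b)), 0.0) >= min_identity
        if same_annotation and similar_sequence:
            parent[find(a)] = find(b)

    groups = {}
    for p in proteins:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(g) for g in groups.values())


def inconsistent_pairs(proteins, identity, min_identity=0.4):
    """Sequence-similar pairs whose annotations differ: candidates for review."""
    flagged = []
    for pair, ident in identity.items():
        a, b = sorted(pair)
        if ident >= min_identity and normalize(proteins[a]) != normalize(proteins[b]):
            flagged.append((a, b))
    return sorted(flagged)


# Toy data (made-up identities, not SwissProt records).
proteins = {
    "P1": "ATP synthase subunit alpha",
    "P2": "ATP synthase  subunit Alpha",
    "P3": "ATP synthase subunit beta",
    "P4": "Cytochrome c oxidase subunit 1",
}
identity = {
    frozenset(("P1", "P2")): 0.85,
    frozenset(("P1", "P3")): 0.80,
    frozenset(("P2", "P4")): 0.10,
}
print(cluster(proteins, identity))             # [['P1', 'P2'], ['P3'], ['P4']]
print(inconsistent_pairs(proteins, identity))  # [('P1', 'P3')]
```

Requiring both conditions keeps P1 and P3 apart despite high sequence identity, and that same pair is what the consistency check surfaces for manual review.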

    Copper removal from industrial wastewaters by means of electrostatic shielding

    Electrostatic shielding zones made of electrode graphite powder were used as a new type of ionic and electronic current sink. Because the applied electric field, voltage and current are locally eliminated within the zones, ions are led inside them and accumulate there. We implemented the current sinks in the electrodialysis of a simulated copper plating rinse water containing 100 mg L⁻¹ Cu²⁺ ions and in the electrodeionization of a 0.001 M CuSO₄ solution with simultaneous electrochemical regeneration of the used ion exchange resin beds, and obtained pure water with a Cu²⁺ concentration of less than 0.12 mg L⁻¹ at a diluate-stream flow rate of 1.29 × 10⁻⁴ L s⁻¹ and a current density of 2 mA cm⁻².
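To put the two feed solutions and the flow rate on familiar scales, the 0.001 M CuSO₄ feed can be converted to a Cu²⁺ mass concentration using the molar mass of copper (about 63.55 g mol⁻¹), and the diluate flow rate to litres per hour. This worked conversion is added for illustration and is not part of the original abstract.

$$
\begin{aligned}
c_{\mathrm{Cu^{2+}}} &= 0.001\ \mathrm{mol\,L^{-1}} \times 63.55\ \mathrm{g\,mol^{-1}} \approx 63.6\ \mathrm{mg\,L^{-1}},\\
Q &= 1.29\times10^{-4}\ \mathrm{L\,s^{-1}} \times 3600\ \mathrm{s\,h^{-1}} \approx 0.46\ \mathrm{L\,h^{-1}}.
\end{aligned}
$$

An outlet concentration below 0.12 mg L⁻¹ therefore corresponds to less than about 0.2% of the Cu²⁺ in the 0.001 M feed, or 0.12% of the 100 mg L⁻¹ rinse-water feed.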

    A model for bioinformatics training: the Marine Biological Laboratory

    Author Posting. © The Authors, 2010. This is the author's version of the work. It is posted here by permission of Oxford University Press for personal use, not for redistribution. The definitive version was published in Briefings in Bioinformatics 6 (2010): 610-615, doi:10.1093/bib/bbq029.
    Many areas of science, such as biology, medicine, and oceanography, are becoming increasingly data-rich, but most programs that train scientists do not address the informatics techniques and technologies needed to manage and analyze large amounts of data. Educational resources in informatics for scientists are scarce, yet scientists need the skills and knowledge to work with informaticians and to manage graduate students and post-docs on informatics projects. The Marine Biological Laboratory houses a world-renowned library and is involved in a number of informatics projects in the sciences. The MBL has been home to the National Library of Medicine's BioMedical Informatics Course for nearly two decades and is committed to educating scientists and other scholars in informatics. In an innovative, immersive learning experience, Grant Yamashita, a biologist and post-doc at Arizona State University, visited the Science Informatics Group at MBL to learn first-hand how informatics is done and how informatics teams work. Hands-on work with developers, systems administrators, librarians, and other scientists provided an invaluable education in informatics and is a model for future science informatics training.
    This work was supported by the National Science Foundation [0926026 to G.Y., SES-0623176]; Jewett Foundation; Ellison Medical Foundation.

    Probabilistic annotation of protein sequences based on functional classifications

    RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license, which is similar to the 'Creative Commons Attribution Licence'. In brief, you may: copy, distribute, and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.
    Background: One of the most evident achievements of bioinformatics is the development of methods that transfer biological knowledge from characterised proteins to uncharacterised sequences. This mode of protein function assignment is mostly based on the detection of sequence similarity and the premise that functional properties are conserved during evolution. Most automatic approaches developed to date rely on identifying clusters of homologous proteins and mapping new proteins onto these clusters, which are expected to share functional characteristics. Results: Here, we invert the logic of this process by mapping sequences directly to a functional classification instead of mapping functions to a sequence clustering. In this mode, the starting point is a database of proteins labelled according to a functional classification scheme, and sequence similarity is then used to define the membership of new proteins in these functional classes. In this framework, we define the Correspondence Indicators as measures of the relationship between sequence and function, and we formulate two Bayesian approaches to estimate the probability that a sequence of unknown function belongs to a functional class. This approach allows the parametrisation of different sequence search strategies and provides a direct measure of annotation error rates.
    We validate this approach with a database of enzymes labelled by their corresponding four-digit EC numbers and analyse specific cases. Conclusion: This method performs significantly better than the simple strategy of transferring the annotation from the highest-scoring BLAST match and is expected to find applications in automated functional annotation pipelines.
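The contrast between best-hit transfer and a probabilistic class assignment can be sketched in Python. This is not the paper's Correspondence Indicators or either of its two Bayesian estimators, which the abstract does not specify. As a simple swapped-in scheme, it assumes the top `k` database hits of a query (a hypothetical parameter) are treated as observations of a multinomial over EC classes, and returns the posterior mean under a symmetric Dirichlet prior with pseudocount `alpha` (also hypothetical). Hits and scores are toy values.

```python
from collections import Counter


def best_hit_label(hits):
    """Baseline: copy the label of the single highest-scoring hit."""
    return max(hits, key=lambda h: h[1])[0]


def class_posterior(hits, classes, k=5, alpha=1.0):
    """Posterior mean probability of each class under a symmetric Dirichlet prior.

    hits: list of (ec_label, score) pairs for one query sequence
    classes: all functional classes considered (e.g. EC numbers)
    k: hypothetical number of top hits treated as observations
    alpha: hypothetical pseudocount added to every class
    """
    top = sorted(hits, key=lambda h: h[1], reverse=True)[:k]
    counts = Counter(label for label, _ in top)
    total = len(top) + alpha * len(classes)
    return {c: (counts[c] + alpha) / total for c in classes}


# Toy search result: (EC label, score).
hits = [
    ("1.1.1.1", 310.0),
    ("1.1.1.2", 305.0),
    ("1.1.1.2", 290.0),
    ("1.1.1.2", 280.0),
    ("1.1.1.1", 150.0),
    ("2.7.1.1", 40.0),
]
classes = ["1.1.1.1", "1.1.1.2", "2.7.1.1"]
posterior = class_posterior(hits, classes)
print(best_hit_label(hits))  # 1.1.1.1
print(posterior)             # {'1.1.1.1': 0.375, '1.1.1.2': 0.5, '2.7.1.1': 0.125}
```

Here the best hit and the most probable class disagree, and one minus the winning probability gives a rough, model-dependent estimate of the chance that the assignment is wrong.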

    Genome-wide expression patterns in physiological cardiac hypertrophy

    Background: Physiological left ventricular hypertrophy (LVH) involves complex cardiac remodeling that occurs as an adaptive response to chronic exercise. A stark clinical contrast exists between physiological LVH and pathological cardiac remodeling in response to diseases such as hypertension, but little is known about the precise molecular mechanisms driving physiological adaptation. Results: In this study, the first large-scale network analysis of publicly available genome-wide expression data from several in vivo murine models of physiological LVH was carried out. Evaluating 3 million gene co-expression patterns across 141 relevant microarray experiments showed that physiological adaptation is an evolutionarily conserved process involving preservation of cytochrome c oxidase function, induction of autophagy compatible with cell survival, and coordinated regulation of angiogenesis. Conclusion: This analysis not only identifies known biological pathways involved in physiological LVH but also offers novel insights into the molecular basis of this phenotype by identifying key networks of co-expressed genes, together with their topological and functional properties, using relevant high-quality microarray experiments and network inference.
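A gene co-expression network is commonly built by correlating expression profiles across experiments and keeping gene pairs whose correlation passes a cut-off. The Python sketch below assumes Pearson correlation and a hypothetical absolute threshold (`min_abs_r`); the study's actual network-inference method is not described in the abstract, and the gene names and values are toy data.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / (sx * sy)


def coexpression_network(profiles, min_abs_r=0.9):
    """Return edges (gene_a, gene_b, r) with |r| >= min_abs_r, plus node degrees."""
    edges = []
    degree = {g: 0 for g in profiles}
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= min_abs_r:
            edges.append((a, b, r))
            degree[a] += 1
            degree[b] += 1
    return edges, degree


# Toy expression values across four experiments.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 1.0, 3.0],
}
edges, degree = coexpression_network(profiles)
print(degree)  # {'geneA': 2, 'geneB': 2, 'geneC': 2, 'geneD': 0}
```

Using the absolute value keeps strongly anti-correlated pairs (geneC against geneA and geneB) as edges; node degree is one of the simplest topological properties one can read off such a network.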

    Expansion of the BioCyc collection of pathway/genome databases to 160 genomes

    The BioCyc database collection is a set of 160 pathway/genome databases (PGDBs) for most eukaryotic and prokaryotic species whose genomes have been completely sequenced to date. Each PGDB in the BioCyc collection describes the genome and predicted metabolic network of a single organism, inferred from the MetaCyc database, which is a reference source on metabolic pathways from multiple organisms. In addition, each bacterial PGDB includes predicted operons for the corresponding species. The BioCyc collection provides a unique resource for computational systems biology, namely for global and comparative analyses of genomes and metabolic networks, and a supplement to the BioCyc resource of curated PGDBs. The Omics viewer available through the BioCyc website allows scientists to visualize combinations of gene expression, proteomics and metabolomics data on the metabolic maps of these organisms. This paper discusses the computational methodology by which the BioCyc collection has been expanded, and presents an aggregate analysis of the collection that includes the range in the number of pathways present in these organisms and the most frequently observed pathways. We are seeking scientists to adopt and curate individual PGDBs within the BioCyc collection. Only by harnessing the expertise of many scientists can we hope to produce biological databases that accurately reflect the depth and breadth of knowledge that the biomedical research community is producing.
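Inferring pathways for each organism from a reference collection, and then tallying per-organism pathway counts and the most frequently observed pathways, can be illustrated with a deliberately simplified Python sketch. It assumes a pathway is predicted present when at least a hypothetical fraction (`min_coverage`) of its reactions has an annotated enzyme in the genome; the method actually used to build the BioCyc collection applies more elaborate rules. Pathway, reaction and organism names are toy placeholders.

```python
from collections import Counter


def predict_pathways(reference, genome_reactions, min_coverage=0.5):
    """Names of reference pathways whose reaction coverage meets min_coverage.

    reference: dict of pathway name -> set of reaction ids (a MetaCyc-like
        reference; assumes every pathway lists at least one reaction)
    genome_reactions: set of reaction ids with an annotated enzyme in one genome
    """
    return {
        name
        for name, reactions in reference.items()
        if len(reactions & genome_reactions) / len(reactions) >= min_coverage
    }


reference = {
    "pathway-1": {"R1", "R2", "R3", "R4"},
    "pathway-2": {"R5", "R6"},
    "pathway-3": {"R7", "R8", "R9"},
}
genomes = {
    "organism-A": {"R1", "R2", "R3", "R5"},
    "organism-B": {"R1", "R2", "R7"},
    "organism-C": {"R5", "R6", "R8", "R9"},
}

predicted = {org: predict_pathways(reference, rxns) for org, rxns in genomes.items()}
counts = {org: len(paths) for org, paths in predicted.items()}      # pathways per organism
frequency = Counter(p for paths in predicted.values() for p in paths)  # organisms per pathway
print(counts)                    # {'organism-A': 2, 'organism-B': 1, 'organism-C': 2}
print(sorted(frequency.items()))  # [('pathway-1', 2), ('pathway-2', 2), ('pathway-3', 1)]
```

A single coverage threshold is the crudest possible rule; it ignores, for example, reactions shared between pathways, which is one reason real pathway predictors are more involved.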

    Measuring genome conservation across taxa: divided strains and united kingdoms

    Species evolutionary relationships have traditionally been defined by sequence similarities of phylogenetic marker molecules and, more recently, by whole-genome phylogenies based on gene order, average ortholog similarity or gene content. Here, we introduce genome conservation, a novel metric of evolutionary distance between species that simultaneously takes into account both gene content and sequence similarity at the whole-genome level. Genome conservation is a robust distance measure, as demonstrated by accurate phylogenetic reconstructions. The genome conservation matrix for all presently sequenced organisms shows a remarkable ability to define evolutionary relationships across all taxonomic ranges. An assessment of taxonomic ranks with genome conservation shows that certain ranks are inadequately described and raises the possibility of a more precise and quantitative taxonomy in the future. All phylogenetic reconstructions are available at the genome phylogeny server.
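A distance that folds gene content and sequence similarity into one number can be sketched in Python. The formula is a hypothetical stand-in, not the published genome conservation definition: each gene contributes its best-hit sequence identity in the other genome (zero when no homolog exists, so gene content matters), the two directional averages are combined symmetrically, and the distance is one minus that score. Genome names, sizes and identities are toy values.

```python
from itertools import combinations


def directional_conservation(best_identities, genome_size):
    """Mean best-hit identity of one genome's genes in another genome.

    best_identities: identities (0-1) of genes that have a homolog in the other genome
    genome_size: total gene count of the query genome; genes without a homolog
        implicitly contribute 0, so gene-content differences lower the score
    """
    return sum(best_identities) / genome_size


def conservation_distance(hits_ab, size_a, hits_ba, size_b):
    """Hypothetical symmetric distance: 0 for identical genomes, 1 for no shared genes."""
    score = (directional_conservation(hits_ab, size_a)
             + directional_conservation(hits_ba, size_b)) / 2
    return 1.0 - score


size = {"X": 4, "Y": 4, "Z": 5}
hits = {
    ("X", "Y"): [0.9, 0.8, 0.7],  # 3 of 4 X genes have a homolog in Y
    ("Y", "X"): [0.9, 0.8, 0.7],
    ("X", "Z"): [0.4],
    ("Z", "X"): [0.4],
    ("Y", "Z"): [0.5, 0.3],
    ("Z", "Y"): [0.5, 0.3],
}

distance = {
    (a, b): conservation_distance(hits[(a, b)], size[a], hits[(b, a)], size[b])
    for a, b in combinations(sorted(size), 2)
}
closest = min(distance, key=distance.get)
print(closest)  # ('X', 'Y')
```

A pairwise distance matrix of this kind is the usual input to distance-based tree building, such as neighbor joining, for phylogenetic reconstruction.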

    Establishment of computational biology in Greece and Cyprus: Past, present, and future.

    We review the establishment of computational biology in Greece and Cyprus from its inception to date and issue recommendations for future development. We compare the two countries' output with that of other countries of similar geography, economy, and size, based on publication counts recorded in the literature, and predict future growth based on those counts as well as on national priority areas. Our analysis may be pertinent to wider national or regional communities facing the challenges and opportunities that emerge from the rapid expansion of the field and related industries. Our recommendations suggest a twofold growth margin for the two countries as a realistic expectation for further expansion of the field and for the development of a credible roadmap of national priorities, in terms of both research and infrastructure funding.