318 research outputs found

    The SIB Swiss Institute of Bioinformatics' resources: focus on curated databases

    The SIB Swiss Institute of Bioinformatics (www.isb-sib.ch) provides world-class bioinformatics databases, software tools, services and training to the international life science community in academia and industry. These solutions allow life scientists to turn the exponentially growing amount of data into knowledge. Here, we provide an overview of SIB's resources and competence areas, with a strong focus on curated databases and SIB's most popular and widely used resources. In particular, SIB's bioinformatics resource portal ExPASy features over 150 resources, including UniProtKB/Swiss-Prot, ENZYME, PROSITE, neXtProt, STRING, UniCarbKB, SugarBindDB, SwissRegulon, EPD, arrayMap, Bgee, SWISS-MODEL Repository, OMA, OrthoDB and other databases, which are briefly described in this article.

    What I talk about when I talk about integration of single-cell data

    Over the past decade, single-cell technologies have evolved from profiling hundreds of cells to profiling millions, and have expanded from a single data modality to multiple views at single-cell resolution, including the genome, epigenome and transcriptome. With the advance of these technologies, the boom in multimodal single-cell data provides a valuable resource for understanding cellular heterogeneity and molecular mechanisms at a comprehensive level. However, large-scale multimodal single-cell data also pose a substantial computational challenge for insightful integrative analysis. Here, I lay out the data integration problems that the single-cell research community is interested in and introduce computational principles for solving them. In the following chapters, I present four computational methods for data integration under different scenarios. Finally, I discuss future directions and potential applications of single-cell data integration.

    Comparative analysis of plant genomes through data integration

    When we started our research in 2008, several online resources for genomics existed, each with a different focus. TAIR (The Arabidopsis Information Resource) focused on the plant model species Arabidopsis thaliana, with (at that time) little or no support for evolutionary or comparative genomics. Ensembl provided some basic tools and functions as a data warehouse, but it would only start incorporating plant genomes in 2010. However, no online resource at that time provided the data content and tools for plant comparative and evolutionary genomics that we required. As such, the plant community was missing an essential component to bring its research to the same level as that of the biomedically oriented research communities. We started to work on PLAZA to provide such a data resource for the plant community, one that would also contain the data content needed for our research group's focus on evolutionary genomics. The platform for comparative and evolutionary genomics, which we named PLAZA, was developed from scratch (i.e. not based on an existing database schema, such as Ensembl). The next step was gathering the data for all species, parsing it into a common format and uploading it into the database. We developed a processing pipeline, based on sequence similarity measurements, to group genes into gene families and subfamilies. Functional annotation was gathered both from the original data providers and through InterProScan, combined with InterPro2GO. These primary data were then ready to be used in every subsequent analysis. Building such a database was good enough for research within our bioinformatics group, but the goal was to provide a comprehensive resource for all plant biologists with an interest in comparative and evolutionary genomics. The next step was therefore designing and creating a user-friendly, visually appealing web interface connected to our database.
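The abstract does not specify how the similarity-based pipeline groups genes into families. As a rough illustration only, the sketch below assumes pairwise similarity hits (for example from an all-against-all protein search) are already available as (gene, gene, score) triples, and joins genes into families by single-linkage clustering with a union-find structure. The function name `cluster_gene_families`, the score threshold and the gene identifiers are hypothetical; this is not the PLAZA pipeline, which also delineates subfamilies.

```python
from typing import Dict, Iterable, List, Tuple


def cluster_gene_families(
    genes: Iterable[str],
    hits: Iterable[Tuple[str, str, float]],
    min_score: float,
) -> List[List[str]]:
    """Group genes into families by single-linkage clustering (hypothetical helper).

    Two genes share a family if they are connected by a chain of pairwise hits
    whose similarity score is at least ``min_score``.
    """
    parent: Dict[str, str] = {g: g for g in genes}

    def find(g: str) -> str:
        # Follow parent links to the root, halving the path along the way.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    def union(a: str, b: str) -> None:
        # Merge the families containing a and b.
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, b, score in hits:
        # Ignore weak hits and hits involving genes outside the input set.
        if score >= min_score and a in parent and b in parent:
            union(a, b)

    families: Dict[str, List[str]] = {}
    for g in list(parent):
        families.setdefault(find(g), []).append(g)
    # Sort members and families so the output does not depend on input order.
    return sorted(sorted(members) for members in families.values())


if __name__ == "__main__":
    genes = ["At1", "At2", "Os1", "Os2", "Zm1"]  # hypothetical gene IDs
    hits = [("At1", "Os1", 0.82), ("Os1", "Zm1", 0.75), ("At2", "Os2", 0.40)]
    print(cluster_gene_families(genes, hits, min_score=0.5))
    # [['At1', 'Os1', 'Zm1'], ['At2'], ['Os2']]
```

Single linkage is used here only because it is short to write: it tends to chain distantly related genes into one large cluster, which is why more robust graph-based methods such as Markov clustering are commonly used for gene family inference.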
While the most detailed information is commonly presented in data tables, aesthetically pleasing graphics, images and charts are often used to visualize trends and general statistics, and are also used in specific tools. The design and development of these tools and visualizations are thus core elements of my PhD. The PLAZA platform was designed as a gene-centric data resource, which is easily navigated when a biologist wants to study a relatively small number of genes. However, using the default PLAZA website to retrieve information for dozens of genes quickly becomes very tedious. Therefore, a 'gene set'-centric extra layer was developed in which user-defined gene sets can be analyzed quickly. This extra layer, called the PLAZA workbench, functions on top of the normal PLAZA website, implying that only gene sets from species present in the PLAZA database can be analyzed directly. The PLAZA resource for comparative and evolutionary genomics was a major success, but it still had several issues, at least two of which we tried to solve at once by creating a new platform. The first issue was the build procedure of PLAZA: adding a single species, or updating the structural annotation of an existing one, required recomputing the entire database content. The second issue was the restrictiveness of the PLAZA workbench: through a mapping procedure, gene sets could be entered for species not present in the PLAZA database, but for species without a phylogenetically close relative this approach did not always yield satisfying results. Furthermore, the research in question might focus precisely on the differences between a species present in PLAZA and a close relative not present in PLAZA (e.g. to study adaptation to a different ecological niche). In such a case, the mapping procedure is in itself useless. With the advent of NGS transcriptome data sets for a growing number of species, it was clear that a new challenge had presented itself.
We designed and developed a new platform, named TRAPID, which could automatically process entire transcriptome data sets using a reference database. The goal was to have the processing done quickly, with results containing both gene-family-oriented data (such as multiple sequence alignments and phylogenetic trees) and a functional characterization of the transcripts. Major efforts went into designing the processing pipeline to be reliable, fast and accurate.
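One simple way to transfer gene-family information from a reference database to new transcripts is a best-hit assignment. The sketch below assumes (transcript, reference gene, e-value) hits from a similarity search and a mapping from reference genes to family identifiers. The function `assign_families`, the default e-value cutoff and all identifiers are hypothetical, and the abstract does not state that TRAPID assigns families this way.

```python
from typing import Dict, Iterable, Optional, Tuple


def assign_families(
    hits: Iterable[Tuple[str, str, float]],
    ref_family: Dict[str, str],
    max_evalue: float = 1e-5,
) -> Dict[str, Optional[str]]:
    """Give each transcript the family of its best reference hit (hypothetical helper).

    ``hits`` holds (transcript, reference_gene, e_value) triples. A transcript
    whose best hit is weaker than ``max_evalue``, or whose best reference gene
    has no family, maps to None. Transcripts without any hit are absent.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for transcript, ref_gene, evalue in hits:
        current = best.get(transcript)
        # Keep the hit with the lowest e-value seen so far.
        if current is None or evalue < current[1]:
            best[transcript] = (ref_gene, evalue)

    assignment: Dict[str, Optional[str]] = {}
    for transcript, (ref_gene, evalue) in best.items():
        assignment[transcript] = ref_family.get(ref_gene) if evalue <= max_evalue else None
    return assignment


if __name__ == "__main__":
    hits = [("tr1", "refA", 1e-30), ("tr1", "refB", 1e-50), ("tr2", "refC", 0.1)]
    ref_family = {"refA": "fam1", "refB": "fam2", "refC": "fam3"}
    print(assign_families(hits, ref_family))
    # {'tr1': 'fam2', 'tr2': None}
```

Best-hit transfer is fast but fragile when the top hits of a transcript fall into different families; voting over several top hits is a common refinement.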

    A survey of best practices for RNA-seq data analysis.

    RNA-sequencing (RNA-seq) has a wide variety of applications, but no single analysis pipeline can be used in all cases. We review all of the major steps in RNA-seq data analysis, including experimental design, quality control, read alignment, quantification of gene and transcript levels, visualization, differential gene expression, alternative splicing, functional analysis, gene fusion detection and eQTL mapping. We highlight the challenges associated with each step. We discuss the analysis of small RNAs and the integration of RNA-seq with other functional genomics techniques. Finally, we discuss the outlook for novel technologies that are changing the state of the art in transcriptomics. This is the final published version. It first appeared at http://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0881-8
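The survey lists quantification of gene and transcript levels among the core steps. As a small worked illustration (not a recommendation taken from the review), the sketch below computes transcripts per million (TPM), one widely used within-sample unit: each count is normalized by feature length, and the resulting rates are rescaled to sum to one million. Gene names, counts and lengths are hypothetical, and in practice quantifiers often use effective rather than raw feature lengths.

```python
from typing import Dict


def tpm(counts: Dict[str, float], lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Convert raw read counts to transcripts per million (TPM).

    Each count is divided by its feature length in kilobases to give a per-kilobase
    rate; the rates are then rescaled so that they sum to one million.
    """
    rates = {g: counts[g] / (lengths_bp[g] / 1000.0) for g in counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("all counts are zero; TPM is undefined")
    return {g: r / total * 1e6 for g, r in rates.items()}


if __name__ == "__main__":
    # Hypothetical genes: geneB has 3x the reads but is also 3x longer.
    print(tpm({"geneA": 100, "geneB": 300}, {"geneA": 1000, "geneB": 3000}))
    # {'geneA': 500000.0, 'geneB': 500000.0}
```

Because geneB's three-fold higher count is fully explained by its three-fold greater length, both genes receive the same TPM. TPM values are comparable within a sample but are not by themselves a substitute for the between-sample normalization used in differential expression analysis.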

    Extreme overall mushroom genome expansion in Mycena s.s. irrespective of plant hosts or substrate specializations.

    Mycena s.s. is a ubiquitous mushroom genus whose members degrade multiple dead plant substrates and opportunistically invade living plant roots. Having sequenced the nuclear genomes of 24 Mycena species, we find that they defy the patterns expected for fungi based on both their traditionally perceived saprotrophic ecology and their substrate specializations. Mycena displayed massive overall genome expansions affecting all gene families, driven by novel gene family emergence, gene duplications, enlarged secretomes encoding polysaccharide degradation enzymes, transposable element (TE) proliferation, and horizontal gene transfers. Mainly due to TE proliferation, Arctic Mycena species have genomes of up to 502 Mbp (2-8× the size of temperate Mycena genomes), the largest among mushroom-forming Agaricomycetes, indicating a possible evolutionary convergence with the genome expansions sometimes seen in Arctic plants. Overall, Mycena show highly unusual, varied, mosaic-like genomic structures adaptable to multiple lifestyles, providing a genomic illustration of the growing realization that fungal niche adaptations can be far more fluid than traditionally believed.

    Temporal and Causal Inference with Longitudinal Multi-omics Microbiome Data

    Microbiomes are communities of microbes inhabiting an environmental niche. Thanks to next-generation sequencing technologies, it is now possible to study microbial communities, their impact on the host environment, and their role in specific diseases and in health. These technologies have also driven the increased generation of multi-omics microbiome data, including metatranscriptomics (quantitative survey of the complete metatranscriptome of the microbial community), metabolomics (quantitative profile of the entire set of metabolites present in the microbiome's environmental niche), and host transcriptomics (gene expression profile of the host). Consequently, another major challenge in microbiome data analysis is the integration of multi-omics data sets and the construction of unified models. Finally, since microbiomes are inherently dynamic, longitudinal studies are critical to fully understand the complex interactions that take place within these communities. Although the analysis of longitudinal microbiome data has been attempted, existing approaches do not probe interactions between taxa, do not offer holistic analyses, and do not investigate causal relationships. In this work we propose approaches to address all of the above challenges: novel analysis pipelines to analyze multi-omic longitudinal microbiome data and to infer temporal and causal relationships between the different entities involved. As a first step, we showed how to deal with longitudinal metagenomic data sets by building a pipeline, PRIMAL, which takes microbial abundance data as input and outputs a dynamic Bayesian network model that is highly predictive, suggests significant interactions between the different microbes, and proposes important connections from clinical variables. A significant innovation of our work is its ability to deal with differing rates of the internal biological processes in different individuals.
Second, we showed how to analyze longitudinal multi-omic microbiome datasets. Our pipeline, PALM, significantly extends the previous state of the art by allowing the integration of longitudinal metatranscriptomics, host transcriptomics and metabolomics data in addition to longitudinal metagenomics data. PALM achieves predictive power comparable to that of the PRIMAL pipeline while discovering a far more complex web of interactions between the entities. An important innovation of PALM is the use of a multi-omic Skeleton framework that incorporates prior knowledge into model learning. Another major innovation of this work is a suite of validation methods, both in silico and in vitro, that enhance the utility and validity of PALM. Finally, we propose METALICA, a suite of novel methods (unrolling and de-confounding) consisting of tools and techniques that make it possible to uncover significant details about the nature of microbial interactions. We also show methods to validate such interactions using ground-truth databases. The proposed methods were tested on an IBD multi-omics dataset.
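The abstract does not describe how PRIMAL learns its network. To make the idea of a dynamic Bayesian network over abundances concrete, the sketch below assumes evenly spaced, already aligned time series and, for each taxon, picks the single taxon at time t (itself included) whose lagged linear fit best predicts its abundance at time t+1. This is a deliberately minimal two-slice model with at most one parent per node and linear-Gaussian dependencies fitted by least squares; the functions `fit_lagged` and `best_parents` and the example data are hypothetical and are not part of PRIMAL, PALM or METALICA.

```python
from typing import Dict, List, Optional, Tuple


def fit_lagged(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y[t+1] = a + b * x[t]; returns (a, b, rss)."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equally long series with at least 3 time points")
    xs, ys = x[:-1], y[1:]  # pair each value at time t with the target at t+1
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((u - mx) ** 2 for u in xs)
    sxy = sum((u - mx) * (v - my) for u, v in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0  # a constant predictor gets slope 0
    a = my - b * mx
    rss = sum((v - (a + b * u)) ** 2 for u, v in zip(xs, ys))
    return a, b, rss


def best_parents(series: Dict[str, List[float]]) -> Dict[str, Tuple[str, float]]:
    """For each taxon, choose one parent at time t (itself allowed) by lowest RSS.

    Returns {taxon: (parent, slope)}, i.e. a two-slice network with at most one
    incoming edge per node and linear-Gaussian dependencies.
    """
    result: Dict[str, Tuple[str, float]] = {}
    for target, y in series.items():
        best: Optional[Tuple[str, float, float]] = None
        for source, x in series.items():
            _, b, rss = fit_lagged(x, y)
            if best is None or rss < best[2]:
                best = (source, b, rss)
        assert best is not None  # series is non-empty, so one candidate exists
        result[target] = (best[0], best[1])
    return result


if __name__ == "__main__":
    # Hypothetical abundances, evenly spaced in time; B[t+1] = 2 * A[t].
    series = {"A": [1.0, 3.0, 2.0, 5.0, 4.0], "B": [7.0, 2.0, 6.0, 4.0, 10.0]}
    print(best_parents(series)["B"])
    # ('A', 2.0)
```

A full DBN learner would allow several parents per node and compare candidate structures with a penalized score such as BIC, since the raw residual sum of squares always favors adding more parents.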

    Pore-Forming Proteins from Cnidarians and Arachnids as Potential Biotechnological Tools

    Animal venoms are complex mixtures of highly specialized toxic molecules. Cnidarians and arachnids produce pore-forming proteins (PFPs) directed against the plasma membrane of their target cells. Among PFPs from cnidarians, actinoporins stand out for their small size and molecular simplicity. While native actinoporins require only sphingomyelin for membrane binding, engineered chimeras in which an antibody-derived recognition domain is fused to an actinoporin isoform can serve as highly specific immunotoxins. Examples of such constructs targeted against malignant cells have already been reported. By contrast, PFPs from arachnid venoms are less well studied from a structural and functional point of view. Spiders of the genus Latrodectus are professional insect hunters that, as part of their toxic arsenal, produce large PFPs known as latrotoxins. Interestingly, some latrotoxins have been identified as potent and highly specific insecticides. Given the proteinaceous nature of these toxins, their promising future use as efficient bioinsecticides is discussed throughout this Perspective. Protein engineering and large-scale recombinant production are critical steps toward using these PFPs as tools to control agriculturally important insect pests. In summary, both families of PFPs, from Cnidaria and Arachnida, appear to be molecules with promising biotechnological applications.