    A quick guide for building a successful bioinformatics community

    “Scientific community” refers to a group of people collaborating on scientific-research-related activities who also share common goals, interests, and values. Such communities play a key role in many bioinformatics activities. Communities may be linked to a specific location or institute, or involve people working at many different institutions and locations. Education and training are typically important components of these communities, providing a valuable context in which to develop skills and expertise while also strengthening links and relationships within the community. Scientific communities facilitate: (i) the exchange and development of ideas and expertise; (ii) career development; (iii) coordinated funding activities; (iv) interactions and engagement with professionals from other fields; and (v) other activities beneficial to individual participants, communities, and the scientific field as a whole. It is thus beneficial at many different levels to understand the general features of successful, high-impact bioinformatics communities; how individual participants can contribute to the success of these communities; and the role of education and training within these communities. We present here a quick guide to building and maintaining a successful, high-impact bioinformatics community, along with an overview of the general benefits of participating in such communities. This article grew out of contributions made by organizers, presenters, panelists, and other participants of the ISMB/ECCB 2013 workshop “The ‘How To Guide’ for Establishing a Successful Bioinformatics Network” at the 21st Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) and the 12th European Conference on Computational Biology (ECCB).

    The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update

    Galaxy (homepage: https://galaxyproject.org, main public server: https://usegalaxy.org) is a web-based scientific analysis platform used by tens of thousands of scientists across the world to analyze large biomedical datasets such as those found in genomics, proteomics, metabolomics and imaging. Started in 2005, Galaxy continues to focus on three key challenges of data-driven biomedical science: making analyses accessible to all researchers, ensuring analyses are completely reproducible, and making it simple to communicate analyses so that they can be reused and extended. During the last two years, the Galaxy team and the open-source community around Galaxy have made substantial improvements to Galaxy's core framework, user interface, tools, and training materials. Framework and user interface improvements now enable Galaxy to be used for analyzing tens of thousands of datasets, and more than 5,500 tools are now available from the Galaxy ToolShed. The Galaxy community has led an effort to create numerous high-quality tutorials focused on common types of genomic analyses. The Galaxy developer and user communities continue to grow and to be integral to Galaxy's development. The number of Galaxy public servers, developers contributing to the Galaxy framework and its tools, and users of the main Galaxy server have all increased substantially.

    Transcription elongation factors represent in vivo cancer dependencies in glioblastoma

    Glioblastoma is a universally lethal cancer with a median survival of approximately 15 months[1]. Despite substantial efforts to define druggable targets, there are no therapeutic options that meaningfully extend glioblastoma patient lifespan. While previous work has largely focused on in vitro cellular models, here we demonstrate a more physiologically relevant approach to target discovery in glioblastoma. We adapted pooled RNA interference (RNAi) screening technology[2–4] for use in orthotopic patient-derived xenograft (PDX) models, creating a high-throughput negative selection screening platform in a functional in vivo tumour microenvironment. Using this approach, we performed parallel in vivo and in vitro screens and discovered that the chromatin and transcriptional regulators necessary for cell survival in vivo are non-overlapping with those required in vitro. We identified transcription pause-release and elongation factors as one set of in vivo-specific cancer dependencies and determined that these factors are necessary for enhancer-mediated transcriptional adaptations that enable cells to survive the tumour microenvironment. Our lead hit, JMJD6, mediates the upregulation of in vivo stress and stimulus response pathways through enhancer-mediated transcriptional pause-release, promoting cell survival specifically in vivo. Targeting JMJD6 or other identified elongation factors extends survival in orthotopic xenograft mouse models, supporting targeting the transcription elongation machinery as a therapeutic strategy for glioblastoma. More broadly, this study demonstrates the power of in vivo phenotypic screening to identify new classes of ‘cancer dependencies’ not identified by previous in vitro approaches, which could supply untapped opportunities for therapeutic intervention.
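    The abstract describes a pooled negative-selection screen, in which hairpins targeting genes that tumour cells need in vivo become depleted from harvested tumours relative to the starting library. The snippet below is a minimal sketch of that general idea, not the authors' analysis pipeline: it assumes raw read counts per hairpin for a reference (input) sample and a selected (in vivo) sample, normalizes each to counts per million, and flags hairpins whose log2 fold change falls at or below a hypothetical cutoff of -1 (a two-fold drop). The hairpin names, counts, pseudocount and threshold are all illustrative assumptions.

```python
import math

def counts_per_million(counts):
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {hairpin: n / total * 1e6 for hairpin, n in counts.items()}

def log2_fold_changes(reference, selected, pseudocount=1.0):
    """Per-hairpin log2((selected CPM + pc) / (reference CPM + pc)).

    Hairpins absent from the selected sample are treated as zero counts.
    """
    ref_cpm = counts_per_million(reference)
    sel_cpm = counts_per_million(selected)
    return {
        hairpin: math.log2((sel_cpm.get(hairpin, 0.0) + pseudocount)
                           / (ref_cpm[hairpin] + pseudocount))
        for hairpin in ref_cpm
    }

def depleted_hairpins(reference, selected, threshold=-1.0):
    """Hairpins whose log2 fold change is at or below the (hypothetical) cutoff."""
    lfc = log2_fold_changes(reference, selected)
    return sorted(hairpin for hairpin, value in lfc.items() if value <= threshold)

# Illustrative counts only: two hairpins against a hypothetical gene A drop out in vivo.
input_counts = {"shGENE_A_1": 500, "shGENE_A_2": 450, "shGENE_B_1": 520, "shCTRL_1": 530}
tumour_counts = {"shGENE_A_1": 100, "shGENE_A_2": 90, "shGENE_B_1": 600, "shCTRL_1": 610}

print(depleted_hairpins(input_counts, tumour_counts))  # ['shGENE_A_1', 'shGENE_A_2']
```

    Real screens typically add replicate handling and gene-level aggregation across several hairpins per gene; those steps are deliberately omitted from this sketch.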

    Developing reproducible bioinformatics analysis workflows for heterogeneous computing environments to support African genomics

    Background: The Pan-African bioinformatics network, H3ABioNet, comprises 27 research institutions in 17 African countries. H3ABioNet is part of the Human Health and Heredity in Africa program (H3Africa), an African-led research consortium funded by the US National Institutes of Health and the UK Wellcome Trust, aimed at using genomics to study and improve the health of Africans. A key role of H3ABioNet is to support H3Africa projects by building bioinformatics infrastructure such as portable and reproducible bioinformatics workflows for use on heterogeneous African computing environments. Processing and analysis of genomic data is an example of a big data application requiring complex interdependent data analysis workflows. Such bioinformatics workflows take the primary and secondary input data through several computationally intensive processing steps using different software packages, where some of the outputs form inputs for other steps. Implementing scalable, reproducible, portable and easy-to-use workflows is particularly challenging.
    Results: H3ABioNet has built four workflows to support (1) the calling of variants from high-throughput sequencing data; (2) the analysis of microbial populations from 16S rDNA sequence data; (3) genotyping and genome-wide association studies; and (4) single nucleotide polymorphism imputation. A week-long hackathon was organized in August 2016 with participants from six African bioinformatics groups, and US and European collaborators. Two of the workflows are built using the Common Workflow Language (CWL) framework and two using Nextflow. All the workflows are containerized using Docker for improved portability and reproducibility, and are publicly available for use by members of the H3Africa consortium and the international research community.
    Conclusion: The H3ABioNet workflows have been implemented with a view to offering ease of use for the end user and high levels of reproducibility and portability, all while following modern, state-of-the-art bioinformatics data processing protocols. The H3ABioNet workflows will serve the H3Africa consortium projects and are currently in use. All four workflows are also publicly available for research scientists worldwide to use and adapt for their respective needs. The H3ABioNet workflows will help develop bioinformatics capacity and assist genomics research within Africa and serve to increase the scientific output of H3Africa and its Pan-African Bioinformatics Network.
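    The Background section notes that some workflow outputs form the inputs of later steps. Workflow engines such as Nextflow and CWL runners derive execution order from these data dependencies; the Python sketch below illustrates only that underlying idea with the standard-library `graphlib` module (Python 3.9+). It is not CWL or Nextflow syntax, and the step names form a hypothetical variant-calling pipeline rather than H3ABioNet's actual workflow.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical variant-calling steps; each maps to the steps whose outputs it consumes.
steps = {
    "align_reads": {"quality_control"},
    "sort_bam": {"align_reads"},
    "mark_duplicates": {"sort_bam"},
    "call_variants": {"mark_duplicates"},
    "coverage_report": {"mark_duplicates"},
    "filter_variants": {"call_variants"},
}

def execution_order(dag):
    """One valid serial order in which every step runs after its inputs exist."""
    return list(TopologicalSorter(dag).static_order())

def parallel_batches(dag):
    """Group steps into batches; steps within a batch do not depend on each other."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches

for i, batch in enumerate(parallel_batches(steps), start=1):
    print(i, batch)
```

    Steps that land in the same batch (here `call_variants` and `coverage_report`) have no dependency on each other, which is the kind of independence a workflow engine can exploit to run steps in parallel.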

    Inverting the model of genomics data sharing with the NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space

    The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL; https://anvilproject.org) was developed to address a widespread community need for a unified computing environment for genomics data storage, management, and analysis. In this perspective, we present AnVIL, describe its ecosystem and interoperability with other platforms, and highlight how this platform and associated initiatives contribute to improved genomic data sharing efforts. The AnVIL is a federated cloud platform designed to manage and store genomics and related data, enable population-scale analysis, and facilitate collaboration through the sharing of data, code, and analysis results. By inverting the traditional model of data sharing, the AnVIL eliminates the need for data movement, adds security measures for active threat detection and monitoring, and provides scalable, shared computing resources for any researcher. We describe the core data management and analysis components of the AnVIL, which currently consist of Terra, Gen3, Galaxy, RStudio/Bioconductor, Dockstore, and Jupyter, and describe several flagship genomics datasets available within the AnVIL. We continue to extend and innovate the AnVIL ecosystem by implementing new capabilities, including mechanisms for interoperability and responsible data sharing, while streamlining access management. The AnVIL opens many new opportunities for analysis, collaboration, and data sharing that are needed to drive research and to make discoveries through the joint analysis of hundreds of thousands to millions of genomes along with associated clinical and molecular data types.

    Dietary and flight energetic adaptations in a salivary gland transcriptome of an insectivorous bat

    We hypothesized that evolution of the salivary gland secretory proteome has been important in adaptation to insectivory, the most common dietary strategy among Chiroptera. A submandibular salivary gland (SMG) transcriptome was sequenced for the little brown bat, Myotis lucifugus. The likely secretory proteome of 23 genes included seven (RETNLB, PSAP, CLU, APOE, LCN2, C3, CEL) related to the insectivorous diet and metabolism of M. lucifugus. Six of the secretory proteins are probably endocrine, whereas one (CEL) is most likely exocrine. The encoded proteins are associated with lipid hydrolysis, regulation of lipid metabolism, lipid transport, and insulin resistance. They are capable of processing exogenous lipids for flight metabolism while foraging. Salivary carboxyl ester lipase (CEL) is thought to hydrolyze insect lipophorins, which are probably absorbed across the gastric mucosa during feeding. The other six proteins are predicted either to maintain these lipids at high blood concentrations or to facilitate their transport and uptake by flight muscles. Expression of these seven genes and their coordinated secretion from a single organ is novel to this insectivorous bat, and apparently has evolved through instances of gene duplication, gene recruitment, and nucleotide selection. Four of the recruited genes are single-copy in the Myotis genome, whereas three have undergone duplication(s), with two of these genes exhibiting evolutionary 'bursts' of duplication resulting in multiple paralogs. Evidence for episodic directional selection was found for six of the seven genes, reinforcing the conclusion that the recruited genes have important roles in adaptation to insectivory and the metabolic demands of flight. Intragenic frequencies of mobile-element-like sequences differed from frequencies in the whole M. lucifugus genome. Differences among the recruited genes imply that they followed separate evolutionary trajectories and that adaptation was not a single, coordinated event.

    Integrative Approach Reveals Composition of Endoparasitoid Wasp Venoms

    The fruit fly Drosophila melanogaster and its endoparasitoid wasps are a developing model system for interactions between host immune responses and parasite virulence mechanisms. In this system, wasps use diverse venom cocktails to suppress the conserved fly cellular encapsulation response. Although numerous genetic tools allow detailed characterization of fly immune genes, a lack of wasp genomic information has hindered characterization of the parasite side of the interaction. Here, we use high-throughput nucleic acid and amino acid sequencing methods to describe the venoms of two related Drosophila endoparasitoids with distinct infection strategies, Leptopilina boulardi and L. heterotoma. Using RNA-seq, we assembled and quantified libraries of transcript sequences from female wasp abdomens. Next, we used mass spectrometry to sequence peptides derived from dissected venom gland lumens. We then mapped the peptide spectral data against the abdomen transcriptomes to identify a set of putative venom genes for each wasp species. Our approach captured the three venom genes previously characterized in L. boulardi by traditional cDNA cloning methods, as well as numerous new venom genes that were subsequently validated by a combination of RT-PCR, BLAST comparisons, and secretion signal sequence searches. Overall, L. boulardi venom was found to comprise 129 proteins and L. heterotoma venom 176 proteins. We found significant overlap in L. boulardi and L. heterotoma venom composition but also distinct differences that may underlie their unique infection strategies. Our joint transcriptomic-proteomic approach for endoparasitoid wasp venoms is generally applicable to the identification of functional protein subsets from any organism without a sequenced genome.
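    The abstract maps peptide data onto assembled transcripts to nominate venom genes. A full proteomics search scores raw spectra against a protein database; the sketch below makes a much simpler assumption, namely that peptide sequences have already been identified, and checks which transcripts contain each peptide in any of the six reading frames under the standard genetic code. The contig names, sequences and peptides are invented for illustration, and the sketch ignores practical issues such as leucine and isoleucine being indistinguishable by mass alone.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(dna):
    """Translate codon by codon; stops become '*', codons with ambiguous bases 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))

def six_frame_translations(seq):
    """Three forward and three reverse-complement reading frames."""
    seq = seq.upper().replace("U", "T")
    reverse = seq.translate(COMPLEMENT)[::-1]
    return [translate(strand[frame:]) for strand in (seq, reverse) for frame in range(3)]

def map_peptides(peptides, transcripts):
    """Map each transcript ID to the sorted peptides found in any of its frames."""
    hits = {}
    for transcript_id, seq in transcripts.items():
        frames = six_frame_translations(seq)
        found = sorted({p for p in peptides if any(p in frame for frame in frames)})
        if found:
            hits[transcript_id] = found
    return hits

# Invented example: contig_1 encodes MKTLLV on the forward strand,
# contig_2 encodes GWDE on the reverse strand; PEPTIDE matches nothing.
transcripts = {
    "contig_1": "ATGAAAACCCTGCTGGTGTAA",
    "contig_2": "TTCATCCCAGCC",
}
peptides = {"MKTLL", "GWDE", "PEPTIDE"}
print(map_peptides(peptides, transcripts))  # {'contig_1': ['MKTLL'], 'contig_2': ['GWDE']}
```

    Exact substring matching keeps the example short; a real pipeline would also tolerate sequencing errors in the assembly and report peptide-level scores rather than a simple yes/no hit.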