3 research outputs found

    An integrative approach for a network based meta-analysis of viral RNAi screens.

    BACKGROUND: Big data is becoming ubiquitous in biology and poses significant challenges in data analysis and interpretation. RNAi screening has become a workhorse of functional genomics and has been applied, for example, to identify host factors involved in infection for a panel of different viruses. However, the data resulting from such screens are difficult to analyse: overlap between hit lists is often low, even when comparing screens targeting the same virus. This makes it a major challenge to select interesting candidates for further detailed, mechanistic experimental characterization. RESULTS: To address this problem, we propose an integrative bioinformatics pipeline that allows for a network-based meta-analysis of viral high-throughput RNAi screens. Initially, we collate a human protein interaction network from various public repositories, which is then subjected to unsupervised clustering to determine functional modules. Modules that are significantly enriched with host dependency factors (HDFs) and/or host restriction factors (HRFs) are then filtered based on network topology and semantic similarity measures. Modules passing all these criteria are finally interpreted for their biological significance using enrichment analysis, and interesting candidate genes can be selected from the modules. CONCLUSIONS: We apply our approach to seven screens targeting three different viruses and compare the results with other published meta-analyses of viral RNAi screens. We recover key hit genes and identify additional candidates from the screens. While we demonstrate the approach using viral RNAi data, the method is generally applicable to identifying underlying mechanisms from hit lists derived from high-throughput experimental data, and to selecting a small number of the most promising genes for further mechanistic studies. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13015-015-0035-7) contains supplementary material, which is available to authorized users.
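
The module-enrichment step described in the abstract can be illustrated with a small sketch. The abstract does not say which statistical test the authors use, so the upper-tail hypergeometric test below is an assumption, and the function name `enrichment_p_value` and the toy gene identifiers are hypothetical.

```python
from math import comb


def enrichment_p_value(module_genes, hit_genes, background_genes):
    """Return (k, p): hits found in the module and P(X >= k) under a
    hypergeometric null. This test is an assumed stand-in, not
    necessarily the one used in the paper."""
    background = set(background_genes)
    module = set(module_genes) & background
    hits = set(hit_genes) & background

    N = len(background)      # genes in the interaction network
    K = len(hits)            # screen hits (e.g. HDFs) present in the network
    n = len(module)          # genes in the module under test
    k = len(module & hits)   # screen hits that fall inside the module

    # Sum the probabilities of drawing k, k+1, ... hits in a module of size n.
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)
    )
    return k, favourable / comb(N, n)


# Toy data: 10 network genes, 3 of them screen hits, one module of 3 genes.
network = [f"g{i}" for i in range(10)]
k, p = enrichment_p_value(["g0", "g1", "g5"], ["g0", "g1", "g2"], network)
print(k, round(p, 4))
```

In a real analysis the p-values of all modules would also need correction for multiple testing before any module is called significantly enriched.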

    GET_PANGENES: calling pangenes from plant genome alignments confirms presence-absence variation.

    Crop pangenomes made from individual cultivar assemblies promise easy access to conserved genes, but genome content variability and inconsistent identifiers hamper their exploration. To address this, we define pangenes, which summarize a species' coding potential and link back to the original annotations. The protocol get_pangenes performs whole genome alignments (WGA) to call syntenic gene models based on coordinate overlaps. A benchmark with small and large plant genomes shows that pangenes recapitulate phylogeny-based orthologies and produce complete soft-core gene sets. Moreover, WGAs support lift-over and help confirm gene presence-absence variation. Source code and documentation: https://github.com/Ensembl/plant-scripts
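
The idea of calling matching gene models from coordinate overlaps can be sketched as follows. This is a minimal illustration, not the get_pangenes implementation: it assumes the gene models of a second genome have already been projected onto reference coordinates by a whole genome alignment, and the function names, the half-open interval convention and the 0.5 overlap threshold are all hypothetical choices.

```python
def overlap_ratio(a, b):
    """Overlap length divided by the length of the shorter interval.
    Intervals are half-open (start, end) pairs on the same sequence."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return 0.0
    return overlap / min(a[1] - a[0], b[1] - b[0])


def match_gene_models(ref_genes, mapped_genes, min_ratio=0.5):
    """Pair reference genes with projected genes whose coordinates overlap.

    Both arguments map gene id -> (chromosome, start, end); `mapped_genes`
    is assumed to be in reference coordinates already. `min_ratio` is an
    illustrative threshold, not a documented default."""
    pairs = []
    for ref_id, (r_chrom, r_start, r_end) in ref_genes.items():
        for other_id, (o_chrom, o_start, o_end) in mapped_genes.items():
            if r_chrom != o_chrom:
                continue  # genes on different sequences cannot overlap
            ratio = overlap_ratio((r_start, r_end), (o_start, o_end))
            if ratio >= min_ratio:
                pairs.append((ref_id, other_id, ratio))
    return pairs


# Toy coordinates with made-up gene identifiers.
ref = {"refA": ("chr1", 100, 200)}
mapped = {
    "othA": ("chr1", 150, 260),  # overlaps refA by 50 bp
    "othB": ("chr1", 190, 400),  # overlaps refA by only 10 bp
    "othC": ("chr2", 100, 200),  # same coordinates, different chromosome
}
print(match_gene_models(ref, mapped))
```

A reference gene with no partner in the returned pairs is a candidate for presence-absence variation, although confirming that requires checking the alignment evidence itself. The nested loop is quadratic and only suits toy inputs; genome-scale data would call for sorted intervals or an interval tree.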

    Additional file 1 of GET_PANGENES: calling pangenes from plant genome alignments confirms presence-absence variation

    Additional file 1: Table S1. Other Whole Genome Alignment stats for minimap2 and GSAlign algorithms. Table S2. Summary of BUSCO completeness analyses of individual genomes that are part of datasets in this paper. Table S3. Collinear genes found between Arabidopsis thaliana and A. lyrata within 23 blocks of the Ancestral Crucifer Karyotype based on Whole Genome Alignments produced with minimap2 and GSAlign. Table S4. Excerpt from BED-like pangene matrix produced during the analysis of dataset rice3. Table S5. Summary of Whole Genome Alignment (WGA) evidence for the gene models in CDS cluster Horvu_MOREX_1H01G011400 resulting from the analysis of dataset barley20. Figure S1. Overlap ratio of collinear gene models in rice, wheat and barley. Figure S2. Dot plots of collinear gene models called in rice, wheat and barley genomes. Figure S3. Venn diagrams of pangene clusters based on minimap2 and GSAlign Whole Genome Alignments of the rice3 dataset. Figure S4. Sequence identity among sequences in rice3 pangene clusters based on minimap2 (left) and GSAlign (right). Figure S5. Example of pangene cluster where the cDNA sequences have a long local alignment but the encoded CDS sequences cannot be aligned. Figure S6. Examples of rice pangene clusters not matched by Ensembl Compara orthogroups. Figure S7. Example of pangene cluster where the encoded protein sequences do not share protein domains. Figure S8. Flowchart of script check_evidence.pl, which uses as input a cluster in FASTA format and precomputed collinearity evidence in TSV format. Figure S9. Partial deletion of locus HvFT3/Ppd-H2 in barley cultivar Igri. Figure S10. Genomic context of pangene cluster HORVU.MOREX.r3.2HG0166090 (cluster members indicated with green arrows), which corresponds to barley locus HvCEN. Figure S11. Multiple alignment of protein sequences of pangene cluster HORVU.MOREX.r3.2HG0184740, which corresponds to barley locus Vrs1. Figure S12. Multiple alignment of protein sequences of pangene cluster HORVU.MOREX.r3.3HG0311160, which corresponds to barley locus HvOS2. Figure S13. Genomic context of pangene cluster gene:HORVU.MOREX.r3.7HG0752640, an example with tandem copies (cluster members indicated with green arrows), which encode acidic proteins.