    GeneNet toolbox for MATLAB: a flexible platform for the analysis of gene connectivity in biological networks

    We present GeneNet Toolbox for MATLAB (also available as a set of standalone applications for Linux). The toolbox, available as command-line or with a graphical user interface, enables biologists to assess connectivity among a set of genes of interest (‘seed-genes’) within a biological network of their choosing. Two methods are implemented for calculating the significance of connectivity among seed-genes: ‘seed randomization’ and ‘network permutation’. Options include restricting analyses to a specified subnetwork of the primary biological network, and calculating connectivity from the seed-genes to a second set of interesting genes. Pre-analysis tools help the user choose the best connectivity-analysis algorithm for their network. The toolbox also enables visualization of the connections among seed-genes. GeneNet Toolbox functions execute in reasonable time for very large networks (∼10 million edges) on a desktop computer
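
The 'seed randomization' idea named in the abstract can be illustrated with a minimal sketch. This is not the toolbox's code (the toolbox is written for MATLAB); the function names, the edge-count statistic, and the uniform sampling of random gene sets are all assumptions made for illustration, and the toolbox may well control for properties such as node degree that this sketch ignores.

```python
import random
from itertools import combinations

def count_internal_edges(edges, genes):
    """Count edges whose two endpoints both lie in `genes`."""
    genes = set(genes)
    return sum(1 for a, b in edges if a in genes and b in genes)

def seed_randomization_pvalue(edges, seeds, n_iter=1000, rng=None):
    """Empirical p-value for connectivity among `seeds` (hypothetical helper).

    Assumption: the null model draws random gene sets of the same size
    uniformly from all network nodes.
    """
    rng = rng or random.Random(0)
    nodes = sorted({gene for edge in edges for gene in edge})
    observed = count_internal_edges(edges, seeds)
    k = len(set(seeds))
    hits = 0
    for _ in range(n_iter):
        sample = rng.sample(nodes, k)
        if count_internal_edges(edges, sample) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_iter + 1)

# Toy network: a fully connected group of four genes plus an unrelated chain.
clique = list(combinations("ABCD", 2))
chain = [(f"g{i}", f"g{i + 1}") for i in range(20)]
edges = clique + chain

observed, p_value = seed_randomization_pvalue(edges, ["A", "B", "C", "D"])
print(observed, p_value)
```

In this toy network the four seed genes share every possible edge, so random gene sets of the same size rarely match their connectivity and the empirical p-value is small.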

    The clustering of functionally related genes contributes to CNV-mediated disease

    Clusters of functionally related genes can be disrupted by a single copy number variant (CNV). We demonstrate that the simultaneous disruption of multiple functionally related genes is a frequent and significant characteristic of de novo CNVs in patients with developmental disorders (P = 1 × 10⁻³). Using three different functional networks, we identified unexpectedly large numbers of functionally related genes within de novo CNVs from two large independent cohorts of individuals with developmental disorders. The presence of multiple functionally related genes was a significant predictor of a CNV's pathogenicity when compared to CNVs from apparently healthy individuals and a better predictor than the presence of known disease or haploinsufficient genes for larger CNVs. The functionally related genes found in the de novo CNVs belonged to 70% of all clusters of functionally related genes found across the genome. De novo CNVs were more likely to affect functional clusters and affect them to a greater extent than benign CNVs (P = 6 × 10⁻⁴). Furthermore, such clusters of functionally related genes are phenotypically informative: different patients possessing CNVs that affect the same cluster of functionally related genes exhibit more similar phenotypes than expected (P < 0.05). The spanning of multiple functionally similar genes by single CNVs contributes substantially to how these variants exert their pathogenic effects.

    The Malaria Cell Atlas: single parasite transcriptomes across the complete Plasmodium life cycle

    Malaria parasites adopt a remarkable variety of morphological life stages as they transition through multiple mammalian host and mosquito vector environments. We profiled the single-cell transcriptomes of thousands of individual parasites, deriving the first high-resolution transcriptional atlas of the entire life cycle. We then used our atlas to precisely define developmental stages of single cells from three different human malaria parasite species, including parasites isolated directly from infected individuals. The Malaria Cell Atlas provides both a comprehensive view of gene usage in a eukaryotic parasite and an open-access reference dataset for the study of malaria parasites

    EmptyDrops: distinguishing cells from empty droplets in droplet-based single-cell RNA sequencing data.

    Droplet-based single-cell RNA sequencing protocols have dramatically increased the throughput of single-cell transcriptomics studies. A key computational challenge when processing these data is to distinguish libraries for real cells from empty droplets. Here, we describe a new statistical method for calling cells from droplet-based data, based on detecting significant deviations from the expression profile of the ambient solution. Using simulations, we demonstrate that EmptyDrops has greater power than existing approaches while controlling the false discovery rate among detected cells. Our method also retains distinct cell types that would have been discarded by existing methods in several real data sets
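
The idea of testing each barcode for deviation from the ambient expression profile can be sketched as a Monte Carlo test. This is a deliberately simplified illustration and not the EmptyDrops implementation: the plain multinomial model, the pseudocount, the `max_total` cutoff, and all function names are assumptions introduced here.

```python
import math
import random

def log_multinomial_prob(counts, probs):
    """Log-probability of `counts` under a multinomial with gene proportions `probs`."""
    logp = math.lgamma(sum(counts) + 1)
    for c, p in zip(counts, probs):
        logp += c * math.log(p) - math.lgamma(c + 1)
    return logp

def ambient_profile(barcodes, max_total):
    """Estimate ambient gene proportions by pooling low-count barcodes.

    Assumption: barcodes with at most `max_total` counts are empty droplets.
    A pseudocount of 1 per gene keeps every proportion above zero.
    """
    pooled = [1.0] * len(barcodes[0])
    for counts in barcodes:
        if sum(counts) <= max_total:
            for g, c in enumerate(counts):
                pooled[g] += c
    total = sum(pooled)
    return [x / total for x in pooled]

def ambient_pvalue(counts, probs, n_iter=500, rng=None):
    """Monte Carlo p-value: how often is a simulated ambient droplet of the
    same size at least as unlikely as the observed barcode?"""
    rng = rng or random.Random(0)
    observed = log_multinomial_prob(counts, probs)
    genes = range(len(probs))
    hits = 0
    for _ in range(n_iter):
        sim = [0] * len(probs)
        for g in rng.choices(genes, weights=probs, k=sum(counts)):
            sim[g] += 1
        if log_multinomial_prob(sim, probs) <= observed:
            hits += 1
    return (hits + 1) / (n_iter + 1)

# Toy data: twenty low-count droplets define a flat ambient profile.
low_count_droplets = [[2, 2, 2, 2] for _ in range(20)]
probs = ambient_profile(low_count_droplets, max_total=10)

p_cell = ambient_pvalue([40, 0, 0, 0], probs)         # one gene dominates
p_ambient = ambient_pvalue([10, 10, 10, 10], probs)   # matches the ambient profile
print(p_cell, p_ambient)
```

A barcode whose counts concentrate on one gene gets a small p-value, while one that mirrors the ambient profile does not. Controlling the false discovery rate across barcodes, as the abstract describes, would be a further step applied to these p-values.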

    SC3: consensus clustering of single-cell RNA-seq data

    Single-cell RNA-seq enables the quantitative characterization of cell types based on global transcriptome profiles. We present single-cell consensus clustering (SC3), a user-friendly tool for unsupervised clustering, which achieves high accuracy and robustness by combining multiple clustering solutions through a consensus approach (http://bioconductor.org/packages/SC3). We demonstrate that SC3 is capable of identifying subclones from the transcriptomes of neoplastic cells collected from patients.
    V.Y.K., T.A., A.Y. and M.H. are supported by Wellcome Trust Grants. K.N.N. is supported by the Wellcome Trust Strategic Award 'Single cell genomics of mouse gastrulation'. M.T.S. acknowledges support from FRS-FNRS; the Belgian Network DYSCO (Dynamical Systems, Control and Optimisation), funded by the Interuniversity Attraction Poles Programme initiated by the Belgian State Science Policy Office; and the ARC (Action de Recherche Concerte) on Mining and Optimization of Big Data Models, funded by the Wallonia-Brussels Federation. M.B. acknowledges support from EPSRC (grant EP/N014529/1). T.C. was funded through a core funded fellowship by the Sanger Institute and a Chancellor's fellowship from the University of Edinburgh. K.K. and A.R.G. are supported by Bloodwise (grant ref. 13003), the Wellcome Trust (grant ref. 104710/Z/14/Z), the Medical Research Council, the Kay Kendall Leukaemia Fund, the Cambridge NIHR Biomedical Research Center, the Cambridge Experimental Cancer Medicine Centre, the Leukemia and Lymphoma Society of America (grant ref. 07037) and a core support grant from the Wellcome Trust and MRC to the Wellcome Trust-Medical Research Council Cambridge Stem Cell Institute. W.R. was supported by BBSRC (grant ref. BB/K010867/1), the Wellcome Trust (grant ref. 095645/Z/11/Z), EU BLUEPRINT and EpiGeneSys.
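
Combining multiple clustering solutions through a consensus approach can be sketched with a co-clustering (consensus) matrix. This is a generic illustration, not the SC3 package's code: the function names are hypothetical, and the final step used here (connected components above a threshold) is an assumption chosen for brevity rather than the procedure SC3 itself uses.

```python
def consensus_matrix(labelings):
    """Fraction of clustering solutions in which each pair of cells shares a cluster."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(labelings) for c in row] for row in counts]

def consensus_clusters(matrix, threshold=0.5):
    """Group cells linked by consensus values above `threshold`.

    Assumption: connected components stand in for a proper clustering
    of the consensus matrix.
    """
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Three clustering solutions for five cells; label names differ between
# solutions, and the third solution disagrees about cell 2.
labelings = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
matrix = consensus_matrix(labelings)
final = consensus_clusters(matrix)
print(final)
```

Because the matrix only records whether two cells share a cluster, arbitrary label names in the individual solutions do not matter, and a single dissenting solution is outvoted by the other two.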

    Clustering genes by function to understand disease phenotypes: code used in the analyses presented in the thesis with the same title

    Contains Perl & C algorithms used in the analysis as well as R scripts for producing figures. Does not include publicly available datasets and annotations required to replicate the analyses

    Clustering genes by function to understand disease phenotypes

    Developmental disorders, including autism, intellectual disability, and congenital abnormalities, are present in 3-8% of live births and display a huge amount of phenotypic and genetic heterogeneity. Traditionally, geneticists have identified individual monogenic diseases among these patients, but a majority of patients fail to receive a clinical diagnosis. The genomes of these patients frequently harbour large copy number variants (CNVs), but their interpretation remains challenging. Using pathway analysis, I found significant functional associations for 329 individual phenotypes and show that 39% of these could explain the patients' multiple co-morbid phenotypes, and that multiple associated genes clustered within individual CNVs. I showed there was significantly more such clustering than expected by chance. In addition, the presence of multiple functionally-related genes is a significant predictor of CNV pathogenicity beyond the presence of known disease genes and the size of the CNV. This clustering of functionally-related genes was part of a broader pattern of functional clusters across the human genome. These genome-wide functional clusters showed tissue-specific expression and some evidence of chromatin-domain level regulation. Furthermore, many genome-wide functional clusters were enriched in segmental duplications, making them prone to CNV-causing mutations, and were frequently seen disrupted in healthy individuals. However, the majority of the time a pathogenic CNV affected the entire functional cluster, whereas benign CNVs tended to affect only one or two genes. I also showed that patients with CNVs affecting the same functional cluster are significantly more phenotypically similar to each other than expected, even if their CNVs do not affect any of the same genes. Lastly, I considered one of the major limitations in pathway analysis, namely ascertainment biases in functional information due to the prioritization of genes linked to human disease, and show how the modular nature of gene-networks can be used to identify and prioritize understudied genes.

    False signals induced by single-cell imputation [version 2; referees: 3 approved, 1 approved with reservations]

    Background: Single-cell RNA-seq is a powerful tool for measuring gene expression at the resolution of individual cells. A challenge in the analysis of this data is the large number of zero values, representing either missing data or no expression. Several imputation approaches have been proposed to address this issue, but because they generally rely on structure inherent to the dataset under consideration, they may not provide any additional information and are limited by the information contained therein and by the validity of their assumptions. Methods: We evaluated the risk of generating false positive or irreproducible differential expression when imputing data with six different methods. We applied each method to a variety of simulated datasets as well as to permuted real single-cell RNA-seq datasets and considered the number of false positive gene-gene correlations and differentially expressed genes. Using matched 10X and Smart-seq2 data, we examined whether cell-type specific markers were reproducible across datasets derived from the same tissue before and after imputation. Results: The extent of false-positives introduced by imputation varied considerably by method. Data smoothing based methods (MAGIC, knn-smooth and dca) generated many false-positives in both real and simulated data. Model-based imputation methods typically generated fewer false-positives, but this varied greatly depending on the diversity of cell-types in the sample. All imputation methods decreased the reproducibility of cell-type specific markers, although this could be mitigated by selecting markers with large effect size and significance. Conclusions: Imputation of single-cell RNA-seq data introduces circularity that can generate false-positive results. Thus, statistical tests applied to imputed data should be treated with care. Additional filtering by effect size can reduce but not fully eliminate these effects. Of the methods we considered, SAVER was the least likely to generate false or irreproducible results, and thus should be favoured over alternatives if imputation is necessary.
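
One way to see how smoothing can produce false-positive gene-gene correlations is a toy simulation on data with no real structure. This is not the evaluation pipeline from the paper and uses none of the imputation methods it names: the moving-average smoother over arbitrary neighbouring cells, the Gaussian "expression" values, and the large-sample significance cutoff are all assumptions for illustration.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def smooth(values, k):
    """Replace each cell's value with the mean over up to 2k+1 neighbouring cells.

    Assumption: 'neighbours' are simply adjacent indices; real smoothing
    methods pick neighbours from the expression data itself.
    """
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - k), min(n, i + k + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def count_significant_pairs(genes):
    """Count gene pairs whose |r| exceeds an approximate 5% cutoff that
    treats every cell as an independent observation."""
    cutoff = 1.96 / math.sqrt(len(genes[0]))
    count = 0
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            if abs(pearson(genes[i], genes[j])) > cutoff:
                count += 1
    return count

# 20 independent "genes" measured in 200 "cells": no true correlations exist.
rng = random.Random(1)
raw = [[rng.gauss(0, 1) for _ in range(200)] for _ in range(20)]
smoothed = [smooth(gene, k=10) for gene in raw]

n_raw = count_significant_pairs(raw)
n_smoothed = count_significant_pairs(smoothed)
print(n_raw, n_smoothed)
```

After smoothing, neighbouring cells share most of their values, so a test that still treats each cell as independent flags far more gene pairs as correlated even though every gene was generated independently.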

    GeneNet Toolbox for MATLAB: a flexible platform for the analysis of gene connectivity in biological networks

    Summary: We present GeneNet Toolbox for MATLAB (also available as a set of standalone applications for Linux). The toolbox, available as command-line or with a graphical user interface, enables biologists to assess connectivity among a set of genes of interest (‘seed-genes’) within a biological network of their choosing. Two methods are implemented for calculating the significance of connectivity among seed-genes: ‘seed randomization’ and ‘network permutation’. Options include restricting analyses to a specified subnetwork of the primary biological network, and calculating connectivity from the seed-genes to a second set of interesting genes. Pre-analysis tools help the user choose the best connectivity-analysis algorithm for their network. The toolbox also enables visualization of the connections among seed-genes. GeneNet Toolbox functions execute in reasonable time for very large networks (∼10 million edges) on a desktop computer. Availability and implementation: GeneNet Toolbox is open source and freely available from http://avigailtaylor.github.io/gntat14. Supplementary information: Supplementary data are available at Bioinformatics online.