
    A Hidden Markov Model for Copy Number Variant prediction from whole genome resequencing data

    Motivation: Copy Number Variants (CNVs) are important genetic factors in the study of human diseases. While high-throughput whole-genome resequencing provides multiple lines of evidence for detecting CNVs, computational algorithms need to be tailored to different types and sizes of CNVs and to different experimental designs. Results: To achieve optimal power and resolution for detecting CNVs at low depth of coverage, we implemented a Hidden Markov Model that integrates both depth of coverage and mate-pair relationships. The novelty of our algorithm is that we infer the likelihood of carrying a deletion jointly from multiple mate pairs in a region, without requiring any single mate pair to be an obvious outlier. By integrating all useful information in a comprehensive model, our method is able to detect medium-size deletions (200-2000 bp) at low depth (<10× per sample). We applied the method to simulated data and demonstrate that the power to detect medium-size deletions is close to theoretical values. Availability: A program implemented in Java, Zinfandel, is available at http://www.cs.columbia.edu/~itsik/zinfandel
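    To make the HMM idea concrete, the sketch below decodes per-window read depths into copy-number states with the Viterbi algorithm. It is a minimal illustration under stated assumptions, not Zinfandel's model: it uses depth of coverage only (no mate-pair evidence), and the Poisson emissions, the `haploid_depth` and `stay` parameters, and the 0.1 floor for copy number 0 are all hypothetical choices.

```python
import math
from typing import List, Sequence


def poisson_logpmf(k: int, lam: float) -> float:
    """log P(K = k) for a Poisson distribution with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_copy_number(
    depths: Sequence[int],
    states: Sequence[int] = (0, 1, 2),
    haploid_depth: float = 10.0,
    stay: float = 0.99,
) -> List[int]:
    """Most likely copy-number state for each window (hypothetical model).

    Assumptions: window depth ~ Poisson(haploid_depth * copy_number), with a
    small floor for copy number 0; the chain keeps its state with probability
    `stay` and otherwise jumps uniformly to one of the other states.
    """
    n = len(states)
    log_stay = math.log(stay)
    log_move = math.log((1.0 - stay) / (n - 1))

    def emit(state: int, depth: int) -> float:
        lam = max(haploid_depth * state, 0.1)  # floor avoids log(0) for copy number 0
        return poisson_logpmf(depth, lam)

    def trans(i: int, j: int) -> float:
        return log_stay if i == j else log_move

    # Uniform initial distribution over states.
    score = [math.log(1.0 / n) + emit(s, depths[0]) for s in states]
    backptr: List[List[int]] = []
    for depth in depths[1:]:
        prev = score
        # For each current state j, the best previous state i.
        best_prev = [max(range(n), key=lambda i: prev[i] + trans(i, j)) for j in range(n)]
        score = [prev[best_prev[j]] + trans(best_prev[j], j) + emit(states[j], depth)
                 for j in range(n)]
        backptr.append(best_prev)

    # Trace back from the best final state.
    j = max(range(n), key=lambda i: score[i])
    path = [j]
    for best_prev in reversed(backptr):
        j = best_prev[j]
        path.append(j)
    path.reverse()
    return [states[i] for i in path]
```

    For example, `viterbi_copy_number([20, 21, 19, 20, 10, 9, 11, 10, 10, 20, 19, 21])` calls the five low-depth windows in the middle as copy number 1 (a heterozygous deletion). The `stay` parameter acts as a switching penalty: a deletion is only called when its accumulated depth evidence outweighs the cost of entering and leaving the deleted state, which is one reason short deletions are hard to detect at low coverage.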

    Shotgun metagenomics reveals interkingdom association between intestinal bacteria and fungi involving competition for nutrients

    Keywords: Comprehensive database; Diet; Microbiome. Background: The accuracy of internal transcribed spacer (ITS) and shotgun metagenomics has not been robustly evaluated, and the effect of diet on the composition and function of the bacterial and fungal gut microbiome has been poorly investigated in a longitudinal setting. Here we compared two approaches to studying the fungal community (ITS and shotgun metagenomics), proposed an enrichment protocol for reliable mycobiome analysis using a comprehensive in-house fungal database, and correlated dietary data with both bacterial and fungal communities. Results: We found that shotgun DNA sequencing after a new enrichment protocol, combined with the most comprehensive and novel fungal databases, provided a cost-effective approach to gut mycobiome profiling at the species level and to integrating bacterial and fungal community analyses in fecal samples. The mycobiome was significantly more variable than the bacterial community at the compositional and functional levels. Notably, we showed that microbial diversity, composition, and functions were associated with habitual diet composition rather than driven by global dietary changes. Our study indicates a potential competitive inter-kingdom interaction between bacteria and fungi for food foraging. Conclusion: Together, our work proposes an efficient workflow for studying the human gut microbiome that robustly integrates fungal, bacterial, and dietary data. These findings will further advance our knowledge of the interaction between gut bacteria and fungi and pave the way for future investigations of the human mycobiome. This work was supported by the European Union’s Horizon 2020 research and innovation program under the Marie Sklodowska-Curie Action, Innovative Training Network [grant number 812969].

    Parsimonious Higher-Order Hidden Markov Models for Improved Array-CGH Analysis with Applications to Arabidopsis thaliana

    Array-based comparative genomic hybridization (Array-CGH) is an important technology in molecular biology for the detection of DNA copy number polymorphisms between closely related genomes. Hidden Markov Models (HMMs) are popular tools for the analysis of Array-CGH data, but current methods are based only on first-order HMMs, which have a limited ability to model spatial dependencies between measurements of closely adjacent chromosomal regions. Here, we develop parsimonious higher-order HMMs that interpolate between a mixture model, which ignores spatial dependencies, and a higher-order HMM, which models spatial dependencies exhaustively. We apply parsimonious higher-order HMMs to the analysis of Array-CGH data of the accessions C24 and Col-0 of the model plant Arabidopsis thaliana. We compare these models against first-order HMMs and other existing methods using a reference set of known deletions and sequence deviations. We find that parsimonious higher-order HMMs clearly improve the identification of these polymorphisms. Moreover, we perform a functional analysis of the identified polymorphisms, revealing novel details of genomic differences between C24 and Col-0. Additional model evaluations on widely used Array-CGH data of human cell lines indicate that parsimonious HMMs are also well suited to the analysis of data from non-plant organisms. Taken together, these results indicate that parsimonious higher-order HMMs are useful for Array-CGH analyses. An implementation of parsimonious higher-order HMMs is available as part of the open source Java library Jstacs (www.jstacs.de/index.php/PHHMM)
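    The difference between a first-order and a higher-order HMM is that each hidden state may depend on more than one preceding probe. The sketch below computes the likelihood of a second-order HMM by running the forward algorithm over pairs of consecutive states, and checks it against brute-force enumeration. It is a generic illustration, not the parsimonious context-tree parameterization used by Jstacs; the two states, the discretized log-ratio bins, and every probability value are hypothetical.

```python
import itertools
from typing import Dict, Sequence, Tuple

# Hypothetical two-state model: 0 = normal copy number, 1 = deletion.
# Observations are discretized log2-ratio bins: 0 = low, 1 = neutral.
STATES = (0, 1)


def p_init(s: int) -> float:
    """P(x_1 = s)."""
    return {0: 0.9, 1: 0.1}[s]


def p_second(a: int, b: int) -> float:
    """P(x_2 = b | x_1 = a): first-order start-up transition."""
    return 0.95 if a == b else 0.05


def p_trans(a: int, b: int, c: int) -> float:
    """P(x_t = c | x_{t-2} = a, x_{t-1} = b); staying is likelier when a == b."""
    stay = 0.98 if a == b else 0.8
    return stay if c == b else 1.0 - stay


def p_emit(s: int, o: int) -> float:
    """P(observation bin o | state s)."""
    table = {0: {0: 0.1, 1: 0.9}, 1: {0: 0.7, 1: 0.3}}
    return table[s][o]


def forward_second_order(obs: Sequence[int]) -> float:
    """P(obs) for a second-order HMM, run as a first-order chain over state pairs."""
    if len(obs) == 1:
        return sum(p_init(s) * p_emit(s, obs[0]) for s in STATES)
    # alpha[(a, b)] = P(o_1..o_t, x_{t-1} = a, x_t = b)
    alpha: Dict[Tuple[int, int], float] = {
        (a, b): p_init(a) * p_emit(a, obs[0]) * p_second(a, b) * p_emit(b, obs[1])
        for a in STATES for b in STATES
    }
    for o in obs[2:]:
        alpha = {
            (b, c): sum(alpha[(a, b)] * p_trans(a, b, c) for a in STATES) * p_emit(c, o)
            for b in STATES for c in STATES
        }
    return sum(alpha.values())


def brute_force(obs: Sequence[int]) -> float:
    """Same likelihood by summing over every hidden path (exponential; for checking)."""
    total = 0.0
    for path in itertools.product(STATES, repeat=len(obs)):
        p = p_init(path[0]) * p_emit(path[0], obs[0])
        for t in range(1, len(obs)):
            step = p_second(path[0], path[1]) if t == 1 else p_trans(path[t - 2], path[t - 1], path[t])
            p *= step * p_emit(path[t], obs[t])
        total += p
    return total
```

    With K states, the pair-state forward pass costs on the order of K^3 per probe, and a full order-d model needs K^d contexts, each with its own transition distribution. This parameter growth is what motivates parsimonious variants that share transition parameters between contexts, ranging from a mixture model (no context) to the full higher-order model.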

    Charting genomic heterogeneity in tumours: from bulk to single cell

    Tumours do not consist of a single homogeneous population but are complex, heterogeneous systems that contain billions of ever-evolving cells, and no two tumours are the same. Tumour heterogeneity is present at three levels: 1) inter-patient heterogeneity; 2) intra-patient heterogeneity; and 3) intra-tumour heterogeneity (ITH). Understanding all levels of heterogeneity is crucial for patient prognosis and treatment choice. To this end, we aimed to improve our understanding of all three levels of tumour heterogeneity. In paper I, we investigated the prevalence, type, length, and genomic distribution of 853,218 somatic copy number alterations (SCNAs) across 20,249 tumours belonging to 32 cancer types. Based on 1) the number of SCNAs, 2) the percentage of the genome altered, and 3) the average SCNA size, we found high levels of inter-patient heterogeneity, both between and within cancer types. We found that specific chromosomes were preferentially lost or gained depending on cancer type. Lastly, we detected co-alterations of key oncogenes and tumour suppressor genes (TSGs). Taken together, we provide a comprehensive analysis of SCNAs across many cancer types as a valuable resource for the community. In paper II, we sought to elucidate intra-patient heterogeneity in non-small cell lung cancer (NSCLC) and matched brain metastases (BMs). We performed shallow whole-genome sequencing (WGS) on 51 primary NSCLCs and their matched BMs, whole-exome sequencing on 40 of the pairs, multi-region sequencing of 15 BMs, and shallow WGS on an additional cohort of 115 BMs. We showed that there is significant intra-patient heterogeneity at the SCNA level, with BM samples showing, on average, more SCNAs than their matched NSCLC. In contrast, multi-region sequencing of 15 BMs did not show significant ITH at the SCNA level. Finally, we identified putative metastatic driver SCNAs and single-nucleotide variants in key TSGs and oncogenes.
    In paper III, we aimed to assess the level of ITH in early localized prostate cancer. We performed organ-wide, multi-region, single-cell DNA sequencing on two prostate midsections. We found transient chromosomal instability (CIN) in both tumour and normal prostate tissue, evidenced by a large number of cells with unique chromosomal (arm) losses and/or gains. Furthermore, we found three distinct groups of cells within the prostate: 1) diploid cells; 2) pseudo-diploid cells; and 3) monster cells. We observed an enrichment of diploid cells in normal regions and of pseudo-diploid cells in tumour-rich regions, while monster cells were equally distributed over the entire prostate, again suggesting elevated CIN levels across the prostate. Lastly, we detected highly localized subclones that were exclusive to tumour-rich regions and harboured deletions in TSGs known to be frequently deleted in prostate cancer. Taken together, this thesis advances the understanding of inter-patient, intra-patient, and intra-tumour heterogeneity.
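    The three per-tumour measures used in paper I (number of SCNAs, percentage of the genome altered, and average SCNA size) can be computed from a segmented copy-number profile. The sketch below assumes non-overlapping segments with integer copy numbers from some upstream caller, and counts a segment as an SCNA when its copy number differs from a hypothetical `ploidy` parameter; the thesis may define these measures differently (for example, relative to the covered genome only).

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Segment:
    chrom: str
    start: int        # 0-based, inclusive
    end: int          # exclusive
    copy_number: int  # integer copy number from an upstream caller (assumed)


def scna_burden(segments: Sequence[Segment], genome_length: int, ploidy: int = 2) -> Dict[str, float]:
    """Per-tumour SCNA summaries, assuming non-overlapping segments.

    A segment counts as one SCNA when its copy number differs from `ploidy`;
    adjacent altered segments with different copy numbers are counted separately.
    """
    sizes = [s.end - s.start for s in segments if s.copy_number != ploidy]
    n = len(sizes)
    return {
        "n_scnas": n,
        "pct_genome_altered": 100.0 * sum(sizes) / genome_length,
        "mean_scna_size": sum(sizes) / n if n else 0.0,
    }
```

    For a toy 1,000 bp genome with a 50 bp loss and a 200 bp gain, `scna_burden` reports two SCNAs covering 25% of the genome with a mean size of 125 bp. Because the three measures are separate, two tumours can have the same percentage of the genome altered but very different SCNA counts (many focal events versus a few arm-level events).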