
    Sequence-based Multiscale Model (SeqMM) for High-throughput chromosome conformation capture (Hi-C) data analysis

    In this paper, I introduce a Sequence-based Multiscale Model (SeqMM) for biomolecular data analysis. By combining it with the spectral graph method, I reveal the essential difference between global scale models and local scale models in structure clustering: they optimize different distances, Euclidean (or spatial) versus sequential (or genomic). More specifically, clusters from global scale models optimize Euclidean distance relations, whereas local scale models yield clusters that optimize genomic distance relations. For biomolecular data, Euclidean distances and sequential distances are two independent variables, which can never be optimized simultaneously in data clustering. However, the sequence scale in my SeqMM can work as a tuning parameter that balances these two variables and delivers different clusterings depending on the purpose. Further, my SeqMM is used to explore the hierarchical structures of chromosomes. I find that at the global scale, the Fiedler vector from my SeqMM closely resembles the principal vector from principal component analysis and can be used to study genomic compartments. In topologically associating domain (TAD) analysis, I find that TADs evaluated at different scales are not consistent and vary considerably. Particularly when the sequence scale is small, the calculated TAD boundaries are dramatically different. Even for regions with high contact frequencies, TAD regions show no obvious consistency. However, when the scale value increases further, although TADs are still quite different, TAD boundaries in these high contact frequency regions become more and more consistent. Finally, I find that for a fixed local scale, my method can deliver very robust TAD boundaries across different cluster numbers.
    Comment: 22 pages, 13 figures
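The spectral step mentioned in this abstract can be illustrated with a minimal sketch. It is not the SeqMM implementation: the `seq_scale` cutoff, the function name, and the toy contact map are all assumptions made for illustration. The sketch builds a graph Laplacian from a symmetric contact matrix, optionally drops contacts between bins that are far apart along the sequence, and approximates the Fiedler vector (the eigenvector of the second-smallest Laplacian eigenvalue) by shifted power iteration.

```python
def fiedler_vector(contacts, seq_scale=None, iters=500):
    """Approximate the Fiedler vector of a graph built from a symmetric contact matrix.

    seq_scale is a hypothetical cutoff (not taken from the paper): contacts between
    bins more than seq_scale apart along the sequence are dropped. None keeps all.
    """
    n = len(contacts)
    # Edge weights: contact frequencies, optionally restricted to nearby bins.
    w = [[0.0 if i == j or (seq_scale is not None and abs(i - j) > seq_scale)
          else float(contacts[i][j]) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in w]
    shift = 2.0 * max(deg)  # upper bound on the largest Laplacian eigenvalue

    # Start from a vector that is orthogonal to the constant vector.
    v = [i - (n - 1) / 2.0 for i in range(n)]
    for _ in range(iters):
        # Multiply by (shift * I - L), where L = D - W is the graph Laplacian.
        v = [(shift - deg[i]) * v[i] + sum(w[i][j] * v[j] for j in range(n))
             for i in range(n)]
        mean = sum(v) / n
        v = [x - mean for x in v]            # project out the constant eigenvector
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return v


# Toy contact map (made up): two blocks of three bins, strong contacts
# inside each block and weak contacts between the blocks.
toy = [[1.0 if (i < 3) == (j < 3) else 0.1 for j in range(6)] for i in range(6)]
labels = [x > 0 for x in fiedler_vector(toy)]
print(labels)
```

The sign of each Fiedler vector entry splits the bins into two groups, which for this toy map coincide with the two blocks. On real data a library eigensolver such as `numpy.linalg.eigh` would normally replace the hand-written loop.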

    Techniques for clustering gene expression data

    Many clustering techniques have been proposed for the analysis of gene expression data obtained from microarray experiments. However, the choice of suitable method(s) for a given experimental dataset is not straightforward. Common approaches do not translate well and fail to take account of the data profile. This review paper surveys state-of-the-art applications that recognise these limitations and implement procedures to overcome them. It provides a framework for the evaluation of clustering in gene expression analyses. The nature of microarray data is discussed briefly. Selected examples are presented for the clustering methods considered.

    A multiscale analysis of gene flow for the New England cottontail, an imperiled habitat specialist in a fragmented landscape

    Landscape features of anthropogenic or natural origin can influence organisms' dispersal patterns and the connectivity of populations. Understanding these relationships is of broad interest in ecology and evolutionary biology and provides key insights for habitat conservation planning at the landscape scale. This knowledge is germane to restoration efforts for the New England cottontail (Sylvilagus transitionalis), an early successional habitat specialist of conservation concern. We evaluated local population structure and measures of genetic diversity of a geographically isolated population of cottontails in the northeastern United States. We also conducted a multiscale landscape genetic analysis, in which we assessed genetic discontinuities relative to the landscape and developed several resistance models to test hypotheses about landscape features that promote or inhibit cottontail dispersal within and across the local populations. Bayesian clustering identified four genetically distinct populations, with very little migration among them, and additional substructure within one of those populations. These populations had private alleles, low genetic diversity, critically low effective population sizes (3.2-36.7), and evidence of recent genetic bottlenecks. Major highways and a river were found to limit cottontail dispersal and to separate populations. The habitat along roadsides, railroad beds, and utility corridors, on the other hand, was found to facilitate cottontail movement among patches. The relative importance of dispersal barriers and facilitators on gene flow varied among populations in relation to landscape composition, demonstrating the complexity and context dependency of factors influencing gene flow and highlighting the importance of replication and scale in landscape genetic studies.
    Our findings provide information for the design of restoration landscapes for the New England cottontail and also highlight the dual influence of roads, as both barriers and facilitators of dispersal for an early successional habitat specialist in a fragmented landscape.

    Recovering complete and draft population genomes from metagenome datasets.

    Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost of applying them to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improve the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins, i.e., bins containing sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on genome-wide evolution.
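The idea of binning contigs by covarying coverage can be sketched in a few lines. This is a toy illustration under stated assumptions, not one of the methods the review compares: the contig names, coverage values, greedy grouping rule, and the `min_corr` threshold are all hypothetical. Contigs whose coverage profiles across samples correlate strongly are placed in the same bin.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length coverage profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def bin_by_coverage(profiles, min_corr=0.9):
    """Greedily group contigs whose coverage covaries across samples.

    profiles maps a contig name to its coverage in each sample. The first
    member of each bin acts as that bin's representative profile.
    """
    bins = []
    for name, coverage in profiles.items():
        for members in bins:
            if pearson(coverage, profiles[members[0]]) >= min_corr:
                members.append(name)
                break
        else:
            bins.append([name])  # no existing bin matched: start a new one
    return bins


# Made-up coverage of four contigs across four samples.
profiles = {
    "contig_a": [10, 40, 5, 20],
    "contig_b": [12, 44, 6, 21],
    "contig_c": [30, 3, 25, 8],
    "contig_d": [33, 4, 27, 9],
}
bins = bin_by_coverage(profiles)
print(bins)
```

With only one sample, every profile has a single value and the correlation is undefined, which is one way to see why coverage from multiple datasets helps. Real binning tools typically combine coverage with sequence composition and use more robust clustering than this greedy pass.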