CORE search results: 9,244 research outputs found

Sequence-based Multiscale Model (SeqMM) for High-throughput chromosome conformation capture (Hi-C) data analysis

Author: Xia Kelin
Publication venue: Public Library of Science (PLoS)
Publication date: 31/07/2017

In this paper, I introduce a Sequence-based Multiscale Model (SeqMM) for biomolecular data analysis. By combining it with the spectral graph method, I reveal the essential difference between global-scale and local-scale models in structure clustering: they optimize different quantities, namely Euclidean (or spatial) distances and sequential (or genomic) distances, respectively. More specifically, clusters from global-scale models optimize Euclidean distance relations, whereas local-scale models produce clusters that optimize genomic distance relations. For biomolecular data, Euclidean distances and sequential distances are two independent variables that cannot be optimized simultaneously in data clustering. However, the sequence scale in SeqMM can serve as a tuning parameter that balances these two variables and delivers different clusterings depending on the purpose. Further, I use SeqMM to explore the hierarchical structures of chromosomes. I find that at the global scale, the Fiedler vector from SeqMM closely resembles the principal vector from principal component analysis and can be used to study genomic compartments. In the analysis of topologically associating domains (TADs), I find that TADs evaluated at different scales are inconsistent and vary considerably. In particular, when the sequence scale is small, the calculated TAD boundaries differ dramatically; even for regions with high contact frequencies, the TAD regions show no obvious consistency. As the scale value increases further, however, although the TADs remain quite different, the TAD boundaries in these high-contact-frequency regions become increasingly consistent. Finally, I find that for a fixed local scale, the method delivers very robust TAD boundaries across different numbers of clusters.
Comment: 22 pages, 13 figures
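The abstract does not give SeqMM's equations, so the following is only a rough illustration of spectral bipartitioning with a sequence-scale window, under stated assumptions. It keeps only contacts between loci at most `scale` bins apart (a hypothetical stand-in for the sequence scale, not necessarily how SeqMM defines it), builds the graph Laplacian L = D - A from the remaining contacts, and splits loci by the sign of the Fiedler vector, i.e. the eigenvector of the second-smallest Laplacian eigenvalue. The function names `fiedler_vector` and `bipartition` are hypothetical.

```python
import numpy as np


def fiedler_vector(contacts, scale):
    """Fiedler vector of a band-limited contact graph.

    Hypothetical simplification: an edge (i, j) is kept only when the
    sequence distance |i - j| is at most `scale`; this is an assumption,
    not SeqMM's published definition of the sequence scale.
    """
    a = np.array(contacts, dtype=float)
    n = a.shape[0]
    idx = np.arange(n)
    # Keep only contacts inside the sequence-distance window.
    band = np.abs(idx[:, None] - idx[None, :]) <= scale
    a = np.where(band, a, 0.0)
    np.fill_diagonal(a, 0.0)          # ignore self-contacts
    a = (a + a.T) / 2.0               # enforce a symmetric adjacency matrix
    lap = np.diag(a.sum(axis=1)) - a  # graph Laplacian L = D - A
    _, vecs = np.linalg.eigh(lap)     # eigh returns eigenvalues in ascending order
    return vecs[:, 1]                 # eigenvector of the second-smallest eigenvalue


def bipartition(contacts, scale):
    """Split loci into two clusters by the sign of the Fiedler vector."""
    return (fiedler_vector(contacts, scale) >= 0).astype(int)


# Toy contact map: two blocks of three loci with strong internal contacts.
toy = [
    [0, 10, 10, 1, 1, 1],
    [10, 0, 10, 1, 1, 1],
    [10, 10, 0, 1, 1, 1],
    [1, 1, 1, 0, 10, 10],
    [1, 1, 1, 10, 0, 10],
    [1, 1, 1, 10, 10, 0],
]
print(bipartition(toy, scale=5))  # all contacts kept (global-like setting)
print(bipartition(toy, scale=1))  # only neighbouring loci kept (local-like setting)
```

In this sketch a large `scale` uses every contact, while a small one restricts edges to near-diagonal contacts; the sign of the eigenvector is arbitrary, so only which loci share a label is meaningful.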

arXiv.org e-Print Archive

Directory of Open Access Journals

DR-NTU (Digital Repository of NTU)

FigShare

Techniques for clustering gene expression data

Author: Crane Martin
Doolan Padraig
Kerr Gráinne
Ruskin Heather J.
Publication venue: Elsevier BV
Publication date: 01/01/2007

Many clustering techniques have been proposed for the analysis of gene expression data obtained from microarray experiments. However, choosing suitable method(s) for a given experimental dataset is not straightforward. Common approaches do not translate well and fail to take account of the data profile. This review surveys state-of-the-art applications that recognise these limitations and implement procedures to overcome them. It provides a framework for the evaluation of clustering in gene expression analyses. The nature of microarray data is discussed briefly, and selected examples are presented for the clustering methods considered.
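The review's own examples are not reproduced here. As one hedged illustration of a common approach in this area, the sketch below clusters expression profiles by average-linkage hierarchical clustering with a Pearson-correlation distance (1 - r) and scores the result with the mean silhouette width, one simple internal validation index. The function names, the choice of distance, and the toy profiles are assumptions for illustration, not the review's evaluation framework.

```python
import math


def pearson_distance(x, y):
    """1 - Pearson correlation; profiles with the same shape are close.

    Constant profiles have undefined correlation; here they get distance 1.0.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 1.0
    return 1.0 - sxy / math.sqrt(sxx * syy)


def average_linkage(profiles, k):
    """Agglomerative clustering with average linkage, merged down to k clusters."""
    n = len(profiles)
    d = [[pearson_distance(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Mean pairwise distance between members of the two clusters.
                total = sum(d[i][j] for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = [0] * n
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels, d


def silhouette(labels, d):
    """Mean silhouette width: values near 1 mean compact, well-separated clusters."""
    n = len(labels)
    scores = []
    for i in range(n):
        same = [d[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(same) / len(same)
        b = min(
            sum(d[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in set(labels)
            if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


# Toy profiles over four conditions: three rising genes, three falling genes.
genes = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 2, 3, 5], [4, 3, 2, 1], [8, 6, 4, 2], [5, 3, 2, 1]]
labels, dist = average_linkage(genes, 2)
print(labels, round(silhouette(labels, dist), 3))
```

Correlation distance groups genes by the shape of their profiles rather than by absolute expression level, which is why a gene and a doubled copy of it fall into the same cluster in this toy example.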

CiteSeerX

Irish Universities

DCU Online Research Access Service

A multiscale analysis of gene flow for the New England cottontail, an imperiled habitat specialist in a fragmented landscape

Author: Allendorf F. W.
Brubaker D. R.
Chapman J. A.
Fenderson L. E.
Forman R. T. T.
Franklin I.
Fuller S.
Jackson S. N.
Kovach A. I.
Litvaitis J. A.
Litvaitis M. K.
Manly B. F. J.
Mantel N.
Pritchard J. K.
R Development Core Team
Sauer J. R.
USFWS
Walsh C.
Publication venue: University of New Hampshire Scholars' Repository
Publication date: 18/04/2014

Landscape features of anthropogenic or natural origin can influence organisms' dispersal patterns and the connectivity of populations. Understanding these relationships is of broad interest in ecology and evolutionary biology and provides key insights for habitat conservation planning at the landscape scale. This knowledge is germane to restoration efforts for the New England cottontail (Sylvilagus transitionalis), an early successional habitat specialist of conservation concern. We evaluated local population structure and measures of genetic diversity of a geographically isolated population of cottontails in the northeastern United States. We also conducted a multiscale landscape genetic analysis, in which we assessed genetic discontinuities relative to the landscape and developed several resistance models to test hypotheses about landscape features that promote or inhibit cottontail dispersal within and across the local populations. Bayesian clustering identified four genetically distinct populations, with very little migration among them, and additional substructure within one of those populations. These populations had private alleles, low genetic diversity, critically low effective population sizes (3.2-36.7), and evidence of recent genetic bottlenecks. Major highways and a river were found to limit cottontail dispersal and to separate populations. The habitat along roadsides, railroad beds, and utility corridors, on the other hand, was found to facilitate cottontail movement among patches. The relative importance of dispersal barriers and facilitators on gene flow varied among populations in relation to landscape composition, demonstrating the complexity and context dependency of factors influencing gene flow and highlighting the importance of replication and scale in landscape genetic studies.
Our findings provide information for the design of restoration landscapes for the New England cottontail and also highlight the dual influence of roads, as both barriers and facilitators of dispersal for an early successional habitat specialist in a fragmented landscape.
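The abstract does not state which statistics were used to compare genetic distances with the resistance models. One standard tool in landscape genetics for this kind of comparison is the Mantel test, which correlates two pairwise distance matrices and assesses significance by permuting the rows and columns of one matrix together. The sketch below assumes small, symmetric matrices of pairwise genetic distances and resistance distances; the function names and toy data are hypothetical, and this is not the authors' analysis pipeline.

```python
import math
import random


def _upper(m, order):
    """Upper-triangle entries of m after permuting rows and columns by `order`."""
    n = len(m)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mantel(genetic, resistance, permutations=999, seed=0):
    """Return (r, p): matrix correlation and a one-sided permutation p-value."""
    n = len(genetic)
    identity = list(range(n))
    y = _upper(resistance, identity)
    r_obs = _pearson(_upper(genetic, identity), y)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)
        # Permuting rows and columns together keeps `genetic` a valid distance matrix.
        if _pearson(_upper(genetic, order), y) >= r_obs:
            extreme += 1
    # Count the observed arrangement as one of the permutations.
    p = (extreme + 1) / (permutations + 1)
    return r_obs, p


# Toy data: sampling sites along a transect; resistance is proportional to distance.
sites = [0, 1, 3, 7, 12, 20]
gen = [[abs(a - b) for b in sites] for a in sites]
res = [[2 * abs(a - b) for b in sites] for a in sites]
print(mantel(gen, res))
```

The one-sided count (permuted correlations at least as large as the observed one) matches the hypothesis that greater landscape resistance goes with greater genetic distance; partial Mantel tests or other methods would be needed to control for geographic distance.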

Crossref

PubMed Central

UNH Scholars' Repository

Recovering complete and draft population genomes from metagenome datasets.

Author: Gilbert Jack A
Sangwan Naseer
Xia Fangfang
Publication venue: eScholarship, University of California
Publication date: 01/03/2016

Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism, because it elucidates the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite its computational cost on deeply sequenced, complex metagenomes (e.g., soil), exploiting covarying patterns of contig coverage across multiple datasets significantly improves the binning process. We also discuss and compare current genome validation methods and show how these methods tackle the problem of chimeric genome bins, i.e., bins containing sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on genome-wide evolution.
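The abstract states that covarying contig coverage across multiple datasets improves binning. A minimal sketch of that idea, under stated assumptions: each contig has a hypothetical vector of mean coverage per sample, coverages are log-transformed, and two contigs are linked into the same bin when the Pearson correlation of their profiles reaches a hypothetical threshold `min_corr`, with single linkage via union-find. Practical binners typically also use sequence composition and more careful statistical models; this only illustrates the coverage-covariation signal.

```python
import math


def _pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def bin_by_coverage(coverages, min_corr=0.9):
    """Group contigs whose coverage profiles across samples covary strongly.

    coverages: dict mapping a contig id to its mean coverage in each sample
    (hypothetical input format). Returns sorted lists of contig ids.
    """
    ids = list(coverages)
    parent = {c: c for c in ids}

    def find(c):
        # Union-find root lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    # Log transform dampens the effect of very high-coverage samples.
    logcov = {c: [math.log(v + 1.0) for v in coverages[c]] for c in ids}
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if _pearson(logcov[a], logcov[b]) >= min_corr:
                parent[find(a)] = find(b)  # single-linkage merge
    bins = {}
    for c in ids:
        bins.setdefault(find(c), []).append(c)
    return sorted(sorted(members) for members in bins.values())


# Toy data: two genomes whose abundances vary differently across four samples.
genome_a = [10, 50, 5, 30]
genome_b = [40, 5, 30, 8]
contigs = {
    "a1": genome_a,
    "a2": [1.1 * v for v in genome_a],
    "b1": genome_b,
    "b2": [0.9 * v for v in genome_b],
}
print(bin_by_coverage(contigs))
```

Contigs from the same genome rise and fall together across samples even when their absolute coverage differs, which is the signal that multi-sample coverage adds over a single dataset.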

Woods Hole Open Access Server

Springer - Publisher Connector

PubMed Central

eScholarship - University of California