584 research outputs found

    ChIP-chip versus ChIP-seq: Lessons for experimental design and data analysis

    Background: Chromatin immunoprecipitation (ChIP) followed by microarray hybridization (ChIP-chip) or high-throughput sequencing (ChIP-seq) allows genome-wide discovery of protein-DNA interactions such as transcription factor binding and histone modifications. Previous reports compared only a small number of profiles, and little has been done to compare histone modification profiles generated by the two technologies or to assess the impact of input DNA libraries in ChIP-seq analysis. Here, we performed a systematic analysis of a modENCODE dataset consisting of 31 pairs of ChIP-chip/ChIP-seq profiles of the coactivator CBP, RNA polymerase II (RNA PolII), and six histone modifications across four developmental stages of Drosophila melanogaster.
    Results: Both technologies produce highly reproducible profiles within each platform; ChIP-seq generally produces profiles with a better signal-to-noise ratio and allows detection of more peaks and narrower peaks. The sets of peaks identified by the two technologies can be significantly different, but the extent to which they differ varies depending on the factor and the analysis algorithm. Importantly, we found that there is significant variation among multiple sequencing profiles of input DNA libraries and that this variation most likely arises from differences in both experimental condition and sequencing depth. We further show that using an inappropriate input DNA profile can impact the average signal profiles around genomic features and peak calling results, highlighting the importance of having high-quality input DNA data for normalization in ChIP-seq analysis.
    Conclusions: Our findings highlight the biases present in each of the platforms, show the variability that can arise from both technology and analysis methods, and emphasize the importance of obtaining high-quality and deeply sequenced input DNA libraries for ChIP-seq analysis.

    Molecular phylogenetics: principles and practice

    Phylogenies are important for addressing various biological questions such as relationships among species or genes, the origin and spread of viral infection and the demographic changes and migration patterns of species. The advancement of sequencing technologies has taken phylogenetic analysis to a new height. Phylogenies have permeated nearly every branch of biology, and the plethora of phylogenetic methods and software packages that are now available may seem daunting to an experimental biologist. Here, we review the major methods of phylogenetic analysis, including parsimony, distance, likelihood and Bayesian methods. We discuss their strengths and weaknesses and provide guidance for their use
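    As a rough illustration of the distance methods named in the abstract above, the following sketch computes a pairwise Jukes-Cantor distance between two aligned DNA sequences. The function names and the choice of the Jukes-Cantor model are assumptions made for this example only; the review itself is not quoted here as prescribing this particular model or code.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    (Hypothetical helper written for illustration.)
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [
        (x, y)
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x in "ACGT" and y in "ACGT"
    ]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        # The correction is undefined at or beyond saturation.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    print(jukes_cantor_distance("ACGTACGT", "ACGTACGA"))
```

    A matrix of such pairwise distances is the usual input to distance-based tree-building methods; the correction matters because the raw proportion of differences underestimates divergence once multiple substitutions hit the same site.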

    Phylogeography above the species level for perennial species in a composite genus

    In phylogeography, DNA sequence and fingerprint data at the population level are used to infer evolutionary histories of species. Phylogeography above the species level is concerned with the genealogical aspects of divergent lineages. Here, we provide a phylogeographic study to examine the evolutionary history of a western Mediterranean composite, focusing on the perennial species of Helminthotheca (Asteraceae, Cichorieae). We used molecular markers (AFLP; ITS and plastid DNA sequences) to infer relationships among populations throughout the distributional range of the group. Interpretation is aided by biogeographic and molecular clock analyses. Four coherent entities are revealed by Bayesian mixture clustering of AFLP data, which correspond to taxa previously recognized at the rank of subspecies. The origin of the group was in western North Africa, from where it expanded across the Strait of Gibraltar to the Iberian Peninsula and across the Strait of Sicily to Sicily. Pleistocene lineage divergence is inferred within western North Africa as well as within the western Iberian region. The existence of the four entities as discrete evolutionary lineages suggests that they should be elevated to the rank of species, yielding H. aculeata, H. comosa, H. maroccana and H. spinosa, whereby the latter two necessitate new combinations

    The phylogenetic affinities of the extinct glyptodonts

    Among the fossils of hitherto unknown mammals that Darwin collected in South America between 1832 and 1833 during the Beagle expedition [1] were examples of the large, heavily armored herbivores later known as glyptodonts. Ever since, glyptodonts have fascinated evolutionary biologists because of their remarkable skeletal adaptations and seemingly isolated phylogenetic position even within their natural group, the cingulate xenarthrans (armadillos and their allies [2]). In possessing a carapace comprised of fused osteoderms, the glyptodonts were clearly related to other cingulates, but their precise phylogenetic position as suggested by morphology remains unresolved [3,4]. To provide a molecular perspective on this issue, we designed sequence-capture baits using in silico reconstructed ancestral sequences and successfully assembled the complete mitochondrial genome of Doedicurus sp., one of the largest glyptodonts. Our phylogenetic reconstructions establish that glyptodonts are in fact deeply nested within the armadillo crown-group, representing a distinct subfamily (Glyptodontinae) within family Chlamyphoridae [5]. Molecular dating suggests that glyptodonts diverged no earlier than around 35 million years ago, in good agreement with their fossil record. Our results highlight the derived nature of the glyptodont morphotype, one aspect of which is a spectacular increase in body size until their extinction at the end of the last ice age.
    Facultad de Ciencias Naturales y Muse

    Optimal data partitioning, multispecies coalescent and Bayesian concordance analyses resolve early divergences of the grape family (Vitaceae)

    Evolutionary rate heterogeneity and rapid radiations are common phenomena in organismal evolution and represent major challenges for reconstructing deep-level phylogenies. Here we detected substantial conflicts in and among data sets as well as uncertainty concerning relationships among lineages of Vitaceae from individual gene trees, supernetworks and tree certainty values. Congruent deep-level relationships of Vitaceae were retrieved by comprehensive comparisons of results from optimal partitioning analyses, multispecies coalescent approaches and the Bayesian concordance method. We found that partitioning schemes selected by PartitionFinder were preferred over those by gene or by codon position, and the unpartitioned model usually performed the worst. For a data set with conflicting signals, however, the unpartitioned model outperformed models that included more partitions, demonstrating some limitations to the effectiveness of concatenation for these data. For a transcriptome data set, fast coalescent methods (STAR and MP-EST) and a Bayesian concordance approach yielded congruent topologies with trees from the concatenated analyses and previous studies. Our results highlight that well-resolved gene trees are critical for the effectiveness of coalescent-based methods. Future efforts to improve the accuracy of phylogenomic analyses should emphasize the development of new methods that can accommodate multiple biological processes and tolerate missing data while remaining computationally tractable.
    (C) The Willi Hennig Society 2017.
    National Natural Science Foundation of China [NNSF 31500179, 31590822, 31270268]; National Basic Research Program of China [2014CB954101]; National Science Foundation [DEB0743474]; Smithsonian Scholarly Studies Grant Program and the Endowment Grant Program; CAS/SAFEA International Partnership Program for Creative Research Teams; Laboratory of Analytical Biology of the National Museum of Natural History, Smithsonian Institution; Science and Technology Basic Work [2013FY112100].

    Comparing De Novo Genome Assembly: The Long and Short of It

    Recent advances in DNA sequencing technology and their focal role in Genome Wide Association Studies (GWAS) have rekindled a growing interest in the whole-genome sequence assembly (WGSA) problem, thereby, inundating the field with a plethora of new formalizations, algorithms, heuristics and implementations. And yet, scant attention has been paid to comparative assessments of these assemblers' quality and accuracy. No commonly accepted and standardized method for comparison exists yet. Even worse, widely used metrics to compare the assembled sequences emphasize only size, poorly capturing the contig quality and accuracy. This paper addresses these concerns: it highlights common anomalies in assembly accuracy through a rigorous study of several assemblers, compared under both standard metrics (N50, coverage, contig sizes, etc.) as well as a more comprehensive metric (Feature-Response Curves, FRC) that is introduced here; FRC transparently captures the trade-offs between contigs' quality against their sizes. For this purpose, most of the publicly available major sequence assemblers – both for low-coverage long (Sanger) and high-coverage short (Illumina) reads technologies – are compared. These assemblers are applied to microbial (Escherichia coli, Brucella, Wolbachia, Staphylococcus, Helicobacter) and partial human genome sequences (Chr. Y), using sequence reads of various read-lengths, coverages, accuracies, and with and without mate-pairs. It is hoped that, based on these evaluations, computational biologists will identify innovative sequence assembly paradigms, bioinformaticists will determine promising approaches for developing “next-generation” assemblers, and biotechnologists will formulate more meaningful design desiderata for sequencing technology platforms. A new software tool for computing the FRC metric has been developed and is available through the AMOS open-source consortium
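    To make concrete the abstract's point that standard metrics emphasize only size, here is a minimal sketch of how N50 is commonly computed. The function name and example values are illustrative assumptions, not taken from the paper or from the AMOS tools; the Feature-Response Curve metric is not reproduced here.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together cover
    at least half of the total assembled bases.
    (Illustrative helper; the name is an assumption for this sketch.)
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

    Note that the calculation takes only contig lengths as input, so a misassembled contig that wrongly joins two regions raises N50 just as a correct long contig would. That is consistent with the abstract's concern that size-based metrics capture contig quality and accuracy poorly.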

    Gene Expression Analysis Methods on Microarray Data: A Review

    In recent years, a new type of experiment has been changing the way that biologists and other specialists analyze many problems. These are called high-throughput experiments, and the main difference from those performed some years ago lies in the quantity of data obtained from them. Thanks to the technology known generically as microarrays, it is now possible to study in a single experiment the behavior of all the genes of an organism under different conditions. The data generated by these experiments may comprise thousands to millions of variables, and they pose many challenges to the scientists who have to analyze them. Many of these challenges are statistical in nature and are the focus of this review. Many types of microarrays have been developed to answer different biological questions, and some of them will be explained later. For the sake of simplicity, we start with the best-known ones: expression microarrays.

    Data structures and algorithms for analysis of alternative splicing with RNA-Seq data
