
    MAVID: Constrained ancestral alignment of multiple sequences

    We describe a new global multiple alignment program capable of aligning a large number of genomic regions. Our progressive alignment approach incorporates the following ideas: maximum-likelihood inference of ancestral sequences, automatic guide-tree construction, protein based anchoring of ab-initio gene predictions, and constraints derived from a global homology map of the sequences. We have implemented these ideas in the MAVID program, which is able to accurately align multiple genomic regions up to megabases long. MAVID is able to effectively align divergent sequences, as well as incomplete unfinished sequences. We demonstrate the capabilities of the program on the benchmark CFTR region which consists of 1.8Mb of human sequence and 20 orthologous regions in marsupials, birds, fish, and mammals. Finally, we describe two large MAVID alignments: an alignment of all the available HIV genomes and a multiple alignment of the entire human, mouse and rat genomes

    Progressive Mauve: Multiple alignment of genomes with gene flux and rearrangement

    Multiple genome alignment remains a challenging problem. Effects of recombination including rearrangement, segmental duplication, gain, and loss can create a mosaic pattern of homology even among closely related organisms. We describe a method to align two or more genomes that have undergone large-scale recombination, particularly genomes that have undergone substantial amounts of gene gain and loss (gene flux). The method utilizes a novel alignment objective score, referred to as a sum-of-pairs breakpoint score. We also apply a probabilistic alignment filtering method to remove erroneous alignments of unrelated sequences, which are commonly observed in other genome alignment methods. We describe new metrics for quantifying genome alignment accuracy which measure the quality of rearrangement breakpoint predictions and indel predictions. The progressive genome alignment algorithm demonstrates markedly improved accuracy over previous approaches in situations where genomes have undergone realistic amounts of genome rearrangement, gene gain, loss, and duplication. We apply the progressive genome alignment algorithm to a set of 23 completely sequenced genomes from the genera Escherichia, Shigella, and Salmonella. The 23 enterobacteria have an estimated 2.46Mbp of genomic content conserved among all taxa and total unique content of 15.2Mbp. We document substantial population-level variability among these organisms driven by homologous recombination, gene gain, and gene loss. Free, open-source software implementing the described genome alignment approach is available from http://gel.ahabs.wisc.edu/mauve. Comment: Revision dated June 19, 200

    Subtle changes in chromatin loop contact propensity are associated with differential gene regulation and expression.

    While genetic variation at chromatin loops is relevant for human disease, the relationships between contact propensity (the probability that loci at loops physically interact), genetics, and gene regulation are unclear. We quantitatively interrogate these relationships by comparing Hi-C and molecular phenotype data across cell types and haplotypes. While chromatin loops consistently form across different cell types, they have subtle quantitative differences in contact frequency that are associated with larger changes in gene expression and H3K27ac. For the vast majority of loci with quantitative differences in contact frequency across haplotypes, the changes in magnitude are smaller than those across cell types; however, the proportional relationships between contact propensity, gene expression, and H3K27ac are consistent. These findings suggest that subtle changes in contact propensity have a biologically meaningful role in gene regulation and could be a mechanism by which regulatory genetic variants in loop anchors mediate effects on expression

    Sensitive Long-Indel-Aware Alignment of Sequencing Reads

    The tremendous advances in high-throughput sequencing technologies have made population-scale sequencing as performed in the 1000 Genomes project and the Genome of the Netherlands project possible. Next-generation sequencing has allowed genome-wide discovery of variations beyond single-nucleotide polymorphisms (SNPs), in particular of structural variations (SVs) like deletions, insertions, duplications, translocations, inversions, and even more complex rearrangements. Here, we design a read aligner with special emphasis on the following properties: (1) high sensitivity, i.e. find all (reasonable) alignments; (2) ability to find (long) indels; (3) statistically sound alignment scores; and (4) runtime fast enough to be applied to whole genome data. We compare performance to BWA, bowtie2, and stampy and find that our method is especially advantageous on reads containing larger indels.

    Comparative analyses of CTCF and BORIS occupancies uncover two distinct classes of CTCF binding genomic regions.

    Background: CTCF and BORIS (CTCFL), two paralogous mammalian proteins sharing nearly identical DNA binding domains, are thought to function in a mutually exclusive manner in DNA binding and transcriptional regulation. Results: Here we show that these two proteins co-occupy a specific subset of regulatory elements consisting of clustered CTCF binding motifs (termed 2xCTSes). BORIS occupancy at 2xCTSes is largely invariant in BORIS-positive cancer cells, with the genomic pattern recapitulating the germline-specific BORIS binding to chromatin. In contrast to the single-motif CTCF target sites (1xCTSes), the 2xCTS elements are preferentially found at active promoters and enhancers, both in cancer and germ cells. 2xCTSes are also enriched in genomic regions that escape histone to protamine replacement in human and mouse sperm. Depletion of the BORIS gene leads to altered transcription of a large number of genes and the differentiation of K562 cells, while the ectopic expression of this CTCF paralog leads to specific changes in transcription in MCF7 cells. Conclusions: We discover two functionally and structurally different classes of CTCF binding regions, 2xCTSes and 1xCTSes, revealed by their predisposition to bind BORIS. We propose that 2xCTSes play key roles in the transcriptional program of cancer and germ cells.

    CTCF mediates chromatin looping via N-terminal domain-dependent cohesin retention

    The DNA-binding protein CCCTC-binding factor (CTCF) and the cohesin complex function together to shape chromatin architecture in mammalian cells, but the molecular details of this process remain unclear. Here, we demonstrate that a 79-aa region within the CTCF N terminus is essential for cohesin positioning at CTCF binding sites and chromatin loop formation. However, the N terminus of CTCF fused to artificial zinc fingers was not sufficient to redirect cohesin to non-CTCF binding sites, indicating a lack of an autonomously functioning domain in CTCF responsible for cohesin positioning. BORIS (CTCFL), a germline-specific paralog of CTCF, was unable to anchor cohesin to CTCF DNA binding sites. Furthermore, CTCF-BORIS chimeric constructs provided evidence that, besides the N terminus of CTCF, the first two CTCF zinc fingers, and likely the 3D geometry of CTCF-DNA complexes, are also involved in cohesin retention. Based on this knowledge, we were able to convert BORIS into CTCF with respect to cohesin positioning, thus providing additional molecular details of the ability of CTCF to retain cohesin. Taken together, our data provide insight into the process by which DNA-bound CTCF constrains cohesin movement to shape spatiotemporal genome organization

    Differences in transcription between free-living and CO2-activated third-stage larvae of Haemonchus contortus

    Background: The disease caused by Haemonchus contortus, a blood-feeding nematode of small ruminants, is of major economic importance worldwide. The infective third-stage larva (L3) of this gastric nematode is enclosed in a cuticle (sheath) and, once ingested with herbage by the host, undergoes an exsheathment process that marks the transition from the free-living (L3) to the parasitic (xL3) stage. This study explored changes in gene transcription associated with this transition and predicted, based on comparative analysis, functional roles for key transcripts in the metabolic pathways linked to larval development. Results: Totals of 101,305 (L3) and 105,553 (xL3) expressed sequence tags (ESTs) were determined using 454 sequencing technology, and then assembled and annotated; the most abundant transcripts encoded transthyretin-like, calcium-binding EF-hand, NAD(P)-binding and nucleotide-binding proteins as well as homologues of Ancylostoma-secreted proteins (ASPs). Using an in silico-subtractive analysis, 560 and 685 sequences were shown to be uniquely represented in the L3 and xL3 stages, respectively; the transcripts encoded ribosomal proteins, collagens and elongation factors (in L3), and mainly peptidases and other enzymes of amino acid catabolism (in xL3). Caenorhabditis elegans orthologues of transcripts that were uniquely transcribed in each of the L3 and xL3 stages were predicted to interact with a total of 535 other genes, all of which were involved in embryonic development. Conclusion: The present study indicated that some key transcriptional alterations taking place during the transition from the L3 to the xL3 stage of H. contortus involve genes predicted to be linked to the development of neuronal tissue (L3 and xL3), formation of the cuticle (L3) and digestion of host haemoglobin (xL3). Future efforts using next-generation sequencing and bioinformatic technologies should provide the efficiency and depth of coverage required for the determination of the complete transcriptomes of different developmental stages and/or tissues of H. contortus as well as the genome of this important parasitic nematode. Such advances should lead to a significantly improved understanding of the molecular biology of H. contortus and, from an applied perspective, to novel methods of intervention.