133 research outputs found

    Pybedtools: a flexible Python library for manipulating genomic datasets and annotations

    Summary: pybedtools is a flexible Python software library for manipulating and exploring genomic datasets in many common formats. It provides an intuitive Python interface that extends upon the popular BEDTools genome arithmetic tools. The library is well documented and efficient, and allows researchers to quickly develop simple, yet powerful scripts that enable complex genomic analyses.

    Dose–Sensitivity, Conserved Non-Coding Sequences, and Duplicate Gene Retention through Multiple Tetraploidies in the Grasses

    Whole genome duplications, or tetraploidies, are an important source of increased gene content. Following whole genome duplication, duplicate copies of many genes are lost from the genome. This loss of genes is biased both in the classes of genes deleted and the subgenome from which they are lost. Many or all classes of genes preferentially retained as duplicate copies are engaged in dose sensitive protein–protein interactions, such that deletion of any one duplicate upsets the status quo of subunit concentrations, and presumably lowers fitness as a result. Transcription factors are also preferentially retained following every whole genome duplication studied. This has been explained as a consequence of protein–protein interactions, just as for other highly retained classes of genes. We show that the quantity of conserved noncoding sequences (CNSs) associated with genes predicts the likelihood of their retention as duplicate pairs following whole genome duplication. As many CNSs likely represent binding sites for transcriptional regulators, we propose that the likelihood of gene retention following tetraploidy may also be influenced by dose–sensitive protein–DNA interactions between the regulatory regions of CNS-rich genes (nicknamed bigfoot genes) and the proteins that bind to them. Using grass genomes, we show that differential loss of CNSs from one member of a pair following the pre-grass tetraploidy reduces its chance of retention in the subsequent maize lineage tetraploidy.

    Automated Conserved Non-Coding Sequence (CNS) Discovery Reveals Differences in Gene Content and Promoter Evolution among Grasses

    Conserved non-coding sequences (CNS) are islands of non-coding sequence that, like protein coding exons, show less divergence in sequence between related species than functionless DNA. Several CNSs have been demonstrated experimentally to function as cis-regulatory regions. However, the specific functions of most CNSs remain unknown. Previous searches for CNS in plants have either anchored on exons and only identified nearby sequences or required years of painstaking manual annotation. Here we present an open source tool that can accurately identify CNSs between any two related species with sequenced genomes, including both those immediately adjacent to exons and distal sequences separated by >12 kb of non-coding sequence. We have used this tool to characterize new motifs, associate CNSs with additional functions, and identify previously undetected genes encoding RNA and protein in the genomes of five grass species. We provide a list of 15,363 orthologous CNSs conserved across all grasses tested. We were also able to identify regulatory sequences present in the common ancestor of grasses that have been lost in one or more extant grass lineages. Lists of orthologous gene pairs and associated CNSs are provided for reference inbred lines of Arabidopsis, Japonica rice, foxtail millet, sorghum, Brachypodium, and maize.

    Echtvar: compressed variant representation for rapid annotation and filtering of SNPs and indels

    Germline and somatic variants within an individual or cohort are interpreted with information from large cohorts. Annotation with this information becomes a computational bottleneck as population sets grow to terabytes of data. Here, we introduce echtvar, which efficiently encodes population variants and annotation fields into a compressed archive that can be used for rapid variant annotation and filtering. Most variants, represented by chromosome, position and alleles, are encoded into 32 bits, half the size of previous encoding schemes and at least 4 times smaller than a naive encoding. The annotations, stored separately within the same archive, are also encoded and compressed. We show that echtvar is faster and uses less space than existing tools and that it can effectively reduce the number of candidate variants. We give examples on germ-line and somatic variants to document how echtvar can facilitate exploratory data analysis on genetic variants. Echtvar is available at https://github.com/brentp/echtvar under an MIT license.

    Inhibition of angiogenesis and suppression of colorectal cancer metastatic to the liver using the Sleeping Beauty Transposon System

    Background: Metastatic colon cancer is one of the leading causes of cancer-related death worldwide, with disease progression and metastatic spread being closely associated with angiogenesis. We investigated whether an antiangiogenic gene transfer approach using the Sleeping Beauty (SB) transposon system could be used to inhibit growth of colorectal tumors metastatic to the liver. Results: Liver CT26 tumor-bearing mice were hydrodynamically injected with different doses of a plasmid containing a transposon encoding an angiostatin-endostatin fusion gene (Statin AE) along with varying amounts of SB transposase-encoding plasmid. Animals that were injected with a low dose (10 μg) of Statin AE transposon plasmid showed a significant decrease in tumor formation only when co-injected with SB transposase-encoding plasmid, while for animals injected with a higher dose (25 μg) of Statin AE transposon, co-injection of SB transposase-encoding plasmid did not significantly affect tumor load. For animals injected with 10 μg Statin AE transposon plasmid, the number of tumor nodules was inversely proportional to the amount of co-injected SB plasmid. Suppression of metastases was further evident in histological analyses, in which untreated animals showed higher levels of tumor cell proliferation and tumor vascularization than animals treated with low dose transposon plasmid. Conclusion: These results demonstrate that hepatic colorectal metastases can be reduced using antiangiogenic transposons, and provide evidence for the importance of the transposition process in mediating suppression of these tumors.

    Transposed Genes in Arabidopsis Are Often Associated with Flanking Repeats

    Much of the eukaryotic genome is known to be mobile, largely due to the movement of transposons and other parasitic elements. Recent work in plants and Drosophila suggests that mobility is also a feature of many nontransposon genes and gene families. Indeed, analysis of the Arabidopsis genome suggested that as many as half of all genes had moved to unlinked positions since Arabidopsis diverged from papaya roughly 72 million years ago, and that these mobile genes tend to fall into distinct gene families. However, the mechanism by which single gene transposition occurred was not deduced. By comparing two closely related species, Arabidopsis thaliana and Arabidopsis lyrata, we sought to determine the nature of gene transposition in Arabidopsis. We found that certain categories of genes are much more likely to have transposed than others, and that many of these transposed genes are flanked by direct repeat sequence that was homologous to sequence within the orthologous target site in A. lyrata and which was predominantly genic in identity. We suggest that intrachromosomal recombination between tandemly duplicated sequences, and subsequent insertion of the circular product, is the predominant mechanism of gene transposition.

    BioStar: An Online Question & Answer Resource for the Bioinformatics Community

    Parnell, Laurence D. et al. Although the era of big data has produced many bioinformatics tools and databases, using them effectively often requires specialized knowledge. Many groups lack bioinformatics expertise, and frequently find that software documentation is inadequate while local colleagues may be overburdened or unfamiliar with specific applications. Too often, such problems create data analysis bottlenecks that hinder the progress of biological research. In order to help address this deficiency, we present BioStar, a forum based on the Stack Exchange platform where experts and those seeking solutions to problems of computational biology exchange ideas. The main strengths of BioStar are its large and active group of knowledgeable users, rapid response times, clear organization of questions and responses that limit discussion to the topic at hand, and ranking of questions and answers that help identify their usefulness. These rankings, based on community votes, also contribute to a reputation score for each user, which serves to keep expert contributors engaged. The BioStar community has helped to answer over 2,300 questions from over 1,400 users (as of June 10, 2011), and has played a critical role in enabling and expediting many research projects. BioStar can be accessed at http://www.biostars.org/. This work was partially supported by NSF grants MCB-0618402 and CCF-0643529 (CAREER), NIH grants 1R55AI065507 – 01A2 and 1 R01 GM083113-01, NIH/NCRR grant number UL1RR033184, and FPI fellowship SAF-2007-63171/BES-2009-017731 from the Ministerio de Educación y Ciencia, Spain. These funders had no role in the design of BioStar, decision to publish, or preparation of the manuscript. Peer reviewed.

    Combating subclonal evolution of resistant cancer phenotypes

    Metastatic breast cancer remains challenging to treat, and most patients ultimately progress on therapy. This acquired drug resistance is largely due to drug-refractory sub-populations (subclones) within heterogeneous tumors. Here, we track the genetic and phenotypic subclonal evolution of four breast cancers through years of treatment to better understand how breast cancers become drug-resistant. Recurrently appearing post-chemotherapy mutations are rare. However, bulk and single-cell RNA sequencing reveal acquisition of malignant phenotypes after treatment, including enhanced mesenchymal and growth factor signaling, which may promote drug resistance, and decreased antigen presentation and TNF-α signaling, which may enable immune system avoidance. Some of these phenotypes pre-exist in pre-treatment subclones that become dominant after chemotherapy, indicating selection for resistance phenotypes. Post-chemotherapy cancer cells are effectively treated with drugs targeting acquired phenotypes. These findings highlight cancer's ability to evolve phenotypically and suggest a phenotype-targeted treatment strategy that adapts to cancer as it evolves.