24 research outputs found

    BioMart – biological queries made easy

    Background: Biologists need to perform complex queries, often across a variety of databases. Typically, each data resource provides an advanced query interface, each of which must be learnt by the biologist before they can begin to query them. Frequently, more than one data source is required and for high-throughput analysis, cutting and pasting results between websites is certainly very time consuming. Therefore, many groups rely on local bioinformatics support to process queries by accessing the resource's programmatic interfaces if they exist. This is not an efficient solution in terms of cost and time. Instead, it would be better if the biologist only had to learn one generic interface. BioMart provides such a solution. Results: BioMart enables scientists to perform advanced querying of biological data sources through a single web interface. The power of the system comes from integrated querying of data sources regardless of their geographical locations. Once these queries have been defined, they may be automated with its "scripting at the click of a button" functionality. BioMart's capabilities are extended by integration with several widely used software packages such as BioConductor, DAS, Galaxy, Cytoscape, Taverna. In this paper, we describe all aspects of BioMart from a user's perspective and demonstrate how it can be used to solve real biological use cases such as SNP selection for candidate gene screening or annotation of microarray results. Conclusion: BioMart is an easy to use, generic and scalable system and therefore, has become an integral part of large data resources including Ensembl, UniProt, HapMap, Wormbase, Gramene, Dictybase, PRIDE, MSD and Reactome. BioMart is freely accessible to use at http://www.biomart.org.

    A Genome-Wide Analysis of Open Chromatin in Human Epididymis Epithelial Cells Reveals Candidate Regulatory Elements for Genes Coordinating Epididymal Function

    The epithelium lining the epididymis has a pivotal role in ensuring a luminal environment that can support normal sperm maturation. Many of the individual genes that encode proteins involved in establishing the epididymal luminal fluid are well characterized. They include ion channels, ion exchangers, transporters, and solute carriers. However, the molecular mechanisms that coordinate expression of these genes and modulate their activities in response to biological stimuli are less well understood. To identify cis-regulatory elements for genes expressed in human epididymis epithelial cells, we generated genome-wide maps of open chromatin by DNase-seq. This analysis identified 33 542 epididymis-selective DNase I hypersensitive sites (DHS), which were not evident in five cell types of different lineages. Identification of genes with epididymis-selective DHS at their promoters revealed gene pathways that are active in immature epididymis epithelial cells. These include processes correlating with epithelial function and also others with specific roles in the epididymis, including retinol metabolism and ascorbate and aldarate metabolism. Peaks of epididymis-selective chromatin were seen in the androgen receptor gene and the cystic fibrosis transmembrane conductance regulator (CFTR) gene, which has a critical role in regulating ion transport across the epididymis epithelium. In silico prediction of transcription factor binding sites that were overrepresented in epididymis-selective DHS identified epithelial transcription factors, including ELF5 and ELF3, the androgen receptor, Pax2, and Sox9, as components of epididymis transcriptional networks. Active genes, which are targets of each transcription factor, reveal important biological processes in the epididymis epithelium

    A genome-wide analysis of open chromatin in human tracheal epithelial cells reveals novel candidate regulatory elements for lung function

    Distal cell-type-specific regulatory elements may be located at very large distances from the genes that they control and are often hidden within intergenic regions or in introns of other genes. The development of methods that enable mapping of regions of open chromatin genome wide has greatly advanced the identification and characterisation of these elements

    Extensive Evolutionary Changes in Regulatory Element Activity during Human Origins Are Associated with Altered Gene Expression and Positive Selection

    Understanding the molecular basis for phenotypic differences between humans and other primates remains an outstanding challenge. Mutations in non-coding regulatory DNA that alter gene expression have been hypothesized as a key driver of these phenotypic differences. This has been supported by differential gene expression analyses in general, but not by the identification of specific regulatory elements responsible for changes in transcription and phenotype. To identify the genetic source of regulatory differences, we mapped DNaseI hypersensitive (DHS) sites, which mark all types of active gene regulatory elements, genome-wide in the same cell type isolated from human, chimpanzee, and macaque. Most DHS sites were conserved among all three species, as expected based on their central role in regulating transcription. However, we found evidence that several hundred DHS sites were gained or lost on the lineages leading to modern human and chimpanzee. Species-specific DHS site gains are enriched near differentially expressed genes, are positively correlated with increased transcription, show evidence of branch-specific positive selection, and overlap with active chromatin marks. Species-specific sequence differences in transcription factor motifs found within these DHS sites are linked with species-specific changes in chromatin accessibility. Together, these indicate that the regulatory elements identified here are genetic contributors to transcriptional and phenotypic differences among primate species

    The accessible chromatin landscape of the human genome

    DNaseI hypersensitive sites (DHSs) are markers of regulatory DNA and have underpinned the discovery of all classes of cis-regulatory elements including enhancers, promoters, insulators, silencers, and locus control regions. Here we present the first extensive map of human DHSs identified through genome-wide profiling in 125 diverse cell and tissue types. We identify ~2.9 million DHSs that encompass virtually all known experimentally-validated cis-regulatory sequences and expose a vast trove of novel elements, most with highly cell-selective regulation. Annotating these elements using ENCODE data reveals novel relationships between chromatin accessibility, transcription, DNA methylation, and regulatory factor occupancy patterns. We connect ~580,000 distal DHSs with their target promoters, revealing systematic pairing of different classes of distal DHSs and specific promoter types. Patterning of chromatin accessibility at many regulatory regions is choreographed with dozens to hundreds of co-activated elements, and the trans-cellular DNaseI sensitivity pattern at a given region can predict cell type-specific functional behaviors. The DHS landscape shows signatures of recent functional evolutionary constraint. However, the DHS compartment in pluripotent and immortalized cells exhibits higher mutation rates than that in highly differentiated cells, exposing an unexpected link between chromatin accessibility, proliferative potential and patterns of human variation

    Finishing the euchromatic sequence of the human genome

    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

    EnsMart: A Generic System for Fast and Flexible Access to Biological Data

    The EnsMart system (www.ensembl.org/EnsMart) provides a generic data warehousing solution for fast and flexible querying of large biological data sets and integration with third-party data and tools. The system consists of a query-optimized database and interactive, user-friendly interfaces. EnsMart has been applied to Ensembl, where it extends its genomic browser capabilities, facilitating rapid retrieval of customized data sets. A wide variety of complex queries, on various types of annotations, for numerous species are supported. These can be applied to many research problems, ranging from SNP selection for candidate gene screening, through cross-species evolutionary comparisons, to microarray annotation. Users can group and refine biological data according to many criteria, including cross-species analyses, disease links, sequence variations, and expression patterns. Both tabulated list data and biological sequence output can be generated dynamically, in HTML, text, Microsoft Excel, and compressed formats. A wide range of sequence types, such as cDNA, peptides, coding regions, UTRs, and exons, with additional upstream and downstream regions, can be retrieved. The EnsMart database can be accessed via a public Web site, or through a Java application suite. Both implementations and the database are freely available for local installation, and can be extended or adapted to 'non-Ensembl' data sets.