    New tools and methods for direct programmatic access to the dbSNP relational database

    Genome-wide association studies often incorporate information from public biological databases in order to provide a biological reference for interpreting the results. The dbSNP database is an extensive source of information on single nucleotide polymorphisms (SNPs) for many different organisms, including humans. We have developed free software that will download and install a local MySQL implementation of the dbSNP relational database for a specified organism. We have also designed a system for classifying dbSNP tables in terms of common tasks we wish to accomplish using the database. For each task we have designed a small set of custom tables that facilitate task-related queries and provide entity-relationship diagrams for each task composed from the relevant dbSNP tables. In order to expose these concepts and methods to a wider audience we have developed web tools for querying the database and browsing documentation on the tables and columns to clarify the relevant relational structure. All web tools and software are freely available to the public at http://cgsmd.isi.edu/dbsnpq. Resources such as these for programmatically querying biological databases are essential for viably integrating biological information into genetic association experiments on a genome-wide scale

    Automated Querying of Genome Databases

    In silico characterization of structural and functional impact of the deleterious SNPs on FSHR gene

    FSHR is an important gene that plays a major role in the development of secondary sex characteristics and influences the female reproductive cycle by regulating the Follicle Stimulating Hormone. Though this gene and its protein are extensively studied, no attempts have yet been made to methodically analyze the variants in this gene. One of the chief objectives in the analysis of human genetic variation is to distinguish the Single Nucleotide Polymorphisms (SNPs) that are functionally neutral from those that contribute to the disorder. To predict the possible impact of SNPs on FSHR structure and function, data were obtained from NCBI (dbSNP and dbVar) and validated manually. Various bioinformatics tools were used to predict alterations at the transcriptional and post-transcriptional stages and in protein interaction. Around 38 variants reported by NCBI Variation Viewer were sorted by SIFT, and 14 of them were reported as damaging; 13 were reported to be either benign or damaging by PROVEAN and PANTHER. From these 13 SNPs, the most damaging (11 SNPs) were modeled using PyMOL, and the energy difference between the native and mutated structures was calculated with Swiss-PdbViewer. Based on our analysis, we have reported potential candidate SNPs for the FSHR gene involved in the regulation of ovarian pathophysiology.

    GWASdb: a database for human genetic variants identified by genome-wide association studies

    Recent advances in genome-wide association studies (GWAS) have enabled us to identify thousands of genetic variants (GVs) that are associated with human diseases. As next-generation sequencing technologies become less expensive, more GVs will be discovered in the near future. Existing databases, such as NHGRI GWAS Catalog, collect GVs with only genome-wide level significance. However, many true disease susceptibility loci have relatively moderate P values and are not included in these databases. We have developed GWASdb that contains 20 times more data than the GWAS Catalog and includes less significant GVs (P < 1.0 × 10^-3) manually curated from the literature. In addition, GWASdb provides comprehensive functional annotations for each GV, including genomic mapping information, regulatory effects (transcription factor binding sites, microRNA target sites and splicing sites), amino acid substitutions, evolution, gene expression and disease associations. Furthermore, GWASdb classifies these GVs according to diseases using Disease-Ontology Lite and Human Phenotype Ontology. It can conduct pathway enrichment and PPI network association analysis for these diseases. GWASdb provides an intuitive, multifunctional database for biologists and clinicians to explore GVs and their functional inferences. It is freely available at http://jjwanglab.org/gwasdb and will be updated frequently.
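The difference between a genome-wide cutoff and the more permissive cutoff quoted in the abstract can be illustrated with a minimal Python sketch. It is not GWASdb code: the `Variant` record, the `select` helper, and the sample identifiers and P values are hypothetical, and the 5e-8 figure is the conventional genome-wide threshold, assumed here because the abstract does not state one.

```python
from dataclasses import dataclass

# Conventional genome-wide significance level (assumption; not given in the abstract).
GENOME_WIDE = 5e-8
# Permissive cutoff quoted in the abstract (P < 1.0 x 10^-3).
MODERATE = 1.0e-3


@dataclass(frozen=True)
class Variant:
    rsid: str        # hypothetical identifier, for illustration only
    p_value: float   # association P value reported for the variant


def select(variants, threshold):
    """Return the variants whose P value is strictly below the threshold."""
    return [v for v in variants if v.p_value < threshold]


# Made-up records, used only to show the effect of the two cutoffs.
variants = [Variant("rsA", 3e-9), Variant("rsB", 2e-5), Variant("rsC", 4e-2)]

strict = select(variants, GENOME_WIDE)   # only the genome-wide hit
relaxed = select(variants, MODERATE)     # also keeps the moderate signal
```

With these made-up values, the strict cutoff keeps one variant and the relaxed cutoff keeps two, which is the kind of moderate signal the abstract says a genome-wide-only catalogue would omit.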

    An Application for Downloading and Integrating Molecular Biology Data

    Submitted to the faculty of the University Graduate School in partial fulfillment of the requirements for the degree Master of Sciences in the School of Informatics, Indiana University, July 2004.
    Integrating large volumes of data from diverse sources is a formidable challenge for many investigators in the field of molecular biology. Developing efficient methods for accessing and integrating this data is a major focus of investigation in the field of bioinformatics. In early 2003, the Hereditary Genomics division of the department of Medical and Molecular Genetics at IUPUI recognized the need for a software application that would automate many of the manual processes that were being used to obtain data for their research. The two primary objectives for this project were: 1) an application that would provide large-scale, integrated output tables to help answer questions that frequently arose in the course of their research, and 2) a graphic user interface (GUI) that would minimize or eliminate the need for technical expertise in computer programming or database operations on the part of the end-users. In early 2003, Indiana University (IU), IBM, and the Indiana Genomics Initiative (INGEN) introduced a new resource called Centralized Life Sciences Data Services (CLSD). CLSD is a centralized data repository that provides programmatic access to biological data that is collected and integrated from multiple public, online databases.
    METHODS
    1. An in-depth analysis was conducted to assess the department's data requirements and map these requirements to the data available at CLSD.
    2. CLSD incorporated new data as necessary.
    3. SQL was written to generate tables that would replace the targeted manual processes.
    4. A DB2 client was installed in Medical and Molecular Genetics to establish remote access to CLSD.
    5. A graphic user interface (GUI) was designed and implemented in HTML/CGI.
    6. A PERL program was written to accept parameters from the web input form, submit queries to CLSD, and generate HTML-based output tables.
    7. Validation, updates, and maintenance procedures were conducted after early prototype implementation.
    RESULTS AND CONCLUSIONS
    This application resulted in a substantial increase in efficiency over the manual methods that were previously used for data collection. The application also allows research teams to update their data much more frequently. A high level of accuracy in the output tables was confirmed by a thorough validation process.

    SPOT: a web-based tool for using biological databases to prioritize SNPs after a genome-wide association study

    SPOT (http://spot.cgsmd.isi.edu), the SNP prioritization online tool, is a web site for integrating biological databases into the prioritization of single nucleotide polymorphisms (SNPs) for further study after a genome-wide association study (GWAS). Typically, the next step after a GWAS is to genotype the top signals in an independent replication sample. Investigators will often incorporate information from biological databases so that biologically relevant SNPs, such as those in genes related to the phenotype or those with potentially non-neutral effects on gene expression (for example, at splice sites), are given higher priority. We recently introduced the genomic information network (GIN) method for systematically implementing this kind of strategy. The SPOT web site allows users to upload a list of SNPs and GWAS P-values and returns a prioritized list of SNPs using the GIN method. Users can specify candidate genes or genomic regions with custom levels of prioritization. The results can be downloaded or viewed in the browser, where users can interactively explore the details of each SNP, including graphical representations of the GIN method. For investigators interested in incorporating biological databases into a post-GWAS SNP selection strategy, the SPOT web tool is an easily implemented and flexible solution.

    Ensembl Genomes: Extending Ensembl across the taxonomic space

    Ensembl Genomes (http://www.ensemblgenomes.org) is a new portal offering integrated access to genome-scale data from non-vertebrate species of scientific interest, developed using the Ensembl genome annotation and visualisation platform. Ensembl Genomes consists of five sub-portals (for bacteria, protists, fungi, plants and invertebrate metazoa) designed to complement the availability of vertebrate genomes in Ensembl. Many of the databases supporting the portal have been built in close collaboration with the scientific community, which we consider essential for maintaining the accuracy and usefulness of the resource. A common set of user interfaces (which include a graphical genome browser, FTP, BLAST search, a query-optimised data warehouse, programmatic access, and a Perl API) is provided for all domains. Data types incorporated include annotation of (protein and non-protein coding) genes, cross references to external resources, and high-throughput experimental data (e.g. data from large-scale studies of gene expression and polymorphism visualised in their genomic context). Additionally, extensive comparative analysis has been performed, both within defined clades and across the wider taxonomy, and the resulting sequence alignments and gene trees can be accessed through the site.

    Bioinformatic Investigations Into the Genetic Architecture of Renal Disorders

    Modern genomic analysis has a significant bioinformatic component due to the high volume of complex data that is involved. During investigations into the genetic components of two renal diseases, we developed two software tools.
    Genome-Wide Association Studies (GWAS) datasets may be genotyped on different microarrays and subject to different annotation, leading to a mosaic case-control cohort that has inherent errors, primarily due to strand mismatching. Our software REMEDY seeks to detect and correct strand designation of input datasets, as well as filtering for common sources of noise such as structural and multi-allelic variants. We performed a GWAS on a large cohort of Steroid-sensitive nephrotic syndrome samples; the mosaic input datasets were pre-processed with REMEDY prior to merging and analysis. Our results show that REMEDY significantly reduced noise in GWAS output results. REMEDY outperforms existing software as it has significantly more features available such as auto-strand designation detection, comprehensive variant filtering and high-speed variant matching to dbSNP.
    The second tool supported the analysis of a newly characterised rare renal disorder: Polycystic kidney disease with hyperinsulinemic hypoglycemia (HIPKD). Identification of the underlying genetic cause led to the hypothesis that a change in chromatin looping at a specific locus affected the aetiology of the disease. We developed LOOPER, a software suite capable of predicting chromatin loops from ChIP-Seq data to explore the possible conformations of chromatin architecture in the HIPKD genomic region. LOOPER predicted several interesting functional and structural loops that supported our hypothesis. We then extended LOOPER to visualise ChIA-PET and ChIP-Seq data as a force-directed graph to show experimental structural and functional chromatin interactions. Next, we re-analysed the HIPKD region with LOOPER to show experimentally validated chromatin interactions. We first confirmed our original predicted loops and subsequently discovered that the local genomic region has many more chromatin features than first thought.
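The strand-mismatch problem mentioned for merged GWAS datasets can be illustrated with a minimal Python sketch. This is a generic allele-complement check, not REMEDY's algorithm, which the abstract does not describe; the function names and category labels are hypothetical, and the reference pair is assumed to come from a trusted source such as dbSNP.

```python
# Watson-Crick complements used to express alleles on the opposite strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flip(alleles):
    """Return the allele pair as it would read on the opposite strand."""
    return tuple(COMPLEMENT[a] for a in alleles)


def classify_strand(observed, reference):
    """Compare an observed bi-allelic SNP with a reference allele pair.

    Returns one of (hypothetical labels):
      "unsupported" - not two distinct A/C/G/T alleles on each side
      "ambiguous"   - A/T or C/G SNP, whose complement is itself
      "match"       - same alleles, same strand
      "flip"        - same alleles once the observed pair is complemented
      "mismatch"    - alleles disagree on either strand
    """
    obs, ref = frozenset(observed), frozenset(reference)
    if len(obs) != 2 or len(ref) != 2 or not (obs | ref) <= COMPLEMENT.keys():
        return "unsupported"
    flipped = frozenset(flip(observed))
    if flipped == obs:
        # Palindromic SNPs cannot be resolved from alleles alone.
        return "ambiguous"
    if obs == ref:
        return "match"
    if flipped == ref:
        return "flip"
    return "mismatch"
```

For example, an observed pair `("T", "C")` checked against a reference pair `("A", "G")` is classified as `"flip"`, while A/T and C/G SNPs are reported as `"ambiguous"` because allele comparison alone cannot tell the strands apart.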