63 research outputs found

    Using BioMart as a framework to manage and query pancreatic cancer data

    We describe the Pancreatic Expression Database (PED), the first cancer database originally designed based on the BioMart infrastructure. The PED portal brings together multidimensional pancreatic cancer data from the literature including genomic, proteomic, miRNA and gene expression profiles. Based on the BioMart 0.7 framework, the database is easily integrated with other BioMart-compliant resources, such as Ensembl and Reactome, to give access to a wide range of annotations alongside detailed experimental conditions. This article is intended to give an overview of PED, describe its data content and work through examples of how to successfully mine and integrate pancreatic cancer data sets and other BioMart resources.

    Pancreatic Expression database: a generic model for the organization, integration and mining of complex cancer datasets

    Background: Pancreatic cancer is the 5th leading cause of cancer death in both males and females. In recent years, a wealth of gene and protein expression studies have been published broadening our understanding of pancreatic cancer biology. Due to the explosive growth in publicly available data from multiple different sources it is becoming increasingly difficult for individual researchers to integrate these into their current research programmes. The Pancreatic Expression database, a generic web-based system, is aiming to close this gap by providing the research community with an open access tool, not only to mine currently available pancreatic cancer data sets but also to include their own data in the database.
    Description: Currently, the database holds 32 datasets comprising 7636 gene expression measurements extracted from 20 different published gene or protein expression studies from various pancreatic cancer types, pancreatic precursor lesions (PanINs) and chronic pancreatitis. The pancreatic data are stored in a data management system based on the BioMart technology alongside the human genome gene and protein annotations, sequence, homologue, SNP and antibody data. Interrogation of the database can be achieved through both a web-based query interface and through web services using combined criteria from pancreatic (disease stages, regulation, differential expression, expression, platform technology, publication) and/or public data (antibodies, genomic region, gene-related accessions, ontology, expression patterns, multi-species comparisons, protein data, SNPs). Thus, our database enables connections between otherwise disparate data sources and allows relatively simple navigation between all data types and annotations.
    Conclusion: The database structure and content provides a powerful and high-speed data-mining tool for cancer research. It can be used for target discovery i.e. of biomarkers from body fluids, identification and analysis of genes associated with the progression of cancer, cross-platform meta-analysis, SNP selection for pancreatic cancer association studies, cancer gene promoter analysis as well as mining cancer ontology information. The data model is generic and can be easily extended and applied to other types of cancer. The database is available online with no restrictions for the scientific community at http://www.pancreasexpression.org/.

    BioMart: a data federation framework for large collaborative projects

    BioMart is a freely available, open source, federated database system that provides unified access to disparate, geographically distributed data sources. It is designed to be data agnostic and platform independent, such that existing databases can easily be incorporated into the BioMart framework. BioMart allows databases hosted on different servers to be presented seamlessly to users, facilitating collaborative projects between different research groups. BioMart contains several levels of query optimization to efficiently manage large data sets and offers a diverse selection of graphical user interfaces and application programming interfaces to ensure that queries can be performed in whatever manner is most convenient for the user. The software has now been adopted by a large number of different biological databases spanning a wide range of data types and providing a rich source of annotation available to bioinformaticians and biologists alike.

    The BioMart community portal: an innovative alternative to large, centralized data repositories.

    The BioMart Community Portal (www.biomart.org) is a community-driven effort to provide a unified interface to biomedical databases that are distributed worldwide. The portal provides access to numerous database projects supported by 30 scientific organizations. It includes over 800 different biological datasets spanning genomics, proteomics, model organisms, cancer data, ontology information and more. All resources available through the portal are independently administered and funded by their host organizations. The BioMart data federation technology provides a unified interface to all the available data. The latest version of the portal comes with many new databases that have been created by our ever-growing community. It also comes with better support and extensibility for data analysis and visualization tools. A new addition to our toolbox, the enrichment analysis tool, is now accessible through graphical and web service interfaces. The BioMart community portal averages over one million requests per day. Building on this level of service and the wealth of information that has become available, the BioMart Community Portal has introduced a new, more scalable and cheaper alternative to the large data stores maintained by specialized organizations.

    Sequencing and comparative genomic analysis of 1227 Felis catus cDNA sequences enriched for developmental, clinical and nutritional phenotypes

    Background: The feline genome is valuable to the veterinary and model organism genomics communities because the cat is an obligate carnivore and a model for endangered felids. The initial public release of the Felis catus genome assembly provided a framework for investigating the genomic basis of feline biology. However, the entire set of protein coding genes has not been elucidated.
    Results: We identified and characterized 1227 protein coding feline sequences, of which 913 map to public sequences and 314 are novel. These sequences have been deposited into NCBI's GenBank database and complement public genomic resources by providing additional protein coding sequences that fill in some of the gaps in the feline genome assembly. Through functional and comparative genomic analyses, we gained an understanding of the role of these sequences in feline development, nutrition and health. Specifically, we identified 104 orthologs of human genes associated with Mendelian disorders. We detected negative selection within sequences with gene ontology annotations associated with intracellular trafficking, cytoskeleton and muscle functions. We detected relatively less negative selection on protein sequences encoding extracellular networks, apoptotic pathways and mitochondrial gene ontology annotations. Additionally, we characterized feline cDNA sequences that have mouse orthologs associated with clinical, nutritional and developmental phenotypes. Together, this analysis provides an overview of the value of our cDNA sequences and enhances our understanding of how the feline genome is similar to, and different from other mammalian genomes.
    Conclusions: The cDNA sequences reported here expand existing feline genomic resources by providing high-quality sequences annotated with comparative genomic information providing functional, clinical, nutritional and orthologous gene information.

    Bioinformatics analysis of mitochondrial disease

    PhD thesis. Several bioinformatic methods have been developed to aid the identification of novel nuclear-mitochondrial genes involved in disease. Previous research has aimed to increase the sensitivity and specificity of these predictions through a combination of available techniques. This investigation shows the optimum sensitivity and specificity can be achieved by carefully selecting seven specific classifiers in combination. The results also show that increasing the number of classifiers even further can paradoxically decrease the sensitivity and specificity of a prediction. Additionally, text mining applications are playing a huge role in disease candidate gene identification, providing resources for interpreting the vast quantities of biomedical literature currently available. A workflow resource was developed identifying a number of genes potentially associated with Leber's Hereditary Optic Neuropathy (LHON). This included specific orthologues in mouse displaying a potential association to LHON not annotated as such in humans. Mitochondrial DNA (mtDNA) fragments have been transferred to the human nuclear genome over evolutionary time. These insertions (NUMTs) were compared to an existing database of 263 mtDNA deletions to highlight any associated mechanisms governing DNA loss from mitochondria. Flanking regions surrounding these insertions within the nuclear genome were also screened for transposable elements, GC content and mitochondrial genes. No obvious association was found relating NUMTs to mtDNA deletions. NUMTs do not appear to be distributed throughout the genome via transposition and integrate predominantly in areas of low %GC with low gene content. These areas also lacked evidence of an elevated number of surrounding nuclear-mitochondrial genes, but a further genome-wide study is required.