3,261 research outputs found

    XWeB: the XML Warehouse Benchmark

    Full text link
    With the emergence of XML as a standard for representing business data, new decision support applications are being developed. These XML data warehouses aim at supporting On-Line Analytical Processing (OLAP) operations that manipulate irregular XML data. To ensure feasibility of these new tools, important performance issues must be addressed. Performance is customarily assessed with the help of benchmarks. However, decision support benchmarks do not currently support XML features. In this paper, we introduce the XML Warehouse Benchmark (XWeB), which aims at filling this gap. XWeB derives from the relational decision support benchmark TPC-H. It is mainly composed of a test data warehouse that is based on a unified reference model for XML warehouses and that features XML-specific structures, and its associate XQuery decision support workload. XWeB's usage is illustrated by experiments on several XML database management systems

    Post-genomic structural analysis of single amino acid polymorphisms

    Get PDF
    Inherited genetic variation is critical in defining disease susceptibility. PDs, or pathogenic deviations, are mutations reported to be disease-causing, while SNPs, or single nucleotide polymorphisms, are understood to have a negligible effect on phenotype. With recent developments in biotechnology—most relevant being increased reliability and speed of sequencing—a wealth of information regarding SNPs and PDs has been acquired. Quite apart from the analytical challenge of analysing this information with a view to identifying novel therapies and targets for disease, the challenge of simply storing, mapping and processing these data is significant in itself. This thesis describes the development of a large-scale, automated pipeline that provides hypotheses as to what the structural effects of these genomic variations might be. This includes the development of nine new analyses. Eight of these new methods are structural, identifying mutations that disrupt various aspects of protein structure, including the interface, binding sites, folding mechanics and stability. The final new analysis is a novel method of identifying highly conserved residues from sequence. Here, the distribution of conservation scores from a multiple sequence alignment (MSA) is analysed to generate an MSA-specific threshold for high conservation. In order to construct MSAs for the sequence analysis, a novel method for identifying functionally equivalent proteins has been developed. Further, PDs and SNPs are characterised with respect to these structural analyses, and with respect to basic sequence and structural features. The findings support trends elsewhere in the literature: PDs are more often found in the core of proteins and at highly conserved sites; they most often affect the stability of protein structures; and they more often are between very different amino acids. In addition to the implications for disease therapies, these findings are informative in the more general context of protein structure

    Data DNA: The Next Generation of Statistical Metadata

    Get PDF
    Describes the components of a complete statistical metadata system and suggests ways to create and structure metadata for better access and understanding of data sets by diverse users

    Why and How to Benchmark XML Databases

    Get PDF
    Benchmarks belong to the very standard repertory of tools deployed in database development. Assessing the capabilities of a system, analyzing actual and potential bottlenecks, and, naturally, comparing the pros and cons of different systems architectures have become indispensable tasks as databases management systems grow in complexity and capacity. In the course of the development of XML databases the need for a benchmark framework has become more and more evident: a great many different ways to store XML data have been suggested in the past, each with its genuine advantages, disadvantages and consequences that propagate through the layers of a complex database system and need to be carefully considered. The different storage schemes render the query characteristics of the data variably different. However, no conclusive methodology for assessing these differences is available to date. In this paper, we outline desiderata for a benchmark for XML databases drawing from our own experience of developing an XML repository, involvement in the definition of the standard query language, and experience with standard benchmarks for relational databases

    Improving the resolution of interaction maps: A middleground between high-resolution complexes and genome-wide interactomes

    Get PDF
    Protein-protein interactions are ubiquitous in Biology and therefore central to understand living organisms. In recent years, large-scale studies have been undertaken to describe, at least partially, protein-protein interaction maps or interactomes for a number of relevant organisms including human. Although the analysis of interaction networks is proving useful, current interactomes provide a blurry and granular picture of the molecular machinery, i.e. unless the structure of the protein complex is known the molecular details of the interaction are missing and sometime is even not possible to know if the interaction between the proteins is direct, i.e. physical interaction or part of functional, not necessary, direct association. Unfortunately, the determination of the structure of protein complexes cannot keep pace with the discovery of new protein-protein interactions resulting in a large, and increasing, gap between the number of complexes that are thought to exist and the number for which 3D structures are available. The aim of the thesis was to tackle this problem by implementing computational approaches to derive structural models of protein complexes and thus reduce this existing gap. Over the course of the thesis, a novel modelling algorithm to predict the structure of protein complexes, V-D2OCK, was implemented. This new algorithm combines structure-based prediction of protein binding sites by means of a novel algorithm developed over the course of the thesis: VORFFIP and M-VORFFIP, data-driven docking and energy minimization. This algorithm was used to improve the coverage and structural content of the human interactome compiled from different sources of interactomic data to ensure the most comprehensive interactome. Finally, the human interactome and structural models were compiled in a database, V-D2OCK DB, that offers an easy and user-friendly access to the human interactome including a bespoken graphical molecular viewer to facilitate the analysis of the structural models of protein complexes. Furthermore, new organisms, in addition to human, were included providing a useful resource for the study of all known interactomes

    Why and How to Benchmark XML Databases

    Get PDF
    Benchmarks belong to the very standard repertory of tools deployed in database development. Assessing the capabilities of a system, analyzing actual and potential bottlenecks, and, naturally, comparing the pros and cons of different systems architectures have become indispensable tasks as databases management systems grow in complexity and capacity. In the course of the development of XML databases the need for a benchmark framework has become more and more evident: a great many different ways to store XML data have been suggested in the past, each with its genuine advantages, disadvantages and consequences that propagate through the layers of a complex database system and need to be carefully considered. The different storage schemes render the query characteristics of the data variably different. However, no conclusive methodology for assessing these differences is available to date. In this paper, we outline desiderata for a benchmark for XML databases drawing from our own experience of developing an XML repository, involvement in the definition of the standard query language, and experience with standard benchmarks for relational databases
    corecore