1,171 research outputs found

    Virtual Mutation Analysis of Relational Database Schemas

    Get PDF
    Relational databases are a vital component of many modern soft- ware applications. Key to the definition of the database schema — which specifies what types of data will be stored in the database and the structure in which the data is to be organized — are integrity constraints. Integrity constraints are conditions that protect and preserve the consistency and validity of data in the database, preventing data values that violate their rules from being admitted into database tables. They encode logic about the application concerned, and like any other component of a software application, need to be properly tested. Mutation analysis is a technique that has been successfully applied to integrity constraint testing, seeding database schema faults of both omission and commission. Yet, as for traditional mutation analysis for program testing, it is costly to perform, since the test suite under analysis needs to be run against each individual mutant to establish whether or not it exposes the fault. One overhead incurred by database schema mutation is the cost of communicating with the database management system (DBMS). In this paper, we seek to eliminate this cost by performing mutation analysis virtually on a local model of the DBMS, rather than on an actual, running instance hosting a real database. We present an empirical evaluation of our virtual technique revealing that, across all of the studied DBMSs and schemas, the virtual method yields an average time saving of 51% over the baseline

    SchemaAnalyst: Search-Based Test Data Generation for Relational Database Schemas

    Get PDF
    Data stored in relational databases plays a vital role in many aspects of society. When this data is incorrect, the services that depend on it may be compromised. The database schema is the artefact responsible for maintaining the integrity of stored data. Because of its critical function, the proper testing of the database schema is a task of great importance. Employing a search-based approach to generate high-quality test data for database schemas, SchemaAnalyst is a tool that supports testing this key software component. This presented tool is extensible and includes both an evaluation framework for assessing the quality of the generated tests and full-featured documentation. In addition to describing the design and implementation of SchemaAnalyst and overviewing its efficiency and effectiveness, this paper coincides with the tool’s public release, thereby enhancing practitioners’ ability to test relational database schemas

    mrstudyr: Retrospectively Studying the Effectiveness of Mutant Reduction Techniques

    Get PDF
    Mutation testing is a well-known method for measuring a test suite’s quality. However, due to its computational expense and intrinsic difficulties (e.g., detecting equivalent mutants and potentially checking a mutant’s status for each test), mutation testing is often challenging to practically use. To control the computational cost of mutation testing, many reduction strategies have been proposed (e.g., uniform random sampling over mutants). Yet, a stand-alone tool to compare the efficiency and effectiveness of these methods is heretofore unavailable. Since existing mutation testing tools are often complex and languagedependent, this paper presents a tool, called mrstudyr, that enables the “retrospective” study of mutant reduction methods using the data collected from a prior analysis of all mutants. Focusing on the mutation operators and the mutants that they produce, the presented tool allows developers to prototype and evaluate mutant reducers without being burdened by the implementation details of mutation testing tools. Along with describing mrstudyr’s design and overviewing the experimental results from using it, this paper inaugurates the public release of this open-source tool

    Data access and integration in the ISPIDER proteomics grid

    Get PDF
    Grid computing has great potential for supporting the integration of complex, fast changing biological data repositories to enable distributed data analysis. One scenario where Grid computing has such potential is provided by proteomics resources which are rapidly being developed with the emergence of affordable, reliable methods to study the proteome. The protein identifications arising from these methods derive from multiple repositories which need to be integrated to enable uniform access to them. A number of technologies exist which enable these resources to be accessed in a Grid environment, but the independent development of these resources means that significant data integration challenges, such as heterogeneity and schema evolution, have to be met. This paper presents an architecture which supports the combined use of Grid data access (OGSA-DAI), Grid distributed querying (OGSA-DQP) and data integration (AutoMed) software tools to support distributed data analysis. We discuss the application of this architecture for the integration of several autonomous proteomics data resources

    Automatic detection and removal of ineffective mutants for the mutation analysis of relational database schemas

    Get PDF
    Data is one of an organization’s most valuable and strategic assets. Testing the relational database schema, which protects the integrity of this data, is of paramount importance. Mutation analysis is a means of estimating the fault-finding “strength” of a test suite. As with program mutation, however, relational database schema mutation results in many “ineffective” mutants that both degrade test suite quality estimates and make mutation analysis more time consuming. This paper presents a taxonomy of ineffective mutants for relational database schemas, summarizing the root causes of ineffectiveness with a series of key patterns evident in database schemas. On the basis of these, we introduce algorithms that automatically detect and remove ineffective mutants. In an experimental study involving the mutation analysis of 34 schemas used with three popular relational database management systems—HyperSQL, PostgreSQL, and SQLite—the results show that our algorithms can identify and discard large numbers of ineffective mutants that can account for up to 24% of mutants, leading to a change in mutation score for 33 out of 34 schemas. The tests for seven schemas were found to achieve 100% scores, indicating that they were capable of detecting and killing all non-equivalent mutants. The results also reveal that the execution cost of mutation analysis may be significantly reduced, especially with “heavyweight” DBMSs like PostgreSQL

    Heterogeneous biomedical database integration using a hybrid strategy: a p53 cancer research database.

    Get PDF
    Complex problems in life science research give rise to multidisciplinary collaboration, and hence, to the need for heterogeneous database integration. The tumor suppressor p53 is mutated in close to 50% of human cancers, and a small drug-like molecule with the ability to restore native function to cancerous p53 mutants is a long-held medical goal of cancer treatment. The Cancer Research DataBase (CRDB) was designed in support of a project to find such small molecules. As a cancer informatics project, the CRDB involved small molecule data, computational docking results, functional assays, and protein structure data. As an example of the hybrid strategy for data integration, it combined the mediation and data warehousing approaches. This paper uses the CRDB to illustrate the hybrid strategy as a viable approach to heterogeneous data integration in biomedicine, and provides a design method for those considering similar systems. More efficient data sharing implies increased productivity, and, hopefully, improved chances of success in cancer research. (Code and database schemas are freely downloadable, http://www.igb.uci.edu/research/research.html.)

    XML-based approaches for the integration of heterogeneous bio-molecular data

    Get PDF
    Background: The today's public database infrastructure spans a very large collection of heterogeneous biological data, opening new opportunities for molecular biology, bio-medical and bioinformatics research, but raising also new problems for their integration and computational processing. Results: In this paper we survey the most interesting and novel approaches for the representation, integration and management of different kinds of biological data by exploiting XML and the related recommendations and approaches. Moreover, we present new and interesting cutting edge approaches for the appropriate management of heterogeneous biological data represented through XML. Conclusion: XML has succeeded in the integration of heterogeneous biomolecular information, and has established itself as the syntactic glue for biological data sources. Nevertheless, a large variety of XML-based data formats have been proposed, thus resulting in a difficult effective integration of bioinformatics data schemes. The adoption of a few semantic-rich standard formats is urgent to achieve a seamless integration of the current biological resources. </p

    Mutation Analysis of Relational Database Schemas

    Get PDF
    The schema is the key artefact used to describe the structure of a relational database, specifying how data will be stored and the integrity constraints used to ensure it is valid. It is therefore surprising that to date little work has addressed the problem of schema testing, which aims to identify mistakes in the schema early in software development. Failure to do so may lead to critical faults, which may cause data loss or degradation of data quality, remaining undetected until later when they will prove much more costly to fix. This thesis explores how mutation analysis – a technique commonly used in software testing to evaluate test suite quality – can be applied to evaluate data generated to exercise the integrity constraints of a relational database schema. By injecting faults into the constraints, modelling both faults of omission and commission, this enables the fault-finding capability of test suites generated by different techniques to be compared. This is essential to empirically evaluate further schema testing research, providing a means of assessing the effectiveness of proposed techniques. To mutate the integrity constraints of a schema, a collection of novel mutation operators are proposed and implementation described. These allow an empirical evaluation of an existing data generation approach, demonstrating the effectiveness of the mutation analysis technique and identifying a configuration that killed 94% of mutants on average. Cost-effective algorithms for automatically removing equivalent mutants and other ineffective mutants are then proposed and evaluated, revealing a third of mutation scores to be mutation adequate and reducing time taken by an average of 7%. Finally, the execution cost problem is confronted, with a range of optimisation strategies being applied that consistently improve efficiency, reducing the time taken by several hours in the best case and as high as 99% on average for one DBMS

    A cooperative framework for molecular biology database integration using image object selection

    Get PDF
    The theme and the concept of 'Molecular Biology Database Integration' and the problems associated with this concept initiated the idea for this Ph.D research. The available technologies facilitate to analyse the data independently and discretely but it fails to integrate the data resources for more meaningful information. This along with the integration issues created the scope for this Ph.D research. The research has reviewed the 'database interoperability' problems and it has suggested a framework for integrating the molecular biology databases. The framework has proposed to develop a cooperative environment to share information on the basis of common purpose for the molecular biology databases. The research has also reviewed other implementation and interoperability issues for laboratory based, dedicated and target specific database. The research has addressed the following issues: diversity of molecular biology databases schemas, schema constructs and schema implementation multi-database query using image object keying, database integration technologies using context graph, automated navigation among these databases. This thesis has introduced a new approach for database implementation. It has introduced an interoperable component database concept to initiate multidatabase query on gene mutation data. A number of data models have been proposed for gene mutation data which is the basis for integrating the target specific component database to be integrated with the federated information system. The proposed data models are: data models for genetic trait analysis, classification of gene mutation data, pathological lesion data and laboratory data. The main feature of this component database is non-overlapping attributes and it will follow non-redundant integration approach as explained in the thesis. This will be achieved by storing attributes which will not have the union or intersection of any attributes that exist in public domain molecular biology databases. Unlike data warehousing technique, this feature is quite unique and novel. The component database will be integrated with other biological data sources for sharing information in a cooperative environment. This involves developing new tools. The thesis explains the role of these new tools which are: meta data extractor, mapping linker, query generator and result interpreter. These tools are used for a transparent integration without creating any global schema of the participating databases. The thesis has also established the concept of image object keying for multidatabase query and it has proposed a relevant algorithm for matching protein spot in gel electrophoresis image. An object spot in gel electrophoresis image will initiate the query when it is selected by the user. It matches the selected spot with other similar spots in other resource databases. This image object keying method is an alternative to conventional multidatabase query which requires writing complex SQL scripts. This method also resolve the semantic conflicts that exist among molecular biology databases. The research has proposed a new framework based on the context of the web data for interactions with different biological data resources. A formal description of the resource context is described in the thesis. The implementation of the context into Resource Document Framework (RDF) will be able to increase the interoperability by providing the description of the resources and the navigation plan for accessing the web based databases. A higher level construct is developed (has, provide and access) to implement the context into RDF for web interactions. The interactions within the resources are achieved by utilising an integration domain to extract the required information with a single instance and without writing any query scripts. The integration domain allows to navigate and to execute the query plan within the resource databases. An extractor module collects elements from different target webs and unify them as a whole object in a single page. The proposed framework is tested to find specific information e.g., information on Alzheimer's disease, from public domain biology resources, such as, Protein Data Bank, Genome Data Bank, Online Mendalian Inheritance in Man and local database. Finally, the thesis proposes further propositions and plans for future work
    • …
    corecore