3,446 research outputs found

    A Comprehensive Survey on Database Management System Fuzzing: Techniques, Taxonomy and Experimental Comparison

    Database Management System (DBMS) fuzzing is an automated testing technique that detects errors and vulnerabilities in DBMSs by generating, mutating, and executing test cases. It not only reduces the time and cost of manual testing but also improves detection coverage, providing valuable assistance in developing commercial DBMSs. Existing fuzzing surveys mainly focus on general-purpose software; however, DBMSs differ from such software in internal structure, input/output, and test objectives, and therefore require specialized fuzzing strategies. This paper therefore focuses on DBMS fuzzing and provides a comprehensive review and comparison of the methods in this field. We first introduce the fundamental concepts. Then, we systematically define a general fuzzing procedure and decompose and categorize existing methods. Furthermore, we classify existing methods from the perspective of testing objectives, covering the various components of a DBMS. For representative works, we provide more detailed descriptions and analyze their strengths and limitations. To objectively evaluate the performance of each method, we present an open-source DBMS fuzzing toolkit, OpenDBFuzz. Based on this toolkit, we conduct a detailed experimental comparison of existing methods and finally discuss future research directions. Comment: 34 pages, 22 figures
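The generate-mutate-execute loop described in this abstract can be illustrated with a minimal sketch. The seed statements, mutation operator, and use of an in-memory SQLite instance below are assumptions for illustration only and are not taken from OpenDBFuzz or any surveyed tool:

```python
# Minimal sketch of a generate-mutate-execute DBMS fuzzing loop.
# Seeds, mutation tokens, and the SQLite target are illustrative assumptions.
import random
import sqlite3

SEEDS = [
    "SELECT id, name FROM t WHERE id < 10;",
    "INSERT INTO t VALUES (1, 'a');",
    "UPDATE t SET name = 'b' WHERE id = 1;",
]

TOKENS = ["NULL", "-1", "''", "(SELECT 1)", "id", "name", "OR 1=1"]

def mutate(sql: str) -> str:
    """Randomly replace one whitespace-separated token with another token."""
    parts = sql.split()
    if parts:
        parts[random.randrange(len(parts))] = random.choice(TOKENS)
    return " ".join(parts)

def fuzz(iterations: int = 1000) -> None:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT);")
    for _ in range(iterations):
        case = mutate(random.choice(SEEDS))
        try:
            conn.execute(case)
        except sqlite3.Error:
            pass  # expected: syntactically or semantically invalid input
        except Exception as exc:
            # Unexpected exception types are the interesting signal worth triaging.
            print(f"potential bug: {exc!r} for test case: {case}")

if __name__ == "__main__":
    fuzz()
```

A real DBMS fuzzer would additionally use coverage or other feedback to guide mutation and would target the DBMS under test rather than an in-memory SQLite database.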

    Automated Software Testing of Relational Database Schemas

    Relational databases are critical for many software systems, holding some of the most valuable data in an organisation. Data engineers build relational databases using schemas, which specify the structure of the data within a database and define integrity constraints. These constraints protect the data's consistency and coherency, leading industry experts to recommend testing them. Since manual schema testing is labour-intensive and error-prone, automated techniques enable the generation of test data. Although these generators are well established and effective, they use default values and often produce many long, similar tests, which reduces fault detection and increases regression testing time and testers' inspection effort. This raises the following questions: How effective is an optimised random generator at generating tests and detecting faults compared to prior methods? What factors make tests understandable for testers? How can tests be reduced while maintaining effectiveness? How effectively do testers inspect differently reduced tests? To answer these questions, the first contribution of this thesis is an empirical evaluation of a new optimised random generator against well-established methods. The second is a human study identifying the factors that make schema tests understandable. The third is an evaluation of a novel approach that reduces and merges tests against traditional reduction methods. The final contribution is a human study of testers' inspection effort with differently reduced tests. The results show that the optimised random method efficiently generates effective tests compared to well-established methods. Testers reported that many NULLs and negative numbers are confusing, and that they prefer simple repetition of unimportant values and readable strings. The reduction technique with merging is the most effective at minimising tests and produces efficient tests while maintaining fault-detection effectiveness, compared to traditional methods. The merged tests increased inspection efficiency with a slight decrease in accuracy compared to only-reduced tests. These techniques and investigations can therefore help practitioners adopt such generators in practice.
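As a rough illustration of what an automated schema test generator produces, the sketch below pairs INSERT statements that satisfy and violate the integrity constraints of a small hypothetical schema, using readable values in line with the understandability findings above. The schema, values, and tooling are assumptions, not the thesis's generator:

```python
# Illustrative sketch (not SchemaAnalyst or any specific tool): one test that
# satisfies and several that violate the constraints of a hypothetical schema,
# using readable values rather than opaque defaults.
import sqlite3

SCHEMA = """
CREATE TABLE person (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE,
    age   INTEGER CHECK (age >= 0)
);
"""

TESTS = [
    # (description, SQL, should the INSERT be accepted?)
    ("satisfies all constraints", "INSERT INTO person VALUES (1, 'alice@example.com', 30)", True),
    ("violates NOT NULL",         "INSERT INTO person VALUES (2, NULL, 30)",                False),
    ("violates UNIQUE",           "INSERT INTO person VALUES (3, 'alice@example.com', 30)", False),
    ("violates CHECK",            "INSERT INTO person VALUES (4, 'bob@example.com', -5)",   False),
]

def run() -> None:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for description, sql, expect_ok in TESTS:
        try:
            conn.execute(sql)
            accepted = True
        except sqlite3.IntegrityError:
            accepted = False
        status = "PASS" if accepted == expect_ok else "FAIL"
        print(f"{status}: {description}")

if __name__ == "__main__":
    run()
```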

    Investigation of SQL Clone on MVC-based Application

    The Model-View-Controller (MVC) design pattern is well suited to interactive systems and is adopted in desktop and web-based applications; many frameworks also follow the MVC pattern. Each layer of MVC has a different function, and the main function of the model layer is to query the database system, which is expressed in the SQL language. In software development, code duplication, or code cloning, is a serious problem because it affects the maintenance process. With respect to the model layer, existing clone detection approaches are not effective at detecting clones there, because the usual definition of a code clone is not suitable for SQL clones. SQL is a declarative language and differs from common programming languages such as C and Java, so the definition of a code clone must be adapted to the characteristics of SQL. In this research, we investigate the existence of SQL clones in MVC-based applications and define four types of SQL clone, which are confirmed to exist in the MVC-based application datasets used in this research.
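As a hedged illustration of why SQL clones need their own definition, the sketch below shows a literal-insensitive normaliser under which two model-layer queries that differ only in their constants would be reported as clones. The normalisation rules are assumptions for illustration and do not reproduce the four clone types defined in the paper:

```python
# Illustrative sketch only: a literal-insensitive normaliser that flags two
# model-layer queries as clones when they differ only in constants.
import re

def normalise(sql: str) -> str:
    sql = sql.strip().lower()
    sql = re.sub(r"'[^']*'", "'?'", sql)   # string literals -> placeholder
    sql = re.sub(r"\b\d+\b", "?", sql)     # numeric literals -> placeholder
    return re.sub(r"\s+", " ", sql)        # collapse whitespace

q1 = "SELECT name FROM users WHERE status = 'active' AND age > 18"
q2 = "SELECT name FROM users  WHERE status = 'blocked' AND age > 65"

# Both queries normalise to the same string, so a simple detector reports a clone.
print(normalise(q1) == normalise(q2))   # True
```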

    Detection of Lightweight Directory Access Protocol Query Injection Attacks in Web Applications

    The Lightweight Directory Access Protocol (LDAP) is a common protocol used in organizations for directory services. LDAP is popular because of its features, such as the hierarchical representation of data objects, its openness, and its reliance on TCP/IP, which is necessary for Internet access. However, with LDAP being used in a large number of web applications, different types of LDAP injection attacks are becoming common. The idea behind LDAP injection attacks is to take advantage of an application not validating inputs before using them as part of LDAP queries. An attacker can provide inputs that alter the intended structure of an LDAP query. LDAP injection attacks can lead to various types of security breaches, including (i) Login Bypass, (ii) Information Disclosure, (iii) Privilege Escalation, and (iv) Information Alteration. Despite many research efforts focused on traditional SQL injection attacks, most of the proposed techniques cannot be suitably applied to mitigating LDAP injection attacks because of syntactic and semantic differences between LDAP and SQL queries. Many deployed web applications remain vulnerable to LDAP injection attacks, and in particular there has been little attention to testing web applications for the presence of LDAP query injection vulnerabilities. The aim of this thesis is twofold. First, it studies the various types of LDAP injection attacks and vulnerabilities reported in the literature, critically examining and evaluating existing injection mitigation techniques using a set of open-source applications reported to be vulnerable to LDAP query injection attacks. Second, it proposes an approach to detect LDAP injection attacks by generating test cases when developing secure web applications. In particular, the thesis specifies signatures for detecting LDAP injection attack types using the Object Constraint Language (OCL) and evaluates the proposed approach using PHP web applications. We also measure the effectiveness of the generated test cases using a metric named Mutation Score.
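The injection mechanism itself is easy to illustrate. The sketch below is an assumption-laden example rather than the OCL signatures proposed in the thesis: it shows how unvalidated input changes the structure of an LDAP search filter and how escaping the RFC 4515 special characters neutralises the payload:

```python
# Illustrative sketch of the LDAP injection mechanism (not the thesis's approach):
# naive string concatenation lets attacker input change the filter's structure;
# escaping the RFC 4515 special characters turns the payload into literal text.

def naive_filter(username: str) -> str:
    return f"(&(uid={username})(objectClass=person))"

def escape_ldap(value: str) -> str:
    # Escape the characters with special meaning in LDAP search filters (RFC 4515).
    for char, code in (("\\", r"\5c"), ("*", r"\2a"), ("(", r"\28"), (")", r"\29"), ("\0", r"\00")):
        value = value.replace(char, code)
    return value

def safe_filter(username: str) -> str:
    return f"(&(uid={escape_ldap(username)})(objectClass=person))"

attack = "*)(uid=*))(|(uid=*"          # classic login-bypass payload
print(naive_filter(attack))             # structure altered: matches every entry
print(safe_filter(attack))              # payload neutralised into literal text
```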

    An Approach for Testing the Extract-Transform-Load Process in Data Warehouse Systems

    Enterprises use data warehouses to accumulate data from multiple sources for data analysis and research. Since organizational decisions are often made based on the data stored in a data warehouse, all its components must be rigorously tested. In this thesis, we first present a comprehensive survey of data warehouse testing approaches, and then develop and evaluate an automated testing approach for validating the Extract-Transform-Load (ETL) process, which is a common activity in data warehousing. In the survey we present a classification framework that categorizes the testing and evaluation activities applied to the different components of data warehouses. These approaches include both dynamic analysis as well as static evaluation and manual inspections. The classification framework uses information related to what is tested in terms of the data warehouse component that is validated, and how it is tested in terms of various types of testing and evaluation approaches. We discuss the specific challenges and open problems for each component and propose research directions. The ETL process involves extracting data from source databases, transforming it into a form suitable for research and analysis, and loading it into a data warehouse. ETL processes can use complex one-to-one, many-to-one, and many-to-many transformations involving sources and targets that use different schemas, databases, and technologies. Since faulty implementations in any of the ETL steps can result in incorrect information in the target data warehouse, ETL processes must be thoroughly validated. In this thesis, we propose automated balancing tests that check for discrepancies between the data in the source databases and that in the target warehouse. Balancing tests ensure that the data obtained from the source databases is not lost or incorrectly modified by the ETL process. First, we categorize and define a set of properties to be checked in balancing tests. We identify various types of discrepancies that may exist between the source and the target data, and formalize three categories of properties, namely, completeness, consistency, and syntactic validity, that must be checked during testing. Next, we automatically identify source-to-target mappings from ETL transformation rules provided in the specifications. We identify one-to-one, many-to-one, and many-to-many mappings for tables, records, and attributes involved in the ETL transformations. We then automatically generate test assertions to verify the properties for balancing tests, using the source-to-target mappings to generate assertions corresponding to each property. The assertions compare the data in the target data warehouse with the corresponding data in the sources to verify the properties. We evaluate our approach on a health data warehouse that uses data sources with different data models running on different platforms. We demonstrate that our approach can find previously undetected real faults in the ETL implementation. We also provide an automatic mutation testing approach to evaluate the fault-finding ability of our balancing tests. Using mutation analysis, we demonstrate that our auto-generated assertions can detect faults in the data inside the target data warehouse when faulty ETL scripts execute on mock source data.
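A balancing test of the kind described here can be sketched as a pair of assertions that compare the source and target systems. The table names, columns, and in-memory SQLite stand-ins below are hypothetical; the thesis derives such assertions automatically from source-to-target mappings rather than writing them by hand:

```python
# Minimal sketch of completeness- and consistency-style balancing assertions,
# with hypothetical table/column names and in-memory stand-ins for the source
# database and the target data warehouse.
import sqlite3

def balancing_checks(source: sqlite3.Connection, target: sqlite3.Connection) -> None:
    # Completeness: every source record should be represented in the target.
    src_rows = source.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
    tgt_rows = target.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
    assert src_rows == tgt_rows, f"record count mismatch: {src_rows} != {tgt_rows}"

    # Consistency: an aggregated attribute should survive the transformation.
    src_total = source.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
    tgt_total = target.execute("SELECT SUM(amount_usd) FROM fact_sales").fetchone()[0]
    assert src_total == tgt_total, f"amount mismatch: {src_total} != {tgt_total}"

if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    tgt = sqlite3.connect(":memory:")
    src.executescript("CREATE TABLE sales (id INT, amount REAL);"
                      "INSERT INTO sales VALUES (1, 10.0), (2, 5.5);")
    tgt.executescript("CREATE TABLE fact_sales (id INT, amount_usd REAL);"
                      "INSERT INTO fact_sales VALUES (1, 10.0), (2, 5.5);")
    balancing_checks(src, tgt)
    print("all balancing assertions passed")
```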

    FINDbase: a worldwide database for genetic variation allele frequencies updated

    The Frequency of INherited Disorders database (FINDbase; http://www.findbase.org) records frequencies of causative genetic variations worldwide. Database records include the population and ethnic group or geographical region, the disorder name and the related gene, accompanied by links to any related external resources and the genetic variation together with its frequency in that population. In addition to the regular data content updates, we report the following significant advances: (i) the systematic collection and thorough documentation of population/ethnic group-specific pharmacogenomic marker allele frequencies for 144 markers in 14 genes of pharmacogenomic interest from different classes of drug-metabolizing enzymes and transporters, representing 150 populations and ethnic groups worldwide; (ii) the development of new data querying and visualization tools in the expanded FINDbase data collection, built around Microsoft's PivotViewer software (http://www.getpivot.com), based on Microsoft Silverlight technology (http://www.silverlight.net), which facilitates querying of large data sets and visualizing the results; and (iii) the establishment of the first database journal, by affiliating FINDbase with Human Genomics and Proteomics, a new open-access scientific journal, which would serve as a prime example of a non-profit model for sustainable database funding.