
    Detecting Ambiguity in Prioritized Database Repairing

    In its traditional definition, a repair of an inconsistent database is a consistent database that differs from the inconsistent one in a "minimal way." Often, repairs are not equally legitimate, as it is desired to prefer one over another; for example, one fact is regarded as more reliable than another, or a more recent fact should be preferred to an earlier one. Motivated by these considerations, researchers have introduced and investigated the framework of preferred repairs, in the context of denial constraints and subset repairs. There, a priority relation between facts is lifted to a priority relation between consistent databases, and repairs are restricted to the ones that are optimal in the lifted sense. Three notions of lifting (and optimal repairs) have been proposed: Pareto, global, and completion. In this paper, we investigate the complexity of deciding whether the priority relation suffices to clean the database unambiguously, or in other words, whether there is exactly one optimal repair. We show that the different lifting semantics entail highly different complexities. Under Pareto optimality, the problem is coNP-complete, in data complexity, for every set of functional dependencies (FDs), except for the tractable case of (equivalence to) one FD per relation. Under global optimality, one FD per relation is still tractable, but we establish Π₂ᵖ-completeness for a relation with two FDs. In contrast, under completion optimality the problem is solvable in polynomial time for every set of FDs. In fact, we present a polynomial-time algorithm for arbitrary conflict hypergraphs. We further show that under a general assumption of transitivity, this algorithm solves the problem even for global optimality. The algorithm is extremely simple, but its proof of correctness is quite intricate.
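
    As a rough, brute-force illustration of the setting (hypothetical facts and priorities, and a simplified Pareto check; not the paper's definitions or algorithm), the sketch below enumerates the subset repairs of a tiny inconsistent database and tests whether exactly one optimal repair remains, i.e., whether the priority relation cleans the database unambiguously.

```python
# Brute-force sketch only: hypothetical facts, conflicts, and priorities;
# the Pareto-optimality test is a simplified stand-in, not the paper's.
from itertools import combinations

facts = {"f1", "f2", "f3"}
conflicts = {frozenset({"f1", "f2"}), frozenset({"f2", "f3"})}  # conflict graph edges
prefers = {("f1", "f2"), ("f3", "f2")}  # (a, b) means fact a is preferred to fact b

def consistent(s):
    return not any(c <= s for c in conflicts)

def subset_repairs():
    # Subset repairs: maximal consistent subsets of the facts.
    cons = [set(c) for r in range(len(facts) + 1)
            for c in combinations(sorted(facts), r) if consistent(set(c))]
    return [s for s in cons if not any(s < t for t in cons)]

def pareto_optimal(rep):
    # Simplified check: rep is rejected if some excluded fact is preferred
    # to every fact in rep that it conflicts with.
    for f in facts - rep:
        rivals = [g for g in rep if frozenset({f, g}) in conflicts]
        if rivals and all((f, g) in prefers for g in rivals):
            return False
    return True

optimal = [r for r in subset_repairs() if pareto_optimal(r)]
print(optimal)            # [{'f1', 'f3'}] (set ordering may vary)
print(len(optimal) == 1)  # True: exactly one optimal repair, so cleaning is unambiguous
```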

    Improving Developers' Understanding of Regex Denial of Service Tools through Anti-Patterns and Fix Strategies

    Regular expressions are used for diverse purposes, including input validation and firewalls. Unfortunately, they can also lead to a security vulnerability called ReDoS (Regular Expression Denial of Service), caused by a super-linear worst-case execution time during regex matching. Due to the severity and prevalence of ReDoS, past work proposed automatic tools to detect and fix vulnerable regexes. Although these tools were evaluated in automatic experiments, their usability has not yet been studied, nor has usability been a focus of prior work. Our insight is that the usability of existing tools to detect and fix regexes will improve if we complement them with anti-patterns and fix strategies for vulnerable regexes. We developed novel anti-patterns for vulnerable regexes and a collection of fix strategies to repair them. We derived our anti-patterns and fix strategies from a novel theory of regex infinite ambiguity, a necessary condition for regexes vulnerable to ReDoS, and we proved the soundness and completeness of this theory. We evaluated the effectiveness of our anti-patterns, both in an automatic experiment and when applied manually. We then evaluated how much our anti-patterns and fix strategies improve developers' understanding of the outcome of detection and fixing tools. Our evaluation found that our anti-patterns were effective over a large dataset of regexes (N=209,188): 100% precision and 99% recall, improving on the state of the art's 50% precision and 87% recall. Our anti-patterns were also more effective than the state of the art when applied manually (N=20): 100% of developers applied them effectively, versus 50% for the state of the art. Finally, our anti-patterns and fix strategies increased developers' understanding when using automatic tools (N=9): from a median of "Very weakly" to a median of "Strongly" when detecting vulnerabilities, and from a median of "Very weakly" to a median of "Very strongly" when fixing them.
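
    As background, the classic nested-quantifier pattern below shows the kind of infinite ambiguity that makes a regex vulnerable to ReDoS (a generic textbook example, not necessarily one of the paper's anti-patterns); the timing loop is illustrative only.

```python
# Illustrative only: a classic infinitely ambiguous (nested-quantifier) regex
# and an unambiguous rewrite that accepts the same strings.
import re
import time

vulnerable = re.compile(r"^(a+)+$")  # "aaaa" can be split into groups in many ways
fixed = re.compile(r"^a+$")          # same language, only one way to match

for n in (16, 18, 20, 22):
    s = "a" * n + "!"                # non-matching suffix forces full backtracking
    start = time.perf_counter()
    vulnerable.match(s)
    print(n, f"{time.perf_counter() - start:.3f}s")  # roughly doubles per extra 'a'

print(fixed.match("a" * 10_000 + "!"))  # None, rejected in linear time
```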

    Computational Complexity And Algorithms For Dirty Data Evaluation And Repairing

    In this dissertation, we study the dirty data evaluation and repairing problem in relational databases. Dirty data is usually inconsistent, inaccurate, incomplete, and stale. Existing methods and theories describe consistency using integrity constraints, such as data dependencies. However, integrity constraints are good at detecting inconsistency but not at evaluating its degree, and they cannot guide data repairing. This dissertation first studies the computational complexity of, and algorithms for, database inconsistency evaluation. We define and use the minimum tuple deletion to evaluate database inconsistency. For this minimum tuple deletion problem, we study the relationship between the size of the rule set and the computational complexity. We show that the minimum tuple deletion problem is NP-hard to approximate within 17/16, even with only three functional dependencies and four involved attributes. A near-optimal approximation algorithm for computing the minimum tuple deletion is proposed with a ratio of 2 − 1/2r, where r is the number of given functional dependencies. To guide data repairing, this dissertation also investigates a repairing method based on query feedback, formally studying two decision problems, the functional-dependency-restricted deletion and insertion propagation problems, corresponding to deletion and insertion feedback. A comprehensive analysis of both the combined and the data complexity of these cases is provided, considering different relational operators and feedback types. We identify the intractable and tractable cases to picture the complexity hierarchy of these problems, and provide efficient algorithms for the tractable cases. Two improvements are proposed: one computes a minimum vertex cover of the conflict graph to improve the upper bound for the tuple deletion problem, and the other gives a finer dichotomy for the deletion and insertion propagation problems in the absence of functional dependencies, considering data, combined, and parameterized complexity respectively.
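
    As a toy illustration of the conflict-graph view behind the vertex-cover improvement (hypothetical tuples and FD, brute-force search; not the dissertation's approximation algorithm), the sketch below builds the conflict graph for a single functional dependency and finds a minimum set of tuples whose deletion restores consistency.

```python
# Brute-force sketch: minimum tuple deletion under one FD equals a minimum
# vertex cover of the conflict graph (edges join pairs of tuples violating the FD).
from itertools import combinations

# Hypothetical relation; the FD assumed for illustration is zip -> city.
tuples = {
    1: ("10001", "New York"),
    2: ("10001", "Newark"),    # violates the FD together with tuple 1
    3: ("60601", "Chicago"),
    4: ("60601", "Evanston"),  # violates the FD together with tuple 3
}

edges = [(i, j) for i, j in combinations(tuples, 2)
         if tuples[i][0] == tuples[j][0] and tuples[i][1] != tuples[j][1]]

def covers(deleted):
    return all(i in deleted or j in deleted for i, j in edges)

minimum_deletion = min((set(c) for r in range(len(tuples) + 1)
                        for c in combinations(tuples, r) if covers(set(c))), key=len)
print(edges)             # [(1, 2), (3, 4)]
print(minimum_deletion)  # a set of two tuple ids, e.g. {1, 3}
```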

    A Meta-Analysis of Procedures to Change Implicit Measures

    Using a novel technique known as network meta-analysis, we synthesized evidence from 492 studies (87,418 participants) to investigate the effectiveness of procedures in changing implicit measures, which we define as response biases on implicit tasks. We also evaluated these procedures' effects on explicit and behavioral measures. We found that implicit measures can be changed, but effects are often relatively weak (|ds| < .30). Most studies focused on producing short-term changes with brief, single-session manipulations. Procedures that associate sets of concepts, invoke goals or motivations, or tax mental resources changed implicit measures the most, whereas procedures that induced threat, affirmation, or specific moods/emotions changed implicit measures the least. Bias tests suggested that implicit effects could be inflated relative to their true population values. Procedures changed explicit measures less consistently and to a smaller degree than implicit measures and generally produced trivial changes in behavior. Finally, changes in implicit measures did not mediate changes in explicit measures or behavior. Our findings suggest that changes in implicit measures are possible, but those changes do not necessarily translate into changes in explicit measures or behavior.

    Economic impact failure mode and effects analysis

    Failure mode and effects analysis (FMEA) is a method for reducing or eliminating failure modes in a system. A failure mode occurs when a system does not meet its specification. While FMEA is widely used in different industries, its multiple limitations can cause the method to be ineffective. One major limitation is the ambiguity of the risk priority number (RPN), which is used for risk prioritization and is the product of three ordinal variables: severity of effect, probability of occurrence, and likelihood of detection. There have been multiple attempts to address the RPN's ambiguity, but more work is still needed. Any new risk prioritization method needs a decision-support system to determine when to implement a corrective action or improvement. This research addresses some of the shortcomings of traditional FMEA through the creation of a new method called Economic Impact FMEA (EI-FMEA). EI-FMEA replaces the three ordinal values used in the RPN calculation with a new set of variables focusing on the expected cost of a failure occurring. A detailed decision-support system allows for the evaluation of corrective actions based on implementation cost, recurring cost, and adjusted failure cost. The RPN risk prioritization metric is replaced by the economic impact value (EIV) risk prioritization metric, which ranks risks based on the impact of the corrective action through the largest reduction in potential failure cost. To help with resource allocation, the EIV ranks only those risks whose corrective actions are economically sustainable. A comparison of three FMEA methods is performed on a product, and the risk prioritization metrics for each method are used to determine corrective action implementation. An evaluation of the FMEA methods is shown, based on the expected failure cost reduction, using the decision-support criteria of each method. The EI-FMEA method contributes to the body of knowledge by addressing the ambiguity of the RPN in FMEA through the EIV risk prioritization metric. This allows the EI-FMEA method to reduce failure cost by providing a decision-support system that determines when to implement a corrective action under both finite and infinite resources.
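
    The contrast between the ordinal RPN and an expected-cost ranking can be sketched as follows; the failure modes, numbers, formulas, and decision rule below are hypothetical illustrations, not the EIV definition from this work.

```python
# Traditional FMEA RPN (severity x occurrence x detection) next to a hypothetical
# expected-cost ranking; none of the figures or rules are from EI-FMEA itself.
failure_modes = [
    # name, severity (1-10), occurrence (1-10), detection (1-10),
    # annual failure probability, cost per failure ($), corrective-action cost ($)
    ("seal leak",     7, 4, 6, 0.10, 20_000, 1_500),
    ("sensor drift",  4, 7, 3, 0.30,  2_000,   500),
    ("housing crack", 9, 2, 8, 0.02, 80_000, 4_000),
]

for name, sev, occ, det, prob, fail_cost, fix_cost in failure_modes:
    rpn = sev * occ * det              # ordinal product, hard to interpret economically
    expected_cost = prob * fail_cost   # expected annual failure cost in dollars
    worth_fixing = expected_cost > fix_cost  # toy decision rule for illustration
    print(f"{name:13s} RPN={rpn:3d}  E[cost]=${expected_cost:8,.0f}  act? {worth_fixing}")
```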

    Towards understanding the challenges faced by machine learning software developers and enabling automated solutions

    Modern software systems increasingly include machine learning (ML) as an integral component. However, we do not yet understand the difficulties faced by software developers when learning about ML libraries and using them within their systems. To fill that gap, this thesis reports on a detailed (manual) examination of 3,243 highly rated Q&A posts related to ten ML libraries, namely Tensorflow, Keras, scikit-learn, Weka, Caffe, Theano, MLlib, Torch, Mahout, and H2O, on Stack Overflow, a popular online technical Q&A forum. Our findings reveal the urgent need for software engineering (SE) research in this area. The second part of the thesis focuses on understanding the characteristics of Deep Neural Network (DNN) bugs. We study 2,716 high-quality posts from Stack Overflow and 500 bug-fix commits from GitHub about five popular deep learning libraries, Caffe, Keras, Tensorflow, Theano, and Torch, to understand the types of bugs, their root causes and impacts, the bug-prone stages of the deep learning pipeline, and whether there are common anti-patterns in this buggy software. Our findings imply that repairing software that uses DNNs is one such unmistakable SE need where automated tools could be beneficial; however, we do not fully understand the challenges of repairing DNNs or the patterns used when repairing them manually. The third part of this thesis therefore presents a comprehensive study of bug-fix patterns to address these questions. We studied 415 repairs from Stack Overflow and 555 repairs from GitHub for the same five deep learning libraries to understand repair challenges and bug-fix patterns. Our key findings reveal that DNN bug-fix patterns are distinctive compared to traditional bug-fix patterns, and the most common ones fix data dimensions and neural network connectivity. Finally, we propose an automatic technique to detect ML Application Programming Interface (API) misuses. We started with an empirical study to understand ML API misuses. Our study shows that ML API misuse is prevalent and distinct compared to non-ML API misuse. Inspired by these findings, we contribute Amimla (Api Misuse In Machine Learning Apis), an approach and a tool for ML API misuse detection. Amimla relies on several technical innovations. First, we propose an abstract representation of ML pipelines for use in misuse detection. Second, we propose an abstract representation of neural networks for deep-learning-related APIs. Third, we develop a representation strategy for constraints on ML APIs. Finally, we develop a misuse detection strategy for both single and multiple APIs. Our experimental evaluation shows that Amimla achieves a high average accuracy of ∼80% on two benchmarks of misuses from Stack Overflow and GitHub.
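
    The kind of misuse such tools target can be seen in a common scikit-learn preprocessing mistake (an illustrative example, not necessarily a misuse that Amimla detects): re-fitting a scaler on the test set instead of reusing the scaler fitted on the training data.

```python
# Illustrative scikit-learn preprocessing misuse and its fix.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.random.rand(100, 3)
y = np.random.randint(0, 2, 100)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Misuse: the scaler is fit separately on the test set, so train and test
# end up on different scales and test statistics leak into preprocessing.
bad_train = StandardScaler().fit_transform(X_train)
bad_test = StandardScaler().fit_transform(X_test)

# Correct usage: fit on the training data only, then reuse the fitted scaler.
scaler = StandardScaler().fit(X_train)
good_train = scaler.transform(X_train)
good_test = scaler.transform(X_test)
```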

    A DATA-INFORMED MODEL OF PERFORMANCE SHAPING FACTORS FOR USE IN HUMAN RELIABILITY ANALYSIS

    Many Human Reliability Analysis (HRA) models use Performance Shaping Factors (PSFs) to incorporate human elements into system safety analysis and to calculate the Human Error Probability (HEP). Current HRA methods rely on different sets of PSFs that range from a few to over 50, with varying degrees of interdependency among the PSFs. This interdependency is observed in almost every set of PSFs, yet few HRA methods offer a way to account for dependency among PSFs. The methods that do address interdependencies generally do so by varying different multipliers in linear or log-linear formulas. These relationships could be more accurately represented in a causal model of PSF interdependencies. This dissertation introduces a methodology to produce a Bayesian Belief Network (BBN) of interactions among PSFs. The dissertation also presents a set of fundamental guidelines for the creation of a PSF set, a hierarchy of PSFs developed specifically for causal modeling, and a set of models developed using currently available data. The models, methodology, and PSF set were developed using nuclear power plant data available from two sources: information collected by the University of Maryland for the Information-Decision-Action model [1] and data from the Human Events Repository and Analysis (HERA) database [2], currently under development by the United States Nuclear Regulatory Commission. Creation of the methodology, the PSF hierarchy, and the models was an iterative process that incorporated information from available data, current HRA methods, and expert workshops. The fundamental guidelines are the result of insights gathered during the process of developing the methodology; these guidelines were applied to the final PSF hierarchy. The PSF hierarchy reduces overlap among the PSFs so that patterns of dependency observed in the data can be attributed to PSF interdependencies instead of overlapping definitions. It includes multiple levels of generic PSFs that can be expanded or collapsed for different applications. The model development methodology employs correlation and factor analysis to systematically collapse the PSF hierarchy and form the model structure. Factor analysis is also used to identify Error Contexts (ECs), specific PSF combinations that together produce an increased probability of human error (versus the net effect of the PSFs acting alone). Three models were created to demonstrate how the methodology can be used to provide different types of data-informed insights. By employing Bayes' theorem, the resulting model can be used to replace the linear calculations for HEPs used in Probabilistic Risk Assessment. When additional data become available, the methodology can be used to produce updated causal models that further refine HEP values.
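
    A minimal sketch of how a causal PSF model yields an HEP via Bayes' theorem, using two hypothetical PSFs with made-up probabilities (not the dissertation's model, hierarchy, or data):

```python
# Two hypothetical PSFs (stress, fatigue) feeding a human error probability.
p_stress = {"high": 0.2, "low": 0.8}          # prior over the stress PSF
p_fatigue_given_stress = {                    # fatigue depends causally on stress
    "high": {"yes": 0.5, "no": 0.5},
    "low":  {"yes": 0.1, "no": 0.9},
}
p_error = {                                   # P(error | stress, fatigue)
    ("high", "yes"): 0.05, ("high", "no"): 0.02,
    ("low",  "yes"): 0.01, ("low",  "no"): 0.002,
}

# Marginal HEP: sum over the joint distribution of the PSFs.
hep = sum(p_stress[s] * p_fatigue_given_stress[s][f] * p_error[(s, f)]
          for s in p_stress for f in ("yes", "no"))

# Bayes' theorem: update belief about the stress PSF after observing an error.
p_high_given_error = sum(p_stress["high"] * p_fatigue_given_stress["high"][f]
                         * p_error[("high", f)] for f in ("yes", "no")) / hep
print(f"HEP = {hep:.4f}, P(stress=high | error) = {p_high_given_error:.2f}")
```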

    Working Notes from the 1992 AAAI Spring Symposium on Practical Approaches to Scheduling and Planning

    The symposium presented issues involved in the development of scheduling systems that can deal with resource and time limitations. To qualify, a system must be implemented and tested to some degree on non-trivial problems (ideally, on real-world problems); however, a system need not be fully deployed to qualify. Systems that schedule actions in terms of metric time constraints typically represent and reason about an external numeric clock or calendar, and can be contrasted with systems that represent time purely symbolically. The following topics are discussed: integrating planning and scheduling; integrating symbolic goals and numerical utilities; managing uncertainty; incremental rescheduling; managing limited computation time; anytime scheduling and planning algorithms and systems; dependency analysis and schedule reuse; management of schedule and plan execution; and incorporation of discrete event techniques.