18,975 research outputs found
Data generator for evaluating ETL process quality
Obtaining the right set of data for evaluating the fulfillment of different quality factors in the extract-transform-load (ETL) process design is rather challenging. First, the real data might be out of reach due to different privacy constraints, while manually providing a synthetic set of data is known as a labor-intensive task that needs to take various combinations of process parameters into account. More importantly, having a single dataset usually does not represent the evolution of data throughout the complete process lifespan, hence missing the plethora of possible test cases. To facilitate such demanding task, in this paper we propose an automatic data generator (i.e., Bijoux). Starting from a given ETL process model, Bijoux extracts the semantics of data transformations, analyzes the constraints they imply over input data, and automatically generates testing datasets. Bijoux is highly modular and configurable to enable end-users to generate datasets for a variety of interesting test scenarios (e.g., evaluating specific parts of an input ETL process design, with different input dataset sizes, different distributions of data, and different operation selectivities). We have developed a running prototype that implements the functionality of our data generation framework and here we report our experimental findings showing the effectiveness and scalability of our approach.Peer ReviewedPostprint (author's final draft
Targeted Automatic Integer Overflow Discovery Using Goal-Directed Conditional Branch Enforcement
We present a new technique and system, DIODE, for auto- matically generating inputs that trigger overflows at memory allocation sites. DIODE is designed to identify relevant sanity checks that inputs must satisfy to trigger overflows at target memory allocation sites, then generate inputs that satisfy these sanity checks to successfully trigger the overflow. DIODE works with off-the-shelf, production x86 binaries. Our results show that, for our benchmark set of applications, and for every target memory allocation site exercised by our seed inputs (which the applications process correctly with no overflows), either 1) DIODE is able to generate an input that triggers an overflow at that site or 2) there is no input that would trigger an overflow for the observed target expression at that site.United States. Defense Advanced Research Projects Agency (Grant FA8650-11-C-7192
Dependency parsing resources for French: Converting acquired lexical functional grammar F-Structure annotations and parsing F-Structures directly
Recent years have seen considerable success in the generation of automatically obtained wide-coverage deep grammars for natural language processing, given reliable
and large CFG-like treebanks. For research within Lexical Functional Grammar framework, these deep grammars are
typically based on an extended PCFG parsing scheme from which dependencies are extracted. However, increasing success in statistical dependency parsing suggests that such deep grammar approaches to statistical parsing could be streamlined. We explore this novel approach to deep
grammar parsing within the framework of LFG in this paper, for French, showing that best results (an f-score of 69.46) for the established integrated architecture may be obtained for French
Practical applications of multi-agent systems in electric power systems
The transformation of energy networks from passive to active systems requires the embedding of intelligence within the network. One suitable approach to integrating distributed intelligent systems is multi-agent systems technology, where components of functionality run as autonomous agents capable of interaction through messaging. This provides loose coupling between components that can benefit the complex systems envisioned for the smart grid. This paper reviews the key milestones of demonstrated agent systems in the power industry and considers which aspects of agent design must still be addressed for widespread application of agent technology to occur
ACETest: Automated Constraint Extraction for Testing Deep Learning Operators
Deep learning (DL) applications are prevalent nowadays as they can help with
multiple tasks. DL libraries are essential for building DL applications.
Furthermore, DL operators are the important building blocks of the DL
libraries, that compute the multi-dimensional data (tensors). Therefore, bugs
in DL operators can have great impacts. Testing is a practical approach for
detecting bugs in DL operators. In order to test DL operators effectively, it
is essential that the test cases pass the input validity check and are able to
reach the core function logic of the operators. Hence, extracting the input
validation constraints is required for generating high-quality test cases.
Existing techniques rely on either human effort or documentation of DL library
APIs to extract the constraints. They cannot extract complex constraints and
the extracted constraints may differ from the actual code implementation.
To address the challenge, we propose ACETest, a technique to automatically
extract input validation constraints from the code to build valid yet diverse
test cases which can effectively unveil bugs in the core function logic of DL
operators. For this purpose, ACETest can automatically identify the input
validation code in DL operators, extract the related constraints and generate
test cases according to the constraints. The experimental results on popular DL
libraries, TensorFlow and PyTorch, demonstrate that ACETest can extract
constraints with higher quality than state-of-the-art (SOTA) techniques.
Moreover, ACETest is capable of extracting 96.4% more constraints and detecting
1.95 to 55 times more bugs than SOTA techniques. In total, we have used ACETest
to detect 108 previously unknown bugs on TensorFlow and PyTorch, with 87 of
them confirmed by the developers. Lastly, five of the bugs were assigned with
CVE IDs due to their security impacts.Comment: Accepted by ISSTA 202
- …