    Data generator for evaluating ETL process quality

    Obtaining the right set of data for evaluating the fulfillment of different quality factors in extract-transform-load (ETL) process design is rather challenging. First, the real data might be out of reach due to privacy constraints, while manually providing a synthetic dataset is a labor-intensive task that must take various combinations of process parameters into account. More importantly, a single dataset usually does not represent the evolution of data throughout the complete process lifespan, and hence misses a plethora of possible test cases. To facilitate this demanding task, in this paper we propose an automatic data generator (Bijoux). Starting from a given ETL process model, Bijoux extracts the semantics of data transformations, analyzes the constraints they imply over input data, and automatically generates testing datasets. Bijoux is highly modular and configurable, enabling end-users to generate datasets for a variety of interesting test scenarios (e.g., evaluating specific parts of an input ETL process design with different input dataset sizes, data distributions, and operation selectivities). We have developed a running prototype that implements the functionality of our data generation framework, and we report experimental findings showing the effectiveness and scalability of our approach.
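    To make the idea of generating data against an operation's constraints concrete, here is a minimal sketch (not Bijoux's actual API; all names and the strategy are illustrative): for a filter operation with predicate `age > threshold`, we generate rows so that a requested fraction of them, the operation's selectivity, satisfies the predicate.

```python
import random

def generate_filter_dataset(n_rows, selectivity, lo=0, hi=100,
                            threshold=30, seed=0):
    """Generate test rows for a filter `age > threshold` so that
    roughly `selectivity` of them pass the predicate."""
    rng = random.Random(seed)
    n_pass = round(n_rows * selectivity)
    rows = []
    for i in range(n_rows):
        if i < n_pass:
            age = rng.randint(threshold + 1, hi)   # satisfies the predicate
        else:
            age = rng.randint(lo, threshold)       # violates the predicate
        rows.append({"id": i, "age": age})
    rng.shuffle(rows)                              # avoid ordering artifacts
    return rows

data = generate_filter_dataset(1000, 0.25)
passed = sum(r["age"] > 30 for r in data)          # exactly 250 rows pass
```

    A real generator would derive `threshold` and the value domains from the constraints extracted from the ETL process model rather than take them as parameters.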

    On the relation between the base of an EI algebra and word graphs

    This paper is an attempt to investigate possible links between algebraic fuzzy set theory and the theory of word graphs. Both theories study concepts, and these concepts can be set in correspondence, which enables the use of algebraic results in the context of word graph theory.

    Categories in Control

    Control theory uses "signal-flow diagrams" to describe processes where real-valued functions of time are added, multiplied by scalars, differentiated and integrated, duplicated and deleted. These diagrams can be seen as string diagrams for the symmetric monoidal category FinVect_k of finite-dimensional vector spaces over the field of rational functions k = R(s), where the variable s acts as differentiation and the monoidal structure is direct sum rather than the usual tensor product of vector spaces. For any field k we give a presentation of FinVect_k in terms of the generators used in signal-flow diagrams. A broader class of signal-flow diagrams also includes "caps" and "cups" to model feedback. We show these diagrams can be seen as string diagrams for the symmetric monoidal category FinRel_k, where objects are still finite-dimensional vector spaces but the morphisms are linear relations. We also give a presentation for FinRel_k. The relations say, among other things, that the 1-dimensional vector space k has two special commutative dagger-Frobenius structures, such that the multiplication and unit of either one and the comultiplication and counit of the other fit together to form a bimonoid. This sort of structure, but with tensor product replacing direct sum, is familiar from the "ZX-calculus" obeyed by a finite-dimensional Hilbert space with two mutually unbiased bases.
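    The correspondence between diagrams and matrices can be sketched in a few lines. This is an illustrative toy, not the paper's formalism: we take k = Q (Python fractions) instead of R(s), represent morphisms of FinVect_k as matrices over k, composition of diagrams as matrix multiplication, and side-by-side placement as the direct sum.

```python
from fractions import Fraction as F

def compose(g, f):
    """Matrix product g.f: run diagram f, then diagram g."""
    return [[sum(g[i][k] * f[k][j] for k in range(len(f)))
             for j in range(len(f[0]))] for i in range(len(g))]

def direct_sum(a, b):
    """Block-diagonal matrix: diagrams a and b side by side."""
    ca, cb = len(a[0]), len(b[0])
    return ([row + [F(0)] * cb for row in a] +
            [[F(0)] * ca + row for row in b])

# Generators of signal-flow diagrams as matrices:
duplicate = [[F(1)], [F(1)]]    # copy one wire onto two (comultiplication)
add       = [[F(1), F(1)]]      # sum two wires onto one (multiplication)
scale     = lambda c: [[F(c)]]  # multiply a signal by the scalar c

# Duplicating a signal and then adding the two copies doubles it:
doubled  = compose(add, duplicate)          # the 1x1 matrix [[2]]
parallel = direct_sum(scale(2), scale(3))   # 2x2 block-diagonal matrix
```

    Over the field k = R(s) of the paper, the same picture holds with rational-function entries, where multiplication by s is differentiation.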

    Simple and Effective Type Check Removal through Lazy Basic Block Versioning

    Dynamically typed programming languages such as JavaScript and Python defer type checking to run time. In order to maximize performance, dynamic language VM implementations must attempt to eliminate redundant dynamic type checks. However, type inference analyses are often costly and involve tradeoffs between compilation time and resulting precision. This has led to the creation of increasingly complex multi-tiered VM architectures. This paper introduces lazy basic block versioning, a simple JIT compilation technique which effectively removes redundant type checks from critical code paths. This novel approach lazily generates type-specialized versions of basic blocks on the fly while propagating context-dependent type information. It does not require costly program analyses, is not restricted by the precision limitations of traditional type analyses, and avoids the implementation complexity of speculative optimization techniques. We have implemented intraprocedural lazy basic block versioning in a JavaScript JIT compiler and compared it with a classical flow-based type analysis. Lazy basic block versioning performs as well as or better on all benchmarks. On average, 71% of type tests are eliminated, yielding speedups of up to 50%. We also show that our implementation generates more efficient machine code than TraceMonkey, a tracing JIT compiler for JavaScript, on several benchmarks. The combination of implementation simplicity, low algorithmic complexity and good run-time performance makes basic block versioning attractive for baseline JIT compilers.
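    The core mechanism can be sketched in miniature (this is an illustrative toy, not the paper's JavaScript implementation): a block is "compiled" the first time it is reached under a given type context, the specialized version omits the type test that the context already guarantees, and a cache keyed on (block, type context) reuses that version on later executions.

```python
# Cache of specialized block versions, keyed on (block name, type context).
versions = {}
compile_count = 0  # how many times we actually "compiled" a version

def generic_inc(x):
    """Unspecialized block: performs a dynamic type test on every run."""
    if not isinstance(x, int):
        raise TypeError("expected int")
    return x + 1

def get_version(block_name, type_context):
    """Return a block version specialized to `type_context`,
    compiling it lazily on first use."""
    global compile_count
    key = (block_name, type_context)
    if key not in versions:
        compile_count += 1
        if type_context is int:
            # Specialized version: the type test has been removed,
            # since the context already proves x is an int.
            versions[key] = lambda x: x + 1
        else:
            versions[key] = generic_inc
    return versions[key]

# Executing the block three times with ints compiles one version,
# which is then reused; the redundant type tests never run.
for v in (1, 2, 3):
    inc = get_version("inc", type(v))
    result = inc(v)
```

    The real technique operates on basic blocks of JIT-compiled machine code and propagates the type context from block to block, but the laziness and the per-context version cache are the same idea.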