189,427 research outputs found
Data generator for evaluating ETL process quality
Obtaining the right set of data for evaluating the fulfillment of different quality factors in the extract-transform-load (ETL) process design is rather challenging. First, the real data might be out of reach due to different privacy constraints, while manually providing a synthetic set of data is known as a labor-intensive task that needs to take various combinations of process parameters into account. More importantly, having a single dataset usually does not represent the evolution of data throughout the complete process lifespan, hence missing the plethora of possible test cases. To facilitate such demanding task, in this paper we propose an automatic data generator (i.e., Bijoux). Starting from a given ETL process model, Bijoux extracts the semantics of data transformations, analyzes the constraints they imply over input data, and automatically generates testing datasets. Bijoux is highly modular and configurable to enable end-users to generate datasets for a variety of interesting test scenarios (e.g., evaluating specific parts of an input ETL process design, with different input dataset sizes, different distributions of data, and different operation selectivities). We have developed a running prototype that implements the functionality of our data generation framework and here we report our experimental findings showing the effectiveness and scalability of our approach.Peer ReviewedPostprint (author's final draft
A polynomial delay algorithm for the enumeration of bubbles with length constraints in directed graphs and its application to the detection of alternative splicing in RNA-seq data
We present a new algorithm for enumerating bubbles with length constraints in
directed graphs. This problem arises in transcriptomics, where the question is
to identify all alternative splicing events present in a sample of mRNAs
sequenced by RNA-seq. This is the first polynomial-delay algorithm for this
problem and we show that in practice, it is faster than previous approaches.
This enables us to deal with larger instances and therefore to discover novel
alternative splicing events, especially long ones, that were previously
overseen using existing methods.Comment: Peer-reviewed and presented as part of the 13th Workshop on
Algorithms in Bioinformatics (WABI2013
Bridging the gap between design and implementation of components libraries
Object-oriented design is usually driven by three main reusability principles:
step-by-step design, design for reuse and design with reuse. However, these
principles are just partially
applied to the subsequent object-oriented implementation, often due to efficienc
y
constraints, yielding to a gap between design and implementation. In this paper
we provide a solution for bridging this gap for a concrete framework, the one of
designing and implementing container-like component libraries, such as STL, Booc
h
Components, etc. Our approach is based on a new design pattern together with its
corresponding implementation. The proposal enhances the same principles that
drive the design process: step-by--step implementation (adding just what is
needed in every step), implementation with reuse (component implementations are
reused while library implementation
progresses and component hierarchies grow) and implementation for reuse
(intermediate component implementations can be reused in many different points o
f
the hierarchy). We use our approach in two different manners: for building a
brand-new container-like
component library, and for reengineering an existing one, Booch Components in
Ada95.Postprint (published version
Segment Routing: a Comprehensive Survey of Research Activities, Standardization Efforts and Implementation Results
Fixed and mobile telecom operators, enterprise network operators and cloud
providers strive to face the challenging demands coming from the evolution of
IP networks (e.g. huge bandwidth requirements, integration of billions of
devices and millions of services in the cloud). Proposed in the early 2010s,
Segment Routing (SR) architecture helps face these challenging demands, and it
is currently being adopted and deployed. SR architecture is based on the
concept of source routing and has interesting scalability properties, as it
dramatically reduces the amount of state information to be configured in the
core nodes to support complex services. SR architecture was first implemented
with the MPLS dataplane and then, quite recently, with the IPv6 dataplane
(SRv6). IPv6 SR architecture (SRv6) has been extended from the simple steering
of packets across nodes to a general network programming approach, making it
very suitable for use cases such as Service Function Chaining and Network
Function Virtualization. In this paper we present a tutorial and a
comprehensive survey on SR technology, analyzing standardization efforts,
patents, research activities and implementation results. We start with an
introduction on the motivations for Segment Routing and an overview of its
evolution and standardization. Then, we provide a tutorial on Segment Routing
technology, with a focus on the novel SRv6 solution. We discuss the
standardization efforts and the patents providing details on the most important
documents and mentioning other ongoing activities. We then thoroughly analyze
research activities according to a taxonomy. We have identified 8 main
categories during our analysis of the current state of play: Monitoring,
Traffic Engineering, Failure Recovery, Centrally Controlled Architectures, Path
Encoding, Network Programming, Performance Evaluation and Miscellaneous...Comment: SUBMITTED TO IEEE COMMUNICATIONS SURVEYS & TUTORIAL
- …