7,025 research outputs found
On-Demand Big Data Integration: A Hybrid ETL Approach for Reproducible Scientific Research
Scientific research requires access, analysis, and sharing of data that is
distributed across various heterogeneous data sources at the scale of the
Internet. An eager ETL process constructs an integrated data repository as its
first step, integrating and loading data in its entirety from the data sources.
The bootstrapping of this process is not efficient for scientific research that
requires access to data from very large and typically numerous distributed data
sources. a lazy ETL process loads only the metadata, but still eagerly. Lazy
ETL is faster in bootstrapping. However, queries on the integrated data
repository of eager ETL perform faster, due to the availability of the entire
data beforehand.
In this paper, we propose a novel ETL approach for scientific data
integration, as a hybrid of eager and lazy ETL approaches, and applied both to
data as well as metadata. This way, Hybrid ETL supports incremental integration
and loading of metadata and data from the data sources. We incorporate a
human-in-the-loop approach, to enhance the hybrid ETL, with selective data
integration driven by the user queries and sharing of integrated data between
users. We implement our hybrid ETL approach in a prototype platform, Obidos,
and evaluate it in the context of data sharing for medical research. Obidos
outperforms both the eager ETL and lazy ETL approaches, for scientific research
data integration and sharing, through its selective loading of data and
metadata, while storing the integrated data in a scalable integrated data
repository.Comment: Pre-print Submitted to the DMAH Special Issue of the Springer DAPD
Journa
Autonomous Fault Detection in Self-Healing Systems using Restricted Boltzmann Machines
Autonomously detecting and recovering from faults is one approach for
reducing the operational complexity and costs associated with managing
computing environments. We present a novel methodology for autonomously
generating investigation leads that help identify systems faults, and extends
our previous work in this area by leveraging Restricted Boltzmann Machines
(RBMs) and contrastive divergence learning to analyse changes in historical
feature data. This allows us to heuristically identify the root cause of a
fault, and demonstrate an improvement to the state of the art by showing
feature data can be predicted heuristically beyond a single instance to include
entire sequences of information.Comment: Published and presented in the 11th IEEE International Conference and
Workshops on Engineering of Autonomic and Autonomous Systems (EASe 2014
Experience-Based Planning with Sparse Roadmap Spanners
We present an experienced-based planning framework called Thunder that learns
to reduce computation time required to solve high-dimensional planning problems
in varying environments. The approach is especially suited for large
configuration spaces that include many invariant constraints, such as those
found with whole body humanoid motion planning. Experiences are generated using
probabilistic sampling and stored in a sparse roadmap spanner (SPARS), which
provides asymptotically near-optimal coverage of the configuration space,
making storing, retrieving, and repairing past experiences very efficient with
respect to memory and time. The Thunder framework improves upon past
experience-based planners by storing experiences in a graph rather than in
individual paths, eliminating redundant information, providing more
opportunities for path reuse, and providing a theoretical limit to the size of
the experience graph. These properties also lead to improved handling of
dynamically changing environments, reasoning about optimal paths, and reducing
query resolution time. The approach is demonstrated on a 30 degrees of freedom
humanoid robot and compared with the Lightning framework, an experience-based
planner that uses individual paths to store past experiences. In environments
with variable obstacles and stability constraints, experiments show that
Thunder is on average an order of magnitude faster than Lightning and planning
from scratch. Thunder also uses 98.8% less memory to store its experiences
after 10,000 trials when compared to Lightning. Our framework is implemented
and freely available in the Open Motion Planning Library.Comment: Submitted to ICRA 201
RePOR: Mimicking humans on refactoring tasks. Are we there yet?
Refactoring is a maintenance activity that aims to improve design quality
while preserving the behavior of a system. Several (semi)automated approaches
have been proposed to support developers in this maintenance activity, based on
the correction of anti-patterns, which are `poor' solutions to recurring design
problems. However, little quantitative evidence exists about the impact of
automatically refactored code on program comprehension, and in which context
automated refactoring can be as effective as manual refactoring. Leveraging
RePOR, an automated refactoring approach based on partial order reduction
techniques, we performed an empirical study to investigate whether automated
refactoring code structure affects the understandability of systems during
comprehension tasks. (1) We surveyed 80 developers, asking them to identify
from a set of 20 refactoring changes if they were generated by developers or by
a tool, and to rate the refactoring changes according to their design quality;
(2) we asked 30 developers to complete code comprehension tasks on 10 systems
that were refactored by either a freelancer or an automated refactoring tool.
To make comparison fair, for a subset of refactoring actions that introduce new
code entities, only synthetic identifiers were presented to practitioners. We
measured developers' performance using the NASA task load index for their
effort, the time that they spent performing the tasks, and their percentages of
correct answers. Our findings, despite current technology limitations, show
that it is reasonable to expect a refactoring tools to match developer code
Efficient Constellation-Based Map-Merging for Semantic SLAM
Data association in SLAM is fundamentally challenging, and handling ambiguity
well is crucial to achieve robust operation in real-world environments. When
ambiguous measurements arise, conservatism often mandates that the measurement
is discarded or a new landmark is initialized rather than risking an incorrect
association. To address the inevitable `duplicate' landmarks that arise, we
present an efficient map-merging framework to detect duplicate constellations
of landmarks, providing a high-confidence loop-closure mechanism well-suited
for object-level SLAM. This approach uses an incrementally-computable
approximation of landmark uncertainty that only depends on local information in
the SLAM graph, avoiding expensive recovery of the full system covariance
matrix. This enables a search based on geometric consistency (GC) (rather than
full joint compatibility (JC)) that inexpensively reduces the search space to a
handful of `best' hypotheses. Furthermore, we reformulate the commonly-used
interpretation tree to allow for more efficient integration of clique-based
pairwise compatibility, accelerating the branch-and-bound max-cardinality
search. Our method is demonstrated to match the performance of full JC methods
at significantly-reduced computational cost, facilitating robust object-based
loop-closure over large SLAM problems.Comment: Accepted to IEEE International Conference on Robotics and Automation
(ICRA) 201
- …