Learning Interpretable Rules for Multi-label Classification
Multi-label classification (MLC) is a supervised learning problem in which,
contrary to standard multiclass classification, an instance can be associated
with several class labels simultaneously. In this chapter, we advocate a
rule-based approach to multi-label classification. Rule learning algorithms are
often employed when one is not only interested in accurate predictions, but
also requires an interpretable theory that can be understood, analyzed, and
qualitatively evaluated by domain experts. Ideally, by revealing patterns and
regularities contained in the data, a rule-based theory yields new insights in
the application domain. Recently, several authors have started to investigate
how rule-based models can be used for modeling multi-label data. Discussing
this task in detail, we highlight some of the problems that make rule learning
considerably more challenging for MLC than for conventional classification.
While mainly focusing on our own previous work, we also provide a short
overview of related work in this area.
Comment: Preprint version. To appear in: Explainable and Interpretable Models in Computer Vision and Machine Learning. The Springer Series on Challenges in Machine Learning. Springer (2018). See http://www.ke.tu-darmstadt.de/bibtex/publications/show/3077 for further information.
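As an illustration of what an interpretable multi-label rule model can look like, consider the minimal sketch below. The features, labels, and rules are invented for this example and are not the chapter's actual learner; real rule learners induce such rules from data.

# Toy rule-based multi-label classifier. Features, labels, and rules are
# invented for illustration only.
def predict_labels(instance):
    labels = set()
    # A multi-label head: one rule may assign several labels at once.
    if instance["has_beach"] and instance["avg_temp"] > 25:
        labels |= {"summer", "seaside"}
    # A label-dependent rule: a previously predicted label occurs in the
    # body, one of the aspects that makes rule learning harder for MLC.
    if "seaside" in labels and instance["has_harbour"]:
        labels.add("fishing")
    return labels

print(predict_labels({"has_beach": True, "avg_temp": 28, "has_harbour": True}))
# {'summer', 'seaside', 'fishing'} (set order may vary)

Rules of this form can be read and qualitatively evaluated by a domain expert, which is the interpretability advantage the chapter advocates.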
Data quality evaluation through data quality rules and data provenance.
The application and exploitation of large amounts of data play an ever-increasing role in today’s research, government, and economy. Data understanding and decision making heavily rely on high quality data; therefore, in many different contexts, it is important to assess the quality of a dataset in order to determine if it is suitable to be used for a specific purpose. Moreover, as the access to and the exchange of datasets have become easier and more frequent, and as scientists increasingly use the World Wide Web to share scientific data, there is a growing need to know the provenance of a dataset (i.e., information about the processes and data sources that lead to its creation) in order to evaluate its trustworthiness. In this work, data quality rules and data provenance are used to evaluate the quality of datasets.
Concerning the first topic, the solution applied consists in identifying types of data constraints that can serve as data quality rules and in developing a software tool that evaluates a dataset against a set of rules expressed in XML. We selected some of the data constraints and dependencies already considered in the data quality field, but also used order dependencies and existence constraints as quality rules. In addition, we developed algorithms to discover the types of dependencies used in the tool. To deal with data provenance, the Open Provenance Model (OPM) was adopted, an experimental query language for querying OPM graphs stored in a relational database was implemented, and an approach to designing OPM graphs was proposed.
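As a rough sketch of how such rules might be expressed in XML and checked against a dataset: the <existence> and <order> elements below are an invented schema used only for illustration, not the thesis's actual rule format.

# Minimal data quality checker driven by XML rules (invented schema).
import xml.etree.ElementTree as ET

rules = ET.fromstring("""
<rules>
  <existence attribute="email" when="status=active"/>
  <order lhs="order_date" rhs="ship_date"/>
</rules>
""")

def check(rows, rules):
    violations = []
    for rule in rules:
        if rule.tag == "existence":
            # Existence constraint: the attribute must be non-empty
            # whenever the condition holds.
            key, val = rule.get("when").split("=")
            attr = rule.get("attribute")
            for i, row in enumerate(rows):
                if row.get(key) == val and not row.get(attr):
                    violations.append(("existence", attr, i))
        elif rule.tag == "order":
            # Order dependency lhs -> rhs: ordering the rows by lhs
            # must also order them by rhs.
            a, b = rule.get("lhs"), rule.get("rhs")
            by_a = sorted(rows, key=lambda r: r[a])
            for j in range(len(by_a) - 1):
                if by_a[j][b] > by_a[j + 1][b]:
                    violations.append(("order", a, b))
    return violations

rows = [
    {"status": "active", "email": "", "order_date": "2023-05-01", "ship_date": "2023-05-03"},
    {"status": "active", "email": "a@b.c", "order_date": "2023-05-02", "ship_date": "2023-05-02"},
]
print(check(rows, rules))
# [('existence', 'email', 0), ('order', 'order_date', 'ship_date')]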
Automating Fault Tolerance in High-Performance Computational Biological Jobs Using Multi-Agent Approaches
Background: Large-scale biological jobs on high-performance computing systems
require manual intervention if one or more computing cores on which they
execute fail. This places not only a cost on the maintenance of the job, but
also a cost on the time taken for reinstating the job and the risk of losing
data and execution accomplished by the job before it failed. Approaches which
can proactively detect computing core failures and take action to relocate the
computing core's job onto reliable cores can make a significant step towards
automating fault tolerance.
Method: This paper describes an experimental investigation into the use of
multi-agent approaches for fault tolerance. Two approaches are studied, the
first at the job level and the second at the core level. The approaches are
investigated for single core failure scenarios that can occur in the execution
of parallel reduction algorithms on computer clusters. A third approach is
proposed that incorporates multi-agent technology both at the job and core
level. Experiments are pursued in the context of genome searching, a popular
computational biology application.
Result: The key conclusion is that the approaches proposed are feasible for
automating fault tolerance in high-performance computing systems with minimal
human intervention. In a typical experiment in which the fault tolerance is
studied, centralised and decentralised checkpointing approaches on average
add 90% to the actual time for executing the job. On the other hand, in the
same experiment the multi-agent approaches add only 10% to the overall
execution time.
Comment: Computers in Biology and Medicine.
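A minimal sketch of the job-level idea follows: an agent carries its job's state and moves to a healthy core when a failure is detected, avoiding a rollback to a checkpoint file. The Core and agent interfaces are invented; the paper's agent platform and failure detectors are not reproduced here.

# Job-level fault tolerance sketch: the agent detects a failing core
# and relocates itself together with its lightweight job state.
class Core:
    def __init__(self, cid):
        self.cid = cid
        self.failed = False
    def healthy(self):
        return not self.failed

class JobAgent:
    def __init__(self, cores):
        self.cores = cores
        self.core = cores[0]
        self.progress = 0       # lightweight job state carried by the agent

    def step(self):
        if not self.core.healthy():            # proactive failure detection
            self.core = next(c for c in self.cores if c.healthy())  # relocate
        self.progress += 1                     # one unit of useful work

cores = [Core(i) for i in range(4)]
agent = JobAgent(cores)
cores[0].failed = True                         # simulate a core failure
agent.step()
print("running on core", agent.core.cid, "progress:", agent.progress)
# running on core 1 progress: 1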
Rough sets theory for travel demand analysis in Malaysia
This study integrates rough sets theory into tourism demand analysis. Originating in the field of Artificial Intelligence, rough sets theory was introduced to disclose important structures in data and to classify objects. The rough sets methodology provides definitions and methods for finding which attributes separate one class or classification from another, and on this basis a formal framework for the automated transformation of data into knowledge can be proposed. This makes the rough sets approach a useful classification and pattern recognition technique. This study introduces a new rough sets approach for deriving rules from an information table of tourists in Malaysia. The induced rules were able to forecast changes in demand with a certain accuracy.
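For concreteness, the core rough sets computation can be sketched as follows: lower and upper approximations of a decision class, the basis for deriving certain and possible rules. The attribute names and rows are invented; the study's actual information table for Malaysian tourism is not reproduced.

# Lower/upper approximations of a decision class under the
# indiscernibility relation induced by the condition attributes.
from collections import defaultdict

def approximations(rows, condition_attrs, decision_attr, target):
    # Rows with equal condition-attribute values form one block.
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in condition_attrs)].add(i)
    positive = {i for i, row in enumerate(rows) if row[decision_attr] == target}
    lower, upper = set(), set()
    for block in blocks.values():
        if block <= positive:
            lower |= block      # certainly in the class -> certain rules
        if block & positive:
            upper |= block      # possibly in the class -> possible rules
    return lower, upper

rows = [
    {"season": "peak", "visa": "easy", "demand": "up"},
    {"season": "peak", "visa": "easy", "demand": "up"},
    {"season": "off",  "visa": "easy", "demand": "up"},
    {"season": "off",  "visa": "easy", "demand": "down"},
]
print(approximations(rows, ("season", "visa"), "demand", "up"))
# ({0, 1}, {0, 1, 2, 3})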
Program Transformations for Asynchronous and Batched Query Submission
The performance of database/Web-service backed applications can be
significantly improved by asynchronous submission of queries/requests well
ahead of the point where the results are needed, so that results are likely to
have been fetched already when they are actually needed. However, manually
writing applications to exploit asynchronous query submission is tedious and
error-prone. In this paper we address the issue of automatically transforming a
program written assuming synchronous query submission, to one that exploits
asynchronous query submission. Our program transformation method is based on
data flow analysis and is framed as a set of transformation rules. Our rules
can handle query executions within loops, unlike some of the earlier work in
this area. We also present a novel approach that, at runtime, can combine
multiple asynchronous requests into batches, thereby achieving the benefits of
batching in addition to that of asynchronous submission. We have built a tool
that implements our transformation techniques on Java programs that use JDBC
calls; our tool can be extended to handle Web service calls. We have carried
out a detailed experimental study on several real-life applications, which
shows the effectiveness of the proposed rewrite techniques, both in terms of
their applicability and the performance gains achieved.
Comment: 14 pages.
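The flavour of the transformation can be sketched with thread-pool futures, as below. The paper's rules actually rewrite Java programs with JDBC calls via data flow analysis; this Python analogue is only illustrative, and fetch_price is an invented stand-in for a blocking query.

# Before/after sketch of the asynchronous-submission transformation.
from concurrent.futures import ThreadPoolExecutor

def fetch_price(item_id):
    return len(item_id)   # stand-in for a blocking database/Web-service call

items = ["apple", "pear", "plum"]

# Original program: synchronous submission inside a loop; every
# iteration blocks until its query result arrives.
totals = [fetch_price(i) for i in items]

# Transformed program: all requests are submitted ahead of the point of
# use, so results are likely already available when consumed. Submitting
# them together also opens the door to batching the requests.
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(fetch_price, i) for i in items]   # issued early
    totals = [f.result() for f in futures]                   # consumed at need
print(totals)   # [5, 4, 4]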
An Architectural Approach to Ensuring Consistency in Hierarchical Execution
Hierarchical task decomposition is a method used in many agent systems to
organize agent knowledge. This work shows how the combination of a hierarchy
and persistent assertions of knowledge can lead to difficulty in maintaining
logical consistency in asserted knowledge. We explore the problematic
consequences of persistent assumptions in the reasoning process and introduce
novel potential solutions. We implement one of the possible solutions,
Dynamic Hierarchical Justification, and demonstrate its effectiveness with
an empirical analysis.
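To make the consistency problem concrete, here is a minimal sketch in which a persistent assertion outlives the assumption that justified it and is retracted when that assumption changes, in the spirit of justification-based retraction. All names are invented; this is not the paper's implementation.

# Persistent assertions justified by assumptions. When a supporting
# assumption changes, dependent assertions are retracted rather than
# left behind as stale, inconsistent knowledge.
class Memory:
    def __init__(self):
        self.assumptions = {}   # name -> current value
        self.justified = {}     # assertion -> assumptions supporting it

    def assume(self, name, value):
        changed = name in self.assumptions and self.assumptions[name] != value
        self.assumptions[name] = value
        if changed:
            # Retract every assertion that depended on the changed assumption.
            self.justified = {a: deps for a, deps in self.justified.items()
                              if name not in deps}

    def assert_fact(self, fact, deps):
        self.justified[fact] = set(deps)

m = Memory()
m.assume("door_open", True)
m.assert_fact("can_exit", {"door_open"})
m.assume("door_open", False)        # the assumption changes ...
print("can_exit" in m.justified)    # ... so the stale assertion is gone: False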