RORS: Enhanced Rule-based OWL Reasoning on Spark
Rule-based OWL reasoning computes the deductive closure of an
ontology by applying RDF/RDFS and OWL entailment rules, and its performance is
often sensitive to the rule execution order. In this paper, we present an
approach to enhancing the performance of rule-based OWL reasoning on Spark
based on a locally optimal execution strategy. Firstly, we divide all 27 rules
into four main classes, namely, SPO rules (5 rules), type rules (7 rules),
sameAs rules (7 rules), and schema rules (8 rules), since, as we investigated,
the triples corresponding to the first three classes of rules dominate in
practice (e.g., over 99% in the LUBM dataset). Secondly, based on the
interdependence among the entailment rules in each class, we pick an optimal
execution order for each class and then combine these into a new execution
order over all rules. Finally, we implement the new rule execution order on
Spark in a prototype called RORS. The experimental results show that the
running time of RORS improves by about 30% compared to Kim & Park's algorithm
(2015) on LUBM200 (27.6 million triples).
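The core idea of ordered rule application can be illustrated with a minimal local sketch in plain Python (not actual Spark code): entailment rules are applied to a fixpoint in a fixed order. The two rules shown (a subPropertyOf rule standing in for the SPO class and a domain rule standing in for the type class) and the predicate names are illustrative assumptions, not the exact RORS rule set.

```python
def apply_spo(triples):
    # rdfs7-style rule: (p subPropertyOf q), (s p o) => (s q o)
    subprops = {(s, o) for s, p, o in triples if p == "subPropertyOf"}
    return {(s, q, o) for s, p, o in triples
            for sp, q in subprops if p == sp}

def apply_type(triples):
    # rdfs2-style rule: (p domain c), (s p o) => (s type c)
    domains = {(p2, c) for p2, pred, c in triples if pred == "domain"}
    return {(s, "type", c) for s, p, o in triples
            for dp, c in domains if p == dp}

def closure(triples, rule_order):
    # Apply the rules in the given order repeatedly until no new
    # triples are derived (the deductive closure).
    triples = set(triples)
    changed = True
    while changed:
        changed = False
        for rule in rule_order:
            derived = rule(triples) - triples
            if derived:
                triples |= derived
                changed = True
    return triples

facts = {
    ("hasHead", "subPropertyOf", "worksFor"),
    ("worksFor", "domain", "Employee"),
    ("alice", "hasHead", "acme"),
}
result = closure(facts, [apply_spo, apply_type])
# Derives (alice worksFor acme) and then (alice type Employee).
```

On Spark, each rule application would instead be a join between an RDD of triples and the relevant schema triples; a good execution order reduces how many fixpoint passes (and hence joins) are needed.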
Large-scale Parallel Stratified Defeasible Reasoning
We are currently experiencing an unprecedented explosion of available data from the Web, sensor readings, scientific databases, government authorities and more. Such datasets could benefit from the introduction of rule sets encoding commonly accepted rules or facts, application- or domain-specific rules, commonsense knowledge etc. This raises the question of whether, how, and to what extent knowledge representation methods are capable of handling huge amounts of data for these applications. In this paper, we consider inconsistency-tolerant reasoning in the form of defeasible logic, and analyze how parallelization, using the MapReduce framework, can be used to reason with defeasible rules over huge datasets. We extend previous work by dealing with predicates of arbitrary arity, under the assumption of stratification. Moving from unary to multi-arity predicates is a decisive step towards practical applications, e.g. reasoning with linked open (RDF) data. Our experimental results demonstrate that defeasible reasoning over millions of facts is performant, and has the potential to scale to billions of facts.
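The essence of defeasible conclusions can be sketched in a few lines of Python. This is a deliberately simplified, ground (variable-free) reading of defeasible logic, not the paper's stratified multi-arity algorithm: a goal is defeasibly provable if some supporting rule fires and every firing attacking rule is beaten by a superior supporting rule. Rule names, facts, and the superiority relation below are invented for illustration.

```python
# Facts are ground atoms: (predicate, arg). Rules are
# (name, antecedents, consequent, attacks_consequent?).
facts = {("bird", "tweety"), ("penguin", "tweety")}

rules = [
    ("r1", [("bird", "tweety")], ("flies", "tweety"), False),
    ("r2", [("penguin", "tweety")], ("flies", "tweety"), True),
]
superior = {("r2", "r1")}  # r2 (penguins don't fly) beats r1 (birds fly)

def defeasibly_provable(goal):
    fires = lambda r: all(a in facts for a in r[1])
    supporting = [r for r in rules if r[2] == goal and not r[3] and fires(r)]
    attacking = [r for r in rules if r[2] == goal and r[3] and fires(r)]
    if not supporting:
        return False
    # Every firing attacker must be defeated by some superior supporter.
    return all(any((s[0], a[0]) in superior for s in supporting)
               for a in attacking)
```

With the facts above, `("flies", "tweety")` is not provable because the penguin rule attacks it and no supporting rule is superior to it; remove the penguin fact and it becomes provable. In the MapReduce setting, the fact-matching step (which rules fire) parallelizes naturally across mappers, with the defeat check done per conclusion in reducers.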
Programming support for an integrated multi-party computation and MapReduce infrastructure
We describe and present a prototype of a distributed computational infrastructure and associated high-level programming language that allow multiple parties to leverage their own computational resources capable of supporting MapReduce [1] operations in combination with multi-party computation (MPC). Our architecture allows a programmer to author and compile a protocol using a uniform collection of standard constructs, even when that protocol involves computations that take place locally within each participant's MapReduce cluster as well as across all the participants using an MPC protocol. The high-level programming language provided to the user is accompanied by static analysis algorithms that allow the programmer to reason about the efficiency of the protocol before compiling and running it. We present two example applications demonstrating how such an infrastructure can be employed. This work was supported in part by NSF Grants #1430145, #1414119, #1347522, and #1012798.
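The combination pattern can be sketched with additive secret sharing over a prime field: each party first runs its local aggregation (the "MapReduce" half), then the local results are combined without being revealed (the "MPC" half). This is an illustrative analogue in plain Python, not the paper's language or runtime, and the field modulus and party data are assumptions.

```python
import random

P = 2**61 - 1  # prime modulus for the field (illustrative choice)

def share(value, n):
    # Split value into n additive shares that sum to value mod P.
    parts = [random.randrange(P) for _ in range(n - 1)]
    parts.append((value - sum(parts)) % P)
    return parts

def reconstruct(parts):
    return sum(parts) % P

# Each party's local "map-reduce" step: sum its own records.
party_data = [[3, 5, 7], [10, 20], [1, 1, 1, 1]]
local_sums = [sum(d) for d in party_data]

# MPC-style combine: each party shares its local sum among all parties;
# each party sums the shares it holds; reconstruction yields the global
# total without any party revealing its individual local sum.
n = len(party_data)
all_shares = [share(s, n) for s in local_sums]
held = [sum(all_shares[i][j] for i in range(n)) % P for j in range(n)]
total = reconstruct(held)  # equals the sum of all parties' data: 49
```

A static analysis like the one the paper describes would, in this picture, distinguish cheap local steps (the per-party sums) from expensive cross-party steps (the sharing and reconstruction) before the protocol is run.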
Evolving Large-Scale Data Stream Analytics based on Scalable PANFIS
Many distributed machine learning frameworks have recently been built to
speed up large-scale learning. However, most of the algorithms used in these
frameworks are offline and cannot cope with data streams. In fact, large-scale
data are mostly generated by non-stationary data streams whose patterns evolve
over time. To address this problem, we propose a novel Evolving Large-scale
Data Stream Analytics framework based on a Scalable Parsimonious Network based
on Fuzzy Inference System (Scalable PANFIS), where the PANFIS evolving
algorithm is distributed over worker nodes in the cloud to learn large-scale
data streams. The Scalable PANFIS framework incorporates an active learning
(AL) strategy and two model fusion methods. AL accelerates the distributed
learning process that generates an initial evolving large-scale data stream
model (initial model), whereas the two model fusion methods aggregate the
initial model into the final model. The final model represents the current
large-scale data knowledge and can be used to infer future data. Extensive
experiments validate this framework by measuring the accuracy and running time
of four combinations of Scalable PANFIS against built-in Spark-based
algorithms. The results indicate that Scalable PANFIS with AL trains almost
twice as fast as Scalable PANFIS without AL. They also show that the
rule-merging and voting fusion mechanisms yield similar accuracy in general
across the Scalable PANFIS variants, and that these are generally more accurate
than the Spark-based algorithms. In terms of running time, Scalable PANFIS
outperforms all Spark-based algorithms when classifying numerous benchmark
datasets.
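The voting variant of model fusion can be sketched as a majority vote over the per-worker models' predictions. The toy threshold "models" below are assumptions standing in for the evolved fuzzy-rule models each worker produces; only the fusion step itself is illustrated.

```python
from collections import Counter

def vote_fusion(models, x):
    # Each per-worker model votes a class label; the majority label wins.
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

# Toy per-worker classifiers, as if learned on different data partitions
# (illustrative stand-ins for evolved fuzzy-rule models).
models = [
    lambda x: 1 if x > 0.4 else 0,
    lambda x: 1 if x > 0.5 else 0,
    lambda x: 1 if x > 0.9 else 0,
]
label = vote_fusion(models, 0.6)  # two of three models vote class 1
```

The rule-merging alternative mentioned in the abstract would instead combine the workers' rule bases into a single model before prediction, trading a one-time merge cost for a single inference pass per sample.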