25 research outputs found

    RORS: Enhanced Rule-based OWL Reasoning on Spark

    Rule-based OWL reasoning computes the deductive closure of an ontology by applying RDF/RDFS and OWL entailment rules, and its performance is often sensitive to the order in which the rules are executed. In this paper, we present an approach to enhancing the performance of rule-based OWL reasoning on Spark based on a locally optimal executable strategy. First, we divide all 27 rules into four main classes, namely SPO rules (5 rules), type rules (7 rules), sameAs rules (7 rules), and schema rules (8 rules), since, as we investigated, the triples corresponding to the first three classes are overwhelmingly dominant in practice (e.g., over 99% of the LUBM dataset). Second, based on the interdependence among the entailment rules in each class, we pick out an optimal executable order for each class and then combine these into a new execution order over all rules. Finally, we implement the new rule execution order on Spark in a prototype called RORS. The experimental results show that the running time of RORS is about 30% lower than that of Kim & Park's algorithm (2015) on LUBM200 (27.6 million triples). Comment: 12 pages
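    As a rough, hypothetical sketch of the idea (not the RORS implementation, which runs distributed on Spark over the full 27-rule set), the fragment below applies two toy entailment rules, one SPO-style (rdfs7) and one type-style (rdfs9), in a fixed order inside a fixed-point loop; the triples are invented examples:

```python
# Toy forward-chaining closure with a fixed rule execution order.
# Rule classes follow the paper's grouping; rule bodies and data are
# simplified, hypothetical examples, not the RORS code.

def rule_spo(triples):
    # rdfs7: (p subPropertyOf q) & (s p o) => (s q o)
    new = set()
    for s, p, o in triples:
        for p2, rel, q in triples:
            if rel == "subPropertyOf" and p2 == p:
                new.add((s, q, o))
    return new

def rule_type(triples):
    # rdfs9: (C subClassOf D) & (x type C) => (x type D)
    new = set()
    for x, rel, c in triples:
        if rel == "type":
            for c2, rel2, d in triples:
                if rel2 == "subClassOf" and c2 == c:
                    new.add((x, "type", d))
    return new

def closure(triples, rule_order):
    # Apply the rules in the given order until no new triple is derived.
    triples = set(triples)
    changed = True
    while changed:
        changed = False
        for rule in rule_order:
            derived = rule(triples) - triples
            if derived:
                triples |= derived
                changed = True
    return triples

data = {
    ("hasHead", "subPropertyOf", "worksFor"),
    ("alice", "hasHead", "uni"),
    ("Professor", "subClassOf", "Faculty"),
    ("alice", "type", "Professor"),
}
result = closure(data, [rule_spo, rule_type])
```

    The order in which the rule functions are applied changes how many passes the loop needs before reaching the fixed point, which is the effect the paper's locally optimal ordering exploits.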

    Hierarchical Multi-Label Classification Using Web Reasoning for Large Datasets

    Extracting valuable data from large volumes of data is one of the main challenges in Big Data. In this paper, a Hierarchical Multi-Label Classification process called Semantic HMC is presented. This process aims to extract valuable data from very large data sources by automatically learning a label hierarchy and classifying data items. The Semantic HMC process is composed of five scalable steps, namely Indexation, Vectorization, Hierarchization, Resolution, and Realization. The first three steps automatically construct a label hierarchy from a statistical analysis of the data. This paper focuses on the last two steps, which classify items according to the label hierarchy. The process is implemented as a scalable, distributed application and deployed on a Big Data platform. A quality evaluation is described that compares the approach with state-of-the-art multi-label classification algorithms dedicated to the same goal. The Semantic HMC approach outperforms the state of the art in some areas.
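    The hierarchy-consistency idea behind the last two steps can be illustrated with a small sketch: any label assigned to an item also implies all of its ancestors in the learned hierarchy. The hierarchy, scores, and threshold below are invented examples; the actual Resolution and Realization steps are statistical and run distributed on a Big Data platform.

```python
# Hypothetical label hierarchy: child -> parent.
parent = {
    "neural_networks": "machine_learning",
    "machine_learning": "computer_science",
}

def ancestors(label):
    # Walk up the hierarchy collecting every ancestor label.
    out = []
    while label in parent:
        label = parent[label]
        out.append(label)
    return out

def realize(scores, threshold=0.5):
    # Keep labels whose score passes the threshold, then close the
    # set upward so the assignment is consistent with the hierarchy.
    kept = {l for l, s in scores.items() if s >= threshold}
    for l in list(kept):
        kept.update(ancestors(l))
    return kept

labels = realize({"neural_networks": 0.8, "databases": 0.2})
```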

    Semantic Query Reasoning in Distributed Environment

    Master's thesis in Computer Science. The Semantic Web aims to elevate simple data on the WWW to a semantic layer, so that knowledge can be processed by machines and shared more easily. Ontology is one of the key technologies for realizing the Semantic Web, and semantic reasoning is an important step in semantic technology. For ontology developers, semantic reasoning finds collisions in an ontology definition and optimizes it; for ontology users, semantic reasoning retrieves implicit knowledge from known knowledge. The main topic of this thesis is reasoning for semantic data querying in a distributed environment: obtaining correct query results given an ontology definition and data. We studied two methods, data materialization and query rewriting. Using Amazon's cloud computing services and the LUBM benchmark, we compared the two methods and concluded that, as the size of the queried data scales up, query rewriting is more feasible than data materialization. Based on this conclusion, we developed an application that manages and queries semantic data in a distributed environment. This application can serve as a prototype for similar applications and as a tool for other Semantic Web research.
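    The two strategies compared in the thesis can be contrasted on a toy example (the ontology fragment and data below are invented, not the thesis code): materialization derives all implied triples up front and queries them directly, while query rewriting leaves the data untouched and expands the query using the ontology.

```python
# Hypothetical ontology fragment and data.
subclass = {"GraduateStudent": "Student"}  # child -> parent class
data = {("bob", "type", "GraduateStudent")}

# Materialization: forward-derive implied type triples, then query.
materialized = set(data)
for s, p, o in data:
    if p == "type" and o in subclass:
        materialized.add((s, "type", subclass[o]))
hits_mat = {s for s, p, o in materialized if p == "type" and o == "Student"}

# Query rewriting: rewrite "?x type Student" into a union over
# Student and all of its subclasses, querying the raw data as-is.
targets = {"Student"} | {c for c, sup in subclass.items() if sup == "Student"}
hits_rw = {s for s, p, o in data if p == "type" and o in targets}
```

    Both strategies return the same answers; the trade-off the thesis measures is where the extra work happens, at load time (materialization) or at query time (rewriting).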

    Distributed RDF query processing and reasoning for big data / linked data

    Title from PDF of title page, viewed on August 27, 2014. Thesis advisor: Yugyung Lee. Vita. Includes bibliographical references (pages 61-65). Thesis (M.S.)--School of Computing and Engineering, University of Missouri--Kansas City, 2014.
    The Linked Data movement aims to convert unstructured and semi-structured data in documents into semantically connected documents called the "web of data." It is based on the Resource Description Framework (RDF), which represents semantic data; a collection of RDF statements forms an RDF graph. SPARQL is a query language designed specifically to query RDF data. Linked Data faces the same challenges that Big Data does, and together they represent a new paradigm of massive amounts of data in connected form. Utilizing Linked Data and Big Data continues to be in high demand, so a scalable, accessible query system is needed for the reusability and availability of existing web data. However, existing SPARQL query systems are not sufficiently scalable for Big Data and Linked Data. In this thesis, we address how to improve the scalability and performance of query processing over Big Data / Linked Data. Our aim is to evaluate presently available SPARQL query engines and to develop an effective, scalable model for querying RDF data with reasoning capabilities. We designed an efficient, distributed SPARQL engine using MapReduce (parallel, distributed processing of large data sets on a cluster) and the Apache Cassandra database (a scalable, highly available peer-to-peer distributed database system). We evaluated the existing in-memory ARQ engine provided by the Jena framework and found that it cannot handle large datasets, as it relies entirely on the in-memory capacity of the system. It was shown that the proposed model had powerful reasoning capabilities and dealt efficiently with big datasets. Abstract -- Illustrations -- Tables -- Introduction -- Background and related work -- Graph-store based SPARQL model -- Graph-store based SPARQL model implementation -- Results and evaluation -- Conclusion and future work -- References
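    The map/reduce join idea behind such an engine can be sketched on a single machine: each triple pattern is matched in a "map" phase, bindings are keyed on the shared variable, and a "reduce" phase joins them. The data and helper functions below are hypothetical; the actual storage in Cassandra and the real MapReduce jobs are elided.

```python
from collections import defaultdict

# Hypothetical RDF data.
triples = [
    ("alice", "advises", "bob"),
    ("bob", "type", "Student"),
    ("carol", "type", "Student"),
]

def match(pattern):
    # Map phase: emit a variable binding for each triple that
    # matches the pattern (terms starting with "?" are variables).
    for triple in triples:
        binding, ok = {}, True
        for term, val in zip(pattern, triple):
            if term.startswith("?"):
                binding[term] = val
            elif term != val:
                ok = False
        if ok:
            yield binding

def join(left, right, var):
    # Reduce phase: group left bindings by the shared variable,
    # then merge each compatible pair of bindings.
    buckets = defaultdict(list)
    for b in left:
        buckets[b[var]].append(b)
    for b in right:
        for other in buckets.get(b[var], []):
            yield {**other, **b}

# SPARQL-style query: ?x advises ?y . ?y type Student
results = list(join(match(("?x", "advises", "?y")),
                    match(("?y", "type", "Student")), "?y"))
```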

    Towards a Reference Architecture with Modular Design for Large-scale Genotyping and Phenotyping Data Analysis: A Case Study with Image Data

    With the rapid advancement of computing technologies, various scientific research communities have been extensively using cloud-based software tools and applications. Cloud-based applications let users access software from a web browser, relieving them of installing any software in their desktop environment. For example, Galaxy, GenAP, and iPlant Collaborative are popular cloud-based systems for scientific workflow analysis in the domain of plant Genotyping and Phenotyping. These systems are used for conducting research, devising new techniques, and sharing computer-assisted analysis results among collaborators. Researchers need to integrate their new workflows/pipelines, tools, or techniques with the base system over time. Moreover, large-scale data must be processed within a timeline for effective analysis. Recently, Big Data technologies have emerged that facilitate large-scale data processing on commodity hardware; among the above-mentioned systems, GenAP uses Big Data technologies for specific cases only. The structure of such a cloud-based system is highly variable and complex in nature, and software architects and developers must consider quite different properties and challenges during development and maintenance compared to traditional business/service-oriented systems. Recent studies report that software engineers and data engineers face challenges in developing analytic tools that support large-scale, heterogeneous data analysis. Unfortunately, software researchers have given little focus to devising a well-defined methodology and frameworks for the flexible design of cloud systems in the Genotyping and Phenotyping domain. To that end, more effective design methodologies and frameworks are urgently needed for the development of cloud-based Genotyping and Phenotyping analysis systems that also support large-scale data processing.
    In this thesis, we conduct several studies in order to devise a stable reference architecture and modularity model for software developers and data engineers in the Genotyping and Phenotyping domain. In the first study, we analyze the architectural changes of existing candidate systems to identify stability issues. We then extract architectural patterns from the candidate systems and propose a conceptual reference architectural model. Finally, we present a case study on the modularity of computation-intensive tasks as an extension of data-centric development. We show that the data-centric modularity model is at the core of the flexible development of a Genotyping and Phenotyping analysis system. Our proposed model and a case study with thousands of images provide a useful knowledge base for software researchers, developers, and data engineers developing cloud-based Genotyping and Phenotyping analysis systems.

    Selected Problems in Data Driven and Traffic Related Networks

    Our research concentrates on networks. Networks have been studied extensively over the last few decades, and the topic is still gaining popularity. In this thesis we study the challenge of gaining an understanding of a network when information about it is unknown or limited in some way. We first consider the challenge of determining, from a vast amount of information, what can be used to provide insight into the behaviour of the network; for this we consider methods and techniques adopted from the social network analysis (SNA) community. We then consider networks whose data is limited in some way and demonstrate that statistical analysis methods can overcome these limitations. Finally, we consider the challenge of having access to increasingly little information about the network, and we illustrate this difficulty through the rendezvous problem in a restricted network.

    BIM-Based Life Cycle Sustainability Assessment for Buildings

    In recent years, the progress of digitization in the architecture and construction sectors has produced enormous advances in the automation of analysis and evaluation processes. This is the case for environmental analysis systems such as life cycle assessment. Practitioners of the methodology have found a fundamental ally in building information modeling (BIM) platforms, which allow tasks that conventionally consume large amounts of energy and time to be carried out more automatically and efficiently. In this publication, the reader will find some of the latest advances in this area.

    An approach to the semantic intelligence cloud

    Cloud computing is a disruptive technology that aims to provide a utility approach to computing, where users can obtain the computing resources they require without investing in infrastructure, computing platforms, or services. Cloud computing resources can be obtained from a number of internal or external sources. The heterogeneity of cloud service provision makes comparing services difficult, with further complexity introduced by provision approaches such as reserved purchase, on-demand provisioning, and spot markets. The aim of the research was to develop a semantic framework for cloud computing services that incorporates Cloud Service Agreements, requirements, pricing, and Benefits Management. The proposed approach develops an integrated framework in which Cloud Service Agreements describe the relationship between cloud service providers and cloud service users. Requirements are developed from agreements and can use the concepts, relationships, and assertions the agreements provide. Pricing, in turn, is established from requirements, and Benefits Management is pervasive across the developed semantic framework. The method was first to review the literature comprehensively to establish a sound theoretical basis for the research. A problem-solving ontology was then developed that defined the concepts and relationships of the proposed semantic framework, a number of case studies were used to populate the ontology with assertions, and reasoning was used to test that the framework was correct. The results are a proposed framework of concepts, relationships, and assertions for cloud service descriptions, presented as an ontology in textual and graphical form. Several parts of the ontology were published on public ontology platforms and in journal and conference papers. The original contribution to knowledge lies in these results.
    The proposed framework provides the foundations for the development of a unified semantic framework for cloud computing service description and has been used by other researchers developing semantic cloud service descriptions. In the area of Cloud Service Agreements, full coverage of the documents described by the major standards organisations has been encoded into the framework. Requirements have been modelled as a unique multilevel semantic representation. Pricing of cloud services has been given a semantic description that can be mapped to requirements, and the existing Benefits Management approach has been reimplemented using semantic description. In conclusion, a framework has been developed that allows the semantic description of cloud computing services. This approach provides greater expressive power than simplistic frameworks that use mathematical formulas or models with simple relationships between concepts. The proposed framework is limited to a narrow area of service description and would require expansion to be viable in a commercial setting. Further work includes the development of software toolsets based on the semantic description to realise a viable product for mapping high-level cloud service requirements to low-level cloud resources.
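    The layering described above, with requirements built from agreement concepts and pricing established from requirements, can be sketched as a tiny consistency check of the kind the reasoning step performs. All names below are hypothetical illustrations, not the thesis ontology.

```python
# Hypothetical agreement-layer concepts.
agreement_concepts = {"ComputeService", "StorageService"}

# Requirements refine agreement concepts (requirement -> concept).
requirements = {
    "req_vm_4core": "ComputeService",
    "req_100gb": "StorageService",
}

# Pricing is established from requirements (price entry -> requirement).
pricing = {
    "price_vm": "req_vm_4core",
    "price_storage": "req_100gb",
}

def consistent():
    # Every price must trace through a requirement back to a concept
    # defined in the agreement layer.
    return all(requirements.get(pricing[p]) in agreement_concepts
               for p in pricing)

ok = consistent()
```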