25 research outputs found

    RORS: Enhanced Rule-based OWL Reasoning on Spark

    Rule-based OWL reasoning computes the deductive closure of an ontology by applying RDF/RDFS and OWL entailment rules, and its performance is often sensitive to the order in which the rules are executed. In this paper, we present an approach to enhancing the performance of rule-based OWL reasoning on Spark based on a locally optimal executable strategy. First, we divide all 27 rules into four main classes, namely SPO rules (5 rules), type rules (7 rules), sameAs rules (7 rules), and schema rules (8 rules), since, as we investigated, the triples corresponding to the first three classes are overwhelmingly dominant in practice (e.g., over 99% of the LUBM dataset). Second, based on the interdependence among the entailment rules in each class, we pick out an optimal executable order for each class and then combine these into a new execution order over all rules. Finally, we implement the new rule execution order on Spark in a prototype called RORS. The experimental results show that the running time of RORS is about 30% lower than that of Kim & Park's algorithm (2015) on LUBM200 (27.6 million triples). Comment: 12 pages
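    As a rough, hypothetical sketch of the idea (not the RORS implementation, which runs distributed on Spark over the full 27-rule set), the fragment below applies two toy entailment rules, one SPO-style (rdfs7) and one type-style (rdfs9), in a fixed order inside a fixed-point loop; the triples are invented examples:

```python
# Toy forward-chaining closure with a fixed rule execution order.
# Rule classes follow the paper's grouping; rule bodies and data are
# simplified, hypothetical examples, not the RORS code.

def rule_spo(triples):
    # rdfs7: (p subPropertyOf q) & (s p o) => (s q o)
    new = set()
    for s, p, o in triples:
        for p2, rel, q in triples:
            if rel == "subPropertyOf" and p2 == p:
                new.add((s, q, o))
    return new

def rule_type(triples):
    # rdfs9: (C subClassOf D) & (x type C) => (x type D)
    new = set()
    for x, rel, c in triples:
        if rel == "type":
            for c2, rel2, d in triples:
                if rel2 == "subClassOf" and c2 == c:
                    new.add((x, "type", d))
    return new

def closure(triples, rule_order):
    # Apply the rules in the given order until no new triple is derived.
    triples = set(triples)
    changed = True
    while changed:
        changed = False
        for rule in rule_order:
            derived = rule(triples) - triples
            if derived:
                triples |= derived
                changed = True
    return triples

data = {
    ("hasHead", "subPropertyOf", "worksFor"),
    ("alice", "hasHead", "uni"),
    ("Professor", "subClassOf", "Faculty"),
    ("alice", "type", "Professor"),
}
result = closure(data, [rule_spo, rule_type])
```

    The order in which the rule functions are applied changes how many passes the loop needs before reaching the fixed point, which is the effect the paper's locally optimal ordering exploits.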

    Hierarchical Multi-Label Classification Using Web Reasoning for Large Datasets

    Extracting valuable data from large volumes of data is one of the main challenges in Big Data. In this paper, a Hierarchical Multi-Label Classification process called Semantic HMC is presented. This process aims to extract valuable data from very large data sources by automatically learning a label hierarchy and classifying data items. The Semantic HMC process is composed of five scalable steps, namely Indexation, Vectorization, Hierarchization, Resolution, and Realization. The first three steps automatically construct a label hierarchy from a statistical analysis of the data. This paper focuses on the last two steps, which classify items according to the label hierarchy. The process is implemented as a scalable, distributed application and deployed on a Big Data platform. A quality evaluation is described that compares the approach with state-of-the-art multi-label classification algorithms dedicated to the same goal. The Semantic HMC approach outperforms the state of the art in some areas.
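    The hierarchy-consistency idea behind the last two steps can be illustrated with a small sketch: any label assigned to an item also implies all of its ancestors in the learned hierarchy. The hierarchy, scores, and threshold below are invented examples; the actual Resolution and Realization steps are statistical and run distributed on a Big Data platform.

```python
# Hypothetical label hierarchy: child -> parent.
parent = {
    "neural_networks": "machine_learning",
    "machine_learning": "computer_science",
}

def ancestors(label):
    # Walk up the hierarchy collecting every ancestor label.
    out = []
    while label in parent:
        label = parent[label]
        out.append(label)
    return out

def realize(scores, threshold=0.5):
    # Keep labels whose score passes the threshold, then close the
    # set upward so the assignment is consistent with the hierarchy.
    kept = {l for l, s in scores.items() if s >= threshold}
    for l in list(kept):
        kept.update(ancestors(l))
    return kept

labels = realize({"neural_networks": 0.8, "databases": 0.2})
```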

    Semantic Query Reasoning in Distributed Environment

    Master's thesis in Computer Science. The Semantic Web aims to elevate simple data on the WWW to a semantic layer, so that knowledge can be processed by machines and shared more easily. Ontology is one of the key technologies for realizing the Semantic Web, and semantic reasoning is an important step in semantic technology. For ontology developers, semantic reasoning finds collisions in an ontology definition and optimizes it; for ontology users, semantic reasoning retrieves implicit knowledge from known knowledge. The main topic of this thesis is reasoning for semantic data querying in a distributed environment: obtaining correct query results given an ontology definition and data. We studied two methods, data materialization and query rewriting. Using Amazon's cloud computing services and the LUBM benchmark, we compared the two methods and concluded that, as the size of the queried data scales up, query rewriting is more feasible than data materialization. Based on this conclusion, we developed an application that manages and queries semantic data in a distributed environment. This application can serve as a prototype for similar applications and as a tool for other Semantic Web research.
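    The two strategies compared in the thesis can be contrasted on a toy example (the ontology fragment and data below are invented, not the thesis code): materialization derives all implied triples up front and queries them directly, while query rewriting leaves the data untouched and expands the query using the ontology.

```python
# Hypothetical ontology fragment and data.
subclass = {"GraduateStudent": "Student"}  # child -> parent class
data = {("bob", "type", "GraduateStudent")}

# Materialization: forward-derive implied type triples, then query.
materialized = set(data)
for s, p, o in data:
    if p == "type" and o in subclass:
        materialized.add((s, "type", subclass[o]))
hits_mat = {s for s, p, o in materialized if p == "type" and o == "Student"}

# Query rewriting: rewrite "?x type Student" into a union over
# Student and all of its subclasses, querying the raw data as-is.
targets = {"Student"} | {c for c, sup in subclass.items() if sup == "Student"}
hits_rw = {s for s, p, o in data if p == "type" and o in targets}
```

    Both strategies return the same answers; the trade-off the thesis measures is where the extra work happens, at load time (materialization) or at query time (rewriting).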

    Distributed RDF query processing and reasoning for big data / linked data

    Title from PDF of title page, viewed on August 27, 2014. Thesis advisor: Yugyung Lee. Vita. Includes bibliographical references (pages 61-65). Thesis (M.S.)--School of Computing and Engineering, University of Missouri--Kansas City, 2014.
    The Linked Data movement aims to convert unstructured and semi-structured data in documents into semantically connected documents called the "web of data." It is based on the Resource Description Framework (RDF), which represents semantic data; a collection of RDF statements forms an RDF graph. SPARQL is a query language designed specifically to query RDF data. Linked Data faces the same challenges that Big Data does, and together they represent a new paradigm of massive amounts of data in connected form. Utilizing Linked Data and Big Data continues to be in high demand, so a scalable, accessible query system is needed for the reusability and availability of existing web data. However, existing SPARQL query systems are not sufficiently scalable for Big Data and Linked Data. In this thesis, we address how to improve the scalability and performance of query processing over Big Data / Linked Data. Our aim is to evaluate presently available SPARQL query engines and to develop an effective, scalable model for querying RDF data with reasoning capabilities. We designed an efficient, distributed SPARQL engine using MapReduce (parallel, distributed processing of large data sets on a cluster) and the Apache Cassandra database (a scalable, highly available peer-to-peer distributed database system). We evaluated the existing in-memory ARQ engine provided by the Jena framework and found that it cannot handle large datasets, as it relies entirely on the in-memory capacity of the system. It was shown that the proposed model had powerful reasoning capabilities and dealt efficiently with big datasets. Abstract -- Illustrations -- Tables -- Introduction -- Background and related work -- Graph-store based SPARQL model -- Graph-store based SPARQL model implementation -- Results and evaluation -- Conclusion and future work -- References
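    The map/reduce join idea behind such an engine can be sketched on a single machine: each triple pattern is matched in a "map" phase, bindings are keyed on the shared variable, and a "reduce" phase joins them. The data and helper functions below are hypothetical; the actual storage in Cassandra and the real MapReduce jobs are elided.

```python
from collections import defaultdict

# Hypothetical RDF data.
triples = [
    ("alice", "advises", "bob"),
    ("bob", "type", "Student"),
    ("carol", "type", "Student"),
]

def match(pattern):
    # Map phase: emit a variable binding for each triple that
    # matches the pattern (terms starting with "?" are variables).
    for triple in triples:
        binding, ok = {}, True
        for term, val in zip(pattern, triple):
            if term.startswith("?"):
                binding[term] = val
            elif term != val:
                ok = False
        if ok:
            yield binding

def join(left, right, var):
    # Reduce phase: group left bindings by the shared variable,
    # then merge each compatible pair of bindings.
    buckets = defaultdict(list)
    for b in left:
        buckets[b[var]].append(b)
    for b in right:
        for other in buckets.get(b[var], []):
            yield {**other, **b}

# SPARQL-style query: ?x advises ?y . ?y type Student
results = list(join(match(("?x", "advises", "?y")),
                    match(("?y", "type", "Student")), "?y"))
```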

    Towards a Reference Architecture with Modular Design for Large-scale Genotyping and Phenotyping Data Analysis: A Case Study with Image Data

    With the rapid advancement of computing technologies, various scientific research communities have been extensively using cloud-based software tools and applications. Cloud-based applications let users access software from a web browser, relieving them of installing any software in their desktop environment. For example, Galaxy, GenAP, and iPlant Collaborative are popular cloud-based systems for scientific workflow analysis in the domain of plant Genotyping and Phenotyping. These systems are used for conducting research, devising new techniques, and sharing computer-assisted analysis results among collaborators. Researchers need to integrate their new workflows/pipelines, tools, or techniques with the base system over time. Moreover, large-scale data must be processed within a timeline for effective analysis. Recently, Big Data technologies have emerged that facilitate large-scale data processing on commodity hardware; among the above-mentioned systems, GenAP uses Big Data technologies for specific cases only. The structure of such a cloud-based system is highly variable and complex in nature, and software architects and developers must consider quite different properties and challenges during development and maintenance compared to traditional business/service-oriented systems. Recent studies report that software engineers and data engineers face challenges in developing analytic tools that support large-scale, heterogeneous data analysis. Unfortunately, software researchers have given little focus to devising a well-defined methodology and frameworks for the flexible design of cloud systems in the Genotyping and Phenotyping domain. To that end, more effective design methodologies and frameworks are urgently needed for the development of cloud-based Genotyping and Phenotyping analysis systems that also support large-scale data processing.
    In this thesis, we conduct several studies in order to devise a stable reference architecture and modularity model for software developers and data engineers in the Genotyping and Phenotyping domain. In the first study, we analyze the architectural changes of existing candidate systems to identify stability issues. We then extract architectural patterns from the candidate systems and propose a conceptual reference architectural model. Finally, we present a case study on the modularity of computation-intensive tasks as an extension of data-centric development. We show that the data-centric modularity model is at the core of the flexible development of a Genotyping and Phenotyping analysis system. Our proposed model and a case study with thousands of images provide a useful knowledge base for software researchers, developers, and data engineers developing cloud-based Genotyping and Phenotyping analysis systems.

    Selected Problems in Data Driven and Traffic Related Networks

    Our research concentrates on networks. Networks have been studied extensively over the last few decades, and the topic is still gaining popularity. In this thesis we study the challenge of gaining an understanding of a network when information about it is unknown or limited in some way. We first consider the challenge of determining, from a vast amount of information, what can be used to provide insight into the behaviour of the network; for this we consider methods and techniques adopted from the social network analysis (SNA) community. We then consider networks whose data is limited in some way and demonstrate that statistical analysis methods can overcome these limitations. Finally, we consider the challenge of having access to increasingly little information about the network, and we illustrate this difficulty through the rendezvous problem in a restricted network.

    BIM-Based Life Cycle Sustainability Assessment for Buildings

    In recent years, the progress of digitization in the architecture and construction sectors has produced enormous advances in the automation of analysis and evaluation processes. This is the case for environmental analysis systems such as life cycle assessment. Practitioners of the methodology have found a fundamental ally in building information modeling (BIM) platforms, which allow tasks that conventionally consume large amounts of energy and time to be carried out more automatically and efficiently. In this publication, the reader will find some of the latest advances in this area.

    An approach to the semantic intelligence cloud

    Cloud computing is a disruptive technology that aims to provide a utility approach to computing, where users can obtain the computing resources they require without investing in infrastructure, computing platforms, or services. Cloud computing resources can be obtained from a number of internal or external sources. The heterogeneity of cloud service provision makes comparing services difficult, with further complexity introduced by provision approaches such as reserved purchase, on-demand provisioning, and spot markets. The aim of the research was to develop a semantic framework for cloud computing services that incorporates Cloud Service Agreements, requirements, pricing, and Benefits Management. The proposed approach develops an integrated framework in which Cloud Service Agreements describe the relationship between cloud service providers and cloud service users. Requirements are developed from agreements and can use the concepts, relationships, and assertions the agreements provide. Pricing, in turn, is established from requirements, and Benefits Management is pervasive across the developed semantic framework. The method was first to review the literature comprehensively to establish a sound theoretical basis for the research. A problem-solving ontology was then developed that defined the concepts and relationships of the proposed semantic framework, a number of case studies were used to populate the ontology with assertions, and reasoning was used to test that the framework was correct. The results are a proposed framework of concepts, relationships, and assertions for cloud service descriptions, presented as an ontology in textual and graphical form. Several parts of the ontology were published on public ontology platforms and in journal and conference papers. The original contribution to knowledge lies in these results.
    The proposed framework provides the foundations for the development of a unified semantic framework for cloud computing service description and has been used by other researchers developing semantic cloud service descriptions. In the area of Cloud Service Agreements, full coverage of the documents described by the major standards organisations has been encoded into the framework. Requirements have been modelled as a unique multilevel semantic representation. Pricing of cloud services has been given a semantic description that can be mapped to requirements, and the existing Benefits Management approach has been reimplemented using semantic description. In conclusion, a framework has been developed that allows the semantic description of cloud computing services. This approach provides greater expressive power than simplistic frameworks that use mathematical formulas or models with simple relationships between concepts. The proposed framework is limited to a narrow area of service description and would require expansion to be viable in a commercial setting. Further work includes the development of software toolsets based on the semantic description to realise a viable product for mapping high-level cloud service requirements to low-level cloud resources.
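    The layering described above, with requirements built from agreement concepts and pricing established from requirements, can be sketched as a tiny consistency check of the kind the reasoning step performs. All names below are hypothetical illustrations, not the thesis ontology.

```python
# Hypothetical agreement-layer concepts.
agreement_concepts = {"ComputeService", "StorageService"}

# Requirements refine agreement concepts (requirement -> concept).
requirements = {
    "req_vm_4core": "ComputeService",
    "req_100gb": "StorageService",
}

# Pricing is established from requirements (price entry -> requirement).
pricing = {
    "price_vm": "req_vm_4core",
    "price_storage": "req_100gb",
}

def consistent():
    # Every price must trace through a requirement back to a concept
    # defined in the agreement layer.
    return all(requirements.get(pricing[p]) in agreement_concepts
               for p in pricing)

ok = consistent()
```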