2 research outputs found

    Leveraging Structural and Semantic Measures for JSON Document Clustering

    In recent years, the increased use of smart devices and digital business opportunities has generated massive amounts of heterogeneous JSON data daily, making efficient data storage and management more difficult. Existing research applies various similarity metrics and clusters the documents to support these tasks effectively. However, extant approaches have focused on either the structural or the semantic similarity of schemas. As JSON documents are application-specific, differently annotated JSON schemas are not only structurally heterogeneous but also differ in the context of their JSON attributes. Therefore, the structural, semantic, and contextual properties of JSON schemas must all be considered to perform meaningful clustering of JSON documents. This work proposes an approach to cluster heterogeneous JSON documents using a similarity fusion method. The similarity fusion matrix is constructed from structural, semantic, and contextual measures of the JSON schemas. The experimental results demonstrate that the proposed approach significantly outperforms existing approaches.
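
    The abstract does not spell out how the three measures are combined or which clustering algorithm is used. As a rough, hypothetical illustration of a similarity-fusion pipeline, the sketch below fuses three toy pairwise similarity matrices with a weighted sum and then clusters the documents with spectral clustering on the fused matrix; the matrices, weights, and choice of algorithm are assumptions for the example, not the paper's actual method.

    # Illustrative sketch only: the structural/semantic/contextual matrices and the
    # weighted-sum fusion are hypothetical stand-ins, not the paper's definitions.
    import numpy as np
    from sklearn.cluster import SpectralClustering

    def fuse_similarities(S_struct, S_sem, S_ctx, weights=(1/3, 1/3, 1/3)):
        """Combine three pairwise similarity matrices into one fusion matrix
        via a weighted sum (one simple fusion strategy among many)."""
        w1, w2, w3 = weights
        return w1 * S_struct + w2 * S_sem + w3 * S_ctx

    # Toy pairwise similarities for 4 JSON schemas (values in [0, 1]).
    S_struct = np.array([[1.0, 0.8, 0.2, 0.1],
                         [0.8, 1.0, 0.3, 0.2],
                         [0.2, 0.3, 1.0, 0.9],
                         [0.1, 0.2, 0.9, 1.0]])
    S_sem = np.array([[1.0, 0.7, 0.3, 0.2],
                      [0.7, 1.0, 0.2, 0.3],
                      [0.3, 0.2, 1.0, 0.8],
                      [0.2, 0.3, 0.8, 1.0]])
    S_ctx = np.array([[1.0, 0.9, 0.1, 0.2],
                      [0.9, 1.0, 0.2, 0.1],
                      [0.1, 0.2, 1.0, 0.7],
                      [0.2, 0.1, 0.7, 1.0]])

    S_fused = fuse_similarities(S_struct, S_sem, S_ctx)

    # Cluster documents using the fused matrix as a precomputed affinity.
    labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                                random_state=0).fit_predict(S_fused)
    print(labels)  # e.g. [0 0 1 1]: schemas 1-2 and 3-4 form two clusters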

    A theoretical exploration of data management and integration in organisation sectors

    Big data development is a disruptive issue that will affect enterprises across various sectors. The growing volume of data, the high speed of data generation, and the increasing variety of data from heterogeneous sources have made data management difficult. This paper first reviews different aspects of big data management, including data integration and the traditional data warehouse, and their associated challenges. The problems include the growth of redundant data, data accessibility, and the time consumed in data modelling and in moving data from heterogeneous sources into a central database, especially in a big data environment. We then propose a logical data management approach using RESTview technology to integrate and analyse data without fully adopting traditional ETL processes. Data that cannot be copied or moved for governance, corporate, security or other restriction reasons can easily be accessed, integrated and analysed without creating a central repository. Data can be kept in its original form and location, eliminating data movement, significantly speeding up the process and allowing for live data interrogation. It may not be a practical solution for every situation, but it is a feasible and comparatively cost-effective one.
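
    The abstract does not describe the RESTview interface itself, so the sketch below is only a generic illustration of the logical (virtual) integration idea it describes: data stays at its source systems, is pulled on demand over REST, and is joined and analysed in memory rather than being loaded into a central warehouse via ETL. The endpoint URLs, field names, and use of pandas are assumptions made for the example, not the RESTview API.

    # Hypothetical sketch of logical data integration: access data in place over
    # REST and integrate it at query time, with no central copy of the sources.
    import pandas as pd
    import requests

    SOURCES = {
        "sales":     "https://erp.example.com/api/sales",      # assumed endpoint
        "customers": "https://crm.example.com/api/customers",  # assumed endpoint
    }

    def fetch(name: str) -> pd.DataFrame:
        """Read one source live over REST; nothing is copied into a central store."""
        resp = requests.get(SOURCES[name], timeout=30)
        resp.raise_for_status()
        return pd.DataFrame(resp.json())

    # Integrate on demand: join the live views on a shared key and analyse directly.
    sales = fetch("sales")
    customers = fetch("customers")
    report = (sales.merge(customers, on="customer_id", how="left")
                   .groupby("region")["amount"].sum())
    print(report)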