2 research outputs found

    Leveraging Structural and Semantic Measures for JSON Document Clustering

    In recent years, the increased use of smart devices and digital business opportunities has generated massive amounts of heterogeneous JSON data daily, making efficient data storage and management more difficult. Existing research applies various similarity metrics and clusters the documents to support these tasks effectively. However, extant approaches have focused on either the structural or the semantic similarity of schemas. As JSON documents are application-specific, differently annotated JSON schemas are not only structurally heterogeneous but also differ in the context of their JSON attributes. Therefore, the structural, semantic, and contextual properties of JSON schemas must all be considered to perform meaningful clustering of JSON documents. This work proposes an approach to cluster heterogeneous JSON documents using a similarity fusion method. The similarity fusion matrix is constructed from structural, semantic, and contextual measures of the JSON schemas. The experimental results demonstrate that the proposed approach significantly outperforms existing approaches.
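
    The abstract does not spell out how the three measures are combined or which clustering algorithm is used. As a rough, hypothetical illustration of a similarity-fusion pipeline, the sketch below fuses three toy pairwise similarity matrices with a weighted sum and then clusters the documents with spectral clustering on the fused matrix; the matrices, weights, and choice of algorithm are assumptions for the example, not the paper's actual method.

    # Illustrative sketch only: the structural/semantic/contextual matrices and the
    # weighted-sum fusion are hypothetical stand-ins, not the paper's definitions.
    import numpy as np
    from sklearn.cluster import SpectralClustering

    def fuse_similarities(S_struct, S_sem, S_ctx, weights=(1/3, 1/3, 1/3)):
        """Combine three pairwise similarity matrices into one fusion matrix
        via a weighted sum (one simple fusion strategy among many)."""
        w1, w2, w3 = weights
        return w1 * S_struct + w2 * S_sem + w3 * S_ctx

    # Toy pairwise similarities for 4 JSON schemas (values in [0, 1]).
    S_struct = np.array([[1.0, 0.8, 0.2, 0.1],
                         [0.8, 1.0, 0.3, 0.2],
                         [0.2, 0.3, 1.0, 0.9],
                         [0.1, 0.2, 0.9, 1.0]])
    S_sem = np.array([[1.0, 0.7, 0.3, 0.2],
                      [0.7, 1.0, 0.2, 0.3],
                      [0.3, 0.2, 1.0, 0.8],
                      [0.2, 0.3, 0.8, 1.0]])
    S_ctx = np.array([[1.0, 0.9, 0.1, 0.2],
                      [0.9, 1.0, 0.2, 0.1],
                      [0.1, 0.2, 1.0, 0.7],
                      [0.2, 0.1, 0.7, 1.0]])

    S_fused = fuse_similarities(S_struct, S_sem, S_ctx)

    # Cluster documents using the fused matrix as a precomputed affinity.
    labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                                random_state=0).fit_predict(S_fused)
    print(labels)  # e.g. [0 0 1 1]: schemas 1-2 and 3-4 form two clusters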

    A theoretical exploration of data management and integration in organisation sectors

    Big data development is a disruptive issue that will affect enterprises across various sectors. The growing volume of data, the high speed of data generation, and the increasing variety of data from heterogeneous sources have made data management difficult. This paper first reviews different aspects of big data management, including data integration and the traditional data warehouse, and their associated challenges. The problems include the growth of redundant data, data accessibility, and the time consumed in data modelling and in moving data from heterogeneous sources into a central database, especially in a big data environment. We then propose a logical data management approach using RESTview technology to integrate and analyse data without fully adopting traditional ETL processes. Data that cannot be copied or moved for governance, corporate, security or other restriction reasons can easily be accessed, integrated and analysed without creating a central repository. Data can be kept in its original form and location, eliminating data movement, significantly speeding up the process and allowing for live data interrogation. It may not be a practical solution for every situation, but it is a feasible and comparatively cost-effective one.
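
    The abstract does not describe the RESTview interface itself, so the sketch below is only a generic illustration of the logical (virtual) integration idea it describes: data stays at its source systems, is pulled on demand over REST, and is joined and analysed in memory rather than being loaded into a central warehouse via ETL. The endpoint URLs, field names, and use of pandas are assumptions made for the example, not the RESTview API.

    # Hypothetical sketch of logical data integration: access data in place over
    # REST and integrate it at query time, with no central copy of the sources.
    import pandas as pd
    import requests

    SOURCES = {
        "sales":     "https://erp.example.com/api/sales",      # assumed endpoint
        "customers": "https://crm.example.com/api/customers",  # assumed endpoint
    }

    def fetch(name: str) -> pd.DataFrame:
        """Read one source live over REST; nothing is copied into a central store."""
        resp = requests.get(SOURCES[name], timeout=30)
        resp.raise_for_status()
        return pd.DataFrame(resp.json())

    # Integrate on demand: join the live views on a shared key and analyse directly.
    sales = fetch("sales")
    customers = fetch("customers")
    report = (sales.merge(customers, on="customer_id", how="left")
                   .groupby("region")["amount"].sum())
    print(report)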