
    Explain3D: Explaining Disagreements in Disjoint Datasets

    Data plays an important role in applications, analytic processes, and many aspects of human activity. As data grows in size and complexity, there is a pressing need for tools that promote understanding and explanation of data-related operations. Data management research on explanations has assumed that data resides in a single dataset under one common schema. In practice, however, today's data is frequently unintegrated, coming from different sources with different schemas. When different datasets give different answers to semantically similar questions, understanding the reasons for the discrepancies is challenging and cannot be handled by existing single-dataset solutions. In this paper, we propose Explain3D, a framework for explaining disagreements across disjoint datasets (3D). Explain3D focuses on identifying the reasons for differences in the results of two semantically similar queries operating on two datasets with potentially different schemas. Our framework leverages the queries to perform a semantic mapping across the relevant parts of their provenance; discrepancies in this mapping point to causes of the queries' differences. Exploiting the queries gives Explain3D an edge over traditional schema matching and record linkage techniques, which are query-agnostic. Our work makes the following contributions: (1) We formalize the problem of deriving optimal explanations for the differences in the results of semantically similar queries over disjoint datasets. (2) We design a 3-stage framework for solving the optimal explanation problem. (3) We develop a smart-partitioning optimizer that improves the efficiency of the framework by orders of magnitude. (4) We experiment with real-world and synthetic data to demonstrate that Explain3D can derive precise explanations efficiently.
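    The idea that "discrepancies in this mapping point to causes of the queries' differences" can be illustrated with a toy example. This is a hypothetical sketch, not Explain3D's algorithm. It assumes the provenance of each query is already available as a list of tuples, and it assumes a single key attribute in each schema that identifies the same entity. Exact key matching stands in for the paper's query-driven semantic mapping, and no optimal explanation is computed. The function name, datasets, and attribute names are all invented for illustration.

```python
# Hypothetical toy: exact key matching stands in for Explain3D's semantic
# mapping; it only lists candidate causes and does not optimize anything.


def explain_sum_difference(prov_a, prov_b, key_a, key_b, val_a, val_b):
    """Explain why SUM(val_a) over prov_a differs from SUM(val_b) over prov_b.

    prov_a, prov_b: provenance tuples (dicts) of the two queries.
    key_a, key_b: attributes assumed to identify the same real-world entity.
    val_a, val_b: attributes that each query aggregates.
    """
    index_a = {row[key_a]: row[val_a] for row in prov_a}
    index_b = {row[key_b]: row[val_b] for row in prov_b}
    shared = index_a.keys() & index_b.keys()
    return {
        # Provenance tuples with no counterpart in the other dataset.
        "only_in_a": sorted(k for k in index_a if k not in index_b),
        "only_in_b": sorted(k for k in index_b if k not in index_a),
        # Matched tuples whose aggregated values disagree.
        "value_mismatch": sorted(
            (k, index_a[k], index_b[k]) for k in shared if index_a[k] != index_b[k]
        ),
    }


# Two hypothetical course catalogs with different schemas.
catalog_a = [
    {"course_id": "CS101", "credits": 4},
    {"course_id": "CS102", "credits": 3},
    {"course_id": "CS200", "credits": 4},
]
catalog_b = [
    {"code": "CS101", "units": 4},
    {"code": "CS102", "units": 4},
    {"code": "CS300", "units": 2},
]
print(explain_sum_difference(catalog_a, catalog_b, "course_id", "code", "credits", "units"))
# {'only_in_a': ['CS200'], 'only_in_b': ['CS300'], 'value_mismatch': [('CS102', 3, 4)]}
```

    The three findings fully account for the gap between the totals (11 vs. 10). CS200 contributes +4 and CS300 contributes -2. The CS102 mismatch contributes 3 - 4 = -1. Together these give 4 - 2 - 1 = 1.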

    Software Development of Automatic Data Collector for Bus Route Planning System

    Public transportation is an important issue in Taiwan. Recently, a mobile application named Bus Route Planning was developed to give users information about bus transportation. However, this application often gave users inaccurate bus information, and its GUI was unattractive. Two solutions were needed to overcome these problems. First, a more accurate algorithm is needed to predict bus arrival times. Second, augmented reality technology can be used to improve the GUI. In this research, an Automatic Data Collector system is proposed to support both solutions at once. The proposed system has three main functions. First, a data collector function provides data sets that can be analyzed further as a basis for the time prediction algorithm. Second, a data updater function provides the most up-to-date bus information for use in the augmented reality system. Third, a data management function gives the system the functionality it needs to support those two related systems. The proposed Automatic Data Collector system was developed in Java using a batch data processing scenario and native SQL queries. Testing showed that this data processing scenario was very effective for database manipulation, especially with large data sets.
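    The "batch data processing scenario" can be pictured as grouping many collected rows and writing each group with one plain SQL statement inside one transaction. The original system was written in Java, where JDBC's `PreparedStatement.addBatch()` and `executeBatch()` play this role. The sketch below swaps in Python's built-in `sqlite3` module to keep it self-contained. The `bus_arrival` table, its columns, and the batch size are hypothetical and do not come from the paper.

```python
import sqlite3

# Hypothetical table for collected bus data; not the paper's schema.
SCHEMA = "CREATE TABLE bus_arrival (route TEXT, stop_id TEXT, eta_seconds INTEGER)"
INSERT_SQL = "INSERT INTO bus_arrival (route, stop_id, eta_seconds) VALUES (?, ?, ?)"


def load_in_batches(conn, records, batch_size=500):
    """Insert records with one plain-SQL statement per batch.

    Each batch runs in its own transaction, so a failure rolls back only
    the current batch. Returns the number of rows inserted.
    """
    inserted = 0
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        with conn:  # commit on success, roll back on exception
            conn.executemany(INSERT_SQL, batch)
        inserted += len(batch)
    return inserted


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
rows = [("307", f"S{i:04d}", 60 * (i % 30)) for i in range(1200)]
print(load_in_batches(conn, rows))  # 1200, written as batches of 500, 500, 200
print(conn.execute("SELECT COUNT(*) FROM bus_arrival").fetchone()[0])  # 1200
```

    Batching reduces the per-statement and per-commit overhead. That overhead dominates when rows are written one at a time, which is why this approach tends to pay off most on large data sets.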

    CPAS/CCM experiences: Perspectives for AI/ES research in accounting


    Challenges for MapReduce in Big Data

    In the Big Data community, MapReduce has been seen as one of the key enabling approaches for meeting the continuously increasing demands on computing resources imposed by massive data sets. The reason is the high scalability of the MapReduce paradigm, which allows massively parallel and distributed execution over a large number of computing nodes. This paper identifies MapReduce issues and challenges in handling Big Data. Its objectives are to provide an overview of the field, to facilitate better planning and management of Big Data projects, and to identify opportunities for future research. The identified challenges are grouped into four main categories corresponding to Big Data task types: data storage (relational databases and NoSQL stores), Big Data analytics (machine learning and interactive analytics), online processing, and security and privacy. Current efforts to improve and extend MapReduce to address the identified challenges are also presented. By identifying the issues and challenges MapReduce faces when handling Big Data, this study aims to encourage future Big Data research.
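    The scalability claim rests on the paradigm's structure. A map function processes each input split independently, a shuffle groups intermediate pairs by key, and a reduce function processes each key group independently. Because of this independence, both phases can be spread across many nodes. The sketch below is a single-process illustration using the classic word-count example. It is not Hadoop or any other framework's API, and every function name in it is hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit an intermediate (word, 1) pair for every word."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)


def map_reduce(documents):
    # In a real cluster, map tasks run in parallel on input splits and
    # reduce tasks run in parallel on key partitions; here both are sequential.
    intermediate = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())


print(map_reduce(["Big data big", "data tasks"]))
# {'big': 2, 'data': 2, 'tasks': 1}
```

    The sketch also hints at some of the categories the paper discusses. The shuffle is an all-to-all data exchange, and each job reads its input and writes its output in batches. Both of these costs matter for iterative machine learning and online processing.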