1,140 research outputs found
Explain3D: Explaining Disagreements in Disjoint Datasets
Data plays an important role in applications, analytic processes, and many
aspects of human activity. As data grows in size and complexity, we are met
with an imperative need for tools that promote understanding and explanations
over data-related operations. Data management research on explanations has
focused on the assumption that data resides in a single dataset, under one
common schema. But the reality of today's data is that it is frequently
un-integrated, coming from different sources with different schemas. When
different datasets provide different answers to semantically similar questions,
understanding the reasons for the discrepancies is challenging and cannot be
handled by the existing single-dataset solutions.
In this paper, we propose Explain3D, a framework for explaining the
disagreements across disjoint datasets (3D). Explain3D focuses on identifying
the reasons for the differences in the results of two semantically similar
queries operating on two datasets with potentially different schemas. Our
framework leverages the queries to perform a semantic mapping across the
relevant parts of their provenance; discrepancies in this mapping point to
causes of the queries' differences. Exploiting the queries gives Explain3D an
edge over traditional schema matching and record linkage techniques, which are
query-agnostic. Our work makes the following contributions: (1) We formalize
the problem of deriving optimal explanations for the differences of the results
of semantically similar queries over disjoint datasets. (2) We design a 3-stage
framework for solving the optimal explanation problem. (3) We develop a
smart-partitioning optimizer that improves the efficiency of the framework by
orders of magnitude. (4) We experiment with real-world and synthetic data to
demonstrate that Explain3D can derive precise explanations efficiently.
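The idea of comparing the provenance of two semantically similar queries can be illustrated with a toy sketch. This is not the Explain3D algorithm: the datasets, schemas, helper names, and the case-normalized key used as the "semantic mapping" are all hypothetical, chosen only to show how unmatched provenance tuples point to the cause of differing results.

```python
from collections import Counter

# Hypothetical toy data: the same courses described under two different schemas.
dataset_a = [
    {"course_id": "CS101", "dept": "CS"},
    {"course_id": "CS102", "dept": "CS"},
    {"course_id": "MA201", "dept": "Math"},
]
dataset_b = [
    {"code": "cs101", "department": "CS"},
    {"code": "ma201", "department": "Math"},
    {"code": "ma202", "department": "Math"},
]

def query_a(rows):
    """Count courses per department in source A and return the contributing tuples."""
    provenance = [(r["course_id"].upper(), r["dept"]) for r in rows]
    return Counter(dept for _, dept in provenance), provenance

def query_b(rows):
    """Same question against source B, whose schema uses different column names."""
    provenance = [(r["code"].upper(), r["department"]) for r in rows]
    return Counter(dept for _, dept in provenance), provenance

def explain_differences(prov_a, prov_b):
    """Assumed mapping: tuples match when their normalized (id, department) pairs are equal."""
    only_a = sorted(set(prov_a) - set(prov_b))
    only_b = sorted(set(prov_b) - set(prov_a))
    return only_a, only_b

counts_a, prov_a = query_a(dataset_a)
counts_b, prov_b = query_b(dataset_b)
only_a, only_b = explain_differences(prov_a, prov_b)
print("Result A:", dict(counts_a))
print("Result B:", dict(counts_b))
print("Only in A:", only_a)
print("Only in B:", only_b)
```

In this sketch the two queries disagree on both department counts, and the unmatched tuples (one course present only in A, one only in B) account for the disagreement.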
Software Development of Automatic Data Collector for Bus Route Planning System
Public transportation is an important issue in Taiwan. Recently, a mobile application named Bus Route Planning was developed to help users get information about public transportation by bus. However, this application often gave users inaccurate bus information, and its GUI was unattractive. Overcoming these two problems requires two solutions. First, a more accurate time prediction algorithm is needed to predict bus arrival times. Second, augmented reality technology can be used to improve the GUI. In this research, an Automatic Data Collector system was proposed to support both solutions at once. The proposed system has three main functions. First, a data collector function provides data sets that can be further analyzed as a basis for the time prediction algorithm. Second, a data updater function provides the most up-to-date bus information for use in the augmented reality system. Third, a data management function gives the system better functionality to support those two related systems. The proposed Automatic Data Collector system was developed using a batch data processing scenario and native SQL queries in the Java programming language. Test results showed that this data processing scenario was very effective for database manipulation, especially for large data sets.
CPAS/CCM experiences: Perspectives for AI/ES research in accounting
Challenges for MapReduce in Big Data
In the Big Data community, MapReduce has been seen as one of the key enabling approaches for meeting continuously increasing demands on computing resources imposed by massive data sets. The reason for this is the high scalability of the MapReduce paradigm, which allows for massively parallel and distributed execution over a large number of computing nodes. This paper identifies MapReduce issues and challenges in handling Big Data with the objective of providing an overview of the field, facilitating better planning and management of Big Data projects, and identifying opportunities for future research in this field. The identified challenges are grouped into four main categories corresponding to Big Data task types: data storage (relational databases and NoSQL stores), Big Data analytics (machine learning and interactive analytics), online processing, and security and privacy. Moreover, current efforts aimed at improving and extending MapReduce to address the identified challenges are presented. Consequently, by identifying the issues and challenges MapReduce faces when handling Big Data, this study encourages future Big Data research.
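For readers unfamiliar with the paradigm, the map, shuffle, and reduce phases can be sketched with the classic word-count example. This is a minimal single-process illustration, not taken from the paper: the function names are hypothetical, and a real MapReduce deployment would run the map and reduce tasks in parallel across many nodes.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key so each key reaches one reducer."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values collected for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    """Run the three phases sequentially; each map call is independent of the others."""
    mapped = chain.from_iterable(map_phase(d) for d in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

counts = word_count(["big data needs big clusters", "data moves to clusters"])
print(counts)
```

Because each `map_phase` call depends only on its own document and each `reduce_phase` call only on its own key, both phases can be distributed, which is the source of the scalability the abstract refers to.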
- …