12 research outputs found

    Incremental, online, and merge mining of partial periodic patterns in time-series databases

    No full text

    TrueWeb: A Proposal for Scalable Semantically-Guided Data Management and Truth Finding in Heterogeneous Web Sources

    No full text
    We envision a responsible web environment, termed TrueWeb, where a user should be able to find out whether any sentence he or she encounters in the web is true or false. The user should be able to track the provenance of any sentence or paragraph in the web. The target of TrueWeb is to compose factual knowledge from Internet resources about any subject of interest and present the collected knowledge in chronological order and distribute facts spatially and temporally as well as assign some belief factor for each fact. Another important target of TrueWeb is to be able to identify whether a statement in the Internet is true or false. The aim is to create an Internet infrastructure that, for each piece of published information, will be able to identify the truthfulness (or the degree of truthfulness) of that piece of information. © 2018 ACM

    Efficient Parallel Skyline Query Processing for High-Dimensional Data

    No full text
    Given a set of multidimensional data points, skyline queries retrieve those points that are not dominated by any other points in the set. Due to the ubiquitous se of skyline queries, such as in preference-based query answering and decision making, and the large amount of data that these queries have to deal with, enabling their scalable processing is of critical importance. However, there are several outstanding challenges that have not been well addressed. More specifically, in this paper, we are tackling the data straggler and data skew challenges introduced by distributed skyline query processing, as well as the ensuing high computation cost of merging skyline candidates. We thus introduce a new efficient three-phase approach for large scale processing of skyline queries. In the first preprocessing phase, the data is partitioned along the Z-order curve. We utilize a novel data partitioning approach that formulates data partitioning as an optimization problem to minimize the size of intermediate data. In the second phase, each compute node partitions the input data points into disjoint subsets, and then performs the skyline computation on each subset to produce skyline candidates in parallel. In the final phase, we build an index and employ an efficient algorithm to merge the generated skyline candidates. Extensive experiments demonstrate that the proposed skyline algorithm achieves more than one order of magnitude enhancement in performance compared to existing state-of-the-art approaches.The authors would like to thank reviewers for their insightful comments on the paper, as these comments led us to an improvement of the work. This publication was made possible by award NPRP4-1534-1-247 from the Qatar National Research Fund (a member of The Qatar Foundation) and by the National Science Foundation Grant IIS 1117766

    ClassView : Hierarchical Video Shot Classification, Indexing, and Accessing

    No full text

    AUDIT: approving and tracking updates with dependencies in collaborative databases

    No full text
    Collaborative databases such as genome databases, often involve extensive curation activities where collaborators need to interact to be able to converge and agree on the content of data. In a typical scenario, a member of the collaboration makes some updates and these become visible to all collaborators for possible comments and modifications. At the same time, these updates are usually pending the approval or rejection from the data custodian based on the related discussion and the content of the data. Unfortunately, the approval and authorization of updates in current databases is based solely on the identity of the user, e.g., via the SQL GRANT and REVOKE commands. In this paper, we present a scalable cloud-based collaborative database system to support collaboration and data curation scenarios. Our system is based on an Update Pending Approval model. In a nutshell, when a collaborator updates a given data item, it is marked as pending approval until the data custodian approves or rejects the update. Until then, any other collaborator can view and comment on the data, pending its approval. We fully realized our system inside HBase, a cloud-based platform. We also conducted extensive experiments showing that the system scales well under different workloads. - 2017, Springer Science+Business Media, LLC.Acknowledgements This publication was made possible by the support of an NPRP Grant 4-1534-1-247 from the the Qatar National Research Fund (a member of Qatar Foundation) and the National Science Foundation under Grants IIS-1117766 and IIS-0964639. The statements made herein are solely the responsibility of the authors.Scopu

    Self-tuning UDF Cost Modeling Using the Memory-Limited Quadtree

    No full text

    COACT: a query interface language for collaborative databases

    No full text
    Data curation activities in collaborative databases mandate that collaborators interact until they converge and agree on the content of their data. In a previous work, we presented a cloud-based collaborative database system that promotes and enables collaboration and data curation scenarios. Our system classifies different versions of a data item to either pending, approved, or rejected. The approval or rejection of a certain version is done by the database Principle Investigators (or PIs) based on its value. Our system also allows collaborators to view the status of each version and help PIs take decisions by providing feedback based on their experiments and/or opinions. Most importantly, our system provided mechanisms for history tracking of different versions to trace the modifications and approval/rejection done by both collaborators and PIs on different versions of a data item. We labeled our system as Update-Pending-Approval model (or UPA). In this paper, we describe a high level SQL query interface language for PIs and collaborators to interact with the UPA framework. We define a set of UPA keywords that are used as a part of the history tracking mechanism to select specific versions of a data item, and a set of UPA options that select specific versions based on possible future decisions of PIs. We implemented our query interface mechanism on top of Apache Phoenix, taking into consideration that the UPA system was implemented on top of Apache HBase. We test the performance of the UPA query language by executing several queries that contain different complexity levels and discuss their results. - 2017, Springer Science+Business Media, LLC.This publication was made possible by the support of an NPRP Grant 4-1534-1-247 from the the Qatar National Research Fund (a member of Qatar Foundation) and the National Science Foundation under Grants IIS-1117766 and IIS-0964639. The statements made herein are solely the responsibility of the authors
    corecore