6,137 research outputs found
Dynamic set kNN self-join
In many applications, data objects can be represented as sets. For example, in video on-demand and social network services, the user data consists of a set of movies that have been watched and a set of users (friends), respectively, and they can be used for recommendation and information extraction. The problem of set similarity self-join hence has been studied extensively. Existing studies assume that sets are static, but in the above applications, sets are dynamically updated, and this requires continuous updating the join result. In this paper, we study a novel problem, dynamic set kNN self-join, i.e., for each set, we continuously compute its k nearest neighbor sets. Our problem poses a challenge for the efficiency of computation, because just an element insertion (deletion) into (from) a set may affect the kNN results of many sets. To address this challenge, we first investigate the property of the dynamic set kNN self-join problem to observe the search space derived from a set update. Then, based on this observation, we propose an efficient algorithm. This algorithm employs an indexing technique that enables incremental similarity computation and prunes unnecessary similarity computation. Our empirical studies using real datasets show the efficiency and scalability of our algorithm.Amagata D., Hara T., Xiao C.. Dynamic set kNN self-join. Proceedings - International Conference on Data Engineering 2019-April, 818 (2019); https://doi.org/10.1109/ICDE.2019.00078
When Things Matter: A Data-Centric View of the Internet of Things
With the recent advances in radio-frequency identification (RFID), low-cost
wireless sensor devices, and Web technologies, the Internet of Things (IoT)
approach has gained momentum in connecting everyday objects to the Internet and
facilitating machine-to-human and machine-to-machine communication with the
physical world. While IoT offers the capability to connect and integrate both
digital and physical entities, enabling a whole new class of applications and
services, several significant challenges need to be addressed before these
applications and services can be fully realized. A fundamental challenge
centers around managing IoT data, typically produced in dynamic and volatile
environments, which is not only extremely large in scale and volume, but also
noisy, and continuous. This article surveys the main techniques and
state-of-the-art research efforts in IoT from data-centric perspectives,
including data stream processing, data storage models, complex event
processing, and searching in IoT. Open research issues for IoT data management
are also discussed
The Family of MapReduce and Large Scale Data Processing Systems
In the last two decades, the continuous increase of computational power has
produced an overwhelming flow of data which has called for a paradigm shift in
the computing architecture and large scale data processing mechanisms.
MapReduce is a simple and powerful programming model that enables easy
development of scalable parallel applications to process vast amounts of data
on large clusters of commodity machines. It isolates the application from the
details of running a distributed program such as issues on data distribution,
scheduling and fault tolerance. However, the original implementation of the
MapReduce framework had some limitations that have been tackled by many
research efforts in several followup works after its introduction. This article
provides a comprehensive survey for a family of approaches and mechanisms of
large scale data processing mechanisms that have been implemented based on the
original idea of the MapReduce framework and are currently gaining a lot of
momentum in both research and industrial communities. We also cover a set of
introduced systems that have been implemented to provide declarative
programming interfaces on top of the MapReduce framework. In addition, we
review several large scale data processing systems that resemble some of the
ideas of the MapReduce framework for different purposes and application
scenarios. Finally, we discuss some of the future research directions for
implementing the next generation of MapReduce-like solutions.Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author
Collusion in Peer-to-Peer Systems
Peer-to-peer systems have reached a widespread use, ranging from academic and industrial applications to home entertainment. The key advantage of this paradigm lies in its scalability and flexibility, consequences of the participants sharing their resources for the common welfare. Security in such systems is a desirable goal. For example, when mission-critical operations or bank transactions are involved, their effectiveness strongly depends on the perception that users have about the system dependability and trustworthiness. A major threat to the security of these systems is the phenomenon of collusion. Peers can be selfish colluders, when they try to fool the system to gain unfair advantages over other peers, or malicious, when their purpose is to subvert the system or disturb other users. The problem, however, has received so far only a marginal attention by the research community. While several solutions exist to counter attacks in peer-to-peer systems, very few of them are meant to directly counter colluders and their attacks. Reputation, micro-payments, and concepts of game theory are currently used as the main means to obtain fairness in the usage of the resources. Our goal is to provide an overview of the topic by examining the key issues involved. We measure the relevance of the problem in the current literature and the effectiveness of existing philosophies against it, to suggest fruitful directions in the further development of the field
- …