Efficient Distance Join Query Processing in Distributed Spatial Data Management Systems
Due to the ubiquitous use of spatial data applications and the large amounts of such data these applications use, the processing of large-scale distance joins in distributed systems is becoming increasingly popular. Distance Join Queries (DJQs) are important and frequently used operations in numerous applications, including data mining, multimedia and spatial databases. DJQs (e.g., k Nearest Neighbor Join Query, k Closest Pair Query, ε Distance Join Query, etc.) are costly operations, since they involve both the join and distance-based search, and performing DJQs efficiently is a challenging task. Recent Big Data developments have motivated the emergence of novel technologies for distributed processing of large-scale spatial data in clusters of computers, leading to Distributed Spatial Data Management Systems (DSDMSs). Distributed cluster-based computing systems can be classified as Hadoop-based or Spark-based systems. Based on this classification, in this paper, we compare two of the most recent and leading DSDMSs, SpatialHadoop and LocationSpark, by evaluating the performance of several existing and newly proposed parallel and distributed DJQ algorithms under various settings with large spatial real-world datasets. A general conclusion arising from the execution of the distributed DJQ algorithms studied is that, while SpatialHadoop is a robust and efficient system when large spatial datasets are joined (since it is built on top of the mature Hadoop platform), LocationSpark is the clear winner in total execution time efficiency when medium spatial datasets are combined (due to in-memory processing provided by Spark). However, LocationSpark requires higher memory allocation when large spatial datasets are involved in DJQs (even more so when k and ε are large). 
Finally, this detailed performance study has demonstrated that the new distributed DJQ algorithms we have proposed are efficient, robust, and scalable with respect to different parameters, such as dataset sizes, k, ε, and the number of computing nodes.
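For reference, the query types the abstract names can be illustrated with a naive, single-node sketch of the ε Distance Join and the k Closest Pair query. Function names and toy data here are our own; the paper's algorithms are parallel and distributed, not this brute-force baseline:

```python
from math import dist
from itertools import product

def epsilon_distance_join(r, s, eps):
    """All pairs (p, q), p in r and q in s, whose Euclidean
    distance is at most eps (brute-force baseline)."""
    return [(p, q) for p, q in product(r, s) if dist(p, q) <= eps]

def k_closest_pairs(r, s, k):
    """The k cross-set pairs with the smallest distances."""
    return sorted(product(r, s), key=lambda pq: dist(*pq))[:k]

r = [(0.0, 0.0), (2.0, 0.0)]
s = [(0.5, 0.0), (5.0, 5.0)]
print(epsilon_distance_join(r, s, 1.0))  # → [((0.0, 0.0), (0.5, 0.0))]
print(k_closest_pairs(r, s, 1))          # → [((0.0, 0.0), (0.5, 0.0))]
```

Both queries cost O(|r|·|s|) when done naively, which is why distributed systems partition the space and prune candidate pairs.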
Hedonic Coalition Formation for Distributed Task Allocation among Wireless Agents
Autonomous wireless agents such as unmanned aerial vehicles or mobile base
stations present a great potential for deployment in next-generation wireless
networks. While current literature has been mainly focused on the use of agents
within robotics or software applications, we propose a novel usage model for
self-organizing agents suited to wireless networks. In the proposed model, a
number of agents are required to collect data from several arbitrarily located
tasks. Each task represents a queue of packets that require collection and
subsequent wireless transmission by the agents to a central receiver. The
problem is modeled as a hedonic coalition formation game between the agents and
the tasks that interact in order to form disjoint coalitions. Each formed
coalition is modeled as a polling system consisting of a number of agents which
move between the different tasks present in the coalition, collect and transmit
the packets. Within each coalition, some agents can also take the role of a
relay for improving the packet success rate of the transmission. The proposed
algorithm allows the tasks and the agents to take distributed decisions to join
or leave a coalition, based on the achieved benefit in terms of effective
throughput, and the cost in terms of delay. As a result of these decisions, the
agents and tasks structure themselves into independent disjoint coalitions
which constitute a Nash-stable network partition. Moreover, the proposed
algorithm allows the agents and tasks to adapt the topology to environmental
changes such as the arrival/removal of tasks or the mobility of the tasks.
Simulation results show that the proposed algorithm improves the performance, in
terms of average player (agent or task) payoff, by at least 30.26% (for a
network of 5 agents with up to 25 tasks) relative to a scheme that allocates
nearby tasks equally among agents.
Comment: to appear, IEEE Transactions on Mobile Computing.
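The join/leave dynamics described above can be illustrated with a deliberately simplified hedonic game. The equal-sharing payoff below is a hypothetical stand-in for the paper's throughput/delay utility, and all names are ours:

```python
def hedonic_partition(agents, tasks, value):
    """Greedy join/leave dynamics for a toy hedonic game: an agent's
    payoff for serving task t is value[t] split equally among that
    task's coalition.  Agents switch while a unilateral move strictly
    improves their payoff; the loop ends at a Nash-stable partition
    of this simplified game."""
    assign = {a: tasks[0] for a in agents}   # everyone starts at task 0

    def payoff(a, t):
        others = sum(1 for x in agents if x != a and assign[x] == t)
        return value[t] / (others + 1)       # a's share if it serves t

    changed = True
    while changed:
        changed = False
        for a in agents:
            best = max(tasks, key=lambda t: payoff(a, t))
            if payoff(a, best) > payoff(a, assign[a]):
                assign[a], changed = best, True
    return assign

print(hedonic_partition(["A", "B", "C"], ["t0", "t1"], {"t0": 4, "t1": 2}))
# → {'A': 't1', 'B': 't0', 'C': 't0'}
```

Note how one agent leaves the crowded, high-value coalition for the low-value one once its per-member share there is higher, mirroring the distributed join/leave decisions in the abstract.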
Machine Learning in Wireless Sensor Networks: Algorithms, Strategies, and Applications
Wireless sensor networks monitor dynamic environments that change rapidly
over time. This dynamic behavior is either caused by external factors or
initiated by the system designers themselves. To adapt to such conditions,
sensor networks often adopt machine learning techniques to eliminate the need
for unnecessary redesign. Machine learning also inspires many practical
solutions that maximize resource utilization and prolong the lifespan of the
network. In this paper, we present an extensive literature review over the
period 2002-2013 of machine learning methods that were used to address common
issues in wireless sensor networks (WSNs). The advantages and disadvantages of
each proposed algorithm are evaluated against the corresponding problem. We
also provide a comparative guide to aid WSN designers in developing suitable
machine learning solutions for their specific application challenges.
Comment: Accepted for publication in IEEE Communications Surveys and Tutorials.
Science Concierge: A fast content-based recommendation system for scientific publications
Finding relevant publications is important for scientists who have to cope
with exponentially increasing numbers of scholarly material. Algorithms can
help with this task as they help for music, movie, and product recommendations.
However, we know little about the performance of these algorithms with
scholarly material. Here, we develop an algorithm, and an accompanying Python
library, that implements a recommendation system based on the content of
articles. Design principles are to adapt to new content, provide near-real time
suggestions, and be open source. We tested the library on 15K posters from the
Society for Neuroscience Conference 2015. Human-curated topics were used to
cross-validate the algorithm's parameters and produce a similarity metric that
maximally correlates with human judgments. We show that our algorithm
significantly outperformed suggestions based on keywords. The work presented
here promises to make the exploration of scholarly material faster and more
accurate.
Comment: 12 pages, 5 figures.
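A minimal content-based recommender in the spirit of the one described here can be sketched with TF-IDF weighting and cosine similarity. This is an illustrative stand-in on toy data, not the Science Concierge library's actual implementation:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Simple TF-IDF vectors for a list of tokenized documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df[t]) for t in df}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend(query_idx, docs, topn=1):
    """Indices of the topn documents most similar to docs[query_idx]."""
    vecs = tfidf_vectors(docs)
    scores = [(cosine(vecs[query_idx], v), i)
              for i, v in enumerate(vecs) if i != query_idx]
    return [i for _, i in sorted(scores, reverse=True)[:topn]]

corpus = [
    "spiking neurons cortex plasticity".split(),
    "cortex plasticity learning neurons".split(),
    "galaxy dark matter survey".split(),
]
print(recommend(0, corpus))  # → [1]
```

Because similarity is computed purely from article content, the approach adapts to new documents without usage history, matching the design principles stated in the abstract.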
Task mapping on a dragonfly supercomputer
The dragonfly network topology has recently gained traction in the design of high performance computing (HPC) systems and has been implemented in large-scale supercomputers. The impact of task mapping, i.e., placement of MPI ranks onto compute cores, on the communication performance of applications on dragonfly networks has not been comprehensively investigated on real large-scale systems. This paper demonstrates that task mapping affects the communication overhead significantly in dragonflies and the magnitude of this effect is sensitive to the application, job size, and the OpenMP settings. Among the three task mapping algorithms we study (in-order, random, and recursive coordinate bisection), selecting a suitable task mapper reduces application communication time by up to 47%
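Of the three mappers studied, recursive coordinate bisection is the least self-explanatory; a minimal sketch (toy 2D application coordinates, k a power of two; our own illustration, not the paper's implementation) is:

```python
def rcb(points, k):
    """Recursive coordinate bisection: repeatedly split the points
    along their widest coordinate until k parts remain (k must be a
    power of two).  `points` is a list of (rank, (x, y)) pairs;
    returns k groups of ranks with spatially nearby ranks grouped
    together, which a mapper can then place on nearby compute nodes."""
    if k == 1:
        return [[r for r, _ in points]]
    xs = [c[0] for _, c in points]
    ys = [c[1] for _, c in points]
    axis = 0 if max(xs) - min(xs) >= max(ys) - min(ys) else 1
    pts = sorted(points, key=lambda p: p[1][axis])
    mid = len(pts) // 2
    return rcb(pts[:mid], k // 2) + rcb(pts[mid:], k // 2)

ranks = [(i, (i % 4, i // 4)) for i in range(8)]  # ranks on a 4x2 grid
print(rcb(ranks, 2))  # → [[0, 4, 1, 5], [2, 6, 3, 7]]
```

In-order mapping, by contrast, simply assigns ranks 0, 1, 2, … to cores in enumeration order, and random mapping shuffles them; the value of RCB is that it keeps heavily communicating neighbor ranks in the same partition.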
PlinyCompute: A Platform for High-Performance, Distributed, Data-Intensive Tool Development
This paper describes PlinyCompute, a system for development of
high-performance, data-intensive, distributed computing tools and libraries. In
the large, PlinyCompute presents the programmer with a very high-level,
declarative interface, relying on automatic, relational-database style
optimization to figure out how to stage distributed computations. However, in
the small, PlinyCompute presents the capable systems programmer with a
persistent object data model and API (the "PC object model") and associated
memory management system that has been designed from the ground-up for high
performance, distributed, data-intensive computing. This contrasts with most
other Big Data systems, which are constructed on top of the Java Virtual
Machine (JVM), and hence must at least partially cede performance-critical
concerns such as memory management (including layout and de/allocation) and
virtual method/function dispatch to the JVM. This hybrid approach---declarative
in the large, trusting the programmer's ability to utilize PC object model
efficiently in the small---results in a system that is ideal for the
development of reusable, data-intensive tools and libraries. Through extensive
benchmarking, we show that implementing complex object manipulation and
non-trivial, library-style computations on top of PlinyCompute can result in
speedups of 2x to more than 50x compared to equivalent implementations
on Spark.
Comment: 48 pages, including references and Appendix.