REX: Recursive, Delta-Based Data-Centric Computation
In today's Web and social network environments, query workloads include ad
hoc and OLAP queries, as well as iterative algorithms that analyze data
relationships (e.g., link analysis, clustering, learning). Modern DBMSs support
ad hoc and OLAP queries, but most are not robust enough to scale to large
clusters. Conversely, "cloud" platforms like MapReduce execute chains of batch
tasks across clusters in a fault tolerant way, but have too much overhead to
support ad hoc queries.
Moreover, both classes of platform incur significant overhead in executing
iterative data analysis algorithms. Most such iterative algorithms repeatedly
refine portions of their answers, until some convergence criterion is reached.
However, general cloud platforms typically must reprocess all data in each
step. DBMSs that support recursive SQL are more efficient in that they
propagate only the changes in each step -- but they still accumulate each
iteration's state, even if it is no longer useful. User-defined functions are
also typically harder to write for DBMSs than for cloud platforms.
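The difference between reprocessing all data in each step and propagating only the changes can be illustrated with a small graph-reachability sketch. This is not REX's programming model or API; the function names, the edge-list input, and the `work` counter are all assumptions made for illustration. The first function rescans every reached node each round, while the second only expands the nodes that were newly discovered in the previous round (the delta).

```python
from collections import defaultdict


def build_adjacency(edges):
    """Turn a list of (src, dst) pairs into an adjacency map."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
    return adj


def reachable_full(edges, source):
    """Hypothetical baseline: re-expand the whole reached set every round."""
    adj = build_adjacency(edges)
    reached = {source}
    work = 0  # number of edge traversals, a rough cost measure
    while True:
        new = set(reached)
        for u in reached:
            for v in adj[u]:
                work += 1
                new.add(v)
        if new == reached:  # convergence: nothing changed this round
            return reached, work
        reached = new


def reachable_delta(edges, source):
    """Hypothetical delta-based variant: expand only newly reached nodes."""
    adj = build_adjacency(edges)
    reached = {source}
    delta = {source}  # changes produced by the previous iteration
    work = 0
    while delta:  # convergence: the delta is empty
        next_delta = set()
        for u in delta:
            for v in adj[u]:
                work += 1
                if v not in reached:
                    reached.add(v)
                    next_delta.add(v)
        delta = next_delta
    return reached, work


if __name__ == "__main__":
    chain = [(0, 1), (1, 2), (2, 3), (3, 4)]
    print(reachable_full(chain, 0))
    print(reachable_delta(chain, 0))
```

Both functions return the same reached set; on a chain-shaped graph the delta-based version traverses each edge once, whereas the baseline traverses earlier edges again in every round.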
We seek to unify the strengths of both styles of platforms, with a focus on
supporting iterative computations in which changes, in the form of deltas, are
propagated from iteration to iteration, and state is efficiently updated in an
extensible way. We present a programming model oriented around deltas, describe
how we execute and optimize such programs in our REX runtime system, and
validate that our platform also handles failures gracefully. We experimentally
validate our techniques, and show speedups over the competing methods ranging
from 2.5 to nearly 100 times.
Comment: VLDB201
Markov Perfect Industry Dynamics with Many Firms
We propose an approximation method for analyzing Ericson and Pakes (1995)-style dynamic models of imperfect competition. We develop a simple algorithm for computing an "oblivious equilibrium," in which each firm is assumed to make decisions based only on its own state and knowledge of the long-run average industry state, ignoring current information about competitors' states. We prove that, as the market becomes large, if the equilibrium distribution of firm states obeys a certain "light-tail" condition, then oblivious equilibria closely approximate Markov perfect equilibria. We develop bounds that can be computed to assess the accuracy of the approximation for any given applied problem. Through computational experiments, we find that the method often generates useful approximations for industries with hundreds of firms and in some cases even tens of firms.
Big Data and the Internet of Things
Advances in sensing and computing capabilities are making it possible to
embed increasing computing power in small devices. This has enabled the sensing
devices not just to passively capture data at very high resolution but also to
take sophisticated actions in response. Combined with advances in
communication, this is resulting in an ecosystem of highly interconnected
devices referred to as the Internet of Things (IoT). In conjunction,
advances in machine learning have made it possible to build models on these
ever-increasing amounts of data. Consequently, devices ranging from heavy assets
such as aircraft engines to wearables such as health monitors can now not
only generate massive amounts of data but also draw on aggregate analytics
to "improve" their performance over time. Big data analytics has been
identified as a key enabler for the IoT. In this chapter, we discuss various
avenues of the IoT where big data analytics either is already making a
significant impact or is on the cusp of doing so. We also discuss social
implications and areas of concern.
Comment: 33 pages. Draft of upcoming book chapter in Japkowicz and Stefanowski
(eds.) Big Data Analysis: New algorithms for a new society, Springer Series
on Studies in Big Data, to appea
Systematizing Genome Privacy Research: A Privacy-Enhancing Technologies Perspective
Rapid advances in human genomics are enabling researchers to gain a better
understanding of the role of the genome in our health and well-being,
stimulating hope for more effective and cost-efficient healthcare. However,
this also prompts a number of security and privacy concerns stemming from the
distinctive characteristics of genomic data. To address them, a new research
community has emerged and produced a large number of publications and
initiatives.
In this paper, we rely on a structured methodology to contextualize and
provide a critical analysis of the current knowledge on privacy-enhancing
technologies used for testing, storing, and sharing genomic data, using a
representative sample of the work published in the past decade. We identify and
discuss limitations, technical challenges, and issues faced by the community,
focusing in particular on those that are inherently tied to the nature of the
problem and are harder for the community alone to address. Finally, we report
on the importance and difficulty of the identified challenges based on an
online survey of genome data privacy experts.
Comment: To appear in the Proceedings on Privacy Enhancing Technologies
(PoPETs), Vol. 2019, Issue
Going Deeper into Action Recognition: A Survey
Understanding human actions in visual data is tied to advances in
complementary research areas including object recognition, human dynamics,
domain adaptation and semantic segmentation. Over the last decade, human action
analysis has evolved from earlier schemes that were often limited to controlled
environments to today's advanced solutions that can learn from millions of
videos and apply to almost all daily activities. Given the broad range of
applications from video surveillance to human-computer interaction, scientific
milestones in action recognition are achieved ever more rapidly, so that methods
once considered good quickly become obsolete. This motivated us to
provide a comprehensive review of the notable steps taken towards recognizing
human actions. To this end, we start our discussion with the pioneering methods
that use handcrafted representations, and then navigate into the realm of
deep-learning-based approaches. We aim to remain objective throughout this survey,
touching upon encouraging improvements as well as inevitable fallbacks, in the
hope of raising fresh questions and motivating new research directions for the
reader.
Overlapping of Communication and Computation and Early Binding: Fundamental Mechanisms for Improving Parallel Performance on Clusters of Workstations
This study considers software techniques for improving performance on clusters of workstations and approaches for designing message-passing middleware that facilitate scalable, parallel processing. Early binding and overlapping of communication and computation are identified as fundamental approaches for improving parallel performance and scalability on clusters. Currently, cluster computers using the Message-Passing Interface for interprocess communication are the predominant choice for building high-performance computing facilities, which makes the findings of this work relevant to a wide audience from the areas of high-performance computing and parallel processing. The performance-enhancing techniques studied in this work are presently underutilized in practice because of the lack of adequate support by existing message-passing libraries and are also rarely considered by parallel algorithm designers. Furthermore, commonly accepted methods for performance analysis and evaluation of parallel systems omit these techniques and focus primarily on more obvious communication characteristics such as latency and bandwidth. This study provides a theoretical framework for describing early binding and overlapping of communication and computation in models for parallel programming. This framework defines four new performance metrics that facilitate new approaches for performance analysis of parallel systems and algorithms. This dissertation provides experimental data that validate the correctness and accuracy of the performance analysis based on the new framework. The theoretical results of this performance analysis can be used by designers of parallel system and application software for assessing the quality of their implementations and for predicting the effective performance benefits of early binding and overlapping. This work presents MPI/Pro, a new MPI implementation that is specifically optimized for clusters of workstations interconnected with high-speed networks. 
This MPI implementation emphasizes features such as persistent communication, asynchronous processing, low processor overhead, and independent message progress. These features are identified as critical for delivering maximum performance to applications. The experimental section of this dissertation demonstrates the capability of MPI/Pro to facilitate software techniques that result in significant application performance improvements. Specific demonstrations with Virtual Interface Architecture and TCP/IP over Ethernet are offered.
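The idea of overlapping communication with computation can be sketched without MPI at all. The example below is only a simulation using Python's standard library: `fake_receive` stands in for a message transfer by sleeping, `local_compute` is an arbitrary piece of local work, and all names and the latency value are assumptions for illustration. It does not use MPI/Pro or any MPI call. The blocking version waits for the transfer before computing; the overlapped version starts the transfer, computes while it is in flight, and only then waits for completion.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_receive(payload, latency=0.05):
    """Stand-in for a message transfer: wait, then hand back the payload."""
    time.sleep(latency)
    return payload


def local_compute(n):
    """Stand-in for local work that does not depend on the incoming message."""
    return sum(i * i for i in range(n))


def blocking_step(payload, n):
    """Receive first, then compute: the two phases run back to back."""
    received = fake_receive(payload)
    local = local_compute(n)
    return received, local


def overlapped_step(payload, n):
    """Start the receive, compute while it is in flight, then wait for it."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fake_receive, payload)  # returns immediately
        local = local_compute(n)                      # overlaps with the transfer
        received = pending.result()                   # completion point
    return received, local


if __name__ == "__main__":
    for step in (blocking_step, overlapped_step):
        start = time.perf_counter()
        result = step("halo-data", 200_000)
        elapsed = time.perf_counter() - start
        print(step.__name__, result[0], f"{elapsed:.3f}s")
```

Both steps produce the same result; the overlapped one can hide some of the transfer latency behind the local work, which is the effect the abstract attributes to asynchronous processing and independent message progress.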