Achieving the Optimal Streaming Capacity and Delay Using Random Regular Digraphs in P2P Networks
In earlier work, we showed that it is possible to achieve
streaming delay with high probability in a peer-to-peer network, where each
peer has as few as four neighbors, while achieving any arbitrary fraction of
the maximum possible streaming rate. However, the constant in the
delay term becomes rather large as we get closer to the maximum streaming rate.
In this paper, we design an alternative pairing and chunk dissemination
algorithm that allows us to transmit at the maximum streaming rate while
ensuring that all but a negligible fraction of the peers receive the data
stream with delay with high probability. The result is established
by examining the properties of the graph formed by the union of two or more random
1-regular digraphs, i.e., directed graphs in which each node has an incoming
and an outgoing node degree both equal to one.
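The graph construction named in this abstract can be illustrated with a minimal sketch. It is not the paper's pairing or chunk dissemination algorithm. It only builds the union of `d` random 1-regular digraphs, assuming each one is drawn as a uniformly random permutation (self-loops and repeated edges allowed), and measures hop distances from a source peer. All function names are hypothetical.

```python
import random
from collections import deque


def random_one_regular_digraph(n, rng):
    """Return succ, where succ[u] is the single out-neighbor of node u.

    Assumption: a uniformly random permutation is used, so every node has
    in-degree 1 and out-degree 1; self-loops are allowed in this sketch.
    """
    succ = list(range(n))
    rng.shuffle(succ)
    return succ


def union_of_digraphs(n, d, rng):
    """Adjacency lists of the union of d independent random 1-regular digraphs."""
    layers = [random_one_regular_digraph(n, rng) for _ in range(d)]
    return [[layer[u] for layer in layers] for u in range(n)]


def bfs_depths(adj, source=0):
    """Hop distance from source to every node reachable from it."""
    depth = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)
    return depth


rng = random.Random(0)
adj = union_of_digraphs(1000, 2, rng)
depths = bfs_depths(adj)
print("reachable peers:", len(depths), "max hop distance:", max(depths.values()))
```

Running this for several values of `n` gives a rough feel for how hop distance grows with network size in such a union; it makes no claim about the delay bounds proved in the paper.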
Hierarchical Partial Planarity
In this paper we consider graphs whose edges are associated with a degree of
importance, which may depend on the type of connections they represent or
on how recently they appeared in the scene, in a streaming setting. The goal is
to construct layouts of these graphs in which the readability of an edge is
proportional to its importance, that is, more important edges have fewer
crossings. We formalize this problem and study the case in which there exist
three different degrees of importance. We give a polynomial-time testing
algorithm when the graph induced by the two most important sets of edges is
biconnected. We also discuss interesting relationships with other
constrained-planarity problems. Comment: Conference version appeared in WG201
Sublinear Estimation of Weighted Matchings in Dynamic Data Streams
This paper presents an algorithm for estimating the weight of a maximum
weighted matching by augmenting any estimation routine for the size of an
unweighted matching. The algorithm is implementable in any streaming model
including dynamic graph streams. We also give the first constant estimation for
the maximum matching size in a dynamic graph stream for planar graphs (or any
graph with bounded arboricity) using space which also
extends to weighted matching. Using previous results by Kapralov, Khanna, and
Sudan (2014) we obtain a approximation for general graphs
using space in random order streams, respectively. In
addition, we give a space lower bound of for any
randomized algorithm estimating the size of a maximum matching up to a
factor for adversarial streams
Time lower bounds for nonadaptive turnstile streaming algorithms
We say a turnstile streaming algorithm is "non-adaptive" if, during updates,
the memory cells written and read depend only on the index being updated and
random coins tossed at the beginning of the stream (and not on the memory
contents of the algorithm). Memory cells read during queries may be decided
upon adaptively. All known turnstile streaming algorithms in the literature are
non-adaptive.
We prove the first non-trivial update time lower bounds for both randomized
and deterministic turnstile streaming algorithms, which hold when the
algorithms are non-adaptive. While there has been abundant success in proving
space lower bounds, there have been no non-trivial update time lower bounds in
the turnstile model. Our lower bounds hold against classically studied problems
such as heavy hitters, point query, entropy estimation, and moment estimation.
In some cases of deterministic algorithms, our lower bounds nearly match known
upper bounds.
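To make the definition of non-adaptivity concrete, here is a minimal sketch of a Count-Min-style structure for point queries. It is an illustrative example, not taken from the paper. The cells touched by an update are computed from the index and from random coins drawn once in the constructor, never from the table contents. The class name, parameters, and hash family are assumptions, and the `min` estimator assumes a strict turnstile stream, where final frequencies are non-negative.

```python
import random


class CountMinSketch:
    """Count-Min-style sketch with non-adaptive updates (illustrative only)."""

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.prime = 2_147_483_647  # Mersenne prime 2^31 - 1 for the hash family
        # Random coins are tossed once, before any update arrives.
        self.coins = [
            (rng.randrange(1, self.prime), rng.randrange(self.prime))
            for _ in range(depth)
        ]
        self.table = [[0] * width for _ in range(depth)]

    def cells(self, index):
        """Cells used for index: a function of index and coins only."""
        return [
            (row, ((a * index + b) % self.prime) % self.width)
            for row, (a, b) in enumerate(self.coins)
        ]

    def update(self, index, delta):
        """Turnstile update: delta may be positive or negative."""
        for row, col in self.cells(index):
            self.table[row][col] += delta

    def point_query(self, index):
        """Frequency estimate; never an underestimate in a strict turnstile stream."""
        return min(self.table[row][col] for row, col in self.cells(index))


sketch = CountMinSketch(width=64, depth=4)
sketch.update(7, 5)
sketch.update(7, -2)
print(sketch.point_query(7))
```

Because `cells` never reads `self.table`, the memory locations written for a given index are the same no matter what was inserted earlier, which is the property the abstract calls non-adaptive.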
Modeling Big Medical Survival Data Using Decision Tree Analysis with Apache Spark
In many medical studies, the outcome of interest is not only whether an event occurred but also when it occurred; Alzheimer’s disease (AD) is one example. Identifying patients with Mild Cognitive Impairment (MCI) who are likely to develop AD is highly important for AD treatment. Previous studies suggest that not all MCI patients will convert to AD. Massive amounts of data from longitudinal and extensive studies on thousands of Alzheimer’s patients have been generated. Building a computational model that can predict conversion from MCI to AD can be highly beneficial for early intervention and treatment planning for AD. This work presents a big data model that uses machine-learning techniques to determine the level of AD in a participant and predict the time of conversion to AD. The proposed framework considers one of the most widely used screening assessments for detecting cognitive impairment, the Montreal Cognitive Assessment (MoCA). The MoCA data set was collected from different centers and integrated into our large data framework storage using the Hadoop Distributed File System (HDFS); the data were then analyzed using the Apache Spark framework. The accuracy of the proposed framework was compared with a semi-parametric Cox survival analysis model.
DROP: Dimensionality Reduction Optimization for Time Series
Dimensionality reduction is a critical step in scaling machine learning
pipelines. Principal component analysis (PCA) is a standard tool for
dimensionality reduction, but performing PCA over a full dataset can be
prohibitively expensive. As a result, theoretical work has studied the
effectiveness of iterative, stochastic PCA methods that operate over data
samples. However, termination conditions for stochastic PCA either execute for
a predetermined number of iterations, or until convergence of the solution,
frequently sampling too many or too few datapoints for end-to-end runtime
improvements. We show how accounting for downstream analytics operations during
DR via PCA allows stochastic methods to efficiently terminate after operating
over small (e.g., 1%) subsamples of input data, reducing whole workload
runtime. Leveraging this, we propose DROP, a DR optimizer that enables speedups
of up to 5x over Singular-Value-Decomposition-based PCA techniques, and exceeds
conventional approaches like FFT and PAA by up to 16x in end-to-end workloads.