125,092 research outputs found
Processing and analysis of large volumes of satellite-derived thermal infrared data
Reducing the large volume of TIROS-N series advanced very high resolution radiometer-derived data to a practical size for application to regional physical oceanographic studies is a formidable task. Such data exist on a global basis for January 1979 to the present at approximately 4-km resolution (global area coverage data, ≈2 passes per day) and in selected areas at high resolution (local area coverage and high-resolution picture transmission data, at ≈1-km resolution) for the same period. An approach that has been successful for a number of studies off the east coast of the United States divides the processing into two procedures: preprocessing and data reduction. The preprocessing procedure can reduce the data volume per satellite pass by over 98% for full-resolution data, or by ≈84% for the lower-resolution data, while the number of passes remains unchanged. The output of the preprocessing procedure for the examples presented is a set of sea surface temperature (SST) fields of 512 × 1024 pixels covering a region of approximately 2000 × 4000 km. In the data reduction procedure, the number of SST fields (beginning with one per satellite pass) is generally reduced to a number manageable from the analyst's perspective (of the order of one SST field per day). This is done in most of the applications presented by compositing the data into 1- or 2-day groups. The phenomena readily addressed by such procedures are the mean position of the Gulf Stream, the envelope of Gulf Stream meandering, cold core Gulf Stream ring trajectories, statistics on diurnal warming, and the region and period of 18°C water formation. The flexibility of this approach to regional oceanographic problems will certainly extend the list of applications quickly.
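The data reduction step described above can be illustrated with a minimal compositing sketch. This is a toy illustration, not the authors' processing pipeline: it assumes per-pass SST grids with `None` marking cloud-masked pixels, and uses a warmest-pixel rule (residual cloud contamination biases satellite SST retrievals cold, so keeping the maximum valid value per pixel tends to suppress it).

```python
def composite_sst(fields, fill=None):
    """Merge several per-pass SST grids (lists of rows) into one composite.

    Warmest-pixel rule: residual cloud contamination biases retrievals
    cold, so the maximum valid value per pixel suppresses it.
    `None` marks cloud-masked pixels; pixels valid in no pass stay `fill`.
    """
    rows, cols = len(fields[0]), len(fields[0][0])
    out = [[fill] * cols for _ in range(rows)]
    for field in fields:
        for i in range(rows):
            for j in range(cols):
                v = field[i][j]
                if v is not None and (out[i][j] is None or v > out[i][j]):
                    out[i][j] = v
    return out

# Two toy 2x2 "passes"; None = cloud-masked pixel.
pass1 = [[18.0, None], [15.0, 20.0]]
pass2 = [[17.5, 19.0], [None, 21.0]]
daily = composite_sst([pass1, pass2])   # [[18.0, 19.0], [15.0, 21.0]]
```

Compositing the passes of a 1- or 2-day window this way reduces one field per pass to one field per group, as in the abstract.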
Almost Optimal Streaming Algorithms for Coverage Problems
Maximum coverage and minimum set cover problems -- collectively called coverage problems -- have been studied extensively in streaming models. However, previous research not only achieves sub-optimal approximation factors and space complexities, but also studies a restricted set-arrival model that makes an explicit or implicit assumption of oracle access to the sets, ignoring the complexity of reading and storing a whole set at once. In this paper, we address the above shortcomings and present algorithms with improved approximation factors and improved space complexity, and prove that our results are almost tight. Moreover, unlike most previous work, our results hold in a more general edge-arrival model. More specifically, we present (almost) optimal approximation algorithms for the maximum coverage and minimum set cover problems in the streaming model with an (almost) optimal space complexity that is independent of the size of the sets and of the size of the ground set of elements. These results not only improve over the best known algorithms for the set-arrival model, but are also the first such algorithms for the more powerful edge-arrival model. To achieve these results, we introduce a new general sketching technique for coverage functions: this sketching scheme can be applied to convert an α-approximation algorithm for a coverage problem into a (1 − ε)α-approximation algorithm for the same problem in the streaming or RAM models. We show the significance of our sketching technique by ruling out the possibility of solving coverage problems via black-box access to a (1 ± ε)-approximate oracle (e.g., a sketch function) that estimates the coverage function on any subfamily of the sets.
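As a point of reference for the approximation factors discussed above, the classical offline greedy algorithm for maximum coverage achieves a (1 − 1/e)-approximation; the streaming algorithms in this line of work aim to match that guarantee with limited space. A minimal sketch of the offline baseline (not the paper's sketching technique):

```python
def greedy_max_coverage(sets, k):
    """Offline greedy for maximum coverage: repeatedly pick the set
    covering the most yet-uncovered elements.  This gives the classic
    (1 - 1/e)-approximation that streaming algorithms try to match."""
    covered, chosen = set(), []
    for _ in range(k):
        # First index of maximum marginal gain (max breaks ties by order).
        best = max(range(len(sets)), key=lambda i: len(sets[i] - covered))
        if not sets[best] - covered:
            break  # nothing new can be covered
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered

sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 7}]
chosen, covered = greedy_max_coverage(sets, k=2)   # picks set 2, then set 0
```

The paper's contribution is a sketch small enough to run such a procedure in a stream, with space independent of set and ground-set sizes.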
Incidence Geometries and the Pass Complexity of Semi-Streaming Set Cover
Set cover, over a universe of size n, may be modelled as a data-streaming problem, where the m sets that comprise the instance are to be read one by one. A semi-streaming algorithm is allowed only O(n · poly{log n, log m}) space to process this stream. For each p ≥ 1, we give a very simple deterministic algorithm that makes p passes over the input stream and returns an appropriately certified O(n^{1/(p+1)})-approximation to the optimum set cover. More importantly, we proceed to show that this approximation factor is essentially tight, by showing that a factor better than n^{1/(p+1)} (up to lower-order terms) is unachievable for a p-pass semi-streaming algorithm, even allowing randomisation. In particular, this implies that achieving an O(log n)-approximation requires Ω(log n / log log n) passes, which is tight up to the log log n factor. These results extend to a relaxation of the set cover problem where we are allowed to leave an ε-fraction of the universe uncovered: the tight bounds on the best approximation factor achievable in p passes turn out to be Θ̃(min{n^{1/(p+1)}, ε^{-1/p}}). Our lower bounds are based on a construction of a family of high-rank incidence geometries, which may be thought of as vast generalisations of affine planes. This construction, based on algebraic techniques, appears flexible enough to find other applications and is therefore interesting in its own right. Comment: 20 pages
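The multi-pass idea behind algorithms of this kind can be sketched with a simple thresholding scheme. This is a hedged illustration of the general technique, not the paper's certified algorithm: each pass over the stream admits any set whose marginal coverage meets a geometrically decreasing threshold, with the last pass using threshold 1 so every coverable element ends up covered.

```python
def multipass_threshold_cover(sets, universe, passes):
    """Illustrative p-pass thresholding scheme for streaming set cover
    (a simplification, not the paper's algorithm): pass j admits any
    set still covering at least n^{(p-j)/p} uncovered elements, so the
    thresholds fall geometrically and the final pass uses threshold 1."""
    n = len(universe)
    uncovered = set(universe)
    chosen = []
    for j in range(1, passes + 1):
        threshold = n ** ((passes - j) / passes)   # n^{(p-1)/p}, ..., n^0 = 1
        for i, s in enumerate(sets):               # one pass over the stream
            if len(s & uncovered) >= threshold:
                chosen.append(i)
                uncovered -= s
    return chosen, uncovered

universe = set(range(1, 10))
sets = [{1, 2, 3, 4, 5}, {5, 6, 7, 8}, {8, 9}, {9}]
chosen, uncovered = multipass_threshold_cover(sets, universe, passes=2)
```

Only the admitted set indices and the uncovered-element bitmap need to be stored between passes, which is what keeps such schemes within the semi-streaming space budget.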
An Efficient Streaming Algorithm for the Submodular Cover Problem
We initiate the study of the classical Submodular Cover (SC) problem in the data-streaming model, which we refer to as Streaming Submodular Cover (SSC). We show that any single-pass streaming algorithm using memory sublinear in the size of the stream will fail to provide any non-trivial approximation guarantee for SSC. Hence, we consider a relaxed version of SSC, where we only seek to find a partial cover.
We design the first Efficient bicriteria Submodular Cover Streaming (ESC-Streaming) algorithm for this problem, and provide theoretical guarantees for its performance, supported by numerical evidence. Our algorithm finds solutions that are competitive with the near-optimal offline greedy algorithm despite requiring only a single pass over the data stream. In our numerical experiments, we evaluate the performance of ESC-Streaming on active-set selection and large-scale graph cover problems. Comment: To appear in NIPS'16
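The partial-cover relaxation above can be made concrete with a toy single-pass routine for the coverage special case of a submodular function. This is a simplification for illustration, not the ESC-Streaming algorithm itself: keep any arriving set whose marginal gain meets a threshold, and stop once all but an ε-fraction of the universe is covered.

```python
def streaming_partial_cover(stream, universe, eps, threshold):
    """Toy single-pass partial cover for the coverage special case of
    submodular cover (illustrative only, not ESC-Streaming): admit any
    arriving set with marginal gain >= threshold, stopping once a
    (1 - eps)-fraction of the universe is covered."""
    target = (1 - eps) * len(universe)
    covered, kept = set(), []
    for i, s in enumerate(stream):
        gain = len((s & universe) - covered)   # marginal coverage of s
        if gain >= threshold:
            kept.append(i)
            covered |= s & universe
            if len(covered) >= target:
                break                          # eps-fraction may stay uncovered
    return kept, covered

universe = set(range(10))
stream = [{0, 1}, {2, 3, 4}, {5}, {6, 7, 8}, {9}]
kept, covered = streaming_partial_cover(stream, universe, eps=0.2, threshold=2)
```

The bicriteria flavour is visible here: the routine trades a small uncovered fraction (ε) for a single pass and small memory.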
Graph Sample and Hold: A Framework for Big-Graph Analytics
Sampling is a standard approach in big-graph analytics; the goal is to
efficiently estimate the graph properties by consulting a sample of the whole
population. A perfect sample is assumed to mirror every property of the whole
population. Unfortunately, such a perfect sample is hard to collect in complex
populations such as graphs (e.g., web graphs, social networks), where an
underlying network connects the units of the population. Therefore, a good
sample will be representative in the sense that graph properties of interest
can be estimated with a known degree of accuracy. While previous work focused
particularly on sampling schemes used to estimate certain graph properties
(e.g. triangle count), much less is known for the case when we need to estimate
various graph properties with the same sampling scheme. In this paper, we
propose a generic stream sampling framework for big-graph analytics, called
Graph Sample and Hold (gSH). To begin, the proposed framework samples from
massive graphs sequentially in a single pass, one edge at a time, while
maintaining a small state. We then show how to produce unbiased estimators for
various graph properties from the sample. Given that the graph-analysis algorithms run on a sample instead of the whole population, the runtime complexity of these algorithms is kept under control. Moreover, since the estimators of the graph properties are unbiased, the approximation error is also kept under control. Finally, we show the performance of the proposed framework (gSH) on various types of graphs, such as social graphs, among others.
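The single-pass, small-state sampling described above can be sketched as follows. This is a simplified reading of the sample-and-hold idea, not the paper's exact estimators: an arriving edge is held with probability q if it touches a node already in the sample and with probability p otherwise, and Horvitz-Thompson weights (the reciprocal of each edge's inclusion probability) yield an unbiased estimate of the total edge count.

```python
import random

def gsh_sample(edges, p, q, rng=random.random):
    """Simplified sample-and-hold sketch (illustrative, not the paper's
    gSH estimators): hold an arriving edge w.p. q if it is adjacent to
    the current sample, w.p. p otherwise; Horvitz-Thompson weights
    1/p or 1/q make the edge-count estimate unbiased."""
    touched = set()      # nodes incident to held edges (the small state)
    held = []            # (edge, inclusion probability) pairs
    for u, v in edges:
        prob = q if (u in touched or v in touched) else p
        if rng() < prob:
            held.append(((u, v), prob))
            touched.update((u, v))
    estimate = sum(1.0 / prob for _, prob in held)
    return held, estimate

edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
# With p = q = 1 every edge is held and the estimate is exact.
held, est = gsh_sample(edges, p=1.0, q=1.0)
```

Setting q > p biases the sample toward the neighbourhood of already-held edges, which is what makes adjacency-dependent properties (e.g., triangle counts) estimable from a small sample.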
Streaming Algorithms for Submodular Function Maximization
We consider the problem of maximizing a nonnegative submodular set function f subject to a p-matchoid constraint in the single-pass streaming setting. Previous work in this context has considered streaming algorithms for modular functions and for monotone submodular functions. The main result is for submodular functions that are non-monotone. We describe deterministic and randomized algorithms that obtain an Ω(1/p)-approximation using O(k log k) space, where k is an upper bound on the cardinality of the desired set. The model assumes value-oracle access to f and membership oracles for the matroids defining the p-matchoid constraint. Comment: 29 pages, 7 figures; extended abstract to appear in ICALP 2015
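For intuition, the core streaming primitive in this area is thresholding. The sketch below covers only the much simpler monotone, cardinality-constrained special case (not the paper's non-monotone matchoid setting): keep an arriving element if its marginal value meets a fixed threshold and fewer than k elements have been kept; sieve-style algorithms run many thresholds in parallel to guess the optimum.

```python
def threshold_stream_max(stream, k, threshold, f_gain):
    """Single-threshold streaming sketch for monotone submodular
    maximization under a cardinality constraint (a toy special case,
    not the paper's algorithms): keep x if its marginal gain is at
    least `threshold` and fewer than k items are kept so far."""
    S = []
    for x in stream:
        if len(S) < k and f_gain(S, x) >= threshold:
            S.append(x)
    return S

# Coverage as the submodular function: gain = newly covered elements.
def coverage_gain(S, x):
    covered = set().union(*S) if S else set()
    return len(x - covered)

stream = [{1, 2}, {2, 3, 4, 5}, {5}, {6, 7, 8}]
S = threshold_stream_max(stream, k=2, threshold=3, f_gain=coverage_gain)
```

The space used is just the k kept elements (plus the threshold), which is why such schemes fit the single-pass streaming model.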
Semi-Streaming Set Cover
This paper studies the set cover problem under the semi-streaming model. The underlying set system is formalized in terms of a hypergraph G = (V, E) whose edges arrive one by one, and the goal is to construct an edge cover F ⊆ E with the objective of minimizing the cardinality (or cost, in the weighted case) of F. We consider a parameterized relaxation of this problem, where given some 0 ≤ ε < 1, the goal is to construct an edge (1 − ε)-cover, namely, a subset of edges incident to all but an ε-fraction of the vertices (or of their benefit, in the weighted case). The key limitation imposed on the algorithm is that its space is limited to (poly)logarithmically many bits per vertex.
Our main result is an asymptotically tight trade-off between ε and the approximation ratio: we design a semi-streaming algorithm that, on input graph G, constructs a succinct data structure D such that for every 0 ≤ ε < 1, an edge (1 − ε)-cover that approximates the optimal edge (1 − ε)-cover within a factor of f(ε, n) can be extracted from D (efficiently and with no additional space requirements), where f(ε, n) = O(min{1/ε, √n}). In particular, for the traditional set cover problem we obtain an O(√n)-approximation. This algorithm is proved to be best possible by establishing a family (parameterized by ε) of matching lower bounds. Comment: Full version of the extended abstract that appeared in Proceedings of ICALP 2014, Track A