8,070 research outputs found
Tracking Human Mobility using WiFi signals
We study six months of human mobility data, including WiFi and GPS traces
recorded with high temporal resolution, and find that time series of WiFi scans
contain a strong latent location signal. In fact, due to inherent stability and
low entropy of human mobility, it is possible to assign location to WiFi access
points based on a very small number of GPS samples and then use these access
points as location beacons. Using just one GPS observation per day per person
allows us to estimate the location of, and subsequently use, WiFi access points
to account for 80\% of mobility across a population. These results reveal a
great opportunity for using ubiquitous WiFi routers for high-resolution outdoor
positioning, but also significant privacy implications of such side-channel
location tracking
Progressive Analytics: A Computation Paradigm for Exploratory Data Analysis
Exploring data requires a fast feedback loop from the analyst to the system,
with a latency below about 10 seconds because of human cognitive limitations.
When data becomes large or analysis becomes complex, sequential computations
can no longer be completed in a few seconds and data exploration is severely
hampered. This article describes a novel computation paradigm called
Progressive Computation for Data Analysis or more concisely Progressive
Analytics, that brings at the programming language level a low-latency
guarantee by performing computations in a progressive fashion. Moving this
progressive computation at the language level relieves the programmer of
exploratory data analysis systems from implementing the whole analytics
pipeline in a progressive way from scratch, streamlining the implementation of
scalable exploratory data analysis systems. This article describes the new
paradigm through a prototype implementation called ProgressiVis, and explains
the requirements it implies through examples.Comment: 10 page
Statistical structures for internet-scale data management
Efficient query processing in traditional database management systems relies on statistics on base data. For centralized systems, there is a rich body of research results on such statistics, from simple aggregates to more elaborate synopses such as sketches and histograms. For Internet-scale distributed systems, on the other hand, statistics management still poses major challenges. With the work in this paper we aim to endow peer-to-peer data management over structured overlays with the power associated with such statistical information, with emphasis on meeting the scalability challenge. To this end, we first contribute efficient, accurate, and decentralized algorithms that can compute key aggregates such as Count, CountDistinct, Sum, and Average. We show how to construct several types of histograms, such as simple Equi-Width, Average-Shifted Equi-Width, and Equi-Depth histograms. We present a full-fledged open-source implementation of these tools for distributed statistical synopses, and report on a comprehensive experimental performance evaluation, evaluating our contributions in terms of efficiency, accuracy, and scalability
Reordering Rows for Better Compression: Beyond the Lexicographic Order
Sorting database tables before compressing them improves the compression
rate. Can we do better than the lexicographical order? For minimizing the
number of runs in a run-length encoding compression scheme, the best approaches
to row-ordering are derived from traveling salesman heuristics, although there
is a significant trade-off between running time and compression. A new
heuristic, Multiple Lists, which is a variant on Nearest Neighbor that trades
off compression for a major running-time speedup, is a good option for very
large tables. However, for some compression schemes, it is more important to
generate long runs rather than few runs. For this case, another novel
heuristic, Vortex, is promising. We find that we can improve run-length
encoding up to a factor of 3 whereas we can improve prefix coding by up to 80%:
these gains are on top of the gains due to lexicographically sorting the table.
We prove that the new row reordering is optimal (within 10%) at minimizing the
runs of identical values within columns, in a few cases.Comment: to appear in ACM TOD
NetLSD: Hearing the Shape of a Graph
Comparison among graphs is ubiquitous in graph analytics. However, it is a
hard task in terms of the expressiveness of the employed similarity measure and
the efficiency of its computation. Ideally, graph comparison should be
invariant to the order of nodes and the sizes of compared graphs, adaptive to
the scale of graph patterns, and scalable. Unfortunately, these properties have
not been addressed together. Graph comparisons still rely on direct approaches,
graph kernels, or representation-based methods, which are all inefficient and
impractical for large graph collections.
In this paper, we propose the Network Laplacian Spectral Descriptor (NetLSD):
the first, to our knowledge, permutation- and size-invariant, scale-adaptive,
and efficiently computable graph representation method that allows for
straightforward comparisons of large graphs. NetLSD extracts a compact
signature that inherits the formal properties of the Laplacian spectrum,
specifically its heat or wave kernel; thus, it hears the shape of a graph. Our
evaluation on a variety of real-world graphs demonstrates that it outperforms
previous works in both expressiveness and efficiency.Comment: KDD '18: The 24th ACM SIGKDD International Conference on Knowledge
Discovery & Data Mining, August 19--23, 2018, London, United Kingdo
- …