46 research outputs found
Towards Comfortable Cycling: A Practical Approach to Monitor the Conditions in Cycling Paths
Commuting by bicycle is one of the most sustainable forms of transport: it is
inexpensive and pollution-free. Towns and cities have to be made
bicycle-friendly to encourage wide usage, and cycling paths should therefore be
convenient, comfortable, and safe to ride. This paper investigates a smartphone
application that passively monitors road conditions while cyclists ride. To
overcome the problems of
monitoring roads, we present novel algorithms that sense the rough cycling
paths and locate road bumps. Each event is detected in real time to improve the
user friendliness of the application. Cyclists may keep their smartphones at
any random orientation and placement. Moreover, different smartphones sense the
same incident differently and hence report inconsistent sensor values. We
further address these difficulties, which limit such crowd-sourcing
applications. We evaluate our sensing application on cycling paths in Singapore,
and show that it can successfully detect such bad road conditions.
Comment: 6 pages, 5 figures, Accepted by IEEE 4th World Forum on Internet of
Things (WF-IoT) 201
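The abstract does not spell out the detection algorithms. As a minimal sketch of how orientation-independent rough-surface sensing might work, one can threshold the spread of the accelerometer magnitude over short windows; the function name, window size, and threshold below are illustrative assumptions, not the paper's method:

```python
import math

def detect_bumps(samples, window=50, std_thresh=3.0):
    """Flag windows whose acceleration spread spikes above a threshold.

    samples: list of (ax, ay, az) accelerometer readings.
    Using the magnitude of the 3-axis vector makes the detector
    independent of how the phone is oriented or placed.
    """
    mags = [math.sqrt(ax * ax + ay * ay + az * az) for ax, ay, az in samples]
    events = []
    for start in range(0, len(mags) - window + 1, window):
        w = mags[start:start + window]
        mean = sum(w) / window
        var = sum((m - mean) ** 2 for m in w) / window
        if var ** 0.5 > std_thresh:  # rough-surface / bump candidate
            events.append(start)
    return events
```

Because each window is scored independently, events can be reported as the cyclist rides, matching the real-time requirement the abstract mentions.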
Structure-Aware Sampling: Flexible and Accurate Summarization
In processing large quantities of data, a fundamental problem is to obtain a
summary which supports approximate query answering. Random sampling yields
flexible summaries which naturally support subset-sum queries with unbiased
estimators and well-understood confidence bounds.
Classic sample-based summaries, however, are designed for arbitrary subset
queries and are oblivious to the structure in the set of keys. The particular
structure, such as hierarchy, order, or product space (multi-dimensional),
makes range queries much more relevant for most analyses of the data.
Dedicated summarization algorithms for range-sum queries have also been
extensively studied. They can outperform existing sampling schemes in terms of
accuracy on range queries per summary size. Their accuracy, however, rapidly
degrades when, as is often the case, the query spans multiple ranges. They are
also less flexible - being targeted at range-sum queries alone - and are often
quite costly to build and use.
In this paper we propose and evaluate variance optimal sampling schemes that
are structure-aware. These summaries improve over the accuracy of existing
structure-oblivious sampling schemes on range queries while retaining the
benefits of sample-based summaries: flexible summaries, with high accuracy on
both range queries and arbitrary subset queries.
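As background on why sample-based summaries support subset-sum queries with unbiased estimators, a uniform sample with inverse-inclusion-probability (Horvitz-Thompson) weighting can be sketched as follows. This illustrates the classic structure-oblivious scheme the paper improves on, not the structure-aware schemes themselves; names are illustrative:

```python
import random

def sample_summary(items, k, seed=0):
    """Uniform sample of k (key, value) pairs from a dict of n items.

    Each item is included with probability k/n, so each sampled item
    carries a Horvitz-Thompson weight of n/k.
    """
    rng = random.Random(seed)
    n = len(items)
    sample = rng.sample(list(items.items()), k)
    return [(key, val, n / k) for key, val in sample]

def estimate_subset_sum(summary, predicate):
    """Unbiased estimate of the sum of values over keys matching predicate."""
    return sum(val * w for key, val, w in summary if predicate(key))
```

The same summary answers any subset-sum query after the fact, which is the flexibility the abstract contrasts with dedicated range-sum synopses.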
Histograms and Wavelets on Probabilistic Data
There is a growing realization that uncertain information is a first-class
citizen in modern database management. As such, we need techniques to correctly
and efficiently process uncertain data in database systems. In particular, data
reduction techniques that can produce concise, accurate synopses of large
probabilistic relations are crucial. Similar to their deterministic relation
counterparts, such compact probabilistic data synopses can form the foundation
for human understanding and interactive data exploration, probabilistic query
planning and optimization, and fast approximate query processing in
probabilistic database systems.
In this paper, we introduce definitions and algorithms for building
histogram- and wavelet-based synopses on probabilistic data. The core problem
is to choose a set of histogram bucket boundaries or wavelet coefficients to
optimize the accuracy of the approximate representation of a collection of
probabilistic tuples under a given error metric. For a variety of different
error metrics, we devise efficient algorithms that construct optimal or
near-optimal B-term histogram and wavelet synopses. This requires careful analysis
of the structure of the probability distributions, and novel extensions of
known dynamic-programming-based techniques for the deterministic domain. Our
experiments show that this approach clearly outperforms simple ideas, such as
building summaries for samples drawn from the data distribution, while taking
equal or less time.
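The paper extends dynamic-programming techniques from the deterministic domain. For reference, the classic deterministic DP for an optimal B-bucket histogram under sum-squared error, which the probabilistic algorithms build on, can be sketched as (a textbook baseline, not the paper's probabilistic algorithm):

```python
def optimal_histogram(values, B):
    """Optimal B-bucket histogram under sum-squared error (SSE).

    Classic O(n^2 * B) dynamic program: dp[j][b] is the minimal SSE of
    covering the first j values with b buckets, each approximated by its
    mean. Returns (minimal SSE, sorted left bucket boundaries).
    """
    n = len(values)
    p, q = [0.0], [0.0]          # prefix sums of values and squares
    for v in values:
        p.append(p[-1] + v)
        q.append(q[-1] + v * v)

    def sse(i, j):               # SSE of bucket values[i:j] vs. its mean
        s = p[j] - p[i]
        return (q[j] - q[i]) - s * s / (j - i)

    INF = float('inf')
    dp = [[INF] * (B + 1) for _ in range(n + 1)]
    cut = [[0] * (B + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for j in range(1, n + 1):
        for b in range(1, B + 1):
            for i in range(b - 1, j):
                c = dp[i][b - 1] + sse(i, j)
                if c < dp[j][b]:
                    dp[j][b], cut[j][b] = c, i

    bounds, j, b = [], n, B      # recover the optimal cut positions
    while b > 0:
        i = cut[j][b]
        bounds.append(i)
        j, b = i, b - 1
    return dp[n][B], sorted(bounds)
```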
Doctor of Philosophy dissertation
We are living in an age where data are being generated faster than anyone previously imagined, across a broad range of application domains, including customer studies, social media, sensor networks, and the sciences, among many others. In some cases, data are generated in massive quantities, as terabytes or petabytes. Numerous challenges emerge when dealing with massive data, including: (1) the explosion in the size of data; (2) data have increasingly complex structures and rich semantics, such as representing temporal data as a piecewise linear representation; (3) uncertain data are becoming common in numerous applications, e.g., scientific measurements or observations such as meteorological measurements; and (4) data are becoming increasingly distributed, e.g., data collected and integrated from distributed locations as well as data stored in a distributed file system within a cluster. Due to the massive nature of modern data, it is often infeasible for computers to manage and query them exactly. An attractive alternative is to construct data summaries, though even constructing such summaries efficiently is a challenging task given the enormous size of the data. The data summaries we focus on in this thesis are the histogram and the ranking operator. Both enable us to reduce a massive dataset to a more succinct representation, which can then make queries orders of magnitude more efficient while still providing approximation guarantees on query answers. Our study focuses on the critical task of designing efficient algorithms to summarize, query, and manage massive data.
A Survey of Model-based Sensor Data Acquisition and Management
In recent years, due to the proliferation of sensor networks, there has been a genuine need for research into techniques for sensor data acquisition and management. To this end, a large number of techniques have emerged that advocate model-based sensor data acquisition and management. These techniques use mathematical models to perform various day-to-day tasks involved in managing sensor data. In this chapter, we survey the state-of-the-art techniques for model-based sensor data acquisition and management. We start by discussing techniques for acquiring sensor data. We then discuss the application of models in sensor data cleaning, followed by a discussion of model-based methods for querying sensor data. Lastly, we survey model-based methods proposed for data compression and synopsis generation.
Novel methods for distributed acoustic sensing data
In this thesis, we propose novel methods for analysing nonstationary, multivariate time series, focusing in particular on the problems of classification and imputation in this context. Many existing methods for time series classification are static, in that they assign the entire series to one class and do not allow for temporal dependence within the signal. In the first part of this thesis, we propose a computationally efficient extension of an existing dynamic classification method to the online setting. Dependence within the series is captured by adopting the multivariate locally stationary wavelet (mvLSW) framework, and the signal is classified at each time point into one of a number of known classes. We apply the method to multivariate acoustic sensing data in order to detect anomalous regions and evaluate the results against alternative methods in the literature. The second part of this thesis considers imputation in multivariate locally stationary time series containing missing values. We first introduce a method for estimating the local wavelet spectral matrix that can be used in the presence of missingness. We then propose a novel method for imputing missing values that uses the local auto- and cross-covariance functions of a mvLSW process to perform one-step-ahead forecasting and backcasting. The performance of this nonstationary imputation approach is then assessed against competitor methods on simulated examples and a case study involving a dataset from a Carbon Capture and Storage facility. The software that implements this imputation scheme is also described, together with examples of the R package functionality.
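The thesis forecasts from time-varying local auto- and cross-covariances of a mvLSW process. As a much-simplified caricature of the underlying idea, a stationary, univariate one-step-ahead forecast from an estimated autocovariance looks like the following (this is an AR(1)-style illustration, not the mvLSW method itself):

```python
def autocov(x, lag):
    """Biased sample autocovariance of series x at a given lag."""
    n = len(x)
    m = sum(x) / n
    return sum((x[t] - m) * (x[t + lag] - m) for t in range(n - lag)) / n

def forecast_next(x):
    """One-step-ahead forecast from the lag-1 autocorrelation:
    x_hat = mean + (c1 / c0) * (x_last - mean)."""
    m = sum(x) / len(x)
    c0, c1 = autocov(x, 0), autocov(x, 1)
    return m + (c1 / c0) * (x[-1] - m)
```

Running the same recipe on the reversed series gives a backcast; imputing a gap by combining a forecast from the left and a backcast from the right mirrors, in spirit, the scheme described above.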
Algebraic Approaches for Constructing Multi-D Wavelets
Wavelets have been a powerful tool in data representation and have had a growing impact on various signal processing applications. As multi-dimensional (multi-D) wavelets are needed for multi-D data representation, construction methods for multi-D wavelets are of great interest. The tensor product has been the most prevalent method in multi-D wavelet construction; however, it has many limitations that make it insufficient in some cases. In this dissertation, we provide three non-tensor-based methods to construct multi-D wavelets. The first method is an alternative to the tensor product, called coset sum, which constructs multi-D wavelets from a pair of 1-D biorthogonal refinement masks. Coset sum shares many important features of the tensor product. It is associated with fast algorithms, which in certain cases are faster than the tensor product fast algorithms. Moreover, it shows great potential in image processing applications. The second method is a generalization of coset sum to non-dyadic dilation cases. In particular, we deal with situations where the dilation matrix is p times the d-D identity matrix, where p is a prime number; we therefore call it the prime coset sum method. Prime coset sum inherits many advantages from coset sum, including that it, too, is associated with fast algorithms. The third method is a relatively more general recipe for constructing multi-D wavelets. Unlike the first two methods, we solve the wavelet construction problem as a matrix equation problem. By employing the Quillen-Suslin Theorem from algebraic geometry, we are able to build d-D wavelets from a single d-D refinement mask. This method is more general in the sense that it works for any dilation matrix and does not assume additional constraints on the refinement masks.
This dissertation also includes an appendix on the topic of constructing directional wavelet filter banks.
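For contrast with the non-tensor constructions above, the tensor-product construction that the dissertation departs from builds 2-D masks as outer products of 1-D filters. With the orthonormal Haar pair this yields the familiar four analysis masks (a standard textbook construction, not the coset sum):

```python
def outer(u, v):
    """Outer product of two 1-D filters -> a 2-D filter (list of rows)."""
    return [[a * b for b in v] for a in u]

# 1-D orthonormal Haar lowpass/highpass pair
low  = [2 ** -0.5,  2 ** -0.5]
high = [2 ** -0.5, -2 ** -0.5]

# Tensor-product construction: one lowpass mask and three wavelet masks
LL = outer(low, low)    # approximation (lowpass in both directions)
LH = outer(low, high)   # horizontal detail
HL = outer(high, low)   # vertical detail
HH = outer(high, high)  # diagonal detail
```

For dyadic dilation in d dimensions, the tensor product needs 2^d - 1 wavelet masks built from 1-D pairs in this way, which is one reason alternatives such as the coset sum are of interest.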
Improving Filtering for Computer Graphics
When drawing images onto a computer screen, the information in the scene is typically
more detailed than can be displayed. Most objects, however, will not be close to the
camera, so details have to be filtered out, or anti-aliased, when the objects are drawn on
the screen. I describe new methods for filtering images and shapes with high fidelity while
using computational resources as efficiently as possible.
Vector graphics are everywhere, from drawing 3D polygons to 2D text and maps for
navigation software. Because of its numerous applications, having a fast, high-quality
rasterizer is important. I developed a method for analytically rasterizing shapes using
wavelets. This approach allows me to produce accurate 2D rasterizations of images and
3D voxelizations of objects, which is the first step in 3D printing. I later improved my
method to handle more filters. The resulting algorithm creates higher-quality images than
commercial software such as Adobe Acrobat and is several times faster than the most
highly optimized commercial products.
The quality of texture filtering also has a dramatic impact on the quality of a rendered
image. Textures are images that are applied to 3D surfaces, which typically cannot be
mapped to the 2D space of an image without introducing distortions. For situations in
which it is impossible to change the rendering pipeline, I developed a method for precomputing
image filters over 3D surfaces. If I can also change the pipeline, I show that it
is possible to improve the quality of texture sampling significantly in real-time rendering
while using the same memory bandwidth as traditional methods.
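As context for what an analytic rasterizer computes, the brute-force alternative estimates each pixel's coverage of a shape by supersampling; analytic (e.g., wavelet-based) methods obtain such filtered coverage without the per-sample cost. A minimal sketch, with grid size and names chosen for illustration:

```python
def pixel_coverage(px, py, inside, n=4):
    """Estimate the fraction of unit pixel (px, py) covered by a shape.

    inside: predicate (x, y) -> bool testing shape membership.
    Samples an n x n grid of points within the pixel; the hit fraction
    approximates the box-filtered coverage an analytic rasterizer
    would compute exactly.
    """
    hits = 0
    for i in range(n):
        for j in range(n):
            x = px + (i + 0.5) / n
            y = py + (j + 0.5) / n
            hits += inside(x, y)
    return hits / (n * n)
```

The estimate converges only as the sample count grows, which is exactly the cost that analytic filtering avoids.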