
    Towards Comfortable Cycling: A Practical Approach to Monitor the Conditions in Cycling Paths

    Cycling is a no-brainer: commuting by bicycle is among the most sustainable forms of transport, is inexpensive, and is pollution-free. Towns and cities have to be made bicycle-friendly to encourage wide usage, so cycling paths should be convenient, comfortable, and safe to ride. This paper investigates a smartphone application that passively monitors road conditions during a cyclist's ride. To overcome the problems of monitoring roads, we present novel algorithms that sense rough cycling paths and locate road bumps. Each event is detected in real time to improve the user-friendliness of the application. Cyclists may keep their smartphones at any orientation and placement. Moreover, different smartphones sense the same incident dissimilarly and hence report discrepant sensor values. We further address these difficulties, which limit such crowd-sourcing applications. We evaluate our sensing application on cycling paths in Singapore and show that it can successfully detect such bad road conditions.
    Comment: 6 pages, 5 figures, Accepted by IEEE 4th World Forum on Internet of Things (WF-IoT) 201
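
    The paper's detection algorithms are not spelled out in this abstract; the sketch below shows one common approach to the problem it describes, thresholding the variance of the accelerometer magnitude over sliding windows. The magnitude is invariant to phone rotation, which matters given the arbitrary orientation noted above. Sampling rate, window length, and threshold are hypothetical values.

```python
import numpy as np

def detect_rough_windows(accel_mag, fs=50, win_s=1.0, z_thresh=2.5):
    """Flag time windows whose acceleration variance is anomalously high.

    accel_mag : 1-D array of accelerometer magnitudes sqrt(ax^2+ay^2+az^2),
                used because the magnitude is invariant to phone orientation.
    fs        : sampling rate in Hz (hypothetical value).
    Returns start times (in seconds) of windows flagged as rough road.
    """
    win = int(fs * win_s)
    n = len(accel_mag) // win
    # Remove the gravity baseline, then measure per-window vibration energy.
    residual = np.asarray(accel_mag[: n * win], dtype=float)
    residual -= residual.mean()
    stds = residual.reshape(n, win).std(axis=1)
    # Standardise across the ride; windows far above typical vibration are rough.
    z = (stds - stds.mean()) / (stds.std() + 1e-9)
    return np.flatnonzero(z > z_thresh) * win_s
```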

    Structure-Aware Sampling: Flexible and Accurate Summarization

    In processing large quantities of data, a fundamental problem is to obtain a summary which supports approximate query answering. Random sampling yields flexible summaries which naturally support subset-sum queries with unbiased estimators and well-understood confidence bounds. Classic sample-based summaries, however, are designed for arbitrary subset queries and are oblivious to the structure in the set of keys. Particular structure, such as hierarchy, order, or product space (multi-dimensional), makes range queries much more relevant for most analyses of the data. Dedicated summarization algorithms for range-sum queries have also been extensively studied. They can outperform existing sampling schemes in terms of accuracy on range queries per summary size. Their accuracy, however, rapidly degrades when, as is often the case, the query spans multiple ranges. They are also less flexible, being targeted at range-sum queries alone, and are often quite costly to build and use. In this paper we propose and evaluate variance-optimal sampling schemes that are structure-aware. These summaries improve over the accuracy of existing structure-oblivious sampling schemes on range queries while retaining the benefits of sample-based summaries: flexible summaries, with high accuracy on both range queries and arbitrary subset queries.
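
    The following is not the paper's variance-optimal construction, only a minimal illustration of the structure-aware idea under the assumption of an ordered key domain: stratify the sorted keys into contiguous buckets so that any range query spans whole strata except at its two endpoints, and answer range sums with inverse-probability (Horvitz-Thompson) estimates.

```python
import numpy as np

def stratified_sample(keys, weights, n_strata=16, budget=64, seed=0):
    """Structure-aware sample over an ordered key domain: split the sorted
    keys into contiguous strata and sample uniformly within each stratum.
    (Illustrative stratification, not the paper's variance-optimal scheme.)"""
    rng = np.random.default_rng(seed)
    order = np.argsort(keys)
    keys = np.asarray(keys)[order]
    weights = np.asarray(weights, dtype=float)[order]
    sample = []
    for idx in np.array_split(np.arange(len(keys)), n_strata):
        if idx.size == 0:
            continue
        take = min(max(1, budget // n_strata), len(idx))
        chosen = rng.choice(idx, size=take, replace=False)
        inv_p = len(idx) / take  # inverse inclusion probability
        sample.extend((keys[i], weights[i] * inv_p) for i in chosen)
    return sample

def estimate_range_sum(sample, lo, hi):
    """Unbiased Horvitz-Thompson estimate of sum(weights) for keys in [lo, hi]."""
    return sum(w for k, w in sample if lo <= k <= hi)
```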

    Histograms and Wavelets on Probabilistic Data

    There is a growing realization that uncertain information is a first-class citizen in modern database management. As such, we need techniques to correctly and efficiently process uncertain data in database systems. In particular, data reduction techniques that can produce concise, accurate synopses of large probabilistic relations are crucial. Like their deterministic counterparts, such compact probabilistic data synopses can form the foundation for human understanding and interactive data exploration, probabilistic query planning and optimization, and fast approximate query processing in probabilistic database systems. In this paper, we introduce definitions and algorithms for building histogram- and wavelet-based synopses on probabilistic data. The core problem is to choose a set of histogram bucket boundaries or wavelet coefficients to optimize the accuracy of the approximate representation of a collection of probabilistic tuples under a given error metric. For a variety of error metrics, we devise efficient algorithms that construct optimal or near-optimal B-term histogram and wavelet synopses. This requires careful analysis of the structure of the probability distributions, and novel extensions of known dynamic-programming-based techniques for the deterministic domain. Our experiments show that this approach clearly outperforms simple ideas, such as building summaries for samples drawn from the data distribution, while taking equal or less time.
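
    The bucket-boundary optimization the abstract mentions extends a classical dynamic program for deterministic data. A sketch of that deterministic baseline, the V-optimal histogram minimizing sum-squared error within B buckets, is below; the paper's contribution, extending such DPs to probabilistic tuples and other error metrics, is not reproduced here.

```python
import numpy as np

def v_optimal_histogram(x, B):
    """Classic O(n^2 * B) dynamic program for a B-bucket V-optimal histogram
    over deterministic values x. Returns (minimum SSE, bucket end indices)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    p = np.concatenate([[0.0], np.cumsum(x)])       # prefix sums
    pp = np.concatenate([[0.0], np.cumsum(x * x)])  # prefix sums of squares

    def sse(i, j):
        # Squared error of one bucket covering x[i:j], represented by its mean.
        s, ss, m = p[j] - p[i], pp[j] - pp[i], j - i
        return ss - s * s / m

    INF = float("inf")
    cost = [[INF] * (B + 1) for _ in range(n + 1)]
    back = [[0] * (B + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for k in range(1, B + 1):            # number of buckets used so far
        for i in range(1, n + 1):        # prefix length covered
            for j in range(k - 1, i):    # start of the k-th bucket
                c = cost[j][k - 1] + sse(j, i)
                if c < cost[i][k]:
                    cost[i][k], back[i][k] = c, j
    bounds, i = [], n
    for k in range(B, 0, -1):            # walk back-pointers to recover buckets
        bounds.append(i)
        i = back[i][k]
    return cost[n][B], bounds[::-1]
```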

    Doctor of Philosophy

    We are living in an age where data are being generated faster than anyone previously imagined, across a broad range of application domains, including customer studies, social media, sensor networks, and the sciences, among many others. In some cases, data are generated in massive quantities, as terabytes or petabytes. Numerous challenges emerge when dealing with massive data, including: (1) the explosion in the size of data; (2) increasingly complex data structures and rich semantics, such as representing temporal data as a piecewise linear representation; (3) the growing prevalence of uncertain data in numerous applications, e.g., scientific measurements or observations such as meteorological measurements; and (4) increasingly distributed data, e.g., data collected and integrated from distributed locations, as well as data stored in a distributed file system within a cluster. Due to the massive nature of modern data, it is oftentimes infeasible for computers to efficiently manage and query them exactly. An attractive alternative is to use data summarization techniques to construct data summaries, although even constructing a summary efficiently is a challenging task given the enormous size of the data. The data summaries we focus on in this thesis are the histogram and the ranking operator. Both enable us to reduce a massive dataset to a far more succinct representation, which can then make queries orders of magnitude more efficient while still providing approximation guarantees on query answers. Our study focuses on the critical task of designing efficient algorithms to summarize, query, and manage massive data.

    A Survey of Model-based Sensor Data Acquisition and Management

    In recent years, due to the proliferation of sensor networks, there has been a genuine need for techniques for sensor data acquisition and management. To this end, a large number of techniques have emerged that advocate model-based sensor data acquisition and management. These techniques use mathematical models to perform the various day-to-day tasks involved in managing sensor data. In this chapter, we survey the state-of-the-art techniques for model-based sensor data acquisition and management. We start by discussing techniques for acquiring sensor data. We then discuss the application of models to sensor data cleaning, followed by a discussion of model-based methods for querying sensor data. Lastly, we survey model-based methods proposed for data compression and synopsis generation.
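
    As a minimal illustration of the model-based cleaning idea the survey covers, the sketch below fits a simple local linear model over a sliding window and replaces readings that deviate strongly from it. The window size and threshold are hypothetical, and real systems use richer statistical models than a local line.

```python
import numpy as np

def model_based_clean(t, y, win=20, k=3.0):
    """Clean a sensor stream with a simple local model: fit a line over a
    sliding window, flag readings far from the fit, and substitute the
    model's prediction (window size and threshold are illustrative)."""
    t = np.asarray(t, dtype=float)
    y = np.array(y, dtype=float)  # copy; cleaned in place below
    for i in range(len(y)):
        lo, hi = max(0, i - win), min(len(y), i + win + 1)
        a, b = np.polyfit(t[lo:hi], y[lo:hi], 1)   # local linear model
        pred = a * t[i] + b
        resid = y[lo:hi] - (a * t[lo:hi] + b)
        if abs(y[i] - pred) > k * (resid.std() + 1e-9):
            y[i] = pred                            # replace the outlier
    return y
```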

    Novel methods for distributed acoustic sensing data

    In this thesis, we propose novel methods for analysing nonstationary, multivariate time series, focusing in particular on the problems of classification and imputation within this context. Many existing methods for time series classification are static, in that they assign the entire series to one class and do not allow for temporal dependence within the signal. In the first part of this thesis, we propose a computationally efficient extension of an existing dynamic classification method to the online setting. Dependence within the series is captured by adopting the multivariate locally stationary wavelet (mvLSW) framework, and the signal is classified at each time point into one of a number of known classes. We apply the method to multivariate acoustic sensing data in order to detect anomalous regions, and evaluate the results against alternative methods from the literature. The second part of this thesis considers imputation in multivariate locally stationary time series containing missing values. We first introduce a method for estimating the local wavelet spectral matrix that can be used in the presence of missingness. We then propose a novel method for imputing missing values that uses the local auto- and cross-covariance functions of an mvLSW process to perform one-step-ahead forecasting and backcasting. The performance of this nonstationary imputation approach is then assessed against competitor methods on simulated examples and a case study involving a dataset from a Carbon Capture and Storage facility. The software implementing this imputation scheme is also described, together with examples of the R package functionality.
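
    To make the forecast/backcast idea concrete, here is a toy analogue on a univariate, stationary series: each missing value is filled with the average of a one-step-ahead forecast from its left neighbour and a one-step backcast from its right neighbour, using a crude AR(1) fit. The thesis instead derives forecasts from the local auto- and cross-covariances of an mvLSW process, which this sketch does not attempt.

```python
import numpy as np

def impute_forecast_backcast(x):
    """Toy forecast/backcast imputation: average an AR(1) forecast from the
    left with an AR(1) backcast from the right (stationary, univariate toy;
    not the thesis's mvLSW method)."""
    x = np.array(x, dtype=float)
    obs = ~np.isnan(x)
    v = x[obs]
    mu = v.mean()
    phi = np.corrcoef(v[:-1], v[1:])[0, 1]  # crude lag-1 AR coefficient
    for i in np.flatnonzero(~obs):          # increasing order: runs fill left-to-right
        left = x[i - 1] if i > 0 and not np.isnan(x[i - 1]) else mu
        right = x[i + 1] if i + 1 < len(x) and not np.isnan(x[i + 1]) else mu
        fwd = mu + phi * (left - mu)        # forecast from the past
        bwd = mu + phi * (right - mu)       # backcast from the future
        x[i] = 0.5 * (fwd + bwd)
    return x
```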

    Algebraic Approaches for Constructing Multi-D Wavelets

    Wavelets have been a powerful tool in data representation and have had a growing impact on various signal processing applications. As multi-dimensional (multi-D) wavelets are needed in multi-D data representation, methods for constructing multi-D wavelets are of great interest. The tensor product has been the most prevalent method in multi-D wavelet construction; however, it has many limitations that make it insufficient in some cases. In this dissertation, we provide three non-tensor-based methods to construct multi-D wavelets. The first method is an alternative to the tensor product, called coset sum, which constructs multi-D wavelets from a pair of 1-D biorthogonal refinement masks. Coset sum shares many important features of the tensor product. It is associated with fast algorithms, which in certain cases are faster than the tensor product fast algorithms. Moreover, it shows great potential in image processing applications. The second method is a generalization of coset sum to non-dyadic dilation cases. In particular, we deal with the situation where the dilation matrix is pI_n, where p is a prime number and I_n is the n-dimensional identity matrix; we therefore call it the prime coset sum method. Prime coset sum inherits many advantages from coset sum, including its associated fast algorithms. The third method is a relatively more general recipe for constructing multi-D wavelets. Different from the first two methods, we pose the wavelet construction problem as a matrix equation problem. By employing the Quillen-Suslin Theorem from algebraic geometry, we are able to build n-D wavelets from a single n-D refinement mask. This method is more general in the sense that it works for any dilation matrix and does not assume additional constraints on the refinement masks. This dissertation also includes one appendix on the topic of constructing directional wavelet filter banks.
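
    For contrast, here is the tensor-product baseline the dissertation's non-tensor constructions improve on: 2-D filters built as outer products of 1-D masks, shown with the Haar pair. The coset sum construction itself is not reproduced here.

```python
import numpy as np

# Tensor-product construction of 2-D wavelet filters from a 1-D pair:
# every 2-D filter is an outer product of the 1-D lowpass/highpass masks.
h = np.array([1.0, 1.0]) / np.sqrt(2)   # 1-D lowpass (Haar)
g = np.array([1.0, -1.0]) / np.sqrt(2)  # 1-D highpass (Haar)

LL = np.outer(h, h)  # 2-D lowpass (approximation)
LH = np.outer(h, g)  # horizontal detail
HL = np.outer(g, h)  # vertical detail
HH = np.outer(g, g)  # diagonal detail
```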

    Improving Filtering for Computer Graphics

    When drawing images onto a computer screen, the information in the scene is typically more detailed than can be displayed. Most objects, however, will not be close to the camera, so details have to be filtered out, or anti-aliased, when the objects are drawn on the screen. I describe new methods for filtering images and shapes with high fidelity while using computational resources as efficiently as possible. Vector graphics are everywhere, from 3D polygons to 2D text and maps for navigation software. Because of its numerous applications, having a fast, high-quality rasterizer is important. I developed a method for analytically rasterizing shapes using wavelets. This approach allows me to produce accurate 2D rasterizations of images and 3D voxelizations of objects, the latter being the first step in 3D printing. I later improved my method to handle more filters. The resulting algorithm creates higher-quality images than commercial software such as Adobe Acrobat and is several times faster than the most highly optimized commercial products. The quality of texture filtering also has a dramatic impact on the quality of a rendered image. Textures are images applied to 3D surfaces, which typically cannot be mapped to the 2D space of an image without introducing distortions. For situations in which it is impossible to change the rendering pipeline, I developed a method for precomputing image filters over 3D surfaces. When I can also change the pipeline, I show that it is possible to improve the quality of texture sampling significantly in real-time rendering while using the same memory bandwidth as traditional methods.
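
    As a minimal illustration of analytic filtering (not the dissertation's wavelet rasterizer), the snippet below computes the exact box-filter coverage of a vertical edge over one pixel. This fractional coverage is what replaces the all-or-nothing answer of point sampling and removes jagged edges.

```python
import numpy as np

def box_coverage(edge_x, px_left, px_right):
    """Analytic box-filter coverage of a vertical edge at x = edge_x over a
    pixel spanning [px_left, px_right): the exact fraction of the pixel
    lying left of the edge. Point sampling would return only 0 or 1."""
    width = px_right - px_left
    return float(np.clip((edge_x - px_left) / width, 0.0, 1.0))

# An edge at x = 3.3 crossing the pixel [3, 4) covers 30% of it, so the
# pixel is drawn 30% opaque instead of fully on or fully off.
print(box_coverage(3.3, 3.0, 4.0))  # 0.3
```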