A Randomized Greedy Algorithm for Near-Optimal Sensor Scheduling in Large-Scale Sensor Networks
We study the problem of scheduling sensors in a resource-constrained linear
dynamical system, where the objective is to select a small subset of sensors
from a large network to perform the state estimation task. We formulate this
problem as the maximization of a monotone set function under a matroid
constraint. We propose a randomized greedy algorithm that is significantly
faster than state-of-the-art methods. By introducing the notion of curvature,
which quantifies how close a function is to being submodular, we analyze the
performance of the proposed algorithm and find a bound on the expected mean
square error (MSE) of the estimator that uses the selected sensors in terms of
the optimal MSE. Moreover, we derive a probabilistic bound on the curvature for
the scenario where the measurements are i.i.d. random vectors with bounded
norm. Simulation results demonstrate the efficacy of the randomized greedy
algorithm in comparison with greedy and semidefinite programming relaxation
methods.
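The core idea of randomized greedy selection can be sketched as follows. This is an illustrative simplification, not the paper's method: it uses a cardinality constraint in place of a general matroid constraint, and a toy coverage objective as a stand-in for the MSE-based set function; all names (`randomized_greedy`, `coverage_sets`, `sample_size`) are invented for the example. At each step, marginal gains are evaluated only on a random sample of candidates rather than the full ground set, which is the source of the speedup over plain greedy.

```python
import random

def randomized_greedy(ground_set, f, k, sample_size, seed=0):
    """Select up to k elements approximately maximizing a monotone set
    function f, evaluating marginal gains only on a random sample of
    candidates at each step instead of the whole remaining ground set."""
    rng = random.Random(seed)
    selected = set()
    remaining = set(ground_set)
    for _ in range(k):
        candidates = rng.sample(sorted(remaining),
                                min(sample_size, len(remaining)))
        # pick the sampled candidate with the largest marginal gain
        best = max(candidates, key=lambda e: f(selected | {e}) - f(selected))
        selected.add(best)
        remaining.remove(best)
    return selected

# Toy coverage objective: each "sensor" observes a set of targets,
# and f(S) counts the distinct targets covered by the selection S.
coverage_sets = {0: {1, 2}, 1: {2, 3}, 2: {4}, 3: {1, 4, 5}, 4: {5, 6}}
f = lambda S: len(set().union(*(coverage_sets[e] for e in S))) if S else 0
chosen = randomized_greedy(list(coverage_sets), f, k=2, sample_size=3)
```

Sampling `sample_size` candidates per step reduces the number of function evaluations from O(nk) to O(k * sample_size), at the cost of a probabilistic rather than deterministic approximation guarantee.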
kD-STR: a method for spatio-temporal data reduction and modelling
Analysing and learning from spatio-temporal datasets is an important process in many domains, including transportation, healthcare and meteorology. In particular, data collected by sensors in the environment allows us to understand and model the processes acting within the environment. Recently, the volume of spatio-temporal data collected has increased significantly, presenting several challenges for data scientists. Methods are therefore needed to reduce the quantity of data that needs to be processed in order to analyse and learn from spatio-temporal datasets. In this article, we present the k-Dimensional Spatio-Temporal Reduction method (kD-STR) for reducing the quantity of data used to store a dataset whilst enabling multiple types of analysis on the reduced dataset. kD-STR uses hierarchical partitioning to find spatio-temporal regions of similar instances, and models the instances within each region to summarise the dataset. We demonstrate the generality of kD-STR with three datasets exhibiting different spatio-temporal characteristics and present results for a range of data modelling techniques. Finally, we compare kD-STR with other techniques for reducing the volume of spatio-temporal data. Our results demonstrate that kD-STR is effective in reducing spatio-temporal data and generalises to datasets that exhibit different properties.
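The partition-then-model idea described in the abstract can be illustrated with a minimal one-dimensional sketch. This is not the kD-STR algorithm itself (which partitions in multiple spatio-temporal dimensions and supports a range of models); it is a hedged stand-in that recursively splits a sequence of readings and stores one summary statistic (the mean) per region, splitting whenever the within-region error exceeds a tolerance. All names here are invented for the example.

```python
def partition_and_model(values, lo, hi, tol, depth=0, max_depth=8):
    """Recursively split values[lo:hi] into regions, keeping one
    summary (the mean) per region; a region is split when its
    worst-case error against the mean exceeds `tol`. Returns a
    list of (start, end, mean) tuples describing the regions."""
    region = values[lo:hi]
    mean = sum(region) / len(region)
    err = max(abs(v - mean) for v in region)
    if err <= tol or depth >= max_depth or hi - lo <= 1:
        return [(lo, hi, mean)]
    mid = (lo + hi) // 2
    return (partition_and_model(values, lo, mid, tol, depth + 1, max_depth)
            + partition_and_model(values, mid, hi, tol, depth + 1, max_depth))

# Two homogeneous regimes compress to two regions: each region
# stores only (start, end, mean) instead of the raw readings.
readings = [1.0] * 8 + [5.0] * 8
regions = partition_and_model(readings, 0, len(readings), tol=0.5)
```

The trade-off mirrors the one in the abstract: a coarser tolerance yields fewer regions (more reduction) at the cost of higher reconstruction error within each region.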
Reducing spatio-temporal data: methods and analysis
Analysing and learning from spatio-temporal datasets is an important process in many domains, including transportation, healthcare and meteorology. However, in recent years, the volume of data generated for such datasets has increased significantly. This poses several challenges for data scientists, including increased processing overheads and costs. Thus, several methods have been proposed for reducing the volume of data stored and processed to analyse and learn from these datasets. However, existing methods fail to take advantage of the spatial and temporal autocorrelation present in spatio-temporal data, incur unnecessary overheads when retrieving the data, or fail to retain information about all instances and features.
This thesis introduces several data reduction methods to address these limitations. First, the kD-STR algorithm is introduced, which hierarchically partitions and models the data, thereby reducing the storage overhead of the dataset. This method minimises the storage used and error incurred. Second, this reduction method is adapted for the context of data linking, and an alternative heuristic is proposed that minimises error in the features engineered during linking. Third, adapted algorithms are presented for reducing multiple datasets simultaneously, and for reducing large datasets in a distributed manner.
Through empirical analysis using real-world datasets, the utility of these algorithms is investigated. The results presented demonstrate the data reduction that can be achieved using these algorithms, as well as the impact of using different spatial referencing systems and modelling techniques. Further analysis demonstrates the effect of error in location and time, noise and missing data on the data reduction. Combined, the algorithms presented offer an improvement over the state of the art in spatio-temporal data reduction, and the analysis demonstrates the results that may be achieved for datasets exhibiting a range of characteristics.