Real-time Mobile Sensor Management Framework for city-scale environmental monitoring
Environmental disasters such as flash floods are becoming increasingly prevalent and impose a growing burden on human civilization. They are usually unpredictable, develop quickly, and extend across large geographical areas. The consequences of such disasters can be reduced through better monitoring, for example using mobile sensing platforms that deliver timely and accurate information to first responders and the public. Given the scale of the areas to monitor and the time-varying nature of the phenomenon, fast algorithms are needed to quickly determine the best sequence of locations to monitor. This problem is very challenging: existing informative mobile sensor routing algorithms are either myopic or computationally demanding when applied to large-scale systems. In this paper, a real-time sensor task scheduling algorithm suited to the features and needs of city-scale environmental monitoring tasks is proposed. The algorithm runs as a forward search and makes use of the predictions of an associated distributed parameter system modeling flash flood propagation. It partly inherits the causal relations expressed by a search tree that describes all possible sequential decisions. The computationally heavy data assimilation steps in the forward search tree are replaced by functions of the covariance matrix between observation sets. Taking flood tracking in an urban area as a concrete example, numerical experiments indicate that this scheduling algorithm achieves better results than myopic planning algorithms and other heuristics-based sensor placement algorithms. Furthermore, the paper relies on a deep learning-based data-driven model to track the system states, and experiments suggest that popular estimation techniques perform very well when applied to accurate data-driven models.
Comment: for associated data and code, see https://drive.google.com/drive/folders/1gRz4T2KGFXtlnSugarfUL8r355cXb7Ko?usp=sharin
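The covariance-based shortcut described in this abstract can be sketched in a few lines. The following Python toy is not the authors' code; the selection-matrix observation model, the noise variance, and all function names are illustrative assumptions. It scores candidate observation sets by the posterior-variance reduction implied by the prior covariance alone, then runs an exhaustive forward search over a short planning horizon instead of performing a full data assimilation update at every tree node:

    import numpy as np
    from itertools import combinations

    def info_gain(P, idx, noise_var=1e-2):
        """Reduction in total state variance from observing entries `idx`,
        computed from the prior covariance P via a Kalman-style update."""
        H = np.eye(P.shape[0])[list(idx)]                 # selection (observation) matrix
        S = H @ P @ H.T + noise_var * np.eye(len(idx))    # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)                    # gain
        P_post = P - K @ H @ P                            # posterior covariance
        return np.trace(P) - np.trace(P_post)

    def forward_search(P_seq, candidates, depth, k):
        """Enumerate sequences of k-sensor observation sets over `depth` steps
        and return the sequence with the largest summed information gain.
        P_seq[t] is the predicted prior covariance at step t."""
        best, best_seq = -np.inf, None
        def recurse(t, acc, seq):
            nonlocal best, best_seq
            if t == depth:
                if acc > best:
                    best, best_seq = acc, seq
                return
            for idx in combinations(candidates, k):
                recurse(t + 1, acc + info_gain(P_seq[t], idx), seq + [idx])
        recurse(0, 0.0, [])
        return best_seq

    # toy example: 6 locations, 2 planning steps, 2 sensors per step
    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 6))
    P = A @ A.T                                           # random SPD prior covariance
    print("planned observation sets:", forward_search([P, P], range(6), depth=2, k=2))

Real systems would prune this tree rather than enumerate it exhaustively; the sketch only illustrates how covariance-derived scores can stand in for repeated assimilation steps.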
Greed is Good: Exploration and Exploitation Trade-offs in Bayesian Optimisation
The performance of acquisition functions for Bayesian optimisation is investigated in terms of the Pareto front between exploration and exploitation. We show that Expected Improvement and the Upper Confidence Bound always select solutions to be expensively evaluated on the Pareto front, but Probability of Improvement is never guaranteed to do so, and Weighted Expected Improvement does so only for a restricted range of weights. We introduce two novel ε-greedy acquisition functions. Extensive empirical evaluation of these together with random search, purely exploratory and purely exploitative search on 10 benchmark problems in 1 to 10 dimensions shows that ε-greedy algorithms are generally at least as effective as conventional acquisition functions, particularly with a limited budget. In higher dimensions, ε-greedy approaches are shown to have improved performance over conventional approaches. These results are borne out on a real-world computational fluid dynamics optimisation problem and a robotics active learning problem.
Comment: Submitted to ACM Transactions on Evolutionary Learning and Optimization (TELO). 19 pages (main paper) + 18 pages (supplementary material)
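As a rough illustration of the underlying idea, a generic ε-greedy acquisition rule over a Gaussian-process surrogate can be written as below. This is a hedged sketch only: the paper's two novel variants differ in how the exploratory and exploitative points are chosen, and the surrogate interface (`mu`), candidate grid, and `eps` value here are illustrative assumptions, not the authors' implementation:

    import numpy as np

    def epsilon_greedy(mu, candidates, eps=0.1, rng=None):
        """Pick the next point to evaluate (minimisation): with probability
        eps, explore by sampling a random candidate; otherwise exploit the
        surrogate's best predicted mean."""
        rng = rng or np.random.default_rng()
        if rng.random() < eps:
            return candidates[rng.integers(len(candidates))]   # explore
        means = np.array([mu(x) for x in candidates])
        return candidates[int(np.argmin(means))]               # exploit

    # toy usage with a made-up surrogate mean over a 1-D grid
    grid = np.linspace(0.0, 1.0, 101)
    fake_mu = lambda x: (x - 0.3) ** 2
    print("next evaluation at x =", epsilon_greedy(fake_mu, grid, eps=0.1))

The appeal of this family of rules, consistent with the abstract's findings, is that the single parameter ε exposes the exploration/exploitation trade-off directly instead of folding it into an improvement-based criterion.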
Scalable Statistical Modeling and Query Processing over Large Scale Uncertain Databases
The past decade has witnessed a large number of novel applications that generate imprecise, uncertain and incomplete data. Examples include monitoring infrastructures such as RFIDs and sensor networks, and web-based applications such as information extraction, data integration, and social networking. In my dissertation, I addressed several challenges in managing such data and developed algorithms for efficiently executing queries over large volumes of it. Specifically, I focused on the following challenges.
First, for meaningful analysis of such data, we need the ability to remove noise and infer useful information from uncertain data. To address this challenge, I first developed a declarative system for applying dynamic probabilistic models to databases and data streams. The output of such probabilistic modeling is probabilistic data, i.e., data annotated with probabilities of correctness/existence. Often, the data also exhibits strong correlations. Although there is prior work in managing and querying such probabilistic data using probabilistic databases, those approaches largely assume independence and cannot handle probabilistic data with rich correlation structures. Hence, I built a probabilistic database system that can manage large-scale correlations and developed algorithms for efficient query evaluation. Our system allows users to provide uncertain data as input and to specify arbitrary correlations among the entries in the database. In the back end, we represent correlations as a forest of junction trees, an alternative representation for probabilistic graphical models (PGM). We execute queries over the probabilistic database by transforming them into message passing algorithms (inference) over the junction tree. However, traditional algorithms over junction trees typically require accessing the entire tree, even for small queries. Hence, I developed an index data structure over the junction tree called INDSEP that allows us to circumvent this process and thereby scalably evaluate inference queries, aggregation queries and SQL queries over the probabilistic database.
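To make the message-passing step concrete, here is a minimal two-clique junction-tree example in Python, illustrating the kind of inference the system compiles queries into. The factor tables are made-up numbers, not data from the dissertation; cliques C1 = {A, B} and C2 = {B, C} share the separator {B}, and all variables are binary:

    import numpy as np

    phi1 = np.array([[0.3, 0.7],
                     [0.9, 0.1]])          # clique factor phi1[a, b]
    phi2 = np.array([[0.5, 0.5],
                     [0.2, 0.8]])          # clique factor phi2[b, c]

    # message from C1 to C2: marginalize A out of phi1 onto separator B
    msg_1to2 = phi1.sum(axis=0)            # shape (2,), indexed by b

    # C2's belief after absorbing the message over the separator
    belief_c2 = phi2 * msg_1to2[:, None]   # shape (2, 2), indexed by (b, c)

    # answer a marginal query P(C) by summing out B and normalizing
    p_c = belief_c2.sum(axis=0)
    p_c /= p_c.sum()
    print("P(C=0), P(C=1) =", p_c)

An index such as INDSEP avoids touching cliques that cannot affect the queried variables; the sketch above shows only the local propagation step that such an index would route.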
Finally, query evaluation in probabilistic databases typically returns output tuples along with their probability values. However, the existing query evaluation model provides very little intuition to the users: for instance, a user might want to know "Why is this tuple in my result?", "Why does this output tuple have such high probability?", or "Which are the most influential input tuples for my query?" Hence, I designed a query evaluation model, and a suite of algorithms, that provide users with explanations for query results and enable users to perform sensitivity analysis to better understand the query results.
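As a loose illustration of such sensitivity analysis (not the dissertation's actual algorithms), consider an output tuple produced as a disjunction of independent input tuples with probabilities p_i, so the output probability is 1 - prod(1 - p_i). The influence of each input tuple can then be read off as a partial derivative, yielding a ranking of the most influential tuples; all probabilities below are made up:

    import numpy as np

    def output_prob(p):
        """Probability of an output tuple that exists if any input does."""
        return 1.0 - np.prod(1.0 - np.asarray(p, dtype=float))

    def influences(p):
        """d/dp_i [1 - prod_j (1 - p_j)] = prod_{j != i} (1 - p_j).
        Assumes every p_i < 1."""
        comp = 1.0 - np.asarray(p, dtype=float)
        return np.prod(comp) / comp

    p = [0.9, 0.2, 0.05]                   # made-up input-tuple probabilities
    print("output probability:", output_prob(p))
    print("tuples ranked by influence:", np.argsort(-influences(p)))

For correlated inputs, the same derivative-based notion of influence applies, but computing it requires inference over the underlying graphical model rather than a closed-form product.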