172 research outputs found
Hidden Markov models for the activity profile of terrorist groups
The main focus of this work is on developing models for the activity profile
of a terrorist group, detecting sudden spurts and downfalls in this profile,
and, in general, tracking it over a period of time. Toward this goal, a
d-state hidden Markov model (HMM) that captures the latent states underlying
the dynamics of the group and thus its activity profile is developed. The
simplest setting of d = 2 corresponds to the case where the dynamics are
coarsely quantized as Active and Inactive, respectively. A state estimation
strategy that exploits the underlying HMM structure is then developed for spurt
detection and tracking. This strategy is shown to track even nonpersistent
changes that last only for a short duration at the cost of learning the
underlying model. Case studies with real terrorism data from open-source
databases are provided to illustrate the performance of the proposed
methodology. Comment: Published at http://dx.doi.org/10.1214/13-AOAS682 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
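The two-state Active/Inactive setting described in this abstract can be illustrated with a minimal sketch. It assumes Poisson-distributed event counts per time step and hand-picked parameters (`rates`, `stay`); the abstract does not specify the observation model or the estimation strategy the authors use, so the function name and every numeric value here are hypothetical. Viterbi decoding stands in for "a state estimation strategy that exploits the underlying HMM structure".

```python
import math


def poisson_logpmf(k, rate):
    """Log-probability of observing k events under a Poisson(rate) model."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def viterbi_two_state(counts, rates=(0.2, 2.0), stay=0.9):
    """Most likely Inactive (0) / Active (1) path for a sequence of counts.

    rates: assumed Poisson event rates for the Inactive and Active states.
    stay:  assumed probability of remaining in the current state.
    """
    log_stay, log_switch = math.log(stay), math.log(1.0 - stay)
    # Uniform initial distribution over the two states (an assumption).
    score = [math.log(0.5) + poisson_logpmf(counts[0], r) for r in rates]
    back = []
    for k in counts[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            cands = [score[p] + (log_stay if p == s else log_switch)
                     for p in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            pointers.append(best)
            new_score.append(cands[best] + poisson_logpmf(k, rates[s]))
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the most likely final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    # A short spurt of activity in an otherwise quiet series.
    print(viterbi_two_state([0, 0, 0, 3, 4, 2, 0, 0, 0]))
```

With these assumed parameters, a brief burst of events is labelled Active even when it lasts only a few steps, which mirrors the abstract's point about tracking nonpersistent changes once the model is known.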
Decoding visemes: improving machine lip-reading
Abstract
This thesis is about improving machine lip-reading, that is, the classification of
speech from only visual cues of a speaker. Machine lip-reading is a niche research
problem in both speech processing and computer vision.
Current challenges for machine lip-reading fall into two groups: the content of the
video, such as the rate at which a person is speaking; or the parameters of the video
recording, for example, the video resolution. We begin our work with a literature
review to understand the restrictions current technology places on machine lip-reading
recognition and conduct an experiment into resolution effects. We show that high
definition video is not needed to successfully lip-read with a computer.
The term "viseme" is used in machine lip-reading to represent a visual cue or
gesture which corresponds to a subgroup of phonemes where the phonemes are
indistinguishable in the visual speech signal. Whilst a viseme is yet to be formally
defined, we use the common working definition 'a viseme is a group of phonemes
with identical appearance on the lips'. A phoneme is the smallest acoustic unit a
human can utter. Because there are more phonemes than visemes, mapping between
the units creates a many-to-one relationship. Many mappings have been presented,
and we conduct an experiment to determine which mapping produces the most
accurate classification. Our results show Lee's [82] is best. Lee's classification also
outperforms machine lip-reading systems which use the popular Fisher [48] phoneme-to-viseme
map.
Further to this, we propose three methods of deriving speaker-dependent phoneme-to-viseme
maps and compare our new approaches to Lee's. Our results show the
sensitivity of phoneme clustering and we use our new knowledge for our first suggested
augmentation to the conventional lip-reading system.
Speaker independence in machine lip-reading classification is another unsolved
obstacle. It has been observed, in the visual domain, that classifiers need training
on the test subject to achieve the best classification. Thus machine lip-reading is
highly dependent upon the speaker. Speaker independence is the opposite of this,
or in other words, is the classification of a speaker not present in the classifier's
training data. We investigate the dependence of phoneme-to-viseme maps between
speakers. Our results show there is not a high variability of visual cues, but there is
high variability in trajectory between visual cues of an individual speaker with the
same ground truth. This implies a dependency upon the number of visemes within
each set for each individual.
Finally, we investigate how many visemes is the optimum number within a set.
We show the phoneme-to-viseme maps in literature rarely have enough visemes
and the optimal number, which varies by speaker, ranges from 11 to 35. The last
difficulty we address is decoding from visemes back to phonemes and into words.
Traditionally this is completed using a language model. The language model unit is
either the same as the classifier, e.g. visemes or phonemes, or the language model
unit is words. In a novel approach we use these optimum range viseme sets within
hierarchical training of phoneme labelled classifiers. This new method of classifier
training demonstrates a significant increase in classification with a word language
network
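The many-to-one relationship between phonemes and visemes described in this abstract can be sketched as a simple lookup. The map below is a tiny illustrative fragment, not Lee's or Fisher's map and not one of the speaker-dependent maps the thesis derives; the viseme labels (`V_bilabial` and so on) and the function name are hypothetical. It relies only on the well-known observation that bilabial consonants such as /p/, /b/ and /m/ look alike on the lips.

```python
# Illustrative (hypothetical) phoneme-to-viseme fragment: several phonemes
# share one viseme label, so the mapping is many-to-one.
PHONEME_TO_VISEME = {
    "p": "V_bilabial", "b": "V_bilabial", "m": "V_bilabial",
    "f": "V_labiodental", "v": "V_labiodental",
    "t": "V_alveolar", "d": "V_alveolar",
    "ae": "V_open",
}


def to_visemes(phonemes):
    """Map a phoneme sequence to its viseme sequence using the fragment above."""
    return [PHONEME_TO_VISEME[p] for p in phonemes]


if __name__ == "__main__":
    for word, phones in [("pat", ["p", "ae", "t"]),
                         ("bat", ["b", "ae", "t"]),
                         ("mat", ["m", "ae", "t"])]:
        print(word, to_visemes(phones))
```

Because the three words collapse to the same viseme sequence, going back from visemes to phonemes and words is ambiguous, which is why the abstract treats decoding as a separate difficulty.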
Understanding the performance of Internet video over residential networks
Video streaming applications are now commonplace among home Internet users, who typically access the Internet using DSL or Cable technologies.
However, the effect of these technologies on video performance, in terms of degradations in video quality, is not well understood.
To enable continued deployment of applications with improved quality of experience for home users, it is essential to understand the nature of network impairments and develop means to overcome them.
In this dissertation, I demonstrate the type of network conditions experienced by Internet video traffic, by presenting a new dataset of the packet level performance of real-time streaming to residential Internet users.
Then, I use these packet level traces to evaluate the performance of commonly used models for packet loss simulation, and finding the models to be insufficient, present a new type of model that more accurately captures the loss behaviour.
Finally, to demonstrate how a better understanding of the network can improve video quality in a real application scenario, I evaluate the performance of forward error correction schemes for Internet video using the measurements.
I show that performance can be poor, devise a new metric to predict performance of error recovery from the characteristics of the input, and validate that the new packet loss model allows more realistic simulations.
For the effective deployment of Internet video systems to users of residential access networks, a firm understanding of these networks is required.
This dissertation provides insights into the packet level characteristics that can be expected from such networks, and techniques to realistically simulate their behaviour, promoting development of future video applications
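For readers unfamiliar with packet loss simulation, the sketch below shows a simple two-state Gilbert model, a widely used way to generate bursty losses. This is an assumption for illustration only: the abstract does not name the models it evaluates, and this is not the new model the dissertation proposes. The function names, parameters and seed are hypothetical.

```python
import random


def simulate_gilbert_loss(n_packets, p_good_to_bad, p_bad_to_good, seed=0):
    """Simulate per-packet loss with a two-state (good/bad) Markov chain.

    Every packet sent while the chain is in the bad state is lost; packets
    sent in the good state are delivered. Returns a list of booleans
    (True means the packet was lost).
    """
    rng = random.Random(seed)
    lost = []
    bad = False  # start in the good state
    for _ in range(n_packets):
        if bad:
            if rng.random() < p_bad_to_good:
                bad = False
        elif rng.random() < p_good_to_bad:
            bad = True
        lost.append(bad)
    return lost


def mean_burst_length(lost):
    """Average length of runs of consecutive lost packets."""
    bursts, run = [], 0
    for is_lost in lost:
        if is_lost:
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    if run:
        bursts.append(run)
    return sum(bursts) / len(bursts) if bursts else 0.0


if __name__ == "__main__":
    trace = simulate_gilbert_loss(200_000, p_good_to_bad=0.05, p_bad_to_good=0.5)
    print("loss rate:", sum(trace) / len(trace))
    print("mean burst length:", mean_burst_length(trace))
```

In this model the long-run loss rate tends to p_good_to_bad / (p_good_to_bad + p_bad_to_good) and the mean burst length to 1 / p_bad_to_good, so two parameters fix both quantities. Comparing such summary statistics against measured traces is one way a model can be judged insufficient.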
Bayesian learning for the robust verification of autonomous robots
Autonomous robots used in infrastructure inspection, space exploration and other critical missions operate in highly dynamic environments. As such, they must continually verify their ability to complete the tasks associated with these missions safely and effectively. Here we present a Bayesian learning framework that enables this runtime verification of autonomous robots. The framework uses prior knowledge and observations of the verified robot to learn expected ranges for the occurrence rates of regular and singular (e.g., catastrophic failure) events. Interval continuous-time Markov models defined using these ranges are then analysed to obtain expected intervals of variation for system properties such as mission duration and success probability. We apply the framework to an autonomous robotic mission for underwater infrastructure inspection and repair. The formal proofs and experiments presented in the paper show that our framework produces results that reflect the uncertainty intrinsic to many real-world systems, enabling the robust verification of their quantitative properties under parametric uncertainty
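The idea of learning a range for an event occurrence rate from prior knowledge and observations can be sketched with a conjugate Gamma-Poisson update applied to a set of candidate priors. This is a simplified assumption made for illustration, not the estimators defined in the paper; the function names and the example priors are hypothetical.

```python
def posterior_rate_mean(prior_shape, prior_rate, n_events, exposure_time):
    """Posterior mean of a Poisson event rate under a Gamma(shape, rate) prior.

    Observing n_events over exposure_time updates the prior to
    Gamma(prior_shape + n_events, prior_rate + exposure_time).
    """
    return (prior_shape + n_events) / (prior_rate + exposure_time)


def posterior_mean_range(priors, n_events, exposure_time):
    """Lowest and highest posterior mean rate across a set of candidate priors."""
    means = [posterior_rate_mean(shape, rate, n_events, exposure_time)
             for shape, rate in priors]
    return min(means), max(means)


if __name__ == "__main__":
    # Two hypothetical priors expressing different beliefs about the rate.
    low, high = posterior_mean_range([(1.0, 10.0), (2.0, 10.0)],
                                     n_events=3, exposure_time=10.0)
    print(f"rate range: [{low:.3f}, {high:.3f}]")
```

A range of this kind could then bound a transition rate in an interval continuous-time Markov model, so that properties such as success probability are reported as intervals rather than point values.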
On Computable Protein Functions
Proteins are biological machines that perform the majority of functions necessary for life. Nature has evolved many different proteins, each of which performs a subset of an organism’s functional repertoire. One aim of biology is to solve the sparse, high-dimensional problem of annotating all proteins with their true functions. Experimental characterisation remains the gold standard for assigning function, but is a major bottleneck due to resource scarcity. In this thesis, we develop a variety of computational methods to predict protein function, reduce the functional search space for proteins, and guide the design of experimental studies. Our methods take two distinct approaches: protein-centric methods that predict the functions of a given protein, and function-centric methods that predict which proteins perform a given function. We applied our methods to help solve a number of open problems in biology. First, we identified new proteins involved in the progression of Alzheimer’s disease using proteomics data of brains from a fly model of the disease. Second, we predicted novel plastic hydrolase enzymes in a large data set of 1.1 billion protein sequences from metagenomes. Finally, we optimised a neural network method that extracts a small number of informative features from protein networks, which we used to predict functions of fission yeast proteins
Performance-Based Modeling of Spatial and Temporal Variability of Treated Wastewater Quality for Improved Nutrient Management
The United States Environmental Protection Agency (U.S.EPA) has identified nutrient pollution as the leading cause of use impairment in U.S. waters. Consequently, for improved nutrient management, the U.S.EPA recommends ensuring point sources comply with their permit limits. However, there is limited understanding of compliance with discharge limits, primarily due to great spatial and temporal variability in effluent nutrient concentrations as well as discharge permit limits. Further, the regulatory climate for nutrients is rapidly changing in several states of the country, with the adoption of more stringent discharge limits for wastewater treatment plants. This research presents a performance-based statistical modeling approach to understand the spatial and temporal variability of nutrient compliance (specifically nitrogen species) with changing regulations, in treated wastewaters of the United States. A hierarchical model is built using Generalized Linear Models (GLMs) and Kriging, and effluent ammonia concentrations from Discharge Monthly Report (DMR) data from more than 100 wastewater treatment plants across the US. Compliance with current ammonia permit discharge limits is seen to be determined by the flow rate and the plant's compliance history. The probability, frequency and magnitude of risk of non-compliance with ammonia discharge limits are modeled using GLMs and Extreme Value Theory (EVT). They are found to be determined by both the flow rate and compliance history, in addition to the fractional use of design capacity. Wastewater treatment plant compliance with decreasing ammonia discharge limits is assessed using regression trees. Once again, the compliance history and the flow rate are seen to affect compliance with both existing and lowered discharge limits.
Some states, such as Colorado, are considering broader regulations, for all nitrogen species, by regulating levels of Total Inorganic Nitrogen (TIN) in effluent wastewaters. A Hidden Markov Model (HMM) and multinomial logistic regression based modeling framework is presented to predict TIN concentrations in treated wastewaters, using data from an operating wastewater treatment plant in Colorado, US. Effluent TIN concentrations are found to be a function of climate variables (such as minimum air temperature and precipitation), seasonality, effluent ammonia concentrations and effluent TIN concentrations in the previous month. The performance-based models presented in this research can be beneficial to several stakeholders; they can be useful for predictive purposes or reliability analysis at either a single-plant or a multi-plant level. While they can help individual plant operators ensure compliance with changing nutrient regulations, monitoring and enforcement efforts can now be better channeled towards frequent and egregious violators. Additionally, for various proposed (lowered) discharge limits, these models can be implemented to delineate reliable sources of demand and supply for a point source-to-point source nutrient credit trading scheme
Tracking interacting targets in multi-modal sensors
Object tracking is one of the fundamental tasks in various applications such as surveillance,
sports, video conferencing and activity recognition. Factors such as occlusions,
illumination changes and limited field of observance of the sensor make tracking a challenging
task. To overcome these challenges the focus of this thesis is on using multiple
modalities such as audio and video for multi-target, multi-modal tracking. Particularly,
this thesis presents contributions to four related research topics, namely, pre-processing of
input signals to reduce noise, multi-modal tracking, simultaneous detection and tracking,
and interaction recognition.
To improve the performance of detection algorithms, especially in the presence
of noise, this thesis investigates filtering of the input data through spatio-temporal feature
analysis as well as through frequency band analysis. The pre-processed data from multiple
modalities is then fused within Particle filtering (PF). To further minimise the discrepancy
between the real and the estimated positions, we propose a strategy that associates the
hypotheses and the measurements with a real target, using a Weighted Probabilistic Data
Association (WPDA). Since the filtering involved in the detection process reduces the
available information and is inapplicable on low signal-to-noise ratio data, we investigate
simultaneous detection and tracking approaches and propose a multi-target track-before-detect
Particle filter (MT-TBD-PF). The proposed MT-TBD-PF algorithm bypasses
the detection step and performs tracking in the raw signal. Finally, we apply the proposed
multi-modal tracking to recognise interactions between targets in regions within, as well
as outside the cameras’ fields of view.
The efficiency of the proposed approaches is demonstrated on large uni-modal,
multi-modal and multi-sensor scenarios from real world detections, tracking and event
recognition datasets and through participation in evaluation campaigns
Compensation of position-related artifacts in activity recognition
This thesis investigates how placement variations of electronic devices influence the possibility of using sensors integrated in those devices for context recognition. The vast majority of context recognition research assumes well-defined, fixed sensor locations. Although this might be acceptable for some application domains (e.g. in an industrial setting), users, in general, will have a hard time coping with these limitations. If one needs to remember to carry dedicated sensors and to adjust their orientation from time to time, the activity recognition system is more distracting than helpful. How can we deal with device location and orientation changes to make context sensing mainstream? This thesis presents a systematic evaluation of device placement effects in context recognition. We first deal with detecting if a device is carried on the body or placed somewhere in the environment. If the device is placed on the body, it is useful to know on which body part. We also address how to deal with sensors changing their position and their orientation during use. For each of these topics some highlights are given in the following. Regarding environmental placement, we introduce an active sampling approach to infer symbolic object location. This approach requires only simple sensors (acceleration, sound) and no infrastructure setup. The method works for specific placements such as "on the couch", "in the desk drawer" as well as for general location classes, such as "closed wood compartment" or "open iron surface". In the experimental evaluation we reach a recognition accuracy of 90% and above over a total of over 1200 measurements from 35 specific locations (taken from 3 different rooms) and 12 abstract location classes. To derive the coarse device placement on the body, we present a method solely based on rotation and acceleration signals from the device. It works independently of the device orientation.
The on-body placement recognition rate is around 80% over 4 min. of unconstrained motion data for the worst scenario and up to 90% over a 2 min. interval for the best scenario. We use over 30 hours of motion data for the analysis. Two special issues of device placement are orientation and displacement. This thesis proposes a set of heuristics that significantly increase the robustness of motion sensor-based activity recognition with respect to sensor displacement. We show how, within certain limits and with modest quality degradation, motion sensor-based activity recognition can be implemented in a displacement-tolerant way. We evaluate our heuristics first on a set of synthetic lower arm motions which are well suited to illustrate the strengths and limits of our approach, then on an extended modes of locomotion problem (sensors on the upper leg) and finally on a set of exercises performed on various gym machines (sensors placed on the lower arm). In this example our heuristic raises the displaced recognition rate from 24% for a displaced accelerometer, which had 96% recognition when not displaced, to 82%