4 research outputs found
Structural Generative Descriptions for Temporal Data
In data mining problems the representation or description of data plays a fundamental role, since it defines the set of essential properties for the extraction and characterisation of patterns. However, for the case of temporal data, such as time series and data streams, one outstanding issue when developing mining algorithms is finding an appropriate data description or representation.
In this thesis two novel domain-independent representation frameworks for temporal data suitable for off-line and online mining tasks are formulated.
First, a domain-independent temporal data representation framework based on a novel data description strategy which combines structural and statistical pattern recognition approaches is developed. The key idea here is to move the structural pattern recognition problem to the probability domain. This framework is composed of three general tasks: a) decomposing input temporal patterns into subpatterns in time or any other transformed domain (for instance, wavelet domain); b) mapping these subpatterns into the probability domain to find attributes of elemental probability subpatterns called primitives; and c) mining input temporal patterns according to the attributes of their corresponding probability domain subpatterns. This framework is referred to as Structural Generative Descriptions (SGDs).
Two off-line and two online algorithmic instantiations of the proposed SGDs framework are then formulated: i) For the off-line case, the first instantiation is based on the use of Discrete Wavelet Transform (DWT) and Wavelet Density Estimators (WDE), while the second algorithm includes DWT and Finite Gaussian Mixtures. ii) For the online case, the first instantiation relies on an online implementation of DWT and a recursive version of WDE (RWDE), whereas the second algorithm is based on a multi-resolution exponentially weighted moving average filter and RWDE. The empirical evaluation of proposed SGDs-based algorithms is performed in the context of time series classification, for off-line algorithms, and in the context of change detection and clustering, for online algorithms. For this purpose, synthetic and publicly available real-world data are used.
Additionally, a novel framework for multidimensional data stream evolution diagnosis incorporating RWDE into the context of Velocity Density Estimation (VDE) is formulated. Changes in streaming data and changes in their correlation structure are characterised by means of local and global evolution coefficients as well as by means of recursive correlation coefficients. The proposed VDE framework is evaluated using temperature data from the UK and air pollution data from Hong Kong.Open Acces
An Advanced A-V- Player to Support Scalable Personalised Interaction with Multi-Stream Video Content
PhDCurrent Audio-Video (A-V) players are limited to pausing, resuming, selecting and
viewing a single video stream of a live broadcast event that is orchestrated by a
professional director. The main objective of this research is to investigate how to create a
new custom-built interactive A V player that enables viewers to personalise their own
orchestrated views of live events from multiple simultaneous camera streams, via
interacting with tracked moving objects, being able to zoom in and out of targeted
objects, and being able to switch views based upon detected incidents in specific camera
views. This involves research and development of a personalisation framework to create
and maintain user profiles that are acquired implicitly and explicitly and modelling how
this framework supports an evaluation of the effectiveness and usability of
personalisation.
Personalisation is considered from both an application oriented and a quality supervision
oriented perspective within the proposed framework. Personalisation models can be
individually or collaboratively linked with specific personalisation usage scenarios. The
quality of different personalised interaction in terms of explicit evaluative metrics such as
scalability and consistency can be monitored and measured using specific evaluation
mechanisms.European Union's
Seventh Framework Programme ([FP7/2007-2013]) under grant agreement No. ICT-
215248 and from Queen Mary University of London
On String Classification in Data Streams
String data has recently become important because of its use in a number of applications such as computational and molecular biology, protein analysis, and market basket data. In many cases, these strings contain a wide variety of substructures which may have physical significance for that application. For example, such substructures could represent important fragments of a DNA string or an interesting portion of a fraudulent transaction. In such a case, it is desirable to determine the identity, location, and extent of that substructure in the data. This is a much more difficult generalization of the classification problem, since the latter problem labels entire strings rather than deal with the more complex task of determining string fragments with a particular kind of behavior. The problem becomes even more complicated when different kinds of substrings show complicated nesting patterns. Therefore, we define a somewhat different problem which we refer to as the generalized classification problem. We propose a scalable approach based on hidden markov models for this problem. We show how to implement the generalized string classification procedure for very large data bases and data streams. We present experimental results over a number of large data sets and data streams
Research Track Paper ABSTRACT On String Classification in Data Streams
String data has recently become important because of its use in a number of applications such as computational and molecular biology, protein analysis, and market basket data. In many cases, these strings contain a wide variety of substructures which may have physical significance for that application. For example, such substructures could represent important fragments of a DNA string or an interesting portion of a fraudulent transaction. In such a case, it is desirable to determine the identity, location, and extent of that substructure in the data. This is a much more difficult generalization of the classification problem, since the latter problem labels entire strings rather than deal with the more complex task of determining string fragments with a particular kind of behavior. The problem becomes even more complicated when different kinds of substrings show complicated nesting patterns. Therefore, we define a somewhat different problem which we refer to as the generalized classification problem. We propose a scalable approach based on hidden markov models for this problem. We show how to implement the generalized string classification procedure for very large data bases and data streams. We present experimental results over a number of large data sets and data streams