228,130 research outputs found
Structural Generative Descriptions for Temporal Data
In data mining problems the representation or description of data plays a fundamental role, since it defines the set of essential properties for the extraction and characterisation of patterns. However, for the case of temporal data, such as time series and data streams, one outstanding issue when developing mining algorithms is finding an appropriate data description or representation.
In this thesis two novel domain-independent representation frameworks for temporal data suitable for off-line and online mining tasks are formulated.
First, a domain-independent temporal data representation framework based on a novel data description strategy which combines structural and statistical pattern recognition approaches is developed. The key idea here is to move the structural pattern recognition problem to the probability domain. This framework is composed of three general tasks: a) decomposing input temporal patterns into subpatterns in time or any other transformed domain (for instance, wavelet domain); b) mapping these subpatterns into the probability domain to find attributes of elemental probability subpatterns called primitives; and c) mining input temporal patterns according to the attributes of their corresponding probability domain subpatterns. This framework is referred to as Structural Generative Descriptions (SGDs).
Two off-line and two online algorithmic instantiations of the proposed SGDs framework are then formulated: i) For the off-line case, the first instantiation is based on the use of Discrete Wavelet Transform (DWT) and Wavelet Density Estimators (WDE), while the second algorithm includes DWT and Finite Gaussian Mixtures. ii) For the online case, the first instantiation relies on an online implementation of DWT and a recursive version of WDE (RWDE), whereas the second algorithm is based on a multi-resolution exponentially weighted moving average filter and RWDE. The empirical evaluation of proposed SGDs-based algorithms is performed in the context of time series classification, for off-line algorithms, and in the context of change detection and clustering, for online algorithms. For this purpose, synthetic and publicly available real-world data are used.
Additionally, a novel framework for multidimensional data stream evolution diagnosis incorporating RWDE into the context of Velocity Density Estimation (VDE) is formulated. Changes in streaming data and changes in their correlation structure are characterised by means of local and global evolution coefficients as well as by means of recursive correlation coefficients. The proposed VDE framework is evaluated using temperature data from the UK and air pollution data from Hong Kong.Open Acces
Clustering Memes in Social Media
The increasing pervasiveness of social media creates new opportunities to
study human social behavior, while challenging our capability to analyze their
massive data streams. One of the emerging tasks is to distinguish between
different kinds of activities, for example engineered misinformation campaigns
versus spontaneous communication. Such detection problems require a formal
definition of meme, or unit of information that can spread from person to
person through the social network. Once a meme is identified, supervised
learning methods can be applied to classify different types of communication.
The appropriate granularity of a meme, however, is hardly captured from
existing entities such as tags and keywords. Here we present a framework for
the novel task of detecting memes by clustering messages from large streams of
social data. We evaluate various similarity measures that leverage content,
metadata, network features, and their combinations. We also explore the idea of
pre-clustering on the basis of existing entities. A systematic evaluation is
carried out using a manually curated dataset as ground truth. Our analysis
shows that pre-clustering and a combination of heterogeneous features yield the
best trade-off between number of clusters and their quality, demonstrating that
a simple combination based on pairwise maximization of similarity is as
effective as a non-trivial optimization of parameters. Our approach is fully
automatic, unsupervised, and scalable for real-time detection of memes in
streaming data.Comment: Proceedings of the 2013 IEEE/ACM International Conference on Advances
in Social Networks Analysis and Mining (ASONAM'13), 201
The future of technology enhanced active learning – a roadmap
The notion of active learning refers to the active involvement of learner in the learning process,
capturing ideas of learning-by-doing and the fact that active participation and knowledge construction leads to deeper and more sustained learning. Interactivity, in particular learnercontent interaction, is a central aspect of technology-enhanced active learning. In this roadmap,
the pedagogical background is discussed, the essential dimensions of technology-enhanced active learning systems are outlined and the factors that are expected to influence these systems currently and in the future are identified. A central aim is to address this promising field from a
best practices perspective, clarifying central issues and formulating an agenda for future developments in the form of a roadmap
The contribution of data mining to information science
The information explosion is a serious challenge for current information institutions. On the other hand, data mining, which is the search for valuable information in large volumes of data, is one of the solutions to face this challenge. In the past several years, data mining has made a significant contribution to the field of information science. This paper examines the impact of data mining by reviewing existing applications, including personalized environments, electronic commerce, and search engines. For these three types of application, how data mining can enhance their functions is discussed. The reader of this paper is expected to get an overview of the state of the art research associated with these applications. Furthermore, we identify the limitations of current work and raise several directions for future research
A recommender system for process discovery
Over the last decade, several algorithms for process discovery and process conformance have been proposed. Still, it is well-accepted that there is no dominant algorithm in any of these two disciplines, and then it is often difficult to apply them successfully. Most of these algorithms need a close-to expert knowledge in order to be applied satisfactorily. In this paper, we present a recommender system that uses portfolio-based algorithm selection strategies to face the following problems: to find the best discovery algorithm for the data at hand, and to allow bridging the gap between general users and process mining algorithms. Experiments performed with the developed tool witness the usefulness of the approach for a variety of instances.Peer ReviewedPostprint (author’s final draft
Overcoming data scarcity of Twitter: using tweets as bootstrap with application to autism-related topic content analysis
Notwithstanding recent work which has demonstrated the potential of using
Twitter messages for content-specific data mining and analysis, the depth of
such analysis is inherently limited by the scarcity of data imposed by the 140
character tweet limit. In this paper we describe a novel approach for targeted
knowledge exploration which uses tweet content analysis as a preliminary step.
This step is used to bootstrap more sophisticated data collection from directly
related but much richer content sources. In particular we demonstrate that
valuable information can be collected by following URLs included in tweets. We
automatically extract content from the corresponding web pages and treating
each web page as a document linked to the original tweet show how a temporal
topic model based on a hierarchical Dirichlet process can be used to track the
evolution of a complex topic structure of a Twitter community. Using
autism-related tweets we demonstrate that our method is capable of capturing a
much more meaningful picture of information exchange than user-chosen hashtags.Comment: IEEE/ACM International Conference on Advances in Social Networks
Analysis and Mining, 201
- …