106 research outputs found
Data mining by means of generalized patterns
The thesis is mainly focused on the study and the application of pattern discovery algorithms that aggregate database knowledge to discover and exploit valuable correlations, hidden in the analyzed data, at different abstraction levels. The aim of the research effort described in this work is two-fold: the discovery of associations, in the form of generalized patterns, from large data collections and the inference of semantic models, i.e., taxonomies and ontologies, suitable for driving the mining proces
Making machine intelligence less scary for criminal analysts: reflections on designing a visual comparative case analysis tool
A fundamental task in Criminal Intelligence Analysis is to analyze the similarity of crime cases, called CCA, to identify common crime patterns and to reason about unsolved crimes. Typically, the data is complex and high-dimensional and the use of complex analytical processes would be appropriate. State-of-the-art CCA tools lack flexibility in interactive data exploration and fall short of computational transparency in terms of revealing alternative methods and results. In this paper, we report on the design of the Concept Explorer, a flexible, transparent and interactive CCA system. During this design process, we observed that most criminal analysts are not able to understand the underlying complex technical processes, which decreases the users' trust in the results and leads to a reluctance to use the tool. Our CCA solution implements a computational pipeline together with a visual platform that allows the analysts to interact with each stage of the analysis process and to validate the result. The proposed Visual Analytics workflow iteratively supports the interpretation of the results of clustering with the respective feature relations, the development of alternative models, as well as cluster verification. The visualizations offer an understandable and usable way for the analyst to provide feedback to the system and to observe the impact of their interactions. Expert feedback confirmed that our user-centred design decisions made this computational complexity less scary to criminal analysts
A survey of temporal knowledge discovery paradigms and methods
With the increase in the size of data sets, data mining has recently become an important research topic and is receiving substantial interest from both academia and industry. At the same time, interest in temporal databases has been increasing and a growing number of both prototype and implemented systems are using an enhanced temporal understanding to explain aspects of behavior associated with the implicit time-varying nature of the universe. This paper investigates the confluence of these two areas, surveys the work to date, and explores the issues involved and the outstanding problems in temporal data mining
A COMPREHENSIVE GEOSPATIAL KNOWLEDGE DISCOVERY FRAMEWORK FOR SPATIAL ASSOCIATION RULE MINING
Continuous advances in modern data collection techniques help spatial scientists gain access to massive and high-resolution spatial and spatio-temporal data. Thus there is an urgent need to develop effective and efficient methods seeking to find unknown and useful information embedded in big-data datasets of unprecedentedly large size (e.g., millions of observations), high dimensionality (e.g., hundreds of variables), and complexity (e.g., heterogeneous data sources, space–time dynamics, multivariate connections, explicit and implicit spatial relations and interactions). Responding to this line of development, this research focuses on the utilization of the association rule (AR) mining technique for a geospatial knowledge discovery process.
Prior attempts have sidestepped the complexity of the spatial dependence structure embedded in the studied phenomenon. Thus, adopting association rule mining in spatial analysis is rather problematic. Interestingly, a very similar predicament afflicts spatial regression analysis with a spatial weight matrix that would be assigned a priori, without validation on the specific domain of application. Besides, a dependable geospatial knowledge discovery process necessitates algorithms supporting automatic and robust but accurate procedures for the evaluation of mined results. Surprisingly, this has received little attention in the context of spatial association rule mining.
To remedy the deficiencies mentioned above, the foremost goal of this research is to construct a comprehensive geospatial knowledge discovery framework using spatial association rule mining for the detection of spatial patterns embedded in geospatial databases and to demonstrate its application within the domain of crime analysis. It is the first attempt at delivering a complete geospatial knowledge discovery framework using spatial association rule mining
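For readers unfamiliar with association rule mining, the two standard measures used to rank a rule, support and confidence, can be sketched as follows. This is only an illustration of the general technique, not the framework proposed in the thesis; the transactions and predicate names (`near_bar`, `low_lighting`, and so on) are hypothetical, and the sketch ignores the spatial dependence structure that the abstract identifies as the hard part.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy data: each set lists the predicates that hold for one
# location. The names are illustrative and do not come from the thesis.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"near_bar", "low_lighting", "burglary"}),
    frozenset({"near_bar", "burglary"}),
    frozenset({"near_bar", "low_lighting"}),
    frozenset({"near_school"}),
]


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """support(antecedent and consequent together) / support(antecedent)."""
    base = support(antecedent, transactions)
    both = support(set(antecedent) | set(consequent), transactions)
    return both / base if base > 0 else 0.0


# Rule under test: near_bar -> burglary
rule_support = support({"near_bar", "burglary"}, TRANSACTIONS)
rule_confidence = confidence({"near_bar"}, {"burglary"}, TRANSACTIONS)
```

Here the rule holds in two of four transactions, and in two of the three transactions that contain the antecedent.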
From Pattern Discovery to Pattern Interpretation of Semantically-Enriched Trajectory Data
The widespread use of positioning technologies ranging from GSM and GPS to WiFi devices tends to produce large-scale datasets of trajectories, which represent the movement of travelling entities. Several application domains, such as recreational area management, may benefit from analysing such datasets. However, analysis results only become truly useful and meaningful for the end user when the intrinsically complex nature of the movement data in terms of context is taken into account during the knowledge discovery process. For this reason we propose a pattern interpretation framework that consists of three main steps, namely, pattern discovery, semantic annotation and pattern analysis. The framework supports the understanding of movement patterns that were extracted using some trajectory mining algorithm.
In order to demonstrate the feasibility and effectiveness of the framework, we have specifically applied it to understand moving flock patterns in pedestrian movement. For the pattern discovery step, we have formally defined the concept of moving flock, distinguishing it from stationary flock, and developed a detection algorithm for it. A set of guidelines for setting the parameters of the algorithm is provided and a specific technique is implemented for the radius parameter. As for the semantic annotation step, we have proposed a guideline for selecting appropriate attributes for semantic enrichment of individual entities and of moving flocks. Two levels of annotation, at individual and at pattern level, were also described. Finally, for the pattern interpretation step, we have combined the results obtained using hierarchical clustering and decision tree classification in order to analyse the attributes of flock members and of the flocks, and the flocks themselves.
The entire framework was tested on the Dwingelderveld National Park (DNP) dataset and the Delft dataset, both of which are pedestrian datasets based in the Netherlands. The DNP dataset contains records of observations on the movement of visitors in the park while the Delft dataset describes movement of the pedestrians in the city. As a result, some forms of interactions, such as certain groups of visitors following the most popular path in the park, were inferred. Furthermore, some flocks were linked with specific attractions of the park
Ontology Learning Using Formal Concept Analysis and WordNet
Manual ontology construction takes time, resources, and domain specialists.
Automating or semi-automating part of this process would therefore be
valuable. This project and dissertation provide a framework based on Formal
Concept Analysis and WordNet for learning concept hierarchies from free text.
The process has several steps. First, the document is Part-Of-Speech tagged,
then parsed to produce sentence parse trees. Verb/noun dependencies are then
derived from the parse trees. After lemmatizing, pruning, and filtering the
word pairs, the formal context is created. The formal context may contain
erroneous and uninteresting pairs because the parser output may be erroneous
and not all derived pairs are interesting, and it may be large because it is
built from a large free-text corpus. Deriving the concept lattice from the
formal context may take a long time, depending on the size and complexity of
the data. Thus, reducing the formal context may eliminate erroneous and
uninteresting pairs and speed up concept lattice derivation. WordNet-based and
frequency-based reduction approaches are tested. Finally, we compute the formal
concept lattice and create a classical concept hierarchy. The reduced concept
lattice is compared to the original to evaluate the outcomes. Despite several
system constraints and component discrepancies that may prevent a definitive
conclusion, the following findings suggest that the concept hierarchies
produced in this project and dissertation are promising. First, the reduced
concept lattice and the original concept lattice have commonalities. Second,
alternative linguistic or statistical methods can reduce the formal context
size. Finally, WordNet-based and frequency-based approaches reduce the formal
context differently, and the order of applying them is examined to reduce the
context efficiently
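The last two steps of the pipeline above, building a formal context from noun/verb pairs and deriving its formal concepts, can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the pairs are hypothetical stand-ins for parser output, nouns are treated as objects and verbs as attributes, and concepts are found by brute-force closure of every object subset, which is exponential and only workable for tiny contexts.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# Hypothetical (noun, verb) pairs standing in for lemmatized parser output.
PAIRS: List[Tuple[str, str]] = [
    ("hotel", "book"),
    ("apartment", "book"),
    ("apartment", "rent"),
    ("car", "rent"),
    ("car", "drive"),
]


def build_context(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Formal context as a mapping: object (noun) -> attributes (verbs)."""
    context: Dict[str, Set[str]] = {}
    for noun, verb in pairs:
        context.setdefault(noun, set()).add(verb)
    return context


def common_attributes(objects: Iterable[str], context: Dict[str, Set[str]],
                      all_attrs: Set[str]) -> Set[str]:
    """Attributes shared by all given objects (every attribute if none given)."""
    shared = set(all_attrs)
    for obj in objects:
        shared &= context[obj]
    return shared


def objects_having(attrs: Set[str], context: Dict[str, Set[str]]) -> Set[str]:
    """Objects that carry every attribute in `attrs`."""
    return {obj for obj, own in context.items() if attrs <= own}


def formal_concepts(
    context: Dict[str, Set[str]],
) -> Set[Tuple[FrozenSet[str], FrozenSet[str]]]:
    """All (extent, intent) pairs, found by closing every subset of objects."""
    all_attrs: Set[str] = set().union(*context.values())
    objs = sorted(context)
    concepts = set()
    for size in range(len(objs) + 1):
        for subset in combinations(objs, size):
            intent = common_attributes(subset, context, all_attrs)
            extent = objects_having(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


context = build_context(PAIRS)
concepts = formal_concepts(context)
```

Ordering the resulting concepts by extent inclusion gives the concept lattice; for example, the concept with intent `{"book"}` groups `hotel` and `apartment` as things that can be booked. Removing pairs from `PAIRS` before calling `build_context` is where a context-reduction step would plug in.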
EFFECT OF COGNITIVE BIASES ON HUMAN UNDERSTANDING OF RULE-BASED MACHINE LEARNING MODELS
PhD. This thesis investigates to what extent cognitive biases affect human understanding of
interpretable machine learning models, in particular of rules discovered from data. Twenty
cognitive biases (illusions, effects) are analysed in detail, including identification of possibly
effective debiasing techniques that can be adopted by designers of machine learning algorithms
and software. This qualitative research is complemented by multiple experiments
aimed at verifying whether, and to what extent, selected cognitive biases influence human
understanding of actual rule learning results. Two experiments were performed: one
focused on eliciting plausibility judgments for pairs of inductively learned rules; the second
involved replication of the Linda experiment with crowdsourcing and two of
its modifications. Altogether nearly 3,000 human judgments were collected. We obtained
empirical evidence for the insensitivity to sample size effect. There is also limited evidence
for the disjunction fallacy, misunderstanding of "and", weak evidence effect and availability
heuristic.
While there seems to be no universal approach for eliminating all the identified cognitive biases,
it follows from our analysis that the effect of many biases can be ameliorated by
making rule-based models more concise. To this end, in the second part of the thesis we propose
a novel machine learning framework which postprocesses rules on the output of the
seminal association rule classification algorithm CBA [Liu et al., 1998]. The framework
uses original undiscretized numerical attributes to optimize the discovered association
rules, refining the boundaries of literals in the antecedent of the rules produced by CBA.
Some rules as well as literals from the rules can consequently be removed, which makes the
resulting classifier smaller. Benchmark of our approach on 22 UCI datasets shows an average
53% decrease in the total size of the model as measured by the total number of conditions
in all rules. Model accuracy remains on the same level as for CBA
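The size metric used in the benchmark above, the total number of conditions across all rules, and the relative decrease between two rule lists can be sketched as follows. The `Rule` class and the example rules are hypothetical and chosen only to make the arithmetic visible; they are not taken from the thesis or from any CBA implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Rule:
    """A classification rule: antecedent conditions (literals) and a class label."""
    conditions: Tuple[str, ...]
    label: str


def model_size(rules: List[Rule]) -> int:
    """Total number of conditions over all rules in the model."""
    return sum(len(rule.conditions) for rule in rules)


def relative_decrease(before: List[Rule], after: List[Rule]) -> float:
    """Fraction by which the model size shrank (0.0 for an empty original)."""
    original = model_size(before)
    if original == 0:
        return 0.0
    return (original - model_size(after)) / original


# Hypothetical rule lists before and after postprocessing.
before = [
    Rule(("age in [20;30)", "income in [1000;2000)"), "yes"),
    Rule(("age in [30;40)",), "no"),
    Rule(("income in [2000;3000)",), "no"),
]
after = [
    Rule(("age in [20;32)",), "yes"),
    Rule(("age in [32;40)",), "no"),
]

decrease = relative_decrease(before, after)
```

In this toy case one literal and one whole rule are dropped, so the model shrinks from four conditions to two.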