106 research outputs found

    Data mining by means of generalized patterns

    Get PDF
    The thesis is mainly focused on the study and the application of pattern discovery algorithms that aggregate database knowledge to discover and exploit valuable correlations, hidden in the analyzed data, at different abstraction levels. The aim of the research effort described in this work is two-fold: the discovery of associations, in the form of generalized patterns, from large data collections and the inference of semantic models, i.e., taxonomies and ontologies, suitable for driving the mining proces

    Making machine intelligence less scary for criminal analysts: reflections on designing a visual comparative case analysis tool

    Get PDF
    A fundamental task in Criminal Intelligence Analysis is to analyze the similarity of crime cases, called CCA, to identify common crime patterns and to reason about unsolved crimes. Typically, the data is complex and high dimensional and the use of complex analytical processes would be appropriate. State-of-the-art CCA tools lack flexibility in interactive data exploration and fall short of computational transparency in terms of revealing alternative methods and results. In this paper, we report on the design of the Concept Explorer, a flexible, transparent and interactive CCA system. During this design process, we observed that most criminal analysts are not able to understand the underlying complex technical processes, which decrease the users' trust in the results and hence a reluctance to use the tool}. Our CCA solution implements a computational pipeline together with a visual platform that allows the analysts to interact with each stage of the analysis process and to validate the result. The proposed Visual Analytics workflow iteratively supports the interpretation of the results of clustering with the respective feature relations, the development of alternative models, as well as cluster verification. The visualizations offer an understandable and usable way for the analyst to provide feedback to the system and to observe the impact of their interactions. Expert feedback confirmed that our user-centred design decisions made this computational complexity less scary to criminal analysts

    A survey of temporal knowledge discovery paradigms and methods

    Get PDF
    With the increase in the size of data sets, data mining has recently become an important research topic and is receiving substantial interest from both academia and industry. At the same time, interest in temporal databases has been increasing and a growing number of both prototype and implemented systems are using an enhanced temporal understanding to explain aspects of behavior associated with the implicit time-varying nature of the universe. This paper investigates the confluence of these two areas, surveys the work to date, and explores the issues involved and the outstanding problems in temporal data mining

    A COMPREHENSIVE GEOSPATIAL KNOWLEDGE DISCOVERY FRAMEWORK FOR SPATIAL ASSOCIATION RULE MINING

    Get PDF
    Continuous advances in modern data collection techniques help spatial scientists gain access to massive and high-resolution spatial and spatio-temporal data. Thus there is an urgent need to develop effective and efficient methods seeking to find unknown and useful information embedded in big-data datasets of unprecedentedly large size (e.g., millions of observations), high dimensionality (e.g., hundreds of variables), and complexity (e.g., heterogeneous data sources, space–time dynamics, multivariate connections, explicit and implicit spatial relations and interactions). Responding to this line of development, this research focuses on the utilization of the association rule (AR) mining technique for a geospatial knowledge discovery process. Prior attempts have sidestepped the complexity of the spatial dependence structure embedded in the studied phenomenon. Thus, adopting association rule mining in spatial analysis is rather problematic. Interestingly, a very similar predicament afflicts spatial regression analysis with a spatial weight matrix that would be assigned a priori, without validation on the specific domain of application. Besides, a dependable geospatial knowledge discovery process necessitates algorithms supporting automatic and robust but accurate procedures for the evaluation of mined results. Surprisingly, this has received little attention in the context of spatial association rule mining. To remedy the existing deficiencies mentioned above, the foremost goal for this research is to construct a comprehensive geospatial knowledge discovery framework using spatial association rule mining for the detection of spatial patterns embedded in geospatial databases and to demonstrate its application within the domain of crime analysis. It is the first attempt at delivering a complete geo-spatial knowledge discovery framework using spatial association rule mining

    From Pattern Discovery to Pattern Interpretation of Semantically-Enriched Trajectory Data

    Get PDF
    The widespread use of positioning technologies ranging from GSM and GPS to WiFi devices tend to produce large-scale datasets of trajectories, which represent the movement of travelling entities. Several application domains, such as recreational area management, may benefit from analysing such datasets. However, analysis results only become truly useful and meaningful for the end user when the intrinsically complex nature of the movement data in terms of context is taken into account during the knowledge discovery process. For this reason we propose a pattern interpretation framework that consists of three main steps, namely, pattern discovery, semantic annotation and pattern analysis. The framework supports the understanding of movement patterns that were extracted using some trajectory mining algorithm. In order to demonstrate the feasibility and effectiveness of the framework, we have specifically applied it for understanding moving flock patterns in pedestrian movement. For the pattern discovery step, we have formally defined the concept of moving flock, distinguishing it from stationary flock, and developed a detection algorithm for it. A set of guidelines for setting the parameters of the algorithm is provided and a specific technique is implemented for the radius parameter.As for the semantic annotation step, we have proposed a guideline for selecting appropriate attributes for semantic enrichment of individual entities and of moving flocks. Two levels of annotation, which are at individual and pattern level, were also described. Finally, for the pattern interpretation step, we have combined the results obtained using hierarchichal clustering and decision tree classification in order to analyse the attributes of flock members and of the flocks, and the flocks themselves. The entire framework was tested on the Dwingelderveld National Park (DNP) dataset and the Delft dataset, both of which are pedestrian datasets based in the Netherlands. The DNP dataset contains records of observations on the movement of visitors in the park while the Delft dataset describes movement of the pedestrians in the city. As a result, some forms of interactions, such as certain groups of visitors following the most popular path in the park, were inferred. Furthermore, some flocks were linked with specific attractions of the park

    Ontology Learning Using Formal Concept Analysis and WordNet

    Full text link
    Manual ontology construction takes time, resources, and domain specialists. Supporting a component of this process for automation or semi-automation would be good. This project and dissertation provide a Formal Concept Analysis and WordNet framework for learning concept hierarchies from free texts. The process has steps. First, the document is Part-Of-Speech labeled, then parsed to produce sentence parse trees. Verb/noun dependencies are derived from parse trees next. After lemmatizing, pruning, and filtering the word pairings, the formal context is created. The formal context may contain some erroneous and uninteresting pairs because the parser output may be erroneous, not all derived pairs are interesting, and it may be large due to constructing it from a large free text corpus. Deriving lattice from the formal context may take longer, depending on the size and complexity of the data. Thus, decreasing formal context may eliminate erroneous and uninteresting pairs and speed up idea lattice derivation. WordNet-based and Frequency-based approaches are tested. Finally, we compute formal idea lattice and create a classical concept hierarchy. The reduced concept lattice is compared to the original to evaluate the outcomes. Despite several system constraints and component discrepancies that may prevent logical conclusion, the following data imply idea hierarchies in this project and dissertation are promising. First, the reduced idea lattice and original concept have commonalities. Second, alternative language or statistical methods can reduce formal context size. Finally, WordNet-based and Frequency-based approaches reduce formal context differently, and the order of applying them is examined to reduce context efficiently

    Making machine intelligence less scary for criminal analysts: reflections on designing a visual comparative case analysis tool

    Get PDF
    A fundamental task in Criminal Intelligence Analysis is to analyze the similarity of crime cases, called CCA, to identify common crime patterns and to reason about unsolved crimes. Typically, the data is complex and high dimensional and the use of complex analytical processes would be appropriate. State-of-the-art CCA tools lack flexibility in interactive data exploration and fall short of computational transparency in terms of revealing alternative methods and results. In this paper, we report on the design of the Concept Explorer, a flexible, transparent and interactive CCA system. During this design process, we observed that most criminal analysts are not able to understand the underlying complex technical processes, which decrease the users' trust in the results and hence a reluctance to use the tool}. Our CCA solution implements a computational pipeline together with a visual platform that allows the analysts to interact with each stage of the analysis process and to validate the result. The proposed Visual Analytics workflow iteratively supports the interpretation of the results of clustering with the respective feature relations, the development of alternative models, as well as cluster verification. The visualizations offer an understandable and usable way for the analyst to provide feedback to the system and to observe the impact of their interactions. Expert feedback confirmed that our user-centred design decisions made this computational complexity less scary to criminal analysts

    Mining subjectively interesting patterns in rich data

    Get PDF

    EFFECT OF COGNITIVE BIASES ON HUMAN UNDERSTANDING OF RULE-BASED MACHINE LEARNING MODELS

    Get PDF
    PhDThis thesis investigates to what extent do cognitive biases a ect human understanding of interpretable machine learning models, in particular of rules discovered from data. Twenty cognitive biases (illusions, e ects) are analysed in detail, including identi cation of possibly e ective debiasing techniques that can be adopted by designers of machine learning algorithms and software. This qualitative research is complemented by multiple experiments aimed to verify, whether, and to what extent, do selected cognitive biases in uence human understanding of actual rule learning results. Two experiments were performed, one focused on eliciting plausibility judgments for pairs of inductively learned rules, second experiment involved replication of the Linda experiment with crowdsourcing and two of its modi cations. Altogether nearly 3.000 human judgments were collected. We obtained empirical evidence for the insensitivity to sample size e ect. There is also limited evidence for the disjunction fallacy, misunderstanding of and , weak evidence e ect and availability heuristic. While there seems no universal approach for eliminating all the identi ed cognitive biases, it follows from our analysis that the e ect of many biases can be ameliorated by making rule-based models more concise. To this end, in the second part of thesis we propose a novel machine learning framework which postprocesses rules on the output of the seminal association rule classi cation algorithm CBA [Liu et al, 1998]. The framework uses original undiscretized numerical attributes to optimize the discovered association rules, re ning the boundaries of literals in the antecedent of the rules produced by CBA. Some rules as well as literals from the rules can consequently be removed, which makes the resulting classi er smaller. Benchmark of our approach on 22 UCI datasets shows average 53% decrease in the total size of the model as measured by the total number of conditions in all rules. Model accuracy remains on the same level as for CBA

    Event detection in high throughput social media

    Get PDF
    • …
    corecore