10 research outputs found

    Sequential Patterns Post-processing for Structural Relation Patterns Mining

    Sequential patterns mining is an important data-mining technique used to identify frequently observed sequential occurrences of items across ordered transactions over time. It has been extensively studied in the literature, and a diversity of algorithms exists. However, more complex structural patterns are often hidden behind sequences. This article begins by introducing a model for the representation of sequential patterns, the Sequential Patterns Graph, which motivates the search for new structural relation patterns. An integrative framework for the discovery of these patterns, Post-sequential Patterns Mining, is then described, which underpins the post-processing of sequential patterns. A corresponding data-mining method based on sequential patterns post-processing is proposed and shown to be effective in the search for concurrent patterns. Experiments conducted on three component algorithms demonstrate that sequential patterns-based concurrent patterns mining provides an efficient method for structural knowledge discovery.
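As a toy illustration of the underlying task (not the article's algorithm), frequent subsequences can be found by a naive generate-and-test pass over a small sequence database; `frequent_sequences` and all names here are illustrative:

```python
from itertools import product

def is_subsequence(pattern, sequence):
    """True if `pattern` occurs in order (not necessarily contiguously) in `sequence`."""
    it = iter(sequence)
    return all(item in it for item in pattern)

def frequent_sequences(db, min_support, max_len=3):
    """Naive generate-and-test miner: enumerate ordered item tuples up to
    max_len and keep those whose support meets the threshold."""
    items = sorted({x for seq in db for x in seq})
    frequent = {}
    for k in range(1, max_len + 1):
        for cand in product(items, repeat=k):
            support = sum(is_subsequence(cand, seq) for seq in db)
            if support >= min_support:
                frequent[cand] = support
    return frequent

# Four ordered transactions over items a-d.
db = [list("abcd"), list("acd"), list("abd"), list("bcd")]
freq = frequent_sequences(db, min_support=3, max_len=2)
# e.g. ('a', 'd') is frequent: 'a' precedes 'd' in three of the four sequences.
```

Real miners such as PrefixSpan avoid this exponential enumeration by growing only patterns whose prefixes are already frequent.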

    PSEUDO PROJECTION BASED APPROACH TO DISCOVER TIME INTERVAL SEQUENTIAL PATTERNS

    ABSTRACT

    Mining Traversal Patterns from Weighted Traversals and Graph

    μ‹€μ„Έκ³„μ˜ λ§Žμ€ λ¬Έμ œλ“€μ€ κ·Έλž˜ν”„μ™€ κ·Έ κ·Έλž˜ν”„λ₯Ό μˆœνšŒν•˜λŠ” νŠΈλžœμž­μ…˜μœΌλ‘œ λͺ¨λΈλ§λ  수 μžˆλ‹€. 예λ₯Ό λ“€λ©΄, μ›Ή νŽ˜μ΄μ§€μ˜ μ—°κ²°κ΅¬μ‘°λŠ” κ·Έλž˜ν”„λ‘œ ν‘œν˜„λ  수 있고, μ‚¬μš©μžμ˜ μ›Ή νŽ˜μ΄μ§€ λ°©λ¬Έκ²½λ‘œλŠ” κ·Έ κ·Έλž˜ν”„λ₯Ό μˆœνšŒν•˜λŠ” νŠΈλžœμž­μ…˜μœΌλ‘œ λͺ¨λΈλ§λ  수 μžˆλ‹€. 이와 같이 κ·Έλž˜ν”„λ₯Ό μˆœνšŒν•˜λŠ” νŠΈλžœμž­μ…˜μœΌλ‘œλΆ€ν„° μ€‘μš”ν•˜κ³  κ°€μΉ˜ μžˆλŠ” νŒ¨ν„΄μ„ μ°Ύμ•„λ‚΄λŠ” 것은 의미 μžˆλŠ” 일이닀. μ΄λŸ¬ν•œ νŒ¨ν„΄μ„ μ°ΎκΈ° μœ„ν•œ μ§€κΈˆκΉŒμ§€μ˜ μ—°κ΅¬μ—μ„œλŠ” μˆœνšŒλ‚˜ κ·Έλž˜ν”„μ˜ κ°€μ€‘μΉ˜λ₯Ό κ³ λ €ν•˜μ§€ μ•Šκ³  λ‹¨μˆœνžˆ λΉˆλ°œν•˜λŠ” νŒ¨ν„΄λ§Œμ„ μ°ΎλŠ” μ•Œκ³ λ¦¬μ¦˜μ„ μ œμ•ˆν•˜μ˜€λ‹€. μ΄λŸ¬ν•œ μ•Œκ³ λ¦¬μ¦˜μ˜ ν•œκ³„λŠ” 보닀 μ‹ λ’°μ„± 있고 μ •ν™•ν•œ νŒ¨ν„΄μ„ νƒμ‚¬ν•˜λŠ” 데 어렀움이 μžˆλ‹€λŠ” 것이닀. λ³Έ λ…Όλ¬Έμ—μ„œλŠ” μˆœνšŒλ‚˜ κ·Έλž˜ν”„μ˜ 정점에 λΆ€μ—¬λœ κ°€μ€‘μΉ˜λ₯Ό κ³ λ €ν•˜μ—¬ νŒ¨ν„΄μ„ νƒμ‚¬ν•˜λŠ” 두 가지 방법듀을 μ œμ•ˆν•œλ‹€. 첫 번째 방법은 κ·Έλž˜ν”„λ₯Ό μˆœνšŒν•˜λŠ” 정보에 κ°€μ€‘μΉ˜κ°€ μ‘΄μž¬ν•˜λŠ” κ²½μš°μ— 빈발 순회 νŒ¨ν„΄μ„ νƒμ‚¬ν•˜λŠ” 것이닀. κ·Έλž˜ν”„ μˆœνšŒμ— 뢀여될 수 μžˆλŠ” κ°€μ€‘μΉ˜λ‘œλŠ” 두 λ„μ‹œκ°„μ˜ 이동 μ‹œκ°„μ΄λ‚˜ μ›Ή μ‚¬μ΄νŠΈλ₯Ό λ°©λ¬Έν•  λ•Œ ν•œ νŽ˜μ΄μ§€μ—μ„œ λ‹€λ₯Έ νŽ˜μ΄μ§€λ‘œ μ΄λ™ν•˜λŠ” μ‹œκ°„ 등이 될 수 μžˆλ‹€. λ³Έ λ…Όλ¬Έμ—μ„œλŠ” μ’€ 더 μ •ν™•ν•œ 순회 νŒ¨ν„΄μ„ λ§ˆμ΄λ‹ν•˜κΈ° μœ„ν•΄ ν†΅κ³„ν•™μ˜ μ‹ λ’° ꡬ간을 μ΄μš©ν•œλ‹€. 즉, 전체 순회의 각 간선에 λΆ€μ—¬λœ κ°€μ€‘μΉ˜λ‘œλΆ€ν„° μ‹ λ’° ꡬ간을 κ΅¬ν•œ ν›„ μ‹ λ’° κ΅¬κ°„μ˜ 내에 μžˆλŠ” μˆœνšŒλ§Œμ„ μœ νš¨ν•œ κ²ƒμœΌλ‘œ μΈμ •ν•˜λŠ” 방법이닀. μ΄λŸ¬ν•œ 방법을 μ μš©ν•¨μœΌλ‘œμ¨ λ”μš± μ‹ λ’°μ„± μžˆλŠ” 순회 νŒ¨ν„΄μ„ λ§ˆμ΄λ‹ν•  수 μžˆλ‹€. λ˜ν•œ μ΄λ ‡κ²Œ κ΅¬ν•œ νŒ¨ν„΄κ³Ό κ·Έλž˜ν”„ 정보λ₯Ό μ΄μš©ν•˜μ—¬ νŒ¨ν„΄ κ°„μ˜ μš°μ„ μˆœμœ„λ₯Ό κ²°μ •ν•  수 μžˆλŠ” 방법과 μ„±λŠ₯ ν–₯상을 μœ„ν•œ μ•Œκ³ λ¦¬μ¦˜λ„ μ œμ‹œν•œλ‹€. 두 번째 방법은 κ·Έλž˜ν”„μ˜ 정점에 κ°€μ€‘μΉ˜κ°€ λΆ€μ—¬λœ κ²½μš°μ— κ°€μ€‘μΉ˜κ°€ 고렀된 빈발 순회 νŒ¨ν„΄μ„ νƒμ‚¬ν•˜λŠ” 방법이닀. κ·Έλž˜ν”„μ˜ 정점에 뢀여될 수 μžˆλŠ” κ°€μ€‘μΉ˜λ‘œλŠ” μ›Ή μ‚¬μ΄νŠΈ λ‚΄μ˜ 각 λ¬Έμ„œμ˜ μ •λ³΄λŸ‰μ΄λ‚˜ μ€‘μš”λ„ 등이 될 수 μžˆλ‹€. 
이 λ¬Έμ œμ—μ„œλŠ” 빈발 순회 νŒ¨ν„΄μ„ κ²°μ •ν•˜κΈ° μœ„ν•˜μ—¬ νŒ¨ν„΄μ˜ λ°œμƒ λΉˆλ„λΏλ§Œ μ•„λ‹ˆλΌ λ°©λ¬Έν•œ μ •μ μ˜ κ°€μ€‘μΉ˜λ₯Ό λ™μ‹œμ— κ³ λ €ν•˜μ—¬μ•Ό ν•œλ‹€. 이λ₯Ό μœ„ν•΄ λ³Έ λ…Όλ¬Έμ—μ„œλŠ” μ •μ μ˜ κ°€μ€‘μΉ˜λ₯Ό μ΄μš©ν•˜μ—¬ ν–₯후에 빈발 νŒ¨ν„΄μ΄ 될 κ°€λŠ₯성이 μžˆλŠ” 후보 νŒ¨ν„΄μ€ 각 λ§ˆμ΄λ‹ λ‹¨κ³„μ—μ„œ μ œκ±°ν•˜μ§€ μ•Šκ³  μœ μ§€ν•˜λŠ” μ•Œκ³ λ¦¬μ¦˜μ„ μ œμ•ˆν•œλ‹€. λ˜ν•œ μ„±λŠ₯ ν–₯상을 μœ„ν•΄ 후보 νŒ¨ν„΄μ˜ 수λ₯Ό κ°μ†Œμ‹œν‚€λŠ” μ•Œκ³ λ¦¬μ¦˜λ„ μ œμ•ˆν•œλ‹€. λ³Έ λ…Όλ¬Έμ—μ„œ μ œμ•ˆν•œ 두 가지 방법에 λŒ€ν•˜μ—¬ λ‹€μ–‘ν•œ μ‹€ν—˜μ„ ν†΅ν•˜μ—¬ μˆ˜ν–‰ μ‹œκ°„ 및 μƒμ„±λ˜λŠ” νŒ¨ν„΄μ˜ 수 등을 비ꡐ λΆ„μ„ν•˜μ˜€λ‹€. λ³Έ λ…Όλ¬Έμ—μ„œλŠ” μˆœνšŒμ— κ°€μ€‘μΉ˜κ°€ μžˆλŠ” κ²½μš°μ™€ κ·Έλž˜ν”„μ˜ 정점에 κ°€μ€‘μΉ˜κ°€ μžˆλŠ” κ²½μš°μ— 빈발 순회 νŒ¨ν„΄μ„ νƒμ‚¬ν•˜λŠ” μƒˆλ‘œμš΄ 방법듀을 μ œμ•ˆν•˜μ˜€λ‹€. μ œμ•ˆν•œ 방법듀을 μ›Ή λ§ˆμ΄λ‹κ³Ό 같은 뢄야에 μ μš©ν•¨μœΌλ‘œμ¨ μ›Ή ꡬ쑰의 효율적인 λ³€κ²½μ΄λ‚˜ μ›Ή λ¬Έμ„œμ˜ μ ‘κ·Ό 속도 ν–₯상, μ‚¬μš©μžλ³„ κ°œμΈν™”λœ μ›Ή λ¬Έμ„œ ꡬ좕 등이 κ°€λŠ₯ν•  것이닀.Abstract β…Ά Chapter 1 Introduction 1.1 Overview 1.2 Motivations 1.3 Approach 1.4 Organization of Thesis Chapter 2 Related Works 2.1 Itemset Mining 2.2 Weighted Itemset Mining 2.3 Traversal Mining 2.4 Graph Traversal Mining Chapter 3 Mining Patterns from Weighted Traversals on Unweighted Graph 3.1 Definitions and Problem Statements 3.2 Mining Frequent Patterns 3.2.1 Augmentation of Base Graph 3.2.2 In-Mining Algorithm 3.2.3 Pre-Mining Algorithm 3.2.4 Priority of Patterns 3.3 Experimental Results Chapter 4 Mining Patterns from Unweighted Traversals on Weighted Graph 4.1 Definitions and Problem Statements 4.2 Mining Weighted Frequent Patterns 4.2.1 Pruning by Support Bounds 4.2.2 Candidate Generation 4.2.3 Mining Algorithm 4.3 Estimation of Support Bounds 4.3.1 Estimation by All Vertices 4.3.2 Estimation by Reachable Vertices 4.4 Experimental Results Chapter 5 Conclusions and Further Works Reference

    Documentation-Guided Fuzzing for Testing Deep Learning API Functions

    Widely-used deep learning (DL) libraries demand reliability, so it is integral to test DL libraries' API functions. Despite the effectiveness of fuzz testing, few techniques specialize in fuzzing the API functions of DL libraries. To fill this gap, we design and implement a fuzzing technique called DocTer for API functions of DL libraries. Fuzzing DL API functions is challenging because many API functions expect structured inputs that follow DL-specific constraints. If a fuzzer is (1) unaware of these constraints or (2) incapable of using them to fuzz, it is practically impossible to generate valid inputs, i.e., inputs that follow these DL-specific constraints, and to explore deeply enough to test the core functionality of API functions. DocTer extracts DL-specific constraints from API documents and uses these constraints to guide the fuzzing to generate valid inputs automatically. DocTer also generates inputs that violate these constraints to test the input-validity checking code. To reduce manual effort, DocTer applies a sequential pattern mining technique to API documents to help DocTer users create rules that extract constraints from API documents automatically. Our evaluation on three popular DL libraries (TensorFlow, PyTorch, and MXNet) shows that DocTer's accuracy in extracting input constraints is 82.2-90.5%. DocTer detects 46 bugs, while a baseline fuzzer without input constraints detects only 19. Most (33) of the 46 bugs are previously unknown, 26 of which have been fixed or confirmed by developers after we reported them. In addition, DocTer detects 37 inconsistencies within documents, including 25 fixed or confirmed after we reported them.
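Constraint-guided generation of valid and invalid inputs, in the spirit of DocTer, might look like the sketch below; the constraint format and every name are hypothetical illustrations, not DocTer's actual representation:

```python
import random

# Hypothetical doc-extracted constraints for an imaginary API function:
# dtype, dimensionality, and valid value range per parameter.
CONSTRAINTS = {
    "input": {"dtype": "float32", "ndim": 2, "range": (-1.0, 1.0)},
    "axis":  {"dtype": "int", "range": (0, 1)},
}

def gen_valid(constraints, max_dim=4):
    """Generate one argument set that conforms to every constraint."""
    args = {}
    for name, c in constraints.items():
        lo, hi = c["range"]
        if c["dtype"] == "int":
            args[name] = random.randint(int(lo), int(hi))
        else:
            # Tensor-like parameter: nested lists of floats with `ndim` dimensions.
            shape = [random.randint(1, max_dim) for _ in range(c["ndim"])]
            def build(dims):
                if not dims:
                    return random.uniform(lo, hi)
                return [build(dims[1:]) for _ in range(dims[0])]
            args[name] = build(shape)
    return args

def gen_invalid(constraints):
    """Violate one constraint on purpose to exercise input-validity checks."""
    args = gen_valid(constraints)
    victim = random.choice(list(constraints))
    args[victim] = "not-a-tensor"   # deliberately wrong type
    return args
```

Valid inputs drive the fuzzer past the library's argument checks into core logic, while the deliberately invalid ones test the checking code itself, matching the two generation modes described above.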

    Separator Database and SPM-Tree Framework for Mining Sequential Patterns Using PrefixSpan with Pseudoprojection

    Sequential pattern mining is a branch of data mining that addresses inter-transaction pattern mining problems. Efficiency and scalability in mining the complete set of patterns are its central challenges. A comprehensive performance study has reported that PrefixSpan, one of the sequential pattern mining algorithms, outperforms GSP, SPADE, and FreeSpan in most cases, and that PrefixSpan integrated with the pseudoprojection technique is the fastest of the tested algorithms. Nevertheless, the pseudoprojection technique, which requires maintaining and repeatedly visiting the in-memory sequence database until all patterns are found, consumes a considerable amount of memory and forces the algorithm to perform many redundant and unnecessary checks against this in-memory copy of the original database when candidate patterns are examined. Moreover, improper management of intermediate databases may adversely affect execution time and memory utilization. In the present work, the Separator Database is proposed to improve PrefixSpan with pseudoprojection through early removal of the uneconomical in-memory sequence database, whilst the SPM-Tree Framework is proposed to build the intermediate databases. By building the index set of longer patterns with the Separator Database, some procedures tied to the in-memory sequence database can be removed, so most of that memory can be released, and the elimination of redundant checks against the in-memory sequence database reduces execution time. By storing intermediate databases in the SPM-Tree Framework, the sequence database can be held in memory and the index set built. Using Java as a case study, a series of experiments was conducted to select a suitable Collections API class for this framework. The experimental results show that the Separator Database always improves, exponentially in some cases, PrefixSpan with pseudoprojection.
The results also show that in Java, ArrayList is the most suitable choice for storing Object data and ArrayIntList is the most suitable choice for storing integer data. This novel approach of integrating the Separator Database and the SPM-Tree Framework with these choices of Java Collections outperforms PrefixSpan with pseudoprojection in terms of CPU performance and memory utilization. Future research includes exploring the use of the Separator Database in PrefixSpan with pseudoprojection to improve the mining of generalized sequential patterns, particularly in handling constrained sequential pattern mining.
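The pseudoprojection idea discussed above can be sketched as a minimal PrefixSpan over single-item events (without the Separator Database or SPM-Tree extensions): each projected database is kept as (sequence index, offset) pairs into the original data rather than as physical copies:

```python
def prefixspan(db, min_support):
    """Minimal PrefixSpan with pseudo-projection: a projected database is a
    list of (sequence_index, start_offset) pairs into `db`, never a copy."""
    results = {}

    def mine(prefix, projection):
        # Count each item's support in the suffixes designated by the projection.
        counts = {}
        for sid, off in projection:
            seen = set()
            for item in db[sid][off:]:
                if item not in seen:
                    seen.add(item)
                    counts[item] = counts.get(item, 0) + 1
        for item, sup in counts.items():
            if sup < min_support:
                continue
            new_prefix = prefix + (item,)
            results[new_prefix] = sup
            # New projection point: just past the first occurrence of `item`.
            new_proj = []
            for sid, off in projection:
                try:
                    pos = db[sid].index(item, off)
                    new_proj.append((sid, pos + 1))
                except ValueError:
                    pass
            mine(new_prefix, new_proj)

    mine((), [(i, 0) for i in range(len(db))])
    return results

db = [list("abcd"), list("acd"), list("abd"), list("bcd")]
patterns = prefixspan(db, min_support=3)
```

Because every recursive call only narrows offsets into the one shared database, memory stays proportional to the projection lists, which is exactly the property (and the repeated-scanning cost) that the Separator Database work aims to improve on.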

    From sequential patterns to concurrent branch patterns: a new post sequential patterns mining approach

    A thesis submitted for the degree of Doctor of Philosophy of the University of Bedfordshire.
Sequential patterns mining is an important pattern discovery technique used to identify frequently observed sequential occurrences of items across ordered transactions over time. It has been intensively studied and there exists a great diversity of algorithms. However, a major problem with conventional sequential patterns mining is that the patterns derived are often numerous and not easy to understand or use. In addition, more complex relations among events are often hidden behind sequences. A novel model for sequential patterns, called the Sequential Patterns Graph (SPG), is proposed. The construction algorithm of SPG is presented with experimental results to substantiate the concept. The thesis then defines new structural patterns, such as concurrent branch patterns, exclusive patterns, and iterative patterns, which are generally hidden behind sequential patterns. Finally, an integrative framework named Post Sequential Patterns Mining (PSPM), which is based on sequential patterns mining, is proposed for the discovery and visualisation of structural patterns. This thesis is intended to prove that discrete sequential patterns derived from traditional sequential patterns mining can be modelled graphically using SPG. It is concluded from experiments and theoretical studies that SPG is not only a minimal representation of sequential patterns mining but also represents the interrelation among patterns and lays the foundation for mining structural knowledge (i.e. concurrent branch patterns, exclusive patterns, and iterative patterns). From experiments conducted on both synthetic and real datasets, it is shown that Concurrent Branch Patterns (CBP) mining is an effective and efficient algorithm for mining concurrent branch patterns.
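One simple reading of concurrency detection (a sketch only, not the thesis's CBP algorithm): if both orderings of an item pair appear among the frequent sequential patterns, neither order dominates, which can be taken as a hint that the two events are concurrent:

```python
def concurrent_pairs(patterns):
    """Flag item pairs that are frequent in both orders: under this simple
    reading, neither order dominates, suggesting the events are concurrent."""
    pats = set(patterns)
    pairs = set()
    for p in pats:
        if len(p) != 2:
            continue
        x, y = p
        # Both (x, y) and (y, x) frequent, and the items are distinct.
        if x != y and (y, x) in pats:
            pairs.add(frozenset((x, y)))
    return pairs

# ('a','b') and ('b','a') are both frequent, so a and b look concurrent;
# 'c' only ever follows 'a', so (a, c) stays strictly sequential.
found = concurrent_pairs({("a", "b"), ("b", "a"), ("a", "c")})
```

A real post-sequential-patterns miner would work over the SPG structure and support counts rather than a bare pattern set, but the symmetry test captures the intuition.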

    Mining Predictive Patterns and Extension to Multivariate Temporal Data

    An important goal of knowledge discovery is the search for patterns in the data that can help explain its underlying structure. To be practically useful, the discovered patterns should be novel (unexpected) and easy for humans to understand. In this thesis, we study the problem of mining patterns (defining subpopulations of data instances) that are important for predicting and explaining a specific outcome variable. An example is the task of identifying groups of patients that respond better to a certain treatment than the rest of the patients. We propose and present efficient methods for mining predictive patterns for both atemporal and temporal (time series) data. Our first method relies on frequent pattern mining to explore the search space. It applies a novel evaluation technique for extracting a small set of frequent patterns that are highly predictive and have low redundancy. We show the benefits of this method on several synthetic and public datasets. Our temporal pattern mining method works on complex multivariate temporal data, such as electronic health records, for the event detection task. It first converts time series into time-interval sequences of temporal abstractions and then mines temporal patterns backwards in time, starting from patterns related to the most recent observations. We show the benefits of our temporal pattern mining method on two real-world clinical tasks.
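The conversion step described above, turning a numeric time series into a time-interval sequence of temporal abstractions, might look like this minimal sketch (the thresholds and state names are illustrative, not the thesis's abstraction alphabet):

```python
def abstract_series(values, low, high):
    """Convert a numeric series into (state, start, end) intervals, where the
    state is 'low'/'normal'/'high' relative to the given thresholds and
    consecutive equal states are merged into one interval."""
    def state(v):
        return "low" if v < low else "high" if v > high else "normal"

    intervals = []
    for i, v in enumerate(values):
        s = state(v)
        if intervals and intervals[-1][0] == s:
            # Same state as the previous reading: extend the current interval.
            intervals[-1] = (s, intervals[-1][1], i)
        else:
            intervals.append((s, i, i))
    return intervals

# A glucose-like series abstracted with hypothetical thresholds 70 and 140.
iv = abstract_series([65, 68, 100, 150, 155, 120], low=70, high=140)
```

Temporal patterns are then mined over these interval sequences instead of the raw measurements, which is what makes multivariate clinical data tractable.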

    A framework for trend mining with application to medical data

    This thesis presents research work conducted in the field of knowledge discovery. It presents an integrated trend-mining framework and SOMA, the application of that framework to diabetic retinopathy data. Trend mining is the process of identifying and analysing trends in the variation of support of the association/classification rules extracted from longitudinal datasets. The integrated framework covers all major processes, from data preparation to the extraction of knowledge. At the pre-processing stage, data are cleaned, transformed if necessary, and sorted into time-stamped datasets using logic rules. At the next stage, the time-stamped datasets are passed through the main processing, in which an Association Rule Mining (ARM) matrix algorithm is applied to identify frequent rules with acceptable confidence. Mathematical conditions are applied to classify the sequences of support values into trends. Afterwards, interestingness criteria are applied to obtain interesting knowledge, and a visualisation technique is proposed that maps how objects move from one time stamp to the next. A validation and verification (external and internal validation) framework is described that aims to ensure that the results at the intermediate stages of the framework are correct and that the framework as a whole can yield results that demonstrate causality. To evaluate the thesis, SOMA was developed. The dataset is itself of interest, as it is very noisy (in common with other similar medical datasets) and does not feature a clear association between specific time stamps and subsets of the data. The Royal Liverpool University Hospital has been a major centre for retinopathy research since 1991. Retinopathy is a generic term for damage to the retina of the eye, which can, in the long term, lead to visual loss.
Diabetic retinopathy is used to evaluate the framework, to determine whether SOMA can extract knowledge already known to clinicians. The results show that these datasets can be used to extract knowledge demonstrating causality between patients' characteristics, such as the age of the patient at diagnosis, type of diabetes, and duration of diabetes, and diabetic retinopathy.
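The trend-classification step described above, applying mathematical conditions to a rule's sequence of support values across time stamps, can be sketched as follows (the trend labels and tolerance parameter are assumptions of this sketch, not necessarily the framework's own taxonomy):

```python
def classify_trend(supports, tol=0.0):
    """Classify a rule's support sequence across time stamps as 'increasing',
    'decreasing', 'constant', or 'jumping' (no consistent direction).
    `tol` absorbs small fluctuations before a direction is assigned."""
    diffs = [b - a for a, b in zip(supports, supports[1:])]
    if all(abs(d) <= tol for d in diffs):
        return "constant"
    if all(d >= -tol for d in diffs):
        return "increasing"
    if all(d <= tol for d in diffs):
        return "decreasing"
    return "jumping"
```

Each frequent rule's support values, one per time-stamped dataset, are fed through a condition set like this, and interestingness criteria are then applied to the resulting trend labels.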