
    Flexible constrained sampling with guarantees for pattern mining

    Pattern sampling has been proposed as a potential solution to the infamous pattern explosion. Instead of enumerating all patterns that satisfy the constraints, individual patterns are sampled proportional to a given quality measure. Several sampling algorithms have been proposed, but each of them has its limitations when it comes to 1) flexibility in terms of quality measures and constraints that can be used, and/or 2) guarantees with respect to sampling accuracy. We therefore present Flexics, the first flexible pattern sampler that supports a broad class of quality measures and constraints, while providing strong guarantees regarding sampling accuracy. To achieve this, we leverage the perspective on pattern mining as a constraint satisfaction problem and build upon the latest advances in sampling solutions in SAT as well as existing pattern mining algorithms. Furthermore, the proposed algorithm is applicable to a variety of pattern languages, which allows us to introduce and tackle the novel task of sampling sets of patterns. We introduce and empirically evaluate two variants of Flexics: 1) a generic variant that addresses the well-known itemset sampling task and the novel pattern set sampling task as well as a wide range of expressive constraints within these tasks, and 2) a specialized variant that exploits existing frequent itemset techniques to achieve substantial speed-ups. Experiments show that Flexics is both accurate and efficient, making it a useful tool for pattern-based data exploration. Comment: Accepted for publication in Data Mining & Knowledge Discovery journal (ECML/PKDD 2017 journal track).
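To make "sampled proportional to a given quality measure" concrete, here is a minimal Python sketch. It is not Flexics: it naively enumerates every frequent itemset first, which is exactly the step the abstract says a real sampler avoids. The toy transactions, the function names, and the choice of support count as the quality measure are all assumptions made for illustration.

```python
import random
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Enumerate every non-empty itemset whose support count is >= min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count >= min_support:
                result[candidate] = count
    return result

def sample_patterns(transactions, min_support, k, seed=0):
    """Draw k itemsets, each with probability proportional to its support count."""
    patterns = frequent_itemsets(transactions, min_support)
    rng = random.Random(seed)
    return rng.choices(list(patterns), weights=list(patterns.values()), k=k)

# Hypothetical toy data: four transactions over the items a, b, c.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
samples = sample_patterns(transactions, min_support=2, k=5)
```

Full enumeration is only feasible on tiny inputs like this one; the sketch shows the target distribution, not a scalable way to reach it.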

    Usage Pattern Exploration of Effective Contraception Tool

    Determining which contraceptive method or tool acceptors should use in support of the Family Planning (“Keluarga Berencana”) program is problematic. When choosing a method or tool, the acceptor must consider several factors, namely health, partner, and the contraceptive method itself. Each method or tool has its advantages and disadvantages. Even after weighing them, it remains difficult to control fertility safely and effectively. Consequently, acceptors change their method or tool more than once. To help acceptors obtain an appropriate contraception tool, this study determines the patterns of change in the usage of effective methods or tools. One approach to finding these patterns is data mining, the extraction of interesting patterns from large amounts of data. A pattern is said to be interesting if it is non-trivial, implicit, previously unknown, and useful. The patterns presented should be easy to understand, applicable with some degree of certainty to the data to be predicted, useful, and new. Before data mining is applied, the k-nearest neighbors algorithm is used to determine the shortest distance between the factors involved in selecting a contraception tool. The next step applies data mining to data on changes in method or tool usage by family planning acceptors, which is expected to uncover information about acceptors' behavior patterns in using a method or tool. The resulting patterns can then be used in decision making about the use of effective contraception tools. The results show that k-nearest neighbors with Euclidean distance can be used to determine the similarity between the attributes of Family Planning acceptors and the available training data. 
Based on the available training data, the usage pattern of contraception tools can be determined using data mining: Family Planning acceptors are given a recommendation if their pattern matches a pattern in the training data. Conversely, if no pattern matches, the system does not recommend a contraception tool.
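The k-nearest-neighbors step with Euclidean distance mentioned above can be sketched as follows. This is a generic majority-vote version, not the study's implementation; the two numeric attributes, the labels `method_A` and `method_B`, and the value of `k` are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(training, query, k=3):
    """Return the majority label among the k training rows closest to query."""
    nearest = sorted(training, key=lambda row: euclidean(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: (attribute vector, label) pairs.
training = [
    ((25, 1), "method_A"),
    ((27, 1), "method_A"),
    ((38, 3), "method_B"),
    ((41, 4), "method_B"),
    ((36, 2), "method_B"),
]
prediction = knn_predict(training, (26, 1), k=3)
```

In practice the attributes would usually be scaled to comparable ranges first, since Euclidean distance is dominated by whichever attribute has the largest numeric spread.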

    Application Of Naive Bayes Classifier Algorithm In Determining New Student Admission Promotion Strategies

    Data mining is a process that uses statistical techniques, mathematics, artificial intelligence, and machine learning to extract and identify useful information and related knowledge from large databases. It finds new patterns by filtering large amounts of data, using pattern recognition technology similar to statistical and mathematical techniques. The patterns found can provide useful information for generating economic benefits, effectiveness, and efficiency. The Naive Bayes Classifier algorithm is one data mining method that can be used to support effective and efficient promotion strategies. Here it is used to predict study interest based on the calculations performed. The data used are new student registration data from 2014 to 2016 at Bina Darma University. The result of this study is a new model that is expected to provide important information to assist the Marketing Team of Bina Darma University Palembang in making policy and implementing an appropriate marketing strategy. The results are expected to support promotion strategies that improve the effectiveness and efficiency of promotion and increase the number of new students who register.
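As an illustration of how a Naive Bayes classifier turns counts into a prediction, here is a small categorical version with add-one (Laplace) smoothing. The abstract does not describe the study's features or smoothing, so the attributes (school type, region), the class labels, and the smoothing choice are all assumptions.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    feature_counts = Counter()   # key: (feature index, value, label)
    values = defaultdict(set)    # distinct values seen for each feature
    for row, label in zip(rows, labels):
        for i, v in enumerate(row):
            feature_counts[(i, v, label)] += 1
            values[i].add(v)
    return class_counts, feature_counts, values

def predict(model, row):
    """Return the label maximizing prior * product of smoothed likelihoods."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, n in class_counts.items():
        score = n / total
        for i, v in enumerate(row):
            # Add-one smoothing keeps unseen values from zeroing the score.
            score *= (feature_counts[(i, v, label)] + 1) / (n + len(values[i]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical registration records: (school type, region) -> study program.
rows = [("vocational", "city"), ("vocational", "city"),
        ("general", "rural"), ("general", "city"), ("general", "rural")]
labels = ["informatics", "informatics", "management", "management", "management"]
model = train_naive_bayes(rows, labels)
```

For longer feature vectors, summing log-probabilities instead of multiplying raw probabilities avoids numerical underflow.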

    Application of Association Rules to Minimarket Transaction Data Using the Frequent Pattern Growth (FP-Growth) Algorithm

    Transaction data that are merely stored as records can provide useful knowledge for making policies and marketing strategies for the KOCIKA UNESA minimarket at the State University of Surabaya, Ketintang. For that purpose, the data mining technique of association rules can be applied. Association rule mining is a procedure for finding knowledge in the form of consumer purchasing patterns. These patterns can serve as input for policy making and marketing strategy. A pattern is characterized by two parameters, namely support (support value) and confidence (certainty value). The association rules here are mined with the frequent pattern growth (FP-growth) algorithm, which uses the FP-tree data structure to find purchase patterns. One pattern resulting from the analysis of the last month of transaction data, covering 23 item categories, is that customers who buy detergent also buy soap, with support = 19% and confidence = 75%. Keywords: Transactions data, Association rules, FP-growth
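The two parameters named above, support and confidence, can be computed directly from a list of transactions. The sketch below uses their standard definitions and does not implement FP-growth or the FP-tree; the four toy transactions are hypothetical and do not reproduce the 19% and 75% figures from the study.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent).

    Assumes the antecedent occurs in at least one transaction.
    """
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical toy basket data.
transactions = [
    {"detergent", "soap"},
    {"detergent"},
    {"soap"},
    {"rice"},
]
rule_support = support(transactions, {"detergent", "soap"})
rule_confidence = confidence(transactions, {"detergent"}, {"soap"})
```

Read this way, the rule reported in the abstract says that 19% of all transactions contained both detergent and soap, and that 75% of the transactions containing detergent also contained soap.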

    Clustering of rainfall data using k-means algorithm

    Clustering algorithms in data mining are methods for extracting useful information from given data. They can precisely analyze the volume of data produced by modern applications. The main goal of clustering is to categorize data into clusters according to similarities, traits, and behavior. This study aims to describe the regional cluster pattern of rainfall based on maximum daily rainfall in Johor, Malaysia. The K-Means algorithm is used to obtain optimal rainfall clusters. This clustering is expected to serve as an analysis tool for decision making to assist hydrologists with water research problems.
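For readers unfamiliar with K-Means, the following is a minimal one-dimensional sketch of the assign-then-update loop. The rainfall values, the initial centroids, and the fixed iteration cap are hypothetical; the study's actual data, number of clusters, and initialization are not given in the abstract.

```python
def kmeans_1d(values, centroids, iterations=20):
    """Alternate nearest-centroid assignment and mean update until stable."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            idx = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[idx].append(v)
        # Update step: move each centroid to the mean of its cluster.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters

# Hypothetical maximum daily rainfall readings (mm) from six stations.
rainfall = [10.0, 12.0, 14.0, 80.0, 90.0, 100.0]
centers, groups = kmeans_1d(rainfall, centroids=[10.0, 100.0])
```

The outcome depends on the starting centroids, so in practice the algorithm is usually run from several initializations and the best result is kept.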

    Parallel Methods for Mining Frequent Sequential patterns

    The explosive growth of data and the rapid progress of technology have led to a huge amount of data being collected every day. That volume of data contains much valuable information. Data mining is the emerging field of applying statistical and artificial intelligence techniques to the problem of finding novel, useful and non-trivial patterns in large databases. It is the task of discovering interesting patterns from large amounts of data. This is achieved by determining both implicit and explicit unidentified patterns in data that can direct the process of decision making. There are many data mining tasks, such as classification, clustering, association rule mining and sequential pattern mining. Among these, sequential pattern mining is an important problem. It provides an effective way to analyze sequence data. The goal of sequential pattern mining is to discover interesting, unexpected and useful patterns from sequence databases. This task is used in a wide range of applications, such as financial data analysis of banks, the retail industry, customer shopping history, goods transportation, consumption and services, the telecommunication industry, biological data analysis, scientific applications, network intrusion detection, scientific research, etc. Different types of sequential pattern mining can be performed: sequential patterns, maximal sequential patterns, closed sequences, and constraint-based and time-interval-based sequential patterns. Sequential pattern mining refers to the identification of frequent subsequences in sequence databases as patterns. In the last two decades, researchers have proposed many techniques and algorithms for extracting frequent sequential patterns, in which the downward closure property plays a fundamental role. A sequential pattern is a sequence of itemsets that frequently occur in a specific order, where all items in the same itemset are supposed to have the same transaction time value. 
One of the challenges for sequential pattern mining is the computational cost; another is the potentially huge number of extracted patterns. In this thesis, we present an overview of the work done on sequential pattern mining and develop parallel methods for mining frequent sequential patterns in sequence databases that can tackle emerging data processing workloads while coping with larger and larger scales. 460 - Katedra informatiky (Department of Computer Science)
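The definition of a sequential pattern as an ordered sequence of itemsets, and the downward closure property, can be illustrated with a short support counter. This sketch is sequential and unoptimized, not one of the thesis's parallel methods; the three toy sequences and the function names are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if the pattern's itemsets occur, in order, inside itemsets of sequence."""
    pos = 0
    for itemset in sequence:
        # Greedily match the next pattern itemset against the earliest fit.
        if pos < len(pattern) and pattern[pos] <= itemset:
            pos += 1
    return pos == len(pattern)

def support(pattern, database):
    """Number of sequences in the database that contain the pattern."""
    return sum(1 for seq in database if is_subsequence(pattern, seq))

# Hypothetical sequence database: each sequence is a list of itemsets,
# and items within one itemset share the same transaction time.
database = [
    [{"a"}, {"b", "c"}, {"d"}],
    [{"a", "b"}, {"d"}],
    [{"b"}, {"a"}, {"d"}],
]
short = support([{"a"}, {"b"}], database)
longer = support([{"a"}, {"b"}, {"d"}], database)
```

Downward closure means that extending a pattern can never raise its support, so `longer` cannot exceed `short`; miners use this to prune every extension of an infrequent pattern without counting it.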

    Data Mining for Analyzing Failed Fund Transfers in Stock Trading Transactions

    A competitive environment among investment firms encourages them to analyze investment activities with the potential for failed fund transfers. The purpose of this paper is to analyze the information needed to assess the potential for failed fund transfers when buying and selling stocks, and to use data mining techniques (clustering) to support that analysis. The data collection methods used in this study are literature study, interviews, and observations. The implementation of the data mining techniques consists of Data Cleaning, Data Integration, Data Selection, Data Transformation, Data Mining, Pattern Evaluation, and Knowledge Presentation. Data mining is used to analyze the data in order to find out which transactions can lead to failed fund transfers and to group them based on Receiving Failure. The clustering result produces knowledge that is useful in decision making, especially for top management. The conclusion of this paper is that the data mining techniques designed here help management make more precise decisions and so avoid losses for the company.

    A Location Analytics Method for the Utilisation of Geotagged Photos in Travel Marketing Decision-Making

    Location analytics offers statistical analysis of any geo- or spatial data concerning user location. Such analytics can produce useful insights into the attractions of interest to travellers or visitation patterns of a demographic group. Based on these insights, strategic decision-making by travel marketing agents, such as travel package design, may be improved. In this paper, we develop and evaluate an original method of location analytics to analyse travellers' social media data for improving managerial decision support. The method proposes an architectural framework that combines emerging pattern data mining techniques with image processing to identify and process appropriate data content. The design artefact is evaluated through a focus group and a detailed case study of Australian outbound travellers. The proposed method is generic, and can be applied to other specific locations or demographics to provide analytical outcomes useful for strategic decision support