
    SemGrAM - Integrating semantic graphs into association rule mining

    To date, most association rule mining algorithms have assumed that the domains of items are either discrete or, in a limited number of cases, hierarchical, categorical or linear. This constrains the search for interesting rules to those that satisfy the specified quality metrics as independent values or as higher-level concepts of those values. However, in many cases the determination of a single hierarchy is not practicable and, for many datasets, an item's value may be taken from a domain that is more conveniently structured as a graph with weights indicating semantic (or conceptual) distance. Research in the development of algorithms that generate disjunctive association rules has allowed the production of rules such as Radios ∨ TVs → Cables. In many cases there is little semantic relationship between the disjunctive terms, and arguably less readable rules such as Radios ∨ Tuesday → Cables can result. This paper describes two association rule mining algorithms, SemGrAMG and SemGrAMP, that accommodate conceptual distance information contained in a semantic graph. The SemGrAM algorithms permit the discovery of rules that include an association between sets of cognate groups of item values. The paper discusses the algorithms, the design decisions made during their development and some experimental results.
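    The abstract leaves the grouping mechanism implicit. The sketch below is one plausible reading, not the paper's actual SemGrAMG/SemGrAMP procedures: items live on a weighted semantic graph, shortest-path weight serves as conceptual distance, and items within a distance threshold are merged into cognate groups that can then be mined as composite items. All names (semantic_distance, cognate_groups, max_dist) are illustrative.

```python
# Hypothetical sketch of cognate-group formation over a semantic graph;
# the threshold-based greedy grouping is an assumption, not the paper's algorithm.
import heapq

def semantic_distance(graph, source):
    # Dijkstra over {node: {neighbour: weight}}; weights encode conceptual distance.
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph.get(u, {}).items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def cognate_groups(graph, items, max_dist):
    # Greedily place each item into the first group whose members are all
    # within max_dist of it; otherwise start a new group.
    groups = []
    for item in items:
        dist = semantic_distance(graph, item)
        target = next((g for g in groups
                       if all(dist.get(m, float("inf")) <= max_dist for m in g)),
                      None)
        if target is None:
            groups.append({item})
        else:
            target.add(item)
    return groups

g = {"radio": {"tv": 1.0}, "tv": {"radio": 1.0}, "tuesday": {}}
print(cognate_groups(g, ["radio", "tv", "tuesday"], max_dist=2.0))
# [{'radio', 'tv'}, {'tuesday'}]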

    Related Study of Soft Set and Its Application: A Review

    Abstract: In the present paper, some literature related to soft sets is collected. The literature is motivated by Molodtsov

    Constraint-based sequence mining using constraint programming

    The goal of constraint-based sequence mining is to find sequences of symbols that are included in a large number of input sequences and that satisfy some constraints specified by the user. Many constraints have been proposed in the literature, but a general framework is still missing. We investigate the use of constraint programming as a general framework for this task. We first identify four categories of constraints that are applicable to sequence mining. We then propose two constraint programming formulations. The first formulation introduces a new global constraint called exists-embedding. This formulation is the most efficient but does not support one type of constraint. To support such constraints, we develop a second formulation that is more general but incurs more overhead. Both formulations can use the projected database technique used in specialised algorithms. Experiments demonstrate the flexibility towards constraint-based settings and compare the approach to existing methods. (In: Integration of AI and OR Techniques in Constraint Programming, CPAIOR.)
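    The primitive behind the exists-embedding constraint described above is deciding whether a pattern occurs as a subsequence of an input sequence. The sketch below is a plain-Python illustration of that check and of support counting, not the paper's constraint programming formulation.

```python
def is_embedded(pattern, sequence):
    # True if pattern occurs as a (possibly non-contiguous) subsequence.
    # `symbol in it` advances the iterator, so order is enforced.
    it = iter(sequence)
    return all(symbol in it for symbol in pattern)

def support(pattern, database):
    # Number of input sequences that embed the pattern.
    return sum(is_embedded(pattern, seq) for seq in database)

db = [list("abcde"), list("aXbYc"), list("cba")]
print(support(["a", "b", "c"], db))  # 2: the first two sequences embed a..b..c
```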

    Development of Association Rule Mining with Efficient Positive and Negative Rules

    Association rule mining (ARM) is one of the most researched areas of data mining and has recently received much attention from the database community. Association rules have proven quite useful in the marketing and retail communities, as well as in other, more diverse fields. This paper first presents the concepts behind association rules, then gives an overview of some previous research in the area, and concludes with the advantages and limitations of each approach. Several algorithms exist for frequent pattern mining; the classical and most famous is Apriori, whose objective is to find frequent itemsets and the associations between different itemsets, i.e. association rules. The author considers online-seller transaction data and obtains results using Weka, a data mining tool, applying association rule algorithms to find the best combinations of different attributes in the data.
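    As a concrete reference point for the description above, here is a minimal textbook-style Apriori sketch plus the standard way a negative rule's confidence can be derived from positive supports, conf(X → ¬Y) = (supp(X) − supp(X ∪ Y)) / supp(X). It is a generic illustration, not the paper's specific algorithm.

```python
from itertools import combinations

def apriori(transactions, min_support):
    # Classic level-wise Apriori: returns {itemset: support fraction}.
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    level = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    while level:
        counts = {c: sum(c <= t for t in transactions) for c in level}
        survivors = {c: k / n for c, k in counts.items() if k / n >= min_support}
        frequent.update(survivors)
        # Join step: merge frequent k-itemsets that differ in exactly one item.
        level = {a | b for a, b in combinations(survivors, 2)
                 if len(a | b) == len(a) + 1}
    return frequent

def negative_confidence(frequent, x, y):
    # conf(X -> not Y) = (supp(X) - supp(X and Y)) / supp(X).
    # Approximation: if X|Y fell below min_support its support is taken as 0.
    sx = frequent[frozenset(x)]
    sxy = frequent.get(frozenset(x) | frozenset(y), 0.0)
    return (sx - sxy) / sx
```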

    Dualisation, decision lists and identification of monotone discrete functions

    Many data-analysis algorithms in machine learning, data mining and a variety of other disciplines essentially operate on discrete multi-attribute data sets. By means of discretisation or binarisation, numerical data sets can also be successfully analysed. Therefore, in this paper we introduce the theory of (partially defined) discrete functions as an important theoretical tool for the analysis of multi-attribute data sets. In particular we study monotone (partially defined) discrete functions. Compared with the theory of Boolean functions, relatively little is known about (partially defined) monotone discrete functions. It appears that decision lists are useful for the representation of monotone discrete functions. Since dualisation is an important tool in the theory of (monotone) Boolean functions, we study the interpretation and properties of the dual of a (monotone) binary or discrete function. We also introduce the dual of a pseudo-Boolean function. The results are used to investigate extensions of partially defined monotone discrete functions and the identification of monotone discrete functions. In particular, we present a polynomial-time algorithm for the identification of so-called stable discrete functions.
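    To make the central notion concrete: a partially defined discrete function given as sample points is monotone when the componentwise order on inputs is respected by the outputs. A small consistency check, independent of the paper's decision-list and dualisation machinery:

```python
def leq(x, y):
    # Componentwise order on discrete attribute vectors.
    return all(a <= b for a, b in zip(x, y))

def is_monotone(samples):
    # samples: {attribute vector: label}. Monotone means
    # x <= y componentwise implies f(x) <= f(y).
    pts = samples.items()
    return all(fy >= fx
               for x, fx in pts for y, fy in pts if leq(x, y))

print(is_monotone({(0, 0): 0, (1, 0): 1, (1, 1): 2}))  # True
```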

    Feature Model Mining

    Software systems have grown larger and more complex in recent years. Generative software development strives to automate software development from a system family by generating implementations using domain-specific languages. In current practice, specifying domain-specific languages is a manual task requiring expert analysis of multiple information sources. Furthermore, the concepts and relations represented in a language grow through its usage. Keeping the language consistent with its usage is a time-consuming process requiring manual comparison between the language instances and the language specification. Feature model mining addresses these issues by synthesizing a representative model bottom-up from a sample set of instances called configurations. This thesis presents a mining algorithm that reverse-engineers a probabilistic feature model from a set of individual configurations. A configuration consists of a list of features, which are defined as system properties that a stakeholder is interested in. Probabilistic expressions are retrieved from the sample configurations through the use of conjunctive and disjunctive association rule mining. These expressions are used to construct a probabilistic feature model. The mined feature model consists of a hierarchy of features, a set of additional hard constraints, and soft constraints. The hierarchy describes the dependencies and alternative relations exhibited among the features. The additional hard constraints are a set of propositional formulas which must be satisfied in a legal configuration. Soft constraints describe likely defaults or common patterns. System families are often realized using object-oriented frameworks that provide reusable designs for constructing a family of applications. The mining algorithm is evaluated on a set of applications to retrieve a metamodel of the Java Applet framework. The feature model is then applied to the development of framework-specific modeling languages (FSMLs). FSMLs are domain-specific languages that model the framework-provided concepts and their rules for development. The work presented in this thesis provides the foundation for further research in feature model mining. The strengths and weaknesses of the algorithm are analyzed, and the thesis concludes with a discussion of possible extensions.
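    The probabilistic expressions described above can be grounded in simple co-occurrence statistics. The sketch below estimates P(g | f) for feature pairs from sample configurations; it is a hypothetical simplification (the thesis also mines disjunctive rules and constructs a hierarchy, which this omits).

```python
from collections import Counter

def implication_probabilities(configs):
    # configs: iterable of sets of feature names.
    # Returns P(g present | f present) for each ordered pair (f, g).
    feat = Counter()
    pair = Counter()
    for c in configs:
        for f in c:
            feat[f] += 1
            for g in c:
                if g != f:
                    pair[(f, g)] += 1
    return {fg: pair[fg] / feat[fg[0]] for fg in pair}

configs = [{"gui", "menu"}, {"gui", "menu", "toolbar"}, {"cli"}]
probs = implication_probabilities(configs)
print(probs[("menu", "gui")])  # 1.0 -> candidate hard constraint: menu requires gui
```

    Pairs with probability 1.0 in the sample are candidates for hard constraints, while high but imperfect probabilities suggest soft constraints, i.e. likely defaults or common patterns.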

    Local search for efficient causal effect estimation

    Causal effect estimation from observational data is an important but challenging problem, and causal effect estimation with unobserved variables in the data is more difficult still. The challenges lie in (1) whether the causal effect can be estimated from observational data at all (identifiability); (2) the accuracy of the estimate (unbiasedness); and (3) a fast, data-driven algorithm for the estimation (efficiency). Each of these problems is challenging on its own. Few data-driven methods for causal effect estimation exist so far, and each solves one or two of the above problems, but not all three. In this paper, we present an algorithm that is fast, unbiased, and able to confirm whether a causal effect is identifiable under a practical and commonly seen problem setting. To achieve high efficiency, we approach causal effect estimation as a local search for minimal adjustment variable sets in the data. We show that identifiability and unbiased estimation can both be resolved using data in our problem setting, and we develop theorems to support the local search for adjustment variable sets that achieve unbiased causal effect estimation. We make use of a frequent pattern mining strategy to further speed up the search process. Experiments performed on an extensive collection of synthetic and real-world datasets demonstrate that the proposed algorithm outperforms state-of-the-art causal effect estimation methods in both accuracy and time efficiency.
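    Once a valid adjustment set has been found (the local search for minimal sets is the paper's contribution and is not reproduced here), the effect estimate itself follows from the standard back-door adjustment formula. A minimal stratification-based estimator for a binary treatment, assuming pandas and hypothetical column names:

```python
import pandas as pd

def adjusted_effect(df, treatment, outcome, adjustment_set):
    # Back-door adjustment for a binary treatment:
    # E[Y|do(T=1)] - E[Y|do(T=0)]
    #   = sum_z P(Z=z) * (E[Y|T=1,Z=z] - E[Y|T=0,Z=z])
    effect = 0.0
    for _, stratum in df.groupby(list(adjustment_set)):
        weight = len(stratum) / len(df)
        treated = stratum.loc[stratum[treatment] == 1, outcome]
        control = stratum.loc[stratum[treatment] == 0, outcome]
        if len(treated) and len(control):  # skip strata lacking overlap
            effect += weight * (treated.mean() - control.mean())
    return effect
```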

    Induction of accurate and interpretable fuzzy rules from preliminary crisp representation

    This paper proposes a novel approach to building transparent knowledge-based systems by generating accurate and interpretable fuzzy rules. The learning mechanism reported here induces fuzzy rules using only predefined fuzzy labels that reflect prescribed notations and domain expertise, thereby ensuring transparency in the knowledge model adopted for problem solving. It works by mapping every coarsely learned crisp production rule in the knowledge base onto a set of potentially useful fuzzy rules, which serves as an initial step towards an intuitive technique for similarity-based rule generalisation. This is followed by a procedure that locally selects a compact subset of the emerging fuzzy rules, so that the resulting subset collectively generalises the underlying original crisp rule. The outcome of this local procedure forms the input to a global genetic search process, which seeks a trade-off between the accuracy and the complexity of the eventually induced fuzzy rule base while maintaining transparency. Systematic experimental results are provided to demonstrate that the induced fuzzy knowledge base achieves high performance and interpretability.
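    The mapping step, from a crisp condition onto predefined fuzzy labels, can be pictured as follows. The triangular memberships and the overlap threshold are illustrative assumptions; the paper's actual mapping criterion and the genetic search stage are not shown.

```python
def triangular(a, b, c):
    # Membership function of a triangular fuzzy label with peak at b.
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

def candidate_labels(labels, lo, hi, steps=100, threshold=0.5):
    # Fuzzy labels whose membership exceeds threshold somewhere in the
    # crisp interval [lo, hi] -- candidates to replace the crisp condition.
    xs = [lo + i * (hi - lo) / steps for i in range(steps + 1)]
    return [name for name, mu in labels.items()
            if max(mu(x) for x in xs) >= threshold]

labels = {"low": triangular(0.0, 0.25, 0.5),
          "medium": triangular(0.25, 0.5, 0.75),
          "high": triangular(0.5, 0.75, 1.0)}
print(candidate_labels(labels, 0.3, 0.6))  # ['low', 'medium']
```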