
    Predictive User Modeling with Actionable Attributes

    Different machine learning techniques have been proposed and used for modeling individual and group user needs, interests and preferences. In traditional predictive modeling, instances are described by observable variables, called attributes, and the goal is to learn a model for predicting the target variable for unseen instances. For example, for marketing purposes a company may profile a new user based on her observed web browsing behavior, referral keywords or other relevant information. In many real-world applications the values of some attributes are not only observable, but can be actively set by a decision maker. Furthermore, in some of these applications the decision maker is interested not only in generating accurate predictions, but also in maximizing the probability of the desired outcome. For example, a direct marketing manager can choose which type of special offer to send to a client (an actionable attribute), hoping that the right choice will make a positive response more likely. We study how to learn to choose the value of an actionable attribute in order to maximize the probability of a desired outcome in predictive modeling. We emphasize that not all instances are equally sensitive to changes in actions: an accurate choice of action is critical for borderline instances (e.g. users who do not have a strong opinion one way or the other). We formulate three supervised learning approaches for learning to select the value of an actionable attribute at the instance level. We also introduce a focused training procedure which puts more emphasis on the situations where varying the action is most likely to take effect. A proof-of-concept experimental validation on two real-world case studies in the web analytics and e-learning domains highlights the potential of the proposed approaches.
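    As a minimal illustration (not the paper's three approaches or its focused training procedure), one simple way to exploit an actionable attribute is to fit a standard classifier on the observable attributes plus the action, and at decision time score every candidate action for the instance at hand, keeping the one with the highest predicted success probability. The sketch below assumes a numeric action code and hypothetical helper names (fit_outcome_model, choose_action):

```python
# Hedged sketch: choose the value of an actionable attribute that maximizes
# the predicted probability of the desired outcome. N_ACTIONS and the helper
# names are assumptions for illustration; y is a binary 0/1 outcome.
import numpy as np
from sklearn.linear_model import LogisticRegression

N_ACTIONS = 3  # e.g. three types of special offer


def fit_outcome_model(X_obs, actions, y):
    """Fit P(outcome | observables, action) on historical data."""
    X = np.column_stack([X_obs, actions])  # action appended as one feature
    return LogisticRegression().fit(X, y)


def choose_action(model, x_obs):
    """Score every candidate action for one instance; return the best one."""
    candidates = np.column_stack([
        np.tile(x_obs, (N_ACTIONS, 1)),       # same observables in each row
        np.arange(N_ACTIONS).reshape(-1, 1),  # one row per candidate action
    ])
    probs = model.predict_proba(candidates)[:, 1]
    return int(np.argmax(probs)), probs
```

    On a borderline instance the per-action probabilities in probs differ substantially, which is exactly where the paper argues an accurate choice of action matters most.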

    A comprehensive study of implicator-conjunctor based and noise-tolerant fuzzy rough sets: definitions, properties and robustness analysis

    Both rough and fuzzy set theories offer interesting tools for dealing with imperfect data: while the former allows us to work with uncertain and incomplete information, the latter provides a formal setting for vague concepts. The two theories are highly compatible, and since the late 1980s many researchers have studied their hybridization. In this paper, we critically evaluate the most relevant fuzzy rough set models proposed in the literature. To this end, we establish a formally correct and unified mathematical framework for them. Both implicator-conjunctor-based definitions and noise-tolerant models are studied. We evaluate these models on two different fronts: first, we discuss which properties of the original rough set model can be maintained, and second, we examine how robust they are against both class and attribute noise. By highlighting the benefits and drawbacks of the different fuzzy rough set models, this study is a necessary first step towards proposing and developing new models in future research. Lynn D'eer has been supported by the Ghent University Special Research Fund; Chris Cornelis was partially supported by the Spanish Ministry of Science and Technology under the project TIN2011-28488, the Andalusian Research Plans P11-TIC-7765 and P10-TIC-6858, and project PYR-2014-8 of the Genil Program of CEI BioTic GRANADA; and Lluis Godo has been partially supported by the Spanish MINECO project EdeTRI TIN2012-39348-C02-01.
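    For concreteness, a minimal sketch of the implicator-conjunctor-based approximations the paper surveys, instantiated with the Łukasiewicz connectives (one possible choice among many): the lower approximation of a fuzzy set A under a fuzzy relation R is (R↓A)(x) = inf_y I(R(x,y), A(y)), and the upper approximation is (R↑A)(x) = sup_y C(R(x,y), A(y)).

```python
# Hedged sketch of implicator-conjunctor-based fuzzy rough approximations.
# R is an n x n fuzzy relation matrix, A a length-n fuzzy membership vector;
# the Lukasiewicz implicator/t-norm pair is just one example instantiation.
import numpy as np


def lukasiewicz_implicator(a, b):
    return np.minimum(1.0, 1.0 - a + b)


def lukasiewicz_tnorm(a, b):
    return np.maximum(0.0, a + b - 1.0)


def lower_approximation(R, A, implicator=lukasiewicz_implicator):
    # (R down A)(x) = inf over y of I(R(x, y), A(y))
    return implicator(R, A[None, :]).min(axis=1)


def upper_approximation(R, A, conjunctor=lukasiewicz_tnorm):
    # (R up A)(x) = sup over y of C(R(x, y), A(y))
    return conjunctor(R, A[None, :]).max(axis=1)
```

    The noise-tolerant models studied in the paper typically soften the hard inf/sup aggregations above, which is what makes them more robust to class and attribute noise.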

    Anytime Subgroup Discovery in Numerical Domains with Guarantees

    Subgroup discovery is the task of discovering patterns that accurately discriminate one class label from the others. Existing approaches can uncover such patterns either through an exhaustive or an approximate exploration of the pattern search space. However, an exhaustive exploration is generally infeasible, whereas approximate approaches provide neither guarantees bounding the error on the best pattern quality nor guarantees on the exploration progress ("how far are we from an exhaustive search?"). We design here an algorithm for mining numerical data with three key properties w.r.t. the state of the art: (i) it progressively yields interval patterns whose quality improves over time; (ii) it can be interrupted anytime and always gives a guarantee bounding the error on the top pattern quality; and (iii) it always bounds the distance to an exhaustive exploration. After reporting experiments showing the effectiveness of our method, we discuss its generalization to other kinds of patterns.
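    The toy generator below illustrates only the anytime contract, progressively yielding better interval patterns while letting the caller stop at any point; it is not the paper's algorithm and offers none of its error or distance guarantees. Quality is measured here by weighted relative accuracy (WRAcc), a common subgroup discovery measure, and all names are illustrative.

```python
# Toy anytime hill-climber over interval patterns: each yielded pattern is
# strictly better than the previous one, and the caller can stop consuming
# the generator whenever its time budget runs out.
import numpy as np


def wracc(mask, y):
    """Weighted relative accuracy of the subgroup selected by `mask`."""
    if not mask.any():
        return 0.0
    return mask.mean() * (y[mask].mean() - y.mean())


def coverage(X, lo, hi):
    """Rows of X falling inside the box [lo, hi] on every attribute."""
    return ((X >= lo) & (X <= hi)).all(axis=1)


def anytime_intervals(X, y):
    lo, hi = X.min(axis=0).astype(float), X.max(axis=0).astype(float)
    best = wracc(coverage(X, lo, hi), y)
    yield best, lo.copy(), hi.copy()
    while True:
        move, move_q = None, best
        for j in range(X.shape[1]):  # try tightening each attribute's bounds
            inside = np.unique(X[(X[:, j] >= lo[j]) & (X[:, j] <= hi[j]), j])
            if len(inside) < 2:
                continue
            for new_lo, new_hi in ((inside[1], hi[j]), (lo[j], inside[-2])):
                lo2, hi2 = lo.copy(), hi.copy()
                lo2[j], hi2[j] = new_lo, new_hi
                q = wracc(coverage(X, lo2, hi2), y)
                if q > move_q:
                    move, move_q = (lo2, hi2), q
        if move is None:
            return  # local optimum: no single tightening improves quality
        (lo, hi), best = move, move_q
        yield best, lo.copy(), hi.copy()  # progressively better pattern
```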

    Logical analysis of data as a tool for the analysis of probabilistic discrete choice behavior

    Probabilistic Discrete Choice Models (PDCM) have been extensively used to interpret the behavior of heterogeneous decision makers that face discrete alternatives. The classification approach of Logical Analysis of Data (LAD) uses discrete optimization to generate patterns, which are logic formulas characterizing the different classes. Patterns can be seen as rules explaining the phenomenon under analysis. In this work we discuss how LAD can be used as the first phase of the specification of PDCM. Since the number of patterns generated in this task may be extremely large, and many of them may be nearly equivalent, additional processing is necessary to obtain practically meaningful information. Hence, we propose computationally viable techniques to obtain small sets of patterns that constitute meaningful representations of the phenomenon and make it possible to discover significant associations between subsets of explanatory variables and the output. We consider the complex socio-economic problem of the analysis of Internet utilization in Italy, using real data gathered by the Italian National Institute of Statistics.
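    To make the pattern vocabulary concrete, a hedged sketch: in LAD, observations are binarized, a pattern is a conjunction of literals over binary attributes, and a positive pattern covers at least one positive observation and no negative one. A greedy set-cover pass is one computationally viable way to retain a small subset of patterns that still covers the positive class; the paper's actual filtering techniques may differ, and all names here are illustrative.

```python
# Hedged sketch of LAD patterns on binarized data and a greedy selection of
# a small covering subset; the set-cover heuristic is an assumption for
# illustration, not the paper's exact procedure.
import numpy as np


def covers(pattern, X):
    """Boolean mask of rows of the binary matrix X satisfying every
    (column_index, required_value) literal of the pattern."""
    mask = np.ones(len(X), dtype=bool)
    for idx, val in pattern:
        mask &= (X[:, idx] == val)
    return mask


def is_positive_pattern(pattern, X, y):
    """A positive pattern covers some positive rows and no negative row."""
    c = covers(pattern, X)
    return c[y == 1].any() and not c[y == 0].any()


def greedy_pattern_selection(patterns, X, y):
    """Keep few patterns that together cover all positive observations."""
    uncovered = (y == 1).copy()
    chosen = []
    while uncovered.any():
        gains = [int(np.sum(covers(p, X) & uncovered)) for p in patterns]
        best = int(np.argmax(gains))
        if gains[best] == 0:
            break  # remaining positives cannot be covered by any pattern
        chosen.append(patterns[best])
        uncovered &= ~covers(patterns[best], X)
    return chosen
```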

    Effect of nano black rice husk ash on the chemical and physical properties of porous concrete pavement

    Black rice husk is a waste product of the rice industry. The major inorganic constituent of rice husk has been found to be silica. In this study, the effect of nano-sized black rice husk ash (BRHA) on the chemical and physical properties of concrete pavement was investigated. BRHA produced by uncontrolled burning at a rice factory was collected and then ground in a laboratory mill with steel balls and steel rods. Four different grinding grades of BRHA were examined. A rice husk ash dosage of 10% by weight of binder was used throughout the experiments. The chemical and physical properties of the nano BRHA mixtures were evaluated using a fineness test, X-ray fluorescence spectrometry (XRF) and X-ray diffraction (XRD). In addition, the compressive strength test was used to evaluate the performance of the porous concrete pavement. Overall, the results show that the optimum grinding time was 63 hours: BRHA ground for 63 hours produced concrete with good strength.

    Recent advances in the theory and practice of logical analysis of data

    Logical Analysis of Data (LAD) is a data analysis methodology introduced by Peter L. Hammer in 1986. LAD distinguishes itself from other classification and machine learning methods by the fact that it analyzes a significant subset of combinations of variables to describe the positive or negative nature of an observation, and it uses combinatorial techniques to extract models defined in terms of patterns. In recent years, the methodology has advanced considerably through numerous theoretical developments and practical applications. In the present paper, we review the methodology and its recent advances, describe novel applications in engineering, finance and health care as well as algorithmic techniques for some stochastic optimization problems, and provide a comparative description of LAD alongside well-known classification methods.

    Rough set approach for categorical data clustering

    A few techniques exist for rough-set clustering of categorical data, grouping objects that have similar characteristics. However, the performance of these techniques suffers from low accuracy, high computational complexity and poor cluster purity. This work proposes a new technique called Maximum Dependency Attributes (MDA) to address these issues. The proposed technique is based on rough set theory and takes into account the dependency between attributes of an information system. Its main contribution is a new way to classify objects from categorical datasets with better performance than the baseline techniques. The algorithm was implemented in MATLAB® version 7.6.0.324 (R2008a) and executed sequentially on an Intel Core 2 Duo processor with 1 GB of main memory running Windows XP Professional SP3. Results collected during experiments on four small datasets and thirteen UCI benchmark datasets for selecting a clustering attribute show that the proposed MDA technique is more efficient in terms of accuracy and computational complexity than the BC, TR and MMR techniques. For cluster purity, the results on the Soybean and Zoo datasets show that MDA improved purity by up to 17% and 9%, respectively. An experiment on supplier chain management clustering also demonstrates how MDA can contribute to a practical system, improving computational complexity and cluster purity by up to 90% and 23%, respectively.
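    A minimal sketch of the quantity such a technique maximizes may help: in rough set terms, the dependency degree of attribute d on attribute b is the fraction of objects whose b-equivalence class lies entirely within one d-equivalence class. The aggregation below (taking the maximum dependency over the other attributes) is an illustrative simplification of an MDA-style selection, not the paper's full algorithm.

```python
# Hedged sketch: rough-set dependency degree between two categorical
# attributes, and an MDA-style pick of a clustering attribute. The table is
# a list of dicts; the max-aggregation is an assumption for illustration.
from collections import defaultdict


def dependency_degree(table, b, d):
    """gamma_b(d): fraction of objects whose b-equivalence class is
    contained in a single d-equivalence class (the positive region)."""
    blocks = defaultdict(list)
    for row in table:
        blocks[row[b]].append(row[d])  # group d-values by b-value
    pos = sum(len(vals) for vals in blocks.values() if len(set(vals)) == 1)
    return pos / len(table)


def select_clustering_attribute(table, attributes):
    """Pick the attribute on which the remaining attributes depend most."""
    def score(a):
        return max(dependency_degree(table, a, d)
                   for d in attributes if d != a)
    return max(attributes, key=score)
```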