
    A Multi-Attribute Group Decision Approach Based on Rough Set Theory and Application in Supply Chain Partner Selection

    In multi-attribute group decision making, decision makers (DMs) are often only willing or able to provide incomplete information because of time pressure, lack of knowledge or data, and limited expertise in the problem domain. As a result, the alternative sets judged by different decision makers for the same decision problem may be inconsistent, and how to form consistent alternative sets becomes an important problem. A few studies have considered incomplete information in group settings, but few papers consider the adjustment of inconsistent alternative sets. We suggest a method that uses individual decision results to form consistent alternative sets based on Rough Set theory. The method proceeds as follows: (1) the decision matrix of each decision maker is transformed into a decision table through a new discretization algorithm for condition attributes; (2) the harmony of each DM's decision table is analysed in order to filter out extra alternatives, so that new alternative sets are formed; (3) if the new alternative sets of different DMs are still inconsistent, the learning quality of the DMs for any inconsistent alternative is used as the criterion for accepting that alternative.
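    A minimal sketch of step (1), with hypothetical names and equal-frequency binning standing in for the paper's own discretization algorithm for condition attributes:

```python
import numpy as np

def discretize_decision_matrix(matrix, n_bins=3):
    """Convert a DM's continuous decision matrix (alternatives x attributes)
    into a discrete decision table via equal-frequency binning.
    This binning is only a stand-in for the paper's discretization algorithm."""
    matrix = np.asarray(matrix, dtype=float)
    table = np.empty_like(matrix, dtype=int)
    for j in range(matrix.shape[1]):
        # cut points at the empirical quantiles of condition attribute j
        cuts = np.quantile(matrix[:, j], np.linspace(0, 1, n_bins + 1)[1:-1])
        table[:, j] = np.digitize(matrix[:, j], cuts)
    return table

# four alternatives scored by one DM on three condition attributes (toy data)
scores = [[0.72, 120, 3.1],
          [0.55,  98, 4.0],
          [0.91, 133, 2.2],
          [0.60, 101, 3.8]]
print(discretize_decision_matrix(scores))
```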

    Global discretization of continuous attributes as preprocessing for machine learning

    Real-life data are usually represented in databases by real numbers. On the other hand, most inductive learning methods require a small number of attribute values. Thus it is necessary to convert input data sets with continuous attributes into input data sets with discrete attributes. Methods of discretization restricted to single continuous attributes are called local, while methods that simultaneously convert all continuous attributes are called global. In this paper, a method of transforming any local discretization method into a global one is presented. A global discretization method based on cluster analysis is also presented and compared experimentally with three known local methods transformed into global ones. Experiments include tenfold cross-validation and leave-one-out methods on ten real-life data sets.
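    A hedged illustration of the local/global distinction (function names and the clustering choice are my own, not the paper's exact method): a local method discretizes one attribute in isolation, whereas a global method considers all continuous attributes jointly, here approximated by clustering whole records:

```python
import numpy as np
from sklearn.cluster import KMeans

def local_equal_width(column, n_bins=4):
    # local: discretize a single continuous attribute in isolation
    edges = np.linspace(column.min(), column.max(), n_bins + 1)[1:-1]
    return np.digitize(column, edges)

def global_cluster_discretize(data, n_clusters=4):
    # global: cluster the full multi-attribute records and use the cluster
    # label as a joint discrete value (the paper derives intervals from the
    # cluster structure; this is a simplification)
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(data)

data = np.random.rand(100, 3)   # 100 cases, 3 continuous attributes
local = np.column_stack([local_equal_width(data[:, j]) for j in range(3)])
global_labels = global_cluster_discretize(data)
print(local[:3], global_labels[:3])
```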

    Measuring the functional sequence complexity of proteins

    Background: Abel and Trevors have delineated three aspects of sequence complexity observed in biosequences such as proteins: Random Sequence Complexity (RSC), Ordered Sequence Complexity (OSC) and Functional Sequence Complexity (FSC). In this paper, we provide a method to measure functional sequence complexity. Methods and Results: We have extended Shannon uncertainty by incorporating the data variable with a functionality variable. The resulting measured unit, which we call the Functional bit (Fit), is calculated from the sequence data jointly with the defined functionality variable. To demonstrate the relevance to functional bioinformatics, a method to measure functional sequence complexity was developed and applied to 35 protein families. Considerations were made in determining how the measure can be used to correlate functionality when relating to the whole molecule and sub-molecule. In the experiment, we show that when the proposed measure is applied to the aligned protein sequences of ubiquitin, 6 of the 7 highest-value sites correlate with the binding domain. Conclusion: For future extensions, measures of functional bioinformatics may provide a means to evaluate potential evolving pathways from effects such as mutations, as well as to analyse the internal structural and functional relationships within the 3-D structure of proteins.
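    A rough illustration of the idea (my own simplification, not the authors' exact formulation): per-site functional information in an alignment can be taken as the drop from maximum Shannon uncertainty over residues to the observed uncertainty among functional sequences, summed over sites to give a value in "fits":

```python
import math
from collections import Counter

AMINO_ACIDS = 20  # size of the residue alphabet used for the null distribution

def site_fits(column):
    """Functional bits contributed by one aligned column:
    H(null) - H(functional), with a uniform null over 20 residues."""
    counts = Counter(column)
    total = sum(counts.values())
    h_functional = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return math.log2(AMINO_ACIDS) - h_functional

def total_fits(alignment):
    # alignment: list of equal-length sequences from one protein family
    return sum(site_fits(col) for col in zip(*alignment))

aligned = ["MKV", "MKV", "MRV", "MKL"]   # toy alignment
print(round(total_fits(aligned), 2))
```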

    Global Entropy Based Greedy Algorithm for discretization

    Discretization is a crucial step both for summarizing continuous attributes and for achieving better performance in classifiers that require discrete values as input. In this thesis, I propose a supervised discretization method, the Global Entropy Based Greedy algorithm, which is based on Information Entropy Minimization. Experimental results show that the proposed method outperforms state-of-the-art methods on well-known benchmark datasets. To further improve the proposed method, a new stopping criterion based on the rate of change of entropy was also explored. The experimental analysis indicates that a threshold based on the decreasing rate of entropy can be more effective for classification than a fixed number of intervals in classifiers such as C5.0.
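    A hedged sketch of the general idea of entropy-minimization greedy discretization with a change-rate stopping criterion (not the thesis's exact algorithm; shown for a single attribute, whereas a global variant would pick the best cut across all attributes at each step):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def partition_entropy(values, labels, cuts):
    # class entropy of the partition induced by a set of cut points
    bins = {}
    for x, y in zip(values, labels):
        bins.setdefault(sum(x > c for c in cuts), []).append(y)
    n = len(labels)
    return sum(len(ys) / n * entropy(ys) for ys in bins.values())

def greedy_cuts(values, labels, min_gain_rate=0.01):
    """Greedily add the cut that most reduces class entropy; stop when the
    relative entropy reduction (the 'change rate') falls below min_gain_rate."""
    cuts, current = [], entropy(labels)
    candidates = sorted(set(values))[:-1]
    while candidates:
        best = min(candidates, key=lambda c: partition_entropy(values, labels, cuts + [c]))
        new = partition_entropy(values, labels, cuts + [best])
        if current == 0 or (current - new) / current < min_gain_rate:
            break
        cuts.append(best)
        candidates.remove(best)
        current = new
    return sorted(cuts)

print(greedy_cuts([1.2, 3.4, 3.6, 7.8, 8.1], ["a", "a", "b", "b", "b"]))
```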

    Pattern Discovery and Disentanglement for Clinical Data Analysis

    In recent years, machine learning approaches have achieved important empirical successes in analysing data such as images, signals, texts and speech, with applications in biomedical and clinical areas. However, from the perspective of modelling, many machine learning methods still encounter crucial problems such as a lack of transparency and interpretability. Frequent Pattern Mining and Association Mining methods intend to solve the problem of interpretability, but they also encounter serious problems, such as requiring exhaustive search and producing overwhelming numbers of patterns. From the perspective of data analysis, they do not render high prediction accuracy, particularly for data with low volume, rare or imbalanced groups, rare cases, or biases due to subtle overlapping or entanglement of the statistical and functional associations at the data source level. Hence, Professor Andrew K.C. Wong and I have developed a novel Pattern Discovery and Disentanglement (PDD) method to discover explicit patterns and unveil knowledge from relational datasets, even those containing imbalanced groups, biases and anomalies. Statistically significant high-order patterns, pattern clusters and rare patterns are discovered in the disentangled Attribute Value Association (AVA) spaces. Such patterns may be embedded in a relational dataset yet overlap or be entangled with each other, so that they are masked or obscured at the data level. The patterns discovered from the disentangled association source can be used to explicitly interpret the original data, predict functional groups/classes, and detect anomalies and/or outliers. When class labels are not given, pattern/entity clusters can be discovered more effectively from the disentangled AVA space than from the original records. The objective of this Master's thesis is to develop and validate the efficacy of PDD for genomic and clinical data analysis using a) protein sequence data, b) public clinical records from the UCI repository, and c) a clinical dataset obtained from the School of Public Health and Health Systems at the University of Waterloo. The experimental results, with performance in unsupervised and supervised learning superior to existing methods, are presented in interpretable knowledge representation frameworks that interlink the disentangled AVA sources, patterns, pattern/entity clusters and individual entities. In the clinical cases, PDD reveals the symptomatic patterns of individual patients, disease complexes/groups and subtle etiological sources. Hence it will have impact on machine learning for genomic and clinical data with broad applications.
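    The abstract does not detail the algorithm, but to give a flavour of what an attribute value association score can look like, here is a standard adjusted-residual calculation on a toy contingency table; this is an illustrative stand-in for an AVA significance test, not the PDD method itself, and the records and attribute names are hypothetical:

```python
import math

def adjusted_residual(records, a, va, b, vb):
    """Standardized (adjusted) residual of observing attribute a == va
    together with attribute b == vb; |r| > 1.96 is conventionally significant."""
    n = len(records)
    n_a = sum(1 for r in records if r[a] == va)
    n_b = sum(1 for r in records if r[b] == vb)
    n_ab = sum(1 for r in records if r[a] == va and r[b] == vb)
    expected = n_a * n_b / n
    variance = expected * (1 - n_a / n) * (1 - n_b / n)
    return (n_ab - expected) / math.sqrt(variance) if variance > 0 else 0.0

records = [{"fever": "yes", "cough": "yes"}, {"fever": "yes", "cough": "yes"},
           {"fever": "no",  "cough": "no"},  {"fever": "no",  "cough": "yes"}]
print(round(adjusted_residual(records, "fever", "yes", "cough", "yes"), 2))
```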

    COOPERATIVE QUERY ANSWERING FOR APPROXIMATE ANSWERS WITH NEARNESS MEASURE IN HIERARCHICAL STRUCTURE INFORMATION SYSTEMS

    Cooperative query answering for approximate answers has been utilized in various problem domains. Many challenges in manufacturing information retrieval, such as classifying parts into families in group technology implementation, choosing the closest alternatives or substitutions for an out-of-stock part, or finding similar existing parts for rapid prototyping, could be alleviated using the concept of cooperative query answering. Most cooperative query answering techniques proposed by researchers so far concentrate on simple queries or single-table information retrieval. Query relaxations in searching for approximate answers are mostly limited to attribute value substitutions. Many hierarchical structure information systems, such as manufacturing information systems, store their data in multiple tables that are connected to each other using hierarchical relationships: "aggregation", "generalization/specialization", "classification", and "category". Due to the nature of hierarchical structure information systems, information retrieval in such domains usually involves nested or joined queries. In addition, searching for approximate answers in hierarchical structure databases must consider not only attribute value substitutions but also attribute or relation substitutions (e.g., WIDTH to DIAMETER, HOLE to GROOVE). For example, shape transformations of parts or features are possible and commonly practiced; a bar could be transformed into a rod. Given such characteristics of hierarchical information systems, the simple-query or single-relation query relaxation techniques used in most cooperative query answering systems are not adequate. In this research, we proposed techniques for neighbor knowledge construction and complex query relaxation. We enhanced the original Pattern-based Knowledge Induction (PKI) and Distribution Sensitive Clustering (DISC) methods so that they can be used for neighbor hierarchy construction at both the tuple and attribute levels. We developed a cooperative query answering model to facilitate approximate answer searching for complex queries. Our cooperative query answering model comprises algorithms for determining the causes of null answers, expanding the qualified tuple set, expanding the intersected tuple set, and relaxing multiple conditions simultaneously. To calculate the semantic nearness between exact-match answers and approximate answers, we also proposed a nearness measuring function, called "Block Nearness", that is appropriate for the query relaxation methods proposed in this research.
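    As a hedged illustration (hypothetical hierarchy, data and scores, not the thesis's PKI/DISC constructions or the Block Nearness formula): when an exact match returns a null answer, the query condition can be relaxed to sibling values under the same parent concept in a value hierarchy, with approximate answers ranked by a crude nearness score:

```python
# toy shape hierarchy: child value -> parent concept
HIERARCHY = {"bar": "prismatic", "rod": "prismatic", "tube": "prismatic",
             "disk": "rotational", "ring": "rotational"}

PARTS = [{"id": 1, "shape": "rod"}, {"id": 2, "shape": "tube"},
         {"id": 3, "shape": "disk"}]

def relaxed_query(parts, attr, value):
    exact = [p for p in parts if p[attr] == value]
    if exact:
        return [(p, 1.0) for p in exact]        # exact matches, nearness 1.0
    # null answer: relax the condition to sibling values under the same parent
    parent = HIERARCHY.get(value)
    siblings = {v for v, par in HIERARCHY.items() if par == parent and v != value}
    return [(p, 0.5) for p in parts if p[attr] in siblings]   # approximate answers

print(relaxed_query(PARTS, "shape", "bar"))     # no exact 'bar' -> rods and tubes
```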

    Rough Set Based Rule Evaluations and Their Applications

    Knowledge discovery is an important process in data analysis, data mining and machine learning. Typically, knowledge is presented in the form of rules. However, knowledge discovery systems often generate a huge number of rules, and one of the challenges we face is how to automatically discover interesting and meaningful knowledge from such rules. It is infeasible for human beings to select important and interesting rules manually. How to provide a measure for evaluating the quality of rules, in order to facilitate the understanding of data mining results, is therefore our focus. In this thesis, we present a series of rule evaluation techniques for the purpose of facilitating the knowledge understanding process. These evaluation techniques help not only to reduce the number of rules but also to extract higher-quality rules. Empirical studies on both artificial and real-world data sets demonstrate how such techniques can contribute to practical systems, such as those for medical diagnosis and web personalization. In the first part of this thesis, we discuss several rule evaluation techniques proposed for rule postprocessing. We show how properly defined rule templates can be used as a rule evaluation approach, and we propose two rough set based measures, a Rule Importance Measure and a Rules-As-Attributes Measure, to rank important and interesting rules. In the second part of this thesis, we show how data preprocessing can help with rule evaluation. Because well-preprocessed data is essential for generating important rules, we propose a new approach for processing missing attribute values to enhance the generated rules. In the third part of this thesis, a rough set based rule evaluation system is demonstrated to show the effectiveness of the measures proposed in this thesis. Furthermore, a new user-centric web personalization system is used as a case study to demonstrate how the proposed evaluation measures can be used in an actual application.
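    A minimal sketch of one way to read a rule importance score in the spirit of the Rule Importance Measure (my interpretation; the rule sets and names below are hypothetical): rules are generated once per reduct of the decision table, and a rule is ranked by the fraction of reduct rule sets that contain it:

```python
from collections import Counter

def rule_importance(rules_per_reduct):
    """rules_per_reduct: list of rule sets, one per reduct of the decision table.
    A rule appearing in every reduct's rule set receives importance 1.0."""
    counts = Counter(rule for rules in rules_per_reduct for rule in set(rules))
    n = len(rules_per_reduct)
    return {rule: c / n for rule, c in counts.items()}

reduct_rules = [{"temp=high -> flu=yes", "cough=yes -> flu=yes"},
                {"temp=high -> flu=yes"},
                {"temp=high -> flu=yes", "headache=no -> flu=no"}]
print(rule_importance(reduct_rules))
```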