
    A Multi-Attribute Group Decision Approach Based on Rough Set Theory and Application in Supply Chain Partner Selection

    In multi-attribute group decision making, decision makers (DMs) are often only willing or able to provide incomplete information because of time pressure, lack of knowledge or data, and limited expertise in the problem domain. As a result, the alternative sets judged by different decision makers for the same decision problem may be inconsistent, and how to form consistent alternative sets becomes an important problem. A few studies have considered incomplete information in group settings, but few papers consider the adjustment of inconsistent alternative sets. We suggest a method that uses individual decision results to form consistent alternative sets based on Rough Set theory. The method proceeds as follows: (1) the decision matrix of each decision maker is transformed into a decision table through a new discretization algorithm for condition attributes; (2) the harmony of each DM's decision table is analysed in order to filter out extra alternatives, so that new alternative sets are formed; (3) if the new alternative sets of different DMs are still inconsistent, the learning quality of the DMs for any inconsistent alternative is used as the criterion for accepting that alternative.
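    A minimal sketch of step (1), with hypothetical names and equal-frequency binning standing in for the paper's own discretization algorithm for condition attributes:

```python
import numpy as np

def discretize_decision_matrix(matrix, n_bins=3):
    """Convert a DM's continuous decision matrix (alternatives x attributes)
    into a discrete decision table via equal-frequency binning.
    This binning is only a stand-in for the paper's discretization algorithm."""
    matrix = np.asarray(matrix, dtype=float)
    table = np.empty_like(matrix, dtype=int)
    for j in range(matrix.shape[1]):
        # cut points at the empirical quantiles of condition attribute j
        cuts = np.quantile(matrix[:, j], np.linspace(0, 1, n_bins + 1)[1:-1])
        table[:, j] = np.digitize(matrix[:, j], cuts)
    return table

# four alternatives scored by one DM on three condition attributes (toy data)
scores = [[0.72, 120, 3.1],
          [0.55,  98, 4.0],
          [0.91, 133, 2.2],
          [0.60, 101, 3.8]]
print(discretize_decision_matrix(scores))
```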

    Global discretization of continuous attributes as preprocessing for machine learning

    Real-life data are usually represented in databases by real numbers. On the other hand, most inductive learning methods require a small number of attribute values. Thus it is necessary to convert input data sets with continuous attributes into input data sets with discrete attributes. Methods of discretization restricted to single continuous attributes are called local, while methods that simultaneously convert all continuous attributes are called global. In this paper, a method of transforming any local discretization method into a global one is presented. A global discretization method based on cluster analysis is also presented and compared experimentally with three known local methods transformed into global ones. Experiments include tenfold cross-validation and leave-one-out methods on ten real-life data sets.
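    A hedged illustration of the local/global distinction (function names and the clustering choice are my own, not the paper's exact method): a local method discretizes one attribute in isolation, whereas a global method considers all continuous attributes jointly, here approximated by clustering whole records:

```python
import numpy as np
from sklearn.cluster import KMeans

def local_equal_width(column, n_bins=4):
    # local: discretize a single continuous attribute in isolation
    edges = np.linspace(column.min(), column.max(), n_bins + 1)[1:-1]
    return np.digitize(column, edges)

def global_cluster_discretize(data, n_clusters=4):
    # global: cluster the full multi-attribute records and use the cluster
    # label as a joint discrete value (the paper derives intervals from the
    # cluster structure; this is a simplification)
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(data)

data = np.random.rand(100, 3)   # 100 cases, 3 continuous attributes
local = np.column_stack([local_equal_width(data[:, j]) for j in range(3)])
global_labels = global_cluster_discretize(data)
print(local[:3], global_labels[:3])
```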

    Measuring the functional sequence complexity of proteins

    Background: Abel and Trevors have delineated three aspects of sequence complexity observed in biosequences such as proteins: Random Sequence Complexity (RSC), Ordered Sequence Complexity (OSC) and Functional Sequence Complexity (FSC). In this paper, we provide a method to measure functional sequence complexity. Methods and Results: We have extended Shannon uncertainty by incorporating the data variable with a functionality variable. The resulting measured unit, which we call the Functional bit (Fit), is calculated from the sequence data jointly with the defined functionality variable. To demonstrate the relevance to functional bioinformatics, a method to measure functional sequence complexity was developed and applied to 35 protein families. Considerations were made in determining how the measure can be used to correlate functionality when relating to the whole molecule and sub-molecule. In the experiment, we show that when the proposed measure is applied to the aligned protein sequences of ubiquitin, 6 of the 7 highest-value sites correlate with the binding domain. Conclusion: For future extensions, measures of functional bioinformatics may provide a means to evaluate potential evolving pathways from effects such as mutations, as well as to analyse the internal structural and functional relationships within the 3-D structure of proteins.
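    A rough illustration of the idea (my own simplification, not the authors' exact formulation): per-site functional information in an alignment can be taken as the drop from maximum Shannon uncertainty over residues to the observed uncertainty among functional sequences, summed over sites to give a value in "fits":

```python
import math
from collections import Counter

AMINO_ACIDS = 20  # size of the residue alphabet used for the null distribution

def site_fits(column):
    """Functional bits contributed by one aligned column:
    H(null) - H(functional), with a uniform null over 20 residues."""
    counts = Counter(column)
    total = sum(counts.values())
    h_functional = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return math.log2(AMINO_ACIDS) - h_functional

def total_fits(alignment):
    # alignment: list of equal-length sequences from one protein family
    return sum(site_fits(col) for col in zip(*alignment))

aligned = ["MKV", "MKV", "MRV", "MKL"]   # toy alignment
print(round(total_fits(aligned), 2))
```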

    Global Entropy Based Greedy Algorithm for discretization

    Discretization is a crucial step both for summarizing continuous attributes and for achieving better performance in classifiers that require discrete values as input. In this thesis, I propose a supervised discretization method, the Global Entropy Based Greedy algorithm, which is based on Information Entropy Minimization. Experimental results show that the proposed method outperforms state-of-the-art methods on well-known benchmark datasets. To further improve the proposed method, a new stopping criterion based on the rate of change of entropy was also explored. The experimental analysis indicates that a threshold based on the decreasing rate of entropy can be more effective for classification than a fixed number of intervals in classifiers such as C5.0.
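    A hedged sketch of the general idea of entropy-minimization greedy discretization with a change-rate stopping criterion (not the thesis's exact algorithm; shown for a single attribute, whereas a global variant would pick the best cut across all attributes at each step):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def partition_entropy(values, labels, cuts):
    # class entropy of the partition induced by a set of cut points
    bins = {}
    for x, y in zip(values, labels):
        bins.setdefault(sum(x > c for c in cuts), []).append(y)
    n = len(labels)
    return sum(len(ys) / n * entropy(ys) for ys in bins.values())

def greedy_cuts(values, labels, min_gain_rate=0.01):
    """Greedily add the cut that most reduces class entropy; stop when the
    relative entropy reduction (the 'change rate') falls below min_gain_rate."""
    cuts, current = [], entropy(labels)
    candidates = sorted(set(values))[:-1]
    while candidates:
        best = min(candidates, key=lambda c: partition_entropy(values, labels, cuts + [c]))
        new = partition_entropy(values, labels, cuts + [best])
        if current == 0 or (current - new) / current < min_gain_rate:
            break
        cuts.append(best)
        candidates.remove(best)
        current = new
    return sorted(cuts)

print(greedy_cuts([1.2, 3.4, 3.6, 7.8, 8.1], ["a", "a", "b", "b", "b"]))
```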

    Pattern Discovery and Disentanglement for Clinical Data Analysis

    In recent years, machine learning approaches have achieved important empirical successes in analysing data such as images, signals, texts and speech, with applications in biomedical and clinical areas. However, from the perspective of modelling, many machine learning methods still encounter crucial problems such as a lack of transparency and interpretability. Frequent Pattern Mining and Association Mining methods intend to solve the problem of interpretability, but they also encounter serious problems, such as requiring exhaustive search and producing overwhelming numbers of patterns. From the perspective of data analysis, they do not render high prediction accuracy, particularly for data with low volume, rare or imbalanced groups, rare cases, or biases due to subtle overlapping or entanglement of the statistical and functional associations at the data source level. Hence, Professor Andrew K.C. Wong and I have developed a novel Pattern Discovery and Disentanglement (PDD) method to discover explicit patterns and unveil knowledge from relational datasets, even those containing imbalanced groups, biases and anomalies. Statistically significant high-order patterns, pattern clusters and rare patterns are discovered in the disentangled Attribute Value Association (AVA) spaces. Such patterns may be embedded in a relational dataset yet overlap or be entangled with each other, so that they are masked or obscured at the data level. The patterns discovered from the disentangled association source can be used to explicitly interpret the original data, predict functional groups/classes, and detect anomalies and/or outliers. When class labels are not given, pattern/entity clusters can be discovered more effectively from the disentangled AVA space than from the original records. The objective of this Master's thesis is to develop and validate the efficacy of PDD for genomic and clinical data analysis using a) protein sequence data, b) public clinical records from the UCI repository, and c) a clinical dataset obtained from the School of Public Health and Health Systems at the University of Waterloo. The experimental results, with performance in unsupervised and supervised learning superior to existing methods, are presented in interpretable knowledge representation frameworks that interlink the disentangled AVA sources, patterns, pattern/entity clusters and individual entities. In the clinical cases, PDD reveals the symptomatic patterns of individual patients, disease complexes/groups and subtle etiological sources. Hence it will have impact on machine learning for genomic and clinical data with broad applications.
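    The abstract does not detail the algorithm, but to give a flavour of what an attribute value association score can look like, here is a standard adjusted-residual calculation on a toy contingency table; this is an illustrative stand-in for an AVA significance test, not the PDD method itself, and the records and attribute names are hypothetical:

```python
import math

def adjusted_residual(records, a, va, b, vb):
    """Standardized (adjusted) residual of observing attribute a == va
    together with attribute b == vb; |r| > 1.96 is conventionally significant."""
    n = len(records)
    n_a = sum(1 for r in records if r[a] == va)
    n_b = sum(1 for r in records if r[b] == vb)
    n_ab = sum(1 for r in records if r[a] == va and r[b] == vb)
    expected = n_a * n_b / n
    variance = expected * (1 - n_a / n) * (1 - n_b / n)
    return (n_ab - expected) / math.sqrt(variance) if variance > 0 else 0.0

records = [{"fever": "yes", "cough": "yes"}, {"fever": "yes", "cough": "yes"},
           {"fever": "no",  "cough": "no"},  {"fever": "no",  "cough": "yes"}]
print(round(adjusted_residual(records, "fever", "yes", "cough", "yes"), 2))
```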

    COOPERATIVE QUERY ANSWERING FOR APPROXIMATE ANSWERS WITH NEARNESS MEASURE IN HIERARCHICAL STRUCTURE INFORMATION SYSTEMS

    Cooperative query answering for approximate answers has been utilized in various problem domains. Many challenges in manufacturing information retrieval, such as classifying parts into families in group technology implementation, choosing the closest alternatives or substitutions for an out-of-stock part, or finding similar existing parts for rapid prototyping, could be alleviated using the concept of cooperative query answering. Most cooperative query answering techniques proposed by researchers so far concentrate on simple queries or single-table information retrieval. Query relaxations in searching for approximate answers are mostly limited to attribute value substitutions. Many hierarchical structure information systems, such as manufacturing information systems, store their data in multiple tables that are connected to each other using hierarchical relationships: "aggregation", "generalization/specialization", "classification", and "category". Due to the nature of hierarchical structure information systems, information retrieval in such domains usually involves nested or joined queries. In addition, searching for approximate answers in hierarchical structure databases must consider not only attribute value substitutions but also attribute or relation substitutions (e.g., WIDTH to DIAMETER, HOLE to GROOVE). For example, shape transformations of parts or features are possible and commonly practiced; a bar could be transformed into a rod. Given such characteristics of hierarchical information systems, the simple-query or single-relation query relaxation techniques used in most cooperative query answering systems are not adequate. In this research, we proposed techniques for neighbor knowledge construction and complex query relaxation. We enhanced the original Pattern-based Knowledge Induction (PKI) and Distribution Sensitive Clustering (DISC) methods so that they can be used for neighbor hierarchy construction at both the tuple and attribute levels. We developed a cooperative query answering model to facilitate approximate answer searching for complex queries. Our cooperative query answering model comprises algorithms for determining the causes of null answers, expanding the qualified tuple set, expanding the intersected tuple set, and relaxing multiple conditions simultaneously. To calculate the semantic nearness between exact-match answers and approximate answers, we also proposed a nearness measuring function, called "Block Nearness", that is appropriate for the query relaxation methods proposed in this research.
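    As a hedged illustration (hypothetical hierarchy, data and scores, not the thesis's PKI/DISC constructions or the Block Nearness formula): when an exact match returns a null answer, the query condition can be relaxed to sibling values under the same parent concept in a value hierarchy, with approximate answers ranked by a crude nearness score:

```python
# toy shape hierarchy: child value -> parent concept
HIERARCHY = {"bar": "prismatic", "rod": "prismatic", "tube": "prismatic",
             "disk": "rotational", "ring": "rotational"}

PARTS = [{"id": 1, "shape": "rod"}, {"id": 2, "shape": "tube"},
         {"id": 3, "shape": "disk"}]

def relaxed_query(parts, attr, value):
    exact = [p for p in parts if p[attr] == value]
    if exact:
        return [(p, 1.0) for p in exact]        # exact matches, nearness 1.0
    # null answer: relax the condition to sibling values under the same parent
    parent = HIERARCHY.get(value)
    siblings = {v for v, par in HIERARCHY.items() if par == parent and v != value}
    return [(p, 0.5) for p in parts if p[attr] in siblings]   # approximate answers

print(relaxed_query(PARTS, "shape", "bar"))     # no exact 'bar' -> rods and tubes
```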

    Rough Set Based Rule Evaluations and Their Applications

    Knowledge discovery is an important process in data analysis, data mining and machine learning. Typically, knowledge is presented in the form of rules. However, knowledge discovery systems often generate a huge number of rules, and one of the challenges we face is how to automatically discover interesting and meaningful knowledge from such rules. It is infeasible for human beings to select important and interesting rules manually. How to provide a measure for evaluating the quality of rules, in order to facilitate the understanding of data mining results, is therefore our focus. In this thesis, we present a series of rule evaluation techniques for the purpose of facilitating the knowledge understanding process. These evaluation techniques help not only to reduce the number of rules but also to extract higher-quality rules. Empirical studies on both artificial and real-world data sets demonstrate how such techniques can contribute to practical systems, such as those for medical diagnosis and web personalization. In the first part of this thesis, we discuss several rule evaluation techniques proposed for rule postprocessing. We show how properly defined rule templates can be used as a rule evaluation approach, and we propose two rough set based measures, a Rule Importance Measure and a Rules-As-Attributes Measure, to rank important and interesting rules. In the second part of this thesis, we show how data preprocessing can help with rule evaluation. Because well-preprocessed data is essential for generating important rules, we propose a new approach for processing missing attribute values to enhance the generated rules. In the third part of this thesis, a rough set based rule evaluation system is demonstrated to show the effectiveness of the measures proposed in this thesis. Furthermore, a new user-centric web personalization system is used as a case study to demonstrate how the proposed evaluation measures can be used in an actual application.
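    A minimal sketch of one way to read a rule importance score in the spirit of the Rule Importance Measure (my interpretation; the rule sets and names below are hypothetical): rules are generated once per reduct of the decision table, and a rule is ranked by the fraction of reduct rule sets that contain it:

```python
from collections import Counter

def rule_importance(rules_per_reduct):
    """rules_per_reduct: list of rule sets, one per reduct of the decision table.
    A rule appearing in every reduct's rule set receives importance 1.0."""
    counts = Counter(rule for rules in rules_per_reduct for rule in set(rules))
    n = len(rules_per_reduct)
    return {rule: c / n for rule, c in counts.items()}

reduct_rules = [{"temp=high -> flu=yes", "cough=yes -> flu=yes"},
                {"temp=high -> flu=yes"},
                {"temp=high -> flu=yes", "headache=no -> flu=no"}]
print(rule_importance(reduct_rules))
```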