746 research outputs found

    Fuzzy Orderings for Fuzzy Gradual Patterns

    Full text link

    Techniques for clustering gene expression data

    Get PDF
    Many clustering techniques have been proposed for the analysis of gene expression data obtained from microarray experiments. However, choice of suitable method(s) for a given experimental dataset is not straightforward. Common approaches do not translate well and fail to take account of the data profile. This review paper surveys state of the art applications which recognises these limitations and implements procedures to overcome them. It provides a framework for the evaluation of clustering in gene expression analyses. The nature of microarray data is discussed briefly. Selected examples are presented for the clustering methods considered

    Unsupervised machine learning approach for building composite indicators with fuzzy metrics

    Full text link
    [EN] This study aims at developing a new methodological approach for building composite indicators, focusingon the weight schemes through an unsupervised machine learning technique. The composite indicatorproposed is based on fuzzy metrics to capture multidimensional concepts that do not have boundaries, suchas competitiveness, development, corruption or vulnerability. This methodology is designed for formativemeasurement models using a set of indicators measured on different scales (quantitative, ordinal and binary)and it is partially compensatory. Under a benchmarking approach, the single indicators are synthesized.The optimization method applied manages to remove the overlapping information provided for the singleindicators, so that the composite indicator provides a more realistic and faithful approximation to the conceptwhich would be studied. It has been quantitatively and qualitatively validated with a set of randomizeddatabases covering extreme and usual cases.This work was supported by the project FEDER-University of Granada (B-SEJ-242.UGR20), 2021-2023: An innovative methodological approach for measuring multidimensional poverty in Andalusia (COMPOSITE). Eduardo Jimenez-Fernandez would also like to thank the support received from Universitat Jaume I under the grant E-2018-03.Jiménez Fernández, E.; Sánchez, A.; Sánchez Pérez, EA. (2022). Unsupervised machine learning approach for building composite indicators with fuzzy metrics. Expert Systems with Applications. 200:1-11. https://doi.org/10.1016/j.eswa.2022.11692711120

    A method of classification for multisource data in remote sensing based on interval-valued probabilities

    Get PDF
    An axiomatic approach to intervalued (IV) probabilities is presented, where the IV probability is defined by a pair of set-theoretic functions which satisfy some pre-specified axioms. On the basis of this approach representation of statistical evidence and combination of multiple bodies of evidence are emphasized. Although IV probabilities provide an innovative means for the representation and combination of evidential information, they make the decision process rather complicated. It entails more intelligent strategies for making decisions. The development of decision rules over IV probabilities is discussed from the viewpoint of statistical pattern recognition. The proposed method, so called evidential reasoning method, is applied to the ground-cover classification of a multisource data set consisting of Multispectral Scanner (MSS) data, Synthetic Aperture Radar (SAR) data, and digital terrain data such as elevation, slope, and aspect. By treating the data sources separately, the method is able to capture both parametric and nonparametric information and to combine them. Then the method is applied to two separate cases of classifying multiband data obtained by a single sensor. In each case a set of multiple sources is obtained by dividing the dimensionally huge data into smaller and more manageable pieces based on the global statistical correlation information. By a divide-and-combine process, the method is able to utilize more features than the conventional maximum likelihood method

    On the edges of clustering

    Get PDF

    Method of Classification for Multisource Data in Remote Sensing Based on Interval-VaIued Probabilities

    Get PDF
    This work was supported by NASA Grant No. NAGW-925 “Earth Observation Research - Using Multistage EOS-Iike Data” (Principal lnvestigators: David A. Landgrebe and Chris Johannsen). The Anderson River SAR/MSS data set was acquired, preprocessed, and loaned to us by the Canada Centre for Remote Sensing, Department of Energy Mines, and Resources, of the Government of Canada. The importance of utilizing multisource data in ground-cover^ classification lies in the fact that improvements in classification accuracy can be achieved at the expense of additional independent features provided by separate sensors. However, it should be recognized that information and knowledge from most available data sources in the real world are neither certain nor complete. We refer to such a body of uncertain, incomplete, and sometimes inconsistent information as “evidential information.” The objective of this research is to develop a mathematical framework within which various applications can be made with multisource data in remote sensing and geographic information systems. The methodology described in this report has evolved from “evidential reasoning,” where each data source is considered as providing a body of evidence with a certain degree of belief. The degrees of belief based on the body of evidence are represented by “interval-valued (IV) probabilities” rather than by conventional point-valued probabilities so that uncertainty can be embedded in the measures. There are three fundamental problems in the muItisource data analysis based on IV probabilities: (1) how to represent bodies of evidence by IV probabilities, (2) how to combine IV probabilities to give an overall assessment of the combined body of evidence, and (3) how to make a decision when the statistical evidence is given by IV probabilities. This report first introduces an axiomatic approach to IV probabilities, where the IV probability is defined by a pair of set-theoretic functions which satisfy some pre-specified axioms. On the basis of this approach the report focuses on representation of statistical evidence by IV probabilities and combination of multiple bodies of evidence. Although IV probabilities provide an innovative means for the representation and combination of evidential information, they make the decision process rather complicated. It entails more intelligent strategies for making decisions. This report also focuses on the development of decision rules over IV probabilities from the viewpoint of statistical pattern recognition The proposed method, so called “evidential reasoning” method, is applied to the ground-cover classification of a multisource data set consisting of Multispectral Scanner (MSS) data* Synthetic Aperture Radar (SAR) data, and digital terrain data such as elevation, slope, and aspect. By treating the data sources separately, the method is able to capture both parametric and nonparametric information and to combine them. Then the method is applied to two separate cases of classifying multiband data obtained by a single sensor, in each case, a set of multiple sources is obtained by dividing the dimensionally huge data into smaller and more manageable pieces based on the global statistical correlation information. By a Divide-and-Combine process, the method is able to utilize more features than the conventional Maximum Likelihood method

    構造化データに対する予測手法:グラフ,順序,時系列

    Get PDF
    京都大学新制・課程博士博士(情報学)甲第23439号情博第769号新制||情||131(附属図書館)京都大学大学院情報学研究科知能情報学専攻(主査)教授 鹿島 久嗣, 教授 山本 章博, 教授 阿久津 達也学位規則第4条第1項該当Doctor of InformaticsKyoto UniversityDFA

    Ensemble methods for ranking data with and without position weights

    Get PDF
    The main goal of this Thesis is to build suitable Ensemble Methods for ranking data with weights assigned to the items’positions, in the cases of rankings with and without ties. The Thesis begins with the definition of a new rank correlation coefficient, able to take into account the importance of items’position. Inspired by the rank correlation coefficient, τ x , proposed by Emond and Mason (2002) for unweighted rankings and the weighted Kemeny distance proposed by García-Lapresta and Pérez-Román (2010), this work proposes τ x w , a new rank correlation coefficient corresponding to the weighted Kemeny distance. The new coefficient is analized analitically and empirically and represents the main core of the consensus ranking process. Simulations and applications to real cases are presented. In a second step, in order to detect which predictors better explain a phenomenon, the Thesis proposes decision trees for ranking data with and without weights, discussing and comparing the results. A simulation study is built up, showing the impact of different structures of weights on the ability of decision trees to describe data. In the third part, ensemble methods for ranking data, more specifically Bagging and Boosting, are introduced. Last but not least, a review on a different topic is inserted in this Thesis. The review compares a significant number of linear mixed model selection procedures available in the literature. The review represents the answer to a pressing issue in the framework of LMMs: how to identify the best approach to adopt in a specific case. The work outlines mainly all approaches found in literature. This review represents my first academic training in making research.The main goal of this Thesis is to build suitable Ensemble Methods for ranking data with weights assigned to the items’positions, in the cases of rankings with and without ties. The Thesis begins with the definition of a new rank correlation coefficient, able to take into account the importance of items’position. Inspired by the rank correlation coefficient, τ x , proposed by Emond and Mason (2002) for unweighted rankings and the weighted Kemeny distance proposed by García-Lapresta and Pérez-Román (2010), this work proposes τ x w , a new rank correlation coefficient corresponding to the weighted Kemeny distance. The new coefficient is analized analitically and empirically and represents the main core of the consensus ranking process. Simulations and applications to real cases are presented. In a second step, in order to detect which predictors better explain a phenomenon, the Thesis proposes decision trees for ranking data with and without weights, discussing and comparing the results. A simulation study is built up, showing the impact of different structures of weights on the ability of decision trees to describe data. In the third part, ensemble methods for ranking data, more specifically Bagging and Boosting, are introduced. Last but not least, a review on a different topic is inserted in this Thesis. The review compares a significant number of linear mixed model selection procedures available in the literature. The review represents the answer to a pressing issue in the framework of LMMs: how to identify the best approach to adopt in a specific case. The work outlines mainly all approaches found in literature. This review represents my first academic training in making research
    corecore