
    Data clustering using proximity matrices with missing values

    In most applications of data clustering, the input data include vectors describing the location of each data point, from which distances between data points can be calculated and a proximity matrix constructed. In some applications, however, the only available input is the proximity matrix, that is, the distances between each pair of data points. Several clustering algorithms can still be applied, but if the proximity matrix has missing values, no standard method is directly applicable. Imputation can be used to replace missing values, but most imputation methods do not apply when only the proximity matrix is available. As a partial solution to fill this gap, we propose the Proximity Matrix Completion (PMC) algorithm. This algorithm assumes that data are missing for one of two reasons, complete dissimilarity or incomplete observations, and imputes values accordingly. To determine which case applies, the data are modelled as a graph and a set of maximum cliques in the graph is found. Overlap between cliques then determines the case, and hence the method of imputation, for each missing value. This approach is motivated by an application in plant breeding, where new experimental seed varieties must be clustered into sets of varieties that interact similarly with the environment, and this application is presented as a case study in the paper. The applicability, limitations and performance of the new algorithm versus other methods of imputation are further studied by applying it to datasets derived from three well-known test datasets.
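    The abstract outlines the clique-based logic of PMC without implementation details, so the following Python sketch only illustrates that idea under stated assumptions: the graph construction, the use of networkx maximal cliques, and the two imputation rules (averaging through shared clique members versus filling with the maximum observed proximity) are assumptions made for this example, not the paper's exact algorithm.

        # Illustrative sketch of the clique-based imputation idea described above.
        # The clique handling and the exact imputation formulas are assumptions,
        # not the published PMC algorithm.
        import numpy as np
        import networkx as nx

        def impute_proximity_matrix(D):
            """D: symmetric (n x n) proximity matrix with np.nan for missing entries."""
            n = D.shape[0]
            # Model the observed proximities as a graph: an edge means the distance is known.
            G = nx.Graph()
            G.add_nodes_from(range(n))
            for i in range(n):
                for j in range(i + 1, n):
                    if not np.isnan(D[i, j]):
                        G.add_edge(i, j)
            # Points that share a maximal clique are mutually observed.
            cliques = [set(c) for c in nx.find_cliques(G)]
            max_obs = np.nanmax(D)
            D_imp = D.copy()
            for i in range(n):
                for j in range(i + 1, n):
                    if np.isnan(D[i, j]):
                        members_i = set().union(*[c for c in cliques if i in c])
                        members_j = set().union(*[c for c in cliques if j in c])
                        shared = (members_i & members_j) - {i, j}
                        if shared:
                            # Overlapping cliques: treat the entry as an incomplete
                            # observation and impute it through shared neighbours.
                            D_imp[i, j] = np.mean([D[i, k] + D[k, j] for k in shared])
                        else:
                            # No overlap: assume complete dissimilarity and fill in
                            # the largest observed proximity.
                            D_imp[i, j] = max_obs
                        D_imp[j, i] = D_imp[i, j]
            return D_imp

    The completed matrix could then be passed to any proximity-based clustering method, such as hierarchical or spectral clustering.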

    Evidential reasoning for preprocessing uncertain categorical data for trustworthy decisions: An application on healthcare and finance

    The uncertainty introduced by discrepant data in AI-enabled decisions is a critical challenge in highly regulated domains such as healthcare and finance. Ambiguity and incompleteness due to missing values in output and input attributes, respectively, are ubiquitous in these domains. They can adversely affect groups of people who are underrepresented in the training data, even without any intention by the developer to discriminate. The inherently non-numerical nature of categorical attributes, compared with numerical attributes, and the presence of incomplete and ambiguous categorical attributes in a dataset increase the uncertainty in decision-making. This paper addresses the challenges of handling categorical attributes, which have not been addressed comprehensively in previous research. Three sources of uncertainty in categorical attributes are recognised in this research. Informational uncertainty, unforeseeable uncertainty in the decision-task environment, and uncertainty due to a lack of pre-modelling explainability in categorical attributes are addressed by the proposed maximum likelihood evidential reasoning (MAKER) methodology. It can transform and impute incomplete and ambiguous categorical attributes into interpretable numerical features. It uses notions of weight and reliability to capture, respectively, subjective expert preference over a piece of evidence and the quality of the evidence in a categorical attribute. The MAKER framework integrates the recognised uncertainties into the transformed input data, allowing a model to perceive data limitations during training and to acknowledge doubtful predictions, thereby supporting trustworthy pre-modelling and post-modelling explainability. The ability to handle uncertainty, and its impact on explainability, is demonstrated on real-world healthcare and finance data for different missing-data scenarios with three types of AI algorithms: deep-learning, tree-based, and rule-based models.
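    The abstract describes the MAKER transformation only at a high level, so the sketch below is a simplified, evidence-style encoding written to illustrate the general idea: class likelihoods per category play the role of evidence, a single reliability factor discounts that evidence, and missing or unseen categories contribute only "unknown" mass. The likelihood estimate, the discounting rule, and the chosen reliability value are assumptions for this example, not the paper's actual MAKER formulation.

        # Simplified, illustrative evidence-style encoding of one categorical attribute.
        # The weights, reliability, and combination rule are assumptions, not the
        # published MAKER framework.
        import numpy as np
        import pandas as pd

        def evidential_encode(values, labels, classes, reliability=0.9):
            """Map a categorical attribute to per-class belief degrees plus an
            'unknown' mass, using class likelihoods estimated from training data."""
            # Likelihood of each class given each observed category (co-occurrence counts).
            table = pd.crosstab(values, labels).reindex(columns=classes, fill_value=0)
            likelihood = table.div(table.sum(axis=1), axis=0)

            n_classes = len(classes)
            features = np.zeros((len(values), n_classes + 1))  # last column = unknown mass
            for row, v in enumerate(values):
                if pd.isna(v) or v not in likelihood.index:
                    # Missing or unseen category: all mass stays "unknown", so a
                    # downstream model can see the data limitation explicitly.
                    features[row, -1] = 1.0
                else:
                    # Discount the evidence by its reliability; the remainder is unknown.
                    features[row, :n_classes] = reliability * likelihood.loc[v].values
                    features[row, -1] = 1.0 - reliability
            return features

        # Toy usage with made-up values and labels:
        values = pd.Series(["low", "high", None, "medium", "high"])
        labels = pd.Series([0, 1, 1, 0, 1])
        print(evidential_encode(values, labels, classes=[0, 1]))

    The resulting numerical features retain an explicit uncertainty column, which is one way a downstream deep-learning, tree-based, or rule-based model could be made aware of missing or ambiguous inputs.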