50,078 research outputs found

    Computing fuzzy rough approximations in large scale information systems

    Get PDF
    Rough set theory is a popular and powerful machine learning tool. It is especially suitable for dealing with information systems that exhibit inconsistencies, i.e. objects that have the same values for the conditional attributes but a different value for the decision attribute. In line with the emerging granular computing paradigm, rough set theory groups objects together based on the indiscernibility of their attribute values. Fuzzy rough set theory extends rough set theory to data with continuous attributes, and detects degrees of inconsistency in the data. Key to this is turning the indiscernibility relation into a gradual relation, acknowledging that objects can be similar to a certain extent. In very large datasets with millions of objects, computing the gradual indiscernibility relation (or in other words, the soft granules) is very demanding, both in terms of runtime and in terms of memory. It is however required for the computation of the lower and upper approximations of concepts in the fuzzy rough set analysis pipeline. Current non-distributed implementations in R are limited by memory capacity. For example, we found that a state of the art non-distributed implementation in R could not handle 30,000 rows and 10 attributes on a node with 62GB of memory. This is clearly insufficient to scale fuzzy rough set analysis to massive datasets. In this paper we present a parallel and distributed solution based on Message Passing Interface (MPI) to compute fuzzy rough approximations in very large information systems. Our results show that our parallel approach scales with problem size to information systems with millions of objects. To the best of our knowledge, no other parallel and distributed solutions have been proposed so far in the literature for this problem

    Rough set and rule-based multicriteria decision aiding

    Get PDF
    The aim of multicriteria decision aiding is to give the decision maker a recommendation concerning a set of objects evaluated from multiple points of view called criteria. Since a rational decision maker acts with respect to his/her value system, in order to recommend the most-preferred decision, one must identify decision maker's preferences. In this paper, we focus on preference discovery from data concerning some past decisions of the decision maker. We consider the preference model in the form of a set of "if..., then..." decision rules discovered from the data by inductive learning. To structure the data prior to induction of rules, we use the Dominance-based Rough Set Approach (DRSA). DRSA is a methodology for reasoning about data, which handles ordinal evaluations of objects on considered criteria and monotonic relationships between these evaluations and the decision. We review applications of DRSA to a large variety of multicriteria decision problems

    Interval-valued analysis for discriminative gene selection and tissue sample classification using microarray data

    Get PDF
    AbstractAn important application of gene expression data is to classify samples in a variety of diagnostic fields. However, high dimensionality and a small number of noisy samples pose significant challenges to existing classification methods. Focused on the problems of overfitting and sensitivity to noise of the dataset in the classification of microarray data, we propose an interval-valued analysis method based on a rough set technique to select discriminative genes and to use these genes to classify tissue samples of microarray data. We first select a small subset of genes based on interval-valued rough set by considering the preference-ordered domains of the gene expression data, and then classify test samples into certain classes with a term of similar degree. Experiments show that the proposed method is able to reach high prediction accuracies with a small number of selected genes and its performance is robust to noise

    Multiple Relevant Feature Ensemble Selection Based on Multilayer Co-Evolutionary Consensus MapReduce

    Full text link
    IEEE Although feature selection for large data has been intensively investigated in data mining, machine learning, and pattern recognition, the challenges are not just to invent new algorithms to handle noisy and uncertain large data in applications, but rather to link the multiple relevant feature sources, structured, or unstructured, to develop an effective feature reduction method. In this paper, we propose a multiple relevant feature ensemble selection (MRFES) algorithm based on multilayer co-evolutionary consensus MapReduce (MCCM). We construct an effective MCCM model to handle feature ensemble selection of large-scale datasets with multiple relevant feature sources, and explore the unified consistency aggregation between the local solutions and global dominance solutions achieved by the co-evolutionary memeplexes, which participate in the cooperative feature ensemble selection process. This model attempts to reach a mutual decision agreement among co-evolutionary memeplexes, which calls for the need for mechanisms to detect some noncooperative co-evolutionary behaviors and achieve better Nash equilibrium resolutions. Extensive experimental comparative studies substantiate the effectiveness of MRFES to solve large-scale dataset problems with the complex noise and multiple relevant feature sources on some well-known benchmark datasets. The algorithm can greatly facilitate the selection of relevant feature subsets coming from the original feature space with better accuracy, efficiency, and interpretability. Moreover, we apply MRFES to human cerebral cortex-based classification prediction. Such successful applications are expected to significantly scale up classification prediction for large-scale and complex brain data in terms of efficiency and feasibility

    How do you say ‘hello’? Personality impressions from brief novel voices

    Get PDF
    On hearing a novel voice, listeners readily form personality impressions of that speaker. Accurate or not, these impressions are known to affect subsequent interactions; yet the underlying psychological and acoustical bases remain poorly understood. Furthermore, hitherto studies have focussed on extended speech as opposed to analysing the instantaneous impressions we obtain from first experience. In this paper, through a mass online rating experiment, 320 participants rated 64 sub-second vocal utterances of the word ‘hello’ on one of 10 personality traits. We show that: (1) personality judgements of brief utterances from unfamiliar speakers are consistent across listeners; (2) a two-dimensional ‘social voice space’ with axes mapping Valence (Trust, Likeability) and Dominance, each driven by differing combinations of vocal acoustics, adequately summarises ratings in both male and female voices; and (3) a positive combination of Valence and Dominance results in increased perceived male vocal Attractiveness, whereas perceived female vocal Attractiveness is largely controlled by increasing Valence. Results are discussed in relation to the rapid evaluation of personality and, in turn, the intent of others, as being driven by survival mechanisms via approach or avoidance behaviours. These findings provide empirical bases for predicting personality impressions from acoustical analyses of short utterances and for generating desired personality impressions in artificial voices

    Research of the Classification Model Based on Dominance Rough Set Approach for China Emergency Communication

    Get PDF
    Ensuring smooth communication and recovering damaged communication system quickly and efficiently are the key to the entire emergency response, command, control, and rescue during the whole accident. The classification of emergency communication level is the premise of emergency communication guarantee. So, we use dominance rough set approach (DRSA) to construct the classification model for the judgment of emergency communication in this paper. In this model, we propose a classification index system of emergency communication using the method of expert interview firstly and then use DRSA to complete data sample, reduct attribute, and extract the preference decision rules of the emergency communication classification. Finally, the recognition accuracy of this model is verified; the testing result proves the model proposed in this paper is valid
    • …
    corecore