13 research outputs found

    Dominance-based Rough Set Approach, basic ideas and main trends

    The Dominance-based Rough Set Approach (DRSA) has been proposed as a machine learning and knowledge discovery methodology to handle Multiple Criteria Decision Aiding (MCDA). Because it asks the decision maker (DM) only for simple preference information and supplies easily understandable and explainable recommendations, DRSA has gained much interest over the years and is now one of the most appreciated MCDA approaches. It has also been applied beyond the MCDA domain, as a general knowledge discovery and data mining methodology for the analysis of monotonic (and also non-monotonic) data. In this contribution, we recall the basic principles and main concepts of DRSA, with a general overview of its developments and software. We also present a historical reconstruction of the genesis of the methodology, with a specific focus on the contribution of Roman Słowiński. Comment: This research was partially supported by TAILOR, a project funded by the European Union (EU) Horizon 2020 research and innovation programme under GA No 952215. This submission is a preprint of a book chapter accepted by Springer, with very few minor differences of just technical nature.
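As a rough illustration of the core DRSA idea recalled in this abstract, the sketch below computes the lower approximation of an upward union of classes: an object belongs to it only if every object that dominates it is also assigned to class `t` or better. The function names, the data layout (tuples of gain-type criteria plus an integer class rank) and the sample objects are assumptions made for illustration; they are not taken from the chapter or from any DRSA software.

```python
def dominates(a, b):
    """True if a is at least as good as b on every (gain-type) criterion."""
    return all(ai >= bi for ai, bi in zip(a, b))


def lower_approx_upward(objects, t):
    """Indices of objects certainly belonging to 'class t or better'.

    objects: list of (criteria_tuple, class_rank), higher rank = better class.
    An object qualifies when all objects dominating it (itself included)
    have class_rank >= t.
    """
    result = []
    for i, (x, _) in enumerate(objects):
        dominating_classes = [cls for (y, cls) in objects if dominates(y, x)]
        if all(cls >= t for cls in dominating_classes):
            result.append(i)
    return result


# Hypothetical data: object 2 dominates object 1 but sits in a worse class,
# so object 1 cannot be placed in "class 2 or better" with certainty.
sample = [((3, 3), 2), ((2, 2), 2), ((2, 3), 1), ((1, 1), 1)]
certain_upward = lower_approx_upward(sample, 2)
```

Only the first object survives here, because the inconsistency between objects 1 and 2 removes object 1 from the certain region.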

    Topic identification using filtering and rule generation algorithm for textual document

    Information stored digitally in text documents is seldom arranged according to specific topics. Having to read whole documents is time-consuming and reduces interest in searching for information. Most existing topic identification methods depend on the occurrence of terms in the text. However, not all frequently occurring terms are relevant. The term extraction phase of topic identification methods can produce extracted terms with similar meanings, which is known as the synonymy problem. Filtering and rule generation algorithms are introduced in this study to identify topics in textual documents. The proposed filtering algorithm (PFA) extracts the most relevant terms from the text and resolves the synonymy problem among the extracted terms. The rule generation algorithm (TopId) is proposed to identify the topic of each verse based on the extracted terms. The PFA processes and filters each sentence based on nouns and predefined keywords to produce suitable terms for the topic. Rules are then generated from the extracted terms using the rule-based classifier. An experiment was performed on 224 English-translated Quran verses related to female issues. Topics identified by both TopId and the Rough Set technique were compared and later verified by experts. PFA extracted more relevant terms than other filtering techniques. TopId identified topics that are closer to the experts' topics, with an accuracy of 70%. The proposed algorithms were able to extract relevant terms without losing important terms and to identify the topic of each verse

    Decision rules construction : algorithm based on EAV model

    In the paper, an approach for decision rules construction is proposed. It is studied from the point of view of the supervised machine learning task, i.e., classification, and from the point of view of knowledge representation. The generated rules provide classification results comparable to those of the dynamic programming approach for optimization of decision rules relative to length or support. However, the proposed algorithm is based on transforming the decision table into the entity–attribute–value (EAV) format. Additionally, a standard deviation function computed over the average values of attributes in particular decision classes was introduced. It makes it possible to select, from the whole set of attributes, only those that provide the highest degree of information about the decision. Construction of decision rules is based on the idea of partitioning a decision table into corresponding subtables. In contrast to the dynamic programming approach, not all attributes need to be taken into account, only those with the highest values of standard deviation per decision class. Consequently, the proposed solution is more time efficient because of its lower computational complexity. In the experiments, the support and length of decision rules were computed and compared with the values for optimal rules. The classification error for data sets from the UCI Machine Learning Repository was also obtained and compared with that of the dynamic programming approach. The experiments show that the constructed rules are not far from the optimal ones and that classification results are comparable to those obtained with the dynamic programming extension
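The two ingredients named in this abstract can be sketched as follows: flattening a decision table into EAV tuples, then scoring each attribute by the standard deviation of its per-class average values. This is one plausible reading of the abstract, not the authors' implementation; the function names, the tuple layout and the toy table are assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def to_eav(rows, decision_key):
    """Flatten a decision table into (entity, attribute, value, decision) tuples."""
    eav = []
    for entity_id, row in enumerate(rows):
        decision = row[decision_key]
        for attribute, value in row.items():
            if attribute != decision_key:
                eav.append((entity_id, attribute, value, decision))
    return eav


def class_mean_spread(eav):
    """Per attribute: standard deviation of its mean value across decision classes.

    A larger spread suggests the attribute separates the classes better
    (assumed selection criterion, hedged reading of the abstract).
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for _, attribute, value, decision in eav:
        grouped[attribute][decision].append(value)
    return {
        attribute: pstdev([mean(values) for values in by_class.values()])
        for attribute, by_class in grouped.items()
    }


# Hypothetical two-row decision table with decision attribute "d".
table = [
    {"f1": 1, "f2": 0, "d": "yes"},
    {"f1": 0, "f2": 0, "d": "no"},
]
eav_rows = to_eav(table, "d")
spread = class_mean_spread(eav_rows)
```

In this toy table `f1` differs between the two classes while `f2` does not, so only `f1` would be kept for rule construction under the assumed criterion.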

    Decision rules derived from optimal decision trees with hypotheses

    Conventional decision trees use queries each of which is based on one attribute. In this study, we also examine decision trees that handle additional queries based on hypotheses. This kind of query is similar to the equivalence queries considered in exact learning. Earlier, we designed dynamic programming algorithms for the computation of the minimum depth and the minimum number of internal nodes in decision trees that have hypotheses. Modification of these algorithms considered in the present paper permits us to build decision trees with hypotheses that are optimal relative to the depth or relative to the number of the internal nodes. We compare the length and coverage of decision rules extracted from optimal decision trees with hypotheses and decision rules extracted from optimal conventional decision trees to choose the ones that are preferable as a tool for the representation of information. To this end, we conduct computer experiments on various decision tables from the UCI Machine Learning Repository. In addition, we also consider decision tables for randomly generated Boolean functions. The collected results show that the decision rules derived from decision trees with hypotheses in many cases are better than the rules extracted from conventional decision trees

    Machine-learned models using hematological inflammation markers in the prediction of short-term acute coronary syndrome outcomes.

    BACKGROUND: Increased systemic and local inflammation play a vital role in the pathophysiology of acute coronary syndrome. This study aimed to assess the usefulness of selected machine learning methods and hematological markers of inflammation in predicting short-term outcomes of acute coronary syndrome (ACS). METHODS: We analyzed the predictive importance of laboratory and clinical features in 6769 hospitalizations of patients with ACS. Two binary classifications were considered: significant coronary lesion (SCL) or lack of SCL, and in-hospital death or survival. SCL was observed in 73% of patients. In-hospital mortality was observed in 1.4% of patients and was higher among patients with SCL. Ensembles of decision trees and decision rule models were trained to predict these classifications. RESULTS: The best performing model for in-hospital mortality was based on the dominance-based rough set approach and the full set of laboratory and clinical features. This model achieved 81 ± 2.4% sensitivity and 81.1 ± 0.5% specificity in the detection of in-hospital mortality. The models trained for SCL performed considerably worse. The best performing model for detecting SCL achieved 56.9 ± 0.2% sensitivity and 66.9 ± 0.2% specificity. The dominance-based rough set approach classifier operating on the full set of clinical and laboratory features identified the presence or absence of diabetes, systolic and diastolic blood pressure, and prothrombin time as having the highest confirmation measures (best predictive value) in the detection of in-hospital mortality. With the limited set of variables, neutrophil count, age, systolic and diastolic pressure, and heart rate (taken at admission) achieved high feature importance scores (provided by the gradient boosted trees classifier) as well as positive confirmation measures (provided by the dominance-based rough set approach classifier).
CONCLUSIONS: Machine learned models can rely on the association between the elevated inflammatory markers and the short-term ACS outcomes to provide accurate predictions. Moreover, such models can help assess the usefulness of laboratory and clinical features in predicting the in-hospital mortality of ACS patients

    Combining DRSA decision-rules with FCA-based DANP evaluation for financial performance improvements

    This study proposes a combined method that integrates soft computing techniques and multiple criteria decision making (MCDM) methods to guide semiconductor companies in improving financial performance (FP) based on logical reasoning. The complex and imprecise patterns of FP changes are explored with the dominance-based rough set approach (DRSA) to find decision rules associated with FP changes. A company may identify its underperforming criterion (gap) and then conduct formal concept analysis (FCA), by means of implication rules, to explore the source criteria behind that gap. The source criteria are analysed with the decision making trial and evaluation laboratory (DEMATEL) technique to explore the cause-effect relationships among them and guide improvements; next, the DEMATEL-based analytical network process (DANP) provides the influential weights to form an evaluation model for selecting or ranking improvement plans. To illustrate the proposed method, the financial data of a real semiconductor company are used as an example to show the processes involved, from the identification of performance gaps to the selection among five assumed improvement plans. Moreover, the obtained implication rules can be integrated with DEMATEL analysis to explore directional influences among the critical criteria, which may provide rich insights and managerial implications in practice. First published online: 17 Sep 201

    Impact and key factors of introducing innovative mining equipment: the case of an underground mine

    Mining companies operate in a cyclical economic environment driven by market prices. Added to this is growing social pressure regarding working conditions and worker safety. Companies in this sector therefore face a highly competitive context. To remain competitive, one of their preferred solutions is to acquire innovative equipment. However, introducing innovative equipment does not go smoothly. Several studies have shown that the arrival of new, larger, more powerful and more sophisticated equipment has also had negative effects, among them adaptation periods that are longer than expected. Moreover, this equipment is also implicated in many accidents and deaths, both internationally and in Quebec mines. Given this mixed record, it is essential to better understand the success factors in implementing innovative mining equipment. In this thesis we propose an in-depth study of this subject through a case study conducted in an underground gold mine in the Abitibi-Témiscamingue region. First, our approach aims to measure the impact of ten innovative projects on performance indicators for productivity and occupational health and safety (OHS). Second, we propose using a decision support tool, the dominance-based rough set approach, to identify the key factors favouring the implementation of this innovative equipment. Among the results, two factors were identified as the most relevant across all the performance indicators studied: the skill level required to master the technology and the level of its acceptance by operators.
In addition to these two factors, seat quality and operator experience were also identified as relevant for explaining the OHS results, while the level of standardisation of the new equipment proved relevant for explaining the productivity results. Our work thus enables our industrial partner to target and prioritise its needs so that implementing innovative equipment henceforth leads to improved productivity and OHS performance. Although our results come from, and are limited to, a single case study, the innovative and rigorous approach we propose to the scientific and industrial community can be applied by any underground mining company wishing to identify its own success factors inherent to its own environment. Other limitations and perspectives offer potential research avenues, with which our thesis concludes. In this respect, we propose additional performance indicators, such as the number of tonnes hauled by trucks and the injury severity rate. In addition, a similar study that takes into account accidents affecting employees of mining contractors, as well as accidents occurring during repair or maintenance, would add complementary and interesting knowledge on the subject developed in this thesis

    The multiple pheromone Ant clustering algorithm

    Ant Colony Optimisation algorithms mimic the way ants use pheromones for marking paths to important locations. Pheromone traces are followed and reinforced by other ants, but also evaporate over time. As a consequence, optimal paths attract more pheromone, whilst the less useful paths fade away. In the Multiple Pheromone Ant Clustering Algorithm (MPACA), ants detect features of objects represented as nodes within graph space. Each node has one or more ants assigned to each feature. Ants attempt to locate nodes with matching feature values, depositing pheromone traces on the way. This use of multiple pheromone values is a key innovation. Ants record other ant encounters, keeping a record of the features and colony membership of ants. The recorded values determine when ants should combine their features to look for conjunctions and whether they should merge into colonies. This ability to detect and deposit pheromone representative of feature combinations, and the resulting colony formation, renders the algorithm a powerful clustering tool. The MPACA operates as follows: (i) initially each node has ants assigned to each feature; (ii) ants roam the graph space searching for nodes with matching features; (iii) when departing matching nodes, ants deposit pheromones to inform other ants that the path goes to a node with the associated feature values; (iv) ant feature encounters are counted each time an ant arrives at a node; (v) if the feature encounters exceed a threshold value, feature combination occurs; (vi) a similar mechanism is used for colony merging. The model varies from traditional ACO in that: (i) a modified pheromone-driven movement mechanism is used; (ii) ants learn feature combinations and deposit multiple pheromone scents accordingly; (iii) ants merge into colonies, the basis of cluster formation. The MPACA is evaluated over synthetic and real-world datasets and its performance compares favourably with alternative approaches
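Two mechanisms from the description above, per-feature pheromone deposit with evaporation and the threshold test that triggers feature combination, can be sketched in a few lines. This is a simplified illustration, not the MPACA implementation: the class name, the multiplicative evaporation rule, the default rate and the strict "exceeds" comparison are all assumptions.

```python
from collections import defaultdict


class PheromoneMap:
    """Keeps one pheromone level per (edge, feature) pair (assumed layout)."""

    def __init__(self, evaporation_rate=0.1):
        self.rate = evaporation_rate
        self.levels = defaultdict(float)

    def deposit(self, edge, feature, amount=1.0):
        # Step (iii): an ant leaving a matching node marks the path for that feature.
        self.levels[(edge, feature)] += amount

    def evaporate(self):
        # Traces fade over time unless other ants reinforce them.
        for key in list(self.levels):
            self.levels[key] *= 1.0 - self.rate


def should_combine(feature_encounters, threshold):
    """Step (v): combine features once encounters exceed the threshold."""
    return feature_encounters > threshold


# Hypothetical usage: two deposits on the same edge and feature, then one decay step.
trail = PheromoneMap(evaporation_rate=0.5)
trail.deposit(("n1", "n2"), "colour")
trail.deposit(("n1", "n2"), "colour")
trail.evaporate()
colour_level = trail.levels[(("n1", "n2"), "colour")]
```

Keying the levels by both edge and feature is what lets several "scents" coexist on one path, which is the multiple-pheromone idea the abstract highlights.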