    Rough Set Approaches to the Problem of Supplier Selection

    This paper applies rough set theory, a data mining approach, to the multi-criteria problem of supplier evaluation and selection, with the aim of revealing the decision rules hidden in historical evaluation data. After introducing some basic notions of rough set theory, the paper works through an example to show each step of the deduction process in detail. It derives satisfactory supplier selection rules and weights for the attribute indexes, and compares them with the results of other methods. The results illustrate that rough set theory can be applied to supplier selection and can solve the problem efficiently.
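
    The basic notions mentioned in this abstract can be illustrated with a minimal sketch of lower and upper approximations. The supplier records, the attribute names (`quality`, `price`, `selected`) and the helper functions are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into indiscernibility classes over the given attributes."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())

def approximations(rows, attrs, target):
    """Return the (lower, upper) approximations of the index set `target`."""
    lower, upper = set(), set()
    for block in partition(rows, attrs):
        if block <= target:   # block lies entirely inside the target set
            lower |= block
        if block & target:    # block overlaps the target set
            upper |= block
    return lower, upper

# Hypothetical supplier records, invented for illustration only.
suppliers = [
    {"quality": "high", "price": "low",  "selected": "yes"},
    {"quality": "high", "price": "low",  "selected": "yes"},
    {"quality": "high", "price": "high", "selected": "yes"},
    {"quality": "high", "price": "high", "selected": "no"},
    {"quality": "low",  "price": "low",  "selected": "no"},
]
chosen = {i for i, r in enumerate(suppliers) if r["selected"] == "yes"}
lower, upper = approximations(suppliers, ["quality", "price"], chosen)
print(sorted(lower), sorted(upper))
```

    Rows in the lower approximation support certain rules (here, high quality and low price imply selection), while rows that appear only in the upper approximation support possible rules at best.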

    Active Sample Selection Based Incremental Algorithm for Attribute Reduction with Rough Sets

    Attribute reduction with rough sets is an effective technique for obtaining a compact and informative attribute set from a given dataset. However, traditional algorithms have no explicit provision for handling dynamic datasets where data present themselves in successive samples. Incremental algorithms for attribute reduction with rough sets have been recently introduced to handle dynamic datasets with large samples, though they have high complexity in time and space. To address the time/space complexity issue of the algorithms, this paper presents a novel incremental algorithm for attribute reduction with rough sets based on the adoption of an active sample selection process and an insight into the attribute reduction process. This algorithm first decides whether each incoming sample is useful with respect to the current dataset by the active sample selection process. A useless sample is discarded while a useful sample is selected to update a reduct. At the arrival of a useful sample, the attribute reduction process is then employed to guide how to add and/or delete attributes in the current reduct. The two processes thus constitute the theoretical framework of our algorithm. The proposed algorithm is finally experimentally shown to be efficient in time and space.
    This is a manuscript of the publication Yang, Yanyan, Degang Chen, and Hui Wang. "Active Sample Selection Based Incremental Algorithm for Attribute Reduction With Rough Sets." IEEE Transactions on Fuzzy Systems 25, no. 4 (2017): 825-838. DOI: 10.1109/TFUZZ.2016.2581186. Posted with permission.
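
    For orientation, the following is a minimal sketch of classical batch attribute reduction, the non-incremental baseline that this line of work builds on. It is not the incremental algorithm of the paper. The greedy forward search, the function names and the toy table are assumptions, and a greedy search may return a superset of a true reduct.

```python
from collections import defaultdict

def dependency(rows, attrs, decision):
    """Fraction of rows whose indiscernibility class has a single decision value."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[a] for a in attrs)].append(row[decision])
    consistent = sum(len(v) for v in classes.values() if len(set(v)) == 1)
    return consistent / len(rows)

def greedy_reduct(rows, attrs, decision):
    """Add attributes greedily until the dependency of the full set is reached."""
    target = dependency(rows, attrs, decision)
    reduct = []
    while dependency(rows, reduct, decision) < target:
        best = max((a for a in attrs if a not in reduct),
                   key=lambda a: dependency(rows, reduct + [a], decision))
        reduct.append(best)
    return reduct

# Hypothetical decision table: the decision "d" is determined by "b" alone.
rows = [
    {"a": 0, "b": 0, "c": 0, "d": "no"},
    {"a": 0, "b": 1, "c": 0, "d": "yes"},
    {"a": 1, "b": 0, "c": 1, "d": "no"},
    {"a": 1, "b": 1, "c": 1, "d": "yes"},
]
reduct = greedy_reduct(rows, ["a", "b", "c"], "d")
print(reduct)
```

    A batch procedure like this has to be rerun from scratch on the whole table whenever a new sample arrives, which is the cost that incremental algorithms aim to avoid.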

    A Taxonomy of Big Data for Optimal Predictive Machine Learning and Data Mining

    Big data comes in various ways, types, shapes, forms and sizes. Indeed, almost all areas of science, technology, medicine, public health, economics, business, linguistics and social science are bombarded by ever increasing flows of data begging to be analyzed efficiently and effectively. In this paper, we propose a rough idea of a possible taxonomy of big data, along with some of the most commonly used tools for handling each particular category of bigness. The dimensionality p of the input space and the sample size n are usually the main ingredients in the characterization of data bigness. The specific statistical machine learning technique used to handle a particular big data set will depend on which category it falls in within the bigness taxonomy. Large p small n data sets, for instance, require a different set of tools from the large n small p variety. Among other tools, we discuss Preprocessing, Standardization, Imputation, Projection, Regularization, Penalization, Compression, Reduction, Selection, Kernelization, Hybridization, Parallelization, Aggregation, Randomization, Replication, Sequentialization. Indeed, it is important to emphasize right away that the so-called no free lunch theorem applies here, in the sense that there is no universally superior method that outperforms all other methods on all categories of bigness. It is also important to stress the fact that simplicity, in the sense of Ockham's razor, the non-plurality principle of parsimony, tends to reign supreme when it comes to massive data. We conclude with a comparison of the predictive performance of some of the most commonly used methods on a few data sets.
    Comment: 18 pages, 2 figures, 3 tables

    Scalable approximate FRNN-OWA classification

    Fuzzy Rough Nearest Neighbour classification with Ordered Weighted Averaging operators (FRNN-OWA) is an algorithm that classifies unseen instances according to their membership in the fuzzy upper and lower approximations of the decision classes. Previous research has shown that the use of OWA operators increases the robustness of this model. However, calculating membership in an approximation requires a nearest neighbour search. In practice, the query time complexity of exact nearest neighbour search algorithms in more than a handful of dimensions is near-linear, which limits the scalability of FRNN-OWA. Therefore, we propose approximate FRNN-OWA, a modified model that calculates upper and lower approximations of decision classes using the approximate nearest neighbours returned by Hierarchical Navigable Small Worlds (HNSW), a recent approximate nearest neighbour search algorithm with logarithmic query time complexity at constant near-100% accuracy. We demonstrate that approximate FRNN-OWA is sufficiently robust to match the classification accuracy of exact FRNN-OWA while scaling much more efficiently. We test four parameter configurations of HNSW, and evaluate their performance by measuring classification accuracy and construction and query times for samples of various sizes from three large datasets. We find that with two of the parameter configurations, approximate FRNN-OWA achieves near-identical accuracy to exact FRNN-OWA for most sample sizes, with query times that are up to several orders of magnitude shorter.
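
    The classification step described in this abstract can be sketched as follows, using an exact brute-force neighbour search in place of HNSW. The similarity relation, the linearly decreasing (additive) OWA weights, the value of `k` and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def additive_weights(k):
    """Linearly decreasing OWA weights of length k that sum to 1."""
    w = np.arange(k, 0, -1, dtype=float)
    return w / w.sum()

def frnn_owa_predict(X, y, query, k=3):
    """Score each class by the mean of its upper and lower approximation memberships."""
    # Similarity in [0, 1]: 1 minus the range-normalised mean absolute difference.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    sim = 1.0 - np.clip(np.abs(X - query) / span, 0, 1).mean(axis=1)
    scores = {}
    for label in np.unique(y):
        inside, outside = sim[y == label], sim[y != label]
        # Upper approximation: soft maximum of similarity to the k nearest members.
        top = np.sort(inside)[::-1][:k]
        upper = float(top @ additive_weights(len(top)))
        # Lower approximation: soft minimum of dissimilarity to the k nearest non-members.
        low = np.sort(1.0 - outside)[:k]
        lower = float(low @ additive_weights(len(low)))
        scores[label] = (upper + lower) / 2
    return max(scores, key=scores.get), scores

# Toy data: two well-separated classes in two dimensions.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [1.0, 1.0], [0.9, 0.8], [0.8, 0.9]])
y = np.array([0, 0, 0, 1, 1, 1])
label, scores = frnn_owa_predict(X, y, np.array([0.15, 0.1]))
print(label, scores)
```

    Only the k most similar members and non-members of each class contribute to the scores, so the full sort in this sketch is the step that a nearest neighbour index, exact or approximate, would replace.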

    The probability of default in internal ratings based (IRB) models in Basel II: an application of the rough sets methodology

    The new Capital Accord of June 2004 (Basel II) opens the way for and encourages credit entities to implement their own models for measuring financial risks. In this paper, we focus on the use of internal ratings based (IRB) models for the assessment of credit risk, and specifically on the approach to one of their components: the probability of default (PD). The traditional methods used to model credit risk, such as discriminant analysis and logit and probit models, rest on a series of statistical restrictions. The rough sets methodology is presented as an alternative to the classical statistical methods that overcomes their limitations. In our study we apply the rough sets methodology to a database composed of 106 companies, applicants for credit, with the object of obtaining those ratios that discriminate best between healthy and bankrupt companies, together with a series of decision rules that will help to detect the operations potentially in default, as a first step in modelling the probability of default. Lastly, we compare the results obtained against those obtained using classic discriminant analysis. We conclude that, in our case, the rough sets methodology presents better risk classification results.
    Junta de Andalucía P06-SEJ-0153