Rough Set Approaches to the Problem of Supplier Selection
Rough set theory, a data mining approach, is applied to the multi-criteria problem of supplier evaluation and selection in order to reveal the decision rules hidden in historical evaluation data. After introducing some basic notions of rough set theory, this paper works through a sample in detail to show the steps of the derivation, and obtains satisfactory supplier selection rules together with weights for the attribute indexes, which are compared with those produced by other methods. The results illustrate that rough set theory can be applied to supplier selection and can solve such problems with great efficiency.
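The basic rough set notions this abstract relies on (indiscernibility classes, lower and upper approximations, and attribute dependency) can be illustrated with a minimal sketch. The supplier records, attribute names and function names below are hypothetical, and the dependency-drop significance measure is one common way to weight attributes, not necessarily the one used in the paper.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into indiscernibility classes over `attrs`."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())

def approximations(rows, attrs, target):
    """Return the (lower, upper) approximations of the index set `target`."""
    lower, upper = set(), set()
    for block in partition(rows, attrs):
        if block <= target:   # block lies entirely inside the target set
            lower |= block
        if block & target:    # block overlaps the target set
            upper |= block
    return lower, upper

def dependency(rows, cond_attrs, decision):
    """Share of rows in the positive region, i.e. rows whose decision
    value is fully determined by `cond_attrs`."""
    positive = set()
    for dec_block in partition(rows, [decision]):
        lower, _ = approximations(rows, cond_attrs, dec_block)
        positive |= lower
    return len(positive) / len(rows)

def significance(rows, cond_attrs, decision, attr):
    """Drop in dependency when `attr` is removed (assumed weighting scheme)."""
    rest = [a for a in cond_attrs if a != attr]
    return dependency(rows, cond_attrs, decision) - dependency(rows, rest, decision)

# Hypothetical historical evaluation data, invented for illustration only.
suppliers = [
    {"quality": "high", "price": "low",  "delivery": "fast", "selected": "yes"},
    {"quality": "high", "price": "high", "delivery": "fast", "selected": "yes"},
    {"quality": "low",  "price": "low",  "delivery": "slow", "selected": "no"},
    {"quality": "low",  "price": "low",  "delivery": "fast", "selected": "no"},
    {"quality": "high", "price": "low",  "delivery": "slow", "selected": "yes"},
    {"quality": "low",  "price": "high", "delivery": "slow", "selected": "no"},
]
conditions = ["quality", "price", "delivery"]
chosen = {i for i, s in enumerate(suppliers) if s["selected"] == "yes"}

print(approximations(suppliers, ["price", "delivery"], chosen))
print({a: significance(suppliers, conditions, "selected", a) for a in conditions})
```

In this toy table the "quality" attribute alone determines the decision, so removing it causes the largest drop in dependency, while removing "price" causes none.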
Active Sample Selection Based Incremental Algorithm for Attribute Reduction with Rough Sets
Attribute reduction with rough sets is an effective technique for obtaining a compact and informative attribute set from a given dataset. However, traditional algorithms have no explicit provision for handling dynamic datasets where data present themselves in successive samples. Incremental algorithms for attribute reduction with rough sets have been recently introduced to handle dynamic datasets with large samples, though they have high complexity in time and space. To address the time/space complexity issue of the algorithms, this paper presents a novel incremental algorithm for attribute reduction with rough sets based on the adoption of an active sample selection process and an insight into the attribute reduction process. This algorithm first decides whether each incoming sample is useful with respect to the current dataset by the active sample selection process. A useless sample is discarded while a useful sample is selected to update a reduct. At the arrival of a useful sample, the attribute reduction process is then employed to guide how to add and/or delete attributes in the current reduct. The two processes thus constitute the theoretical framework of our algorithm. The proposed algorithm is finally experimentally shown to be efficient in time and space. This is a manuscript of the publication Yang, Yanyan, Degang Chen, and Hui Wang. "Active Sample Selection Based Incremental Algorithm for Attribute Reduction With Rough Sets." IEEE Transactions on Fuzzy Systems 25, no. 4 (2017): 825-838. DOI: 10.1109/TFUZZ.2016.2581186. Posted with permission.
A Taxonomy of Big Data for Optimal Predictive Machine Learning and Data Mining
Big data comes in various ways, types, shapes, forms and sizes. Indeed,
almost all areas of science, technology, medicine, public health, economics,
business, linguistics and social science are bombarded by ever increasing flows
of data begging to be analyzed efficiently and effectively. In this paper, we
propose a rough idea of a possible taxonomy of big data, along with some of the
most commonly used tools for handling each particular category of bigness. The
dimensionality p of the input space and the sample size n are usually the main
ingredients in the characterization of data bigness. The specific statistical
machine learning technique used to handle a particular big data set will depend
on which category it falls in within the bigness taxonomy. Large p small n data
sets for instance require a different set of tools from the large n small p
variety. Among other tools, we discuss Preprocessing, Standardization,
Imputation, Projection, Regularization, Penalization, Compression, Reduction,
Selection, Kernelization, Hybridization, Parallelization, Aggregation,
Randomization, Replication, Sequentialization. Indeed, it is important to
emphasize right away that the so-called no free lunch theorem applies here, in
the sense that there is no universally superior method that outperforms all
other methods on all categories of bigness. It is also important to stress the
fact that simplicity, in the sense of Ockham's razor (the non-plurality principle
of parsimony), tends to reign supreme when it comes to massive data. We conclude
with a comparison of the predictive performance of some of the most commonly
used methods on a few data sets. Comment: 18 pages, 2 figures, 3 tables
Scalable approximate FRNN-OWA classification
Fuzzy Rough Nearest Neighbour classification with Ordered Weighted Averaging operators (FRNN-OWA) is an algorithm that classifies unseen instances according to their membership in the fuzzy upper and lower approximations of the decision classes. Previous research has shown that the use of OWA operators increases the robustness of this model. However, calculating membership in an approximation requires a nearest neighbour search. In practice, the query time complexity of exact nearest neighbour search algorithms in more than a handful of dimensions is near-linear, which limits the scalability of FRNN-OWA. Therefore, we propose approximate FRNN-OWA, a modified model that calculates upper and lower approximations of decision classes using the approximate nearest neighbours returned by Hierarchical Navigable Small Worlds (HNSW), a recent approximate nearest neighbour search algorithm with logarithmic query time complexity at constant near-100% accuracy. We demonstrate that approximate FRNN-OWA is sufficiently robust to match the classification accuracy of exact FRNN-OWA while scaling much more efficiently. We test four parameter configurations of HNSW, and evaluate their performance by measuring classification accuracy and construction and query times for samples of various sizes from three large datasets. We find that with two of the parameter configurations, approximate FRNN-OWA achieves near-identical accuracy to exact FRNN-OWA for most sample sizes, with query times that are up to several orders of magnitude faster.
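To make the classification rule concrete, here is a minimal sketch of exact FRNN-OWA using a brute-force linear scan. The similarity relation (mean per-feature similarity), the linearly decreasing OWA weights, the choice of k, and all function names are assumptions made for illustration and are not taken from the abstract. The approximate variant described above would presumably replace the linear scan with an HNSW query, which is not shown here.

```python
def similarity(a, b, ranges):
    """Mean per-feature similarity 1 - |a_i - b_i| / range_i, clamped to [0, 1]."""
    per_feature = [max(0.0, 1 - abs(x - y) / r) for x, y, r in zip(a, b, ranges)]
    return sum(per_feature) / len(per_feature)

def additive_weights(k):
    """Linearly decreasing weights k, k-1, ..., 1, normalised to sum to 1."""
    total = k * (k + 1) / 2
    return [(k - i) / total for i in range(k)]

def owa_soft_max(values, k):
    """Weighted mean of the k largest values; the largest gets the most weight."""
    top = sorted(values, reverse=True)[:k]
    return sum(w * v for w, v in zip(additive_weights(len(top)), top))

def owa_soft_min(values, k):
    """Weighted mean of the k smallest values; the smallest gets the most weight."""
    bottom = sorted(values)[:k]
    return sum(w * v for w, v in zip(additive_weights(len(bottom)), bottom))

def classify(query, X, y, k=2):
    """Score each class by the mean of its upper and lower approximation memberships."""
    ranges = [(max(col) - min(col)) or 1.0 for col in zip(*X)]
    sims = [similarity(query, x, ranges) for x in X]  # exact linear scan
    scores = {}
    for label in set(y):
        # Upper approximation: soft maximum of similarity to members of the class.
        upper = owa_soft_max([s for s, l in zip(sims, y) if l == label], k)
        # Lower approximation: soft minimum of dissimilarity to all other instances.
        lower = owa_soft_min([1 - s for s, l in zip(sims, y) if l != label], k)
        scores[label] = (upper + lower) / 2
    return max(scores, key=scores.get), scores

# Toy data, invented for illustration only.
X = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 0.8]]
y = ["a", "a", "b", "b"]
label, scores = classify([0.05, 0.1], X, y, k=2)
print(label, scores)
```

Because only the k most similar members and the k most similar non-members of each class enter the two OWA aggregations, the full scan can in principle be replaced by a k-nearest-neighbour query, which is the step where an approximate index would be substituted.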
The probability of default in internal ratings based (IRB) models in Basel II: an application of the rough sets methodology
The new Capital Accord of June 2004 (Basel II) opens the way for, and encourages, credit institutions to
implement their own models for measuring financial risks. In this paper we focus on internal ratings based
(IRB) models for the assessment of credit risk, and specifically on the approach to one of their components:
the probability of default (PD).
The traditional methods used to model credit risk, such as discriminant analysis and logit and probit models,
rest on a series of statistical restrictions. The rough sets methodology is presented as an alternative to
the classical statistical methods that overcomes their limitations.
In our study we apply the rough sets methodology to a database of 106 companies applying for credit, with
the aim of obtaining the ratios that best discriminate between healthy and failed companies, together with a
series of decision rules that will help to detect operations potentially in default, as a first step in
modelling the probability of default. Lastly, we compare the results obtained against those obtained using
classical discriminant analysis, and conclude that, in our case, the rough sets methodology gives better
classification results.
Junta de Andalucía P06-SEJ-0153