Fuzzy rule-based classification systems for multi-class problems using binary decomposition strategies: on the influence of n-dimensional overlap functions in the fuzzy reasoning method
Multi-class classification problems appear in a broad variety of real-world domains, e.g., medicine, genomics, bioinformatics, or computer vision. In this context, decomposition strategies are useful to increase the classification performance of classifiers. For this reason, in a previous work we proposed to improve the performance of the FARC-HD (Fuzzy Association Rule-based Classification model for High-Dimensional problems) fuzzy classifier using One-vs-One (OVO) and One-vs-All (OVA) decomposition strategies. As a result of an exhaustive experimental analysis, we concluded that even though the usage of decomposition strategies was worth considering, further improvements could be achieved by introducing n-dimensional overlap functions instead of the product t-norm in the Fuzzy Reasoning Method (FRM). In this way, we can improve confidences for the subsequent processing performed in both OVO and OVA.
In this paper, we conduct a broader study of the influence of n-dimensional overlap functions used to model the conjunction in several Fuzzy Rule-Based Classification Systems (FRBCSs), in order to enhance their performance in multi-class classification problems when decomposition techniques are applied. To do so, we adapt the FRM of four well-known FRBCSs (CHI, SLAVE, FURIA, and FARC-HD itself). We show that the benefits of n-dimensional overlap functions strongly depend on both the learning algorithm and the rule structure of each classifier, which explains why FARC-HD is the most suitable one for the usage of these functions. This work has been supported by the Spanish Ministry of Science and Technology under the project TIN-2013-40765-P.
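To make the contrast in the abstract above concrete, the following is a minimal sketch of how the conjunction of antecedent membership degrees changes when the product t-norm is swapped for another n-dimensional overlap function. The geometric mean is used here only as one well-known example of such a function; the function names and the sample degrees are illustrative assumptions, not taken from the paper.

```python
import math


def product_tnorm(degrees):
    """Conjunction via the product t-norm (itself an n-dimensional overlap function)."""
    return math.prod(degrees)


def geometric_mean(degrees):
    """Conjunction via the geometric mean, another n-dimensional overlap function."""
    return math.prod(degrees) ** (1.0 / len(degrees))


# Hypothetical membership degrees of one example in the antecedents of two rules.
short_rule = [0.9, 0.9]
long_rule = [0.9, 0.9, 0.9, 0.9]

for name, conjunction in (("product", product_tnorm), ("geometric mean", geometric_mean)):
    # The product shrinks as antecedents are added; the geometric mean does not.
    print(name, conjunction(short_rule), conjunction(long_rule))
```

With the product, a rule with more antecedents receives a lower matching degree even when every antecedent matches equally well, which is one reason the choice of conjunction can interact with the rule structure of each classifier.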
Hyperbox based machine learning algorithms: A comprehensive survey
With the rapid development of digital information, the data volume generated
by humans and machines is growing exponentially. Along with this trend, machine
learning algorithms have been formed and evolved continuously to discover new
information and knowledge from different data sources. Learning algorithms
using hyperboxes as fundamental representational and building blocks are a
branch of machine learning methods. These algorithms have enormous potential
for high scalability and online adaptation of predictors built using hyperbox
data representations to dynamically changing environments and streaming
data. This paper aims to give a comprehensive survey of the literature on
hyperbox-based machine learning models. In general, according to the
architecture and characteristic features of the resulting models, the existing
hyperbox-based learning algorithms may be grouped into three major categories:
fuzzy min-max neural networks, hyperbox-based hybrid models, and other
algorithms based on hyperbox representations. Within each of these groups, this
paper gives a brief description of the structure of the models, the associated learning
algorithms, and an analysis of their advantages and drawbacks. The main
applications of these hyperbox-based models to real-world problems are also
described in this paper. Finally, we discuss some open problems and identify
potential future research directions in this field. Comment: 7 figure
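As a rough illustration of what a hyperbox representation involves, the sketch below stores a box by its minimum point `v` and maximum point `w`, computes a ramp-style fuzzy membership that is 1 inside the box and decays linearly outside it, and expands the box to absorb a new sample when a maximum-size constraint allows. The ramp membership, the parameter names `gamma` and `theta`, and the size criterion are assumptions modelled on common fuzzy min-max formulations, not a reproduction of any specific algorithm in the survey.

```python
def membership(x, v, w, gamma=1.0):
    """Degree to which point x belongs to the hyperbox [v, w] (1.0 inside the box)."""

    def ramp(r):
        # Linear penalty for being outside the box, clipped to [0, 1].
        return min(1.0, max(0.0, r * gamma))

    return min(
        min(1.0 - ramp(xi - wi), 1.0 - ramp(vi - xi))
        for xi, vi, wi in zip(x, v, w)
    )


def try_expand(x, v, w, theta):
    """Return the expanded (v, w) that covers x, or None if the box would get too large."""
    new_v = [min(vi, xi) for vi, xi in zip(v, x)]
    new_w = [max(wi, xi) for wi, xi in zip(w, x)]
    # Assumed size constraint: the summed side lengths may not exceed n * theta.
    if sum(nw - nv for nv, nw in zip(new_v, new_w)) <= len(x) * theta:
        return new_v, new_w
    return None


box_v, box_w = [0.2, 0.2], [0.4, 0.4]
print(membership([0.3, 0.3], box_v, box_w))             # point inside the box
print(membership([0.5, 0.3], box_v, box_w, gamma=2.0))  # point just outside the box
print(try_expand([0.3, 0.4], [0.2, 0.2], [0.2, 0.2], theta=0.2))
```

Because each update only touches the two corner points of one box, this kind of model lends itself to the online, incremental adaptation mentioned in the abstract.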
CFM-BD: a distributed rule induction algorithm for building Compact Fuzzy Models in Big Data classification problems
Interpretability has always been a major concern for fuzzy rule-based
classifiers. The usage of human-readable models allows them to explain the
reasoning behind their predictions and decisions. However, when it comes to Big
Data classification problems, fuzzy rule-based classifiers have not been able
to maintain the good trade-off between accuracy and interpretability that has
characterized these techniques in non-Big Data environments. The most accurate
methods build overly complex models composed of a large number of rules and fuzzy
sets, while those approaches focusing on interpretability do not provide
state-of-the-art discrimination capabilities. In this paper, we propose a new
distributed learning algorithm named CFM-BD to construct accurate and compact
fuzzy rule-based classification systems for Big Data. This method has been
specifically designed from scratch for Big Data problems and does not adapt or
extend any existing algorithm. The proposed learning process consists of three
stages: 1) pre-processing based on the probability integral transform theorem;
2) rule induction inspired by CHI-BD and Apriori algorithms; 3) rule selection
by means of a global evolutionary optimization. We conducted a complete
empirical study to test the performance of our approach in terms of accuracy,
complexity, and runtime. The results obtained were compared and contrasted with
four state-of-the-art fuzzy classifiers for Big Data (FBDT, FMDT, Chi-Spark-RS,
and CHI-BD). According to this study, CFM-BD is able to provide competitive
discrimination capabilities using significantly simpler models composed of a
few rules of fewer than 3 antecedents, employing 5 linguistic labels for all
variables. Comment: Appears in IEEE Transactions on Fuzzy System
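The first stage listed above relies on the probability integral transform: applying a continuous variable's own cumulative distribution function to it yields values that are uniformly distributed on [0, 1]. The sketch below approximates this with an empirical CDF fitted on training values. How CFM-BD actually estimates and applies the transform is not stated in the abstract, so the helper name `fit_ecdf` and the whole procedure are illustrative assumptions.

```python
from bisect import bisect_right


def fit_ecdf(train_values):
    """Build a transform that maps a value to its empirical CDF on the training data."""
    ordered = sorted(train_values)
    n = len(ordered)

    def transform(x):
        # Fraction of training values that are less than or equal to x.
        return bisect_right(ordered, x) / n

    return transform


# Hypothetical skewed feature: most values are small, one is very large.
to_uniform = fit_ecdf([1, 2, 3, 4, 5, 6, 7, 1000])
print([to_uniform(x) for x in (1, 4, 7, 1000)])
```

After such a transform, equally spaced linguistic labels on [0, 1] each cover a similar share of the training data, which is one plausible motivation for using it before rule induction.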
Neuro-Fuzzy Computing System with the Capacity of Implementation on Memristor-Crossbar and Optimization-Free Hardware Training
In this paper, first we present a new explanation for the relation between
logical circuits and artificial neural networks, logical circuits and fuzzy
logic, and artificial neural networks and fuzzy inference systems. Then, based
on these results, we propose a new neuro-fuzzy computing system which can
effectively be implemented on the memristor-crossbar structure. One important
feature of the proposed system is that its hardware can be trained directly
using the Hebbian learning rule, without the need for any optimization. The
system also copes well with a huge number of input-output
training data without facing problems like overtraining. Comment: 16 pages, 11 images, submitted to IEEE Trans. on Fuzzy system
N-dimensional admissibly ordered interval-valued overlap functions and its influence in interval-valued fuzzy rule-based classification systems
Overlap functions are a type of aggregation function that is not required to be associative, generally used to indicate the overlapping degree between two values. They have been successfully used as a conjunction operator in several practical problems, such as fuzzy rule-based classification systems (FRBCSs) and image processing. Some extensions of overlap functions were recently proposed, such as general overlap functions and, in the interval-valued context, n-dimensional interval-valued overlap functions. The latter can be applied in n-dimensional problems with interval-valued inputs, like interval-valued classification problems, where one can apply interval-valued FRBCSs (IV-FRBCSs). In this case, the choice of an appropriate total order for intervals, like an admissible order, can play an important role. However, neither the relationship between the interval order and the n-dimensional interval-valued overlap function (which may or may not be increasing for that order) nor the impact of this relationship on the classification process has been studied in the literature. Moreover, there is no clearly preferred n-dimensional interval-valued overlap function to be applied in an IV-FRBCS. Hence, in this paper we: (i) present some new results on admissible orders, which allow us to introduce the concept of n-dimensional admissibly ordered interval-valued overlap functions, that is, n-dimensional interval-valued overlap functions that are increasing with respect to an admissible order; (ii) develop a width-preserving construction method for this kind of function, derived from an admissible order and an n-dimensional overlap function, discussing some of its features; (iii) analyze the behaviour of several combinations of admissible orders and n-dimensional (admissibly ordered) interval-valued overlap functions when applied in IV-FRBCSs.
All in all, the contribution of this paper resides in pointing out the effect of admissible orders and n-dimensional admissibly ordered interval-valued overlap functions, from both theoretical and applied points of view, the latter when considering classification problems.
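An admissible order is a total order on intervals that refines the usual partial order (where [a, b] ≤ [c, d] whenever a ≤ c and b ≤ d). As a small illustration, the sketch below sorts intervals with the Xu and Yager order, a commonly cited admissible order that compares interval midpoints first and breaks ties by width. The abstract does not say which admissible orders the paper evaluates, so this particular choice and the helper name `xu_yager_key` are assumptions.

```python
def xu_yager_key(interval):
    """Sort key for the Xu and Yager order: compare a + b first, then the width b - a."""
    a, b = interval
    return (a + b, b - a)


# Hypothetical interval-valued membership degrees.
intervals = [(0.25, 0.75), (0.5, 0.5), (0.0, 0.5)]

# (0.25, 0.75) and (0.5, 0.5) are not comparable under the partial order,
# but the admissible order still ranks them (same midpoint, narrower first).
print(sorted(intervals, key=xu_yager_key))
```

Having a total order matters in a classifier because the final decision requires picking the largest interval-valued score, which the partial order alone cannot always do.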
General overlap functions
As a generalization of bivariate overlap functions, which measure the degree of overlapping (intersection for non-crisp sets) of n different classes, in this paper we introduce the concept of general overlap functions. We characterize the class of general overlap functions and include some construction methods by means of different aggregation and bivariate overlap functions. Finally, we apply general overlap functions to define a new matching degree in a classification problem. We deduce that the global behavior of these functions is slightly better than some other methods in the literature. The work has been supported by the Research Services of the Universidad Publica de Navarra, the research projects TIN2016-77356-P (AEI/FEDER, UE) and TIN2015-66471-P from the Government of Spain and by the Brazilian National Counsel of Technological and Scientific Development CNPq (Proc. 233950/2014-1, 306970/2013-9, 307781/2016-0) and by Caixa and Fundación Caja Navarra of Spain
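For orientation, the definitions below are written out as they are usually stated in the overlap-function literature; they are not quoted from the abstract and should be checked against the paper itself. The difference between the two notions lies in the boundary conditions, which are equivalences for n-dimensional overlap functions and only implications for general overlap functions.

```latex
An $n$-dimensional overlap function is a mapping $O\colon [0,1]^n \to [0,1]$ that is
symmetric, increasing, and continuous, and satisfies
\[
  O(x_1,\dots,x_n) = 0 \iff \prod_{i=1}^{n} x_i = 0,
  \qquad
  O(x_1,\dots,x_n) = 1 \iff \prod_{i=1}^{n} x_i = 1 .
\]
A general overlap function $GO\colon [0,1]^n \to [0,1]$ keeps symmetry, monotonicity,
and continuity, but relaxes the boundary conditions to
\[
  \prod_{i=1}^{n} x_i = 0 \implies GO(x_1,\dots,x_n) = 0,
  \qquad
  \prod_{i=1}^{n} x_i = 1 \implies GO(x_1,\dots,x_n) = 1 .
\]
```

Under these definitions every n-dimensional overlap function is also a general overlap function, while the converse need not hold.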
A proposal for tuning the α parameter in CαC-integrals for application in fuzzy rule-based classification systems
In this paper, we consider the concept of extended Choquet integral generalized by a copula, called CC-integral. In particular, we adopt a CC-integral that uses a copula defined by a parameter α, whose behavior was tested in a previous work using different fixed values. In this contribution, we propose an extension of this method by learning the best value for the parameter α using a genetic algorithm. This new proposal is applied in the fuzzy reasoning method of fuzzy rule-based classification systems in such a way that, for each class, the most suitable value of the parameter α is obtained, which can lead to an improvement in the system's performance. In the experimental study, we test the performance of 4 different so-called CαC-integrals, comparing the results obtained when using fixed values for the parameter α against the results provided by our new evolutionary approach. From the obtained results, it is possible to conclude that the genetic learning of the parameter α is statistically superior to the fixed one for two copulas. Moreover, in general, the accuracy achieved in test is superior to that of the fixed approach in all functions. We also compare the quality of this approach with related approaches, showing that the methodology proposed in this work provides competitive results. Therefore, we demonstrate that CαC-integrals with α learned genetically can be considered as a good alternative to be used in fuzzy rule-based classification systems. The authors would like to thank the Brazilian National Counsel of Technological and Scientific Development CNPq (Proc. 233950/2014-1, 481283/2013-7, 306970/2013-9, 307681/2012-2) and the Spanish Ministry of Science and Technology under project TIN2016-77356-P (AEI/FEDER, UE). G.P. Dimuro is also supported by Caixa and Fundación Caja Navarra of Spain
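The sketch below shows the general shape of a CC-integral as it is usually defined: the products in the discrete Choquet integral are replaced by a copula `C`, giving a sum of terms `C(x_(i), m(A_(i))) - C(x_(i-1), m(A_(i)))` over the inputs sorted in increasing order. The power measure `m(A) = (|A|/n)^q` and the two copulas used here (product and minimum) are assumptions chosen for simplicity; the α-parameterized copulas studied in the paper and the genetic tuning of α are not reproduced.

```python
def power_measure(subset_size, n, q=1.0):
    """Symmetric fuzzy measure that depends only on the size of the subset."""
    return (subset_size / n) ** q


def cc_integral(values, copula, q=1.0):
    """Choquet-like integral in which the product is replaced by a copula."""
    xs = sorted(values)
    n = len(xs)
    total = 0.0
    previous = 0.0
    for i, x in enumerate(xs):
        # A_(i) holds the n - i inputs that are at least as large as x.
        m = power_measure(n - i, n, q)
        total += copula(x, m) - copula(previous, m)
        previous = x
    return total


def product_copula(a, b):
    return a * b


def minimum_copula(a, b):
    return min(a, b)


# Hypothetical association degrees of the rules fired for one class.
degrees = [0.2, 0.5, 0.8]
print(cc_integral(degrees, product_copula))        # standard Choquet integral
print(cc_integral(degrees, minimum_copula, q=2.0))
```

With the product copula the expression reduces to the standard Choquet integral, and with `q = 1` that in turn equals the arithmetic mean, which gives a convenient sanity check.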
KPCA Spatio-temporal trajectory point cloud classifier for recognizing human actions in a CBVR system
We describe a content-based video retrieval (CBVR) software system for
identifying specific locations of a human action within a full-length film, and
retrieving similar video shots from a query. For this, we introduce the concept
of a trajectory point cloud for classifying unique actions, encoded in a
spatio-temporal covariant eigenspace, where each point is characterized by its
spatial location, local Frenet-Serret vector basis, time averaged curvature and
torsion and the mean osculating hyperplane. Since each action can be
distinguished by its unique trajectory within this space, the trajectory
point cloud is used to define an adaptive distance metric for classifying
queries against stored actions. Depending upon the distance to other
trajectories, the distance metric uses either large scale structure of the
trajectory point cloud, such as the mean distance between cloud centroids or
the difference in hyperplane orientation, or small scale structure such as the time
averaged curvature and torsion, to classify individual points in a fuzzy-KNN.
Our system can function in real-time and has an accuracy greater than 93% for
multiple action recognition within video repositories. We demonstrate the use
of our CBVR system in two situations: by locating specific frame positions of
trained actions in two full featured films, and by retrieving video shots from a
database with a web search application.
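For readers unfamiliar with the classifier named in the abstract above, here is a minimal sketch of a fuzzy k-nearest-neighbour rule in the style of Keller et al.: each of the k nearest stored samples votes for its class with a weight that decreases with distance, and the votes are normalized into class memberships. Plain Euclidean distance on made-up 2-D points stands in for the paper's adaptive trajectory-based metric, which is not reproduced here; the function name and parameters are illustrative.

```python
import math
from collections import defaultdict


def fuzzy_knn(query, samples, k=3, m=2.0, eps=1e-9):
    """Return class memberships for `query` from labelled (vector, label) samples."""
    nearest = sorted(
        ((math.dist(query, vector), label) for vector, label in samples),
        key=lambda pair: pair[0],
    )[:k]
    weights = defaultdict(float)
    for distance, label in nearest:
        # Inverse-distance weight; eps avoids division by zero on exact matches.
        weights[label] += 1.0 / (distance ** (2.0 / (m - 1.0)) + eps)
    total = sum(weights.values())
    return {label: weight / total for label, weight in weights.items()}


# Hypothetical stored action descriptors.
stored = [((0, 0), "walk"), ((0, 1), "walk"), ((5, 5), "run"), ((5, 6), "run")]
print(fuzzy_knn((0.2, 0.2), stored, k=3))
```

Unlike a crisp k-NN vote, the output is a graded membership per class, so a query that lies between two actions can be reported as ambiguous instead of being forced into one label.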
Aggregation and pre-aggregation functions in fuzzy rule-based classification systems
An effective way to cope with classification problems, among others, is to use Fuzzy Rule-Based Classification Systems (FRBCSs). These systems are composed of two main components, the Knowledge Base (KB) and the Fuzzy Reasoning Method (FRM). The FRM is responsible for classifying new examples based on the information stored in the KB. A key point in the FRM is the way in which the information given by the fired fuzzy rules is aggregated. Precisely, the aggregation function is the component that distinguishes the two most widely used FRMs in the specialized literature. The first one, known as Winning Rule (WR), applies the maximum as the aggregation function, which has an averaging behavior: its result is bounded by the minimum and the maximum of the values to be aggregated, and it uses the largest relationship between the new example to be classified and the fuzzy rules. The second one, known as Additive Combination (AC), is used by the most accurate algorithms nowadays and applies the normalized sum to aggregate the information; in this case, the aggregation operator has a non-averaging behavior. In this thesis, we change the way the information is aggregated in the FRM by applying generalizations of the Choquet integral. To do so, we have developed new theoretical concepts in the field of aggregation operators. These generalizations of the Choquet integral present both averaging and non-averaging behaviors. We use them in the FRM of FARC-HD, which is a state-of-the-art FRBCS. From the obtained results, we show that the new FRM can be used efficiently to deal with classification problems, given that the results are statistically comparable, or even superior, to those of state-of-the-art fuzzy classifiers.
Programa de Doctorado en Ciencias y Tecnologías Industriales (RD 99/2011). Industria Zientzietako eta Teknologietako Doktoretza Programa (ED 99/2011
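The difference between the two reasoning methods described in this abstract can be sketched in a few lines. Each fired rule contributes an association degree for its class; Winning Rule keeps only the maximum per class, while Additive Combination sums the degrees per class and normalizes. The function names, the normalization by the largest class total, and the sample degrees are illustrative assumptions, not the thesis' implementation.

```python
def winning_rule(fired):
    """Predict the class of the single rule with the highest association degree."""
    best = {}
    for label, degree in fired:
        best[label] = max(best.get(label, 0.0), degree)
    return max(best, key=best.get)


def additive_combination(fired):
    """Predict the class with the largest normalized sum of association degrees."""
    totals = {}
    for label, degree in fired:
        totals[label] = totals.get(label, 0.0) + degree
    # Assumed normalization: divide by the largest class total (guarding against 0).
    top = max(totals.values()) or 1.0
    normalized = {label: total / top for label, total in totals.items()}
    return max(normalized, key=normalized.get)


# Hypothetical association degrees: one strong rule for A, two moderate rules for B.
fired_rules = [("A", 0.6), ("B", 0.4), ("B", 0.4)]
print(winning_rule(fired_rules), additive_combination(fired_rules))
```

In this example the two methods disagree: the maximum ignores the second rule supporting B, whereas the sum lets several moderate rules outweigh one stronger rule. Replacing these aggregations with generalizations of the Choquet integral is the change the thesis proposes.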
Towards Automation of Knowledge Understanding: An Approach for Probabilistic Generative Classifiers
After data selection, pre-processing, transformation, and feature extraction,
knowledge extraction is not the final step in a data mining process. It is then
necessary to understand this knowledge in order to apply it efficiently and
effectively. Up to now, there is a lack of appropriate techniques that support
this significant step. This is partly due to the fact that the assessment of
knowledge is often highly subjective, e.g., regarding aspects such as novelty
or usefulness. These aspects depend on the specific knowledge and requirements
of the data miner. There are, however, a number of aspects that are objective
and for which it is possible to provide appropriate measures. In this article
we focus on classification problems and use probabilistic generative
classifiers based on mixture density models that are quite common in data
mining applications. We define objective measures to assess the
informativeness, uniqueness, importance, discrimination, representativity,
uncertainty, and distinguishability of rules contained in these classifiers
numerically. These measures not only support a data miner in evaluating results
of a data mining process based on such classifiers. As we will see in
illustrative case studies, they may also be used to improve the data mining
process itself or to support the later application of the extracted knowledge. Comment: 29 pages with 9 figures and 4 tables. Currently under review for
Information Science