
    An overview of recent distributed algorithms for learning fuzzy models in Big Data classification

    Nowadays, huge amounts of data are generated, often in very short time intervals and in various formats, by many heterogeneous sources such as social networks and media, mobile devices, internet transactions, networked devices and sensors. These data, identified as Big Data in the literature, are characterized by the popular Vs features, such as Value, Veracity, Variety, Velocity and Volume. In particular, Value focuses on the useful knowledge that may be mined from data. Thus, in recent years, a number of data mining and machine learning algorithms have been proposed to extract knowledge from Big Data. These algorithms have generally been implemented using ad hoc programming paradigms, such as MapReduce, on specific distributed computing frameworks, such as Apache Hadoop and Apache Spark. In the context of Big Data, fuzzy models currently play a significant role, thanks to their capability of handling vague and imprecise data and their inherent interpretability. In this work, we give an overview of the most recent distributed learning algorithms for generating fuzzy classification models for Big Data. In particular, we first show some design and implementation details of these learning algorithms. Thereafter, we compare them in terms of accuracy and interpretability. Finally, we discuss their scalability.

    Attributes regrouping in Fuzzy Rule Based Classification Systems: an intra-classes approach

    Fuzzy rule-based classification systems (FRBCS) build linguistic, interpretable models: they automatically generate fuzzy if-then rules and use them to classify new observations. However, in these supervised learning systems, a high number of predictive attributes leads to an exponential increase in the number of generated rules. Moreover, the antecedent conditions of the obtained rules are very large, since they contain all the attributes that describe the examples. As a result, both the accuracy and the interpretability of these systems degrade. To address this problem, we propose to use ensemble methods for FRBCS, in which the decisions of different classifiers are combined to form the final classification model. We are particularly interested in ensemble methods that split the attributes into subgroups and treat each subgroup separately. We propose to regroup attributes by searching for correlations among the training set elements that belong to the same class; such an intra-class correlation search characterizes each class separately. Several experiments were carried out on various datasets. The results show a reduction in the number of rules and of antecedents without degrading accuracy; on the contrary, classification rates even improve.
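The intra-class regrouping idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not say how correlations are measured or how groups are formed, so the Pearson coefficient, the `threshold` value, the rule "link two attributes if they are strongly correlated within at least one class", and the function names `pearson` and `intra_class_groups` are all assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def intra_class_groups(rows, labels, threshold=0.8):
    """Group attribute indices that are strongly correlated inside some class.

    Assumed linkage rule (hypothetical): attributes i and j join the same
    group if |pearson| >= threshold among the examples of at least one class.
    """
    n_attrs = len(rows[0])
    parent = list(range(n_attrs))  # union-find forest over attribute indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for label in sorted(set(labels)):
        # Restrict the correlation search to the examples of one class.
        members = [r for r, l in zip(rows, labels) if l == label]
        if len(members) < 2:
            continue
        columns = list(zip(*members))
        for i in range(n_attrs):
            for j in range(i + 1, n_attrs):
                if abs(pearson(columns[i], columns[j])) >= threshold:
                    parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(n_attrs):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Each returned group could then be handed to a separate classifier of the ensemble, so that no single rule base has to carry every attribute in its antecedents.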

    Automatic synthesis of fuzzy systems: An evolutionary overview with a genetic programming perspective

    Studies in Evolutionary Fuzzy Systems (EFSs) began in the 1990s and have developed rapidly since then, with applications to areas such as pattern recognition, curve‐fitting and regression, forecasting and control. An EFS results from the combination of a Fuzzy Inference System (FIS) with an Evolutionary Algorithm (EA). This relationship can be established for multiple purposes: fine‐tuning of a FIS's parameters, selection of fuzzy rules, learning a rule base or membership functions from scratch, and so forth. Each facet of this relationship creates a strand in the literature, such as membership function fine‐tuning or fuzzy rule‐based learning, and the purpose here is to outline some of what has been done in each aspect. Special focus is given to Genetic Programming‐based EFSs by providing a taxonomy of the main architectures available, as well as by pointing out the gaps that still remain in the literature. The concluding remarks address some further topics of current research and trends, such as interpretability analysis, multiobjective optimization, and synthesis of a FIS through Evolving methods.

    A hedge algebras based classification reasoning method with multi-granularity fuzzy partitioning

    In recent years, many fuzzy rule-based classifier (FRBC) design methods have been proposed to improve the classification accuracy and the interpretability of classification models. Most of them follow the fuzzy set theory approach, in which the fuzzy classification rules are generated from grid partitions combined with pre-designed fuzzy partitions using fuzzy sets. Some mechanisms have been studied to generate fuzzy partitions from data automatically, such as discretization and granular computing. Even so, linguistic terms are assigned to fuzzy sets intuitively, because there is no formalism linking the inherent semantics of linguistic terms to fuzzy sets. In view of that trend, genetic design methods for linguistic terms, along with their (triangular and trapezoidal) fuzzy sets-based semantics for FRBCs, have been proposed using hedge algebras as the mathematical formalism. Those hedge algebras-based design methods use the semantically quantifying mapping values of linguistic terms to generate their fuzzy sets-based semantics, so that they can reuse the fuzzy sets-based classification reasoning methods proposed in design methods based on the fuzzy set theoretic approach. If there exists a classification reasoning method based solely on the semantic parameters of hedge algebras, the fuzzy sets-based semantics of the linguistic terms in fuzzy classification rule bases can be replaced by hedge algebras-based semantics. This paper presents an FRBC design method based on the hedge algebras approach by introducing a hedge algebras-based classification reasoning method with multi-granularity fuzzy partitioning for data classification, so that the semantics of linguistic terms in rule bases can be hedge algebras-based semantics. Experimental results over 17 real-world datasets are compared with existing methods based on hedge algebras and with state-of-the-art fuzzy set theoretic approaches, showing that the proposed FRBC is an effective classifier and produces good results.

    A Review of Classification Problems and Algorithms in Renewable Energy Applications

    Classification problems and their corresponding solving approaches constitute one of the fields of machine learning. The application of classification schemes in Renewable Energy (RE) has gained significant attention in the last few years, contributing to the deployment, management and optimization of RE systems. The main objective of this paper is to review the most important classification algorithms applied to RE problems, including both classical and novel algorithms. The paper also provides a comprehensive literature review and discussion of different classification techniques in specific RE problems, including wind speed/power prediction, fault diagnosis in RE systems, power quality disturbance classification and other applications in alternative RE systems. In this way, the paper describes classification techniques and metrics applied to RE problems, making it useful both for researchers dealing with this kind of problem and for practitioners in the field.

    Nuevos retos en clasificación asociativa: Big Data y aplicaciones (New challenges in associative classification: Big Data and applications)

    Associative classification arises from the union of two important areas of machine learning: the descriptive task of association rule mining, a mechanism for obtaining previously unknown and interesting information from a dataset, combined with a predictive task, classification, which makes a prediction about a variable of interest from a set of known explanatory variables. The objectives of this doctoral thesis are: 1) to study and analyze the state of the art in both association rule mining and associative classification; 2) to propose new models for associative classification and association rule mining that are accurate, interpretable, efficient, and flexible enough to incorporate subjective knowledge; and 3) given the large amount of data generated every day in recent decades, to pay special attention to the processing of large amounts of data, also known as Big Data. First, the state of the art in both associative classification and association rule mining was analyzed through an exhaustive study of the related literature. This laid the foundations for the remaining objectives and revealed that associative classification needed a mechanism to unify comparisons and make them as complete as possible. To that end, a software tool was proposed that includes at least one algorithm from every category of the current taxonomy. 
This tool allows researchers in the area to perform more diverse and complete comparisons, which until now was considered a very arduous task at best, since many of the algorithms were not available in executable form, let alone as open source. The tool also provides a very diverse set of metrics that quantify the quality of the results from different perspectives, which helps to obtain classifiers that are as complete as possible and to unify future comparisons with other proposals. Second, as a result of the previous analysis, it was found that current proposals do not scale, either horizontally or vertically, to relatively large datasets. Given the growing interest, in both academia and industry, in extending computing capacity to huge amounts of data, the thesis continues with an analysis of different proposals for Big Data. To that end, it begins with a detailed analysis of the latest advances in processing such amounts of data. Special attention is paid to distributed computing, since it has proven to be the only approach that allows large amounts of data to be processed without resorting to sampling techniques. In particular, the focus is on methodologies based on MapReduce, which decomposes complex problems into divisible, parallelizable fractions whose results can later be combined to obtain the final result. As a result of this objective, several algorithms were proposed that process large amounts of data without loss of accuracy or interpretability. 
All the proposed algorithms were designed to run on the best-known open-source implementations of MapReduce. Third and last, a proposal was made to improve the state of the art in associative classification. Since association rules are the basis of, and a determining factor for, associative classifiers, the work began with a new proposal for association rule mining. It combines the latest advances in distributed computing, such as MapReduce, with evolutionary algorithms, which have been shown to obtain excellent results in the area. In particular, grammar-guided genetic programming was used for its flexibility in encoding solutions and in introducing subjective knowledge into the search process, while also easing computational and memory requirements. This new algorithm is a significant improvement in association rule mining: it has been shown to obtain better results than existing proposals on different types of data and on different metrics of interest. That is, it not only obtains better results on Big Data, but its sequential version has also been compared with existing algorithms. Once this algorithm for extracting excellent association rules was available, it was adapted to obtain class association rules and to build a classifier from those rules. Again, grammar-guided genetic programming was used to obtain the classifier, so that the user can introduce subjective knowledge not only into the form of the rules but also into the final form of the classifier. 
This new proposal was also compared, in sequential form, with existing associative classification algorithms to ensure that it achieves significant differences with respect to them in terms of accuracy, interpretability and efficiency. It was additionally compared with other Big Data-specific proposals, and it was shown to obtain excellent results while maintaining a trade-off between the conflicting objectives of interpretability, accuracy and efficiency. This doctoral thesis was developed in an appropriate experimental setting, using diverse datasets that include both low-dimensional data and Big Data. All the datasets used are freely published and cover a range of dimensionalities, numbers of instances and numbers of classes. All results were compared with the corresponding state of the art, and non-parametric statistical tests were used to check that the differences found are statistically significant and not due to chance. Furthermore, all comparisons consider different perspectives: performance, efficiency, accuracy and interpretability were analyzed in each study. This Doctoral Thesis aims at solving the challenging problem of associative classification and its application to very large datasets. First, the state of the art in associative classification has been studied and analyzed, and a new tool covering the whole taxonomy of algorithms and providing many different measures has been proposed. The goal of this tool is two-fold: 1) unifying comparisons, since existing works use very different measures; 2) providing a single tool that has at least one algorithm from each category of the taxonomy. 
This tool is a very important advance in the field, since until now the whole taxonomy had not been covered because many algorithms had neither been released as open source nor been available to run. Second, associative classification (AC) has been analyzed on very large quantities of data. In this regard, many different platforms for distributed computing have been studied and different proposals have been developed on them. These proposals make it possible to deal with very large data efficiently by scaling the load across different compute nodes. Third, since one of the most important parts of associative classification is extracting high-quality rules, a novel grammar-guided genetic programming algorithm has been proposed that obtains interesting association rules with regard to different metrics and on different kinds of data, including truly Big Data datasets. This proposal has been shown to obtain very good results in terms of both quality and interpretability, while providing a very flexible way of representing the solutions and allowing subjective knowledge to be introduced into the search process. Then, a novel algorithm has been proposed for associative classification, using a non-trivial adaptation of the aforementioned algorithm to obtain the rules forming the classifier. This methodology is also based on grammar-guided genetic programming, enabling the user to constrain not only the form of the rules but also the final form of the classifier. Results have shown that this algorithm obtains very accurate classifiers while maintaining a good level of interpretability. All the methodologies proposed in this Thesis have been evaluated within a proper experimental framework, using a varied set of datasets including both classical and Big Data datasets, and analyzing different metrics to quantify the quality of the algorithms from different perspectives. 
Results have been compared with the state of the art and verified by means of non-parametric statistical tests, showing that the proposed methods outperform existing approaches.
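The thesis abstract describes MapReduce as splitting a problem into parallelizable fractions whose partial results are later combined. The following single-process sketch illustrates that decomposition for one building block of associative classification: counting the support and confidence of candidate class association rules. It is not the thesis's algorithm, which relies on grammar-guided genetic programming; the function names (`map_partition`, `reduce_counts`, `class_rule_stats`), the record layout and the `max_len` limit are assumptions for illustration, and a real deployment would run the map and reduce steps on a framework such as Hadoop or Spark.

```python
from collections import defaultdict
from itertools import combinations


def map_partition(records, max_len=2):
    """Map step: emit ((itemset, class_label), 1) for every candidate antecedent.

    `records` is a list of (items, label) pairs, an assumed layout.
    """
    for items, label in records:
        for k in range(1, max_len + 1):
            for itemset in combinations(sorted(items), k):
                yield (itemset, label), 1


def reduce_counts(pairs):
    """Reduce step: sum the values emitted for each key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def class_rule_stats(partitions, max_len=2):
    """Return {(itemset, label): (support, confidence)} over all partitions."""
    # On a cluster each partition would be mapped on a different node;
    # here the partitions are simply processed one after another.
    mapped = (pair for part in partitions
              for pair in map_partition(part, max_len))
    rule_counts = reduce_counts(mapped)

    # Occurrences of each antecedent, regardless of class label.
    antecedent_counts = defaultdict(int)
    for (itemset, _label), count in rule_counts.items():
        antecedent_counts[itemset] += count

    total = sum(len(part) for part in partitions)
    return {
        (itemset, label): (count / total, count / antecedent_counts[itemset])
        for (itemset, label), count in rule_counts.items()
    }
```

Because the per-key sums are associative, the partial counts of each partition can be merged in any order, which is what lets the counting scale across nodes without sampling the data.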