89 research outputs found

    Iterative Information Granulation for Novelty Detection in Complex Datasets

    Recognition memory in a number of mammals is usually utilised to identify novel objects that violate model predictions. In humans in particular, the recognition of novel objects is foremost associated with the ability to group objects that are highly compatible/similar. Granular computing not only mimics the human cognitive ability to draw objects together but also mimics the ability to capture associated properties by similarity, proximity or functionality. In this paper, an iterative information granulation approach is presented for the problem of novelty detection in complex data. Two granular compatibility measures are used, based on principles of Granular Computing, namely the multidimensional distance between the granules, as well as the granular density and volume. A two-stage iterative information granulation is proposed in this work. In the first stage, a predefined number of granular detectors are constructed. The granular detectors capture the relationships (rules) between the input-output data and then use this information in a second granulation stage in order to discriminate new samples as novel. The proposed iterative information granulation approach for novelty detection is then applied to three different benchmark problems in pattern recognition, demonstrating very good performance.

    Fuzzy-Granular Based Data Mining for Effective Decision Support in Biomedical Applications

    Due to the complexity of biomedical problems, adaptive and intelligent knowledge discovery and data mining systems are highly needed to help humans understand the inherent mechanisms of diseases. For biomedical classification problems, it is typically impossible to build a perfect classifier with 100% prediction accuracy. Hence a more realistic target is to build an effective Decision Support System (DSS). In this dissertation, a novel adaptive Fuzzy Association Rules (FARs) mining algorithm, named FARM-DS, is proposed to build such a DSS for binary classification problems in the biomedical domain. Empirical studies show that FARM-DS is competitive with state-of-the-art classifiers in terms of prediction accuracy. More importantly, FARs can provide strong decision support for disease diagnosis due to their easy interpretability. This dissertation also proposes a fuzzy-granular method to select informative and discriminative genes from huge microarray gene expression data. With fuzzy granulation, information loss in the process of gene selection is decreased. As a result, more informative genes for cancer classification are selected and more accurate classifiers can be modeled. Empirical studies show that the proposed method is more accurate than traditional algorithms for cancer classification. Hence, we expect that the selected genes can be more helpful for further biological studies.

    Sistemas granulares evolutivos

    Advisor: Fernando Antonio Campos Gomide. Thesis (doctorate), Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação. Abstract: In recent years there has been increasing interest in computational modeling approaches to deal with real-world data streams. Methods and algorithms have been proposed to uncover meaningful knowledge from very large (often unbounded) data sets that, in principle, have no apparent value. This thesis introduces a framework for evolving granular modeling of uncertain data streams. Evolving granular systems comprise an array of online modeling approaches inspired by the way in which humans deal with complexity. These systems explore the information flow in dynamic environments and derive from it models that can be linguistically understood. In particular, information granulation is a natural technique to dispense with unnecessary details and emphasize transparency, interpretability and scalability of information systems. Uncertain (granular) data arise from imprecise perception or description of the value of a variable. Broadly stated, various factors can affect one's choice of data representation such that the representing object conveys the meaning of the concept it is being used to represent. Of particular concern to this work are numerical, interval, and fuzzy types of granular data; and interval, fuzzy, and neurofuzzy modeling frameworks. Learning in evolving granular systems is based on incremental algorithms that build model structure from scratch on a per-sample basis and adapt model parameters whenever necessary. This learning paradigm is valuable because it avoids redesigning and retraining models whenever the system changes. Application examples in classification, function approximation, time-series prediction and control using real and synthetic data illustrate the usefulness of the granular approaches and framework proposed. The behavior of nonstationary data streams with gradual and abrupt regime shifts is also analyzed in the realm of evolving granular computing. We shed light upon the role of interval, fuzzy, and neurofuzzy computing in processing uncertain data and providing high-quality approximate solutions and rule summaries of input-output data sets. The approaches and framework introduced constitute a natural extension of evolving intelligent systems over numeric data streams to evolving granular systems over granular data streams. Doctorate; Automation; Doctor of Electrical Engineering.

    Clustering of nonstationary data streams: a survey of fuzzy partitional methods

    Data streams have arisen as a relevant research topic during the past decade. They are real‐time, incremental in nature, temporally ordered, massive, contain outliers, and the objects in a data stream may evolve over time (concept drift). Clustering is often one of the earliest and most important steps in the streaming data analysis workflow. A comprehensive literature is available about stream data clustering; however, less attention is devoted to the fuzzy clustering approach, even though the nonstationary nature of many data streams makes it especially appealing. This survey discusses relevant data stream clustering algorithms focusing mainly on fuzzy methods, including their treatment of outliers and concept drift and shift. Ministero dell’Istruzione, dell’Università e della Ricerca.

    Data-stream driven Fuzzy-granular approaches for system maintenance

    Intelligent systems are now an inherent part of society, supporting synergistic human-machine collaboration. Beyond economic and climate factors, energy consumption is strongly affected by the performance of computing systems. Poorly functioning software may invalidate any improvement attempt. In addition, data-driven machine learning algorithms are the basis for human-centered applications, and their interpretability is one of the most important features of computational systems. Software maintenance is a critical discipline to support automatic and life-long system operation. As most software registers its inner events by means of logs, log analysis is one approach to sustaining system operation. Logs are characterized as big data assembled in large-flow streams; they are unstructured, heterogeneous, imprecise, and uncertain. This thesis addresses fuzzy and neuro-granular methods to provide maintenance solutions applied to anomaly detection (AD) and log parsing (LP), dealing with data uncertainty and identifying ideal time periods for detailed software analyses. LP provides a deeper semantic interpretation of the anomalous occurrences. The solutions evolve over time and are general-purpose, being highly applicable, scalable, and maintainable. Granular classification models, namely the Fuzzy set-Based evolving Model (FBeM), the evolving Granular Neural Network (eGNN), and the evolving Gaussian Fuzzy Classifier (eGFC), are compared on the AD problem. The evolving Log Parsing (eLP) method is proposed for the automatic parsing of system logs. All the methods use recursive mechanisms to create, update, merge, and delete information granules according to the data behavior. For the first time in the evolving intelligent systems literature, the proposed method, eLP, is able to process streams of words and sentences. Regarding AD accuracy, FBeM achieved (85.64 ± 3.69)%; eGNN reached (96.17 ± 0.78)%; eGFC obtained (92.48 ± 1.21)%; and eLP reached (96.05 ± 1.04)%. Besides being competitive, eLP generates a log grammar and offers a higher level of model interpretability.

    A domain transformation approach for addressing staff scheduling problems

    Staff scheduling is a complex combinatorial optimisation problem concerning allocation of staff to duty rosters in a wide range of industries and settings. This thesis presents a novel approach to solving staff scheduling problems, and in particular nurse scheduling, by simplifying the problem space through information granulation. The complexity of the problem is due to a large solution space and the many constraints that need to be satisfied. Published research indicates that methods based on random searches of the solution space did not produce good-quality results consistently. In this study, we have avoided random searching and proposed a systematic hierarchical method of granulation of the problem domain through pre-processing of constraints. The approach is general and can be applied to a wide range of staff scheduling problems. The novel approach proposed here involves a simplification of the original problem by a judicious grouping of shift types and a grouping of individual shifts into weekly sequences. The schedule construction is done systematically, while assuring its feasibility and minimising the cost of the solution in the reduced problem space of weekly sequences. Subsequently, the schedules from the reduced problem space are translated into the original problem space by taking into account the constraints that could not be represented in the reduced space. This two-stage approach to solving the scheduling problem is referred to here as a domain-transformation approach. The thesis reports computational results on both standard benchmark problems and a specific scheduling problem from Kajang Hospital in Malaysia. The results confirm that the proposed method delivers high-quality results consistently and is computationally efficient

    Combining heterogeneous classifiers via granular prototypes.

    In this study, a novel framework to combine multiple classifiers in an ensemble system is introduced. Here we exploit the concept of information granule to construct granular prototypes for each class on the outputs of an ensemble of base classifiers. In the proposed method, uncertainty in the outputs of the base classifiers on training observations is captured by an interval-based representation. To predict the class label for a new observation, we first determine the distances between the output of the base classifiers for this observation and the class prototypes, then the predicted class label is obtained by choosing the label associated with the shortest distance. In the experimental study, we combine several learning algorithms to build the ensemble system and conduct experiments on the UCI, colon cancer, and selected CLEF2009 datasets. The experimental results demonstrate that the proposed framework outperforms several benchmarked algorithms including two trainable combining methods, i.e., Decision Template and Two Stages Ensemble System, AdaBoost, Random Forest, L2-loss Linear Support Vector Machine, and Decision Tree
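    The prediction rule described in this abstract (interval-based class prototypes over base-classifier outputs, then choosing the class with the shortest distance) can be illustrated with a minimal sketch. This is not the authors' implementation: the per-dimension min/max interval, the point-to-box Euclidean distance, and the function names `fit_interval_prototypes`, `distance_to_interval` and `predict` are all assumptions chosen for illustration.

```python
import numpy as np

def fit_interval_prototypes(outputs, labels):
    """Build one interval prototype per class.

    outputs: (n_samples, n_features) array of stacked base-classifier outputs.
    Assumption: each prototype is the per-dimension [min, max] of the
    outputs observed for that class on the training observations.
    """
    prototypes = {}
    for c in np.unique(labels):
        rows = outputs[labels == c]
        prototypes[c] = (rows.min(axis=0), rows.max(axis=0))
    return prototypes

def distance_to_interval(x, lower, upper):
    """Assumed distance: zero inside the interval box, otherwise the
    Euclidean distance from x to the nearest point of the box."""
    gap = np.maximum(lower - x, 0.0) + np.maximum(x - upper, 0.0)
    return float(np.linalg.norm(gap))

def predict(prototypes, x):
    """Return the class label whose prototype is closest to x."""
    return min(prototypes,
               key=lambda c: distance_to_interval(x, *prototypes[c]))
```

    In this sketch, `outputs` would hold the concatenated outputs of all base classifiers for each training observation, and a new observation is classified by passing its concatenated outputs to `predict`.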

    General type-2 radial basis function neural network: a data-driven fuzzy model

    This paper proposes a new General Type-2 Radial Basis Function Neural Network (GT2-RBFNN) that is functionally equivalent to a GT2 Fuzzy Logic System (FLS) of either Takagi-Sugeno-Kang (TSK) or Mamdani type. The neural structure of the GT2-RBFNN is based on the alpha-planes representation, in which the antecedent and consequent parts of each fuzzy rule use GT2 Fuzzy Sets (FSs). To reduce the iterative nature of the Karnik-Mendel algorithm, the Enhanced-Karnik-Mendel (EKM) type-reduction and three popular direct-defuzzification methods are employed, namely (1) the Nie-Tan approach (NT), (2) the Wu-Mendel uncertain bounds method (WU) and (3) the Biglarbegian-Melek-Mendel algorithm (BMM). For that reason, this paper provides four different neural structures of the GT2-RBFNN and their structural and parametric optimisation. Such optimisation is a two-stage methodology that first implements an Iterative Information Granulation approach to estimate the antecedent parameters of each fuzzy rule. Second, the consequent parts and the fuzzy rule base of the GT2-RBFNN are trained and optimised using an Adaptive Gradient Descent (AGD) method. Several benchmark data sets, including a nonlinear system identification problem and a chaotic time series, are considered. The reported comparative analysis of experimental results is used to evaluate the performance of the suggested GT2-RBFNN with respect to other popular methodologies.