122 research outputs found

    A Survey on Soft Subspace Clustering

    Subspace clustering (SC) is a promising clustering technology for identifying clusters based on their associations with subspaces in high dimensional spaces. SC can be classified into hard subspace clustering (HSC) and soft subspace clustering (SSC). While HSC algorithms have been extensively studied and are well accepted by the scientific community, SSC algorithms are relatively new but have gained more attention in recent years due to their better adaptability. In this paper, a comprehensive survey of existing SSC algorithms and recent developments is presented. The SSC algorithms are classified systematically into three main categories, namely, conventional SSC (CSSC), independent SSC (ISSC) and extended SSC (XSSC). The characteristics of these algorithms are highlighted and the potential future development of SSC is also discussed. Comment: This paper has been published in Information Sciences Journal in 201

    New methods for discovering local behaviour in mixed databases

    Clustering techniques are widely used. In many applications it is desirable to automatically find groups or hidden information in a data set. One such application is finding a model of a system by integrating several local models. A local model can have many structures; however, a linear structure is the most common, owing to its simplicity. This work aims at improvements in several areas, all of which are applied to finding a set of local models in a database. On the one hand, a way of encoding categorical information as numerical values has been designed, so that a numerical algorithm can be applied to the whole data set. On the other hand, a cost index has been developed, which is optimized globally to find the parameters of the local clusters that best define the output of the process. Each technique has been applied in several experiments, and the results show improvements over current techniques. Barceló Rico, F. (2009). New methods for discovering local behaviour in mixed databases. http://hdl.handle.net/10251/12739

    Contributions in computational intelligence with results in functional neuroimaging

    This thesis applies computational intelligence methodologies to the study of functional brain images. It is a state-of-the-art application of unsupervised learning to functional neuroimaging. There are also contributions to computational intelligence on topics related to clustering validation and spatio-temporal clustering analysis. Specifically, the thesis presents a new separation measure based on fuzzy set theory to establish the validity of fuzzy clustering outcomes, and a framework for the parcellation of functional neuroimages that takes into account both spatial and temporal patterns. These contributions have been applied to neuroimages obtained with functional Magnetic Resonance Imaging (fMRI), using both active and passive paradigms and both in-house data and an fMRI repository. The results show, overall, an improvement in the quality of the neuroimaging analysis when the proposed methodological contributions are used.

    Neuro-Fuzzy Based Intelligent Approaches to Nonlinear System Identification and Forecasting

    Nearly three decades ago, nonlinear system identification consisted of several ad-hoc approaches that were restricted to a very limited class of systems. However, with the advent of soft computing methodologies such as neural networks and fuzzy logic, combined with optimization techniques, a wider class of systems can now be handled. Complex systems may have diverse characteristics: they may be linear or nonlinear, continuous or discrete, time varying or time invariant, static or dynamic, short term or long term, central or distributed, predictable or unpredictable, ill or well defined. Neurofuzzy hybrid modelling approaches have been developed as an ideal technique for utilising linguistic values and numerical data. This thesis focuses on the development of advanced neurofuzzy modelling architectures and their application to real case studies. Three requirements have been identified as desirable characteristics for such a design: a model needs to have a minimum number of rules; a model needs to be generic, acting either as a Multi-Input-Single-Output (MISO) or as a Multi-Input-Multi-Output (MIMO) identification model; and a model needs to have a versatile nonlinear membership function. Initially, a MIMO Adaptive Fuzzy Logic System (AFLS) model has been developed for the detection of meat spoilage using Fourier transform infrared (FTIR) spectroscopy; it incorporates a prototype defuzzification scheme and a fuzzification layer that is efficient compared to Takagi–Sugeno–Kang (TSK) based systems. The identification strategy involved not only the classification of beef fillet samples into their respective quality classes (i.e. fresh, semi-fresh and spoiled), but also the simultaneous prediction of their associated microbiological population directly from FTIR spectra. 
In the case of AFLS, the number of memberships for each input variable was directly associated with the number of rules; hence, the “curse of dimensionality” problem was significantly reduced. Results confirmed the advantage of the proposed scheme over the Adaptive Neurofuzzy Inference System (ANFIS), Multilayer Perceptron (MLP) and Partial Least Squares (PLS) techniques used in the same case study. In the case of MISO systems, the TSK based structure has been utilized in many neurofuzzy systems, such as ANFIS. At the next stage of research, an Adaptive Fuzzy Inference Neural Network (AFINN) has been developed for monitoring the spoilage of minced beef utilising multispectral imaging information. This model, which follows the TSK structure, incorporates a clustering pre-processing stage for the definition of fuzzy rules, while its final fuzzy rule base is determined by competitive learning. In this specific case study, the AFINN model was also able to predict, for the first time in the literature, the beef’s temperature directly from imaging information. Results again proved the superiority of the adopted model. By extending the line of research and adopting specific design concepts from the previous case studies, the Asymmetric Gaussian Fuzzy Inference Neural Network (AGFINN) architecture has been developed. This architecture has been designed based on the above design principles. A clustering preprocessing scheme has been applied to minimise the number of fuzzy rules. AGFINN incorporates features from the AFLS concept by having the same number of rules as fuzzy memberships. Despite the extensive use of standard symmetric Gaussian membership functions, AGFINN utilizes an asymmetric function acting as the input linguistic node. Since the asymmetric Gaussian membership function’s variability and flexibility are higher than those of the traditional one, it can partition the input space more effectively. AGFINN can be built either as a MISO or as a MIMO system. 
In the MISO case, a TSK defuzzification scheme has been implemented, together with two different learning algorithms. AGFINN has been tested on real datasets related to electricity price forecasting for the ISO New England Power Distribution System. Its performance was compared against a number of alternative models, including ANFIS, AFLS, MLP and Wavelet Neural Network (WNN), and proved to be superior. The concept of asymmetric functions proved to be a valid hypothesis, and it can find application in other architectures, such as Fuzzy Wavelet Neural Network models, by designing a suitably flexible wavelet membership function. AGFINN’s MIMO characteristics also make the proposed architecture suitable for a larger range of applications and problems.
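
    The abstract above does not give the exact form of the asymmetric Gaussian membership function. As a minimal sketch, assuming the common two-sided form in which the spread differs to the left and right of the centre, it might look like the following. The function name and the parameters `center`, `sigma_left` and `sigma_right` are illustrative and are not taken from the thesis.

```python
import math


def asymmetric_gaussian(x, center, sigma_left, sigma_right):
    """Membership degree in (0, 1] with a different spread on each side of the center.

    Assumed form (not taken from the thesis): a Gaussian whose width is
    sigma_left for x < center and sigma_right otherwise.
    """
    sigma = sigma_left if x < center else sigma_right
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


# With sigma_left == sigma_right this reduces to the standard symmetric Gaussian.
left = asymmetric_gaussian(-1.0, 0.0, 1.0, 2.0)
right = asymmetric_gaussian(1.0, 0.0, 1.0, 2.0)
```

    With a narrower left spread, points to the left of the centre lose membership faster than points at the same distance to the right, which is the extra flexibility the abstract attributes to the asymmetric function.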

    Innovative Algorithms and Evaluation Methods for Biological Motif Finding

    Biological motifs are defined as overly recurring sub-patterns in biological systems. Sequence motifs and network motifs are examples of biological motifs. Because of their wide range of applications, many algorithms and computational tools have been developed to search for biological motifs efficiently. As a result, there are more computationally derived motifs than experimentally validated ones, and how to validate the biological significance of these ‘candidate motifs’ has become an important question. Some sequence motifs are verified by their structural similarities or their functional roles in DNA or protein sequences, and are stored in databases. However, the biological role of network motifs has not yet been validated, and currently no databases exist for this purpose. In this thesis, we focus not only on computational efficiency but also on the biological meaning of the motifs. We provide an efficient way to incorporate biological information into clustering analysis methods. For example, a sparse nonnegative matrix factorization (SNMF) method is used with Chou-Fasman parameters for protein motif finding. Biological network motifs are searched for by various clustering algorithms with Gene Ontology (GO) information. Experimental results show that the algorithms perform better than existing algorithms by producing a larger number of high-quality biological motifs. In addition, we apply biological network motifs to the discovery of essential proteins. Essential proteins are defined as a minimum set of proteins that are vital for development to a fertile adult and for cellular life in an organism. We design a new centrality algorithm based on biological network motifs, named MCGO, and score proteins in a protein-protein interaction (PPI) network to find essential proteins. MCGO is also combined with other centrality measures to predict essential proteins using machine learning techniques. 
This thesis makes three contributions to the study of biological motifs: 1) clustering analysis is used efficiently and biological information is easily integrated with the analysis; 2) we focus more on the biological meaning of motifs by adding biological knowledge to the algorithms and by suggesting biologically related evaluation methods; 3) biological network motifs are successfully applied to the practical problem of predicting essential proteins.

    Validação de heterogeneidade estrutural em dados de Crio-ME por comitês de agrupadores (Validation of structural heterogeneity in cryo-EM data by cluster ensembles)

    Advisors: Fernando José Von Zuben, Rodrigo Villares Portugal. Dissertation (master's), Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação. Abstract: Single Particle Analysis is a technique that allows the study of the three-dimensional structure of proteins and other macromolecular assemblies of biological interest. Its primary data consist of transmission electron microscopy images of multiple copies of the molecule in random orientations. Such images are very noisy because of the low electron dose employed. 3D reconstructions can be obtained by combining many images of particles in similar orientations and estimating their relative angles. However, heterogeneous conformational states often coexist in the sample, because the molecular complexes can be flexible and may also interact with other particles. Heterogeneity poses a challenge to the reconstruction of reliable 3D models and degrades their resolution. Among the most popular algorithms used for structural classification are k-means clustering, hierarchical clustering, self-organizing maps and maximum-likelihood estimators. Such approaches are usually interlaced with the reconstruction of the 3D models. Nevertheless, recent work indicates that it is possible to infer information about the structure of the molecules directly from the dataset of 2D projections. Among these findings is the relationship between structural variability and manifolds in a multidimensional feature space. This dissertation investigates whether an ensemble of unsupervised classification algorithms is able to separate these "conformational manifolds". Ensemble or "consensus" methods tend to provide more accurate classification and may achieve satisfactory performance across a wide range of datasets when compared with individual algorithms. We investigate the behavior of six clustering algorithms, both individually and combined in ensembles, for the task of classifying conformational heterogeneity. The approach was tested on synthetic and real datasets containing a mixture of projection images of the Mm-cpn chaperonin in the "open" and "closed" states. It is shown that cluster ensembles can provide useful information for validating structural partitionings independently of 3D reconstruction methods. Degree: Master's. Area: Computer Engineering. Title awarded: Master in Electrical Engineering.
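
    The abstract above does not say how the individual clusterings are combined. As a minimal sketch, assuming a co-association (evidence accumulation) scheme, which is one common way to build a consensus, each pair of items is scored by the fraction of partitions that place them in the same cluster. The function name `co_association` and the toy labels are illustrative, not from the dissertation.

```python
def co_association(partitions):
    """Return an n x n matrix: fraction of partitions in which items i and j share a cluster.

    `partitions` is a list of label lists, one per clustering algorithm,
    all of the same length n. Label values need not match across partitions.
    """
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    total = len(partitions)
    return [[counts[i][j] / total for j in range(n)] for i in range(n)]


# Two hypothetical clusterings of four images.
partitions = [
    [0, 0, 1, 1],
    [1, 1, 0, 1],
]
matrix = co_association(partitions)
```

    Pairs with scores near 1 are grouped consistently by the ensemble, while intermediate scores flag items on which the algorithms disagree, which is the kind of information a validation step can use.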

    A survey of the application of soft computing to investment and financial trading


    Big data analytics for preventive medicine

    © 2019, Springer-Verlag London Ltd., part of Springer Nature. Medical data is among the most rewarding and yet most complicated data to analyze. How can healthcare providers use modern data analytics tools and technologies to analyze and create value from complex data? Data analytics promises to efficiently discover valuable patterns by analyzing large amounts of unstructured, heterogeneous, non-standard and incomplete healthcare data. It not only forecasts but also supports decision making, and it is increasingly recognized as a breakthrough whose goal is to improve the quality of patient care and reduce healthcare costs. The aim of this study is to provide a comprehensive and structured overview of the extensive research on data analytics methods for disease prevention. This review first introduces disease prevention and its challenges, followed by traditional prevention methodologies. We summarize state-of-the-art data analytics algorithms used for classification of disease, clustering (unusually high incidence of a particular disease), anomaly detection (detection of disease) and association, together with their respective advantages, drawbacks and guidelines for selecting a specific model, followed by a discussion of recent developments and successful applications of disease prevention methods. The article concludes with open research challenges and recommendations.

    Big Earth Data and Machine Learning for Sustainable and Resilient Agriculture

    Big streams of Earth images from satellites or other platforms (e.g., drones and mobile phones) are becoming increasingly available at low or no cost and with enhanced spatial and temporal resolution. This thesis recognizes the unprecedented opportunities offered by today's high-quality, open-access Earth observation data and introduces novel machine learning and big data methods to exploit them in developing applications for sustainable and resilient agriculture. The thesis addresses three distinct thematic areas: the monitoring of the Common Agricultural Policy (CAP), the monitoring of food security, and applications for smart and resilient agriculture. The methodological innovations related to the three thematic areas address the following issues: i) the processing of big Earth Observation (EO) data, ii) the scarcity of annotated data for machine learning model training and iii) the gap between machine learning outputs and actionable advice. This thesis demonstrates how big data technologies such as data cubes, distributed learning, linked open data and semantic enrichment can be used to exploit the data deluge and extract knowledge to address real user needs. Furthermore, it argues for the importance of semi-supervised and unsupervised machine learning models that circumvent the ever-present challenge of scarce annotations and thus allow for model generalization in space and time. Specifically, it is shown that only a few ground truth data are needed to generate high-quality crop type maps and crop phenology estimations. Finally, this thesis argues that there is a considerable distance in value between model inferences and decision making in real-world scenarios, and it showcases the power of causal and interpretable machine learning in bridging this gap. Comment: PhD thesis