
    One-class classifiers based on entropic spanning graphs

    One-class classifiers offer valuable tools to assess the presence of outliers in data. In this paper, we propose a design methodology for one-class classifiers based on entropic spanning graphs. Our approach also allows non-numeric data to be processed by means of an embedding procedure. The spanning graph is learned on the embedded input data, and the resulting partition of vertices defines the classifier. The final partition is derived by exploiting a criterion based on mutual information minimization. Here, we compute the mutual information by using a convenient formulation provided in terms of the α-Jensen difference. Once training is completed, a graph-based fuzzy model is constructed in order to associate a confidence level with the classifier decision. The fuzzification process is based only on topological information of the vertices of the entropic spanning graph. As such, the proposed one-class classifier is also suitable for data characterized by complex geometric structures. We provide experiments on well-known benchmarks containing both feature vectors and labeled graphs. In addition, we apply the method to the protein solubility recognition problem by considering several representations for the input samples. Experimental results demonstrate the effectiveness and versatility of the proposed method with respect to other state-of-the-art approaches. Comment: Extended and revised version of the paper "One-Class Classification Through Mutual Information Minimization" presented at the 2016 IEEE IJCNN, Vancouver, Canada
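    The abstract computes mutual information through the α-Jensen difference of entropies estimated on spanning graphs. As a hedged illustration only, the sketch below uses the classical Euclidean minimum spanning tree (MST) estimator of Rényi α-entropy on numeric vectors, with α = (d − γ)/d. The function names, the default γ = 1, and the use of a plain MST on raw coordinates (instead of the paper's embedding and its specific entropic spanning graph) are assumptions; this is not the authors' implementation.

```python
import math


def mst_length(points, gamma=1.0):
    """Sum of |e|**gamma over the edges of a Euclidean minimum spanning tree.

    Uses Prim's algorithm in O(n^2), which is adequate for small samples.
    """
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # shortest edge linking each vertex to the current tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Add the non-tree vertex with the shortest connecting edge.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u] ** gamma
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return total


def renyi_entropy_mst(points, gamma=1.0):
    """MST-based Renyi alpha-entropy estimate with alpha = (d - gamma) / d, 0 < gamma < d.

    The additive constant involving the unknown beta term is dropped.
    """
    n, d = len(points), len(points[0])
    alpha = (d - gamma) / d
    return (d / gamma) * math.log(mst_length(points, gamma) / n ** alpha)


def alpha_jensen_difference(a, b, gamma=1.0):
    """H(pooled sample) - [p * H(a) + (1 - p) * H(b)] with p = |a| / (|a| + |b|)."""
    p = len(a) / (len(a) + len(b))
    h_mix = renyi_entropy_mst(a + b, gamma)
    return h_mix - (p * renyi_entropy_mst(a, gamma) + (1 - p) * renyi_entropy_mst(b, gamma))
```

    Dropping the β constant is harmless here: it enters the pooled entropy once and the weighted sum with total weight p + (1 − p) = 1, so it cancels in the difference. Well-separated samples (for example a small grid and the same grid shifted far away) give a larger α-Jensen difference than two interleaved grids.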

    PAC: A Novel Self-Adaptive Neuro-Fuzzy Controller for Micro Aerial Vehicles

    There is an increasing demand for flexible and computationally efficient controllers for micro aerial vehicles (MAVs) because of the high degree of environmental perturbation they are subject to. In this work, an evolving neuro-fuzzy controller, namely the Parsimonious Controller (PAC), is proposed. It features fewer network parameters than conventional approaches because it has no rule premise parameters. PAC is built upon a recently developed evolving neuro-fuzzy system known as the parsimonious learning machine (PALM) and adopts new rule growing and pruning modules derived from an approximation of bias and variance. These rule adaptation methods do not rely on user-defined thresholds, thereby increasing PAC's autonomy for real-time deployment. PAC adapts the consequent parameters using sliding mode control (SMC) theory in a single-pass fashion. The boundedness and convergence of the closed-loop tracking error and of the controller's consequent parameters are established using the LaSalle-Yoshizawa theorem. Lastly, the controller's efficacy is evaluated by observing its trajectory-tracking performance on a bio-inspired flapping-wing micro aerial vehicle (BI-FWMAV) and on a rotary-wing micro aerial vehicle, a hexacopter. Furthermore, it is compared with three distinct controllers. PAC outperforms the linear PID controller and a feed-forward neural network (FFNN) based nonlinear adaptive controller. Compared with its predecessor, the G-controller, PAC achieves comparable tracking accuracy and attains similar or better performance with significantly fewer parameters. Comment: This paper has been accepted for publication in Information Science Journal 201
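    To make "adapting consequent parameters with SMC in a single pass" concrete, here is a minimal sketch assuming a first-order sliding surface s = ė + λe, with e = reference − output, and a generic gradient-style law θ ← θ + η·s·φ·Δt, where φ could hold, for instance, normalized rule firing strengths. These functions, the symbols λ, η and Δt, and the update law itself are hypothetical textbook choices, not PAC's published adaptation law.

```python
def sliding_variable(e, e_dot, lam):
    """First-order sliding surface s = de/dt + lam * e (a common textbook choice)."""
    return e_dot + lam * e


def controller_output(theta, phi):
    """Control signal as a linear combination of consequent parameters and regressors."""
    return sum(t * p for t, p in zip(theta, phi))


def smc_consequent_update(theta, phi, s, eta, dt):
    """One single-pass step: theta <- theta + eta * s * phi * dt (hypothetical law).

    Each sample is used once and then discarded, so no data buffer is kept.
    """
    return [t + eta * s * p * dt for t, p in zip(theta, phi)]
```

    Other SMC-based laws replace s with sign(s) or a saturated version of it to reduce chattering; the choice here is only for illustration.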

    Fuzzy spectral clustering methods for textual data

    Nowadays, the development of advanced information technologies has led to an increase in the production of textual data. This inevitable growth accentuates the need for new methods and tools able to analyse such data efficiently. Against this background, unsupervised classification techniques can play a key role, since most of this data is unlabelled. Document clustering, which identifies a partition of a corpus of documents into clusters, has proven effective in the analysis of textual documents and has been extensively applied in different fields, from topic modelling to information retrieval. Recently, spectral clustering methods have gained success in the field of text classification. These methods have gained popularity due to their solid theoretical foundations, which do not require any specific assumption on the global structure of the data. However, even though they perform well in text classification problems, little has been done in the field of clustering. Moreover, depending on the type of documents analysed, it may often be the case that a textual document does not contain information related to a single topic only: there may be an overlap of contents characterizing different knowledge domains. Consequently, documents may contain information that is relevant, to some degree, to different areas of interest. The first part of this work critically analyses the main clustering algorithms used for text data, including the mathematical representation of documents and the pre-processing phase. Then, three novel fuzzy versions of spectral clustering algorithms for text data are introduced. The first exploits fuzzy K-medoids instead of K-means. The second derives directly from the first but is used in combination with Kernel and Set Similarity (KS2M), which takes the Jaccard index into account. Finally, in the third, a new similarity measure S∗ is proposed to enhance the clustering performance. This measure exploits the inherent sequential nature of text data by means of a weighted combination of the Spectrum string kernel function and a measure of set similarity. The second part of the thesis focuses on spectral bi-clustering algorithms for text mining tasks, which represent an interesting and partially unexplored field of research. In particular, two novel versions of fuzzy spectral bi-clustering algorithms are introduced. The two algorithms differ in the approach followed to identify the document and word partitions: the first follows a simultaneous approach, while the second follows a sequential approach. This difference also leads to a different choice of the number of clusters. The adequacy of all the proposed fuzzy (bi-)clustering methods is evaluated by experiments performed on both real and benchmark data sets
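    The S∗ measure is described as a weighted combination of the Spectrum string kernel and a set similarity. A minimal sketch under stated assumptions: the kernel counts character k-grams and is cosine-normalized, the set similarity is the Jaccard index over whitespace-separated words, and a single weight w mixes the two. The names `spectrum_kernel`, `jaccard`, `s_star`, and the defaults k = 3 and w = 0.5 are hypothetical, not the thesis's exact definition.

```python
import math
from collections import Counter


def spectrum_kernel(s, t, k=3):
    """Normalized k-spectrum kernel: cosine similarity of character k-gram counts."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    # Counter returns 0 for missing k-grams, so only shared k-grams contribute.
    dot = sum(count * ct[gram] for gram, count in cs.items())
    norm = math.sqrt(sum(v * v for v in cs.values()) * sum(v * v for v in ct.values()))
    return dot / norm if norm else 0.0


def jaccard(s, t):
    """Jaccard index between the sets of whitespace-separated words."""
    a, b = set(s.split()), set(t.split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def s_star(s, t, w=0.5, k=3):
    """Hypothetical S*: w * spectrum kernel + (1 - w) * Jaccard, with w in [0, 1]."""
    return w * spectrum_kernel(s, t, k) + (1.0 - w) * jaccard(s, t)
```

    The kernel term rewards shared character sequences (word order and partial matches), while the Jaccard term only looks at which words occur, which is one way to read "exploits the inherent sequential nature of text data".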

    A Survey on Soft Subspace Clustering

    Subspace clustering (SC) is a promising clustering technology that identifies clusters based on their associations with subspaces in high-dimensional spaces. SC can be classified into hard subspace clustering (HSC) and soft subspace clustering (SSC). While HSC algorithms have been extensively studied and are well accepted by the scientific community, SSC algorithms are relatively new but have gained more attention in recent years due to their better adaptability. In this paper, a comprehensive survey of existing SSC algorithms and their recent developments is presented. The SSC algorithms are classified systematically into three main categories, namely conventional SSC (CSSC), independent SSC (ISSC) and extended SSC (XSSC). The characteristics of these algorithms are highlighted, and potential future developments of SSC are also discussed. Comment: This paper has been published in Information Sciences Journal in 201
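    What makes subspace clustering "soft" is that each cluster gets a continuous weight per feature instead of a hard choice of dimensions. As one example of the idea (not a summary of the survey's taxonomy), the sketch shows the feature-weight step of entropy-weighting k-means (EWKM), where w_j ∝ exp(−D_j/γ) and D_j is the within-cluster dispersion along feature j. The function name and the default γ = 1 are assumptions.

```python
import math


def entropy_feature_weights(cluster_points, center, gamma=1.0):
    """EWKM-style weights for one cluster: w_j proportional to exp(-D_j / gamma).

    D_j is the within-cluster dispersion along feature j; the weights sum to 1.
    """
    dims = len(center)
    disp = [sum((x[j] - center[j]) ** 2 for x in cluster_points) for j in range(dims)]
    shift = min(disp)  # subtracting a constant leaves the normalized weights unchanged
    expo = [math.exp(-(dj - shift) / gamma) for dj in disp]
    total = sum(expo)
    return [e / total for e in expo]
```

    Features along which the cluster is compact receive most of the weight, so each cluster effectively lives in its own weighted subspace; a larger γ spreads the weights more evenly across features.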

    Laser ablation aerosol particle time-of-flight mass spectrometer (LAAPTOF): performance, reference spectra and classification of atmospheric samples

    The laser ablation aerosol particle time-of-flight mass spectrometer (LAAPTOF, AeroMegt GmbH) is able to identify the chemical composition and mixing state of individual aerosol particles, and is thus a tool for elucidating their impacts on human health, visibility, ecosystems, and climate. The overall detection efficiency (ODE) of the instrument we use was determined to range from ∼(0.01 ± 0.01)% to ∼(4.23 ± 2.36)% for polystyrene latex (PSL) particles in the size range of 200 to 2000 nm, from ∼(0.44 ± 0.19)% to ∼(6.57 ± 2.38)% for ammonium nitrate (NH₄NO₃), and from ∼(0.14 ± 0.02)% to ∼(1.46 ± 0.08)% for sodium chloride (NaCl) particles in the size range of 300 to 1000 nm. Reference mass spectra of 32 different particle types relevant for atmospheric aerosol were determined (e.g. pure compounds such as NH₄NO₃, K₂SO₄, NaCl, oxalic acid, pinic acid, and pinonic acid; internal mixtures of e.g. salts, secondary organic aerosol, and metallic core–organic shell particles; and more complex particles such as soot and dust particles). Our results show that internally mixed aerosol particles can produce spectra with new clusters of ions, rather than simply a combination of the spectra of the single components. An exemplary 1-day ambient data set was analysed by both classical fuzzy clustering and a reference-spectra-based classification method. The particle types identified by the two methods were generally well correlated. We show how a combination of both methods can greatly improve the interpretation of single-particle data in field measurements
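    For readers unfamiliar with "classical fuzzy clustering", a minimal sketch of standard fuzzy c-means follows: memberships u_k = 1 / Σ_j (d_k/d_j)^(2/(m−1)) alternate with centers weighted by u^m. In the single-particle setting each point would be a (suitably normalized) mass spectrum vector. The fuzzifier m = 2, the fixed iteration count, and the function names are assumptions; whether the instrument's analysis software uses exactly this variant is not stated in the abstract.

```python
import math


def fcm_memberships(x, centers, m=2.0):
    """Fuzzy c-means memberships of one point: u_k = 1 / sum_j (d_k / d_j)**(2/(m-1))."""
    d = [math.dist(x, c) for c in centers]
    hits = [k for k, dk in enumerate(d) if dk == 0.0]
    if hits:  # the point sits on one or more centers: share membership among them
        return [1.0 / len(hits) if k in hits else 0.0 for k in range(len(centers))]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dj) ** power for dj in d) for dk in d]


def fcm(points, centers, m=2.0, iters=50):
    """Alternate membership and center updates for a fixed number of iterations."""
    dims = len(points[0])
    for _ in range(iters):
        memberships = [fcm_memberships(x, centers, m) for x in points]
        new_centers = []
        for k in range(len(centers)):
            w = [u[k] ** m for u in memberships]  # fuzzified weights u_ik^m
            total = sum(w)
            new_centers.append(tuple(
                sum(wi * x[j] for wi, x in zip(w, points)) / total for j in range(dims)
            ))
        centers = new_centers
    # Recompute memberships so they match the returned centers.
    return centers, [fcm_memberships(x, centers, m) for x in points]
```

    Unlike hard clustering, every particle keeps a graded membership in each class, which is convenient when internally mixed particles produce spectra that do not match any single reference type.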

    Model of early stage intermediate in respect to its final structure

    The model, which describes a method for determining the structure of an early intermediate in the protein folding process from an analysis of non-redundant PDB protein databases, allows the relationship between tetrapeptide sequences and their structural forms, expressed as structural codes, to be determined. The contingency table expressing this relationship can be used to predict the structure of polypeptides by proposing a structural form with a precision limited to the structural code. However, by analyzing the structural forms in native proteins based on the fuzzy oil drop model, one can also determine the status of polypeptide chain fragments with respect to the assumptions of this model. This study investigated whether the probability distributions for compliant and noncompliant forms were similar, or whether the tetrapeptide sequences showed differences at the level of the set of structural codes. The analysis presented here indicated that some sequences in the two forms revealed differences in their probability distributions, expressed as a negative, statistically significant correlation coefficient. This means that the identified sections (tetrapeptides) took different forms with respect to the fuzzy oil drop model. It may suggest that information about the final status with respect to hydrophobic core formation is already carried by the structure of the early-stage intermediate
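    The comparison described above reduces to correlating two probability distributions over structural codes, one for fuzzy-oil-drop-compliant and one for noncompliant occurrences of a tetrapeptide. A minimal sketch, assuming the inputs are hypothetical lists of counts per structural code taken from the contingency table: normalize them, compute the Pearson correlation coefficient, and form the usual t statistic t = r·√((n − 2)/(1 − r²)). Turning t into a p-value requires Student's t distribution with n − 2 degrees of freedom, which is not shown here; none of these names come from the source.

```python
import math


def normalize(counts):
    """Turn raw counts per structural code into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def pearson(p, q):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(p)
    mp, mq = sum(p) / n, sum(q) / n
    cov = sum((a - mp) * (b - mq) for a, b in zip(p, q))
    sp = math.sqrt(sum((a - mp) ** 2 for a in p))
    sq = math.sqrt(sum((b - mq) ** 2 for b in q))
    if sp == 0 or sq == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / (sp * sq)


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r**2)), to be compared with Student's t, n - 2 df."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))
```

    A strongly negative r means that structural codes which are frequent for the compliant form tend to be rare for the noncompliant form, which is the kind of difference the abstract reports.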