1,851 research outputs found

    Weighting Policies for Robust Unsupervised Ensemble Learning

    Unsupervised ensemble learning, or consensus clustering, consists of finding the optimal combination strategy of individual partitions that is robust to the selection of the algorithmic clustering pool. Despite its strong properties, this approach assigns the same weight to the contribution of each clustering to the final solution. We propose a weighting policy for this problem that is based on internal clustering quality measures and compare it against other modern approaches. Results on publicly available datasets show that weights can significantly improve accuracy while retaining the robustness properties. Because determining an appropriate number of clusters, a primary input for many clustering methods, is a significant challenge, we use the same methodology to predict the correct or most suitable number of clusters as well. Using internal validity indexes in conjunction with a suitable algorithm is one of the most popular ways to determine the appropriate number of clusters. Thus, we use weighted consensus clustering along with four indexes: Silhouette (SH), Calinski-Harabasz (CH), Davies-Bouldin (DB), and Consensus (CI). Our experiments indicate that weighted consensus clustering together with the chosen indexes is a useful method for determining the correct or most appropriate number of clusters in comparison to individual clustering methods (e.g., k-means) and consensus clustering. Lastly, to decrease the variance of the proposed weighted consensus clustering, we borrow the core idea of Markowitz portfolio theory and apply it to the clustering domain. We aim to optimize the combination of individual clustering methods to minimize the variance of clustering accuracy. This is a new weighting policy that produces partitions with lower variance, which might be crucial for a decision maker. Our study shows that applying the idea of Markowitz portfolio theory creates a partition with less variation than traditional consensus clustering and the proposed weighted consensus clustering.
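To make the idea of weighting base clusterings concrete, the following is a minimal sketch of a weighted co-association matrix. It is not the authors' implementation: the function names, the toy partitions, the made-up quality weights, and the threshold-based consensus step are all assumptions chosen for illustration. In practice the weights would come from an internal validity index such as the Silhouette score.

```python
def weighted_coassociation(partitions, weights):
    """Return the n x n weighted co-association matrix.

    partitions: list of label lists, one per base clustering.
    weights: one non-negative weight per partition (hypothetical quality scores).
    """
    n = len(partitions[0])
    total = sum(weights)
    matrix = [[0.0] * n for _ in range(n)]
    for labels, w in zip(partitions, weights):
        for i in range(n):
            for j in range(n):
                # Two points co-occur if this base clustering groups them together.
                if labels[i] == labels[j]:
                    matrix[i][j] += w / total
    return matrix


def consensus_labels(matrix, threshold=0.5):
    """Label connected components of the graph whose edges are pairs with
    weighted co-association above the threshold (a simple stand-in for a
    consensus function, assumed here for illustration)."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three base clusterings of five points; the weights are made-up quality scores.
partitions = [
    [0, 0, 0, 1, 1],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1],
]
weights = [0.6, 0.3, 0.1]
matrix = weighted_coassociation(partitions, weights)
labels = consensus_labels(matrix)
```

With equal weights this reduces to the conventional co-association matrix; unequal weights let higher-quality partitions dominate the pairwise evidence, so the low-weight third partition above does not pull the third point into the second cluster.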

    Meta-optimizations for Cluster Analysis

    This dissertation thesis deals with advances in the automation of cluster analysis.

    Hybrid Dy-NFIS & RLS equalization for ZCC code in optical-CDMA over multi-mode optical fiber

    For long-haul coherent optical fiber communication systems, it is important to precisely monitor the quality of transmission links and optical signals. Channel capacity beyond the Shannon limit of single-mode optical fiber (SMOF) is achieved with the help of multi-mode optical fiber (MMOF), where the signal is multiplexed in different spatial modes. To increase single-mode transmission capacity and to avoid a foreseen “capacity crunch”, researchers have been motivated to employ MMOF as an alternative. Furthermore, different multiplexing techniques can be applied in MMOF to improve the communication system. One of these techniques is Optical Code Division Multiple Access (Optical-CDMA), which simplifies and decentralizes network controls to improve spectral efficiency and information security, increasing flexibility in bandwidth granularity. This technique also allows a transmission medium to be shared synchronously and simultaneously by many users. However, during the propagation of data over Optical-CDMA-based MMOF, pulse dispersion, nonlinearity, and multiple-access interference (MAI) due to mode coupling are inevitable. Moreover, these effects are significant factors in evaluating the performance of high-speed MMOF communication systems based on Optical-CDMA. This work proposes a hybrid equalization algorithm that combines a nonlinear algorithm (dynamic evolving neural fuzzy inference, Dy-NFIS) with a linear algorithm (recursive least squares, RLS) for the ZCC code in Optical-CDMA over MMOF. Root mean squared error (RMSE), mean squared error (MSE), and structural similarity index (SSIM) are used to measure performance.
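As an illustration of the linear half of such a hybrid, here is a minimal sketch of a standard RLS adaptive equalizer. It does not reproduce the paper's method: the Dy-NFIS stage is omitted, and the toy single-pole dispersive channel, the function name `rls_equalize`, and the parameter values (`taps`, `lam`, `delta`) are assumptions chosen so that a short FIR equalizer can invert the channel exactly.

```python
import random


def rls_equalize(received, desired, taps=3, lam=0.99, delta=100.0):
    """Train an RLS linear equalizer; return final weights and a-priori errors.

    lam is the forgetting factor; the inverse correlation matrix starts at delta * I.
    """
    w = [0.0] * taps
    P = [[delta if i == j else 0.0 for j in range(taps)] for i in range(taps)]
    errors = []
    for n in range(len(received)):
        # Regressor: current and previous received samples, zero-padded at the start.
        x = [received[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        Px = [sum(P[i][j] * x[j] for j in range(taps)) for i in range(taps)]
        denom = lam + sum(x[i] * Px[i] for i in range(taps))
        gain = [v / denom for v in Px]
        # A-priori error between the training symbol and the equalizer output.
        e = desired[n] - sum(w[i] * x[i] for i in range(taps))
        w = [w[i] + gain[i] * e for i in range(taps)]
        # P <- (P - gain * x^T P) / lam, using x^T P = (P x)^T because P is symmetric.
        P = [[(P[i][j] - gain[i] * Px[j]) / lam for j in range(taps)]
             for i in range(taps)]
        errors.append(e)
    return w, errors


# Toy channel (an assumption): y[n] = s[n] + 0.5 * y[n-1], so the exact
# equalizer is s[n] = y[n] - 0.5 * y[n-1], i.e. weights close to [1, -0.5, 0].
rng = random.Random(0)
symbols = [rng.choice([-1.0, 1.0]) for _ in range(300)]
received = []
prev = 0.0
for s in symbols:
    prev = s + 0.5 * prev
    received.append(prev)

w_hat, errors = rls_equalize(received, symbols)
```

On this noise-free toy channel the learned weights approach the exact inverse filter; a real MMOF link with nonlinearity and MAI would not be invertible by a linear filter alone, which is the motivation the abstract gives for pairing RLS with a nonlinear stage.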

    Novel applications of spectroscopy to characterize soil variation

    This thesis embodies a collection of novel studies on the use of multivariate information provided by spectroscopic tools such as Visible and Near Infrared (Vis-NIR) spectrometers to represent soil variation. The general structure follows increasing levels of soil complexity, starting from the characterization of soil aggregates and the identification of soil colloids, to the recognition of soil horizons and their boundaries in the soil profile, and finally to the depiction of the distribution of soil types in the landscape. Briefly, Chapter 1 is written as a rationale, emphasising the need for up-to-date methodologies for making effective use of the increasing amount of soil information produced worldwide. Chapter 2 presents the development of a new methodology for measuring soil aggregate stability and the further use of spectroscopic information to predict its values. Chapter 3 gives examples of the use of Vis-NIR spectral libraries for the prediction of soil properties. Chapter 4 presents the development of a new method for the identification of soil horizons and their boundaries using fuzzy clustering of Vis-NIR spectra. Chapter 5 expands into a new way of measuring the diversity of soils in the landscape, introducing two new indices for measuring soil diversity, or “Functional Pedodiversity”, inspired by previous studies in Functional Ecology. Finally, Chapter 6 discusses the main findings of this thesis and foresees issues, challenges and opportunities in the area of spectroscopy and multivariate soil data analysis.

    LinkCluE: A MATLAB Package for Link-Based Cluster Ensembles

    Cluster ensembles have emerged as a powerful meta-learning paradigm that provides improved accuracy and robustness by aggregating several input data clusterings. In particular, link-based similarity methods have recently been introduced with superior performance to the conventional co-association approach. This paper presents a MATLAB package, LinkCluE, that implements the link-based cluster ensemble framework. A variety of functional methods for evaluating clustering results, based on both internal and external criteria, are also provided. Additionally, the underlying algorithms together with the sample uses of the package with interesting real and synthetic datasets are demonstrated herein.

    Measuring, Modeling, and Evaluating the Spatial Properties of Northeast Oregon Forests Using Unmanned Aerial Systems

    There is an ever-expanding range of applications for the aerial images that unmanned aerial systems can uniquely provide. One such application is the use of high-resolution imagery for stand-level forest inventory. Inventory techniques utilizing unmanned aerial systems could be attractive where conditions demand high-resolution data, or where other aerial imagery sources are cost prohibitive. Here the effectiveness of unmanned aerial systems in this application was tested. Over the summer of 2015, a remote-controlled hexacopter equipped with a micro four thirds camera was flown over multiple 1600-square-meter forested plots in Eastern Oregon. Additional ground-level validation measurements were collected, including stem location, crown radius, and tree height. Agisoft Photoscan was used to construct 3-D point clouds, which then allowed the production of digital surface models of the stands. The first section of this project assesses the accuracy of stem locations derived from segmented imagery. The next section evaluates the accuracy of estimates for tree height, crown radius, and diameter at breast height. In the final section, various spatial metrics such as stand contagion and species mingling were compared with more commonly used metrics to see if significant correlations emerged. The methods used did not yield sufficiently accurate estimates for stem location or the various forest biometrics. Yet this work revealed stand density to be a significant influence on model accuracy. Finally, stand density and species diversity were found to be well correlated with the nearest neighbor and species mingling indexes, respectively, potentially supporting a complementary relationship indicating the clustering of various factors within the stand.

    Biomedical Image Processing and Classification

    Biomedical image processing is an interdisciplinary field involving a variety of disciplines, e.g., electronics, computer science, physics, mathematics, physiology, and medicine. Several imaging techniques have been developed, providing many approaches to the study of the human body. Biomedical image processing is finding an increasing number of important applications in, for example, the study of the internal structure or function of an organ and the diagnosis or treatment of a disease. If associated with classification methods, it can support the development of computer-aided diagnosis (CAD) systems, which could help medical doctors in refining their clinical picture

    Machine Learning and Data Mining Applications in Power Systems

    This Special Issue was intended as a forum to advance research and apply machine-learning and data-mining methods to facilitate the development of modern electric power systems, grids and devices, and smart grids and protection devices, as well as to develop tools for more accurate and efficient power system analysis. Conventional signal processing is no longer adequate to extract all the relevant information from distorted signals through filtering, estimation, and detection to facilitate decision-making and control actions. Machine learning algorithms, optimization techniques, efficient numerical algorithms, distributed signal processing, data mining, and statistical signal detection and estimation may help to solve contemporary challenges in modern power systems. The increased use of digital information and control technology can improve the grid’s reliability, security, and efficiency; the dynamic optimization of grid operations; demand response; the incorporation of demand-side resources and integration of energy-efficient resources; distribution automation; and the integration of smart appliances and consumer devices. Signal processing offers the tools needed to convert measurement data to information, and to transform information into actionable intelligence. This Special Issue includes fifteen articles, authored by international research teams from several countries.

    Validação de heterogeneidade estrutural em dados de Crio-ME por comitês de agrupadores (Validation of structural heterogeneity in Cryo-EM data by cluster ensembles)

    Advisors: Fernando José Von Zuben, Rodrigo Villares Portugal. Dissertation (master's), Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação. Abstract: Single Particle Analysis is a technique that allows the study of the three-dimensional structure of proteins and other macromolecular assemblies of biological interest. Its primary data consists of transmission electron microscopy images from multiple copies of the molecule in random orientations. Such images are very noisy due to the low electron dose employed. Reconstruction of the macromolecule can be obtained by averaging many images of particles in similar orientations and estimating their relative angles. However, heterogeneous conformational states often co-exist in the sample, because the molecular complexes can be flexible and may also interact with other particles. Heterogeneity poses a challenge to the reconstruction of reliable 3D models and degrades their resolution. Among the most popular algorithms used for structural classification are k-means clustering, hierarchical clustering, self-organizing maps and maximum-likelihood estimators. Such approaches are usually interlaced with the reconstruction of the 3D models. Nevertheless, recent work indicates that it is possible to infer information about the structure of the molecules directly from the dataset of 2D projections. Among these findings is the relationship between structural variability and manifolds in a multidimensional feature space. This dissertation investigates whether an ensemble of unsupervised classification algorithms is able to separate these "conformational manifolds". Ensemble or "consensus" methods tend to provide more accurate classification and may achieve satisfactory performance across a wide range of datasets, when compared with individual algorithms. We investigate the behavior of six clustering algorithms both individually and combined in ensembles for the task of structural heterogeneity classification. The approach was tested on synthetic and real datasets containing a mixture of projection images of the Mm-cpn chaperonin in the "open" and "closed" states. It is shown that cluster ensembles can provide useful information in validating the structural partitionings independently of 3D reconstruction methods. Master's program: Computer Engineering. Degree: Master of Electrical Engineering.