925 research outputs found

    OctaSOM - An octagonal based SOM lattice structure for biomedical problems

    In this study, an octagonal lattice structure for self-organizing networks is proposed to allow more exploration and exploitation when updating the weights, for better mapping and classification performance. The neighborhood of the octagonal lattice provides more nodes for weight updating than the standard hexagonal lattice. In our experiments, the octagonal lattice performed better than the standard hexagonal lattice on biomedical datasets for classification problems. This indicates that the proposed algorithm is an alternative lattice structure for self-organizing networks that can benefit classification problems, especially in the biomedical domain.
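As a rough illustration of why a larger neighborhood touches more nodes per update, the sketch below compares immediate neighbors on two grid layouts. It assumes that the octagonal lattice can be approximated by the 8-connected neighborhood of a rectangular grid and that the hexagonal lattice is stored in "odd-r" offset coordinates; both choices, and the function names, are illustrative assumptions rather than the paper's definitions.

```python
# Immediate-neighbour lookups on a rows x cols SOM grid.
# Both layouts are illustrative assumptions, not the OctaSOM paper's code.

def hex_neighbors(row, col, rows, cols):
    """Up to six neighbours on a hexagonal lattice ('odd-r' offset layout)."""
    if row % 2 == 0:
        offsets = [(0, -1), (0, 1), (-1, -1), (-1, 0), (1, -1), (1, 0)]
    else:
        offsets = [(0, -1), (0, 1), (-1, 0), (-1, 1), (1, 0), (1, 1)]
    return [(row + dr, col + dc) for dr, dc in offsets
            if 0 <= row + dr < rows and 0 <= col + dc < cols]


def oct_neighbors(row, col, rows, cols):
    """Up to eight neighbours: the 8-connected neighbourhood of a square grid."""
    offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
               if (dr, dc) != (0, 0)]
    return [(row + dr, col + dc) for dr, dc in offsets
            if 0 <= row + dr < rows and 0 <= col + dc < cols]


if __name__ == "__main__":
    # An interior node of a 5 x 5 map: 6 neighbours versus 8.
    print(len(hex_neighbors(2, 2, 5, 5)), len(oct_neighbors(2, 2, 5, 5)))
```

Under these assumptions, each winning node pulls eight immediate neighbors toward the input instead of six, which is the sense in which the larger neighborhood offers "more nodes" for weight updating.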

    Projection-Based Clustering through Self-Organization and Swarm Intelligence

    It covers aspects of unsupervised machine learning used for knowledge discovery in data science and introduces a data-driven approach to cluster analysis, the Databionic swarm (DBS). DBS consists of the 3D landscape visualization and clustering of data. The 3D landscape enables 3D printing of high-dimensional data structures. The clustering, the number of clusters, or the absence of any cluster structure can be verified at a glance from the 3D landscape. DBS is the first swarm-based technique that shows emergent properties while exploiting concepts of swarm intelligence, self-organization, and the Nash equilibrium concept from game theory. This eliminates the need for a global objective function and for setting parameters. Once the R package is downloaded, DBS can be applied to data drawn from diverse research fields, even by non-professionals in the field of data mining.

    Some Clustering Methods, Algorithms and their Applications

    Clustering is a type of unsupervised learning [15]. In an unsupervised learning task, no target values ("supervisors") are known, so the goal is to learn from the inputs themselves. Clustering is central to data mining and machine learning. Categorizing datasets according to their similarities can, for example, make predictions of user behavior more accurate. The purpose of this research is to compare and contrast three widely used data-clustering methods. Clustering techniques include partitioning, hierarchical, density-based, grid-based, and fuzzy clustering. Machine learning, data mining, pattern recognition, image analysis, and bioinformatics are just a few of the many fields where clustering is used as an analytical technique. In addition to defining the various algorithms, specialized forms of cluster analysis, and linking methods, we offer a review of the clustering techniques used in the big data setting.

    Projection-Based Clustering through Self-Organization and Swarm Intelligence: Combining Cluster Analysis with the Visualization of High-Dimensional Data

    Cluster Analysis; Dimensionality Reduction; Swarm Intelligence; Visualization; Unsupervised Machine Learning; Data Science; Knowledge Discovery; 3D Printing; Self-Organization; Emergence; Game Theory; Advanced Analytics; High-Dimensional Data; Multivariate Data; Analysis of Structured Dat

    Swarm-Organized Topographic Mapping

    Topography-preserving mappings attempt to map high-dimensional or complex data sets onto a low-dimensional output space while reproducing the topography of the data sufficiently well. The quality of such a mapping usually depends on the neighborhood concept used by the constructing algorithm. Swarm-Organized Projection offers a solution to this parameterization problem by using techniques from swarm intelligence. The practical applicability of this methodology was demonstrated by two applications in the fields of molecular biology and financial analytics.

    Advanced and novel modeling techniques for simulation, optimization and monitoring chemical engineering tasks with refinery and petrochemical unit applications

    Engineers predict, optimize, and monitor processes to improve safety and profitability. Models automate these tasks and determine precise solutions. This research studies and applies advanced and novel modeling techniques to automate and aid engineering decision-making. Advancements in computational ability have improved the ability of modeling software to mimic industrial problems. Simulations are increasingly used to explore new operating regimes and design new processes. In this work, we present a methodology for creating structured mathematical models, useful tips to simplify models, and a novel repair method that improves convergence by populating quality initial conditions for the simulation's solver. A crude oil refinery application is presented, including simulation, simplification tips, and the implementation of the repair strategy. A crude oil scheduling problem is also presented, which can be integrated with production unit models. Recently, stochastic global optimization (SGO) has been shown to succeed in finding global optima of complex nonlinear processes. When performing SGO on simulations, model convergence can become an issue. The computational load can be decreased by 1) simplifying the model and 2) finding a synergy between the model solver repair strategy and the optimization routine by using the formulated initial conditions as points to perturb the neighborhood being searched. Here, a simplifying technique for merging the crude oil scheduling problem and the vertically integrated online refinery production optimization is demonstrated. To optimize the refinery production, a stochastic global optimization technique is employed. Process monitoring has been vastly enhanced through the data-driven modeling technique Principal Component Analysis. As opposed to first-principle models, which make assumptions about the structure of the model describing the process, data-driven techniques make no assumptions about the underlying relationships.
Data-driven techniques search for a projection that maps the data into a space that is easier to analyze. Feature extraction techniques, commonly dimensionality reduction techniques, have been explored intensively to better capture nonlinear relationships. These techniques can extend the process-monitoring use of data-driven modeling to nonlinear processes. Here, we employ a novel nonlinear process-monitoring scheme that uses Self-Organizing Maps. The novel techniques and implementation methodology are applied to the publicly studied Tennessee Eastman Process and an industrial polymerization unit.

    Cluster analysis for outlier detection : A case study of applying unsupervised machine learning on diesel engine data

    With the advent of modern data-driven methods, engine manufacturers and maintainers are attempting to pivot from corrective to predictive maintenance. One way to achieve this goal is to install sensors on the engine and look for anomalies in the data patterns it produces. Companies such as Wärtsilä that provide condition monitoring services use the Fast Fourier Transform to manually look for anomalies in the data. The Edge-project is an industrial research project involving institutions such as universities and private companies, with the goal of developing technical solutions and edge analytics for autonomous devices and vessels. Several papers and theses have been written as a result of the project, using techniques such as autoencoders to perform anomaly detection on data produced by sensors on a diesel engine. This thesis explores the use of cluster analysis for anomaly detection on diesel engine data from the Edge-project. Clusters could potentially represent different states of the running engine, with anomalies being represented, e.g., by data points far away from cluster centroids or by data points not belonging to any particular cluster. K-means, DBSCAN, and spectral clustering are used for assigning clusters, with the silhouette coefficient and the eigengap used as hyperparameter tuning heuristics. Distance from cluster centroids and reduced kernel density estimation are used to flag anomalies. t-SNE and Self-Organizing Maps are used as dimensionality reduction techniques to visualize the data in a 3-dimensional and a 2-dimensional space, respectively. Results show that which data points are flagged as anomalies is highly sensitive to the choice of algorithm and hyperparameters. The different results suggest different data as anomaly candidates. Therefore, further evaluation by subject matter experts is needed to determine which of the models provides the most interesting results.
Further work could include building an ensemble model that combines the approaches used, which could flag certain areas of the data space as being at high risk of containing anomalies.
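
The idea of flagging points that lie far from their cluster centroid can be sketched in a few lines. This is a minimal illustration only, not the thesis's code: it uses a plain Lloyd's-algorithm K-means with caller-supplied starting centroids, and the flagging rule (distance greater than `factor` times the median distance) along with all function and parameter names are assumptions made for the example.

```python
import math
import statistics


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]


def kmeans(points, init_centroids, iters=100):
    """Plain Lloyd's algorithm with caller-supplied starting centroids."""
    centroids = [tuple(c) for c in init_centroids]
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            updated.append(tuple(sum(c) / len(members) for c in zip(*members))
                           if members else old)
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


def flag_anomalies(points, centroids, labels, factor=3.0):
    """Indices of points whose centroid distance exceeds factor * median (assumed rule)."""
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    threshold = factor * statistics.median(dists)
    return [i for i, d in enumerate(dists) if d > threshold]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 5.0),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    centroids, labels = kmeans(data, [(0, 0), (10, 10)])
    print(flag_anomalies(data, centroids, labels))
```

Changing `factor` or the starting centroids changes which points are flagged, which mirrors the sensitivity to algorithm and hyperparameter choices reported in the abstract.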