60 research outputs found

    Image inpainting based on self-organizing maps by using multi-agent implementation

    Image inpainting is a well-known task in visual editing. However, its efficiency strongly depends on the size and textural neighborhood of the “missing” area. Various methods of image inpainting exist, among which the Kohonen Self-Organizing Map (SOM) network, as a means of unsupervised learning, is widely used. The weaknesses of the Kohonen SOM network, such as the need to tune algorithm parameters and its low computational speed, motivated the application of a multi-agent system with multi-mapping capability and parallel processing by identical agents. Experiments showed that preliminary image segmentation and the creation of a SOM for each type of homogeneous texture provide better results than the classical SOM application. The optimal number of inpainting agents was also determined. The quality of inpainting was estimated by several metrics, and good results were obtained on complex images

    Individuals tell a fascinating story: using unsupervised text mining methods to cluster policyholders based on their medical history

    Background and objective: Classifying people according to their health profile is crucial in order to propose appropriate treatment. However, the medical diagnosis is sometimes not available. This is, for example, the case in health insurance, which makes proposing custom prevention plans difficult. In such cases, an unsupervised clustering method is needed. This article compares three different methods by adapting text mining methods to the field of health insurance. A new clustering stability measure is also proposed in order to compare the stability of the tested processes. Methods: Nonnegative Matrix Factorization, the word2vec method, and marginalized Stacked Denoising Autoencoders are used and compared in order to create a high-quality input for a clustering method. A self-organizing map is then used to obtain the final clustering. A real health insurance database is used to test the methods. Results: The marginalized Stacked Denoising Autoencoder outperforms the other methods in both stability and result quality on our data. Conclusions: The use of text mining methods offers several possibilities for understanding the context of any medical act. On a medical database, the process could reveal unexpected correlations between treatments and, thus, pathologies. Moreover, this kind of method could exploit the refund dates contained in the data, but the tested method that uses temporality, word2vec, still needs to be improved, since its results, although satisfactory, are not as good as those of the other methods

    Analyzing international travelers' profile with self-organizing maps

    It is generally agreed that knowledge is the most valuable asset to an organization. Knowledge enables a business to effectively compete with its competitors. In the tourism context, an in-depth knowledge of the profile of international travelers to a destination has become a crucial factor for decision makers to formulate their business strategies and better serve their customers. In this research, a self-organizing map (SOM) network was used for segmenting international travelers to Hong Kong, a major travel destination in Asia. An association rules discovery algorithm is then utilized to automatically characterize the profile of each segment. The resulting maps serve as a visual analysis tool for tourism managers to better understand the characteristics, motivations, and behaviors of international travelers

    Health-policyholder clustering using health consumption: a useful tool for targeting prevention plans

    On paper, prevention appears to be a good complement to health insurance. However, its implementation is often costly. To maximize the impact and efficiency of prevention plans, these should target particular groups of policyholders. In this article, we propose a way of clustering policyholders that could be a starting point for the targeting of prevention plans. This two-step method classifies policyholders mainly by their health consumption. The dimension of this data is first reduced using a nonnegative matrix factorization algorithm, producing intermediate health-product clusters. We then cluster using Kohonen's map algorithm. This leads to a natural visualization of the results, allowing the simple comparison of results from different databases. We apply our method to two real health-insurer datasets. We carry out a number of tests (including tests on a text-mining database) of method stability and clustering ability. The method is shown to be stable, easily understandable, and able to cluster most policyholders efficiently
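The dimension-reduction step named in this abstract can be illustrated with a minimal sketch. It assumes the classical multiplicative-update rules for nonnegative matrix factorization; the abstract does not say which NMF variant the authors used. The function names (`nmf`, `frobenius_error`), the rank `k`, and the toy policyholder-by-product matrix in the usage note are hypothetical and chosen only for illustration.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Frobenius norm of V - WH, used to judge the quality of the factorization."""
    wh = matmul(w, h)
    return sum((a - b) ** 2 for rv, rw in zip(v, wh) for a, b in zip(rv, rw)) ** 0.5

def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factor a nonnegative n x m matrix V into W (n x k) and H (k x m).

    Assumption: multiplicative updates are used here for simplicity;
    this is not necessarily the algorithm used in the article.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random initialisation keeps the updates well defined
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h
```

With a hypothetical consumption matrix such as `V = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 3, 4], [0, 1, 4, 3]]` (rows are policyholders, columns are health products), `W, H = nmf(V, 2)` gives each policyholder a two-dimensional row of `W`. Under the two-step scheme described above, those rows, rather than the raw consumption counts, would be the input to the Kohonen map.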

    Challenges and prospects of spatial machine learning

    The main objective of this thesis is to improve the usefulness of spatial machine learning for the spatial sciences and to allow its unused potential to be exploited. To achieve this objective, this thesis addresses several important but distinct challenges which spatial machine learning is facing. These are the modeling of spatial autocorrelation and spatial heterogeneity, the selection of an appropriate model for a given spatial problem, and the understanding of complex spatial machine learning models

    Data exploration process based on the self-organizing map

    With the advances in computer technology, the amount of data that is obtained from various sources and stored in electronic media is growing at exponential rates. Data mining is a research area which addresses the challenge of analysing this data in order to find the useful information it contains. The Self-Organizing Map (SOM) is one of the methods used in data mining. It quantizes the training data into a representative set of prototype vectors and maps them onto a low-dimensional grid. The SOM is a prominent tool in the initial exploratory phase of data mining. The thesis consists of an introduction and ten publications. In the publications, the validity of SOM-based data exploration methods has been investigated and various enhancements to them have been proposed. In the introduction, these methods are presented as parts of the data mining process, and they are compared with other data exploration methods with similar aims. The work makes two primary contributions. Firstly, it has been shown that the SOM provides a versatile platform on top of which various data exploration methods can be efficiently constructed. New methods and measures for visualization of data, clustering, cluster characterization, and quantization have been proposed. The SOM algorithm and the proposed methods and measures have been implemented as a set of Matlab routines in the SOM Toolbox software library. Secondly, a framework for SOM-based data exploration of table-format data, both single tables and hierarchically organized tables, has been constructed. The framework divides exploratory data analysis into several sub-tasks, most notably the analysis of samples and the analysis of variables. The analysis methods are applied autonomously and their results are provided in a report describing the most important properties of the data manifold.
    In such a framework, the attention of the data miner can be directed more towards the actual data exploration task than towards the application of the analysis methods. Because of the highly iterative nature of data exploration, the automation of routine analysis tasks can considerably reduce the time needed by the data exploration process.
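The abstract above describes the SOM as quantizing the training data into prototype vectors arranged on a low-dimensional grid. The following is a minimal sketch of that idea in plain Python, assuming a rectangular grid, a Gaussian neighbourhood, and linearly decaying learning rate and radius. It is not the SOM Toolbox implementation mentioned in the abstract, and all names and default parameters (`train_som`, `best_matching_unit`, `rows`, `cols`, `lr0`, `sigma0`) are hypothetical.

```python
import math
import random

def best_matching_unit(protos, x):
    """Return the grid coordinate whose prototype is closest to sample x."""
    return min(protos, key=lambda u: sum((a - b) ** 2 for a, b in zip(protos[u], x)))

def train_som(data, rows=4, cols=4, epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small SOM; returns a dict mapping (row, col) to a prototype vector."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise every grid unit with a randomly chosen training sample
    protos = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays to zero
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks, floored at 0.5
            bmu = best_matching_unit(protos, x)
            for (r, c), w in protos.items():
                # Units close to the winner on the grid are pulled more strongly towards x
                grid_d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return protos
```

After training, `best_matching_unit(protos, x)` maps any sample to a grid cell, which is what makes the map usable for the visualization and clustering tasks the thesis describes. Because the grid neighbourhood couples adjacent units, similar samples tend to land in nearby cells.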

    Using data analysis to approximate fastest paths on urban networks

    Estimating shortest paths on large networks is a crucial problem for dynamic route guidance systems. The present paper proposes a statistical approach for approximating fastest paths on urban networks. The network data for statistical analysis is generated using simulation software based on macroscopic traffic flow. The inputs to the software are the input flows and the arc loads (the number of cars on each arc), and the outputs are the various paths joining the origins and the destinations of the network. The network data obtained from the simulation software is subjected to hybrid clustering followed by canonical correlation analysis. The hybrid clustering comprises two methods, namely k-means and Ward's hierarchical agglomerative clustering. The results of the data analysis are decision rules containing arc loads and input flows that govern the fastest paths on the network. These rules are used for predicting the paths to follow on arrival at the entrances of the network. Before entering the network, the arc loads and input flows provided by the rules are checked inside the network. If agreement is found, then the path obtained from the data analysis is the fastest path; otherwise the shortest path is chosen as the fastest path
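The abstract names k-means and Ward's agglomerative clustering as the two parts of the hybrid clustering but does not say how they are combined. The sketch below assumes one common arrangement: k-means first compresses the data into a moderate number of centroids, and Ward's criterion then merges those centroids, weighted by cluster size, down to the final number of clusters. The function names and parameters are hypothetical, and the paper's actual procedure may differ.

```python
import random

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    return [sum(c) / len(points) for c in zip(*points)]

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's k-means; returns a list of (centroid, size) for non-empty clusters."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sqdist(p, centers[i]))
            groups[nearest].append(p)
        # Keep the old center when a group ends up empty
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return [(c, len(g)) for c, g in zip(centers, groups) if g]

def ward_merge(clusters, n_final):
    """Merge (centroid, size) clusters pairwise by Ward's criterion until n_final remain."""
    clusters = list(clusters)
    while len(clusters) > n_final:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (ci, ni), (cj, nj) = clusters[i], clusters[j]
                # Increase in within-cluster sum of squares caused by merging i and j
                cost = ni * nj / (ni + nj) * sqdist(ci, cj)
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        (ci, ni), (cj, nj) = clusters[i], clusters[j]
        # The merged centroid is the size-weighted average of the two centroids
        merged = ([(ni * a + nj * b) / (ni + nj) for a, b in zip(ci, cj)], ni + nj)
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters
```

A typical call would be `ward_merge(kmeans(points, 6), 2)`, where each point is a vector of arc loads and input flows. Running Ward's method on a handful of centroids instead of on every observation keeps the quadratic pairwise search cheap, which is the usual motivation for this kind of two-stage scheme.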