30 research outputs found

    Numerical Data Clustering Ontology Approach

    Clustering algorithms group objects described by a set of numerical properties so that objects within a group are more similar to one another than to objects in other groups. All clustering algorithms share parameters whose choice determines how effective the clustering is. The most important are the metric, the number of clusters, and the cluster validity criteria. Classical clustering algorithms ignore semantic knowledge, which makes the results difficult to interpret. Ontologies, whose use is developing rapidly, provide an explicit model for structuring concepts and their interrelationships, which makes it possible to gain knowledge about a particular data model. Building on previously obtained results of a clustering study, the author attempts to create ontology-based concepts from numerical data using similarity measures, the number of clusters, cluster validity, and other characteristic features. The scientific novelty lies in combining classical data analysis with an ontological approach to structuring its results, which increases the efficiency of their use in engineering practice.
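    The three parameters named in this abstract (metric, number of clusters, validity criterion) can be illustrated with a small sketch. The abstract does not say which validity criterion the author uses; the silhouette width below is an assumed stand-in, and the function names and sample points are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def silhouette(points, labels, dist=euclidean):
    """Mean silhouette width of a labeling under the chosen metric."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # singleton cluster contributes 0 by convention
        # a: mean distance to the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

# Hypothetical data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
good_labels = [0, 0, 0, 1, 1, 1]   # matches the visible grouping
bad_labels = [0, 1, 0, 1, 0, 1]    # mixes the two groups

print(silhouette(points, good_labels), silhouette(points, bad_labels))
print(silhouette(points, good_labels, dist=manhattan))
```

    Swapping `dist` changes the metric, and scoring labelings with different numbers of clusters under the same criterion is one way to compare them.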

    CLASS - A Study of methods for coarse phonetic classification

    The objective of this thesis was to examine computer techniques for classifying speech signals into four coarse phonetic classes: vowel-like, strong fricative, weak fricative and silence. The study compared classification results from the K-means clustering algorithm using Euclidean distance measurements with classification using a multivariate maximum likelihood distance measure. In addition to the comparison of statistical methods, this study compared classification using several tree-structured decision-making processes. The system was trained on ten speakers using 98 utterances with both known and unknown speakers. Results showed very little difference between the Euclidean distance and maximum likelihood measures; however, the introduction of the tree structure on both systems had a positive influence on their performance.
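    The difference between the two distance measures compared in this thesis can be sketched as follows. This is a simplified illustration, not the thesis's system: class means are fitted from labeled samples rather than by K-means, the Gaussian model assumes a diagonal covariance, and the two-dimensional feature values are invented.

```python
import math

def fit_class_stats(samples_by_class):
    """Per-class mean and per-dimension variance (diagonal covariance assumed)."""
    stats = {}
    for label, samples in samples_by_class.items():
        n, dims = len(samples), len(samples[0])
        mean = [sum(s[d] for s in samples) / n for d in range(dims)]
        var = [max(sum((s[d] - mean[d]) ** 2 for s in samples) / n, 1e-6)
               for d in range(dims)]
        stats[label] = (mean, var)
    return stats

def classify_euclidean(x, stats):
    """Pick the class whose mean is nearest in Euclidean distance."""
    return min(stats, key=lambda label: math.dist(x, stats[label][0]))

def neg_log_likelihood(x, mean, var):
    """Negative log-density of x under an axis-aligned Gaussian."""
    return sum(0.5 * math.log(2 * math.pi * v) + (xi - mu) ** 2 / (2 * v)
               for xi, mu, v in zip(x, mean, var))

def classify_max_likelihood(x, stats):
    """Pick the class under which x is most likely."""
    return min(stats, key=lambda label: neg_log_likelihood(x, *stats[label]))

# Hypothetical features: a tight "silence" class and a widely spread "vowel-like" class.
training = {
    "silence": [(0.0, 0.1), (0.1, 0.0), (-0.1, 0.0), (0.0, -0.1)],
    "vowel-like": [(1.0, 4.0), (7.0, 4.0), (4.0, 1.0), (4.0, 7.0)],
}
stats = fit_class_stats(training)
frame = (1.5, 1.5)

print(classify_euclidean(frame, stats))       # nearest mean
print(classify_max_likelihood(frame, stats))  # accounts for class spread
```

    The Euclidean rule ignores how spread out each class is, while the likelihood rule penalizes a point that lies many standard deviations from a tight class. On real speech features the two may agree far more often, which would be consistent with the small difference the thesis reports.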

    Analyzing Destination Choices of Tourists and Residents from Location Based Social Media Data

    The ubiquitous use of social media platforms on smartphones has created an opportunity to gather digital traces of individual activities at a large scale. Traditional travel surveys fall short in collecting longitudinal travel behavior data for a large number of people in a cost-effective way, especially for transient populations such as tourists. This study presents an innovative methodological framework, using machine learning and econometric approaches, to gather and analyze location-based social media (LBSM) data to understand individual destination choices. First, using Twitter's search interface, we have collected Twitter posts of nearly 156,000 users for the state of Florida. We have adopted several filtering techniques to create a reliable sample from noisy Twitter data. An ensemble classification technique is proposed to classify tourists and residents from user coordinates. The performance of the proposed classifier has been validated using manually labeled data and compared against state-of-the-art classification methods. Second, using different clustering methods, we have analyzed the spatial distributions of destination choices of tourists and residents. The clusters of tourist destinations revealed the most popular tourist spots, including emerging tourist attractions in Florida. Third, to predict a tourist's next destination type, we have estimated a Conditional Random Field (CRF) model with reasonable accuracy. Fourth, to analyze resident destination choice behavior, this study proposes an extensive data merging operation between the collected Twitter data and different geographic databases from state-level data libraries. We have estimated a Panel Latent Segmentation Multinomial Logit (PLSMNL) model to find the characteristics affecting individual destination choices. The proposed PLSMNL model is found to explain the effects of variables on destination choices better than trip-specific Multinomial Logit models.
The findings of this study show the potential of LBSM data in future transportation and planning studies where collecting individual activity data is expensive.

    Unsupervised Segmentation Method for Diseases of Soybean Color Image Based on Fuzzy Clustering

    Color image segmentation based on Fuzzy C-Means (FCM) clustering is simple, intuitive, and easy to implement. However, its clustering performance is affected by the initialization of the cluster centers, high computational cost, and other issues. In this research, we propose a new unsupervised color image segmentation method based on fuzzy clustering. The method combines the advantages of the fuzzy C-means algorithm and unsupervised clustering. First, by gradually changing the number of clusters c and evaluating a validity measure, it searches for the optimal c without supervision; then, to achieve higher clustering accuracy, the distance measure is improved. In our experiments, the method was applied to color image segmentation for three kinds of soybean diseases. The results show that it segments the lesion area from the color image more accurately, and that the segmentation of soybean disease images is robust and highly accurate.
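    The search over the number of clusters c described in this abstract can be sketched with standard FCM and a validity index. The abstract does not name its validity measure or its improved distance, so the partition coefficient and plain Euclidean distance below are assumptions, as are the deterministic initialization and the RGB-like sample pixels.

```python
import math

def memberships(points, centers, m):
    """Standard FCM membership update: u_ik proportional to d_ik^(-2/(m-1))."""
    u = []
    for p in points:
        inv = [max(math.dist(p, ctr), 1e-12) ** (-2.0 / (m - 1.0)) for ctr in centers]
        total = sum(inv)
        u.append([v / total for v in inv])
    return u

def fcm(points, c, m=2.0, iters=200, tol=1e-6):
    """Fuzzy C-Means for c >= 2, started from evenly spaced data points (an assumption)."""
    n, dims = len(points), len(points[0])
    centers = [list(points[(i * (n - 1)) // (c - 1)]) for i in range(c)]
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            new_centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total for d in range(dims)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m)

def partition_coefficient(u):
    """Mean of squared memberships; 1.0 means a completely crisp partition."""
    return sum(v * v for row in u for v in row) / len(u)

# Hypothetical pixels: three tight color groups (reddish, greenish, bluish).
pixels = [
    (200, 30, 30), (205, 28, 33), (196, 35, 27),
    (30, 200, 30), (33, 204, 28), (27, 197, 34),
    (30, 30, 200), (28, 34, 203), (35, 27, 196),
]

scores = {c: partition_coefficient(fcm(pixels, c)[1]) for c in range(2, 6)}
best_c = max(scores, key=scores.get)
print(scores, best_c)
```

    For well-separated groups the partition coefficient peaks at the true number of groups, but it is known to favor small c on overlapping data, so a real segmentation pipeline may need a different index.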

    Ontology Partitioning: Clustering Based Approach


    Optimal deployment of wireless networks for smart electricity metering infrastructure

    This work presents an optimization model for the base stations that serve as collection points for data sent from smart meters, with the aim of covering a group of users clustered in a residential area. The base stations forward the collected data to the distribution companies, which track energy consumption in each area where users are located. We propose an ILP method and two heuristic user-clustering methods, K-Means and K-Medoids, for each base station that must be installed in the area. We compare the three proposed algorithms, using graphs, to determine which has the lowest coverage error, the shortest execution time, and the best clustering performance, and therefore which of the three is the most suitable. Optimizing the base stations indicates how many base stations (BS) will actually be installed in the area, discarding the other BS initially proposed as candidates. The result is a lower installation cost and a smart network that is efficient, reliable, and economical, whose main objective is to cover all users or inhabitants of the locality who benefit from the electrical grid.
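    The contrast between the two heuristics in this abstract can be sketched as below. This is not the authors' model: the ILP formulation is omitted, the initialization is an assumed deterministic choice, and the user coordinates and coverage radius are invented. The only difference between the two runs is how a cluster's center is updated.

```python
import math

def mean_center(members):
    """K-Means update: the centroid, which need not coincide with any user."""
    dims = len(members[0])
    return tuple(sum(p[d] for p in members) / len(members) for d in range(dims))

def medoid_center(members):
    """K-Medoids update: the member with the smallest total distance to the others."""
    return min(members, key=lambda c: sum(math.dist(c, p) for p in members))

def cluster(users, k, update, iters=50):
    """Alternate assignment and center update for k >= 2 clusters."""
    centers = [users[(i * (len(users) - 1)) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for u in users:
            nearest = min(range(k), key=lambda j: math.dist(u, centers[j]))
            groups[nearest].append(u)
        new_centers = [update(g) if g else centers[j] for j, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def coverage_error(users, stations, radius):
    """Fraction of users farther than `radius` from every station."""
    uncovered = sum(1 for u in users if min(math.dist(u, s) for s in stations) > radius)
    return uncovered / len(users)

# Hypothetical smart-meter locations: two blocks of four users each.
users = [(0, 0), (0, 2), (2, 0), (2, 2), (10, 10), (10, 12), (12, 10), (12, 12)]

kmeans_sites = cluster(users, 2, mean_center)
kmedoid_sites = cluster(users, 2, medoid_center)
print(kmeans_sites, coverage_error(users, kmeans_sites, radius=1.5))
print(kmedoid_sites, coverage_error(users, kmedoid_sites, radius=1.5))
```

    In this toy layout the centroids cover every user while the medoids do not, but medoids always fall on existing locations, which matters when stations can only be installed at candidate sites. Which heuristic gives the lower coverage error on real data is the empirical question the paper addresses.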

    New machine-learning-based techniques for DNA microarray image segmentation.

    Microarray technology, which provides detailed and abundant information about biological experiments, is a significant achievement in the history of biology. One of the key issues in microarray processing is extracting quantitative information from the spots, which represent the genes in the experiments. The process of identifying the spots and separating the foreground from the background is known as microarray image segmentation. In this thesis, we present two methods for microarray image segmentation. First, we conduct an in-depth analysis of the influence of important factors on clustering-based microarray image segmentation algorithms. Based on our analysis, we present an optimized clustering-based algorithm for microarray image segmentation, which exploits more than one feature to obtain better results than state-of-the-art clustering-based algorithms. We also consider the fact that most of the spots in a microarray image are elliptical in shape, and hence introduce a novel adaptive ellipse method. This method shows various advantages when compared to the adaptive circle method, one of the most widely used approaches in microarray image segmentation. Simulations on real-life microarray images show that our method is capable of extracting information from the images that is ignored by the traditional adaptive circle method, and hence is more flexible. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2004 .Q26. Source: Masters Abstracts International, Volume: 43-03, page: 0887. Adviser: Luis Rueda. Thesis (M.Sc.)--University of Windsor (Canada), 2004.

    Advances in Meta-Heuristic Optimization Algorithms in Big Data Text Clustering

    This paper presents a comprehensive survey of meta-heuristic optimization algorithms in text clustering applications and highlights their main procedures. These Artificial Intelligence (AI) algorithms are recognized as promising swarm intelligence methods because of their success in solving machine learning problems, especially text clustering problems. The paper reviews the relevant literature on meta-heuristic-based text clustering applications, including many variants, such as basic, modified, hybridized, and multi-objective methods. In addition, the main procedures of text clustering are described and discussed critically. The review reports the advantages and disadvantages of these algorithms and recommends potential future research paths. The main keywords considered in this paper are text, clustering, meta-heuristic, optimization, and algorithm.