9 research outputs found

    Enhancing and Combining a Recent K-means Family of Algorithms for Better Results

    Clustering is widely used to explore and understand large collections of data. K-means is one of the most popular clustering methods because it is easy to use and simple to implement. This thesis introduces a Distance-based Initialization Method for the K-means clustering algorithm (DIMK-means), which selects the initial centroids carefully so as to achieve higher accuracy than standard K-means, whose random choice of initial centroids gives low-accuracy results. The initialization method is as fast and as simple as the K-means algorithm itself, at almost the same low cost, which makes it attractive in practice. The thesis also introduces a Density-based Split-and-Merge K-means clustering algorithm (DSMK-means), developed to address the stability problems of K-means and to improve clustering performance on datasets that contain clusters of different, complex shapes as well as noise or outliers. Based on an extensive set of experiments, the research concludes that the developed algorithms achieve higher accuracy than the other algorithms compared, particularly because they can process datasets whose clusters differ in shape and density, are not linearly separable, or contain outliers and noise. The experimental datasets were drawn from artificial and real-world examples in the UCI Machine Learning Repository.
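The abstract above does not describe how DIMK-means picks its centroids, so the following is only a minimal sketch of the general idea of distance-based initialization. It assumes a farthest-first heuristic (each new centroid is the point farthest from the centroids already chosen), which is a swapped-in textbook technique and not the thesis's method. The function names `farthest_first_init` and `kmeans` and the toy data are hypothetical.

```python
import math
import random


def farthest_first_init(points, k, seed=0):
    """Pick k initial centroids: the first at random, each further one the
    point farthest from its nearest already-chosen centroid.
    (Illustrative heuristic; NOT the DIMK-means procedure.)"""
    rng = random.Random(seed)
    centroids = [points[rng.randrange(len(points))]]
    while len(centroids) < k:
        # Distance from every point to its nearest chosen centroid.
        nearest = [min(math.dist(p, c) for c in centroids) for p in points]
        centroids.append(points[nearest.index(max(nearest))])
    return centroids


def kmeans(points, centroids, iters=20):
    """Plain Lloyd iterations starting from the given centroids."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)),
                    key=lambda i: math.dist(p, centroids[i]))
            clusters[j].append(p)
        # New centroid = mean of its cluster; keep the old one if empty.
        centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
init = farthest_first_init(points, k=2)
final = kmeans(points, init)
```

On data like this, the second initial centroid always lands in the group that does not contain the first, so the Lloyd iterations start with one centroid per group. A purely random choice could place both initial centroids in the same group.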

    IMAGE UNDERSTANDING OF MOLAR PREGNANCY BASED ON ANOMALIES DETECTION

    Cancer occurs when normal cells grow and multiply without normal control. As the cells multiply, they form an area of abnormal cells, known as a tumour. Many tumours exhibit abnormal chromosomal segregation at cell division. These anomalies play an important role in detecting molar pregnancy cancer. Molar pregnancy, also known as hydatidiform mole, can be categorised into partial (PHM) and complete (CHM) mole, persistent gestational trophoblastic disease and choriocarcinoma. Hydatidiform moles are most commonly found in women under the age of 17 or over the age of 35. They can be detected by morphological and histopathological examination, but even experienced pathologists cannot easily distinguish between complete and partial hydatidiform moles. This distinction is important for recommending the appropriate treatment, so research into molar pregnancy image analysis and understanding is critical. The hypothesis of this research project is that an anomaly detection approach to analysing molar pregnancy images can improve image analysis and the classification of normal, PHM and CHM villi. The primary aim is to develop a novel method, based on anomaly detection, to identify and classify anomalous villi in stained molar pregnancy images. The method is designed to simulate the approach expert pathologists take when diagnosing anomalous villi. The knowledge and heuristics elicited from two expert pathologists are combined with morphological domain knowledge of molar pregnancy to develop a heuristic multi-neural network architecture that classifies the villi into their appropriate anomalous types. This study confirmed that a single feature cannot give enough discriminative power for villi classification.
Whereas expert pathologists consider the size and shape before textural features, this thesis demonstrated that the textural feature has a higher discriminative power than size and shape. The first heuristic-based multi-neural network, which was based on 15 elicited features, achieved an improved average accuracy of 81.2%, compared to the traditional multi-layer perceptron (80.5%); however, the recall of CHM villi class was still low (64.3%). Two further textural features, which were elicited and added to the second heuristic-based multi-neural network, have improved the average accuracy from 81.2% to 86.1% and the recall of CHM villi class from 64.3% to 73.5%. The precision of the multi-neural network II has also increased from 82.7% to 89.5% for normal villi class, from 81.3% to 84.7% for PHM villi class and from 80.8% to 86% for CHM villi class. To support pathologists to visualise the results of the segmentation, a software tool, Hydatidiform Mole Analysis Tool (HYMAT), was developed compiling the morphological and pathological data for each villus analysis
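The accuracy, precision and recall figures quoted above are all derived from a confusion matrix over the three villi classes. The sketch below shows that arithmetic in its usual form, assuming the standard definitions. The counts in `confusion` are hypothetical and are not data from the thesis, and the helper names are illustrative.

```python
def per_class_precision_recall(confusion, labels):
    """confusion[i][j] = number of items of true class i predicted as class j.
    Returns {label: (precision, recall)}."""
    out = {}
    for k, name in enumerate(labels):
        tp = confusion[k][k]
        predicted_k = sum(row[k] for row in confusion)  # column sum
        actual_k = sum(confusion[k])                    # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        out[name] = (precision, recall)
    return out


def accuracy(confusion):
    """Overall accuracy: correctly classified items over all items."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


labels = ["normal", "PHM", "CHM"]
# Hypothetical counts, chosen only to illustrate the formulas.
confusion = [
    [45, 4, 1],   # true normal
    [5, 40, 5],   # true PHM
    [2, 8, 20],   # true CHM
]
scores = per_class_precision_recall(confusion, labels)
overall = accuracy(confusion)
```

A class can have acceptable precision but low recall, as in the hypothetical CHM row here, when many of its true members are assigned to a neighbouring class.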

    Study of Image Local Scale Structure Using Nonlinear Diffusion

    Multi-scale representation and local scale extraction of images are important in computer vision research because, in general, the structures within images are unknown. Traditionally, multi-scale analysis is based on linear diffusion (i.e. heat diffusion), which has a known limitation: it distorts edges. In addition, the term "scale", which is widely used in multi-scale and local scale analysis, does not have a consistent definition, and this can cause difficulties in real image analysis, especially for the proper interpretation of scale as a geometric measure. In this study, in order to overcome the limitations of linear diffusion, we focus on multi-scale analysis based on the total variation minimization model. This model has been used in image denoising because it can preserve edge structures. Based on the total variation model, we construct a multi-scale space and propose a definition of image local scale. The new definition incorporates both pixel-wise and orientation information. It can be interpreted with a clear geometrical meaning and applied in general image analysis. The potential applications of the total variation model in retinal fundus image analysis are explored. The presence of both blood vessel and drusen structures within a single fundus image makes the image analysis a challenging problem. A multi-scale model based on total variation is used, and it is shown to be capable of detecting both drusen and blood vessels. The performance of vessel detection is compared with publicly available methods, showing improvements both quantitatively and qualitatively. This study provides better insight into local scale and shows the potential of the total variation model in medical image analysis.
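The abstract does not write out the functional it minimizes. Assuming it is the standard Rudin-Osher-Fatemi form of total variation denoising, the model can be stated as follows, where the symbols $f$ (observed image), $u$ (smoothed image), $\Omega$ (image domain) and $\lambda$ (fidelity weight) are notation introduced here and not taken from the thesis:

```latex
\min_{u} \; \int_{\Omega} \lvert \nabla u \rvert \, dx
  \;+\; \frac{\lambda}{2} \int_{\Omega} \bigl( u - f \bigr)^{2} \, dx
```

Under this assumption, the first term penalizes the total variation of $u$ and the second keeps $u$ close to $f$. Solving the problem for a decreasing sequence of $\lambda$ values gives progressively coarser images, which is one common way to build a multi-scale family. Linear diffusion instead corresponds to replacing $\lvert \nabla u \rvert$ with $\lvert \nabla u \rvert^{2}$, which smooths across edges.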

    New methods for clustering based on evolutionary algorithms

    This doctoral thesis studies the clustering problem. Solving it means dividing a data set into groups such that the elements within each group are similar to one another and different from the elements of other groups. The work focuses on partitional clustering, for which K-means is one of the pioneering algorithms. Clustering, or cluster analysis, is an unsupervised learning technique used in data mining and artificial intelligence. It is highly relevant because of its application in a wide variety of scientific fields, such as image segmentation, digital voice processing, document retrieval, and Internet applications, and its use continues to grow in increasingly diverse domains such as astronomy, geology, geophysics, paleoecology, and medicine. From the point of view of computational complexity theory, the clustering problem is NP-hard. It is a non-convex problem that usually has many local optima, so an algorithm often ends up in one of them. The aim is to obtain an optimal solution that guarantees a good-quality grouping according to chosen criteria. Many different algorithmic approaches have been designed for this purpose, among them bio-inspired algorithms and, within these, genetic algorithms: computational models that simulate natural evolution to solve problems in different domains, including clustering.
In this thesis, an exhaustive review of the genetic algorithms that solve the clustering problem has first been carried out. Focusing on single-objective genetic algorithms made possible a detailed study, with a taxonomy, of the different proposals that represent the state of the art. As a complement, a software tool called LEAC has been developed and made available to the scientific community; it implements all of the previous proposals studied. The purpose is to give access to all these models within a single framework, so that they can be used directly without extensive knowledge of them. In addition, its modular design greatly facilitates the design of new proposals, thanks to the crossover and mutation operators, selection operators, initialization methods, encoding types, and performance measures it makes available for direct use. To analyze the existing proposals, an exhaustive experimental study has been carried out, covering all the previous algorithms, a large number of data sets, and performance indices that measure both the compactness of the clusters formed and the separation between them, two of the most widely used criteria in this setting. All the information on the data, algorithms, and commands used to run the proposals is available to facilitate reproduction of the results. Finally, two new genetic algorithms for the clustering problem have been developed. They use optimized encodings and specialized operators, are highly competitive with the previous proposals, and obtain solutions that improve on their performance.

    On-line learning and anomaly detection methods : applications to fault assessment

    This work lies at the intersection of two disciplines: Machine Learning (ML) research and predictive maintenance of machinery. On the one hand, Machine Learning aims at detecting patterns in data gathered from phenomena that can be very different in nature. On the other hand, predictive maintenance of industrial machinery is the discipline which, based on measurements of the physical condition of a machine's internal components, assesses its present and near-future condition in order to prevent fatal failures. This work highlights that the two disciplines can benefit from their synergy. Predictive maintenance is a challenge for Machine Learning algorithms because of the nature of the data generated by rotating machinery: (a) each machine constitutes a new individual case and monitoring must start under correct operating conditions, so fault data is not available for model construction, and (b) the working conditions of the machine change in many situations and affect the captured data. Machine Learning can help predictive maintenance to (a) cut plant costs through the automation of tedious periodic tasks that are carried out by experts and (b) reduce the probability of fatal damage to machinery, thanks to the possibility of monitoring it more frequently at a modest cost increase. General-purpose ML techniques able to deal with these conditions are proposed, and their application to the specific field of predictive maintenance of rotating machinery based on vibration signature analysis is treated thoroughly. Since only normal-state data is available to model the vibration captures of a machine, we are restricted to anomaly detection algorithms, which form one of the main blocks of this work. Predictive maintenance also aims at assessing the state of the machine in the near future. The second main block of this work, on-line learning algorithms, helps with this task. A novel on-line learning algorithm for a single-layer neural network with a non-linear output function is proposed.
The proposed algorithm is able to train a network continuously, one pattern at a time. If certain conditions hold, it is analytically guaranteed to reach a globally optimal model. Beyond predictive maintenance, the algorithm can be applied to stream data learning scenarios such as big data sets, changing contexts and distributed data. Some of the principles described in this work were introduced in a commercial software prototype, GIDAS®. This software was developed and installed in real plants as part of the work of this thesis. The experience of applying ML to fault detection with this software is also described and shows that the proposed methodology can be very effective. Fault detection experiments with simulated and real vibration data are also carried out and demonstrate the performance of the proposed techniques when applied to predictive maintenance of rotating machinery.

    Mammographic Segmentation Using WaveCluster

    Segmentation of clinically relevant regions from potentially noisy images represents a significant challenge in the field of mammography. We propose novel approaches based on the WaveCluster clustering algorithm for segmenting both the breast profile in the presence of significant acquisition noise and segmenting regions of interest (ROIs) within the breast. Using prior manual segmentations performed by domain experts as ground truth data, we apply our method to 150 film mammograms with significant acquisition noise from the University of South Florida’s Digital Database for Screening Mammography. We then apply a similar segmentation procedure to detect the position and extent of suspicious regions of interest. Our approach was able to segment the breast profile from all 150 images, leaving minor residual noise adjacent to the breast in three. Performance on ROI extraction was also excellent, with 81% sensitivity and 0.96 false positives per image when measured against manually segmented ground truth ROIs. When not utilizing image morphology, our approach ran in linear time with the input size. These results highlight the potential of WaveCluster as a useful addition to the mammographic segmentation repertoire
