Search CORE

412 research outputs found

Hybrid optimization for k-means clustering learning enhancement

Author: Farhang Yousef
Publication venue
Publication date: 01/01/2016
Field of study

In recent years, combinatorial optimization issues have been recognized as critical problems in clustering algorithms, which must partition data in a way that optimizes clustering performance. The K-means algorithm is one of the most popular clustering algorithms: it is simple to implement and can solve the optimization problem with little extra information. However, K-means suffers from a high error rate, high intra-cluster distance and low accuracy. Researchers have therefore worked to address these problems computationally, creating efficient solutions that lead to better data analysis through K-means clustering. The aim of this study is to improve the accuracy of the K-means algorithm using hybrid and meta-heuristic methods. To this end, a meta-heuristic approach was proposed for the hybridization of the K-means algorithm scheme. It obtained better results by developing a hybrid Genetic Algorithm-K-means (GA-K-means) method and a hybrid Particle Swarm Optimization-K-means (PSO-K-means) method. Finally, the meta-heuristics Genetic Algorithm-Particle Swarm Optimization (GAPSO) and Particle Swarm Optimization-Genetic Algorithm (PSOGA), each combined with the K-means algorithm, were proposed. The study proceeded in three phases. First, it developed a hybrid GA-based K-means algorithm with a new crossover algorithm based on the range of attributes, in order to decrease the number of errors and increase the accuracy rate. Then, a hybrid PSO-based K-means algorithm was proposed, with a new calculation function based on the range of the domain, to decrease intra-cluster distance and increase the accuracy rate. Finally, two meta-heuristic algorithms, GAPSO-K-means and PSOGA-K-means, were introduced by combining the proposed algorithms to increase the number of correct answers and improve the accuracy rate.
The approach was evaluated using six standard integer data sets provided by the University of California Irvine (UCI). The findings confirmed that the hybrid optimization approach enhanced the performance of the K-means clustering algorithm. Although both GA-K-means and PSO-K-means improved on the results of the K-means algorithm, the GAPSO-K-means and PSOGA-K-means meta-heuristic algorithms outperformed the hybrid approaches. PSOGA-K-means achieved 5%-10% higher accuracy than the other methods on all data sets. The approach adopted in this study successfully increased the accuracy rate of the clustering analysis and decreased its error rate and intra-cluster distance.
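The intra-cluster distance that the abstract refers to can be illustrated with a minimal sketch of plain K-means. This is not the thesis's GA or PSO hybrid. It only shows the baseline iteration and one common way to score a set of centroids, which a GA or PSO search could plausibly use as its fitness value. The function names `assign`, `intra_cluster_distance` and `kmeans` are hypothetical, and the choice of Euclidean distance is an assumption.

```python
import math


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


def intra_cluster_distance(points, centroids):
    """Sum of Euclidean distances from each point to its nearest centroid.

    Assumption: this is one common definition; the thesis may use another.
    """
    return sum(min(math.dist(p, c) for c in centroids) for p in points)


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd iterations starting from the given initial centroids."""
    centroids = [tuple(float(x) for x in c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                # Mean of the member points, coordinate by coordinate.
                new_centroids.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, assign(points, centroids)
```

Because the result depends on the starting centroids, a population-based search over starting points (as in the GA and PSO hybrids described above) can compare candidates by calling `intra_cluster_distance` on each and keeping the lowest score.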

Universiti Teknologi Malaysia Institutional Repository

Structural damage monitoring based on machine learning and bio-inspired computing

Author: Vitola Oyaga Jaime
Publication venue: Universitat Politècnica de Catalunya
Publication date: 28/01/2021
Field of study

For a few decades, systems for supervising structures have become increasingly important. Originally, the strategies aimed only at detecting damage. They now seek to monitor civil or military structures permanently, offering sufficient and relevant information that helps in making the right decisions. Structural health monitoring (SHM) supports preventive or corrective maintenance decisions, reduces the possibility of accidents, and, when damage is detected early, reduces the costs that more extensive repairs would imply. The current work focused on three elements of structural damage diagnosis: detection, classification, and location, in either metallic or composite material structures, given their wide use in air, land and maritime transport vehicles, aerospace, wind turbines, and civil and military infrastructure. This work used the tools offered by machine learning and bio-inspired computing, given their good results in solving complex problems and recognizing patterns. It also considers changes in temperature, since temperature is one of the parameters that affect structures in real environments. Statistical information applied to pattern recognition and to reducing the size of the data was used with tools such as PCA (principal component analysis), thanks to the experience obtained in works developed by the CoDAlab research group. The document is divided into five chapters. The first includes a general description of the problem, the objectives, and the results obtained, in addition to a brief theoretical introduction. Chapters 2, 3, and 4 include articles published in different journals. Chapter 5 shows the results and conclusions. Other contributions, such as a book chapter and some papers presented at conferences, are included in appendix A. Finally, appendix B presents a multiplexing system used to develop the experiments carried out in this work.
Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Data mining techniques for complex application domains

Author: Mahoto NAEEM AHMED
Publication venue: Politecnico di Torino
Publication date
Field of study

The emergence of advanced communication techniques has increased the availability of large collections of data in electronic form in a number of application domains, including healthcare, e-business, and e-learning. Every day a large number of records are stored electronically. However, finding useful information in such large data collections is a challenging issue. Data mining technology aims to automatically extract hidden knowledge from large data repositories by exploiting sophisticated algorithms. The hidden knowledge in the electronic data may potentially be used to improve the procedures, productivity, and reliability of several application domains. The PhD activity focused on novel and effective data mining approaches to tackle the complex data coming from two main application domains: healthcare data analysis and textual data analysis. The research activity in the context of healthcare data addressed the application of different data mining techniques to discover valuable knowledge from real exam-log data of patients. In particular, efforts were devoted to the extraction of medical pathways, which can be exploited to analyze the actual treatments followed by patients. The derived knowledge not only provides useful information about the treatment procedures but may also play an important role in future predictions of potential patient risks associated with medical treatments. The research effort in textual data analysis is twofold. On the one hand, a novel approach to the discovery of succinct summaries of large document collections has been proposed. On the other hand, the suitability of an established descriptive data mining technique to support domain experts in making decisions has been investigated.
Both research activities focus on applying widely used exploratory data mining techniques to textual data analysis, which requires overcoming intrinsic limitations of traditional algorithms in order to handle textual documents efficiently and effectively.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Recommended from our members

Advances in manufacturing technology – XXII

Author: Cheng K, Harrison DJ, Makatsoris H
Publication venue: Brunel University
Publication date: 01/01/2008
Field of study

Brunel University Research Archive

Gene Expression : From Microarrays to Functional Genomics

Author: Greco Dario
Publication venue: University of Helsinki Libraries
Publication date: 28/05/2009
Field of study

The era of large sequencing projects has opened unprecedented possibilities for investigating more complex aspects of living organisms. Among the high-throughput technologies based on genomic sequences, DNA microarrays are widely used for many purposes, including measuring the relative quantity of messenger RNAs. However, the reliability of microarrays has been strongly doubted, because robust analysis of the complex microarray output data was developed only after the technology had already spread in the community. One objective of this study was to increase the performance of microarrays, measured by the successful validation of the results with independent techniques. To this end, emphasis was given to selecting candidate genes with remarkable biological significance within a specific experimental design. In line with evidence from the literature, re-annotation of the probes and model-based normalization algorithms were found to be beneficial when analyzing Affymetrix GeneChip data. Typically, the analysis of microarrays aims at selecting genes whose expression differs significantly between conditions and then grouping them into functional categories, enabling a biological interpretation of the results. Another approach investigates the global differences in the expression of functionally related groups of genes. Here, this technique was effective in discovering patterns related to temporal changes during infection of human cells. Another aspect explored in this thesis is the possibility of combining independent gene expression data to create a catalog of genes that are selectively expressed in healthy human tissues. Not all the genes present in human cells are active; some, involved in basic activities (named housekeeping genes), are expressed ubiquitously. Other genes (named tissue-selective genes) provide more specific functions and are expressed preferentially in certain cell types or tissues. Defining the tissue-selective genes is also important because these genes can cause disease whose phenotype appears in the tissues where they are expressed. The hypothesis that gene expression could be used as a measure of the relatedness of tissues has also been confirmed. Microarray experiments provide long lists of candidate genes that are often difficult to interpret and prioritize. The power of microarray results can be extended by inferring the relationships of genes under certain conditions. Gene transcription is constantly regulated by the coordinated binding of proteins, named transcription factors, to specific portions of the gene's promoter sequence. In this study, the analysis of promoters from groups of candidate genes was used to predict gene networks and to highlight modules of transcription factors that play a central role in regulating their transcription. Specific modules were found to regulate the expression of genes selectively expressed in the hippocampus, an area of the brain with a central role in major depressive disorder. Similarly, gene networks derived from microarray results have elucidated aspects of the development of the mesencephalon, another region of the brain, which is involved in Parkinson's disease.

Helsingin yliopiston digitaalinen arkisto

Pattern Recognition

Author
Publication venue: IntechOpen
Publication date: 20/04/2021
Field of study

A wealth of advanced pattern recognition algorithms is emerging at the intersection of technologies for effective visual features and the human-brain cognition process. Effective visual features are made possible by rapid developments in appropriate sensor equipment, novel filter designs, and viable information processing architectures, while the understanding of the human-brain cognition process broadens the ways in which computers can perform pattern recognition tasks. The present book is intended to collect representative research from around the globe focusing on low-level vision, filter design, features and image descriptors, data mining and analysis, and biologically inspired algorithms. The 27 chapters covered in this book disclose recent advances and new ideas in promoting the techniques, technology and applications of pattern recognition.

Directory of Open Access Books (DOAB)

Recommended from our members

Collective analysis of multiple high-throughput gene expression datasets

Author: Abu Jamous Basel
Publication venue: Brunel University London
Publication date: 01/01/2015
Field of study

This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University London.
Modern technologies have resulted in the production of numerous high-throughput biological datasets. However, the pace of development of capable computational methods has not kept up with the pace of generation of new high-throughput datasets. Amongst the most popular biological high-throughput datasets are gene expression datasets (e.g. microarray datasets). This work targets this aspect by proposing a suite of computational methods which can analyse multiple gene expression datasets collectively. The focal method in this suite is the unification of clustering results from multiple datasets using external specifications (UNCLES). This method applies clustering separately to multiple heterogeneous datasets which measure the expression of the same set of genes, and then combines the resulting partitions in accordance with one of two types of external specifications: type A identifies the subsets of genes that are consistently co-expressed in all of the given datasets, while type B identifies the subsets of genes that are consistently co-expressed in one subset of datasets while being poorly co-expressed in another subset of datasets. This extends the types of questions which can be addressed by computational methods, because existing clustering, consensus clustering, and biclustering methods cannot address the aforementioned objectives. Moreover, in order to assist in setting some of the parameters required by UNCLES, the M-N scatter plots technique is proposed. These methods, and less mature versions of them, have been validated and applied to numerous real datasets from the biological contexts of budding yeast, bacteria, human red blood cells, and malaria. In collaboration with biologists, these applications have led to various biological insights. In yeast, the role of the poorly understood gene CMR1 in the yeast cell cycle has been further elucidated. Also, a novel subset of poorly understood yeast genes has been discovered with an expression profile consistently negatively correlated with the well-known ribosome biogenesis genes. Bacterial data analysis has identified two clusters of negatively correlated genes. Analysis of data from human red blood cells has produced some hypotheses regarding the regulation of the pathways producing such cells. Malarial data analysis, on the other hand, is still at a preliminary stage. Taken together, this thesis provides an original integrative suite of computational methods which scrutinise multiple gene expression datasets collectively to address previously unresolved questions, and provides the results and findings of many applications of these methods to real biological datasets from multiple contexts.
National Institute for Health Research (NIHR) and the Brunel College of Engineering, Design and Physical Science
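The type A objective described above (genes that are consistently co-expressed in every dataset) can be pictured with a deliberately simplified sketch. This is not the UNCLES method itself, whose internals the abstract does not give. It only intersects hard cluster assignments, assuming one partition per dataset is already available as a mapping from gene name to cluster label. The function name `consistently_coclustered` and the `min_size` parameter are hypothetical.

```python
from collections import defaultdict


def consistently_coclustered(partitions, min_size=2):
    """Group genes that share a cluster in every partition.

    partitions: list of dicts mapping gene name -> cluster label,
    one dict per dataset (assumed input format, not from the thesis).
    Returns a sorted list of sorted gene groups with at least min_size genes.
    """
    if not partitions:
        return []
    # Only genes measured in every dataset can be compared.
    common = set(partitions[0])
    for part in partitions[1:]:
        common &= set(part)
    # Genes with the same tuple of labels were clustered together everywhere.
    groups = defaultdict(set)
    for gene in common:
        signature = tuple(part[gene] for part in partitions)
        groups[signature].add(gene)
    return sorted(sorted(g) for g in groups.values() if len(g) >= min_size)
```

Cluster labels need not match across datasets, because only the combination of labels per gene is compared. A gene that changes neighbours in even one dataset drops out of its group, which is the strict "in all of the given datasets" reading of type A.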

Brunel University Research Archive

Bioinformatics

Author
Publication venue: IntechOpen
Publication date: 20/04/2021
Field of study

This book is divided into different research areas relevant to bioinformatics, such as biological networks, next generation sequencing, high performance computing, molecular modeling, structural bioinformatics and intelligent data analysis. Each book section introduces the basic concepts and then explains their application to problems of great relevance, so both novice and expert readers can benefit from the information and research works presented here.

Directory of Open Access Books (DOAB)

Intelligent Analysis for Multi-Level Data-Driven Prediction

Author: Li Zhenpeng
Publication venue
Publication date: 01/01/2019
Field of study

Aberystwyth Research Portal

Automatic Pancreas Segmentation and 3D Reconstruction for Morphological Feature Extraction in Medical Image Analysis

Author: Asaturyan H.
Publication venue
Publication date: 01/01/2021
Field of study

The development of highly accurate, quantitative automatic medical image segmentation techniques, in comparison to manual techniques, remains a constant challenge for medical image analysis. In particular, segmenting the pancreas from an abdominal scan presents additional difficulties: this organ has very high anatomical variability, and a full inspection is problematic due to the location of the pancreas behind the stomach. Accurate, automatic pancreas segmentation can yield quantitative morphological measures such as volume and curvature, supporting biomedical research to establish the severity and progression of a condition such as type 2 diabetes mellitus. Furthermore, it can also guide subject stratification after diagnosis or before clinical trials, and help shed additional light on detecting early signs of pancreatic cancer. This PhD thesis delivers a novel approach for automatic, accurate quantitative pancreas segmentation, mostly but not exclusively in Magnetic Resonance Imaging (MRI), by harnessing the advantages of machine learning and classical image processing in computer vision. The proposed approach is evaluated on two MRI datasets containing 216 and 132 image volumes, achieving a mean Dice similarity coefficient (DSC) of 84.1 ± 4.6% and 85.7 ± 2.3% respectively. In order to demonstrate the universality of the approach, a dataset containing 82 Computed Tomography (CT) image volumes is also evaluated and achieves a mean DSC of 83.1 ± 5.3%. The proposed approach delivers a contribution to computer science (computer vision) in medical image analysis, reporting better quantitative pancreas segmentation results in comparison to other state-of-the-art techniques, and also captures detailed pancreas boundaries as verified by two independent experts in radiology and radiography.
The impact of these contributions can support the use of computational methods in biomedical research with a clinical translation; for example, the pancreas volume provides a prognostic biomarker of the severity of type 2 diabetes mellitus. Furthermore, a generalisation of the proposed segmentation approach successfully extends to other anatomical structures, including the kidneys, liver and iliopsoas muscles, using different MRI sequences. Thus, the proposed approach can be incorporated into the development of a computational tool to support radiological interpretations of MRI scans obtained using different sequences by providing a "second opinion", help reduce possible misdiagnosis, and consequently provide enhanced guidance towards targeted treatment planning.
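For readers unfamiliar with the Dice similarity coefficient reported above, a minimal sketch of the standard definition, DSC = 2|A ∩ B| / (|A| + |B|), follows. It assumes the predicted and reference segmentations are given as flat sequences of 0/1 voxel values of equal length. The function name `dice_coefficient` and the handling of two empty masks are assumptions for illustration and are not taken from the thesis.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    pred_on = sum(1 for p in pred if p)      # voxels labelled by the method
    truth_on = sum(1 for t in truth if t)    # voxels labelled by the reference
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    if pred_on + truth_on == 0:
        # Both masks empty: treated here as perfect agreement (a convention).
        return 1.0
    return 2.0 * overlap / (pred_on + truth_on)
```

A value of 1.0 means the two masks coincide and 0.0 means they do not overlap at all. The percentages in the abstract correspond to this ratio multiplied by 100 and averaged over the image volumes of a dataset.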

WestminsterResearch