8 research outputs found
A Complex Networks Approach for Data Clustering
Many methods have been developed for data clustering, such as k-means,
expectation maximization and algorithms based on graph theory. In this latter
case, graphs are generally constructed by taking into account the Euclidean
distance as a similarity measure, and partitioned using spectral methods.
However, these methods are not accurate when the clusters are not well
separated. In addition, it is not possible to automatically determine the
number of clusters. These limitations can be overcome by taking into account
network community identification algorithms. In this work, we propose a
methodology for data clustering based on complex networks theory. We compare
different metrics for quantifying the similarity between objects and take into
account three community finding techniques. This approach is applied to two
real-world databases and to two sets of artificially generated data. By
comparing our method with traditional clustering approaches, we verify that the
proximity measures given by the Chebyshev and Manhattan distances are the most
suitable metrics to quantify the similarity between objects. In addition, the
community identification method based on the greedy optimization provides the
smallest misclassification rates.
Comment: 9 pages, 8 figures
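The pipeline outlined in this abstract (a distance metric, a similarity network, then community detection) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the thresholding rule in `build_network`, the function names, and the toy points are hypothetical, and the naive agglomerative modularity merge stands in for whichever greedy optimization the paper uses.

```python
from itertools import combinations

def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    # Largest absolute coordinate difference.
    return max(abs(x - y) for x, y in zip(a, b))

def build_network(points, dist, threshold):
    # Assumed rule: link two objects when their distance is below `threshold`.
    return {(i, j) for i, j in combinations(range(len(points)), 2)
            if dist(points[i], points[j]) < threshold}

def greedy_communities(n, edges):
    # Naive greedy modularity maximization: start from singletons and keep
    # merging the pair of communities with the largest positive gain.
    m = len(edges)
    communities = [{i} for i in range(n)]
    if m == 0:
        return communities
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    while True:
        best_gain, best_pair = 0.0, None
        for a, b in combinations(range(len(communities)), 2):
            ca, cb = communities[a], communities[b]
            between = sum(1 for i, j in edges
                          if (i in ca and j in cb) or (i in cb and j in ca))
            da = sum(degree[i] for i in ca)
            db = sum(degree[i] for i in cb)
            # Modularity change when merging ca and cb.
            gain = between / m - da * db / (2 * m * m)
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            return communities
        a, b = best_pair
        communities[a] |= communities[b]
        del communities[b]

# Hypothetical toy data: two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
for metric in (manhattan, chebyshev):
    edges = build_network(points, metric, threshold=2)
    print(metric.__name__, greedy_communities(len(points), edges))
```

On this toy input both metrics yield the same two groups, and the number of clusters is not supplied in advance: merging stops once no merge improves modularity.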
Network and attribute‐based clustering of tennis players and tournaments
This paper addresses several issues that arise in clustering tennis players and
tournaments: (i) it considers players, tournaments and the relation between them;
(ii) the relation is taken into account in the fuzzy clustering model based on the
Partitioning Around Medoids (PAM) algorithm through spatial constraints; (iii) the
attributes of the players and of the tournaments are of different nature, qualitative
and quantitative. The novelty of the proposal lies in its methodology: a spatial fuzzy
clustering model for players and for tournaments (based on related attributes), where
the spatial penalty term in each clustering model depends on the relation between
players and tournaments described in the adjacency matrix. The proposed model is
compared with a bipartite players-tournament complex network model (the Degree-
Corrected Stochastic Blockmodel) that considers only the relation between players
and tournaments, described in the adjacency matrix, to obtain communities on each
side of the bipartite network. An application to data from the official ATP
website (tournament draws) and from the sports statistics website Wheelo ratings
(performance data for players and tournaments) illustrates the performance of the
proposed clustering model.
A comparative analysis of community detection techniques applied to the problem of clustering invariant objects
Undergraduate final project (Trabalho de Conclusão de Curso).
Data clustering consists of identifying groups of objects according to some
similarity criterion. Typically, this criterion is tied only to the physical attributes
of the data, using, for example, distance measures or centroids. A more recent
approach is the use of complex networks, also known as community detection, which
makes it possible to examine the topological structure of the data in addition to
its physical attributes. This work proposes the use of community detection algorithms
for the unsupervised problem of invariant pattern recognition. Given a set of images
of objects in different positions, angles and rotations, the problem consists of
detecting and grouping the images that relate to the same object. To this end, three
community detection algorithms were applied to real databases available in the
literature. The databases were divided into two sets: databases in graph format and
databases for clustering invariant objects. Experiments indicate good results for
these algorithms according to a set of performance metrics.
A centrality based multi-objective approach to disease gene association
Disease Gene Association finds genes that are involved in the presentation of a given genetic disease. We present a hybrid approach which implements a multi-objective genetic algorithm, where input consists of centrality measures based on various relational biological evidence types merged into a complex network. Multiple objective settings and parameters are studied including the development of a new exchange methodology, safe dealer-based crossover. Successful results with respect to breast cancer and Parkinson's disease compared to previous techniques and popular known databases are shown. In addition,
the newly developed methodology is also successfully applied to Alzheimer's disease, further demonstrating its flexibility.
Across all three case studies the strongest results were produced by the shortest path-based measures stress and betweenness, either in a single objective parameter setting or when used in conjunction in a multi-objective environment. The new crossover technique achieved the best results when applied to Alzheimer's disease.
Natural Sciences and Engineering Research Council of Canada
A Centrality Based Multi-Objective Disease-Gene Association Approach Using Genetic Algorithms
The Disease Gene Association Problem (DGAP) is a bioinformatics problem in which genes are ranked with respect to how involved they are in the presentation of a particular disease. Previous approaches have shown the strength of both Monte Carlo and evolutionary computation (EC) based techniques. Typically these past approaches improve ranking measures, develop new gene relation definitions, or implement more complex EC systems.
This thesis presents a hybrid approach which implements a multi-objective genetic algorithm, where input consists of centrality measures based on various relational biological evidence types merged into a complex network. In an effort to explore the effectiveness of the technique compared to past work, multiple objective settings and different EC parameters are studied including the development of a new exchange methodology, safe dealer-based (SDB) crossover. Successful results with respect to breast cancer and Parkinson's disease compared to previous EC techniques and popular known databases are shown. In addition, the newly developed methodology is also successfully applied to Alzheimer’s, further demonstrating the flexibility of the technique.
Across all three case studies the strongest results were produced by the shortest path-based measures stress and betweenness in a single objective parameter setting. When used in conjunction in a multi-objective environment, competitive results were also obtained but fell short of the single objective settings studied as part of this work. Lastly, while SDB crossover fell short of expectations on breast cancer and Parkinson's, it achieved the best results when applied to Alzheimer’s, illustrating the potential of the technique for future study.
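For readers unfamiliar with the shortest path-based measures mentioned in this abstract, the following is a minimal sketch of betweenness centrality on an unweighted, undirected network, computed with Brandes' algorithm. The node names and the tiny network are hypothetical placeholders rather than real gene data, and how the thesis turns such scores into objectives for the genetic algorithm is not shown here.

```python
from collections import deque

def betweenness(adj):
    # Brandes' algorithm for an unweighted, undirected graph (unnormalized).
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}   # BFS distance from s
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in score.items()}

# Hypothetical network: G1 - G2 - G3, with G4 and G5 both attached to G3.
adj = {
    "G1": ["G2"],
    "G2": ["G1", "G3"],
    "G3": ["G2", "G4", "G5"],
    "G4": ["G3"],
    "G5": ["G3"],
}
scores = betweenness(adj)
ranking = sorted(adj, key=lambda v: scores[v], reverse=True)
print(ranking[:2], scores)
```

Nodes that sit on many shortest paths receive high scores, so sorting by the score gives one possible ranking of candidates. In a tree such as this toy network every pair has a single shortest path, so stress and betweenness coincide.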
Disease-Gene Association Using a Genetic Algorithm
Understanding the relationship between genetic diseases and the genes associated with them is an important problem regarding human health. The vast amount of data created from a large number of high-throughput experiments performed in the last few years has resulted in an unprecedented growth in computational methods to tackle the disease gene association problem. Nowadays, it is clear that a genetic disease is not a consequence of a defect in a single gene. Instead, the disease phenotype is a reflection of various genetic components interacting in a complex network. In fact, genetic diseases, like any other phenotype, occur as a result of various genes working in sync with each other in a single or several biological module(s). Using a genetic algorithm, our method tries to evolve communities containing the set of potential disease genes likely to be involved in a given genetic disease. Given a set of known disease genes, we first obtain a protein-protein interaction (PPI) network containing all the known disease genes. All the other genes inside the procured PPI network are then considered as candidate disease genes as they lie in the vicinity of the known disease genes in the network. Our method attempts to find communities of potential disease genes strongly working with one another and with the set of known disease genes. As a proof of concept, we tested our approach on 16 breast cancer genes and 15 Parkinson's disease genes. We obtained comparable or better results than CIPHER, ENDEAVOUR and GPEC, three of the most reliable and frequently used disease-gene ranking frameworks.
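The candidate-selection step described in this abstract might look roughly like the sketch below. The abstract does not say how the PPI network is procured, so the one-hop rule used here (a candidate is any gene that interacts directly with a known disease gene) is an assumption, and the gene labels are hypothetical placeholders. The genetic algorithm that evolves communities is not shown.

```python
def candidate_genes(ppi_edges, known):
    # Assumed rule: a candidate is any non-known gene that interacts
    # directly with at least one known disease gene.
    known = set(known)
    candidates = set()
    for a, b in ppi_edges:
        if a in known and b not in known:
            candidates.add(b)
        elif b in known and a not in known:
            candidates.add(a)
    return candidates

# Hypothetical interactions: K* are known disease genes, P* are other proteins.
ppi_edges = [("K1", "P1"), ("K1", "K2"), ("P2", "K2"), ("P3", "P4")]
print(sorted(candidate_genes(ppi_edges, known={"K1", "K2"})))
```

Here `P3` and `P4` are excluded because neither interacts with a known gene; a wider neighborhood would admit more candidates at the cost of a larger search space.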