11 research outputs found

    Multilevel refinement based on neighborhood similarity

    The multilevel graph partitioning strategy aims to reduce the computational cost of a partitioning algorithm by applying it to a coarsened version of the original graph, which makes it especially useful for analyzing large-scale networks. To improve the multilevel solution, refinement algorithms are applied in the uncoarsening phase. Typical refinement algorithms exploit general network properties, such as minimum cut or modularity, but do not exploit features of domain-specific networks. In social networks, for instance, partitions with a high clustering coefficient or high similarity between vertices indicate a better solution. In this paper, we propose a refinement algorithm based on neighborhood similarity, called RSim. We compare RSim with two algorithms from the literature and one baseline strategy on twelve real networks. Results indicate that RSim is competitive with the evaluated methods on networks from general domains and surpasses the competing refinement algorithms on social networks. Funding: CNPq (grant 151836-/2013-2), FAPESP (grants 2011/22749-8, 11/20451-1 and 2013/12191-5), CAPE
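    The coarsen, partition, uncoarsen cycle described above can be sketched in Python. This is a generic illustration, not RSim. It assumes heavy-edge matching as the coarsening rule and represents graphs as hypothetical `adj` dictionaries of weighted neighbors. The functions `heavy_edge_matching`, `contract` and `project` are illustrative names. Refinement, the step RSim addresses, would run on `fine_partition` after projection.

```python
from collections import defaultdict


def heavy_edge_matching(adj):
    """Greedy heavy-edge matching: pair each unmatched vertex with the unmatched
    neighbor joined by the heaviest edge. Returns fine vertex -> coarse vertex id."""
    mapping = {}
    next_id = 0
    for u in sorted(adj):  # fixed visiting order for reproducibility (many schemes randomize it)
        if u in mapping:
            continue
        candidates = [(w, v) for v, w in adj[u].items() if v not in mapping and v != u]
        mapping[u] = next_id
        if candidates:
            _, v = max(candidates)  # heaviest edge; ties broken by the larger vertex id
            mapping[v] = next_id
        next_id += 1
    return mapping


def contract(adj, mapping):
    """Coarse graph: edges between merged vertices are summed, internal edges disappear."""
    coarse = {c: defaultdict(float) for c in set(mapping.values())}
    for u, nbrs in adj.items():
        for v, w in nbrs.items():
            cu, cv = mapping[u], mapping[v]
            if cu != cv:
                coarse[cu][cv] += w
    return {c: dict(nbrs) for c, nbrs in coarse.items()}


def project(coarse_partition, mapping):
    """Uncoarsening: every fine vertex inherits the community of its super-vertex."""
    return {u: coarse_partition[c] for u, c in mapping.items()}


# Two triangles joined by one light edge (2-3).
adj = {
    0: {1: 1.0, 2: 1.0},
    1: {0: 1.0, 2: 1.0},
    2: {0: 1.0, 1: 1.0, 3: 0.1},
    3: {2: 0.1, 4: 1.0, 5: 1.0},
    4: {3: 1.0, 5: 1.0},
    5: {3: 1.0, 4: 1.0},
}
mapping = heavy_edge_matching(adj)
coarse = contract(adj, mapping)
# Suppose the partitioning algorithm, run on the coarse graph, returns this bisection:
fine_partition = project({0: "A", 1: "A", 2: "B", 3: "B"}, mapping)
```

    In a full multilevel scheme, coarsening is repeated until the graph is small enough. Projection plus refinement is then applied level by level on the way back up.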

    A Naïve Bayes model based on overlapping groups for link prediction in online social networks

    Link prediction in online social networks is useful in numerous applications, mainly recommendation. Recent approaches have used friendship-group information to increase link prediction accuracy. However, these approaches do not consider the different roles that common neighbors may play in the different overlapping groups they belong to. In this paper, we propose a new approach that uses the structural information of overlapping groups to build a naïve Bayes model, from which we derive three different measures based on common neighbors. We perform experiments with both unsupervised and supervised link prediction strategies, taking the link imbalance problem into account. We compare sixteen measures on four well-known online social networks: Flickr, LiveJournal, Orkut and Youtube. Results show that our proposals help improve link prediction accuracy. Funding: São Paulo Research Foundation (FAPESP) (grants: 2013/12191-5, 2011/21880-3, 2011/23689-9 and 2011/22749-8
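    The sketch below makes concrete the idea of weighting common neighbors by their role. It scores a candidate link with a naïve-Bayes-style sum over common neighbors. Each common neighbor z contributes the log prior odds plus the log of a smoothed ratio of linked to unlinked pairs among z's neighbors. This is a simplification that ignores group membership. The paper's three group-based measures are not reproduced here, and `role_ratio` and `naive_bayes_cn_score` are hypothetical names.

```python
import math
from itertools import combinations


def role_ratio(adj, z):
    """R_z = (connected neighbor pairs + 1) / (disconnected neighbor pairs + 1).

    A common neighbor z whose own neighbors are often linked to each other is
    treated as stronger evidence for a link (Laplace-smoothed counts)."""
    closed = sum(1 for a, b in combinations(adj[z], 2) if b in adj[a])
    total = len(adj[z]) * (len(adj[z]) - 1) // 2
    return (closed + 1) / (total - closed + 1)


def naive_bayes_cn_score(adj, x, y):
    """Sum of log prior odds plus log R_z over the common neighbors z of x and y."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    s = (n * (n - 1) / 2) / m - 1  # ratio of absent to present links (prior odds)
    return sum(math.log(s) + math.log(role_ratio(adj, z)) for z in adj[x] & adj[y])


edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (5, 6)]
adj = {v: set() for e in edges for v in e}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

score_03 = naive_bayes_cn_score(adj, 0, 3)  # common neighbors 1 and 2 (both in triangles)
score_14 = naive_bayes_cn_score(adj, 1, 4)  # common neighbor 3 only
score_04 = naive_bayes_cn_score(adj, 0, 4)  # no common neighbors -> empty sum
```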

    Multilevel method in bipartite networks

    Bipartite networks comprise a particular class of network models in which the vertex set is split into two disjoint and independent subsets, with edges connecting only vertices placed in different subsets. They provide a powerful representation of the relationships in many real-world systems and have been widely employed to model data-intensive problems. In a related scenario, multilevel methods have been applied to handle computationally expensive optimization problems defined on networks. The strategy reduces the cost of executing an expensive algorithm or task by exploiting a hierarchy of coarsened versions of the original network. Interest in multilevel methods for networked systems is growing, motivated mostly by their ability to handle large-scale networks and their applicability to a variety of problems, most notably community detection and network drawing. Despite their potential, existing approaches are not directly applicable to bipartite networks. To the best of our knowledge, the multilevel strategy had not been considered in this context so far, opening a vast space for scientific exploration. This gap motivated this research project, which introduces a study of multilevel methods applicable to bipartite networks. To overcome the aforementioned limitations, this thesis presents two novel multilevel frameworks for handling bipartite structures, named OPM and MOb. OPM analyzes the bipartite network based on its one-mode projections, allowing the reuse of classical, well-established solutions from the literature. MOb (and its variations Mdr, CSV and CSL) operates directly on the bipartite representation to execute the multilevel method, providing a cost-effective implementation.
    Empirical results obtained on a set of synthetic and real-world networks across diverse applications indicate a considerable speedup with no significant loss in the quality of the solutions obtained in the coarsened networks, compared with those obtained in the original network (i.e., with conventional approaches). The potential applicability and reliability of the proposed methods have been illustrated in multiple scenarios, namely optimization, community detection, dimensionality reduction and visualization. Furthermore, the results provide empirical evidence that the proposed methods can foster novel applications of the multilevel method in bipartite networks, e.g. link prediction and trajectory mining, and therefore that this thesis makes a relevant contribution to the field.
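    OPM's starting point, the one-mode projection, can be illustrated with a short sketch. It assumes the bipartite network is given as a hypothetical list of `(u, v)` edges. Each projected edge is weighted by the number of shared neighbors. The abstract does not say which weighting OPM uses, so this choice is an assumption.

```python
from collections import defaultdict
from itertools import combinations


def one_mode_projection(edges, layer):
    """Project a bipartite edge list onto one layer.

    edges: (u, v) pairs with u in layer 0 and v in layer 1.
    layer: 0 projects onto the u-side, 1 onto the v-side.
    Two vertices of the chosen layer are linked when they share a neighbor in the
    other layer; the weight counts how many neighbors they share.
    """
    members_of = defaultdict(set)  # other-layer vertex -> its chosen-layer neighbors
    for u, v in edges:
        if layer == 0:
            members_of[v].add(u)
        else:
            members_of[u].add(v)
    weights = defaultdict(int)
    for members in members_of.values():
        for a, b in combinations(sorted(members), 2):
            weights[(a, b)] += 1
    return dict(weights)


# A toy author-paper network.
edges = [("a1", "p1"), ("a2", "p1"), ("a2", "p2"),
         ("a3", "p2"), ("a1", "p3"), ("a2", "p3")]
authors = one_mode_projection(edges, 0)
papers = one_mode_projection(edges, 1)
```

    With the projections in hand, a classical unipartite multilevel algorithm could be applied to each one. This is the kind of reuse OPM aims for.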

    Multilevel refinement in complex networks based on neighborhood similarity

    In the context of complex networks, particularly social networks, groups of densely interconnected objects that are sparsely linked to other groups are called communities. Detecting these communities has become a field of increasing scientific interest with numerous practical applications. In this context, several studies have proposed multilevel strategies for partitioning networks with large numbers of vertices and edges. The goal of these strategies is to reduce the cost of the partitioning algorithm by applying it to a reduced version of the original network. One little-explored possibility within this strategy is to apply local refinement heuristics to improve the final solution. Most refinement approaches exploit general properties of complex networks, such as minimum cut or modularity, but not properties inherent to specific domains. For example, social networks are characterized by a high clustering coefficient and significant assortativity, so maximizing such characteristics may lead to a good solution and a well-defined community structure. Motivated by this gap, in this thesis we propose a new refinement algorithm, called RSim, which exploits the high transitivity and assortativity present in some real networks, particularly social networks. To this end, we adopted hybrid similarity measures between pairs of vertices that combine neighborhood and community information to assess how similar two vertices are. A systematic comparative analysis showed that RSim statistically outperforms the usual refinement algorithms on networks with high clustering coefficient and assortativity. In addition, we assessed RSim in a real application, in which it surpassed all evaluated methods in both efficiency and effectiveness across all the selected data sets.
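    The sketch below shows what a local refinement pass driven by neighborhood similarity might look like. It is not RSim. The thesis uses hybrid measures that combine neighborhood and community information, whereas this example assumes plain Jaccard similarity of closed neighborhoods. The names `jaccard` and `refine` are hypothetical. Each vertex moves to the candidate community whose members are, on average, most similar to it.

```python
def jaccard(adj, u, v):
    """Jaccard similarity of closed neighborhoods N[u] and N[v]."""
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return len(nu & nv) / len(nu | nv)


def refine(adj, partition, max_passes=10):
    """Greedy local refinement: move each vertex to the candidate community whose
    members are, on average, most similar to it. Candidates are the vertex's own
    community and those of its neighbors; ties keep the vertex where it is."""
    partition = dict(partition)  # do not mutate the caller's partition
    for _ in range(max_passes):
        moved = False
        for u in sorted(adj):
            scores = {}
            for c in sorted({partition[v] for v in adj[u]} | {partition[u]}):
                members = [v for v in adj if partition[v] == c and v != u]
                if members:
                    scores[c] = sum(jaccard(adj, u, v) for v in members) / len(members)
            if not scores:
                continue
            best = max(scores, key=lambda c: (scores[c], c == partition[u]))
            if best != partition[u]:
                partition[u] = best
                moved = True
        if not moved:  # converged: a full pass without moves
            break
    return partition


# Two triangles joined by edge 2-3; vertex 2 starts in the wrong community.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
initial = {0: "A", 1: "A", 2: "B", 3: "B", 4: "B", 5: "B"}
refined = refine(adj, initial)
```

    Averaging over every member makes each pass quadratic in the number of vertices, which is acceptable for a sketch. In a multilevel setting, the pass would run once per level during uncoarsening.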

    Coarsening effects on k-partite network classification

    The growing size of data poses storage and processing-time challenges for semi-supervised models, making their practical application difficult. Researchers have explored reduced versions of networks as a potential solution. Real-world networks contain diverse types of vertices and edges, which leads to the use of k-partite network representations. However, existing methods primarily reduce unipartite networks, with a single type of vertex and edge. We develop a new coarsening method for k-partite networks that maintains classification performance. An empirical analysis of hundreds of thousands of synthetically generated networks demonstrates the promise of coarsening techniques for addressing the storage and processing problems of large networks. The findings indicate that the proposed coarsening algorithm achieved significant improvements in storage efficiency and classification runtime, even with modest reductions in the number of vertices. It yielded storage savings of over one-third and classifications twice as fast; furthermore, the classification performance metrics exhibited low variation on average.
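    The abstract does not describe the coarsening rule itself. The sketch below is therefore only a hypothetical illustration of reducing one vertex type in a k-partite network. Vertices of the chosen type with identical neighbor sets are merged into one super-vertex, and their parallel edges become a single weighted edge. The names `coarsen_type`, `vertex_type` and `target_type` are assumptions.

```python
from collections import defaultdict


def coarsen_type(vertex_type, adj, target_type):
    """Merge vertices of one type that have exactly the same neighbor set.

    vertex_type: vertex -> type label (edges only join vertices of different types).
    adj: vertex -> {neighbor: weight}.
    Returns (mapping, coarse_adj); mapping sends each vertex to its representative.
    """
    groups = defaultdict(list)
    for v in sorted(adj):  # sorted so the representative is deterministic
        if vertex_type[v] == target_type:
            groups[frozenset(adj[v])].append(v)
    mapping = {v: v for v in adj}  # vertices of other types are left untouched
    for members in groups.values():
        for v in members:
            mapping[v] = members[0]
    coarse = {r: defaultdict(float) for r in set(mapping.values())}
    for u, nbrs in adj.items():
        for v, w in nbrs.items():
            coarse[mapping[u]][mapping[v]] += w  # parallel edges become one weighted edge
    return mapping, {r: dict(nbrs) for r, nbrs in coarse.items()}


# A toy 3-partite network: users, items and tags.
vertex_type = {"u1": "user", "u2": "user", "u3": "user",
               "i1": "item", "i2": "item", "t1": "tag"}
edges = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u2", "i2"),
         ("u3", "i2"), ("i1", "t1"), ("i2", "t1")]
adj = {v: {} for v in vertex_type}
for a, b in edges:
    adj[a][b] = adj[b][a] = 1.0

mapping, coarse = coarsen_type(vertex_type, adj, "user")
```

    Merging structurally equivalent vertices loses no neighborhood information, but it only yields reductions when such duplicates exist. Merging merely similar vertices trades some information for larger reductions.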

    [pt] Second exercise list - ECO1109 - 2005.1

    Many real-world complex networks have an overlapping community structure, in which a vertex may belong to more than one community. Numerous approaches for crisp overlapping community detection have been proposed in the literature. Most of them achieve good accuracy, but their computational costs are considerably high and infeasible for large-scale networks. Since the multilevel approach had not previously been applied to the overlapping community detection problem, in this paper we propose an adaptation of this approach to the overlapping case. The goal is to analyze the runtime impact and solution quality of our multilevel strategy relative to traditional algorithms. Our experiments show that our proposal consistently performs well compared with single-level algorithms, and in less time. Funding: CNPq (grant: 151836/2013-2), FAPESP (grants: 2011/22749-8 and 2013/12191-5), CAPE
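    One way a multilevel strategy can handle overlapping communities is to carry each super-vertex's full set of community labels back down the hierarchy. The sketch assumes this simple inheritance rule, which the abstract does not specify, and a hypothetical list of per-level `mappings`. A local refinement step could follow at each level.

```python
def project_cover(cover, mappings):
    """Project an overlapping cover from the coarsest graph back to the original graph.

    cover: coarsest vertex -> set of community labels (a vertex may hold several).
    mappings: mappings[i] sends vertices of level i to vertices of level i + 1,
              where level 0 is the original graph.
    """
    for mapping in reversed(mappings):  # walk from the coarsest level down to level 0
        cover = {v: set(cover[sv]) for v, sv in mapping.items()}
    return cover


level0_to_1 = {0: "a", 1: "a", 2: "b", 3: "c", 4: "c", 5: "d"}
level1_to_2 = {"a": "X", "b": "Y", "c": "Z", "d": "Z"}
coarsest_cover = {"X": {1}, "Y": {1, 2}, "Z": {2}}  # super-vertex "Y" lies in both communities
fine_cover = project_cover(coarsest_cover, [level0_to_1, level1_to_2])
```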

    A review and comparative analysis of coarsening algorithms on bipartite networks

    Coarsening algorithms have been successfully used as a powerful strategy to deal with data-intensive machine learning problems defined on bipartite networks, such as clustering, dimensionality reduction, and visualization. Their main goal is to build informative simplifications of the original network at different levels of detail. Despite their widespread relevance, a comparative analysis and performance evaluation of these algorithms is still needed. Additionally, some aspects of the current versions of these algorithms have not been explored in their original or complementary studies. We strive to fill this gap by presenting a formal and illustrative description of coarsening algorithms developed for bipartite networks. We then illustrate the use of these algorithms on a set of emblematic problems. Finally, we evaluate and quantify their accuracy using quality and runtime measures on thousands of synthetic and real-world networks with various properties and structures. The empirical analysis provides evidence for assessing the strengths and shortcomings of these algorithms. Our study is a unified and useful resource that provides guidelines for researchers interested in learning about and applying these algorithms.
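    Because vertices of the same layer are never adjacent in a bipartite network, edge-based matching cannot pair them directly. The sketch below shows one hypothetical workaround. Candidates are taken two hops away, matched greedily by the Jaccard similarity of their neighborhoods, and then contracted. It is not any specific algorithm from the review, and `match_layer` and `contract_layer` are illustrative names.

```python
from collections import defaultdict


def match_layer(adj, layer_vertices):
    """Greedy matching inside one layer of a bipartite graph.

    Same-layer vertices are never adjacent, so candidates are found two hops away
    and scored by the Jaccard similarity of their neighborhoods."""
    mapping = {}
    for u in sorted(layer_vertices):
        if u in mapping:
            continue
        mapping[u] = u  # u represents its own super-vertex
        candidates = {w for v in adj[u] for w in adj[v]} - {u}
        best, best_sim = None, 0.0
        for w in sorted(candidates):
            if w in mapping:
                continue
            sim = len(adj[u] & adj[w]) / len(adj[u] | adj[w])
            if sim > best_sim:
                best, best_sim = w, sim
        if best is not None:
            mapping[best] = u
    return mapping


def contract_layer(adj, mapping):
    """Merge matched vertices; edge multiplicities become integer weights."""
    coarse = defaultdict(lambda: defaultdict(int))
    for u, nbrs in adj.items():
        for v in nbrs:
            coarse[mapping.get(u, u)][mapping.get(v, v)] += 1
    return {u: dict(n) for u, n in coarse.items()}


edges = [("u1", "v1"), ("u1", "v2"), ("u2", "v1"), ("u2", "v2"),
         ("u3", "v2"), ("u3", "v3"), ("u4", "v3")]
adj = defaultdict(set)
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

mapping = match_layer(adj, ["u1", "u2", "u3", "u4"])
coarse = contract_layer(adj, mapping)
```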

    Data mining techniques for road accidents: Clustering versus complex networks

    This work analyzes the performance of grouping methods based on complex networks and on clustering in order to identify the main road accident groups and risk factors. The research included a class-balancing step on the data, used in the classification and in the extraction of decision rules applied to each group. This made it possible to assess and visualize critical areas of traffic accidents involving material damage, non-fatal victims and fatal victims. The results indicate that complex networks generalize better across different subsets of the data and form groups more accurately than traditional clustering methods. The use of complex networks aided in acquiring decision rules with a higher level of confidence and a higher probability of occurrence.
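    The abstract does not name the balancing technique. The sketch below therefore assumes plain random oversampling: records of minority classes are duplicated at random until every class matches the largest one. The record identifiers and the `random_oversample` function are hypothetical, and undersampling or synthetic-sample methods are alternatives.

```python
import random
from collections import Counter


def random_oversample(records, labels, seed=0):
    """Duplicate randomly chosen minority-class records until every class
    has as many records as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(records), list(labels)  # originals are kept first
    for cls, n in counts.items():
        pool = [x for x, y in zip(records, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_x.extend(extra)
        out_y.extend([cls] * len(extra))
    return out_x, out_y


records = ["acc1", "acc2", "acc3", "acc4", "acc5"]
labels = ["material damage", "material damage", "material damage", "non-fatal", "fatal"]
balanced_x, balanced_y = random_oversample(records, labels)
```

    A fixed `seed` keeps the duplication reproducible across runs.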