86 research outputs found

    Scalable Community Detection using Distributed Louvain Algorithm

    Community detection (or clustering) in large-scale graphs is an important problem in graph mining. Communities reveal interesting characteristics of a network. Louvain is an efficient sequential algorithm, but it fails to scale to emerging large-scale data. Developing distributed-memory parallel algorithms is challenging because of inter-process communication and load-balancing issues. In this work, we design a shared-memory algorithm using OpenMP, which shows a 4-fold speedup but is limited by the number of available physical cores. Our second algorithm is an MPI-based parallel algorithm that scales to a moderate number of processors. We also implement a hybrid algorithm combining both. Finally, we incorporate dynamic load balancing in our final algorithm, DPLAL (Distributed Parallel Louvain Algorithm with Load-balancing). DPLAL overcomes the performance bottleneck of the previous algorithms and shows around 12-fold speedup while scaling to a larger number of processors. Overall, we present the challenges, our solutions, and the empirical performance of our algorithms for several large real-world networks.
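
For readers unfamiliar with the objective behind Louvain: the algorithm greedily moves nodes between communities to increase modularity. The snippet below is a minimal sequential sketch of that quality score only, assuming an undirected, unweighted edge list; it is not the authors' OpenMP, MPI, or DPLAL code, and the names `modularity`, `edges`, and `community` are illustrative assumptions.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to a community label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    intra = defaultdict(int)   # edges with both endpoints in the community
    degree = defaultdict(int)  # sum of node degrees in the community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            intra[cu] += 1
    # Q = sum over communities of (intra-edge fraction - expected fraction)
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy graph (an assumption for illustration): two triangles joined by one edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_triangles = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
singletons = {node: node for node in range(6)}
one_block = {node: 0 for node in range(6)}

print(modularity(edges, two_triangles))
print(modularity(edges, singletons))
print(modularity(edges, one_block))
```

Grouping each triangle into its own community scores higher than either the all-singletons or the single-block partition, which is the kind of improvement a Louvain-style local move searches for.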

    Activity Report: Automatic Control 2012


    Activity Report: Automatic Control 2011


    HIGH PERFORMANCE DECENTRALISED COMMUNITY DETECTION ALGORITHMS FOR BIG DATA FROM SMART COMMUNICATION APPLICATIONS

    Many systems in the world can be represented as models of complex networks and subsequently be analysed fruitfully. One fundamental property of real-world networks is that they usually exhibit inhomogeneity, in which the network tends to organise according to an underlying modular structure, commonly referred to as community structure or clustering. Analysing such communities in large networks can help people better understand the structural makeup of the networks. For example, it can be used in mobile ad-hoc and sensor networks to improve energy consumption and communication tasks. Thus, community detection in networks has become an important research area within many application fields such as computer science, physical sciences, mathematics and biology. With the recent emergence of big data, clustering real-world networks using traditional methods and algorithms on a single machine is almost impossible. The existing methods are limited by their computational requirements, and most of them cannot be directly parallelised. Furthermore, in many cases the data set is very big and does not fit into the main memory of a single machine, and therefore needs to be distributed among several machines. The main topic of this thesis is network community detection within these big data networks. More specifically, this thesis introduces a novel approach, the Decentralized Iterative Community Clustering Approach (DICCA), for clustering large undirected networks. An important property of this approach is its ability to cluster the entire network without global knowledge of the network topology. Moreover, an extension of DICCA called the Parallel Decentralized Iterative Community Clustering Approach (PDICCA) is proposed for efficiently processing data distributed across several machines. PDICCA is based on the MapReduce computing platform so that it works efficiently in a distributed and parallel fashion.
In addition, real-world networks are usually noisy and imperfect, with missing and false edges. These imperfections are often difficult to eliminate and strongly affect the quality and accuracy of conventional methods used to find the community structure in the network. However, in real-world networks, node attribute information is also available in addition to topology information. Considering more than one source of information for community detection could produce meaningful clusters and improve the robustness of the network. Therefore, this thesis presents a pre-processing approach for community detection that considers attribute information, shared neighbours and connectivity information. Finally, a set of real-world mobile phone usage data obtained from Cambridge Laboratories (Device Analyzer) has been analysed as an exploratory step to assess the viability of applying the algorithms developed in this thesis. All the proposed approaches have been evaluated and verified for feasibility using large real-world data sets. The evaluation results of these experiments are very promising for the type of large data networks considered.

    High-Performance Modelling and Simulation for Big Data Applications

    This open access book was prepared as a Final Publication of the COST Action IC1406 “High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)” project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predict and analyse natural and complex systems in science and engineering. When their level of abstraction rises to give a better discernment of the domain at hand, their representation becomes increasingly demanding of computational and data resources. On the other hand, High Performance Computing typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. A seamless interaction of High Performance Computing with Modelling and Simulation is then arguably required in order to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for its members and distinguished guests to openly discuss novel perspectives and topics of interest for these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications.

    Continuous Influence-based Community Partition for Social Networks

    Community partition is of great importance in social networks because of the rapidly increasing scale of networks, data and applications. We consider the community partition problem under the LT model in social networks, which is a combinatorial optimization problem that divides the social network into $m$ disjoint communities. Our goal is to maximize the sum of influence propagation by maximizing it within each community. As the influence propagation function of the community partition problem is supermodular under the LT model, we use the Lovász extension to relax the target influence function and transform our goal into maximizing the relaxed function over a matroid polytope. Next, we propose a continuous greedy algorithm that uses the properties of the relaxed function to solve our problem; it needs to be discretized in a concrete implementation. Then, a random rounding technique is used to convert the fractional solution into an integer solution. We present a theoretical analysis showing a $1-1/e$ approximation ratio for the proposed algorithms. Extensive experiments are conducted to evaluate the performance of the proposed continuous greedy algorithms on real-world online social network datasets, and the results demonstrate that the continuous community partition method can effectively improve the influence spread and the accuracy of the community partition. Comment: arXiv admin note: text overlap with arXiv:2003.1043
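
The relaxation step mentioned in this abstract can be illustrated with the standard definition of the Lovász extension: sort the coordinates of a point in [0, 1]^n in decreasing order and take a weighted sum of the set function over the resulting nested prefix sets. The sketch below shows that generic definition only; it is not the paper's influence function or its continuous greedy algorithm, and `lovasz_extension` and the toy set function `squared_size` are assumptions made for illustration.

```python
def lovasz_extension(f, x):
    """Evaluate the Lovász extension of a set function f at x in [0, 1]^n.

    f takes a frozenset of indices and returns a number.
    The value equals the expectation of f({i : x[i] > t}) for t uniform on [0, 1].
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: -x[i])  # indices by decreasing x[i]
    thresholds = [1.0] + [x[i] for i in order] + [0.0]
    value = 0.0
    prefix = set()
    for k in range(n + 1):
        # prefix currently holds the k largest coordinates of x
        value += (thresholds[k] - thresholds[k + 1]) * f(frozenset(prefix))
        if k < n:
            prefix.add(order[k])
    return value


def squared_size(s):
    """Toy supermodular set function: |S| squared."""
    return len(s) ** 2


# On a 0/1 indicator vector the extension agrees with the set function itself.
print(lovasz_extension(squared_size, [1.0, 0.0, 1.0]))
# On a fractional point it interpolates between the nested prefix sets.
print(lovasz_extension(squared_size, [0.5, 0.5]))
```

Agreement with the original function on indicator vectors is what makes the extension a relaxation: an integral solution obtained by rounding a fractional one can be scored with the same set function.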

    Accuracy-Aware Adaptive Traffic Monitoring for Software Dataplanes

    Network operators have recently been developing multi-Gbps traffic monitoring tools on commodity hardware, as part of the packet-processing pipelines realizing software dataplanes. These solutions allow the execution of sophisticated per-packet monitoring using the processing power available on servers. Although advances in packet capture have enabled the interception of packets at high rates, bottlenecks can still arise in the monitoring process as a result of concurrent access to shared processor resources, variations of the traffic skew, and unbalanced packet-rate spikes. In this paper we present an adaptive monitoring framework that is resilient to bottlenecks while maintaining the accuracy of monitoring reports above a user-specified threshold. The framework dynamically reduces the measurement task sets under adverse conditions, and reconfigures them to recover from potential accuracy degradations. To quantify the monitoring accuracy at run time, it adopts a novel task-independent technique that generates accuracy estimates according to recently observed traffic characteristics. With a prototype implementation based on a generic packet-processing pipeline, and using well-known measurement tasks, we show that the framework achieves lossless traffic monitoring for a wide range of conditions, significantly enhances the level of monitoring accuracy, and performs adaptations at the time scale of milliseconds with limited overhead.

    Energy autonomous systems: future trends in devices, technology, and systems

    The rapid evolution of electronic devices since the beginning of the nanoelectronics era has brought about exceptional computational power in an ever-shrinking system footprint. This has enabled, among others, the wealth of nomadic battery-powered wireless systems (smart phones, mp3 players, GPS, …) that society currently enjoys. Emerging integration technologies enabling even smaller volumes and the associated increased functional density may bring about a new revolution in systems targeting wearable healthcare, wellness, lifestyle and industrial monitoring applications.

    Introduction to fast Super-Paramagnetic Clustering

    We map stock market interactions to spin models to recover their hierarchical structure using a simulated annealing based Super-Paramagnetic Clustering (SPC) algorithm. This is directly compared to a modified implementation of a maximum likelihood approach to fast Super-Paramagnetic Clustering (f-SPC). The methods are first applied to standard toy test-case problems, and then to a dataset of 447 stocks traded on the New York Stock Exchange (NYSE) over 1249 days. The signal-to-noise ratio of stock market correlation matrices is briefly considered. Our results approximately recover clusters representative of standard economic sectors, as well as mixed clusters whose dynamics shed light on the adaptive nature of financial markets and raise concerns about the effectiveness of industry-based static financial market classification in the world of real-time data analytics. A key result is that the standard maximum likelihood methods are confirmed to converge to solutions within a Super-Paramagnetic (SP) phase. We use insights arising from this to discuss the implications of using a Maximum Entropy Principle (MEP) as opposed to the Maximum Likelihood Principle (MLP) as an optimization device for this class of problems.