Computing methods for parallel processing and analysis on complex networks
Nowadays, solving certain problems requires modeling complex systems in order to simulate and understand their behavior.
A good example of such a complex system is the Facebook social network, which represents people and their relationships. Another example is the Internet, composed of a vast number of servers, computers, modems, and routers. Every scientific field (physics, economics, politics, and so on) has complex systems, which are complex because of the large volume of data required to represent them and the rapid changes in their structure.
Analyzing the behavior of these complex systems is important for creating simulations or discovering the dynamics over them, with the main goal of understanding how they work.
Some complex systems cannot be easily modeled; we can begin by analyzing their structure. This is possible by creating a network model, mapping the problem's entities and the relations between them.
Some popular analyses over the structure of a network are:
• Community detection – discovering how the entities are grouped
• Identifying the most important entities – measuring a node's influence over the
network
• Features of the whole network – the diameter, the number of triangles, the clustering
coefficient, and the shortest path between two entities
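To make these whole-network features concrete, the following is a minimal single-machine sketch in plain Python (not the thesis toolchain, which uses Apache big data tools over SNAP graphs); the toy graph and function names are illustrative assumptions.

```python
from collections import deque
from itertools import combinations

# Toy undirected graph as an adjacency-set dict (illustrative data only).
graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B"},
}

def triangles(g):
    # Each triangle is seen once per vertex, so divide the total by 3.
    total = 0
    for v, nbrs in g.items():
        total += sum(1 for u, w in combinations(nbrs, 2) if w in g[u])
    return total // 3

def clustering(g, v):
    # Local clustering coefficient: fraction of neighbour pairs
    # that are themselves connected.
    nbrs = g[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, w in combinations(nbrs, 2) if w in g[u])
    return 2 * links / (k * (k - 1))

def shortest_path_length(g, src, dst):
    # Breadth-first search gives shortest paths in an unweighted graph.
    dist = {src: 0}
    q = deque([src])
    while q:
        v = q.popleft()
        if v == dst:
            return dist[v]
        for u in g[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                q.append(u)
    return None  # dst unreachable from src

def diameter(g):
    # Longest shortest path over all node pairs (assumes a connected graph).
    return max(shortest_path_length(g, s, t)
               for s in g for t in g if s != t)
```

On a real complex network these all-pairs computations are exactly what becomes infeasible on a single machine, which motivates the parallel approaches discussed below.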
Multiple algorithms have been created to perform these analyses over the network model; however, when executed on a single machine they take a long time to complete, or may not run at all due to the machine's limited resources.
As more demanding applications have appeared to run these kinds of analysis algorithms, several parallel programming models and different kinds of hardware architectures have been created to deal with large data inputs, reduce execution time, save power, and enhance computational efficiency on each machine, while also taking the application requirements into account.
Parallelizing these algorithms is a challenge because:
• We need to analyze data dependences to implement a parallel version of the
algorithm, always keeping in mind the scalability and performance of the code.
• We must create an implementation of the algorithm for a parallel programming model
such as MapReduce (Apache Hadoop), RDDs (Apache Spark), or Pregel (Apache Giraph),
which are oriented to big data, or for HPC models such as MPI + OpenMP, OmpSs, or CUDA.
• The input data must be distributed over the processing platform to each node, or
offloaded to accelerators such as GPUs or FPGAs.
• Storing the input data and the processing results requires techniques such as
distributed file systems (HDFS), distributed NoSQL databases (object databases,
graph databases, document databases), or traditional relational databases
(Oracle, SQL Server).
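The MapReduce pattern named above can be sketched on a single machine: partition the edge list, run a map phase per partition in parallel, then reduce the partial results. This is a hedged illustration in plain Python using threads in place of a cluster; the data, partitioning scheme, and function names are assumptions, not the thesis implementation on Hadoop or Spark.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Edge list of a small undirected graph (illustrative data only).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]

def partition(seq, n):
    # Split the input into n roughly equal chunks, one per worker,
    # mimicking how the input is distributed over processing nodes.
    return [seq[i::n] for i in range(n)]

def map_degrees(chunk):
    # Map phase: emit one count for each endpoint of every edge in the chunk.
    counts = Counter()
    for u, v in chunk:
        counts[u] += 1
        counts[v] += 1
    return counts

def reduce_degrees(partials):
    # Reduce phase: merge the per-partition counters into the final degrees.
    total = Counter()
    for p in partials:
        total.update(p)
    return total

# Run the map phase in parallel over two workers, then reduce.
with ThreadPoolExecutor(max_workers=2) as pool:
    partials = list(pool.map(map_degrees, partition(edges, 2)))
degrees = reduce_degrees(partials)
```

On a real cluster the "shuffle" between the map and reduce phases also moves data across the network, which is where the data-distribution and storage challenges listed above become dominant.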
In this Master's thesis, we decided to perform graph processing using Apache big data tools, mainly running tests on MareNostrum III and the Amazon cloud for several community detection algorithms, using SNAP graphs with ground-truth communities, and producing a comparison of their parallel execution times and scalability.