6 research outputs found

    Simulating the behavior of the human brain on GPUS

    Get PDF
    The simulation of the behavior of the Human Brain is one of the most important challenges in computing today. The main problem consists of finding efficient ways to manipulate and compute the huge volume of data that this kind of simulations need, using the current technology. In this sense, this work is focused on one of the main steps of such simulation, which consists of computing the Voltage on neurons’ morphology. This is carried out using the Hines Algorithm and, although this algorithm is the optimum method in terms of number of operations, it is in need of non-trivial modifications to be efficiently parallelized on GPUs. We proposed several optimizations to accelerate this algorithm on GPU-based architectures, exploring the limitations of both, method and architecture, to be able to solve efficiently a high number of Hines systems (neurons). Each of the optimizations are deeply analyzed and described. Two different approaches are studied, one for mono-morphology simulations (batch of neurons with the same shape) and one for multi-morphology simulations (batch of neurons where every neuron has a different shape). In mono-morphology simulations we obtain a good performance using just a single kernel to compute all the neurons. However this turns out to be inefficient on multi-morphology simulations. Unlike the previous scenario, in multi-morphology simulations a much more complex implementation is necessary to obtain a good performance. In this case, we must execute more than one single GPU kernel. In every execution (kernel call) one specific part of the batch of the neurons is solved. These parts can be seen as multiple and independent tridiagonal systems. Although the present paper is focused on the simulation of the behavior of the Human Brain, some of these techniques, in particular those related to the solving of tridiagonal systems, can be also used for multiple oil and gas simulations. Our studies have proven that the optimizations proposed in the present work can achieve high performance on those computations with a high number of neurons, being our GPU implementations about 4× and 8× faster than the OpenMP multicore implementation (16 cores), using one and two NVIDIA K80 GPUs respectively. Also, it is important to highlight that these optimizations can continue scaling, even when dealing with a very high number of neurons.This project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under Grant Agreement No. 720270 (HBP SGA1), from the Spanish Ministry of Economy and Competitiveness under the project Computación de Altas Prestaciones VII (TIN2015-65316-P), the Departament d’Innovació, Universitats i Empresa de la Generalitat de Catalunya, under project MPEXPAR: Models de Programació i Entorns d’Execució Parallels (2014-SGR-1051). We thank the support of NVIDIA through the BSC/UPC NVIDIA GPU Center of Excellence, and the European Union’s Horizon 2020 Research and Innovation Program under the Marie Sklodowska-Curie Grant Agreement No. 749516.Peer ReviewedPostprint (published version

    Improved Dynamic Graph Coloring

    Get PDF
    This paper studies the fundamental problem of graph coloring in fully dynamic graphs. Since the problem of computing an optimal coloring, or even approximating it to within n^{1-epsilon} for any epsilon > 0, is NP-hard in static graphs, there is no hope to achieve any meaningful computational results for general graphs in the dynamic setting. It is therefore only natural to consider the combinatorial aspects of dynamic coloring, or alternatively, study restricted families of graphs. Towards understanding the combinatorial aspects of this problem, one may assume a black-box access to a static algorithm for C-coloring any subgraph of the dynamic graph, and investigate the trade-off between the number of colors and the number of recolorings per update step. Optimizing the number of recolorings, sometimes referred to as the recourse bound, is important for various practical applications. In WADS\u2717, Barba et al. devised two complementary algorithms: For any beta > 0, the first (respectively, second) maintains an O(C beta n^{1/beta}) (resp., O(C beta))-coloring while recoloring O(beta) (resp., O(beta n^{1/beta})) vertices per update. Barba et al. also showed that the second trade-off appears to exhibit the right behavior, at least for beta = O(1): Any algorithm that maintains a c-coloring of an n-vertex dynamic forest must recolor Omega(n^{2/(c(c-1))}) vertices per update, for any constant c >= 2. Our contribution is two-fold: - We devise a new algorithm for general graphs that improves significantly upon the first trade-off in a wide range of parameters: For any beta > 0, we get a O~(C/(beta)log^2 n)-coloring with O(beta) recolorings per update, where the O~ notation supresses polyloglog(n) factors. In particular, for beta = O(1) we get constant recolorings with polylog(n) colors; not only is this an exponential improvement over the previous bound, but it also unveils a rather surprising phenomenon: The trade-off between the number of colors and recolorings is highly non-symmetric. - For uniformly sparse graphs, we use low out-degree orientations to strengthen the above result by bounding the update time of the algorithm rather than the number of recolorings. Then, we further improve this result by introducing a new data structure that refines bounded out-degree edge orientations and is of independent interest

    BDWatchdog: real-time monitoring and profiling of Big Data applications and frameworks

    Get PDF
    This is a post-peer-review, pre-copyedit version of an article published in Future Generation Computer Systems. The final authenticated version is available online at: https://doi.org/10.1016/j.future.2017.12.068[Abstract] Current Big Data applications are characterized by a heavy use of system resources (e.g., CPU, disk) generally distributed across a cluster. To effectively improve their performance there is a critical need for an accurate analysis of both Big Data workloads and frameworks. This means to fully understand how the system resources are being used in order to identify potential bottlenecks, from resource to code bottlenecks. This paper presents BDWatchdog, a novel framework that allows real-time and scalable analysis of Big Data applications by combining time series for resource monitorization and flame graphs for code profiling, focusing on the processes that make up the workload rather than the underlying instances on which they are executed. This shift from the traditional system-based monitorization to a process-based analysis is interesting for new paradigms such as software containers or serverless computing, where the focus is put on applications and not on instances. BDWatchdog has been evaluated on a Big Data cloud-based service deployed at the CESGA supercomputing center. The experimental results show that a process-based analysis allows for a more effective visualization and overall improves the understanding of Big Data workloads. BDWatchdog is publicly available at http://bdwatchdog.dec.udc.es.Ministerio de Economía, Industria y Competitividad; TIN2016-75845-PMinsiterio de Educación; FPU15/0338

    Efficient communication protection of many-core systems against active attackers

    Get PDF
    Many-core system-on-chips, together with their established communication infrastructures, Networks-on-Chip (NoC), are growing in complexity, which encourages the integration of third-party components to simplify and accelerate production processes. However, this also adversely exposes the surface for attacks through the injection of hardware Trojans. This work addresses active attacks on NoCs and focuses on the integrity and availability of transmitted data. In particular, we consider the modification and/or dropping of data during transmission as active attacks that might be performed by malicious routers. To mitigate the impact of such active attacks, we propose two lightweight solutions that respect the performance constraints of NoCs. Assuming the presence of symmetric keys, these approaches combine lightweight authentication codes for integrity protection with network coding for increased efficiency and robustness. The proposed solutions prevent undetected modifications and significantly increase availability through a reliable detection of attacks. The efficiency of these solutions is investigated in different scenarios using cycle-accurate simulations and the area overhead is analyzed relative to state-of-the-art many-core system. The results demonstrate that one authentication scheme with network coding protects the integrity of data to a low residual error of 1.36% at 0.2 attack probability with an area overhead of 2.68%. For faster and more flexible evaluation, an analytical approach is developed which is validated against the cycle-accurate simulations. The analytical approach is more than 1000× faster while having a maximum estimation error of 5%. Moreover, the analytical model provides a deeper insight into the system’s behavior. For example, it reveals which factors influence the performance parameters
    corecore