Search CORE

142 research outputs found

Expander Chunked Codes

Author: Lu Sanglu
Tang Bin
Yang Shenghao
Ye Baoliu
Yin Yitong
Publication venue
Publication date: 26/03/2015
Field of study

Chunked codes are efficient random linear network coding (RLNC) schemes with low computational cost, where the input packets are encoded into small chunks (i.e., subsets of the coded packets). During the network transmission, RLNC is performed within each chunk. In this paper, we first introduce a simple transfer matrix model to characterize the transmission of chunks, and derive some basic properties of the model to facilitate the performance analysis. We then focus on the design of overlapped chunked codes, a class of chunked codes whose chunks are non-disjoint subsets of input packets, which are of special interest since they can be encoded with negligible computational cost and in a causal fashion. We propose expander chunked (EC) codes, the first class of overlapped chunked codes that have an analyzable performance,where the construction of the chunks makes use of regular graphs. Numerical and simulation results show that in some practical settings, EC codes can achieve rates within 91 to 97 percent of the optimum and outperform the state-of-the-art overlapped chunked codes significantly.Comment: 26 pages, 3 figures, submitted for journal publicatio

arXiv.org e-Print Archive

Springer - Publisher Connector

Batched Sparse Codes

Author: Yang Shenghao
Yeung Raymond W.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 06/02/2014
Field of study

Network coding can significantly improve the transmission rate of communication networks with packet loss compared with routing. However, using network coding usually incurs high computational and storage costs in the network devices and terminals. For example, some network coding schemes require the computational and/or storage capacities of an intermediate network node to increase linearly with the number of packets for transmission, making such schemes difficult to be implemented in a router-like device that has only constant computational and storage capacities. In this paper, we introduce BATched Sparse code (BATS code), which enables a digital fountain approach to resolve the above issue. BATS code is a coding scheme that consists of an outer code and an inner code. The outer code is a matrix generation of a fountain code. It works with the inner code that comprises random linear coding at the intermediate network nodes. BATS codes preserve such desirable properties of fountain codes as ratelessness and low encoding/decoding complexity. The computational and storage capacities of the intermediate network nodes required for applying BATS codes are independent of the number of packets for transmission. Almost capacity-achieving BATS code schemes are devised for unicast networks, two-way relay networks, tree networks, a class of three-layer networks, and the butterfly network. For general networks, under different optimization criteria, guaranteed decoding rates for the receiving nodes can be obtained.Comment: 51 pages, 12 figures, submitted to IEEE Transactions on Information Theor

arXiv.org e-Print Archive

CiteSeerX

Evaluating techniques for parallelization tuning in MPI, OmpSs and MPI/OmpSs

Author: Subotic Vladimir
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2013
Field of study

Parallel programming is used to partition a computational problem among multiple processing units and to define how they interact (communicate and synchronize) in order to guarantee the correct result. The performance that is achieved when executing the parallel program on a parallel architecture is usually far from the optimal: computation unbalance and excessive interaction among processing units often cause lost cycles, reducing the efficiency of parallel computation. In this thesis we propose techniques oriented to better exploit parallelism in parallel applications, with emphasis in techniques that increase asynchronism. Theoretically, this type of parallelization tuning promises multiple benefits. First, it should mitigate communication and synchronization delays, thus increasing the overall performance. Furthermore, parallelization tuning should expose additional parallelism and therefore increase the scalability of execution. Finally, increased asynchronism would provide higher tolerance to slower networks and external noise. In the first part of this thesis, we study the potential for tuning MPI parallelism. More specifically, we explore automatic techniques to overlap communication and computation. We propose a speculative messaging technique that increases the overlap and requires no changes of the original MPI application. Our technique automatically identifies the application’s MPI activity and reinterprets that activity using optimally placed non-blocking MPI requests. We demonstrate that this overlapping technique increases the asynchronism of MPI messages, maximizing the overlap, and consequently leading to execution speedup and higher tolerance to bandwidth reduction. However, in the case of realistic scientific workloads, we show that the overlapping potential is significantly limited by the pattern by which each MPI process locally operates on MPI messages. In the second part of this thesis, we study the potential for tuning hybrid MPI/OmpSs parallelism. We try to gain a better understanding of the parallelism of hybrid MPI/OmpSs applications in order to evaluate how these applications would execute on future machines and to predict the execution bottlenecks that are likely to emerge. We explore how MPI/OmpSs applications could scale on the parallel machine with hundreds of cores per node. Furthermore, we investigate how this high parallelism within each node would reflect on the network constraints. We especially focus on identifying critical code sections in MPI/OmpSs. We devised a technique that quickly evaluates, for a given MPI/OmpSs application and the selected target machine, which code section should be optimized in order to gain the highest performance benefits. Also, this thesis studies techniques to quickly explore the potential OmpSs parallelism inherent in applications. We provide mechanisms to easily evaluate potential parallelism of any task decomposition. Furthermore, we describe an iterative trialand-error approach to search for a task decomposition that will expose sufficient parallelism for a given target machine. Finally, we explore potential of automating the iterative approach by capturing the programmers’ experience into an expert system that can autonomously lead the search process. Also, throughout the work on this thesis, we designed development tools that can be useful to other researchers in the field. The most advanced of these tools is Tareador – a tool to help porting MPI applications to MPI/OmpSs programming model. Tareador provides a simple interface to propose some decomposition of a code into OmpSs tasks. Tareador dynamically calculates data dependencies among the annotated tasks, and automatically estimates the potential OmpSs parallelization. Furthermore, Tareador gives additional hints on how to complete the process of porting the application to OmpSs. Tareador already proved itself useful, by being included in the academic classes on parallel programming at UPC.La programación paralela consiste en dividir un problema de computación entre múltiples unidades de procesamiento y definir como interactúan (comunicación y sincronización) para garantizar un resultado correcto. El rendimiento de un programa paralelo normalmente está muy lejos de ser óptimo: el desequilibrio de la carga computacional y la excesiva interacción entre las unidades de procesamiento a menudo causa ciclos perdidos, reduciendo la eficiencia de la computación paralela. En esta tesis proponemos técnicas orientadas a explotar mejor el paralelismo en aplicaciones paralelas, poniendo énfasis en técnicas que incrementan el asincronismo. En teoría, estas técnicas prometen múltiples beneficios. Primero, tendrían que mitigar el retraso de la comunicación y la sincronización, y por lo tanto incrementar el rendimiento global. Además, la calibración de la paralelización tendría que exponer un paralelismo adicional, incrementando la escalabilidad de la ejecución. Finalmente, un incremente en el asincronismo proveería una tolerancia mayor a redes de comunicación lentas y ruido externo. En la primera parte de la tesis, estudiamos el potencial para la calibración del paralelismo a través de MPI. En concreto, exploramos técnicas automáticas para solapar la comunicación con la computación. Proponemos una técnica de mensajería especulativa que incrementa el solapamiento y no requiere cambios en la aplicación MPI original. Nuestra técnica identifica automáticamente la actividad MPI de la aplicación y la reinterpreta usando solicitudes MPI no bloqueantes situadas óptimamente. Demostramos que esta técnica maximiza el solapamiento y, en consecuencia, acelera la ejecución y permite una mayor tolerancia a las reducciones de ancho de banda. Aún así, en el caso de cargas de trabajo científico realistas, mostramos que el potencial de solapamiento está significativamente limitado por el patrón según el cual cada proceso MPI opera localmente en el paso de mensajes. En la segunda parte de esta tesis, exploramos el potencial para calibrar el paralelismo híbrido MPI/OmpSs. Intentamos obtener una comprensión mejor del paralelismo de aplicaciones híbridas MPI/OmpSs para evaluar de qué manera se ejecutarían en futuras máquinas. Exploramos como las aplicaciones MPI/OmpSs pueden escalar en una máquina paralela con centenares de núcleos por nodo. Además, investigamos cómo este paralelismo de cada nodo se reflejaría en las restricciones de la red de comunicación. En especia, nos concentramos en identificar secciones críticas de código en MPI/OmpSs. Hemos concebido una técnica que rápidamente evalúa, para una aplicación MPI/OmpSs dada y la máquina objetivo seleccionada, qué sección de código tendría que ser optimizada para obtener la mayor ganancia de rendimiento. También estudiamos técnicas para explorar rápidamente el paralelismo potencial de OmpSs inherente en las aplicaciones. Proporcionamos mecanismos para evaluar fácilmente el paralelismo potencial de cualquier descomposición en tareas. Además, describimos una aproximación iterativa para buscar una descomposición en tareas que mostrará el suficiente paralelismo en la máquina objetivo dada. Para finalizar, exploramos el potencial para automatizar la aproximación iterativa. En el trabajo expuesto en esta tesis hemos diseñado herramientas que pueden ser útiles para otros investigadores de este campo. La más avanzada es Tareador, una herramienta para ayudar a migrar aplicaciones al modelo de programación MPI/OmpSs. Tareador proporciona una interfaz simple para proponer una descomposición del código en tareas OmpSs. Tareador también calcula dinámicamente las dependencias de datos entre las tareas anotadas, y automáticamente estima el potencial de paralelización OmpSs. Por último, Tareador da indicaciones adicionales sobre como completar el proceso de migración a OmpSs. Tareador ya se ha mostrado útil al ser incluido en las clases de programación de la UPC

On Tunable Sparse Network Coding in Commercial Devices for Networks and Filesystems

Author: Sørensen Chres Wiant
Publication venue: Aalborg Universitetsforlag
Publication date: 01/01/2017
Field of study

VBN