1,940 research outputs found

    Parallel local search for solving Constraint Problems on the Cell Broadband Engine (Preliminary Results)

    Full text link
    We explore the use of the Cell Broadband Engine (Cell/BE for short) for combinatorial optimization applications: we present a parallel version of a constraint-based local search algorithm that has been implemented on a multiprocessor BladeCenter machine with twin Cell/BE processors (total of 16 SPUs per blade). This algorithm was chosen because it fits very well the Cell/BE architecture and requires neither shared memory nor communication between processors, while retaining a compact memory footprint. We study the performance on several large optimization benchmarks and show that this achieves mostly linear time speedups, even sometimes super-linear. This is possible because the parallel implementation might explore simultaneously different parts of the search space and therefore converge faster towards the best sub-space and thus towards a solution. Besides getting speedups, the resulting times exhibit a much smaller variance, which benefits applications where a timely reply is critical

    VirtFogSim: A parallel toolbox for dynamic energy-delay performance testing and optimization of 5G Mobile-Fog-Cloud virtualized platforms

    Get PDF
    It is expected that the pervasive deployment of multi-tier 5G-supported Mobile-Fog-Cloudtechnological computing platforms will constitute an effective means to support the real-time execution of future Internet applications by resource- and energy-limited mobile devices. Increasing interest in this emerging networking-computing technology demands the optimization and performance evaluation of several parts of the underlying infrastructures. However, field trials are challenging due to their operational costs, and in every case, the obtained results could be difficult to repeat and customize. These emergingMobile-Fog-Cloud ecosystems still lack, indeed, customizable software tools for the performance simulation of their computing-networking building blocks. Motivated by these considerations, in this contribution, we present VirtFogSim. It is aMATLAB-supported software toolbox that allows the dynamic joint optimization and tracking of the energy and delay performance of Mobile-Fog-Cloud systems for the execution of applications described by general Directed Application Graphs (DAGs). In a nutshell, the main peculiar features of the proposed VirtFogSim toolbox are that: (i) it allows the joint dynamic energy-aware optimization of the placement of the application tasks and the allocation of the needed computing-networking resources under hard constraints on acceptable overall execution times, (ii) it allows the repeatable and customizable simulation of the resulting energy-delay performance of the overall system; (iii) it allows the dynamic tracking of the performed resource allocation under time-varying operational environments, as those typically featuring mobile applications; (iv) it is equipped with a user-friendly Graphic User Interface (GUI) that supports a number of graphic formats for data rendering, and (v) itsMATLAB code is optimized for running atop multi-core parallel execution platforms. To check both the actual optimization and scalability capabilities of the VirtFogSim toolbox, a number of experimental setups featuring different use cases and operational environments are simulated, and their performances are compared

    Running stream-like programs on heterogeneous multi-core systems

    Get PDF
    All major semiconductor companies are now shipping multi-cores. Phones, PCs, laptops, and mobile internet devices will all require software that can make effective use of these cores. Writing high-performance parallel software is difficult, time-consuming and error prone, increasing both time-to-market and cost. Software outlives hardware; it typically takes longer to develop new software than hardware, and legacy software tends to survive for a long time, during which the number of cores per system will increase. Development and maintenance productivity will be improved if parallelism and technical details are managed by the machine, while the programmer reasons about the application as a whole. Parallel software should be written using domain-specific high-level languages or extensions. These languages reveal implicit parallelism, which would be obscured by a sequential language such as C. When memory allocation and program control are managed by the compiler, the program's structure and data layout can be safely and reliably modified by high-level compiler transformations. One important application domain contains so-called stream programs, which are structured as independent kernels interacting only through one-way channels, called streams. Stream programming is not applicable to all programs, but it arises naturally in audio and video encode and decode, 3D graphics, and digital signal processing. This representation enables high-level transformations, including kernel unrolling and kernel fusion. This thesis develops new compiler and run-time techniques for stream programming. The first part of the thesis is concerned with a statically scheduled stream compiler. It introduces a new static partitioning algorithm, which determines which kernels should be fused, in order to balance the loads on the processors and interconnects. A good partitioning algorithm is crucial if the compiler is to produce efficient code. The algorithm also takes account of downstream compiler passes---specifically software pipelining and buffer allocation---and it models the compiler's ability to fuse kernels. The latter is important because the compiler may not be able to fuse arbitrary collections of kernels. This thesis also introduces a static queue sizing algorithm. This algorithm is important when memory is distributed, especially when local stores are small. The algorithm takes account of latencies and variations in computation time, and is constrained by the sizes of the local memories. The second part of this thesis is concerned with dynamic scheduling of stream programs. First, it investigates the performance of known online, non-preemptive, non-clairvoyant dynamic schedulers. Second, it proposes two dynamic schedulers for stream programs. The first is specifically for one-dimensional stream programs. The second is more general: it does not need to be told the stream graph, but it has slightly larger overhead. This thesis also introduces some support tools related to stream programming. StarssCheck is a debugging tool, based on Valgrind, for the StarSs task-parallel programming language. It generates a warning whenever the program's behaviour contradicts a pragma annotation. Such behaviour could otherwise lead to exceptions or race conditions. StreamIt to OmpSs is a tool to convert a streaming program in the StreamIt language into a dynamically scheduled task based program using StarSs.Totes les empreses de semiconductors produeixen actualment multi-cores. Mòbils,PCs, portàtils, i dispositius mòbils d’Internet necessitaran programari quefaci servir eficientment aquests cores. Escriure programari paral·lel d’altrendiment és difícil, laboriós i propens a errors, incrementant tant el tempsde llançament al mercat com el cost. El programari té una vida més llarga queel maquinari; típicament pren més temps desenvolupar nou programi que noumaquinari, i el programari ja existent pot perdurar molt temps, durant el qualel nombre de cores dels sistemes incrementarà. La productivitat dedesenvolupament i manteniment millorarà si el paral·lelisme i els detallstècnics són gestionats per la màquina, mentre el programador raona sobre elconjunt de l’aplicació.El programari paral·lel hauria de ser escrit en llenguatges específics deldomini. Aquests llenguatges extrauen paral·lelisme implícit, el qual és ocultatper un llenguatge seqüencial com C. Quan l’assignació de memòria i lesestructures de control són gestionades pel compilador, l’estructura iorganització de dades del programi poden ser modificades de manera segura ifiable per les transformacions d’alt nivell del compilador.Un dels dominis de l’aplicació importants és el que consta dels programes destream; aquest programes són estructurats com a nuclis independents queinteractuen només a través de canals d’un sol sentit, anomenats streams. Laprogramació de streams no és aplicable a tots els programes, però sorgeix deforma natural en la codificació i descodificació d’àudio i vídeo, gràfics 3D, iprocessament de senyals digitals. Aquesta representació permet transformacionsd’alt nivell, fins i tot descomposició i fusió de nucli.Aquesta tesi desenvolupa noves tècniques de compilació i sistemes en tempsd’execució per a programació de streams. La primera part d’aquesta tesi esfocalitza amb un compilador de streams de planificació estàtica. Presenta unnou algorisme de partició estàtica, que determina quins nuclis han de serfusionats, per tal d’equilibrar la càrrega en els processadors i en lesinterconnexions. Un bon algorisme de particionat és fonamental per tal de queel compilador produeixi codi eficient. L’algorisme també té en compte elspassos de compilació subseqüents---específicament software pipelining il’arranjament de buffers---i modela la capacitat del compilador per fusionarnuclis. Aquesta tesi també presenta un algorisme estàtic de redimensionament de cues.Aquest algorisme és important quan la memòria és distribuïda, especialment quanles memòries locals són petites. L’algorisme té en compte latències ivariacions en els temps de càlcul, i considera el límit imposat per la mida deles memòries locals.La segona part d’aquesta tesi es centralitza en la planificació dinàmica deprogrames de streams. En primer lloc, investiga el rendiment dels planificadorsdinàmics online, non-preemptive i non-clairvoyant. En segon lloc, proposa dosplanificadors dinàmics per programes de stream. El primer és específicament pera programes de streams unidimensionals. El segon és més general: no necessitael graf de streams, però els overheads són una mica més grans.Aquesta tesi també presenta un conjunt d’eines de suport relacionades amb laprogramació de streams. StarssCheck és una eina de depuració, que és basa enValgrind, per StarSs, un llenguatge de programació paral·lela basat en tasques.Aquesta eina genera un avís cada vegada que el comportament del programa estàen contradicció amb una anotació pragma. Aquest comportament d’una altra manerapodria causar excepcions o situacions de competició. StreamIt to OmpSs és unaeina per convertir un programa de streams codificat en el llenguatge StreamIt aun programa de tasques en StarSs planificat de forma dinàmica.Postprint (published version

    Radio Resource Management Optimization For Next Generation Wireless Networks

    Get PDF
    The prominent versatility of today’s mobile broadband services and the rapid advancements in the cellular phones industry have led to a tremendous expansion in the wireless market volume. Despite the continuous progress in the radio-access technologies to cope with that expansion, many challenges still remain that need to be addressed by both the research and industrial sectors. One of the many remaining challenges is the efficient allocation and management of wireless network resources when using the latest cellular radio technologies (e.g., 4G). The importance of the problem stems from the scarcity of the wireless spectral resources, the large number of users sharing these resources, the dynamic behavior of generated traffic, and the stochastic nature of wireless channels. These limitations are further tightened as the provider’s commitment to high quality-of-service (QoS) levels especially data rate, delay and delay jitter besides the system’s spectral and energy efficiencies. In this dissertation, we strive to solve this problem by presenting novel cross-layer resource allocation schemes to address the efficient utilization of available resources versus QoS challenges using various optimization techniques. The main objective of this dissertation is to propose a new predictive resource allocation methodology using an agile ray tracing (RT) channel prediction approach. It is divided into two parts. The first part deals with the theoretical and implementational aspects of the ray tracing prediction model, and its validation. In the second part, a novel RT-based scheduling system within the evolving cloud radio access network (C-RAN) architecture is proposed. The impact of the proposed model on addressing the long term evolution (LTE) network limitations is then rigorously investigated in the form of optimization problems. The main contributions of this dissertation encompass the design of several heuristic solutions based on our novel RT-based scheduling model, developed to meet the aforementioned objectives while considering the co-existing limitations in the context of LTE networks. Both analytical and numerical methods are used within this thesis framework. Theoretical results are validated with numerical simulations. The obtained results demonstrate the effectiveness of our proposed solutions to meet the objectives subject to limitations and constraints compared to other published works

    SWAPHI: Smith-Waterman Protein Database Search on Xeon Phi Coprocessors

    Full text link
    The maximal sensitivity of the Smith-Waterman (SW) algorithm has enabled its wide use in biological sequence database search. Unfortunately, the high sensitivity comes at the expense of quadratic time complexity, which makes the algorithm computationally demanding for big databases. In this paper, we present SWAPHI, the first parallelized algorithm employing Xeon Phi coprocessors to accelerate SW protein database search. SWAPHI is designed based on the scale-and-vectorize approach, i.e. it boosts alignment speed by effectively utilizing both the coarse-grained parallelism from the many co-processing cores (scale) and the fine-grained parallelism from the 512-bit wide single instruction, multiple data (SIMD) vectors within each core (vectorize). By searching against the large UniProtKB/TrEMBL protein database, SWAPHI achieves a performance of up to 58.8 billion cell updates per second (GCUPS) on one coprocessor and up to 228.4 GCUPS on four coprocessors. Furthermore, it demonstrates good parallel scalability on varying number of coprocessors, and is also superior to both SWIPE on 16 high-end CPU cores and BLAST+ on 8 cores when using four coprocessors, with the maximum speedup of 1.52 and 1.86, respectively. SWAPHI is written in C++ language (with a set of SIMD intrinsics), and is freely available at http://swaphi.sourceforge.net.Comment: A short version of this paper has been accepted by the IEEE ASAP 2014 conferenc

    SANTO: Social Aerial NavigaTion in Outdoors

    Get PDF
    In recent years, the advances in remote connectivity, miniaturization of electronic components and computing power has led to the integration of these technologies in daily devices like cars or aerial vehicles. From these, a consumer-grade option that has gained popularity are the drones or unmanned aerial vehicles, namely quadrotors. Although until recently they have not been used for commercial applications, their inherent potential for a number of tasks where small and intelligent devices are needed is huge. However, although the integrated hardware has advanced exponentially, the refinement of software used for these applications has not beet yet exploited enough. Recently, this shift is visible in the improvement of common tasks in the field of robotics, such as object tracking or autonomous navigation. Moreover, these challenges can become bigger when taking into account the dynamic nature of the real world, where the insight about the current environment is constantly changing. These settings are considered in the improvement of robot-human interaction, where the potential use of these devices is clear, and algorithms are being developed to improve this situation. By the use of the latest advances in artificial intelligence, the human brain behavior is simulated by the so-called neural networks, in such a way that computing system performs as similar as possible as the human behavior. To this end, the system does learn by error which, in an akin way to the human learning, requires a set of previous experiences quite considerable, in order for the algorithm to retain the manners. Applying these technologies to robot-human interaction do narrow the gap. Even so, from a bird's eye, a noticeable time slot used for the application of these technologies is required for the curation of a high-quality dataset, in order to ensure that the learning process is optimal and no wrong actions are retained. Therefore, it is essential to have a development platform in place to ensure these principles are enforced throughout the whole process of creation and optimization of the algorithm. In this work, multiple already-existing handicaps found in pipelines of this computational gauge are exposed, approaching each of them in a independent and simple manner, in such a way that the solutions proposed can be leveraged by the maximum number of workflows. On one side, this project concentrates on reducing the number of bugs introduced by flawed data, as to help the researchers to focus on developing more sophisticated models. On the other side, the shortage of integrated development systems for this kind of pipelines is envisaged, and with special care those using simulated or controlled environments, with the goal of easing the continuous iteration of these pipelines.Thanks to the increasing popularity of drones, the research and development of autonomous capibilities has become easier. However, due to the challenge of integrating multiple technologies, the available software stack to engage this task is restricted. In this thesis, we accent the divergencies among unmanned-aerial-vehicle simulators and propose a platform to allow faster and in-depth prototyping of machine learning algorithms for this drones
    corecore