92 research outputs found

    Properties and algorithms of the (n, k)-arrangement graphs

    The (n, k)-arrangement interconnection topology was first introduced in 1992. The (n, k)-arrangement graph is a class of generalized star graphs. Compared with the well-known n-star, the (n, k)-arrangement graph is more flexible in degree and diameter. However, few algorithms have been designed for the (n, k)-arrangement graph to date. In this thesis, we focus on establishing graph-theoretical properties of the (n, k)-arrangement graph and developing parallel algorithms that run on this network. The topological properties of the arrangement graph, including its cyclic properties, are studied first. We then study the communication problems of broadcasting and routing, followed by embedding problems; these results are useful for developing efficient algorithms on this network. We then study the (n, k)-arrangement network from the algorithmic point of view. Specifically, we investigate both fundamental and application algorithms, such as prefix sums computation, sorting, merging, and a basic geometric computation: finding the convex hull on the (n, k)-arrangement graph. A literature review of the state of the art in relation to the (n, k)-arrangement network is also provided, along with some open problems in this area.
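    As a concrete illustration of the structure studied in this thesis, the sketch below builds the (n, k)-arrangement graph from its standard definition: vertices are the ordered k-arrangements of n symbols, and two arrangements are adjacent exactly when they differ in one position. It is a brute-force construction intended only for small n and k, not one of the thesis's algorithms.

```python
from itertools import permutations

def arrangement_graph(n, k):
    """Adjacency lists of the (n, k)-arrangement graph A(n, k):
    vertices are ordered k-arrangements of {1, ..., n}; two vertices
    are adjacent iff they differ in exactly one position."""
    vertices = list(permutations(range(1, n + 1), k))
    adj = {v: [] for v in vertices}
    for u in vertices:
        for v in vertices:
            if u != v and sum(a != b for a, b in zip(u, v)) == 1:
                adj[u].append(v)
    return adj

# A(n, k) has n!/(n-k)! vertices and is regular of degree k*(n-k);
# A(n, n-1) is isomorphic to the n-star graph.
g = arrangement_graph(4, 2)
print(len(g))          # 12 vertices = 4!/(4-2)!
print(len(g[(1, 2)]))  # degree 2*(4-2) = 4
```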

    A Minimum Area VLSI Architecture for O(log n) Time Sorting

    Coordinated Science Laboratory was formerly known as Control Systems Laboratory. Joint Services Electronics Program / N00014-79-C-0424. IBM Predoctoral Fellowship.

    Optimizing for a Many-Core Architecture without Compromising Ease-of-Programming

    Faced with nearly stagnant clock speed advances, chip manufacturers have turned to parallelism as the source of continuing performance improvements. But even though numerous parallel architectures have already been brought to market, a universally accepted methodology for programming them for general-purpose applications has yet to emerge. Existing solutions tend to be hardware-specific, rendering them difficult to use for the majority of application programmers and domain experts, and providing no scalability guarantees for future generations of the hardware. This dissertation advances the validation of the following thesis: it is possible to develop efficient general-purpose programs for a many-core platform using a model recognized for its simplicity. To prove this thesis, we refer to the eXplicit Multi-Threading (XMT) architecture designed and built at the University of Maryland. XMT is an attempt at re-inventing parallel computing with a solid theoretical foundation and an aggressively scalable design. Algorithmically, XMT is inspired by the PRAM (Parallel Random Access Machine) model, and the architecture design focuses on reducing inter-task communication and synchronization overheads and providing an easy-to-program parallel model. This thesis builds upon the existing XMT infrastructure to improve support for efficient execution with a focus on ease of programming. Our contributions aim at reducing the programmer's effort in developing XMT applications and improving overall performance. More concretely, we: (1) present a work-flow guiding programmers to produce efficient parallel solutions starting from a high-level problem; (2) introduce an analytical performance model for XMT programs and provide a methodology to project running time from an implementation; (3) propose and evaluate RAP, an improved resource-aware compiler loop-prefetching algorithm targeted at fine-grained many-core architectures, demonstrating performance improvements of up to 34.79% on average over the GCC loop prefetching implementation and up to 24.61% on average over a simple hardware prefetching scheme; and (4) implement a number of parallel benchmarks and evaluate the overall performance of XMT relative to existing serial and parallel solutions, showing speedups of up to 13.89x vs. a serial processor and 8.10x vs. parallel code optimized for an existing many-core (GPU). We also discuss the implementation and optimization of the Max-Flow algorithm on XMT, a problem which is among the more advanced in terms of complexity, benchmarking and research interest in the parallel algorithms community. We demonstrate better speedups over a best serial solution than previous attempts on other parallel platforms have achieved.
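    The PRAM-style algorithms that motivate XMT are exemplified by parallel prefix sums, one of the benchmarks named above. The sketch below shows the classic work-efficient (Blelloch) scan as a language-agnostic illustration of that algorithmic pattern only; an actual XMT program would express each parallel round in XMTC spawn/join code, and nothing here is taken from the dissertation's implementation.

```python
def exclusive_prefix_sums(a):
    """Work-efficient PRAM-style exclusive prefix sums (Blelloch scan).
    Each inner for-loop is one conceptual parallel round: its iterations
    touch disjoint array cells and could run concurrently."""
    n = len(a)          # assumes n is a power of two for simplicity
    x = list(a)
    # Up-sweep: build partial sums along a balanced binary tree.
    d = 1
    while d < n:
        for i in range(0, n, 2 * d):
            x[i + 2 * d - 1] += x[i + d - 1]
        d *= 2
    # Down-sweep: push prefixes back down the tree.
    x[n - 1] = 0
    d = n // 2
    while d >= 1:
        for i in range(0, n, 2 * d):
            t = x[i + d - 1]
            x[i + d - 1] = x[i + 2 * d - 1]
            x[i + 2 * d - 1] += t
        d //= 2
    return x

print(exclusive_prefix_sums([3, 1, 7, 0, 4, 1, 6, 3]))
# -> [0, 3, 4, 11, 11, 15, 16, 22]
```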

    Multiple Bus Networks for Binary-Tree Algorithms

    Multiple bus networks (MBNs) connect processors via buses. This dissertation addresses issues related to running binary-tree algorithms on MBNs. These algorithms are of a fundamental nature and reduce inputs at the leaves of a binary tree to a result at the root. We study the relationships between running time, degree (maximum number of connections per processor), and loading (maximum number of connections per bus). We also investigate fault tolerance, meshes enhanced with MBNs, and VLSI layouts for binary-tree MBNs. We prove that the loading of optimal-time, degree-2, binary-tree MBNs is non-constant. In establishing this result, we derive three loading lower bounds, Ω(√n), Ω(n^(2/3)), and Ω(n/log n), each tighter than the previous one. We also show that if the degree is increased to 3, then the loading can be constant. A constant-loading, degree-2 MBN exists if the algorithm is allowed to run slower than the optimal time. We introduce a new enhanced mesh architecture (employing binary-tree MBNs) that captures features of all existing enhanced meshes. This architecture is more flexible, allowing all existing enhanced-mesh results to be ported to a more implementable platform. We present two methods for imparting tolerance to bus and processor faults in binary-tree MBNs. One of the methods is general and can be used with any MBN, for both processor and bus faults. A key feature of this method is that it permits the network designer to designate a set of buses as unimportant and consider all faulty buses as unimportant; this minimizes the impact of faulty elements on the MBN. The second method is specific to bus faults in binary-tree MBNs, whose features it exploits to produce faster solutions. We also derive a series of results that distill the lower bound on the perimeter layout area of optimal-time, binary-tree MBNs to a single conjecture. Based on this, we believe that optimal-time, binary-tree MBNs require no less area than a balanced tree topology, even though such MBNs can reuse buses over various steps of the algorithm.
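    For reference, the binary-tree algorithms considered here combine n leaf values into one root result in ceil(log2 n) rounds, where all combinations within a round are independent. The sketch below shows that generic reduction pattern only; how the rounds are mapped onto processors and buses is precisely the subject of the dissertation and is not modeled here.

```python
def tree_reduce(values, op=lambda a, b: a + b):
    """Binary-tree reduction: combine n leaf values into one root result
    in ceil(log2 n) rounds; all combinations within a round are
    independent, so each round corresponds to one parallel step."""
    level = list(values)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):   # pairwise combine
            nxt.append(op(level[i], level[i + 1]))
        if len(level) % 2:                      # odd leaf carries over
            nxt.append(level[-1])
        level = nxt
    return level[0]

print(tree_reduce([5, 2, 8, 1, 9, 3]))   # 28
```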

    Parallel computation on sparse networks of processors

    SIGLE LD:D48226/84 / BLDSC - British Library Document Supply Centre, GB, United Kingdom

    Scalable Community Detection


    Radio Resource Management Optimization For Next Generation Wireless Networks

    The prominent versatility of today’s mobile broadband services and the rapid advancements in the cellular phone industry have led to a tremendous expansion in the wireless market volume. Despite the continuous progress in radio-access technologies to cope with that expansion, many challenges remain that need to be addressed by both the research and industrial sectors. One of them is the efficient allocation and management of wireless network resources when using the latest cellular radio technologies (e.g., 4G). The importance of the problem stems from the scarcity of the wireless spectral resources, the large number of users sharing these resources, the dynamic behavior of the generated traffic, and the stochastic nature of wireless channels. These limitations are further tightened by the provider’s commitment to high quality-of-service (QoS) levels, especially data rate, delay, and delay jitter, as well as to the system’s spectral and energy efficiencies. In this dissertation, we strive to solve this problem by presenting novel cross-layer resource allocation schemes that balance efficient utilization of the available resources against QoS requirements using various optimization techniques. The main objective of this dissertation is to propose a new predictive resource allocation methodology using an agile ray tracing (RT) channel prediction approach. The dissertation is divided into two parts. The first part deals with the theoretical and implementational aspects of the ray tracing prediction model and its validation. In the second part, a novel RT-based scheduling system within the evolving cloud radio access network (C-RAN) architecture is proposed. The impact of the proposed model on addressing the limitations of long term evolution (LTE) networks is then rigorously investigated in the form of optimization problems. The main contributions of this dissertation encompass the design of several heuristic solutions based on our novel RT-based scheduling model, developed to meet the aforementioned objectives while considering the co-existing limitations in the context of LTE networks. Both analytical and numerical methods are used within this thesis framework, and theoretical results are validated with numerical simulations. The obtained results demonstrate the effectiveness of our proposed solutions in meeting the objectives, subject to the limitations and constraints, compared to other published works.
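    As a rough illustration of the kind of scheduling decision such a system makes, the toy sketch below assigns LTE resource blocks using a proportional-fair metric computed from predicted per-block SNRs, the sort of quantity a ray-tracing channel predictor would supply. The function names, numbers, and the 180 kHz resource-block bandwidth are illustrative assumptions; this is not the dissertation's RT-based scheduler.

```python
import math

def pf_schedule(predicted_snr_db, avg_rate, rb_bandwidth_hz=180e3):
    """Toy proportional-fair scheduler: assign each resource block (RB) to
    the user maximizing instantaneous_rate / average_rate, where the
    instantaneous rate is the Shannon capacity of a predicted SNR.
    predicted_snr_db[u][b] : predicted SNR for user u on RB b (dB)
    avg_rate[u]            : user u's long-term average rate (bit/s)"""
    n_users = len(predicted_snr_db)
    n_rbs = len(predicted_snr_db[0])
    allocation = {}
    for b in range(n_rbs):
        best_u, best_metric = None, -1.0
        for u in range(n_users):
            snr = 10 ** (predicted_snr_db[u][b] / 10)       # dB -> linear
            rate = rb_bandwidth_hz * math.log2(1 + snr)     # bit/s on this RB
            metric = rate / max(avg_rate[u], 1.0)           # PF metric
            if metric > best_metric:
                best_u, best_metric = u, metric
        allocation[b] = best_u
    return allocation

# Two users, three RBs (all numbers illustrative).
print(pf_schedule([[12, 3, 20], [8, 15, 9]], avg_rate=[2e6, 1e6]))
```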

    System Architectures for Cooperative Teams of Unmanned Aerial Vehicles Interacting Physically with the Environment

    Unmanned Aerial Vehicles (UAVs) have become quite a useful tool for a wide range of applications, from inspection & maintenance to search & rescue, among others. The capabilities of a single UAV can be extended or complemented by the deployment of more UAVs, so multi-UAV cooperative teams are becoming a trend. In that case, as different autopilots, heterogeneous platforms, and application-dependent software components have to be integrated, multi-UAV system architectures that are flexible and can adapt to the team's needs are required. In this thesis, we develop system architectures for cooperative teams of UAVs, paying special attention to applications that require physical interaction with the environment, which is typically unstructured. First, we implement some layers to abstract the high-level components from the hardware specifics. Then we propose increasingly advanced architectures, from a single-UAV hierarchical navigation architecture to an architecture for a cooperative team of heterogeneous UAVs. All this work has been thoroughly tested in both simulation and field experiments in different challenging scenarios through research projects and robotics competitions. Most of the applications required physical interaction with the environment, mainly in unstructured outdoor scenarios. All the know-how and lessons learned throughout the process are shared in this thesis, and all relevant code is publicly available.

    Towards Scalable Characterization of Noisy, Intermediate-Scale Quantum Information Processors

    In recent years, quantum information processors (QIPs) have grown from one or two qubits to tens of qubits. As a result, characterizing QIPs – measuring how well they work, and how they fail – has become much more challenging. The obstacles to characterizing today’s QIPs will grow even more difficult as QIPs grow from tens of qubits to hundreds, and enter what has been called the “noisy, intermediate-scale quantum” (NISQ) era. This thesis develops methods based on advanced statistics and machine learning algorithms to address the difficulties of “quantum characterization, validation, and verification” (QCVV) of NISQ processors. In the first part of this thesis, I use statistical model selection to develop techniques for choosing between several models for a QIP's behavior. In the second part, I deploy machine learning algorithms to develop a new QCVV technique and to do experiment design. These investigations help lay a foundation for extending QCVV to characterize the next generation of NISQ processors.
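    As a schematic example of statistical model selection in this setting, the sketch below compares two hypothetical models of bit-flip error data from a QIP, a shared error rate versus a per-circuit error rate, using the Akaike information criterion (AIC). The data, models, and criterion are illustrative assumptions; the thesis's actual QCVV models are considerably richer.

```python
import math

def binom_loglik(k, n, p):
    """Log-likelihood of k 'flip' outcomes in n shots under error rate p."""
    p = min(max(p, 1e-12), 1 - 1e-12)
    return k * math.log(p) + (n - k) * math.log(1 - p)

def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik

# Outcome counts (shots, observed bit-flips) for two circuits -- illustrative numbers.
data = [(1000, 42), (1000, 71)]

# Model A: one shared error rate; its MLE is the pooled flip fraction.
p_shared = sum(k for _, k in data) / sum(n for n, _ in data)
ll_a = sum(binom_loglik(k, n, p_shared) for n, k in data)

# Model B: an independent error rate per circuit (MLE is each circuit's fraction).
ll_b = sum(binom_loglik(k, n, k / n) for n, k in data)

print("AIC, shared-rate model :", aic(ll_a, 1))
print("AIC, per-circuit model :", aic(ll_b, 2))
# The lower AIC is preferred; a large gap suggests the extra parameter is justified.
```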