14 research outputs found

    Efficient Q. S support for higt-performance interconnects

    Get PDF
    Las redes de interconexión son un componente clave en un gran número de sistemas. Los mecanismos de calidad de servicio (qos) son responsables de asegurar que se alcanza un cierto rendimiento en la red. Las soluciones tradicionales para ofrecer qos en redes de interconexión de altas prestaciones normalmente se basan en arquitecturas complejas. El principal objetivo de esta tesis es investigar si podemos ofrecer mecanismos eficientes de qos. Nuestro propósito es alcanzar un soporte completo de qos con el mínimo de recursos. Para ello, se identifican redundancias en los mecanismos propuestos de qos y son eliminados sin afectar al rendimiento. Esta tesis consta de tres partes. En la primera comenzamos con las propuestas tradicionales de qos a nivel de clase de tráfico. En la segunda parte, proponemos como adaptar los mecanismos de qos basados en deadlines para redes de interconexión de altas prestaciones. Por último, también investigamos la interacción de los mecanismos de qos con el control de congestión

    SCHEDULING AND RATE PROVISIONING FOR INPUT-BUFERED CELL BASED SWITCH FABRICS

    Get PDF
    In this dissertation, we develop and analyze algorithms for scheduling in input-buffered switch fabrics. We have introduced new deterministic and randomized scheduling algorithms that are capable of rate provisioning, achieves 100% throughput and have lower complexity than other proposed solutions. We consider QoS provisioning in general and rate provisioning in particular as the basic requirements for the next generation switch fabrics. To do rate provisioning, we extend the concept of packetized tracking policies for fluid policies to the input-buffered switches. It is considered that the speed up of the switch is one and the fluid policy is feasible, i.e., utilization of all ports is less than one. For the 2x2 switches, we show that ideal non-anticipative tracking policies always exist. By ideal, we mean a tracking policy that is at most one cell behind the corresponding fluid policy. Using a 3x3 counter example, we show that non-anticipative policies do not generally exist. For the NxN switches, a heuristic tracking policy is provided. The encouraging results for the heuristic policy motivated us to explore for analytical result for its performance. This effort leads us to the introduction of maximum node contained matching (MNCM) a new class of deterministic maximal size matching algorithms. We use fluid model techniques to prove that these algorithms achieve 100% throughput with no speedup. The only assumption on the arrival pattern is that it satisfies strong law of large numbers. We also introduce a new weighted matching algorithm in MNCM, maximum first matching (MFM) with complexity O(N^{2.5}). MFM, to the best of our knowledge, is the lowest complexity deterministic algorithm that delivers 100% throughput. We extend the concept of MNCM schedulers and introduce the Maximum Size Unit Interval Matching (MSUIM) algorithm for rate provisioning. MSUIM is at most N cells behind the corresponding fluid policy. Finally, we propose a general parallel architecture for self-randomized algorithms that is appropriate for practical applications. We introduce the concept of max-min fair self-randomized scheduling algorithms for rate provisioning. Using fluid model technique, we provide analytical results for the performance of the self-randomized schedulers

    HopliteBuf FPGA Network-on-Chip: Architecture and Analysis

    Get PDF
    We can prove occupancy bounds of stall-free FIFOs used in deflection-free, low-cost, and high-speed FPGA overlay Network-on-chips (NoCs). In our work, we build on top of the HopliteRT livelock-free overlay NoC with an FPGA-friendly 2D unidirectional torus topology to propose the novel HopliteBuf NoC. In our new NoC, we strategically introduce stall-free FIFOs in the network and support these FIFOs with static analysis based on network calculus to compute FIFO occupancy, latency, and bandwidth bounds. The microarchitecture of HopliteBuf combines the performance benefits of conventional buffered NoCs (high throughput, low latency) with the cost advantages of deflection-routed NoCs (low FPGA area, high clock frequencies). Specifically, we look at two design variants of the HopliteBuf NoC: (1) Single corner-turn FIFO (W to S), and (2) Dual corner-turn FIFO (W to S+N). The single corner-turn (W to S) design is simpler and only introduces a buffering requirement for packets changing dimension from X ring to the downhill Y ring (or West to South). The dual corner-turn variant requires two FIFOs for turning packets going downhill (W to S) as well as uphill (W to N). The dual corner-turn design overcomes the mathematical analysis challenges associated with single corner-turn designs for communication workloads with cyclic dependencies between flow traversal paths at the expense of small increase in resource cost. Essentially, we resolve an analysis challenge with extra hardware resources. Across a range of 100 synthetically-generated workloads on a 5 x 5 NoC, HopliteBuf outperforms HopliteRT by 1.2-2x in terms of latency, 10% in terms of injection rate, and 30-60% in terms of flowset feasibiliy. These advantages come at the cost of 3-4x higher FPGA resource requirement for buffers and muxes. Our analysis also deliver latency bounds that are not only better than HopliteRT in absolute terms but also tighter by 2-3x allowing us to provision less hardware to meet our specifications

    DETERMINATION OF END-TO-END DELAYS OF SWITCHED ETHERNET LOCAL AREA NETWORKS

    Get PDF
    The design of switched local area networks in practice has largely been based on heuristics and experience; in fact, in many situations, no network design is carried out, but only network installation (network cabling and nodes/equipment placements). This has resulted in local area networks that are sluggish, and that fail to satisfy the users that are connected to such networks in terms of speed of uploading and downloading of information, when, a user’s computer is in a communication session with other computers or host machines that are attached to the local area network or with switching devices that connect the local area network to wide area networks. Therefore, the need to provide deterministic guarantees on the delays of packets’ flows when designing switched local area networks has led to the need for analytic and formal basis for designing such networks. This is because, if the maximum packet delay between any two nodes of a network is not known, it is impossible to provide a deterministic guarantee of worst case response time of packets’ flows. This is the problem that this research work set out to solve. A model of a packet switch was developed, with which the maximum delay for a packet to cross any N-ports packet switch can be calculated. The maximum packet delay value provided by this model was compared from the point of view of practical reality to values that were obtained from literature, and was found to be by far a more realistic value. An algorithm with which network design engineers can generate optimum network designs in terms of installed network switches and attached number of hosts while respecting specified maximum end-to-end delay constraints was developed. This work revealed that the widely held notion in the literature as regards origin-destination pairs of hosts enumeration for end-to-end delay computation appears to be wrong in the context of switched local area networks. We have for the first time shown how this enumeration should be done. It has also been empirically shown in this work that the number of hosts that can be attached to any switched local area network is actually bounded by the number of ports in the switches of which the network is composed. Computed numerical values of maximum end-to-end delays using the developed model and algorithm further revealed that the predominant cause of delay (sluggishness) in switched local area networks is the queuing delay, and not the number of users (hosts) that are connected to the networks. The fact that a switched local area network becomes slow as more users are logged on to it is as a result of the flow of bursty traffic (uploading and downloading of high-bit rates and bandwidth consuming applications). We have also implemented this work’s model and algorithms in a developed C programming language-based inter-active switched local area networks’ design application program. Further studies were recommended on the need to develop method(s) for determining the maximum amount of traffic that can arrive to a switch in a burst, on the need for the introduction of weighting function(s) in the end-to-end delay computation models; and on the need to introduce cost variables in determining the optimal Internet access device input and output rates specifications

    Approximating fluid schedules in packet-switched networks

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Mathematics, 2004.This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.Includes bibliographical references (p. 145-151).We consider a problem motivated by the desire to provide exible, rate-based, quality of service guarantees for packets sent over switches and switch networks. Our focus is solving a type of on-line, traffic scheduling problem, whose input at each time step is a set of desired traffic rates through the switch network. These traffic rates in general cannot be exactly achieved since they treat the incoming data as fluid, that is, they assume arbitrarily small fractions of packets can be transmitted at each time step. The goal of the traffic scheduling problem is to closely approximate the given sequence of traffic rates by a sequence of switch uses throughout the network in which only whole packets are sent. We prove worst-case bounds on the additional delay and buffer use that result from using such an approximation. These bounds depend on the network topology, the resources available to the scheduler, and the types of fluid policy allowed.by Michael Aaron Rosenblum.Ph.D

    A time-predictable many-core processor design for critical real-time embedded systems

    Get PDF
    Critical Real-Time Embedded Systems (CRTES) are in charge of controlling fundamental parts of embedded system, e.g. energy harvesting solar panels in satellites, steering and breaking in cars, or flight management systems in airplanes. To do so, CRTES require strong evidence of correct functional and timing behavior. The former guarantees that the system operates correctly in response of its inputs; the latter ensures that its operations are performed within a predefined time budget. CRTES aim at increasing the number and complexity of functions. Examples include the incorporation of \smarter" Advanced Driver Assistance System (ADAS) functionality in modern cars or advanced collision avoidance systems in Unmanned Aerial Vehicles (UAVs). All these new features, implemented in software, lead to an exponential growth in both performance requirements and software development complexity. Furthermore, there is a strong need to integrate multiple functions into the same computing platform to reduce the number of processing units, mass and space requirements, etc. Overall, there is a clear need to increase the computing power of current CRTES in order to support new sophisticated and complex functionality, and integrate multiple systems into a single platform. The use of multi- and many-core processor architectures is increasingly seen in the CRTES industry as the solution to cope with the performance demand and cost constraints of future CRTES. Many-cores supply higher performance by exploiting the parallelism of applications while providing a better performance per watt as cores are maintained simpler with respect to complex single-core processors. Moreover, the parallelization capabilities allow scheduling multiple functions into the same processor, maximizing the hardware utilization. However, the use of multi- and many-cores in CRTES also brings a number of challenges related to provide evidence about the correct operation of the system, especially in the timing domain. Hence, despite the advantages of many-cores and the fact that they are nowadays a reality in the embedded domain (e.g. Kalray MPPA, Freescale NXP P4080, TI Keystone II), their use in CRTES still requires finding efficient ways of providing reliable evidence about the correct operation of the system. This thesis investigates the use of many-core processors in CRTES as a means to satisfy performance demands of future complex applications while providing the necessary timing guarantees. To do so, this thesis contributes to advance the state-of-the-art towards the exploitation of parallel capabilities of many-cores in CRTES contributing in two different computing domains. From the hardware domain, this thesis proposes new many-core designs that enable deriving reliable and tight timing guarantees. From the software domain, we present efficient scheduling and timing analysis techniques to exploit the parallelization capabilities of many-core architectures and to derive tight and trustworthy Worst-Case Execution Time (WCET) estimates of CRTES.Los sistemas críticos empotrados de tiempo real (en ingles Critical Real-Time Embedded Systems, CRTES) se encargan de controlar partes fundamentales de los sistemas integrados, e.g. obtención de la energía de los paneles solares en satélites, la dirección y frenado en automóviles, o el control de vuelo en aviones. Para hacerlo, CRTES requieren fuerte evidencias del correcto comportamiento funcional y temporal. El primero garantiza que el sistema funciona correctamente en respuesta de sus entradas; el último asegura que sus operaciones se realizan dentro de unos limites temporales establecidos previamente. El objetivo de los CRTES es aumentar el número y la complejidad de las funciones. Algunos ejemplos incluyen los sistemas inteligentes de asistencia a la conducción en automóviles modernos o los sistemas avanzados de prevención de colisiones en vehiculos aereos no tripulados. Todas estas nuevas características, implementadas en software,conducen a un crecimiento exponencial tanto en los requerimientos de rendimiento como en la complejidad de desarrollo de software. Además, existe una gran necesidad de integrar múltiples funciones en una sóla plataforma para así reducir el número de unidades de procesamiento, cumplir con requisitos de peso y espacio, etc. En general, hay una clara necesidad de aumentar la potencia de cómputo de los actuales CRTES para soportar nueva funcionalidades sofisticadas y complejas e integrar múltiples sistemas en una sola plataforma. El uso de arquitecturas multi- y many-core se ve cada vez más en la industria CRTES como la solución para hacer frente a la demanda de mayor rendimiento y las limitaciones de costes de los futuros CRTES. Las arquitecturas many-core proporcionan un mayor rendimiento explotando el paralelismo de aplicaciones al tiempo que proporciona un mejor rendimiento por vatio ya que los cores se mantienen más simples con respecto a complejos procesadores de un solo core. Además, las capacidades de paralelización permiten programar múltiples funciones en el mismo procesador, maximizando la utilización del hardware. Sin embargo, el uso de multi- y many-core en CRTES también acarrea ciertos desafíos relacionados con la aportación de evidencias sobre el correcto funcionamiento del sistema, especialmente en el ámbito temporal. Por eso, a pesar de las ventajas de los procesadores many-core y del hecho de que éstos son una realidad en los sitemas integrados (por ejemplo Kalray MPPA, Freescale NXP P4080, TI Keystone II), su uso en CRTES aún precisa de la búsqueda de métodos eficientes para proveer evidencias fiables sobre el correcto funcionamiento del sistema. Esta tesis ahonda en el uso de procesadores many-core en CRTES como un medio para satisfacer los requisitos de rendimiento de aplicaciones complejas mientras proveen las garantías de tiempo necesarias. Para ello, esta tesis contribuye en el avance del estado del arte hacia la explotación de many-cores en CRTES en dos ámbitos de la computación. En el ámbito del hardware, esta tesis propone nuevos diseños many-core que posibilitan garantías de tiempo fiables y precisas. En el ámbito del software, la tesis presenta técnicas eficientes para la planificación de tareas y el análisis de tiempo para aprovechar las capacidades de paralelización en arquitecturas many-core, y también para derivar estimaciones de peor tiempo de ejecución (Worst-Case Execution Time, WCET) fiables y precisas
    corecore