98 research outputs found
Lower bounds for dilation, wirelength, and edge congestion of embedding graphs into hypercubes
Interconnection networks provide an effective mechanism for exchanging data
between processors in a parallel computing system. One of the most efficient
interconnection networks is the hypercube, owing to its structural regularity, its
suitability for the parallel implementation of a wide range of algorithms, and its
high degree of fault tolerance. It is therefore a natural first choice of topology
for parallel processing and computing systems. In this paper, lower bounds for
the dilation, wirelength, and edge congestion of an embedding of a graph into a
hypercube are proved. Two of these bounds are expressed in terms of the
bisection width. Applying these results, the dilation and wirelength of
embedding of certain complete multipartite graphs, folded hypercubes, wheels,
and specific Cartesian products are computed.
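The three cost measures the paper bounds can be computed directly for any concrete embedding; a minimal sketch of the standard definitions (my own illustration; it uses one fixed shortest-path routing, whereas the paper's edge congestion is minimized over all routings):

```python
def hamming(u, v):
    """Distance in the hypercube between nodes labelled by integers."""
    return bin(u ^ v).count("1")

def embedding_measures(edges, phi, n):
    """Dilation, wirelength and edge congestion of an embedding into Q_n.

    edges : list of (u, v) edges of the guest graph G
    phi   : dict mapping each vertex of G to a distinct node of Q_n,
            labelled by an integer in [0, 2**n)
    n     : dimension of the host hypercube Q_n
    """
    assert all(0 <= x < 2 ** n for x in phi.values())
    dil = max(hamming(phi[u], phi[v]) for u, v in edges)   # dilation
    wl = sum(hamming(phi[u], phi[v]) for u, v in edges)    # wirelength
    # Edge congestion: route each guest edge along the shortest path that
    # fixes differing bits from the lowest dimension upward, then count
    # how many routes cross each hypercube edge.
    load = {}
    for u, v in edges:
        cur, dst = phi[u], phi[v]
        while cur != dst:
            bit = (cur ^ dst) & -(cur ^ dst)   # lowest differing bit
            nxt = cur ^ bit
            e = (min(cur, nxt), max(cur, nxt))
            load[e] = load.get(e, 0) + 1
            cur = nxt
    ec = max(load.values()) if load else 0
    return dil, wl, ec
```

For example, embedding the 4-cycle 0-1-2-3-0 into Q_2 via the Gray-code map {0: 0, 1: 1, 2: 3, 3: 2} gives dilation 1, wirelength 4, and edge congestion 1.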
Non-minimal adaptive routing for efficient interconnection networks
ABSTRACT: The interconnection network is a key concept of any parallel computing system. The first aspect that defines an interconnection network is its topology. Typically, power- and cost-efficient scalable networks with low diameter rely on topologies that approach the Moore bound, in which there is no minimal-path diversity. Once the topology is defined, the performance bounds of the network follow from it, so a suitable routing algorithm should be designed to come as close as possible to those limits; owing to the lack of minimal-path diversity, it must also exploit non-minimal paths when the traffic pattern is adversarial. These routing algorithms usually select between minimal and non-minimal paths based on the network conditions, where the non-minimal paths are built according to Valiant's load-balancing algorithm.
This implies that these paths double the length of minimal ones and then the latency supported by packets increases. Regarding the technology, from its introduction in HPC systems in the early 2000s, Ethernet has been used in a significant fraction of the systems.
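The minimal-versus-Valiant selection described above can be sketched as follows; this is an illustrative model only (the function names and the congestion threshold are my assumptions, not the dissertation's actual QCN-Switch logic):

```python
import random

def valiant_route(src, dst, nodes):
    """Valiant load balancing: go minimally to a random intermediate
    node, then minimally on to the destination. This roughly doubles
    the path length but randomizes adversarial traffic patterns."""
    mid = random.choice([n for n in nodes if n not in (src, dst)])
    return [src, mid, dst]      # two phases, each routed minimally

def select_path(src, dst, nodes, congestion, threshold=0.5):
    """Choose the minimal path unless its congestion level (e.g. derived
    from explicit congestion notifications) exceeds a threshold."""
    if congestion.get((src, dst), 0.0) <= threshold:
        return [src, dst]       # minimal route
    return valiant_route(src, dst, nodes)
```

With no congestion reported, `select_path(0, 3, range(8), {})` keeps the minimal route `[0, 3]`; with high congestion it falls back to a three-hop Valiant path through a random intermediate node.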
This dissertation introduces a realistic and competitive implementation of a scalable lossless Ethernet network for HPC environments considering low-diameter and low-power topologies. This allows for up to 54% power savings. Furthermore, it proposes a routing upon the cited architecture, hereon QCN-Switch, which selects between minimal and non-minimal paths per packet based on explicit congestion notifications instead of credits. Once the misrouting decision is implemented, it introduces two mechanisms regarding the selection of the intermediate switch to develop a source-adaptive routing algorithm capable of adapting the number of hops in the non-minimal paths. This routing, hereon ACOR, is topology-agnostic and improves average latency by up to 28%. Finally, a topology-dependent routing, hereon LIAN, is introduced to optimize the number of hops in the non-minimal paths based on live network conditions. Evaluations show that LIAN obtains almost-optimal latency and outperforms state-of-the-art adaptive routing algorithms, reducing latency by up to 30.0% and providing stable throughput and fairness.
This work has been supported by the Spanish Ministry of Education, Culture and Sports under grant FPU14/02253, the Spanish Ministry of Economy, Industry and Competitiveness under contracts TIN2010-21291-C02-02, TIN2013-46957-C2-2-P, and TIN2013-46957-C2-2-P (AEI/FEDER, UE), the Spanish Research Agency under contract PID2019-105660RBC22/AEI/10.13039/501100011033, the European Union under agreements FP7-ICT-2011-7-288777 (Mont-Blanc 1) and FP7-ICT-2013-10-610402 (Mont-Blanc 2), the University of Cantabria under project PAR.30.P072.64004, and by the European HiPEAC Network of Excellence through an internship grant supported by the European Union's Horizon 2020 research and innovation program under grant agreement No. H2020-ICT-2015-687689.
The connection machine
Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1988. Bibliography: leaves 134-157. By William Daniel Hillis.
Mapping the Diffuse Universe: Integral Field Spectroscopy of Galaxy Environments
The population of galaxies we see today is the result of billions of years of gas inflows, outflows, mergers, and feedback. To develop any holistic picture of the origin and evolution of galaxies, we thus need to understand their environments. The circumgalactic and intergalactic media (CGM and IGM) - the gas around and between galaxies, respectively - represent a large part of this environment. However, this gas is extremely faint and thus difficult to observe, and only recently have we been able to image it directly. This thesis presents instrumental and observational work focused on revealing galaxy environments in the early universe.
Chapter 1 presents a brief history of our understanding of galaxies and an overview of our current picture of galaxy formation, including the role played by galaxy environments. In particular, it focuses on presenting the evolution of baryonic structures within a cosmological density field dominated by dark matter.
Chapter 2 presents instrumental work on the Keck Cosmic Web Imager (KCWI, Morrissey et al. 2018), a new integral field spectrograph (IFS) for the Keck-2 10m telescope designed to study faint, extended emission. As an introduction, I discuss the advantages and disadvantages of integral field spectroscopy for the application of studying galaxy environments, as well as an overview of the prototype instrument - the Palomar Cosmic Web Imager (PCWI, Matuszewski et al. 2010). This chapter focuses primarily on engineering work during the development and testing of KCWI, though I conclude with a brief comparison of PCWI and KCWI performance in measuring the CGM around a high-redshift QSO.
Chapter 3 presents the development of a software package designed to extract and analyze faint, extended emission in PCWI and KCWI data: CWITools. Although software is often an afterthought in astronomical and observational work, it is likely to become a primary barrier to conducting large IFS surveys of the CGM and IGM. This semi-automated analysis pipeline is presented and released publicly to empower future PCWI and KCWI studies.
Chapter 4 presents the FLASHES (Fluorescent Lyman-α Structures in High-z Environments) pilot survey, published as O'Sullivan et al. 2020. The FLASHES pilot survey is an IFS study of extended HI Lyman-α emission in the environments of 48 z = 2.3 - 3.0 QSOs. The FLASHES Survey is the core project of this thesis, enabled by the instrumentation in Chapter 2 and the analysis pipeline developed in Chapter 3. The pilot survey represents the first statistically significant (N ≳ 30) sample of direct CGM observations in its redshift range. As such, it provides the first direct constraints on the 2D morphology, surface brightness profiles, and spatially resolved kinematics of the CGM during this period.
Chapter 5 presents the first FLASHES follow-up study; deep IFS observations targeting extended Lyα 1216Å, NV 1240Å, CIV 1549Å, and HeII 1640Å emission from a subset of FLASHES pilot targets (O'Sullivan et al., in prep). Emission from metals in the CGM is expected to be an order of magnitude or more fainter than its Lyα, yet is a crucial ingredient in understanding the composition of the gas. Detecting this emission still requires multiple hours on 10m class telescopes. As such, large surveys of the multi-phase CGM remain extremely difficult to conduct. In this chapter, I present detections and upper limits of CGM metal emission around 8 FLASHES targets.
Chapter 6 presents engineering work on FIREBall-2 (the Faint Intergalactic Redshifted Emission Balloon, second generation), a high-altitude UV telescope and IFS targeting CGM emission in the low-redshift universe (z ≃ 0.7). FIREBall-2 is an ambitious project deploying a novel, electron-multiplying CCD designed to achieve ≳ 50% quantum efficiency in the UV. This technology represents an order of magnitude increase in sensitivity from the microchannel plates used in the GALEX space telescope. FIREBall-2 serves as both an observational project in its own right, studying the low-z CGM, and a pathfinder mission for future UV space missions.
Finally, Chapter 7 summarizes the contributions of this thesis and presents a brief outlook on a few topics related to observations of galaxy environments.
Performance analysis of wormhole routing in multicomputer interconnection networks
Perhaps the most critical component in determining the ultimate performance potential of a multicomputer is its interconnection network, the hardware fabric supporting communication among individual processors. The message latency and throughput of such a network are affected by many factors, of which topology, switching method, routing algorithm and traffic load are the most significant. In this context, the present study focuses on a performance analysis of k-ary n-cube networks employing wormhole switching, virtual channels and adaptive routing, a scenario of special interest to current research.
This project aims to build upon earlier work in two main ways: constructing new analytical models for k-ary n-cubes, and comparing the performance merits of cubes of different dimensionality. To this end, some important topological properties of k-ary n-cubes are explored initially; in particular, expressions are derived to calculate the number of nodes at/within a given distance from a chosen centre. These results are important in their own right but their primary significance here is to assist in the construction of new and more realistic analytical models of wormhole-routed k-ary n-cubes.
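The closed-form counting expressions derived in the thesis are not reproduced here, but the quantity itself is easy to compute directly; a small sketch (my own illustration, for a bidirectional k-ary n-cube, i.e. a torus with wraparound links):

```python
def nodes_at_distance(k, n):
    """Return a list c where c[d] is the number of nodes at distance d
    from any fixed node of a bidirectional k-ary n-cube.

    Per dimension, a ring of k nodes has one node at distance 0, two at
    each distance 1..ceil(k/2)-1, and (if k is even) a single antipodal
    node at distance k//2. The n-dimensional counts are the n-fold
    convolution of this per-dimension profile.
    """
    ring = [1] + [2] * ((k - 1) // 2)
    if k % 2 == 0:
        ring.append(1)                 # unique antipodal node
    counts = [1]                       # profile for zero dimensions
    for _ in range(n):                 # convolve once per dimension
        new = [0] * (len(counts) + len(ring) - 1)
        for i, a in enumerate(counts):
            for j, b in enumerate(ring):
                new[i + j] += a * b
        counts = new
    return counts
```

For a 4-ary 2-cube (16 nodes) this gives [1, 4, 6, 4, 1]; summing the entries for d ≤ r gives the number of nodes within distance r of the centre.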
An accurate analytical model for wormhole-routed k-ary n-cubes with adaptive routing and uniform traffic is then developed, incorporating the use of virtual channels and the effect of locality in the traffic pattern. New models are constructed for wormhole k-ary n-cubes, with the ability to simulate behaviour under adaptive routing and non-uniform communication workloads, such as hotspot traffic, matrix-transpose and digit-reversal permutation patterns. The models are equally applicable to unidirectional and bidirectional k-ary n-cubes and are significantly more realistic than any in use up to now. With this level of accuracy, the effect of each important network parameter on the overall network performance can be investigated in a more comprehensive manner than before.
Finally, k-ary n-cubes of different dimensionality are compared using the new models. The comparison takes account of various traffic patterns and implementation costs, using both pin-out and bisection bandwidth as metrics. Networks with both normal and pipelined channels are considered. While previous similar studies have only taken account of network channel costs, our model incorporates router costs as well, thus generating more realistic results. In fact, the results of this work differ markedly from those yielded by earlier studies, which assumed deterministic routing and uniform traffic, illustrating the importance of using accurate models to conduct such analyses.
Efficient QoS support for high-performance interconnects
Interconnection networks are a key component in a large number of systems. Quality-of-service (QoS) mechanisms are responsible for ensuring that a certain level of performance is achieved in the network.
Traditional solutions for providing QoS in high-performance interconnection networks are usually based on complex architectures. The main objective of this thesis is to investigate whether we can provide efficient QoS mechanisms. Our goal is to achieve full QoS support with minimal resources. To that end, redundancies in proposed QoS mechanisms are identified and eliminated without affecting performance.
This thesis consists of three parts. In the first, we begin with traditional QoS proposals at the traffic-class level. In the second part, we propose how to adapt deadline-based QoS mechanisms to high-performance interconnection networks. Finally, we also investigate the interaction of QoS mechanisms with congestion control.
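Deadline-based QoS of the kind adapted in the second part is commonly realized with earliest-deadline-first (EDF) scheduling; a minimal sketch (my own illustration of the general technique, not the thesis's actual mechanism):

```python
import heapq

class EDFScheduler:
    """Earliest-deadline-first packet scheduler: the queued packet whose
    deadline is closest is always transmitted next."""

    def __init__(self):
        self._heap = []
        self._seq = 0               # FIFO tie-breaker for equal deadlines

    def enqueue(self, packet, deadline):
        heapq.heappush(self._heap, (deadline, self._seq, packet))
        self._seq += 1

    def dequeue(self):
        """Return the packet with the earliest deadline, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

For instance, after enqueueing packets with deadlines 30, 10 and 20, dequeue returns them in deadline order: 10, then 20, then 30.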
Distributed coordination in unstructured intelligent agent societies
Current research on multi-agent coordination and distributed problem
solving is still not robust or scalable enough to build large real-world
collaborative agent societies because it relies on either centralised components
with full knowledge of the domain or pre-defined social structures.
Our approach overcomes these limitations by using
a generic coordination framework for distributed problem solving in
totally unstructured environments that enables each agent to decompose
problems into sub-problems, identify those which it can solve
and search for other agents to delegate the sub-problems for which it
does not have the necessary knowledge or resources. Regarding the
problem decomposition process, we have developed two distributed
versions of the Graphplan planning algorithm. To allow an agent
to discover other agents with the necessary skills for dealing with
unsolved sub-problems, we have created two peer-to-peer search algorithms
that build and maintain a semantic overlay network that
connects agents relying on dependency relationships, which improves
future searches. Our approach was evaluated using two different scenarios,
which allowed us to conclude that it is efficient, scalable and
robust, allowing the coordinated distributed solving of complex problems
in unstructured environments without the unacceptable assumptions
of alternative approaches developed thus far.
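The decompose-solve-delegate loop described in the abstract can be sketched as follows (an illustrative model with assumed names; the thesis's actual framework uses distributed Graphplan and a semantic overlay network for peer search):

```python
def coordinate(agent, problem, decompose, peers):
    """Decompose a problem, solve the sub-problems this agent can
    handle, and delegate the rest to capable peers found by search.

    agent     : object with .can_solve(sub) and .solve(sub)
    problem   : the task to solve
    decompose : function mapping a problem to a list of sub-problems
    peers     : iterable of other agents (a stand-in for the
                peer-to-peer semantic-overlay search of the thesis)
    """
    results = {}
    for sub in decompose(problem):
        if agent.can_solve(sub):
            results[sub] = agent.solve(sub)
        else:
            # Search for a peer with the missing skill; an unsolvable
            # sub-problem is recorded as None.
            delegate = next((p for p in peers if p.can_solve(sub)), None)
            results[sub] = delegate.solve(sub) if delegate else None
    return results
```

Each agent thus only needs local knowledge of its own skills plus a way to discover peers, which is what allows the scheme to work without centralised components or pre-defined social structures.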