Spider: An overview of an object-oriented distributed computing system
The Spider Project is an object-oriented distributed system that provides a testbed for researchers in the Department of Computer Science at CSUSB to conduct research on distributed systems.
The Grand Challenge of Managing the Petascale Facility.
This report is the result of a study of networks and how they may need to evolve to support petascale leadership computing and science. As Dr. Ray Orbach, director of the Department of Energy's Office of Science, says in the spring 2006 issue of SciDAC Review, 'One remarkable example of growth in unexpected directions has been in high-end computation'. In the same article Dr. Michael Strayer states, 'Moore's law suggests that before the end of the next cycle of SciDAC, we shall see petaflop computers'. Given the Office of Science's strong leadership and support for petascale computing and facilities, we should expect to see petaflop computers in operation in support of science before the end of the decade, and DOE/SC Advanced Scientific Computing Research programs are focused on making this a reality. This study took its lead from this strong focus on petascale computing and the networks required to support such facilities, but it grew to include almost all aspects of the DOE/SC petascale computational and experimental science facilities, all of which will face daunting challenges in managing and analyzing the voluminous amounts of data expected. In addition, trends indicate the increased coupling of unique experimental facilities with computational facilities, along with the integration of multidisciplinary datasets and high-end computing with data-intensive computing; and we can expect these trends to continue at the petascale level and beyond. Coupled with recent technology trends, they clearly indicate the need for including capability petascale storage, networks, and experiments, as well as collaboration tools and programming environments, as integral components of the Office of Science's petascale capability metafacility. The objective of this report is to recommend a new cross-cutting program to support the management of petascale science and infrastructure. 
The appendices of the report document current and projected DOE computation facilities, science trends, and technology trends, whose combined impact can affect the manageability and stewardship of DOE's petascale facilities. This report is not meant to be all-inclusive. Rather, the facilities, science projects, and research topics presented are to be considered examples to clarify a point.
Parallel Programming with Migratable Objects: Charm++ in Practice
The advent of petascale computing has introduced new challenges (e.g., heterogeneity, system failure) for programming scalable parallel applications. The increased complexity and dynamism of today's science and engineering applications have further exacerbated the situation. Addressing these challenges requires more emphasis on concepts that were previously of secondary importance, including migratability, adaptivity, and runtime system introspection. In this paper, we leverage our experience with these concepts to demonstrate their applicability and efficacy for real-world applications. Using the CHARM++ parallel programming framework, we present details on how these concepts can lead to the development of applications that scale irrespective of the rough landscape of supercomputing technology. The empirical evaluation presented in this paper spans many mini-applications and real applications executed on modern supercomputers, including Blue Gene/Q, Cray XE6, and Stampede.
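The migratable-objects model above can be illustrated with a toy sketch. This is not the Charm++ API; it is a minimal, self-contained illustration of the underlying idea: over-decompose the work into many small objects, let the runtime observe per-object load, and migrate objects greedily from loaded to underloaded processors. All names and the heaviest-first heuristic are our own assumptions.

```python
# Toy sketch (NOT the Charm++ API): migratable work objects plus a greedy
# rebalancer in the spirit of runtime introspection and object migration.

def balance(assignments, num_procs):
    """assignments: dict object_id -> (current_proc, load).
    Returns a new mapping after greedy migration."""
    new_assignments = {}
    loads = [0.0] * num_procs
    # Place heaviest objects first, a common greedy load-balancing heuristic.
    for obj, (_, load) in sorted(assignments.items(), key=lambda kv: -kv[1][1]):
        target = loads.index(min(loads))   # migrate to least-loaded processor
        new_assignments[obj] = (target, load)
        loads[target] += load
    return new_assignments
```

Starting from four objects all on one processor, the rebalancer spreads them so that no processor carries more than half the total load, which is the effect a migratable-objects runtime aims for at much larger scales.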
Improving Energy and Area Scalability of the Cache Hierarchy in CMPs
As the core counts increase in each chip multiprocessor generation, CMPs should improve scalability in performance, area, and energy consumption to meet the demands of
larger core counts. Directory-based protocols constitute the most scalable alternative.
A conventional directory, however, suffers from an inefficient use of storage and energy.
First, the large, non-scalable, sharer vectors consume unnecessary area and leakage, especially considering that most of the blocks tracked in a directory are cached by a single
core. Second, although increasing directory size and associativity could boost system
performance by reducing the coverage misses, it would come at the expense of area and
energy consumption.
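The non-scalability of full-map sharer vectors can be made concrete with a small, illustrative calculation (the tag and state widths below are example values, not figures from the thesis): the sharer vector adds one presence bit per core, so it quickly dominates the directory entry as core counts grow, even though most entries track a block cached by a single core.

```python
# Illustrative per-entry storage of a full-map directory. The sharer bit
# vector (one presence bit per core) grows linearly with the core count.
# tag_bits and state_bits are example values, not thesis parameters.

def fullmap_entry_bits(num_cores, tag_bits=28, state_bits=2):
    return tag_bits + state_bits + num_cores  # sharer vector dominates at scale
```

With these example widths, an entry grows from 46 bits at 16 cores to 286 bits at 256 cores, almost all of it sharer-vector storage that is wasted on private blocks.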
This thesis focuses on and exploits the important behavioral differences between private and shared blocks from the directory's point of view. These differences call for separate management of the two types of blocks at the directory. First, we propose the PS-Directory, a two-level directory cache that keeps the small number of frequently accessed shared entries in a small and fast first-level cache, namely the Shared Directory Cache, and uses a larger and slower second-level Private Directory Cache to track the large number of private blocks. Experimental results show that, compared to a conventional directory, the PS-Directory improves performance while also reducing silicon area and energy consumption.
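A minimal sketch of the PS-Directory lookup path, under our own simplifying assumptions (dictionaries stand in for the two hardware structures, and "shared" simply means more than one sharer):

```python
class PSDirectory:
    """Illustrative two-level lookup in the spirit of the PS-Directory:
    a small first-level structure for the few, frequently accessed shared
    entries and a larger second-level structure for the many private ones.
    Structure sizes and policies are placeholders, not thesis parameters."""

    def __init__(self):
        self.shared = {}   # small, fast Shared Directory Cache
        self.private = {}  # larger, slower Private Directory Cache

    def insert(self, block, sharers):
        # Blocks cached by more than one core go to the shared structure.
        if len(sharers) > 1:
            self.shared[block] = set(sharers)
        else:
            self.private[block] = set(sharers)

    def lookup(self, block):
        # The small first level is probed first; shared-data hits land here.
        if block in self.shared:
            return "shared", self.shared[block]
        if block in self.private:
            return "private", self.private[block]
        return None
```

The point of the split is that the frequently accessed shared entries hit in the small, fast, low-energy first level, while the bulk of private entries sit in the larger, slower second level.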
This thesis also shows that the shared/private ratio of entries in the directory varies across applications and across different execution phases within an application, which motivates the Dynamic Way Partitioning (DWP) Directory. The DWP-Directory reduces the number of ways with storage for shared blocks and allows this storage to be powered off or on at run time according to the dynamic requirements of the applications, following a repartitioning algorithm. Results show performance similar to a traditional high-associativity directory and area requirements similar to recent state-of-the-art schemes. In addition, the DWP-Directory achieves notable static and dynamic power savings.
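A toy repartitioning rule conveys the DWP idea; this is our own proportional heuristic, not the thesis's algorithm: the number of powered-on shared-capable ways tracks the observed fraction of shared references, and the rest can be powered off.

```python
# Toy repartitioning rule (ours, not the thesis algorithm): size the
# shared-way partition in proportion to the observed share of references
# to shared blocks; unneeded shared-capable ways can be powered off.

def repartition(shared_refs, private_refs, total_ways):
    if shared_refs + private_refs == 0:
        return 1  # keep at least one shared-capable way powered on
    frac = shared_refs / (shared_refs + private_refs)
    return max(1, round(frac * total_ways))
```

A phase dominated by private references thus powers down almost all shared-capable ways, recovering their static power, while a sharing-heavy phase re-enables them.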
This dissertation also deals with the power-scalability issues found in processor caches. A significant fraction of the total power budget is consumed by on-chip caches, which are usually deployed with a high degree of associativity (even L1 caches are now implemented with eight ways) to enhance system performance. On a cache access, each way in the corresponding set is accessed in parallel, which is costly in terms of energy. This thesis presents the PS-Cache architecture, an energy-efficient cache design that reduces the number of accessed ways without hurting performance. The PS-Cache takes advantage of the private-shared knowledge of the referenced block to reduce energy by accessing only those ways that can hold the kind of block being looked up. Results show significant dynamic power savings.
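The PS-Cache filtering step can be sketched in a few lines, under our own simplified model in which each way is annotated with a private ('P') or shared ('S') status bit:

```python
# Sketch of PS-Cache way filtering: on a lookup whose block is known to be
# private or shared, only the ways holding that kind of block are probed,
# which is where the dynamic-energy saving comes from.

def ways_probed(set_ways, lookup_kind):
    """set_ways: list of (tag, kind) per way, kind in {'P', 'S'}.
    Returns the tags of the ways actually accessed."""
    return [tag for tag, kind in set_ways if kind == lookup_kind]
```

In an 8-way set evenly split between private and shared blocks, only four ways would be read per access instead of eight, roughly halving the dynamic energy of the parallel way read in this simplified model.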
Finally, we propose an energy-efficient architectural design that can be applied to any kind of set-associative cache memory, not only processor caches. The proposed approach, called the Tag Filter (TF) Architecture, filters the ways accessed in the target cache set so that only a few ways are searched in the tag and data arrays. This allows the approach to reduce the dynamic energy consumption of caches without hurting their access time. For this purpose, the proposed architecture holds the X least significant bits of each tag in a small auxiliary X-bit-wide array. These bits are used to filter out the ways whose least significant tag bits do not match those of the requested address. Experimental results show that this filtering mechanism achieves energy consumption in set-associative caches similar to that of direct-mapped ones.
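The TF mechanism described above reduces, functionally, to a two-step comparison; the sketch below follows that description, with X = 4 chosen as an example width:

```python
# Tag Filter sketch: the X least significant bits of each way's tag are kept
# in a small auxiliary array and compared first; only ways whose low bits
# match drive the full tag and data arrays. X = 4 is an example value.

def tf_lookup(set_tags, tag, x=4):
    """set_tags: full tag per way. Returns (hit_way or None, ways_probed),
    where ways_probed counts the ways that pass the low-bit filter and
    therefore consume full lookup energy."""
    mask = (1 << x) - 1
    candidates = [w for w, t in enumerate(set_tags)
                  if (t & mask) == (tag & mask)]
    hit_way = next((w for w in candidates if set_tags[w] == tag), None)
    return hit_way, len(candidates)
```

When the low X bits of the resident tags are well distributed, typically only one way survives the filter, so the access behaves energetically like a direct-mapped cache while retaining set-associative hit rates.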
Experimental results show that the proposals presented in this thesis offer a good trade-off among these three major design axes.

Valls Mompó, JJ. (2017). Improving Energy and Area Scalability of the Cache Hierarchy in CMPs [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/79551
Dynamic deployment of specialized ESB instances in the cloud
In recent years, interaction among heterogeneous applications within one enterprise or among multiple enterprises has increased considerably. This has raised several challenges related to enabling interaction among enterprises in an interoperable manner. To address this problem, the Enterprise Service Bus (ESB) has been proposed as an integration middleware capable of wiring all the components of an enterprise system together in a transparent and interoperable manner. Enterprise Service Buses are nowadays used to transparently establish and handle interactions among the components within an application or with consumed external services. Several ESB solutions are available in the market as a result of continuously developed message-based approaches aimed at enabling interoperability among enterprise applications. However, the configuration of an ESB is typically custom and complex. Moreover, there is little support and guidance for developers on how to efficiently customize and configure the ESB with respect to their application requirements. Consequently, this also notably increases maintenance and operational costs for enterprises. Our target is mainly to simplify the configuration tasks while provisioning customized ESB instances that satisfy the application's functional and non-functional requirements. Similar works focus on optimizing existing ESB configurations through runtime reconfiguration rather than offering customized, light-weight middleware components.
This Master's thesis aims at providing the means to build customized and specialized ESB instances following a reusable and light-weight approach. We propose a framework capable of guiding the application developer in the tasks of configuring, provisioning, and executing specialized ESB instances in an automatic, dynamic, and reusable manner. Specialized ESB instances are created automatically and provided to application developers, who can build an ESB instance with a specific configuration that may change over time. The proposed framework also incorporates the necessary support for administering, provisioning, and maintaining a clustered infrastructure hosting the specialized ESB instances in an isolated manner.
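The specialization idea can be sketched abstractly; the component names below are hypothetical and the function is our own illustration, not the thesis framework's API: given an application's requirements, provision only the component set the instance actually needs, instead of deploying a full, generic bus.

```python
# Hypothetical sketch (component names are ours): assemble a light-weight,
# specialized ESB instance from only the components that the application's
# requirements call for.

AVAILABLE = {"http-binding", "jms-binding", "xslt-transform",
             "routing", "security"}

def specialize(required):
    """Validate the requested components and return the minimal,
    deterministic component set to provision for this instance."""
    missing = set(required) - AVAILABLE
    if missing:
        raise ValueError(f"unsupported components: {sorted(missing)}")
    return sorted(required)
```

A request for routing plus an HTTP binding yields a two-component instance, while an unknown component is rejected up front, mirroring the guided-configuration role the framework plays for the developer.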
Low Latency Rendering with Dataflow Architectures
The research presented in this thesis concerns latency in VR and synthetic environments. Latency is the end-to-end delay experienced by the user of an interactive computer system, between their physical actions and the perceived response to those actions. Latency is a product of the various processing, transport, and buffering delays present in any current computer system. For many computer-mediated applications latency can be distracting, but it is not critical to the utility of the application. Synthetic environments, on the other hand, attempt to facilitate direct interaction with a digitised world. Direct interaction here implies the formation of a sensorimotor loop between the user and the digitised world - that is, the user makes predictions about how their actions affect the world and sees these predictions realised. By facilitating the formation of this loop, the synthetic environment allows users to directly sense the digitised world, rather than the interface, and induces perceptions such as that of the digital world existing as a distinct physical place. This has many applications for knowledge transfer and efficient interaction through the use of enhanced communication cues. The complication is that the formation of the sensorimotor loop that underpins this is highly dependent on the fidelity of the virtual stimuli, including latency. The main research questions we ask are how the characteristics of dataflow computing can be leveraged to improve the temporal fidelity of the visual stimuli, and what implications this has on other aspects of fidelity. Secondarily, we ask what effects latency itself has on user interaction. We test the effects of latency on physical interaction at levels previously hypothesised but unexplored. We also test for a previously unconsidered effect of latency on higher-level cognitive functions.
To do this, we create prototype image generators for interactive systems and virtual reality using dataflow computing platforms. We integrate these into real interactive systems to gain practical experience of the real, perceptible benefits of alternative rendering approaches, but also of the implications when they are subject to the constraints of real systems. We quantify the differences between our systems and traditional systems using latency and objective image-fidelity measures. We use our novel systems to perform user studies into the effects of latency. Our high-performance apparatuses allow experimentation at latencies lower than previously tested in comparable studies. The low-latency apparatuses are designed to minimise what is currently the largest delay in traditional rendering pipelines, and we find that the approach is successful in this respect. Our 3D low-latency apparatus achieves lower latencies and higher fidelities than traditional systems, although the conditions under which it can do this are highly constrained. We do not foresee dataflow computing shouldering the bulk of the rendering workload in the future, but rather facilitating the augmentation of the traditional pipeline with a very high speed local loop, perhaps an image-distortion stage or otherwise. Our latency experiments revealed that many predictions about the effects of low latency should be re-evaluated and that experimenting in this range requires great care.
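End-to-end latency, as characterised above, accumulates across every stage of the interaction loop; a simple budget illustrates why display buffering is typically the largest contributor in traditional pipelines. The stage names and millisecond values below are made-up example figures, not measurements from this work:

```python
# Illustrative end-to-end latency budget (example figures, not measured data):
# total latency is the sum of the processing, transport, and buffering delays
# around the sensorimotor loop. Display buffering often dominates.

stages_ms = {
    "input sampling": 2.0,
    "simulation": 5.0,
    "render": 8.0,
    "display buffering": 16.7,  # e.g. one full frame of buffering at 60 Hz
    "pixel scan-out": 8.0,
}
total_ms = sum(stages_ms.values())
```

In this example the loop totals just under 40 ms, and removing the single frame of display buffering, as a very high speed local loop aims to do, would cut the total by over 40%.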