37 research outputs found

    System-Level Thermal-Aware Design of 3D Multiprocessors with Inter-Tier Liquid Cooling

    Rising chip temperatures and aggravated thermal reliability issues have characterized the emergence of 3D multiprocessor systems-on-chip (3D-MPSoCs), necessitating the development of advanced cooling technologies. Microchannel-based inter-tier liquid cooling of ICs has been envisaged as the most promising solution to this problem. With the advent of these new cooling technologies, system-level thermal-aware design of electronic systems becomes imperative in order to preserve the reliable functioning of these ICs and to manage the rising energy budgets of high-performance computing systems effectively. This paper reviews recent advances in system-level thermal modeling and management techniques for 3D multiprocessors with advanced liquid cooling. These concepts are combined to present a vision of a green data center of the future that reduces CO2 emissions by reusing the heat it generates.
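    The system-level thermal models such surveys cover are typically lumped RC networks: each tier of the 3D stack is a thermal node, and inter-tier liquid cooling appears as a conductance from each node to the coolant. A minimal sketch of that idea (all values, names and the two-tier configuration are illustrative assumptions, not taken from the paper):

```python
# Minimal lumped-RC thermal model sketch (illustrative values, not from the paper).
# Each tier of a 3D stack is a thermal node; inter-tier liquid cooling is modeled
# as a thermal resistance from every tier to the coolant temperature.

def simulate_tiers(power_w, r_tier, r_coolant, t_coolant, dt=0.01, steps=1000, c_tier=0.5):
    """Explicit-Euler transient simulation of an n-tier stack.

    power_w   : per-tier power dissipation [W]
    r_tier    : vertical thermal resistance between adjacent tiers [K/W]
    r_coolant : thermal resistance from each tier to the coolant [K/W]
    c_tier    : thermal capacitance of a tier [J/K]
    """
    n = len(power_w)
    temps = [t_coolant] * n
    for _ in range(steps):
        new = temps[:]
        for i in range(n):
            # Net heat flow into tier i: own power, coolant path, neighbor tiers.
            q = power_w[i] + (t_coolant - temps[i]) / r_coolant
            if i > 0:
                q += (temps[i - 1] - temps[i]) / r_tier
            if i < n - 1:
                q += (temps[i + 1] - temps[i]) / r_tier
            new[i] = temps[i] + dt * q / c_tier
        temps = new
    return temps

# Two-tier stack: a 20 W logic tier under a 5 W memory tier, 25 degC coolant.
print(simulate_tiers([20.0, 5.0], r_tier=0.2, r_coolant=1.0, t_coolant=25.0))
```

    Thermal-aware management policies are then evaluated against such a model: a runtime manager observes the node temperatures and throttles power or coolant flow before a reliability threshold is crossed.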

    Performance and power optimizations in chip multiprocessors for throughput-aware computation

    The so-called "power (or power density) wall" has caused core frequency (and single-thread performance) scaling to stall, giving rise to the era of multi-core/multi-thread processors. For example, the IBM POWER4 processor, released in 2001, incorporated two single-thread cores into the same chip. In 2010, IBM released the POWER7 processor with eight 4-thread cores on the same chip, for a total capacity of 32 execution contexts. The ever-increasing number of cores and threads gives rise to new opportunities and challenges for software and hardware architects. At the software level, applications can benefit from the abundant number of execution contexts to boost throughput, but this challenges programmers to create highly parallel applications and operating systems capable of scheduling them correctly. At the hardware level, the increasing core and thread count puts pressure on the memory interface, because memory bandwidth grows at a slower pace, a phenomenon known as the "bandwidth (or memory) wall". In addition to memory bandwidth issues, chip power consumption rises due to manufacturers' difficulty in lowering operating voltages sufficiently with every processor generation. This thesis presents innovations to improve bandwidth and power consumption in chip multiprocessors (CMPs) for throughput-aware computation: a bandwidth-optimized last-level cache (LLC), a bandwidth-optimized vector register file, and a power/performance-aware thread placement heuristic. In contrast to state-of-the-art LLC designs, our organization avoids data replication and, hence, does not require keeping data coherent. Instead, the address space is statically distributed over the entire LLC (in a fine-grained interleaving fashion). The absence of data replication increases the cache's effective capacity, which results in better hit rates and higher bandwidth compared to a coherent LLC. We use double buffering to hide the extra access latency due to the lack of data replication.
The proposed vector register file is composed of thousands of registers and organized as an aggregation of banks. We leverage this organization to attach small special-function "local computation elements" (LCEs) to each bank. This approach, referred to as the "processor-in-regfile" (PIR) strategy, overcomes the limited number of register file ports. Because each LCE is a SIMD computation element and all of them can proceed concurrently, the PIR strategy constitutes a highly parallel super-wide-SIMD device (ideal for throughput-aware computation). Finally, we present a heuristic to reduce chip power consumption by dynamically placing software (application) threads across hardware (physical) threads. The heuristic gathers chip-level power and performance information at runtime to infer characteristics of the applications being executed. For example, if an application's threads share data, the heuristic may decide to place them on fewer cores to favor inter-thread data sharing and communication. In that case, the number of active cores decreases, which is a good opportunity to switch off the unused cores to save power. It is increasingly hard to find bulletproof (micro-)architectural solutions for the bandwidth and power scalability limitations in CMPs. Consequently, we think that architects should attack those problems from different flanks simultaneously, with complementary innovations. This thesis contributes a battery of solutions to alleviate those problems in the context of throughput-aware computation: 1) a bandwidth-optimized LLC; 2) a bandwidth-optimized register file organization; and 3) a simple technique to improve power-performance efficiency.
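    The fine-grained interleaving that the coherence-free LLC relies on can be sketched in a few lines. The bank count and line size below are made-up parameters for illustration; the point is that every cache line has exactly one home bank, so no replicas exist and no coherence traffic is needed between LLC slices:

```python
# Toy sketch of a fine-grained interleaved last-level cache (LLC):
# the physical address space is statically partitioned across banks,
# so each cache line has exactly one home bank and no replicas.

LINE_SIZE = 128   # bytes per cache line (illustrative)
NUM_BANKS = 32    # LLC banks/slices (illustrative)

def home_bank(addr):
    """Home LLC bank of a physical address: consecutive cache lines
    map to consecutive banks (fine-grained interleaving)."""
    return (addr // LINE_SIZE) % NUM_BANKS

# Consecutive lines of a streaming access spread over distinct banks,
# which is what delivers the aggregate bandwidth of the organization.
banks = [home_bank(a) for a in range(0, 8 * LINE_SIZE, LINE_SIZE)]
print(banks)   # -> [0, 1, 2, 3, 4, 5, 6, 7]
```

    The cost of this scheme is that most lines live in a remote bank; the double buffering mentioned above exists precisely to overlap those longer remote accesses with computation.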

    Winter 2010 Vol. 11 No. 2


    Winter 2010


    ACUTA Journal of Telecommunications in Higher Education

    In This Issue: The COSTS Project: Benchmarks for Understanding IT Investments; How to Save Money and Increase Revenues; Cornell Welcomes Legal Music; Increasing Revenue Means Decreasing Expenses; Managing High Expectations for Wireless Service on Campus; Reducing the Cost of Distance Learning; Making the Outsourcing Decision; Institutional Excellence Award: Sinclair Community College; Bill D. Morris Award; ACUTA Ruth A. Michalecki Leadership Award; Interview; President's Message; From the Executive Director; Here's My Advice.

    An Integrated Bargaining Solution Analysis For Vertical Cooperative Sales Promotion Campaigns Based On The Win-Win-Win Papakonstantinidis Model

    The authors' intention was to examine the possibility of investigating the win-win-win Papakonstantinidis model in order to develop an integrated bargaining solution analysis for vertical cooperative sales promotion campaigns. Based on previous theoretical extensions (Spais and Papakonstantinidis, 2011; Spais, Papakonstantinidis and Papakonstantinidis, 2009), this study presents an integrated bargaining solution analysis for cases of optimal allocation of a promotion budget in a cooperative sales promotion campaign in vertical marketing channels. This integrated bargaining solution analysis includes: a) three (3) adjusted utility functions, considering the parameters of the sales response budgeting method, the break-even sales analysis and the marketing channel members' trade promotion goals; b) the referee solution, the optimal solution for the three players and the constraints; c) the definition of the third win in terms of a continuous sensitization process and perfect information; and d) the presentation of the potential outputs from a bargaining process regarding the sharing of the cooperative sales promotion cost among the A, B and C parties/players for different sales promotion offerings. Encouragingly, the review of the modern literature and the four (4) critical case studies of cooperative marketing programs confirmed the need for a win-win-win approach in cooperative sales promotion planning in vertical marketing channels.

    Public ICT investment in reaction to the economic crisis : a case study on measuring IT-related intangibles in the public sector

    In this paper, we (1) analyse the German public IT-spending programme 2009-11 adopted after the crisis in terms of its tangible vs. intangible asset creation, (2) consider this relatively well-described programme as a use case for categorising IT-related intangibles in government beyond software (including, e.g., IT training and innovation in e-services), and (3) investigate how to form insightful aggregates of intangible IT-related investment from project-level data and, in comparison, from the regular public budget in Germany. Based on project descriptions, we find that half of the spending was on IT security-related projects. According to our estimations based on quantitative information, qualitative information and approximations, about half of the total spending was on intangibles, of which again about half went into software and a quarter into consulting. As a new output-based category for some assets created in the programme, we propose the category “concepts”.

    Palvelinkeskusten kasvun purku - kokeellinen tutkielma (Degrowth of data centers: an experimental study)

    Due to the massive increase in demand for cloud services and the popularity of mobile devices, the number of data centers and the amount of energy they consume are constantly growing. IT hardware does become more energy-efficient according to Koomey's law, but the power proportionality of, e.g., servers and network switches is still quite poor. In this thesis, the trends in data center energy consumption and efficiency are closely examined, and some alternative methods for reversing the trend of data center power consumption are considered. In the experimental phase, a pilot data center is built and a basic web service architecture is designed on top of it in order to study how optimization and allocation of resources affect the quality of experience of the service. The results from the measurements indicate that, for this particular system, a surprisingly small number of application processing server instances was required for near-optimal quality of experience.

    An adaptive admission control and load balancing algorithm for a QoS-aware Web system

    The main objective of this thesis is the design of an adaptive algorithm for admission control and content-aware load balancing for Web traffic. To set the context of this work, several reviews are included to introduce the reader to the background concepts of Web load balancing, admission control and the Internet traffic characteristics that may affect the good performance of a Web site. The admission control and load balancing algorithm described in this thesis manages the distribution of traffic to a Web cluster based on QoS requirements. The goal of the proposed scheduling algorithm is to avoid situations in which the system provides lower performance than desired due to server congestion. This is achieved through forecasting calculations. Naturally, the increased computational cost of the algorithm results in some overhead. For this reason, we design an adaptive time-slot scheduling that sets the execution times of the algorithm depending on the burstiness of the traffic arriving at the system. The predictive scheduling algorithm proposed therefore includes an adaptive overhead control. Once the scheduling of the algorithm is defined, we design the admission control module based on throughput predictions. The results obtained by several throughput predictors are compared and one of them is selected for inclusion in our algorithm. The utilisation level that the Web servers will have in the near future is also forecasted and reserved for each service depending on the Service Level Agreement (SLA). Our load balancing strategy is based on a classical policy; hence, a comparison of several classical load balancing policies is also included to determine which of them best fits our algorithm. A simulation model has been designed to obtain the results presented in this thesis.
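    The combination of throughput prediction, SLA-based reservation and an adaptive scheduling interval can be sketched as follows. This is a minimal stand-in, not the thesis code: the class and parameter names are invented, and a plain moving average stands in for whichever predictor the thesis selects.

```python
# Sketch of QoS-aware admission control with throughput prediction
# (structure and names are illustrative assumptions, not the thesis code).
from collections import deque

class AdaptiveAdmission:
    def __init__(self, capacity_rps, sla_share=0.8, window=10):
        self.capacity = capacity_rps               # cluster capacity, requests/s
        self.reserved = sla_share * capacity_rps   # share reserved by the SLA
        self.history = deque(maxlen=window)        # recent arrival rates
        self.interval = 1.0                        # scheduling time slot [s]

    def predict(self):
        """Moving-average throughput predictor (a stand-in for the
        predictor the thesis compares and selects)."""
        return sum(self.history) / len(self.history) if self.history else 0.0

    def observe(self, arrival_rps):
        self.history.append(arrival_rps)
        # Adaptive time slot: shrink the interval under bursty arrivals so
        # control reacts faster; grow it in calm periods to cut overhead.
        if len(self.history) >= 2:
            burst = abs(self.history[-1] - self.history[-2]) / max(self.predict(), 1.0)
            self.interval = max(0.1, min(5.0, 1.0 / (1.0 + burst)))

    def admit(self, arrival_rps):
        """Admit offered load only up to the SLA-reserved capacity
        predicted to be free in the next slot; the rest is rejected."""
        headroom = max(0.0, self.reserved - self.predict())
        return min(arrival_rps, headroom)

ac = AdaptiveAdmission(capacity_rps=1000)
for rate in [100, 120, 900, 950]:   # a bursty arrival trace
    ac.observe(rate)
admitted = ac.admit(600)
print(round(admitted, 1), round(ac.interval, 2))
```

    In the real system a content-aware load balancer then distributes the admitted requests across the cluster; here only the admission and interval-adaptation logic is shown.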

    Acta Technica Jaurinensis 2016
