19 research outputs found

    Hardware design of task superscalar architecture

    Exploiting concurrency to achieve greater performance is a difficult and important challenge for current high-performance systems. Although the theory is simple, the complexity of traditional parallel programming models in most cases prevents the programmer from harvesting performance. Several partitioning granularities have been proposed to better exploit concurrency at task granularity. In this sense, different dynamic software task management systems, such as task-based dataflow programming models, apply dataflow principles to improve task-level parallelism and overcome the limitations of static task management systems. These models implicitly schedule computation and data and use tasks instead of instructions as the basic work unit, thereby relieving the programmer of explicitly managing parallelism. While these programming models share conceptual similarities with the well-known out-of-order superscalar pipelines (e.g., dynamic data dependency analysis and dataflow scheduling), they rely on software-based dependency analysis, which is inherently slow and limits their scalability when tasks are fine-grained and numerous. This problem grows with the number of available cores: to keep all the cores busy and accelerate the overall application, it becomes necessary to partition the application into more and smaller tasks. Task scheduling in software (i.e., creating tasks and managing their execution) introduces overheads, and so becomes increasingly inefficient as the number of cores grows. In contrast, a hardware scheduling solution can achieve greater speed-ups because a hardware task scheduler requires fewer cycles than the software version to dispatch a task. The Task Superscalar (TSS) is a hybrid dataflow/von-Neumann architecture that exploits the task-level parallelism of the program. It combines the effectiveness of out-of-order processors with the task abstraction, and thereby provides a unified management layer for CMPs that effectively employs processors as functional units. Prior to this thesis, the Task Superscalar had been implemented in software, with limited parallelism and high memory consumption due to the nature of a software implementation. In this thesis, a Hardware Task Superscalar architecture is designed to be integrated into a future high-performance computer with the ability to exploit fine-grained task parallelism. The main contributions of this thesis are: (1) a design of the operational flow of the Task Superscalar architecture adapted and improved for hardware implementation; (2) an HDL prototype for latency exploration; (3) a full cycle-accurate simulator of the Hardware Task Superscalar, based on the previously obtained latencies; (4) a full design space exploration of the Task Superscalar component configuration (number and size of modules, memory sizes, etc.) for systems with different numbers of processing elements (cores); (5) a comparison with a software implementation of a real task-based programming model runtime using real benchmarks; and (6) a hardware resource usage exploration of the selected configurations.
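The dynamic dependency analysis and dataflow scheduling mentioned in the abstract above can be illustrated with a minimal software sketch. It is not the thesis's hardware design: the function names (`build_dependencies`, `schedule_waves`) and the task format are hypothetical, and the sketch only assumes that each task declares the data it reads and writes.

```python
from collections import defaultdict


def build_dependencies(tasks):
    """Derive inter-task dependencies from declared inputs and outputs.

    `tasks` is a list of (name, inputs, outputs) tuples in program order.
    Returns a dict mapping each task name to the set of earlier tasks
    that must finish before it can start.
    """
    last_writer = {}              # datum -> most recent task that writes it
    readers = defaultdict(list)   # datum -> tasks that read it since that write
    deps = {}
    for name, inputs, outputs in tasks:
        wait = set()
        for datum in inputs:
            if datum in last_writer:           # read-after-write
                wait.add(last_writer[datum])
        for datum in outputs:
            if datum in last_writer:           # write-after-write
                wait.add(last_writer[datum])
            wait.update(readers[datum])        # write-after-read
        deps[name] = wait
        for datum in inputs:
            readers[datum].append(name)
        for datum in outputs:
            last_writer[datum] = name
            readers[datum] = []
    return deps


def schedule_waves(deps):
    """Group tasks into waves; tasks in one wave are mutually independent."""
    remaining = {task: set(wait) for task, wait in deps.items()}
    done, waves = set(), []
    while remaining:
        ready = sorted(t for t, wait in remaining.items() if wait <= done)
        if not ready:
            raise ValueError("cyclic dependencies")
        waves.append(ready)
        done.update(ready)
        for task in ready:
            del remaining[task]
    return waves


# Hypothetical five-task program: t3 needs t1 and t2, t4 needs t1,
# and t5 overwrites "a" after every reader of "a" has finished.
example_tasks = [
    ("t1", [], ["a"]),
    ("t2", [], ["b"]),
    ("t3", ["a", "b"], ["c"]),
    ("t4", ["a"], ["d"]),
    ("t5", ["c", "d"], ["a"]),
]
example_waves = schedule_waves(build_dependencies(example_tasks))
print(example_waves)
```

Each wave could be dispatched to idle cores at once, so tasks run out of program order whenever their data are ready. A software runtime pays this bookkeeping cost for every task, which is the overhead the abstract argues a hardware scheduler reduces.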

    Woody Feedstock Pretreatments to Enhance Pyrolysis Bio-oil Quality and Produce Transportation Fuel

    Lignocellulosic biomass is a potential renewable source of energy with near-zero net CO2 emissions. Pyrolysis converts biomass to a liquid fuel, increasing its energy density and transportability. Pyrolysis bio-oil shows promising properties as a substitute for conventional fossil fuels. However, unprocessed biomass is low in bulk and energy density, high in moisture, heterogeneous in physical and chemical properties, highly hygroscopic, and difficult to handle. For this reason, biomass needs mechanical, chemical and/or thermal pretreatments to turn it into a more homogeneous feedstock and to minimize post-treatment fuel upgrading. This chapter explains the effects that various pretreatments, such as size reduction, drying, washing and thermal pretreatment, have on the quality and quantity of bio-oil. Washing with water or acid/alkali solutions extracts minerals, which reduces the ash content and shortens the reactor clean-out cycle. Torrefaction is gaining attention as an effective pretreatment for modifying the physical and chemical properties of biomass. It produces a uniform biomass with lower moisture, acidity and oxygen contents and with higher energy density and grindability than raw biomass. Pyrolysis of torrefied biomass produces bio-oil with enhanced compositional and physical properties, such as a higher heating value and increased carbon content (lower O/C ratio).

    Analysis of the Task Superscalar architecture hardware design

    In this paper, we analyze the operational flow of two hardware implementations of the Task Superscalar architecture. The Task Superscalar is an experimental task-based dataflow scheduler that dynamically detects inter-task data dependencies, identifies task-level parallelism, and executes tasks out of order. We present a base implementation of the Task Superscalar architecture, as well as a new design with improved performance. We study how both the base and the improved hardware designs process dependent and non-dependent tasks, and compare the simulation results with those of the runtime implementation. This work is supported by the Ministry of Science and Technology of Spain and the European Union (FEDER funds) under contract TIN2007-60625, by the Generalitat de Catalunya (contract 2009-SGR-980), and by the European FP7 project TERAFLUX id. 249013, http://www.teraflux.eu. We would also like to thank the Xilinx University Program for its hardware and software donations. Postprint (author's final draft).

    Permeability of bulk wood pellets with respect to airflow

    Data on the resistance of wood pellets to air flow are required for the design and control of ventilation and drying of bulk pellets in storage. In this study, pressure drop versus air flow was measured for several sizes of wood pellets, with a diameter of 6 mm and lengths varying from 4 to 34 mm. Air flow rates ranging from 0.014 to 0.80 m s⁻¹ were used in the experiment. The maximum pressure drop measured was 2550 Pa m⁻¹. Three predictive models that relate pressure drop to air flow in bulk granular materials (the Shedd, Hukill-Ives, and Ergun equations) were used to analyze the data. The Ergun equation was found to provide the best fit to the data. Aeration of bulk pellets in storage requires a low airflow. The airflow range used for the low permeability tests was 0.0002 to 0.0220 m s⁻¹, and the corresponding measured pressure drop ranged from 0.18 to 8.30 Pa m⁻¹. Three models were investigated and compared for the low permeability data. Over a wider range of airflows (0.0042 to 0.7148 m s⁻¹), increasing the moisture content slightly decreased the resistance to airflow. Broken pellets and fines are produced when pellets are handled, so the resistance to air flow for wood pellets was also measured in the presence of fine materials. Fines were defined as broken pellets that passed through a 4 mm sieve. The average geometric diameter of the fines was 0.75 mm. The pressure drop for pellets mixed with fines ranged from 2.0 to 191.2 Pa m⁻¹ for 1% fines content and from 7.9 to 1779.0 Pa m⁻¹ for 20% fines content (mass basis). Coefficients of the Hukill and Ives equation for pellets were estimated as a function of percent fines content. (Faculty of Applied Science; Department of Chemical and Biological Engineering; Graduate)
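As a rough illustration of the kind of model compared in the abstract above, the textbook form of the Ergun equation can be sketched as follows. The constants 150 and 1.75 are the standard packed-bed values, not coefficients fitted in the study, and the bed porosity, equivalent particle diameter, and air properties used here are assumptions chosen only for the example.

```python
def ergun_pressure_drop(velocity, particle_diameter, porosity,
                        viscosity=1.81e-5, density=1.2):
    """Pressure drop per unit bed depth (Pa/m) from the textbook Ergun equation.

    velocity          superficial air velocity, m/s
    particle_diameter equivalent particle diameter, m (assumed value)
    porosity          bed void fraction, dimensionless (assumed value)
    viscosity         air dynamic viscosity, Pa s (assumed, about 20 degrees C)
    density           air density, kg/m^3 (assumed, about 20 degrees C)
    """
    void = porosity
    # Viscous (laminar) term, linear in velocity.
    viscous = (150.0 * viscosity * (1.0 - void) ** 2 * velocity
               / (void ** 3 * particle_diameter ** 2))
    # Inertial (turbulent) term, quadratic in velocity.
    inertial = (1.75 * density * (1.0 - void) * velocity ** 2
                / (void ** 3 * particle_diameter))
    return viscous + inertial


# Sweep part of the airflow range reported in the abstract, with a
# hypothetical 6 mm equivalent diameter and a hypothetical porosity of 0.4.
for v in (0.014, 0.1, 0.4, 0.8):
    print(v, round(ergun_pressure_drop(v, 0.006, 0.4), 1))
```

Because the second term is quadratic in velocity, the predicted pressure drop rises much faster than the airflow at the upper end of the range, which is why aeration at low airflow needs far less fan pressure than drying at high airflow.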

    Evolution and stratification of off-gasses in stored wood pellets

    Storage of wood pellets has resulted in several fatal accidents connected with off-gassing and self-heating. The goal of the present study was to quantify the off-gassing characteristics of white wood pellets stored in an experimental silo. Wood pellet properties were characterized with respect to gas adsorption-desorption, and the spatial and temporal concentrations of off-gases and the thermal conditions within the pilot storage were quantified. In the last part, the effectiveness of purging the silo in reducing off-gas concentration was evaluated. Temperature Programmed Desorption was used to assess the adsorption of off-gases by wood pellets in storage. The highest CO₂ adsorption was seen for torrefied wood pellets, while the lowest uptake was shown by steam-exploded pellets. Quantifying the uptake of CO was challenging because of chemical reaction, and therefore strong bonds, between the material and carbon monoxide. Studies on emission and stratification of off-gases showed a higher emission factor than earlier small-scale work with white wood pellets. Some stratification was observed for CO₂ and CH₄ over the first days of storage. For CO, however, the stratification was much clearer and was related to the high uptake of CO by wood pellets over time. Over the entire 63-day storage period, the maximum temperature in the silo was recorded on day 15 at an elevation of 2.5 m (the silo was 1.2 m in diameter and 4.6 m high). During 5.5-hour purging experiments with air at 18-18.5 °C, the measured temperature decreased in the lower parts of the silo, and slightly in the middle parts, after 200 minutes of purging. Multiple purging tests were done to evaluate the effectiveness of a purging system in sweeping the off-gases from the experimental silo. Mixing experiments showed large deviations from plug flow, and thus better mixing, for all superficial velocities used. The concentration model fitted the measured off-gas concentrations best at the bottom and in the middle of the silo, while it overestimated the exponential decay of the off-gases in the head-space of the silo. (Faculty of Applied Science; Department of Chemical and Biological Engineering; Graduate)
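The exponential decay of off-gas concentration during purging, mentioned in the abstract above, can be sketched with a generic well-mixed dilution model. This is an assumption for illustration and not necessarily the concentration model used in the study: the purge flow rate and initial concentration are hypothetical, and the volume is the empty-vessel volume computed from the silo dimensions given in the abstract, ignoring the space taken up by the pellets.

```python
import math


def silo_volume(diameter_m, height_m):
    """Empty-vessel volume of a cylindrical silo, in cubic metres."""
    return math.pi * (diameter_m / 2.0) ** 2 * height_m


def purged_concentration(c0, time_s, flow_m3_s, volume_m3, c_inlet=0.0):
    """Gas concentration after purging a perfectly mixed volume.

    Assumes ideal mixing and no ongoing emission or adsorption, so the
    concentration relaxes exponentially from c0 towards the inlet value
    with time constant volume / flow.
    """
    decay = math.exp(-flow_m3_s * time_s / volume_m3)
    return c_inlet + (c0 - c_inlet) * decay


volume = silo_volume(1.2, 4.6)   # silo dimensions from the abstract
flow = 0.002                     # hypothetical purge air flow, m^3/s
c_start = 1000.0                 # hypothetical initial concentration, ppm
for minutes in (0, 50, 100, 200):
    print(minutes, purged_concentration(c_start, minutes * 60.0, flow, volume))
```

Where the measured head-space concentration falls more slowly than such a curve, as the abstract reports, the assumption of ideal mixing with no continuing emission is a plausible place to look for the discrepancy.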

    Effects of free and encapsulated transglutaminase on the physicochemical, textural, microbial, sensorial, and microstructural properties of white cheese

    In this study, the effect of free and encapsulated transglutaminase (TGase) on the physicochemical, textural, microstructural, microbial, and sensorial properties of white cheese was investigated. For this purpose, white cheeses containing 20 and 60 ppm of free enzyme (F20 and F60) or encapsulated enzyme (E20 and E60) were prepared and compared with a control (C) white cheese without TGase. The results showed that the addition of encapsulated TGase significantly (p < .05) increased protein and fat content, dry matter, nitrogen recovery, and pH, as well as the production yield of the cheeses. The hardness of the treated samples increased during storage, while the reverse trend was observed for the control sample. The F60 and E60 samples showed more oriented and compact structures than the other samples. In the sensory evaluation, the E60 sample received the highest taste and flavor scores. Overall, the physicochemical, sensorial, and microstructural properties of white cheeses were improved by the presence of encapsulated enzyme in the formulation.

    Biofilm formation and its genes expressions in Staphylococcus epidermidis isolated from urinary tract infections of children in Isfahan

    Aims: Staphylococcus epidermidis is an important bacterium and one of the 40 species of the genus Staphylococcus. It is part of the normal human flora, commonly on the skin and less commonly on the mucosa. Instrument and Methods: In this cross-sectional study, samples were isolated and S. epidermidis was identified according to laboratory standards over 1 year; 90 S. epidermidis isolates from urinary tract infections of children were selected from educational hospitals in Isfahan, Iran. Antibiotic susceptibility was tested with the Kirby–Bauer method. Biofilm production by the isolates was determined by culturing on Congo red agar medium and by microplate titration. Results: The results reveal that 45 methicillin-resistant S. epidermidis isolates produced biofilm at different levels. Resistance was highest to methicillin (50%), erythromycin (43.5%), ciprofloxacin (50.2%), and penicillin (46.9%), and lowest to linezolid (4%) and nitrofurantoin (5%). Conclusions: Our results show a high prevalence of antibiotic-resistant and biofilm-producing S. epidermidis strains, especially among methicillin-resistant S. epidermidis strains, in the Isfahan hospitals, which could be a reservoir for antibiotic resistance genes.

    Picos: A hardware runtime architecture support for OmpSs

    © 2015 Elsevier B.V. All rights reserved. OmpSs is a programming model that provides a simple and powerful way of annotating sequential programs to exploit heterogeneity and task parallelism based on runtime data dependency analysis, dataflow scheduling and out-of-order task execution; it has greatly influenced Version 4.0 of the OpenMP standard. The current implementation of OmpSs achieves those capabilities with a pure-software runtime library: Nanos++. Therefore, although powerful and easy to use, the performance benefits of exploiting fine-grained (pico) task parallelism are limited by the software runtime overheads. To overcome this handicap we propose Picos, an implementation of the Task Superscalar (TSS) architecture that provides hardware support to the OmpSs programming model. Picos is a novel hardware dataflow-based task scheduler that dynamically analyzes inter-task dependencies and identifies task-level parallelism at run-time. In this paper, we describe the Picos Hardware Design and the latencies of the main functionality of its components, based on the synthesis of their VHDL design. We have implemented a full cycle-accurate simulator based on those latencies to perform a design exploration of the characteristics and number of its components in a reasonable amount of time. Finally, we present a comparison of the Picos and Nanos++ runtime performance scalability with a set of real benchmarks. With Picos, a programmer can achieve ideal scalability using aggressive parallel strategies with a large number of fine-grained tasks. This work is supported by the Spanish Government through Programa Severo Ochoa (SEV-2011-0067), by the Spanish Ministry of Science and Technology through TIN2012-34557 project, by the Generalitat de Catalunya (contract 2009-SGR-980), by the European FP7 project TERAFLUX id. 249013 and by the European Research Council under the European Union’s 7th FP, ERC Grant Agreement number 321253. We also thank the Xilinx University Program for its hardware and software donations. Peer reviewed.