1,832 research outputs found
Parallel and Distributed Computing
The 14 chapters presented in this book cover a wide variety of representative works ranging from hardware design to application development. Particularly, the topics that are addressed are programmable and reconfigurable devices and systems, dependability of GPUs (General Purpose Units), network topologies, cache coherence protocols, resource allocation, scheduling algorithms, peertopeer networks, largescale network simulation, and parallel routines and algorithms. In this way, the articles included in this book constitute an excellent reference for engineers and researchers who have particular interests in each of these topics in parallel and distributed computing
Tuning the Computational Effort: An Adaptive Accuracy-aware Approach Across System Layers
This thesis introduces a novel methodology to realize accuracy-aware systems, which will help designers integrate accuracy awareness into their systems. It proposes an adaptive accuracy-aware approach across system layers that addresses current challenges in that domain, combining and tuning accuracy-aware methods on different system layers. To widen the scope of accuracy-aware computing including approximate computing for other domains, this thesis presents innovative accuracy-aware methods and techniques for different system layers.
The required tuning of the accuracy-aware methods is integrated into a configuration layer that tunes the available knobs of the accuracy-aware methods integrated into a system
Adaptive Performance and Power Management in Distributed Computing Systems
The complexity of distributed computing systems has raised two unprecedented challenges for system management. First, various customers need to be assured by meeting their required service-level agreements such as response time and throughput. Second, system power consumption must be controlled in order to avoid system failures caused by power capacity overload or system overheating due to increasingly high server density. However, most existing work, unfortunately, either relies on open-loop estimations based on off-line profiled system models, or evolves in a more ad hoc fashion, which requires exhaustive iterations of tuning and testing, or oversimplifies the problem by ignoring the coupling between different system characteristics (\ie, response time and throughput, power consumption of different servers). As a result, the majority of previous work lacks rigorous guarantees on the performance and power consumption for computing systems, and may result in degraded overall system performance. In this thesis, we extensively study adaptive performance/power management and power-efficient performance management for distributed computing systems such as information dissemination systems, power grid management systems, and data centers, by proposing Multiple-Input-Multiple-Output (MIMO) control and hierarchical designs based on feedback control theory. For adaptive performance management, we design an integrated solution that controls both the average response time and CPU utilization in information dissemination systems to achieve bounded response time for high-priority information and maximized system throughput in an example information dissemination system. In addition, we design a hierarchical control solution to guarantee the deadlines of real-time tasks in power grid computing by grouping them based on their characteristics, respectively. For adaptive power management, we design MIMO optimal control solutions for power control at the cluster and server level and a hierarchical solution for large-scale data centers. Our MIMO control design can capture the coupling among different system characteristics, while our hierarchical design can coordinate controllers at different levels. For power-efficient performance management, we discuss a two-layer coordinated management solution for virtualized data centers. Experimental results in both physical testbeds and simulations demonstrate that all the solutions outperform state-of-the-art management schemes by significantly improving overall system performance
Performance assessment of real-time data management on wireless sensor networks
Technological advances in recent years have allowed the maturity of Wireless Sensor Networks
(WSNs), which aim at performing environmental monitoring and data collection. This sort of
network is composed of hundreds, thousands or probably even millions of tiny smart computers
known as wireless sensor nodes, which may be battery powered, equipped with sensors, a radio
transceiver, a Central Processing Unit (CPU) and some memory. However due to the small size and
the requirements of low-cost nodes, these sensor node resources such as processing power, storage
and especially energy are very limited.
Once the sensors perform their measurements from the environment, the problem of data
storing and querying arises. In fact, the sensors have restricted storage capacity and the on-going
interaction between sensors and environment results huge amounts of data. Techniques for data
storage and query in WSN can be based on either external storage or local storage. The external
storage, called warehousing approach, is a centralized system on which the data gathered by the
sensors are periodically sent to a central database server where user queries are processed. The
local storage, in the other hand called distributed approach, exploits the capabilities of sensors
calculation and the sensors act as local databases. The data is stored in a central database server
and in the devices themselves, enabling one to query both.
The WSNs are used in a wide variety of applications, which may perform certain operations on
collected sensor data. However, for certain applications, such as real-time applications, the sensor
data must closely reflect the current state of the targeted environment. However, the environment
changes constantly and the data is collected in discreet moments of time. As such, the collected
data has a temporal validity, and as time advances, it becomes less accurate, until it does not
reflect the state of the environment any longer. Thus, these applications must query and analyze
the data in a bounded time in order to make decisions and to react efficiently, such as industrial
automation, aviation, sensors network, and so on. In this context, the design of efficient real-time
data management solutions is necessary to deal with both time constraints and energy consumption.
This thesis studies the real-time data management techniques for WSNs. It particularly it focuses
on the study of the challenges in handling real-time data storage and query for WSNs and on the
efficient real-time data management solutions for WSNs.
First, the main specifications of real-time data management are identified and the available
real-time data management solutions for WSNs in the literature are presented. Secondly, in order to
provide an energy-efficient real-time data management solution, the techniques used to manage
data and queries in WSNs based on the distributed paradigm are deeply studied. In fact, many
research works argue that the distributed approach is the most energy-efficient way of managing
data and queries in WSNs, instead of performing the warehousing. In addition, this approach can provide quasi real-time query processing because the most current data will be retrieved from the
network.
Thirdly, based on these two studies and considering the complexity of developing, testing, and
debugging this kind of complex system, a model for a simulation framework of the real-time
databases management on WSN that uses a distributed approach and its implementation are
proposed. This will help to explore various solutions of real-time database techniques on WSNs
before deployment for economizing money and time. Moreover, one may improve the proposed
model by adding the simulation of protocols or place part of this simulator on another available
simulator. For validating the model, a case study considering real-time constraints as well as energy
constraints is discussed.
Fourth, a new architecture that combines statistical modeling techniques with the distributed
approach and a query processing algorithm to optimize the real-time user query processing are
proposed. This combination allows performing a query processing algorithm based on admission
control that uses the error tolerance and the probabilistic confidence interval as admission
parameters. The experiments based on real world data sets as well as synthetic data sets
demonstrate that the proposed solution optimizes the real-time query processing to save more
energy while meeting low latency.Fundação para a Ciência e Tecnologi
Hardware Acceleration for Unstructured Big Data and Natural Language Processing.
The confluence of the rapid growth in electronic data in recent years, and the renewed interest in domain-specific hardware accelerators presents exciting technical opportunities. Traditional scale-out solutions for processing the vast amounts of text data have been shown to be energy- and cost-inefficient. In contrast, custom hardware accelerators can provide higher throughputs, lower latencies, and significant energy savings. In this thesis, I present a set of hardware accelerators for unstructured big-data processing and natural language processing.
The first accelerator, called HAWK, aims to speed up the processing of ad hoc queries against large in-memory logs. HAWK is motivated by the observation that traditional software-based tools for processing large text corpora use memory bandwidth inefficiently due to software overheads, and, thus, fall far short of peak scan rates possible on modern memory systems. HAWK is designed to process data at a constant rate of 32 GB/s—faster than most extant memory systems. I demonstrate that HAWK outperforms state-of-the-art software solutions for text processing, almost by an order of magnitude in many cases. HAWK occupies an area of 45 sq-mm in its pareto-optimal configuration and consumes 22 W of power, well within the area and power envelopes of modern CPU chips.
The second accelerator I propose aims to speed up similarity measurement calculations for semantic search in the natural language processing space. By leveraging the latency hiding concepts of multi-threading and simple scheduling mechanisms, my design maximizes functional unit utilization. This similarity measurement accelerator provides speedups of 36x-42x over optimized software running on server-class cores, while requiring 56x-58x lower energy, and only 1.3% of the area.PhDComputer Science and EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/116712/1/prateekt_1.pd
Fundamentals
Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, their summary and clustering, to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to resource requirements and how to enhance scalability on diverse computing architectures ranging from embedded systems to large computing clusters
Securing Real-Time Internet-of-Things
Modern embedded and cyber-physical systems are ubiquitous. A large number of
critical cyber-physical systems have real-time requirements (e.g., avionics,
automobiles, power grids, manufacturing systems, industrial control systems,
etc.). Recent developments and new functionality requires real-time embedded
devices to be connected to the Internet. This gives rise to the real-time
Internet-of-things (RT-IoT) that promises a better user experience through
stronger connectivity and efficient use of next-generation embedded devices.
However RT- IoT are also increasingly becoming targets for cyber-attacks which
is exacerbated by this increased connectivity. This paper gives an introduction
to RT-IoT systems, an outlook of current approaches and possible research
challenges towards secure RT- IoT frameworks
Extending Memory Capacity in Consumer Devices with Emerging Non-Volatile Memory: An Experimental Study
The number and diversity of consumer devices are growing rapidly, alongside
their target applications' memory consumption. Unfortunately, DRAM scalability
is becoming a limiting factor to the available memory capacity in consumer
devices. As a potential solution, manufacturers have introduced emerging
non-volatile memories (NVMs) into the market, which can be used to increase the
memory capacity of consumer devices by augmenting or replacing DRAM. Since
entirely replacing DRAM with NVM in consumer devices imposes large system
integration and design challenges, recent works propose extending the total
main memory space available to applications by using NVM as swap space for
DRAM. However, no prior work analyzes the implications of enabling a real
NVM-based swap space in real consumer devices.
In this work, we provide the first analysis of the impact of extending the
main memory space of consumer devices using off-the-shelf NVMs. We extensively
examine system performance and energy consumption when the NVM device is used
as swap space for DRAM main memory to effectively extend the main memory
capacity. For our analyses, we equip real web-based Chromebook computers with
the Intel Optane SSD, which is a state-of-the-art low-latency NVM-based SSD
device. We compare the performance and energy consumption of interactive
workloads running on our Chromebook with NVM-based swap space, where the Intel
Optane SSD capacity is used as swap space to extend main memory capacity,
against two state-of-the-art systems: (i) a baseline system with double the
amount of DRAM than the system with the NVM-based swap space; and (ii) a system
where the Intel Optane SSD is naively replaced with a state-of-the-art (yet
slower) off-the-shelf NAND-flash-based SSD, which we use as a swap space of
equivalent size as the NVM-based swap space
- …