94 research outputs found
Submicron Systems Architecture Project : Semiannual Technical Report
The Mosaic C is an experimental fine-grain multicomputer
based on single-chip nodes. The Mosaic C chip includes 64KB of fast dynamic RAM,
processor, packet interface, ROM for bootstrap and self-test, and a two-dimensional selftimed
router. The chip architecture provides low-overhead and low-latency handling of
message packets, and high memory and network bandwidth. Sixty-four Mosaic chips are
packaged by tape-automated bonding (TAB) in an 8 x 8 array on circuit boards that can, in
turn, be arrayed in two dimensions to build arbitrarily large machines. These 8 x 8 boards are
now in prototype production under a subcontract with Hewlett-Packard. We are planning
to construct a 16K-node Mosaic C system from 256 of these boards. The suite of Mosaic
C hardware also includes host-interface boards and high-speed communication cables. The
hardware developments and activities of the past eight months are described in section 2.1.
The programming system that we are developing for the Mosaic C is based on the
same message-passing, reactive-process, computational model that we have used with earlier
multicomputers, but the model is implemented for the Mosaic in a way that supports finegrain
concurrency. A process executes only in response to receiving a message, and may in
execution send messages, create new processes, and modify its persistent variables before
it either exits or becomes dormant in preparation for receiving another message. These
computations are expressed in an object-oriented programming notation, a derivative of
C++ called C+-. The computational model and the C+- programming notation are
described in section 2.2. The Mosaic C runtime system, which is written in C+-, provides
automatic process placement and highly distributed management of system resources. The
Mosaic C runtime system is described in section 2.3
Recommended from our members
A distributed analysis and monitoring framework for the compact Muon solenoid experiment and a pedestrian simulation
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.The design of a parallel and distributed computing system is a very complicated task. It requires a detailed understanding of the design issues and of the theoretical and practical aspects of their solutions. Firstly, this thesis discusses in detail the major concepts and components required to make parallel and distributed computing a reality. A multithreaded and distributed framework capable of analysing the simulation data produced by a pedestrian simulation software was developed. Secondly, this thesis discusses the origins and fundamentals of Grid computing and the motivations for its use in High Energy Physics. Access to the data produced by the Large Hadron Collider (LHC) has to be provided for more than five thousand scientists all over the world. Users who run analysis jobs on the Grid do not necessarily have expertise in Grid computing. Simple, userfriendly and reliable monitoring of the analysis jobs is one of the key components of the operations of the distributed analysis; reliable monitoring is one of the crucial components of the Worldwide LHC Computing Grid for providing the functionality and performance that is required by the LHC experiments. The CMS Dashboard Task Monitoring and the CMS Dashboard Job Summary monitoring applications were developed to serve the needs of the CMS community
Single system image: A survey
Single system image is a computing paradigm where a number of distributed computing resources are aggregated and presented via an interface that maintains the illusion of interaction with a single system. This approach encompasses decades of research using a broad variety of techniques at varying levels of abstraction, from custom hardware and distributed hypervisors to specialized operating system kernels and user-level tools. Existing classification schemes for SSI technologies are reviewed, and an updated classification scheme is proposed. A survey of implementation techniques is provided along with relevant examples. Notable deployments are examined and insights gained from hands-on experience are summarized. Issues affecting the adoption of kernel-level SSI are identified and discussed in the context of technology adoption literature
Integrating multiple clusters for compute-intensive applications
Multicluster grids provide one promising solution to satisfying the growing computational demands of compute-intensive applications. However, it is challenging to seamlessly integrate all participating clusters in different domains into a single virtual computational platform. In order to fully utilize the capabilities of multicluster grids, computer scientists need to deal with the issue of joining together participating autonomic systems practically and efficiently to execute grid-enabled applications. Driven by several compute-intensive applications, this theses develops a multicluster grid management toolkit called Pelecanus to bridge the gap between user\u27s needs and the system\u27s heterogeneity. Application scientists will be able to conduct very large-scale execution across multiclusters with transparent QoS assurance. A novel model called DA-TC (Dynamic Assignment with Task Containers) is developed and is integrated into Pelecanus. This model uses the concept of a task container that allows one to decouple resource allocation from resource binding. It employs static load balancing for task container distribution and dynamic load balancing for task assignment. The slowest resources become useful rather than be bottlenecks in this manner. A cluster abstraction is implemented, which not only provides various cluster information for the DA-TC execution model, but also can be used as a standalone toolkit to monitor and evaluate the clusters\u27 functionality and performance. The performance of the proposed DA-TC model is evaluated both theoretically and experimentally. Results demonstrate the importance of reducing queuing time in decreasing the total turnaround time for an application. Experiments were conducted to understand the performance of various aspects of the DA-TC model. Experiments showed that our model could significantly reduce turnaround time and increase resource utilization for our targeted application scenarios. Four applications are implemented as case studies to determine the applicability of the DA-TC model. In each case the turnaround time is greatly reduced, which demonstrates that the DA-TC model is efficient for assisting application scientists in conducting their research. In addition, virtual resources were integrated into the DA-TC model for application execution. Experiments show that the execution model proposed in this thesis can work seamlessly with multiple hybrid grid/cloud resources to achieve reduced turnaround time
Reliability of power systems with climate change effects on PV and wind power generation
Concerns over global climate change has led utilities to reduce greenhouse gas (GHG) emissions by decarbonising the power sector. The accelerating rate of climate change is likely to expose a decarbonised power system to climate related stresses. In particular, Photo Voltaic (PV) and wind power generation systems comprise a significant share in the power grid, which is potentially vulnerable to climate change, and therefore may impact the reliability of power systems with their integrations. Typical reliability assessments do not consider the climate effects and related stresses either on the PV or wind power generating systems or at their component levels. Therefore, this thesis investigates and addresses the challenges of reliability assessment of power grid with the interaction of climate changes and renewable power generation systems.
As a part of the investigation, the thesis proposes a novel systematic framework to assess the PV system components’ availability with the interaction of future changes in climate. The framework is developed to quantify the climate related stresses on the hierarchical levels of a PV system, which include component, subsystem, PV system and the grid. The framework was formed by considering multiple elements including thermal stress, bathtub curve, ageing and degradation level and operated on Markov chain embedded Monte Carlo simulation. The uniqueness of the framework is its ability to identify the critical components in a PV system that lead to climate-associated failures. Thesis also proposes a comprehensive framework to assess the reliability of a PV and wind power integrated power system accounting climate change impacts by deploying diverse levels of GHG emission scenarios. Uncertainties in the future climate scenarios were established by proposing an advanced stochastic model considering likelihood-based Markov chain method for generating future climate scenario. The proposed model is integrated to the reliability assessment framework to assess realistic impacts on the reliability of a power system.
Investigations were suggested the impacts of climate change effects on PV and wind power generation system were true and in quantitative terms PV systems are more vulnerable to climate change effects than wind power generating systems. The climate change related true impacts on PV and wind power generating systems could be mitigated by quantifying change in impacts quantitatively and then systematic replacement of vulnerable sub system components in time before their end of life. Further investigations suggest that IGBTs and capacitors are key components that are more sensitive to thermal stresses of climate change effects resulting a considering impacts on their availability and on the power system reliability with their presence. Further assessments also revealed that the impacts on power system reliability due to the climate change effects on PV and wind power generation system were not uniform over the long run which further emphasises the need of a quantitative and system assessment in order to expose true impacts of climate change on PV and wind power generation system extending to the entire power system reliability. The thesis provides a solid foundation of frameworks required in the quantitative assessment
Modeling and optimization of high-performance many-core systems for energy-efficient and reliable computing
Thesis (Ph.D.)--Boston UniversityMany-core systems, ranging from small-scale many-core processors to large-scale high performance computing (HPC) data centers, have become the main trend in computing system design owing to their potential to deliver higher throughput per watt. However, power densities and temperatures increase following the growth in the performance capacity, and bring major challenges in energy efficiency, cooling costs, and reliability. These challenges require a joint assessment of performance, power, and temperature tradeoffs as well as the design of runtime optimization techniques that monitor and manage the interplay among them. This thesis proposes novel modeling and runtime management techniques that evaluate and optimize the performance, energy, and reliability of many-core systems.
We first address the energy and thermal challenges in 3D-stacked many-core processors. 3D processors with stacked DRAM have the potential to dramatically improve performance owing to lower memory access latency and higher bandwidth. However, the performance increase may cause 3D systems to exceed the power budgets or create thermal hot spots. In order to provide an accurate analysis and enable the design of efficient management policies, this thesis introduces a simulation framework to jointly analyze performance, power, and temperature for 3D systems. We then propose a runtime optimization policy that maximizes the system performance by characterizing the application behavior and predicting the operating points that satisfy the power and thermal constraints. Our policy reduces the energy-delay product (EDP) by up to 61.9% compared to existing strategies.
Performance, cooling energy, and reliability are also critical aspects in HPC data centers. In addition to causing reliability degradation, high temperatures increase the required cooling energy. Communication cost, on the other hand, has a significant impact on system performance in HPC data centers. This thesis proposes a topology-aware technique that maximizes system reliability by selecting between workload clustering and balancing. Our policy improves the system reliability by up to 123.3% compared to existing temperature balancing approaches. We also introduce a job allocation methodology to simultaneously optimize the communication cost and the cooling energy in a data center. Our policy reduces the cooling cost by 40% compared to cooling-aware and performance-aware policies, while achieving comparable performance to performance-aware policy
Dynamic load balancing strategies in heterogeneous distributed system
Distributed heterogeneous computing is being widely applied to a variety of large size computational problems. This computational environments are consists of multiple het-
erogeneous computing modules, these modules interact with each other to solve the prob-lem. Dynamic load balancing in distributed computing system is desirable because it is
an important key to establish dependability in a Heterogeneous Distributed Computing Systems (HDCS). Load balancing problem is an optimization problem with exponential solution space. The complexity of dynamic load balancing increases with the size of a HDCS and becomes difficult to solve effectively. The solution to this intractable problem is discussed under different algorithm paradigm.The load submitted to the a HDCS is assumed to be in the form of tasks. Dynamic allocation of n independent tasks to m computing nodes in heterogeneous distributed
computing system can be possible through centralized or decentralized control. In central-ized approach,we have formulated load balancing problem considering task and machine heterogeneity as a linear programming problem to minimize the time by which all task completes the execution in makespan.The load balancing problem in HDCS aims to maintain a balanced allocation of tasks while using the computational resources. The system state changes with time on arrival of tasks from the users. Therefore,heterogeneous distributed system is modeled as an M/M/m queue. The task model is represented either as a consistent or an inconsistent expected time to compute (ETC) matrix. A batch mode heuristic has been used to de-sign dynamic load balancing algorithms for heterogeneous distributed computing systems with four different type of machine heterogeneity. A number of experiments have been conducted to study the performance of load balancing algorithms with three different ar-rival rate for the task. A better performance of the algorithms is observed with increasing of heterogeneity in the HDCS.A new codification scheme suitable to simulated annealing and genetic algorithm has been introduced to design dynamic load balancing algorithms for HDCS. These stochastic iterative load balancing algorithms uses sliding window techniques to select a batch of tasks, and allocate them to the computing nodes in the HDCS. The proposed dynamic genetic algorithm based load balancer has been found to be effective, especially in the case of a large number of tasks
- …