43 research outputs found
Recommended from our members
Perspectives on distributed computing : thirty people, four user types, and the distributed computing user experience.
This report summarizes the methodology and results of a user perspectives study conducted by the Community Driven Improvement of Globus Software (CDIGS) project. The purpose of the study was to document the work-related goals and challenges facing today's scientific technology users, to record their perspectives on Globus software and the distributed-computing ecosystem, and to provide recommendations to the Globus community based on the observations. Globus is a set of open source software components intended to provide a framework for collaborative computational science activities. Rather than attempting to characterize all users or potential users of Globus software, our strategy has been to speak in detail with a small group of individuals in the scientific community whose work appears to be the kind that could benefit from Globus software, learn as much as possible about their work goals and the challenges they face, and describe what we found. The result is a set of statements about specific individuals experiences. We do not claim that these are representative of a potential user community, but we do claim to have found commonalities and differences among the interviewees that may be reflected in the user community as a whole. We present these as a series of hypotheses that can be tested by subsequent studies, and we offer recommendations to Globus developers based on the assumption that these hypotheses are representative. Specifically, we conducted interviews with thirty technology users in the scientific community. We included both people who have used Globus software and those who have not. We made a point of including individuals who represent a variety of roles in scientific projects, for example, scientists, software developers, engineers, and infrastructure providers. The following material is included in this report: (1) A summary of the reported work-related goals, significant issues, and points of satisfaction with the use of Globus software; (2) A method for characterizing users according to their technology interactions, and identification of four user types among the interviewees using the method; (3) Four profiles that highlight points of commonality and diversity in each user type; (4) Recommendations for technology developers and future studies; (5) A description of the interview protocol and overall study methodology; (6) An anonymized list of the interviewees; and (7) Interview writeups and summary data. The interview summaries in Section 3 and transcripts in Appendix D illustrate the value of distributed computing software--and Globus in particular--to scientific enterprises. They also document opportunities to make these tools still more useful both to current users and to new communities. We aim our recommendations at developers who intend their software to be used and reused in many applications. (This kind of software is often referred to as 'middleware.') Our two core recommendations are as follows. First, it is essential for middleware developers to understand and explicitly manage the multiple user products in which their software components are used. We must avoid making assumptions about the commonality of these products and, instead, study and account for their diversity. Second, middleware developers should engage in different ways with different kinds of users. Having identified four general user types in Section 4, we provide specific ideas for how to engage them in Section 5
Improving cache Behavior in CMP architectures throug cache partitioning techniques
The evolution of microprocessor design in the last few decades has changed significantly, moving from simple inorder single core architectures to superscalar and vector architectures in order to extract the maximum available instruction level parallelism. Executing several instructions from the same thread in parallel allows significantly improving the performance of an application. However, there is only a limited amount of parallelism available in each thread, because of data and control dependences. Furthermore, designing a high performance, single, monolithic processor has become very complex due to power and chip latencies constraints. These limitations have motivated the use of thread level parallelism (TLP) as a common strategy for improving processor performance. Multithreaded processors allow executing different threads at the same time, sharing some hardware resources. There are several flavors of multithreaded processors that exploit the TLP, such as chip multiprocessors (CMP), coarse grain multithreading, fine grain multithreading, simultaneous multithreading (SMT), and combinations of them.To improve cost and power efficiency, the computer industry has adopted multicore chips. In particular, CMP architectures have become the most common design decision (combined sometimes with multithreaded cores). Firstly, CMPs reduce design costs and average power consumption by promoting design re-use and simpler processor cores. For example, it is less complex to design a chip with many small, simple cores than a chip with fewer, larger, monolithic cores.Furthermore, simpler cores have less power hungry centralized hardware structures. Secondly, CMPs reduce costs by improving hardware resource utilization. On a multicore chip, co-scheduled threads can share costly microarchitecture resources that would otherwise be underutilized. Higher resource utilization improves aggregate performance and enables lower cost design alternatives.One of the resources that impacts most on the final performance of an application is the cache hierarchy. Caches store data recently used by the applications in order to take advantage of temporal and spatial locality of applications. Caches provide fast access to data, improving the performance of applications. Caches with low latencies have to be small, which prompts the design of a cache hierarchy organized into several levels of cache.In CMPs, the cache hierarchy is normally organized in a first level (L1) of instruction and data caches private to each core. A last level of cache (LLC) is normally shared among different cores in the processor (L2, L3 or both). Shared caches increase resource utilization and system performance. Large caches improve performance and efficiency by increasing the probability that each application can access data from a closer level of the cache hierarchy. It also allows an application to make use of the entire cache if needed.A second advantage of having a shared cache in a CMP design has to do with the cache coherency. In parallel applications, different threads share the same data and keep a local copy of this data in their cache. With multiple processors, it is possible for one processor to change the data, leaving another processor's cache with outdated data. Cache coherency protocol monitors changes to data and ensures that all processor caches have the most recent data. When the parallel application executes on the same physical chip, the cache coherency circuitry can operate at the speed of on-chip communications, rather than having to use the much slower between-chip communication, as is required with discrete processors on separate chips. These coherence protocols are simpler to design with a unified and shared level of cache onchip.Due to the advantages that multicore architectures offer, chip vendors use CMP architectures in current high performance, network, real-time and embedded systems. Several of these commercial processors have a level of the cache hierarchy shared by different cores. For example, the Sun UltraSPARC T2 has a 16-way 4MB L2 cache shared by 8 cores each one up to 8-way SMT. Other processors like the Intel Core 2 family also share up to a 12MB 24-way L2 cache. In contrast, the AMD K10 family has a private L2 cache per core and a shared L3 cache, with up to a 6MB 64-way L3 cache.As the long-term trend of increasing integration continues, the number of cores per chip is also projected to increase with each successive technology generation. Some significant studies have shown that processors with hundreds of cores per chip will appear in the market in the following years. The manycore era has already begun. Although this era provides many opportunities, it also presents many challenges. In particular, higher hardware resource sharing among concurrently executing threads can cause individual thread's performance to become unpredictable and might lead to violations of the individual applications' performance requirements. Current resource management mechanisms and policies are no longer adequate for future multicore systems.Some applications present low re-use of their data and pollute caches with data streams, such as multimedia, communications or streaming applications, or have many compulsory misses that cannot be solved by assigning more cache space to the application. Traditional eviction policies such as Least Recently Used (LRU), pseudo LRU or random are demand-driven, that is, they tend to give more space to the application that has more accesses to the cache hierarchy.When no direct control over shared resources is exercised (the last level cache in this case), it is possible that a particular thread allocates most of the shared resources, degrading other threads performance. As a consequence, high resource sharing and resource utilization can cause systems to become unstable and violate individual applications' requirements. If we want to provide a Quality of Service (QoS) to applications, we need to enhance the control over shared resources and enrich the collaboration between the OS and the architecture.In this thesis, we propose software and hardware mechanisms to improve cache sharing in CMP architectures. We make use of a holistic approach, coordinating targets of software and hardware to improve system aggregate performance and provide QoS to applications. We make use of explicit resource allocation techniques to control the shared cache in a CMP architecture, with resource allocation targets driven by hardware and software mechanisms.The main contributions of this thesis are the following:- We have characterized different single- and multithreaded applications and classified workloads with a systematic method to better understand and explain the cache sharing effects on a CMP architecture. We have made a special effort in studying previous cache partitioning techniques for CMP architectures, in order to acquire the insight to propose improved mechanisms.- In CMP architectures with out-of-order processors, cache misses can be served in parallel and share the miss penalty to access main memory. We take this fact into account to propose new cache partitioning algorithms guided by the memory-level parallelism (MLP) of each application. With these algorithms, the system performance is improved (in terms of throughput and fairness) without significantly increasing the hardware required by previous proposals.- Driving cache partition decisions with indirect indicators of performance such as misses, MLP or data re-use may lead to suboptimal cache partitions. Ideally, the appropriate metric to drive cache partitions should be the target metric to optimize, which is normally related to IPC. Thus, we have developed a hardware mechanism, OPACU, which is able to obtain at run-time accurate predictions of the performance of an application when running with different cache assignments.- Using performance predictions, we have introduced a new framework to manage shared caches in CMP architectures, FlexDCP, which allows the OS to optimize different IPC-related target metrics like throughput or fairness and provide QoS to applications. FlexDCP allows an enhanced coordination between the hardware and the software layers, which leads to improved system performance and flexibility.- Next, we have made use of performance estimations to reduce the load imbalance problem in parallel applications. We have built a run-time mechanism that detects parallel applications sensitive to cache allocation and, in these situations, the load imbalance is reduced by assigning more cache space to the slowest threads. This mechanism, helps reducing the long optimization time in terms of man-years of effort devoted to large-scale parallel applications.- Finally, we have stated the main characteristics that future multicore processors with thousands of cores should have. An enhanced coordination between the software and hardware layers has been proposed to better manage the shared resources in these architectures
Requirements Engineering in Building Climate Science Software.
Software has an important role in supporting scientific work. This dissertation studies teams that build scientific software, focusing on the way that they determine what the software should do. These requirements engineering processes are investigated through three case studies of climate science software projects. The Earth System Modeling Framework assists modeling applications, the Earth System Grid distributes data via a web portal, and the NCAR (National Center for Atmospheric Research) Command Language is used to convert, analyze and visualize data. Document analysis, observation, and interviews were used to investigate the requirements-related work.
The first research question is about how and why stakeholders engage in a project, and what they do for the project. Two key findings arise. First, user counts are a vital measure of project success, which makes adoption important and makes counting tricky and political. Second, despite the importance of quantities of users, a few particular “power users” develop a relationship with the software developers and play a special role in providing feedback to the software team and integrating the system into user practice.
The second research question focuses on how project objectives are articulated and how they are put into practice. The team seeks to both build a software system according to product requirements but also to conduct their work according to process requirements such as user support. Support provides essential communication between users and developers that assists with refining and identifying requirements for the software. It also helps users to learn and apply the software to their real needs. User support is a vital activity for scientific software teams aspiring to create infrastructure.
The third research question is about how change in scientific practice and knowledge leads to changes in the software, and vice versa. The “thickness” of a layer of software infrastructure impacts whether the software team or users have control and responsibility for making changes in response to new scientific ideas. Thick infrastructure provides more functionality for users, but gives them less control of it. The stability of infrastructure trades off against the responsiveness that the infrastructure can have to user needs.Ph.D.InformationUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/86438/1/archerb_1.pd