321 research outputs found

Efficient Resource Management for Deep Learning Clusters

Author: Gu Juncheng
Publication date: 01/01/2021

Deep Learning (DL) is rapidly gaining popularity in domains such as computer vision and speech recognition. To meet the increasing demand, large clusters have been built to develop DL models (i.e., data preparation and model training). DL jobs have unique features, ranging from their hardware requirements to their execution patterns. However, the resource management techniques applied in existing DL clusters have not yet been adapted to those features, which leads to resource inefficiency and hurts the performance of DL jobs. We observed three major challenges brought by DL jobs. First, data preparation jobs, which prepare training datasets from a large volume of raw data, are memory intensive. DL clusters often over-allocate memory to those jobs to protect their performance, which causes memory underutilization. Second, the execution time of a DL training job is often unknown before the job completes. Without such information, existing cluster schedulers are unable to minimize the average Job Completion Time (JCT) of those jobs. Third, model aggregations in Distributed Deep Learning (DDL) training are often assigned a fixed group of CPUs. However, a large portion of those CPUs is wasted because the bursty model aggregations cannot saturate them all the time. In this thesis, we propose a suite of techniques to eliminate the mismatches between DL jobs and resource management in DL clusters. First, we apply memory disaggregation to improve the memory utilization of DL clusters. The unused memory in data preparation jobs is exposed as remote memory to other machines that are running out of local memory. Second, we design a two-dimensional attained-service-based scheduler to optimize the average JCT of DL training jobs. This scheduler takes the temporal and spatial characteristics of DL training jobs into consideration and can efficiently schedule them without knowing their execution time. Third, we define a shared model aggregation service to reduce the CPU cost of DDL training. Using this service, model aggregations from different DDL training jobs are carefully packed together and use the same group of CPUs in a time-sharing manner. With these techniques, we demonstrate that huge improvements in resource efficiency and job performance can be obtained when the cluster's resource management matches the features of DL jobs.

PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies

http://deepblue.lib.umich.edu/bitstream/2027.42/169955/1/jcgu_1.pd

Deep Blue Documents at the University of Michigan

Placement of dynamic data objects over heterogeneous memory organizations in embedded systems

Author: Peón Quirós Miguel
Publication venue: Universidad Complutense de Madrid (UCM)
Publication date: 25/01/2016

Unpublished thesis of the Universidad Complutense de Madrid, Facultad de Informática, Departamento de Arquitectura de Computadoras y Automática, defended on 24-11-2015.

Docta Complutense

Network flow optimization for distributed clouds

Author: Abdu Jyothi Sangeetha
Publication date: 01/08/2019

Internet applications, which rely on large-scale networked environments such as data centers for their back-end support, are often geo-distributed and typically have stringent performance constraints. The interconnecting networks, within and across data centers, are critical in determining these applications' performance. Data centers can be viewed as composed of three layers: physical infrastructure consisting of servers, switches, and links; control platforms that manage the underlying resources; and applications that run on the infrastructure. This dissertation shows that network flow optimization can improve the performance of distributed applications in the cloud by designing high-throughput schemes spanning all three layers. At the physical infrastructure layer, we devise a framework for measuring and understanding the throughput of network topologies. We develop a heuristic for estimating the worst-case performance of any topology and propose a systematic methodology for comparing the performance of networks built with different equipment. At the control layer, we put forward a source-routed data center fabric that can achieve near-optimal throughput by leveraging a large number of available paths while using limited memory in switches. At the application layer, we show that current Application Network Interfaces (ANIs), abstractions that translate an application's performance goals to actionable network objectives, fail to capture the requirements of many emerging applications. We put forward a novel ANI that can capture application intent more effectively and quantify the performance gains achievable with it. We also tackle resource optimization in the inter-data center context of cellular providers. In this emerging environment, a large amount of resources is geographically fragmented across thousands of micro data centers, each with a limited share of resources, necessitating cross-application optimization to satisfy diverse performance requirements and improve network and server utilization. Our solution, Patronus, employs hierarchical optimization for handling multiple performance requirements and temporally partitioned scheduling for scalability.

Illinois Digital Environment for Access to Learning and Scholarship Repository

Data Resource Management in Throughput Processors

Author: Kloosterman John

Graphics Processing Units (GPUs) are becoming common in data centers for tasks like neural network training and image processing due to their high performance and efficiency. GPUs maintain high throughput by running thousands of threads simultaneously, issuing instructions from ready threads to hide latency in others that are stalled. While this is effective for keeping the arithmetic units busy, the challenge in GPU design is moving the data for computation at the same high rate. Any inefficiency in data movement and storage will compromise the throughput and energy efficiency of the system. Since energy consumption and cooling make up a large part of the cost of provisioning and running a data center, making GPUs more suitable for this environment requires removing the bottlenecks and overheads that limit their efficiency. The performance of GPU workloads is often limited by the throughput of the memory resources inside each GPU core, and though many of the power-hungry structures in CPUs are not found in GPU designs, there is overhead for storing each thread's state. When sharing a GPU between workloads, contention for resources also causes interference and slowdown. This thesis develops techniques to manage and streamline the data movement and storage resources in GPUs in each of these places. The first part of this thesis resolves data movement restrictions inside each GPU core. The GPU memory system is optimized for sequential accesses, but many workloads load data in irregular or transposed patterns that cause a throughput bottleneck even when all loads are cache hits. This work identifies and leverages opportunities to merge requests across threads before sending them to the cache. While requests are waiting for merges, they can be reordered to achieve a higher cache hit rate. These methods yielded a 38% speedup for memory-throughput-limited workloads. Another opportunity for optimization is found in the register file. Since it must store the registers for thousands of active threads, it is the largest on-chip data storage structure on a GPU. The second work in this thesis replaces the register file with a smaller, more energy-efficient register buffer. Compiler directives allow the GPU to know ahead of time which registers will be accessed, allowing the hardware to store only the registers that will be imminently accessed in the buffer, with the rest moved to main memory. This technique reduced total GPU energy by 11%. Finally, in a data center, many different applications will be launching GPU jobs, and just as multiple processes can share the same CPU to increase its utilization, running multiple workloads on the same GPU can increase its overall throughput. However, co-runners interfere with each other in unpredictable ways, especially when sharing memory resources. The final part of this thesis controls this interference, allowing a GPU to be shared between two tiers of workloads: one tier with a high performance target and another suitable for batch jobs without deadlines. At a 90% performance target, this technique increased GPU throughput by 9.3%. GPUs' high efficiency and performance make them a valuable accelerator in the data center. The contributions in this thesis further increase their efficiency by removing data movement and storage overheads and unlock additional performance by enabling resources to be shared between workloads while controlling interference.

PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies

https://deepblue.lib.umich.edu/bitstream/2027.42/146122/1/jklooste_1.pd

Deep Blue Documents at the University of Michigan

Mothers' Work and Children's Lives: Low-Income Families after Welfare Reform

Authors: Dunifon Rachel E.; Johnson Rucker C.; Kalil Ariel
Publication venue: Upjohn Research
Publication date: 17/02/2010

This book examines the effects of work requirements imposed by welfare reform on low-income women and their families. The authors pay particular attention to the nature of work: whether it is stable or unstable, the number of hours worked in a week, and the regularity and flexibility of work schedules. They also show how these factors make it more difficult for low-income women to balance their work and family requirements.

Upjohn Research

Housing the homeless: housing crisis and caravan parks – a Bourdieusian perspective

Author: Mallinson Geraldine
Publication date: 01/01/2020

Geraldine Mallinson considers housing from a Bourdieusian perspective. Housing is situated as a key social structure, and the repercussions of dwelling marginality are exemplified by findings from North Queensland caravan parks. She warns of a crisis in access, adequacy and security, and makes recommendations to guide Australian housing reforms.

ResearchOnline at James Cook University

Effective Use of SSDs in Database Systems

Author: Ghodsnia Pedram
Publication venue: University of Waterloo
Publication date: 03/05/2018

With the advent of solid state drives (SSDs), the storage industry has experienced a revolutionary improvement in I/O performance. Compared to traditional hard disk drives (HDDs), SSDs benefit from shorter I/O latency, better power efficiency, and cheaper random I/Os. Because of these superior properties, SSDs are gradually replacing HDDs. For decades, database management systems have been designed, architected, and optimized based on the performance characteristics of HDDs. To utilize the superior performance of SSDs, new methods should be developed, some database components should be redesigned, and architectural decisions should be revisited. In this thesis, novel methods are proposed to exploit the new capabilities of modern SSDs to improve the performance of database systems. The first is a new method for using SSDs as a fully persistent second-level memory buffer pool. This method uses SSDs as a supplementary storage device to improve transactional throughput and to reduce the checkpoint and recovery times. A prototype of the proposed method is compared with its closest existing competitor. The second considers the impact of the parallel I/O capability of modern SSDs on the database query optimizer. It is shown that a query optimizer that is unaware of the parallel I/O capability of SSDs can make significantly sub-optimal decisions. In addition, a practical method for making the query optimizer parallel-I/O-aware is introduced and evaluated empirically. The third technique is an SSD-friendly external merge sort. This sorting technique has better performance than other common external sorting techniques. It also improves the SSD's lifespan by reducing the number of write operations required during sorting.

University of Waterloo's Institutional Repository

From the Margins to the Centre: A Feminist Intersectionality Perspective of Jua Kali Rural Women Entrepreneurship in Kenya.

Author: Sindani Tabitha
Publication date: 14/07/2022

Roehampton University Research Repository

Kyoto University International Symposium 2022 on Education and Research in Global Environmental Studies in Asia --20 Years of GSGES Achievements and Future Opportunities--

Publication venue: 京都大学大学院地球環境学堂・地球環境学舎・三才学林
Publication date: 01/01/2022

Online + Kyoto University, Yoshida Campus, Nov 24-25, 2022. Organized by Graduate School of Global Environmental Studies (GSGES), Kyoto University. 1. Global Ecology; 2. Environmental Technology; 3. Natural Resources

Kyoto University Research Information Repository

Revisiting the essence and purpose of regional planning: A historical analysis of regional questions - going forward?

Authors: Galland Daniel; Harrison John; Tewdwr-Jones Mark
Publication date: 01/01/2021

VBN