18 research outputs found
Towards Scalable, Private and Practical Deep Learning
Deep Learning (DL) models have drastically improved the performance of Artificial Intelligence (AI) tasks such as image recognition, word prediction, translation, among many others, on which traditional Machine Learning (ML) models fall short. However, DL models are costly to design, train, and deploy due to their computing and memory demands. Designing DL models usually requires extensive expertise and significant manual tuning efforts. Even with the latest accelerators such as Graphics Processing Unit (GPU) and Tensor Processing Unit (TPU), training DL models can take prohibitively long time, therefore training large DL models in a distributed manner is a norm. Massive amount of data is made available thanks to the prevalence of mobile and internet-of-things (IoT) devices. However, regulations such as HIPAA and GDPR limit the access and transmission of personal data to protect security and privacy. Therefore, enabling DL model training in a decentralized but private fashion is urgent and critical. Deploying trained DL models in a real world environment usually requires meeting Quality of Service (QoS) standards, which makes adaptability of DL models an important yet challenging matter. In this dissertation, we aim to address the above challenges to make a step towards scalable, private, and practical deep learning. To simplify DL model design, we propose Efficient Progressive Neural-Architecture Search (EPNAS) and FedCust to automatically design model architectures and tune hyperparameters, respectively. To provide efficient and robust distributed training while preserving privacy, we design LEASGD, TiFL, and HDFL. We further conduct a study on the security aspect of distributed learning by focusing on how data heterogeneity affects backdoor attacks and how to mitigate such threats. Finally, we use super resolution (SR) as an example application to explore model adaptability for cross platform deployment and dynamic runtime environment. Specifically, we propose DySR and AdaSR frameworks which enable SR models to meet QoS by dynamically adapting to available resources instantly and seamlessly without excessive memory overheads
Improving Data Management and Data Movement Efficiency in Hybrid Storage Systems
University of Minnesota Ph.D. dissertation.July 2017. Major: Computer Science. Advisor: David Du. 1 computer file (PDF); ix, 116 pages.In the big data era, large volumes of data being continuously generated drive the emergence of high performance large capacity storage systems. To reduce the total cost of ownership, storage systems are built in a more composite way with many different types of emerging storage technologies/devices including Storage Class Memory (SCM), Solid State Drives (SSD), Shingle Magnetic Recording (SMR), Hard Disk Drives (HDD), and even across off-premise cloud storage. To make better utilization of each type of storage, industries have provided multi-tier storage through dynamically placing hot data in the faster tiers and cold data in the slower tiers. Data movement happens between devices on one single device and as well as between devices connected via various networks. Toward improving data management and data movement efficiency in such hybrid storage systems, this work makes the following contributions: To bridge the giant semantic gap between applications and modern storage systems, passing a piece of tiny and useful information (I/O access hints) from upper layers to the block storage layer may greatly improve application performance or ease data management in heterogeneous storage systems. We present and develop a generic and flexible framework, called HintStor, to execute and evaluate various I/O access hints on heterogeneous storage systems with minor modifications to the kernel and applications. The design of HintStor contains a new application/user level interface, a file system plugin and a block storage data manager. With HintStor, storage systems composed of various storage devices can perform pre-devised data placement, space reallocation and data migration polices assisted by the added access hints. Each storage device/technology has its own unique price-performance tradeoffs and idiosyncrasies with respect to workload characteristics they prefer to support. To explore the internal access patterns and thus efficiently place data on storage systems with fully connected (i.e., data can move from one device to any other device instead of moving tier by tier) differential pools (each pool consists of storage devices of a particular type), we propose a chunk-level storage-aware workload analyzer framework, simplified as ChewAnalyzer. With ChewAnalzyer, the storage manager can adequately distribute and move the data chunks across different storage pools. To reduce the duplicate content transferred between local storage devices and devices in remote data centers, an inline Network Redundancy Elimination (NRE) process with Content-Defined Chunking (CDC) policy can obtain a higher Redundancy Elimination (RE) ratio but may suffer from a considerably higher computational requirement than fixed-size chunking. We build an inline NRE appliance which incorporates an improved FPGA based scheme to speed up CDC processing. To efficiently utilize the hardware resources, the whole NRE process is handled by a Virtualized NRE (VNRE) controller. The uniqueness of this VNRE that we developed lies in its ability to exploit the redundancy patterns of different TCP flows and customize the chunking process to achieve a higher RE ratio
Recommended from our members
An Emergent Architecture for Scaling Decentralized Communication Systems (DCS)
With recent technological advancements now accelerating the mobile and wireless Internet solution space, a ubiquitous computing Internet is well within the research and industrial community's design reach - a decentralized system design, which is not solely driven by static physical models and sound engineering principals, but more dynamically, perhaps sub-optimally at initial deployment and socially-influenced in its evolution. To complement today's Internet system, this thesis proposes a Decentralized Communication System (DCS) architecture with the following characteristics: flat physical topologies with numerous compute oriented and communication intensive nodes in the network with many of these nodes operating in multiple functional roles; self-organizing virtual structures formed through alternative mobility scenarios and capable of serving ad hoc networking formations; emergent operations and control with limited dependency on centralized control and management administration. Today, decentralized systems are not commercially scalable or viable for broad adoption in the same way we have to come to rely on the Internet or telephony systems. The premise in this thesis is that DCS can reach high levels of resilience, usefulness, scale that the industry has come to experience with traditional centralized systems by exploiting the following properties: (i.) network density and topological diversity; (ii.) self-organization and emergent attributes; (iii.) cooperative and dynamic infrastructure; and (iv.) node role diversity. This thesis delivers key contributions towards advancing the current state of the art in decentralized systems. First, we present the vision and a conceptual framework for DCS. Second, the thesis demonstrates that such a framework and concept architecture is feasible by prototyping a DCS platform that exhibits the above properties or minimally, demonstrates that these properties are feasible through prototyped network services. Third, this work expands on an alternative approach to network clustering using hierarchical virtual clusters (HVC) to facilitate self-organizing network structures. With increasing network complexity, decentralized systems can generally lead to unreliable and irregular service quality, especially given unpredictable node mobility and traffic dynamics. The HVC framework is an architectural strategy to address organizational disorder associated with traditional decentralized systems. The proposed HVC architecture along with the associated promotional methodology organizes distributed control and management services by leveraging alternative organizational models (e.g., peer-to-peer (P2P), centralized or tiered) in hierarchical and virtual fashion. Through simulation and analytical modeling, we demonstrate HVC efficiencies in DCS structural scalability and resilience by comparing static and dynamic HVC node configurations against traditional physical configurations based on P2P, centralized or tiered structures. Next, an emergent management architecture for DCS exploiting HVC for self-organization, introduces emergence as an operational approach to scaling DCS services for state management and policy control. In this thesis, emergence scales in hierarchical fashion using virtual clustering to create multiple tiers of local and global separation for aggregation, distribution and network control. Emergence is an architectural objective, which HVC introduces into the proposed self-management design for scaling and stability purposes. Since HVC expands the clustering model hierarchically and virtually, a clusterhead (CH) node, positioned as a proxy for a specific cluster or grouped DCS nodes, can also operate in a micro-capacity as a peer member of an organized cluster in a higher tier. As the HVC promotional process continues through the hierarchy, each tier of the hierarchy exhibits emergent behavior. With HVC as the self-organizing structural framework, a multi-tiered, emergent architecture enables the decentralized management strategy to improve scaling objectives that traditionally challenge decentralized systems. The HVC organizational concept and the emergence properties align with and the view of the human brain's neocortex layering structure of sensory storage, prediction and intelligence. It is the position in this thesis, that for DCS to scale and maintain broad stability, network control and management must strive towards an emergent or natural approach. While today's models for network control and management have proven to lack scalability and responsiveness based on pure centralized models, it is unlikely that singular organizational models can withstand the operational complexities associated with DCS. In this work, we integrate emergence and learning-based methods in a cooperative computing manner towards realizing DCS self-management. However, unlike many existing work in these areas which break down with increased network complexity and dynamics, the proposed HVC framework is utilized to offset these issues through effective separation, aggregation and asynchronous processing of both distributed state and policy. Using modeling techniques, we demonstrate that such architecture is feasible and can improve the operational robustness of DCS. The modeling emphasis focuses on demonstrating the operational advantages of an HVC-based organizational strategy for emergent management services (i.e., reachability, availability or performance). By integrating the two approaches, the DCS architecture forms a scalable system to address the challenges associated with traditional decentralized systems. The hypothesis is that the emergent management system architecture will improve the operational scaling properties of DCS-based applications and services. Additionally, we demonstrate structural flexibility of HVC as an underlying service infrastructure to build and deploy DCS applications and layered services. The modeling results demonstrate that an HVC-based emergent management and control system operationally outperforms traditional structural organizational models. In summary, this thesis brings together the above contributions towards delivering a scalable, decentralized system for Internet mobile computing and communications
Workload Prediction for Efficient Performance Isolation and System Reliability
In large-scaled and distributed systems, like multi-tier storage systems and cloud data centers, resource sharing among workloads brings multiple benefits while introducing many performance challenges. The key to effective workload multiplexing is accurate workload prediction. This thesis focuses on how to capture the salient characteristics of the real-world workloads to develop workload prediction methods and to drive scheduling and resource allocation policies, in order to achieve efficient and in-time resource isolation among applications. For a multi-tier storage system, high-priority user work is often multiplexed with low-priority background work. This brings the challenge of how to strike a balance between maintaining the user performance and maximizing the amount of finished background work. In this thesis, we propose two resource isolation policies based on different workload prediction methods: one is a Markovian model-based and the other is a neural networks-based. These policies aim at, via workload prediction, discovering the opportune time to schedule background work with minimum impact on user performance. Trace-driven simulations verify the efficiency of the two pro- posed resource isolation policies. The Markovian model-based policy successfully schedules the background work at the appropriate periods with small impact on the user performance. The neural networks-based policy adaptively schedules user and background work, resulting in meeting both performance requirements consistently. This thesis also proposes an accurate while efficient neural networks-based pre- diction method for data center usage series, called PRACTISE. Different from the traditional neural networks for time series prediction, PRACTISE selects the most informative features from the past observations of the time series itself. Testing on a large set of usage series in production data centers illustrates the accuracy (e.g., prediction error) and efficiency (e.g., time cost) of PRACTISE. The superiority of the usage prediction also allows a proactive resource management in the highly virtualized cloud data centers. In this thesis, we analyze on the performance tickets in the cloud data centers, and propose an active sizing algorithm, named ATM, that predicts the usage workloads and re-allocates capacity to work- loads to avoid VM performance tickets. Moreover, driven by cheap prediction of usage tails, we also present TailGuard in this thesis, which dynamically clones VMs among co-located boxes, in order to efficiently reduce the performance violations of physical boxes in cloud data centers
Big Data Security (Volume 3)
After a short description of the key concepts of big data the book explores on the secrecy and security threats posed especially by cloud based data storage. It delivers conceptual frameworks and models along with case studies of recent technology
Distributed control architecture for multiservice networks
The research focuses in devising decentralised and distributed control system architecture for the management of internetworking systems to provide improved service delivery and network control. The theoretical basis, results of simulation and implementation in a real-network are presented. It is demonstrated that better performance, utilisation and fairness can be achieved for network customers as well as network/service operators with a value based control system.
A decentralised control system framework for analysing networked and shared resources is developed and demonstrated. This fits in with the fundamental principles of the Internet. It is demonstrated that distributed, multiple control loops can be run on shared resources and achieve proportional fairness in their allocation, without a central control. Some of the specific characteristic behaviours of the service and network layers are identified. The network and service layers are isolated such that each layer can evolve independently to fulfil their functions better. A common architecture pattern is devised to serve the different layers independently. The decision processes require no co-ordination between peers and hence improves scalability of the solution. The proposed architecture can readily fit into a clearinghouse mechanism for integration with business logic. This architecture can provide improved QoS and better revenue from both reservation-less and reservation-based networks. The limits on resource usage for different types of flows are analysed. A method that can sense and modify user utilities and support dynamic price offers is devised. An optimal control system (within the given conditions), automated provisioning, a packet scheduler to enforce the control and a measurement system etc are developed. The model can be extended to enhance the autonomicity of the computer communication networks in both client-server and P2P networks and can be introduced on the Internet in an incremental fashion. The ideas presented in the model built with the model-view-controller and electronic enterprise architecture frameworks are now independently developed elsewhere into common service delivery platforms for converged networks.
Four US/EU patents were granted based on the work carried out for this thesis, for the cross-layer architecture, multi-layer scheme, measurement system and scheduler. Four conference papers were published and presented
Adding Machine Intelligence to Hybrid Memory Management
Computing platforms increasingly incorporate heterogeneous memory hardware technologies, as a way to scale application performance, memory capacities and achieve cost effectiveness. However, this heterogeneity, along with the greater irregularity in the behavior of emerging workloads, render existing hybrid memory management approaches ineffective, calling for more intelligent methods. To this end, this thesis reveals new insights, develops novel methods and contributes system-level mechanisms towards the practical integration of machine learning to hybrid memory management, boosting application performance and system resource efficiency.
First, this thesis builds Kleio; a hybrid memory page scheduler with machine intelligence. Kleio deploys Recurrent Neural Networks to learn memory access patterns at a page granularity and to improve upon the selection of dynamic page migrations across the memory hardware components. Kleio cleverly focuses the machine learning on the page subset whose timely movement will reveal most application performance improvement, while preserving history-based lightweight management for the rest of the pages. In this way, Kleio bridges on average 80% of the relative existing performance gap, while laying the grounds for practical machine intelligent data management with manageable learning overheads.
In addition, this thesis contributes three system-level mechanisms to further boost application performance and reduce the operational and learning overheads of machine learning-based hybrid memory management. First, this thesis builds Cori; a system-level solution for tuning the operational frequency of periodic page schedulers for hybrid memories. Cori leverages insights on data reuse times to fine tune the page migration frequency in a lightweight manner. Second, this thesis contributes Coeus; a page grouping mechanism for page schedulers like Kleio. Coeus leverages Cori’s data reuse insights to tune the granularity at which patterns are interpreted by the page scheduler and enable the training of a single Recurrent Neural Network per page cluster, reducing by 3x the model training times. The combined effects of Cori and Coeus provide 3x additional performance improvements to Kleio. Finally, this thesis proposes Cronus; an image-based page selector for page schedulers like Kleio. Cronus uses visualization to accelerate the process of selecting which page patterns should be managed with machine learning, reducing by 75x the operational overheads of Kleio. Cronus lays the foundations for future use of visualization and computer vision methods in memory management, such as image-based memory access pattern classification, recognition and prediction.Ph.D
Recommended from our members
Making Data Storage Efficient in the Era of Cloud Computing
We enter the era of cloud computing in the last decade, as many paradigm shifts are happening on how people write and deploy applications. Despite the advancement of cloud computing, data storage abstractions have not evolved much, causing inefficiencies in performance, cost, and security.
This dissertation proposes a novel approach to make data storage efficient in the era of cloud computing by building new storage abstractions and systems that bridge the gap between cloud computing and data storage and simplify development. We build four systems to address four data inefficiencies in cloud computing.
The first system, Grandet, solves the data storage inefficiency caused by the paradigm shift from upfront provisioning to a variety of pay-as-you-go cloud services. Grandet is an extensible storage system that significantly reduces storage costs for web applications deployed in the cloud. Under the hood, it supports multiple heterogeneous stores and unifies them by placing each data object at the store deemed most economical. Our results show that Grandet reduces their costs by an average of 42.4%, and it is fast, scalable, and easy to use.
The second system, Unic, solves the data inefficiency caused by the paradigm shift from single-tenancy to multi-tenancy. Unic securely deduplicates general computations. It exports a cache service that allows cloud applications running on behalf of mutually distrusting users to memoize and reuse computation results, thereby improving performance. Unic achieves both integrity and secrecy through a novel use of code attestation, and it provides a simple yet expressive API that enables applications to deduplicate their own rich computations. Our results show that Unic is easy to use, speeds up applications by an average of 7.58x, and with little storage overhead.
The third system, Lambdata, solves the data inefficiency caused by the paradigm shift to serverless computing, where developers only write core business logic, and cloud service providers maintain all the infrastructure. Lambdata is a novel serverless computing system that enables developers to declare a cloud function's data intents, including both data read and data written. Once data intents are made explicit, Lambdata performs a variety of optimizations to improve speed, including caching data locally and scheduling functions based on code and data locality. Our results show that Lambdata achieves an average speedup of 1.51x on the turnaround time of practical workloads and reduces monetary cost by 16.5%.
The fourth system, CleanOS, solves the data inefficiency caused by the paradigm shift from desktop computers to smartphones always connected to the cloud. CleanOS is a new Android-based operating system that manages sensitive data rigorously and maintains a clean environment at all times. It identifies and tracks sensitive data, encrypts it with a key, and evicts that key to the cloud when the data is not in active use on the device. Our results show that CleanOS limits sensitive-data exposure drastically while incurring acceptable overheads on mobile networks
Advances in Artificial Intelligence: Models, Optimization, and Machine Learning
The present book contains all the articles accepted and published in the Special Issue “Advances in Artificial Intelligence: Models, Optimization, and Machine Learning” of the MDPI Mathematics journal, which covers a wide range of topics connected to the theory and applications of artificial intelligence and its subfields. These topics include, among others, deep learning and classic machine learning algorithms, neural modelling, architectures and learning algorithms, biologically inspired optimization algorithms, algorithms for autonomous driving, probabilistic models and Bayesian reasoning, intelligent agents and multiagent systems. We hope that the scientific results presented in this book will serve as valuable sources of documentation and inspiration for anyone willing to pursue research in artificial intelligence, machine learning and their widespread applications