Continual Learning in Practice
This paper describes a reference architecture for self-maintaining systems
that can learn continually, as data arrives. In environments where data
evolves, we need architectures that manage Machine Learning (ML) models in
production, adapt to shifting data distributions, cope with outliers, retrain
when necessary, and adapt to new tasks. This represents continual AutoML or
Automatically Adaptive Machine Learning. We describe the challenges and
propose a reference architecture.
Comment: Presented at the NeurIPS 2018 workshop on Continual Learning
https://sites.google.com/view/continual2018/hom
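The "adapt to shifting data distributions, retrain when necessary" loop described in this abstract can be illustrated with a minimal sketch. This is not the paper's reference architecture: the class DriftMonitor, the function monitor_stream, the window size, and the z-score threshold are all hypothetical names and values chosen for illustration, and "retraining" is stood in for by re-baselining the monitor on recent data.

```python
from collections import deque
from statistics import mean, pstdev


class DriftMonitor:
    """Flags drift when the mean of a recent window departs from a reference.

    Hypothetical helper for illustration; not taken from the paper.
    """

    def __init__(self, reference, window=10, threshold=3.0):
        self.window = deque(maxlen=window)
        self.threshold = threshold
        self._set_reference(reference)

    def _set_reference(self, values):
        self.ref_mean = mean(values)
        # Guard against a zero standard deviation on constant data.
        self.ref_std = pstdev(values) or 1e-9

    def observe(self, value):
        """Add one observation; return True if the full window looks drifted."""
        self.window.append(value)
        if len(self.window) < self.window.maxlen:
            return False
        # z-score of the window mean under the reference distribution.
        stderr = self.ref_std / len(self.window) ** 0.5
        z = abs(mean(self.window) - self.ref_mean) / stderr
        return z > self.threshold

    def rebaseline(self):
        """Stand-in for retraining: adopt the recent window as the new reference."""
        self._set_reference(list(self.window))
        self.window.clear()


def monitor_stream(stream, reference, window=10, threshold=3.0):
    """Return the stream indices at which a retrain would be triggered."""
    monitor = DriftMonitor(reference, window, threshold)
    retrain_points = []
    for i, x in enumerate(stream):
        if monitor.observe(x):
            retrain_points.append(i)
            monitor.rebaseline()
    return retrain_points


if __name__ == "__main__":
    stable = [0.0, 1.0] * 10
    shifted = [5.0] * 30
    print(monitor_stream(stable + shifted, reference=[0.0, 1.0] * 20))
```

A production system would likely monitor model error or a richer distribution statistic instead of a single mean, and would retrain an actual model at each trigger point.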
Attribute-Based Prediction of File Properties
We present evidence that attributes that are known to the file system when a file is created, such as its name, permission mode, and owner, are often strongly related to future properties of the file such as its ultimate size, lifespan, and access pattern. More importantly, we show that we can exploit these relationships to automatically generate predictive models for these properties, and that these predictions are sufficiently accurate to enable optimizations.
Engineering and Applied Science
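As a rough illustration of predicting a file property from creation-time attributes, the sketch below uses a simple frequency table with backoff. This technique is swapped in for illustration and is not claimed to be the model the authors generate; the names features and FrequencyPredictor, and the "small"/"large" size classes, are hypothetical.

```python
import os
from collections import Counter, defaultdict


def features(name, mode, owner):
    """Attributes known at creation time: extension, permission mode, owner."""
    ext = os.path.splitext(name)[1].lower()
    return (ext, mode, owner)


class FrequencyPredictor:
    """Predicts the most frequent label seen for matching attributes.

    Backs off from the full attribute tuple, to the extension alone,
    to the overall majority label. Assumes fit() saw at least one record.
    """

    def __init__(self):
        self.by_full = defaultdict(Counter)
        self.by_ext = defaultdict(Counter)
        self.overall = Counter()

    def fit(self, records):
        # Each record is (name, mode, owner, label).
        for name, mode, owner, label in records:
            key = features(name, mode, owner)
            self.by_full[key][label] += 1
            self.by_ext[key[0]][label] += 1
            self.overall[label] += 1
        return self

    def predict(self, name, mode, owner):
        key = features(name, mode, owner)
        for table, k in ((self.by_full, key), (self.by_ext, key[0])):
            if k in table:
                return table[k].most_common(1)[0][0]
        return self.overall.most_common(1)[0][0]


if __name__ == "__main__":
    # Made-up training records, for illustration only.
    history = [
        ("a.log", 0o644, "root", "large"),
        ("b.log", 0o644, "root", "large"),
        ("x.tmp", 0o600, "alice", "small"),
        ("y.tmp", 0o600, "alice", "small"),
        ("z.tmp", 0o600, "bob", "small"),
    ]
    model = FrequencyPredictor().fit(history)
    print(model.predict("new.log", 0o644, "root"))
```

A file system could consult such a predictor at create time, for example to choose an allocation policy, which is the kind of optimization the abstract alludes to.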
Enabling What-if Explorations in Systems (CMU-PDL-07-103)
With a large percentage of total system cost going to system administration tasks, ease of system management remains a difficult and important goal. As a step towards that goal, this dissertation presents a success story on building systems that are self-predicting. Self-predicting systems continuously monitor themselves and provide quantitative answers to what-if questions about hypothetical workload or resource changes. Self-prediction has the potential to simplify administrators' decision making, such as acquisition planning and performance tuning, by reducing the detailed workload and internal system knowledge required.
The primary building blocks of self-prediction are mathematical models that, once built into the system, analyze past behavior and predict future behavior. Because of the traditional disconnect between systems researchers and theoretical researchers, however, there are fundamental difficulties in enabling existing mathematical models to make meaningful predictions in real systems. In part, this dissertation serves as a bridge between research in theory (e.g., queuing theory and statistical theory) and research in systems (e.g., database and storage systems). It identifies ways to build systems to support use of mathematical models and addresses fundamental show-stoppers that keep models from being useful in practice. For example, we explore many opportunities to deeply understand workload-system interactions by having models be first-class system components, rather than developing and deploying them separately from the system, as is traditionally done. As another example, lack of good measurement information in a distributed system can be a show-stopper for models based on queuing analysis. This dissertation introduces a measurement framework that replaces performance counters with end-to-end activity tracing. End-to-end tracing allows contextual information to be propagated with requests so that queuing models can attribute resource demands to the correct workloads. In addition, this dissertation presents a first step towards a robust, hybrid mathematical modeling framework, based on models that reflect domain expertise and models that guide model designers to discover new, unforeseen system behavior once the system is deployed. Such robust models could continuously evaluate their accuracy and adjust their predictions accordingly. Self-evaluation can enable confidence values to be provided with predictions, including identification of situations where no trustworthy predictions can be produced.
Through an analysis of positive and negative lessons learned, in a storage system that we designed from scratch as well as in a legacy commercial database system, this dissertation makes the case that systems can be built to accommodate mathematical models efficiently, but cautions that mathematical models are not a panacea. Models are only as good as the system is; to make predictions more meaningful, systems should be built so that they are inherently more predictable to start with.
Multi-structured redundancy
One-size-fits-all solutions have not worked well in storage systems. This is true in the enterprise, where NoSQL, Map-Reduce and column-stores have added value to traditional database workloads. This is also true outside the enterprise. A recent paper [7] illustrated that even the single-desktop store is a rich mixture of file systems, databases and key-value stores. Yet, in research, one-size-fits-all solutions are always tempting and point optimizations emerge, with the current theme du jour being key-value stores [8]. Workloads naturally change their requirements over time (e.g., from update-intensive to query-intensive). This paper proposes research around a multi-structured storage architecture. Such an architecture is composed of many lightweight data structures such as B-trees, key-value stores, graph stores and chunk stores. The call for modular storage and systems is not dissimilar to the Exokernel [4] or Anvil [10] approaches. The key difference this paper argues for is that we want these data structures to co-exist in the same system. The system should then automatically use the right one at the right workload phase. To enable this technically, we propose to leverage the existing N-way redundancy in the data center and have each of N replicas embody a different data structure.
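The idea of having each replica embody a different data structure can be sketched in a few lines. This is a single-process toy under stated assumptions, not the paper's design: the class names HashReplica, SortedReplica and MultiStructuredStore are hypothetical, only two replicas are shown, and requests are routed by operation type instead of by an automatically detected workload phase.

```python
import bisect


class HashReplica:
    """Replica organized as a hash table: cheap point lookups."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


class SortedReplica:
    """Replica that keeps keys sorted: cheap range scans."""

    def __init__(self):
        self.keys = []
        self.vals = {}

    def put(self, key, value):
        if key not in self.vals:
            bisect.insort(self.keys, key)
        self.vals[key] = value

    def scan(self, lo, hi):
        """Return (key, value) pairs with lo <= key <= hi, in key order."""
        i = bisect.bisect_left(self.keys, lo)
        j = bisect.bisect_right(self.keys, hi)
        return [(k, self.vals[k]) for k in self.keys[i:j]]


class MultiStructuredStore:
    """Writes go to every replica; each read goes to the best-suited one."""

    def __init__(self):
        self.point = HashReplica()
        self.ordered = SortedReplica()

    def put(self, key, value):
        # Redundancy: the same logical data lands in every structure.
        for replica in (self.point, self.ordered):
            replica.put(key, value)

    def get(self, key):
        return self.point.get(key)

    def scan(self, lo, hi):
        return self.ordered.scan(lo, hi)


if __name__ == "__main__":
    store = MultiStructuredStore()
    for k, v in [("c", 3), ("a", 1), ("b", 2)]:
        store.put(k, v)
    print(store.get("b"))
    print(store.scan("a", "b"))
```

In a real deployment the replicas would live on different machines, so the redundancy already paid for fault tolerance would double as a choice of physical layouts.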
- …