GAIA: Delving into Gradient-based Attribution Abnormality for Out-of-distribution Detection
Detecting out-of-distribution (OOD) examples is crucial to guarantee the
reliability and safety of deep neural networks in real-world settings. In this
paper, we offer an innovative perspective on quantifying the disparities
between in-distribution (ID) and OOD data -- analyzing the uncertainty that
arises when models attempt to explain their predictive decisions. This
perspective is motivated by our observation that gradient-based attribution
methods encounter challenges in assigning feature importance to OOD data,
thereby yielding divergent explanation patterns. Consequently, we investigate
how attribution gradients lead to uncertain explanation outcomes and introduce
two forms of abnormalities for OOD detection: the zero-deflation abnormality
and the channel-wise average abnormality. We then propose GAIA, a simple and
effective approach that incorporates Gradient Abnormality Inspection and
Aggregation. The effectiveness of GAIA is validated on both commonly utilized
(CIFAR) and large-scale (ImageNet-1k) benchmarks. Specifically, GAIA reduces
the average FPR95 by 23.10% on CIFAR10 and by 45.41% on CIFAR100 compared to
advanced post-hoc methods.
Comment: Accepted by NeurIPS 2023
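The two abnormalities lend themselves to a compact sketch. The scores below are illustrative simplifications, not the paper's exact statistics: a zero-deflation score as the fraction of near-zero attribution values, and a channel-wise score as the variance of per-channel attribution means. All names, shapes, and the epsilon threshold are assumptions.

```python
import numpy as np

def zero_deflation_score(attributions: np.ndarray, eps: float = 1e-12) -> float:
    """Fraction of (near-)zero attribution values.

    In GAIA's spirit, OOD inputs tend to receive sparser gradient-based
    attributions, so a higher fraction of zeros signals abnormality.
    Illustrative simplification, not the paper's exact statistic.
    """
    return float(np.mean(np.abs(attributions) < eps))

def channel_average_score(attributions: np.ndarray) -> float:
    """Variance of channel-wise mean attributions (shape: C x H x W).

    A rough stand-in for the channel-wise average abnormality: widely
    divergent per-channel explanation magnitudes suggest the model cannot
    attribute its prediction coherently.
    """
    channel_means = attributions.reshape(attributions.shape[0], -1).mean(axis=1)
    return float(channel_means.var())

# Toy check: a sparse "OOD-like" attribution map scores higher on zero-deflation.
rng = np.random.default_rng(0)
id_attr = rng.normal(size=(8, 4, 4))                 # dense attributions
ood_attr = id_attr * (rng.random((8, 4, 4)) < 0.3)   # ~70% of values zeroed out
assert zero_deflation_score(ood_attr) > zero_deflation_score(id_attr)
```

In GAIA proper these per-input signals are aggregated across layers into a single detection score; the aggregation step is omitted here.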
Shoggoth: Towards Efficient Edge-Cloud Collaborative Real-Time Video Inference via Adaptive Online Learning
This paper proposes Shoggoth, an efficient edge-cloud collaborative
architecture for boosting inference performance on real-time video of changing
scenes. Shoggoth uses online knowledge distillation to improve the accuracy of
models suffering from data drift and offloads the labeling process to the
cloud, alleviating constrained resources of edge devices. At the edge, we
design adaptive training using small batches to adapt models under limited
computing power, and adaptive sampling of training frames for robustness and
reduced bandwidth. Evaluations on a realistic dataset show a 15%-20% model
accuracy improvement over the edge-only strategy and lower network costs
than the cloud-only strategy.
Comment: Accepted by the 60th ACM/IEEE Design Automation Conference (DAC 2023)
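The adaptive sampling of training frames can be sketched roughly as follows; the stride, threshold, and mean-pixel-difference cue are illustrative assumptions, not Shoggoth's actual policy.

```python
import numpy as np

def adaptive_sample(frames, base_stride=8, diff_threshold=10.0):
    """Pick which frames to send to the cloud for labeling.

    Keeps every `base_stride`-th frame, plus any frame whose mean absolute
    pixel difference from the previously kept frame exceeds `diff_threshold`
    (a crude scene-change cue). Hypothetical parameters for illustration.
    """
    kept = [0]
    for i in range(1, len(frames)):
        diff = np.abs(frames[i].astype(float) - frames[kept[-1]].astype(float)).mean()
        if i - kept[-1] >= base_stride or diff > diff_threshold:
            kept.append(i)
    return kept

# A stable scene with one abrupt change at frame 5: the change is always kept.
frames = [np.zeros((4, 4))] * 5 + [np.full((4, 4), 255.0)] + [np.zeros((4, 4))] * 14
assert 5 in adaptive_sample(frames)
```

Sampling densely only around scene changes is what lets the edge stay robust to drift while keeping uplink bandwidth low.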
EdgeMA: Model Adaptation System for Real-Time Video Analytics on Edge Devices
Real-time video analytics on edge devices for changing scenes remains a
difficult task. As edge devices are usually resource-constrained, edge deep
neural networks (DNNs) have fewer weights and shallower architectures than
general DNNs. As a result, they only perform well in limited scenarios and are
sensitive to data drift. In this paper, we introduce EdgeMA, a practical and
efficient video analytics system designed to adapt models to shifts in
real-world video streams over time, addressing the data drift problem. EdgeMA
extracts gray-level co-occurrence matrix (GLCM) based statistical texture
features and uses a Random Forest classifier to detect domain shift. Moreover, we
have incorporated a method of model adaptation based on importance weighting,
specifically designed to update models to cope with the label distribution
shift. Through rigorous evaluation of EdgeMA on a real-world dataset, our
results show that EdgeMA significantly improves inference accuracy.
Comment: Accepted by the 30th International Conference on Neural Information Processing (ICONIP 2023)
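A rough sketch of the drift-detection idea: compute GLCM texture statistics per frame and flag a shift when they move far from a reference. A plain distance threshold stands in for EdgeMA's Random Forest classifier here, so treat everything below as an illustrative approximation with assumed parameters.

```python
import numpy as np

def glcm_features(img: np.ndarray, levels: int = 8) -> np.ndarray:
    """Gray-level co-occurrence matrix (horizontal neighbors) plus two
    classic texture statistics: contrast and energy."""
    q = (img.astype(float) / 256.0 * levels).astype(int).clip(0, levels - 1)
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (q[:, :-1].ravel(), q[:, 1:].ravel()), 1)  # count co-occurrences
    glcm /= glcm.sum()
    i, j = np.indices(glcm.shape)
    contrast = ((i - j) ** 2 * glcm).sum()   # high for noisy texture
    energy = (glcm ** 2).sum()               # high for uniform texture
    return np.array([contrast, energy])

def drift_detected(ref_feat, new_feat, threshold=0.5) -> bool:
    """Threshold on feature distance; a stand-in for the Random Forest."""
    return bool(np.linalg.norm(ref_feat - new_feat) > threshold)

ref = glcm_features(np.full((16, 16), 100))          # smooth reference scene
rng = np.random.default_rng(1)
new = glcm_features(rng.integers(0, 256, (16, 16)))  # noisy new scene
assert drift_detected(ref, new) and not drift_detected(ref, ref)
```

When drift is flagged, EdgeMA's importance-weighting step would then adapt the model to the shifted label distribution.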
BMCloud: Minimizing Repair Bandwidth and Maintenance Cost in Cloud Storage
To protect data in cloud storage, fault tolerance and efficient recovery are very important. Recent studies have developed numerous solutions based on erasure code techniques to solve this problem using functional repair. However, there are two limitations to address. The first is consistency, since the Encoding Matrix (EM) differs among clouds. The other is repair bandwidth, which is a major practical concern. We address these two problems from both theoretical and practical perspectives. We developed BMCloud, a new cloud storage system that aims to reduce repair bandwidth and maintenance cost. The system employs both functional repair and exact repair, inheriting the advantages of both. We propose the JUDGE_STYLE algorithm, which decides whether the system should adopt exact repair or functional repair. We implemented a networked storage system prototype and demonstrated our findings. Compared with existing solutions, BMCloud can be used in engineering to save repair bandwidth and reduce maintenance cost significantly.
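To make the exact-versus-functional trade-off concrete, the sketch below compares repair bandwidths for an (n, k) erasure code, using the minimum-storage-regenerating (MSR) point with d = n-1 helpers for functional repair. The decision criterion is a guess at one input to JUDGE_STYLE; the abstract indicates the real algorithm also weighs consistency and maintenance cost.

```python
def repair_bandwidth(n: int, k: int, block_size: float):
    """Total bytes moved to repair one lost node under an (n, k) code.

    Naive exact repair: read k surviving blocks and re-encode.
    Functional (MSR regenerating) repair: each of d = n-1 helpers sends
    block_size / (n - k) bytes. Hypothetical inputs for illustration.
    """
    exact = k * block_size
    functional = (n - 1) * block_size / (n - k)
    return exact, functional

def judge_style(n: int, k: int, block_size: float) -> str:
    """Pick the repair mode with lower bandwidth -- a guessed stand-in for
    JUDGE_STYLE's criterion, not BMCloud's actual algorithm."""
    exact, functional = repair_bandwidth(n, k, block_size)
    return "functional" if functional < exact else "exact"

# With wide stripes the regenerating repair moves far fewer bytes.
assert judge_style(10, 6, 1.0) == "functional"
```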
Metadata Management for a Distributed Multimedia Storage System
As a result of the rapid growth of multimedia, there has been a huge increase in the amount of information generated and shared by people all over the world, and demand for large-scale multimedia storage systems is growing rapidly. This paper describes the design and implementation of a two-level metadata server for the Distributed Multimedia Storage System (DMSS). The DMSS separates the logical view of the stored data from the physical view. The logical view is managed by the Global Metadata Server (GMS), and the physical view is managed by a component on each storage server called the Local Metadata Server (LMS). With the LMS, each storage server can maintain its own storage resources, metadata, and data, and can offer storage service independently. The DMSS allows application servers to access the storage servers directly and in parallel, providing very high performance. © 2008 IEEE
GreenCHT: A Power-Proportional Replication Scheme for Consistent Hashing Based Key-Value Storage Systems
Distributed key-value storage systems are widely used by many popular networking corporations. Nevertheless, server power consumption has become a growing concern for key-value storage system designers, since the power consumption of servers contributes substantially to a data center's power bills. In this paper, we propose GreenCHT, a power-proportional replication scheme for consistent-hashing-based key-value storage systems. GreenCHT consists of a power-aware replication strategy (a multi-tier replication strategy) and a centralized power control service (a predictive power-mode scheduler). The multi-tier replication provides power proportionality and ensures data availability, reliability, consistency, and fault tolerance for the whole system. The predictive power-mode scheduler predicts workloads and exploits load fluctuation to schedule nodes to be powered up and powered down. GreenCHT is implemented on top of Sheepdog, a distributed key-value system that uses consistent hashing as its underlying distributed hash table. By replaying twelve real workload traces collected from Microsoft, the evaluation shows that GreenCHT provides significant power savings while maintaining acceptable performance: we observed that GreenCHT can reduce power consumption by 35%-61%.
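The placement half of the scheme can be illustrated with a minimal consistent-hashing ring in which an object's replicas are its clockwise hash successors; a multi-tier strategy in GreenCHT's spirit keeps the tier-0 primary powered while later tiers are candidates for spin-down. This is a from-scratch sketch, not Sheepdog's or GreenCHT's code.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hashing ring with r successor replicas."""

    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        self.ring = sorted((self._hash(n), n) for n in nodes)

    @staticmethod
    def _hash(key) -> int:
        return int(hashlib.md5(str(key).encode()).hexdigest(), 16)

    def replica_nodes(self, key):
        """Primary plus (replicas - 1) distinct successors, clockwise."""
        keys = [h for h, _ in self.ring]
        idx = bisect.bisect(keys, self._hash(key)) % len(self.ring)
        out = []
        while len(out) < self.replicas:
            node = self.ring[idx % len(self.ring)][1]
            if node not in out:
                out.append(node)
            idx += 1
        return out

ring = ConsistentHashRing([f"node{i}" for i in range(6)], replicas=3)
primary, *standby = ring.replica_nodes("object-42")
# primary (tier 0) always serves requests; standby tiers may be powered down
# during slack periods, since reads can fall back to the primary copy.
```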
The Research and Design of a High Availability Object Storage System
With the growing scale of computer storage systems, the likelihood of multi-disk failures has increased dramatically. Based on a thorough analysis of the fault-tolerance capabilities of existing storage systems, we propose a new hierarchical, highly reliable, multi-disk fault-tolerant storage system architecture: the High Availability Object Storage System (HAOSS). In the HAOSS, each object has an attribute field for its reliability level, which can be set by the user according to the importance of the data; a higher reliability level corresponds to better data survivability in case of multi-device failure. The HAOSS is composed of two layers. The upper layer achieves high availability by storing multiple replicas of each storage object on a set of storage devices; the individual replicas can service I/O requests in parallel to obtain high performance. The lower layer deploys RAID5, RAID6, or RAID-Blaum coding schemes to tolerate multi-disk failures. In addition, the disk utilization of RAID-Blaum is higher than that of multiple replicas, and it can be further improved by growing the RAID group size. These advantages come at the price of more complicated fault-tolerant coding schemes, which involve a large amount of encoding computation and adversely impact I/O performance, especially write performance. Results from both our internal experiments and third-party independent tests have shown that HAOSS servers tolerate multi-disk failures better than existing similar products. In a 1000 Mb Ethernet interconnection environment, with a request block size of 1024 KB, the sequential read performance of a HAOSS server reaches 104 MB/s, which is very close to the theoretical maximum effective bandwidth of Ethernet networks.
The HAOSS offers a complete storage solution for high-availability applications without the compromises in performance or fault tolerance that today's storage systems require. © 2009 SPIE
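The lower layer's RAID5 mode rests on bytewise XOR parity, which a few lines make concrete: any single lost block is the XOR of the survivors. This is a generic RAID5 illustration, not HAOSS code.

```python
import functools
import operator

def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte blocks."""
    return bytes(functools.reduce(operator.xor, t) for t in zip(*blocks))

def raid5_stripe(data_blocks):
    """Append the parity block to a stripe of data blocks."""
    return data_blocks + [xor_blocks(data_blocks)]

def recover(stripe, lost_index):
    """Rebuild one lost block (data or parity) by XOR-ing all survivors."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)

stripe = raid5_stripe([b"\x01\x02", b"\x0f\x00", b"\xaa\x55"])
assert recover(stripe, 1) == b"\x0f\x00"   # lost data block comes back intact
```

RAID6 and Blaum-style codes extend this with a second, independent parity so two simultaneous disk failures remain recoverable, at the cost of the heavier encoding the abstract mentions.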
S2-RAID: A New RAID Architecture for Fast Data Recovery
As disk volume grows rapidly, with terabyte disks becoming the norm, RAID reconstruction after a failure takes prohibitively long. This paper presents a new RAID architecture, S2-RAID, that allows the disk array to be reconstructed very quickly after a disk failure. The idea is to form skewed sub-RAIDs (S2-RAID) in the RAID structure so that reconstruction can be done in parallel, dramatically speeding up data reconstruction and hence minimizing the chance of data loss. To make such parallel reconstruction conflict-free, each sub-RAID is formed by selecting one logical partition from each disk group, with the group size being a prime number. We have implemented a prototype S2-RAID system in the Linux operating system to evaluate its performance potential. SPC I/O traces and standard benchmarks have been used to measure the performance of S2-RAID against the existing baseline software RAID, MD. Experimental results show that our new S2-RAID speeds up data reconstruction by a factor of 3 to 6 compared to traditional RAID, while showing similar or better production performance than the baseline RAID during online reconstruction.
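The prime-size skew can be made concrete. Suppose partition j of disk d in group g is assigned to sub-RAID (j, (d - j*g) mod p) with p prime; then rebuilding all partitions of a failed disk in parallel reads from distinct disks in every surviving group, since (j1 - j2)(g - g0) is never 0 mod p. This mapping follows the idea described in the abstract, not the paper's exact layout tables.

```python
def sub_raid(g, d, j, p):
    """Sub-RAID id of partition j on disk d of group g (p prime).

    Skewed mapping: sub-RAID (j, s) takes, from group g, the partition on
    disk (s + j*g) mod p; solving for s gives the formula below.
    Illustrative reconstruction of the scheme, not the paper's tables.
    """
    return (j, (d - j * g) % p)

def rebuild_reads(g0, d0, groups, p):
    """Disks read in each surviving group while rebuilding, in parallel,
    all p partitions of failed disk (g0, d0)."""
    reads = {}
    for j in range(p):                       # one rebuild stream per partition
        _, s = sub_raid(g0, d0, j, p)
        for g in range(groups):
            if g != g0:
                reads.setdefault(g, []).append((s + j * g) % p)
    return reads

# Conflict-free: with p = 5 (prime) and 3 groups, every surviving group
# serves the 5 parallel rebuild streams from 5 distinct disks.
reads = rebuild_reads(g0=1, d0=2, groups=3, p=5)
assert all(len(set(v)) == len(v) for v in reads.values())
```

Because no two rebuild streams ever contend for the same source disk, reconstruction speed scales with the number of partitions rather than being serialized behind one disk's bandwidth.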
A New High-Performance, Energy-Efficient Replication Storage System With Reliability Guarantee
In modern replication storage systems, where data carries two or more copies, a primary group of disks is always up to service incoming requests while the other disks are often spun down to sleep states to save energy during slack periods. However, since new writes cannot be immediately synchronized onto all disks, system reliability is degraded. This paper develops PERAID, a new high-performance, energy-efficient replication storage system, which aims to improve both performance and energy efficiency without compromising reliability. It employs a parity software RAID as a virtual write-buffer disk at the front end to absorb new writes. Since parity redundancy effectively supplies additional copies, PERAID guarantees reliability comparable to that of a replication storage system. In addition, PERAID offers better write performance than the replication system by avoiding the classical small-write problem of traditional parity RAID: it buffers many small random writes into a few large writes and writes them to storage in parallel. By evaluating our PERAID prototype using two benchmarks and two real-life traces, we found that PERAID significantly improves write performance and saves more energy than existing solutions such as GRAID and eRAID. © 2012 IEEE
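The small-write fix can be sketched as a front-end buffer that accumulates small writes until a full stripe is ready, then emits the stripe with its parity precomputed, so no read-modify-write cycle is needed. Names and the stripe layout are illustrative assumptions, not PERAID's implementation.

```python
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

class ParityWriteBuffer:
    """Absorb small random writes; flush each full group as one stripe
    (data blocks + XOR parity), written out in parallel in the real system."""

    def __init__(self, stripe_width: int):
        self.stripe_width = stripe_width
        self.pending = []
        self.flushed = []   # stripes written to stable storage

    def write(self, block: bytes):
        self.pending.append(block)
        if len(self.pending) == self.stripe_width:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        stripe = list(self.pending)
        stripe.append(reduce(xor, stripe))   # parity closes the stripe
        self.flushed.append(stripe)
        self.pending.clear()

buf = ParityWriteBuffer(stripe_width=4)
for i in range(8):
    buf.write(bytes([i]) * 2)                # eight small writes...
assert len(buf.flushed) == 2                 # ...become two full-stripe writes
```

The parity block is what lets the buffered copies count toward the redundancy guarantee even while the sleeping replica disks lag behind.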
S2-RAID: Parallel RAID Architecture for Fast Data Recovery
As disk volume grows rapidly, with terabyte disks becoming the norm, the RAID reconstruction process after a failure takes prohibitively long. This paper presents a new RAID architecture, S2-RAID, that allows the disk array to be reconstructed very quickly after a disk failure. The idea is to form skewed sub-arrays in the RAID structure so that reconstruction can be done in parallel, dramatically speeding up the data reconstruction process and hence minimizing the chance of data loss. We analyse the data recovery ability of this architecture and show that it scales well. A prototype S2-RAID system has been built and implemented in the Linux operating system to evaluate its performance potential. Real-world I/O traces, including SPC traces, Microsoft traces, and a collection from a production environment, have been used to measure the performance of S2-RAID against existing baseline software RAID5, Parity Declustering, and RAID50. Experimental results show that S2-RAID speeds up data reconstruction by a factor of 2 to 4 compared to traditional RAID. Meanwhile, S2-RAID maintains production performance comparable to that of the baseline RAID layouts while online RAID reconstruction is in progress. © 1990-2012 IEEE