LeanContext: Cost-Efficient Domain-Specific Question Answering Using LLMs
Question-answering (QA) is a significant application of Large Language Models
(LLMs), shaping chatbot capabilities across healthcare, education, and customer
service. However, widespread LLM integration presents a challenge for small
businesses due to the high expenses of LLM API usage. Costs rise rapidly when
domain-specific data (context) is used alongside queries for accurate
domain-specific LLM responses. One option is to reduce the context by summarizing it with an LLM. However, summarization can also filter out useful information that is necessary to answer some domain-specific queries. In this
paper, we shift from human-oriented summarizers to AI model-friendly summaries.
Our approach, LeanContext, efficiently extracts the key sentences from the context that are most closely aligned with the query. The number of extracted sentences is neither static nor random; we introduce a reinforcement learning technique that dynamically determines it based on the query and context. The remaining, less important sentences are reduced using a free, open-source text reduction method.
We evaluate LeanContext against several recent query-aware and query-unaware
context reduction approaches on prominent datasets (arXiv papers and BBC news articles). Despite substantial cost reductions, LeanContext's ROUGE-1 score decreases only marginally compared to a baseline that retains the entire context (no summarization). Additionally, if free pretrained LLM-based summarizers are used to reduce the context (into human-consumable summaries), LeanContext can further modify the reduced context to enhance the accuracy (ROUGE-1 score).
Comment: The paper is under review
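A minimal sketch of the query-aware sentence extraction idea described above, assuming a sentence-embedding library such as sentence-transformers; the model name, the fixed top-k, and the function name are illustrative, and the paper's reinforcement-learning choice of the number of sentences is not reproduced here:

    # Query-aware context reduction sketch (illustrative, not the paper's exact method).
    from sentence_transformers import SentenceTransformer, util

    def reduce_context(query, sentences, k=5):
        model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
        query_emb = model.encode(query, convert_to_tensor=True)
        sent_embs = model.encode(sentences, convert_to_tensor=True)
        scores = util.cos_sim(query_emb, sent_embs)[0]   # similarity of each sentence to the query
        top_idx = scores.topk(min(k, len(sentences))).indices.tolist()
        top_idx.sort()                                   # preserve the original sentence order
        return " ".join(sentences[i] for i in top_idx)

In LeanContext, the remaining, less important sentences would additionally be compressed by an open-source reducer rather than dropped outright.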
Differentiable JPEG: The Devil is in the Details
JPEG remains one of the most widespread lossy image coding methods. However,
the non-differentiable nature of JPEG restricts the application in deep
learning pipelines. Several differentiable approximations of JPEG have recently
been proposed to address this issue. This paper conducts a comprehensive review
of existing diff. JPEG approaches and identifies critical details that have
been missed by previous methods. To this end, we propose a novel diff. JPEG
approach, overcoming previous limitations. Our approach is differentiable
w.r.t. the input image, the JPEG quality, the quantization tables, and the
color conversion parameters. We evaluate the forward and backward performance
of our diff. JPEG approach against existing methods. Additionally, extensive
ablations are performed to evaluate crucial design choices. Our proposed diff.
JPEG resembles the (non-diff.) reference implementation most closely, significantly surpassing the previous best diff. approach in average PSNR; for strong compression rates, the PSNR improvement is even larger. Our diff. JPEG also yields strong adversarial attack results, demonstrating the effectiveness of the gradient approximation. Our code is available at https://github.com/necla-ml/Diff-JPEG.
Comment: Accepted at WACV 2024. Project page:
https://christophreich1996.github.io/differentiable_jpeg/ WACV paper:
https://openaccess.thecvf.com/content/WACV2024/html/Reich_Differentiable_JPEG_The_Devil_Is_in_the_Details_WACV_2024_paper.htm
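One detail that makes JPEG hard to differentiate is the hard rounding in quantization; a common generic workaround (not necessarily the formulation used in this paper) is a straight-through estimator, sketched below in PyTorch with illustrative function names:

    import torch

    def ste_round(x):
        # Straight-through estimator: round in the forward pass, identity gradient in the backward pass.
        return x + (torch.round(x) - x).detach()

    def quantize(dct_coeffs, q_table):
        # Differentiable stand-in for JPEG quantization of DCT coefficients.
        return ste_round(dct_coeffs / q_table) * q_table

Because the gradient passes through the rounding unchanged, the surrounding pipeline (color conversion, DCT, quantization tables) can be optimized end to end.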
Deep Video Codec Control
Lossy video compression is commonly used when transmitting and storing video
data. Unified video codecs (e.g., H.264 or H.265) remain the de facto standard,
despite the availability of advanced (neural) compression approaches.
Transmitting videos in the face of dynamic network bandwidth conditions
requires video codecs to adapt to vastly different compression strengths. Rate
control modules augment the codec's compression such that bandwidth constraints
are satisfied and video distortion is minimized. While both standard video codecs and their rate control modules are developed to minimize video distortion
w.r.t. human quality assessment, preserving the downstream performance of deep
vision models is not considered. In this paper, we present the first end-to-end
learnable deep video codec control considering both bandwidth constraints and
downstream vision performance, while not breaking existing standardization. We
demonstrate for two common vision tasks (semantic segmentation and optical flow
estimation) and on two different datasets that our deep codec control better
preserves downstream performance than using 2-pass average bit rate control
while meeting dynamic bandwidth constraints and adhering to existing standards.
Comment: 22 pages, 26 figures, 6 tables
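As a rough illustration of the kind of objective such a codec control could optimize, the sketch below combines a downstream task loss with a penalty on bitrate exceeding the bandwidth budget; the penalty form, weighting, and names are assumptions, not the paper's actual loss:

    import torch

    def codec_control_loss(task_loss, predicted_bitrate, bandwidth_limit, penalty_weight=1.0):
        # Penalize only the bitrate that exceeds the current bandwidth budget,
        # while preserving downstream vision performance via the task loss.
        overshoot = torch.clamp(predicted_bitrate - bandwidth_limit, min=0.0)
        return task_loss + penalty_weight * overshoot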
A survey and classification of storage deduplication systems
The automatic elimination of duplicate data in a storage system, commonly known as deduplication, is increasingly accepted as an effective technique to reduce storage costs. Thus, it has been applied to different storage types, including archives and backups, primary storage, within solid state disks, and even to random access memory. Although the general approach to deduplication is shared by all storage types, each poses specific challenges and leads to different trade-offs and solutions. This diversity is often misunderstood, leading to an underestimation of the relevance of new research and development.
The first contribution of this paper is a classification of deduplication systems according to six criteria that correspond to key design decisions: granularity, locality, timing, indexing, technique, and scope.
This classification identifies and describes the different approaches used for each of them. As a second contribution, we describe which combinations of these design decisions have been proposed and found more useful for challenges in each storage type. Finally, outstanding research challenges and unexplored design points are identified and discussed.
This work is funded by the European Regional Development Fund (ERDF) through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the Fundação para a Ciência e a Tecnologia (FCT; Portuguese Foundation for Science and Technology) within project RED FCOMP-01-0124-FEDER-010156 and by FCT PhD scholarship SFRH-BD-71372-2010.
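To make the design axes (granularity, indexing, technique) concrete, the following toy sketch implements fixed-size chunking with an exact-match, in-memory fingerprint index; this is only one point in the design space the survey classifies, and all names are illustrative:

    import hashlib

    class DedupStore:
        # Toy fixed-size-chunk, exact-match deduplication index (in-memory, illustrative only).
        def __init__(self, chunk_size=4096):
            self.chunk_size = chunk_size
            self.chunks = {}  # fingerprint -> chunk data

        def write(self, data):
            fingerprints = []
            for i in range(0, len(data), self.chunk_size):
                chunk = data[i:i + self.chunk_size]
                fp = hashlib.sha256(chunk).hexdigest()
                self.chunks.setdefault(fp, chunk)  # store each unique chunk only once
                fingerprints.append(fp)
            return fingerprints  # the recipe needed to reconstruct the data

        def read(self, fingerprints):
            return b"".join(self.chunks[fp] for fp in fingerprints)

Real systems vary every one of these choices: content-defined instead of fixed-size chunks, delta encoding instead of exact matches, on-disk or distributed indexes, and inline versus offline timing.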
Integrating flash memory into the storage hierarchy.
University of Minnesota Ph.D. dissertation. October 2010. Major: Electrical engineering. Advisors: David J. Lilja, Mohamed F. Mokbel. 1 computer file (PDF); xii, 158 pages.
With the continually accelerating growth of data, the performance of storage systems is increasingly becoming a bottleneck to improving overall system performance. Many applications, such as transaction processing systems, weather forecasting, large-scale scientific simulations, and on-demand services, are limited by the performance of the underlying storage systems. The limited bandwidth, high power consumption, and low reliability of widely used magnetic disk-based storage systems impose a significant hurdle in scaling these applications to satisfy the increasing growth of data. These limitations and bottlenecks are especially acute for large-scale high-performance computing systems.
Flash memory is an emerging storage technology that shows tremendous promise to compensate for the limitations of current storage devices. Flash memory's relatively high cost, however, combined with its slow write performance and limited number of erase cycles, requires new and innovative solutions to integrate flash memory-based storage devices into a high-performance storage hierarchy. The first part of this thesis develops new algorithms, data structures, and storage architectures to address the fundamental issues that limit the use of flash-based storage devices in high-performance computing systems. The second part of the thesis demonstrates two innovative applications of flash-based storage.
In particular, the first part addresses a set of fundamental issues, including new write-caching techniques, a sampling-based, RAM-space-efficient garbage collection scheme, and writing strategies for improving the performance of flash memory for write-intensive applications. This effort will improve the fundamental understanding of flash memory, remedy the major limitations of using flash-based storage devices, and extend the capability of flash memory to support many critical applications. The second part, in turn, demonstrates how flash memory can be used to speed up server applications, including a Bloom filter and an online deduplication system. This effort will use flash-aware data structures and algorithms and will show innovative uses of flash-based storage.
Debnath, Biplob Kumar. (2010). Integrating flash memory into the storage hierarchy. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/117595
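As context for the server-side applications mentioned above, a plain in-memory Bloom filter looks like the sketch below; the dissertation's contribution concerns flash-resident designs, which are considerably more involved, so this is only a generic baseline with illustrative parameters:

    import hashlib

    class BloomFilter:
        # Generic in-memory Bloom filter over byte-string keys (baseline sketch, not the flash-aware design).
        def __init__(self, num_bits=1 << 20, num_hashes=4):
            self.num_bits = num_bits
            self.num_hashes = num_hashes
            self.bits = bytearray(num_bits // 8 + 1)

        def _positions(self, key):
            for i in range(self.num_hashes):
                digest = hashlib.sha256(bytes([i]) + key).digest()
                yield int.from_bytes(digest[:8], "big") % self.num_bits

        def add(self, key):
            for pos in self._positions(key):
                self.bits[pos // 8] |= 1 << (pos % 8)

        def might_contain(self, key):
            return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))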
CFTL: A Convertible Flash Translation Layer with Consideration of Data Access Patterns
NAND flash memory-based storage devices are increasingly adopted as one of the main alternatives to magnetic disk drives. The flash translation layer (FTL) is a software/hardware interface inside NAND flash memory that allows existing disk-based applications to use it without any significant modifications. Since the FTL has a critical impact on the performance of NAND flash-based devices, a variety of FTL schemes have been proposed to improve their performance. However, existing FTLs perform well for either a read-intensive workload or a write-intensive workload, but not for both, due to their static address mapping schemes. To overcome this limitation, in this paper we propose a novel FTL addressing scheme named Convertible Flash Translation Layer (CFTL for short). CFTL is adaptive to data access patterns, so it can dynamically switch the mapping of a data block to either a read-optimized or a write-optimized mapping scheme in order to fully exploit the benefits of both. By judiciously taking advantage of both schemes, CFTL resolves the intrinsic problems of existing FTLs. In addition to this convertible scheme, we propose an efficient caching strategy that considerably improves CFTL's performance further with only a simple hint. Consequently, both the convertible feature and the caching strategy empower CFTL to achieve good read performance as well as good write performance. Our experimental evaluation with a variety of realistic workloads demonstrates that the proposed CFTL scheme outperforms other FTL schemes.
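A highly simplified sketch of the convertible idea is shown below: per-block read/write counters decide whether a logical block should be served by a read-optimized or a write-optimized mapping. The threshold, the counters, and the class are illustrative assumptions rather than CFTL's actual algorithm:

    from collections import defaultdict

    class ConvertibleMapper:
        # Toy access-pattern tracker in the spirit of a convertible FTL (threshold illustrative).
        def __init__(self, switch_threshold=2.0):
            self.reads = defaultdict(int)
            self.writes = defaultdict(int)
            self.mode = defaultdict(lambda: "read_optimized")  # mapping mode per logical block
            self.switch_threshold = switch_threshold

        def record_access(self, block, is_write):
            if is_write:
                self.writes[block] += 1
            else:
                self.reads[block] += 1
            # Convert the block's mapping when one access type clearly dominates.
            if self.writes[block] > self.switch_threshold * max(self.reads[block], 1):
                self.mode[block] = "write_optimized"
            elif self.reads[block] > self.switch_threshold * max(self.writes[block], 1):
                self.mode[block] = "read_optimized"

CFTL itself additionally pairs such switching with a caching strategy for the mapping information, which this sketch omits.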
