
    LeanContext: Cost-Efficient Domain-Specific Question Answering Using LLMs

    Question-answering (QA) is a significant application of Large Language Models (LLMs), shaping chatbot capabilities across healthcare, education, and customer service. However, widespread LLM integration presents a challenge for small businesses due to the high expense of LLM API usage. Costs rise rapidly when domain-specific data (context) is sent alongside queries to obtain accurate domain-specific LLM responses. One option is to reduce the context by summarizing it with an LLM, but this can filter out information that is necessary to answer some domain-specific queries. In this paper, we shift from human-oriented summaries to AI-model-friendly summaries. Our approach, LeanContext, efficiently extracts the k key sentences from the context that are most closely aligned with the query. The choice of k is neither static nor random; we introduce a reinforcement learning technique that dynamically determines k based on the query and context. The remaining, less important sentences are reduced using a free open-source text reduction method. We evaluate LeanContext against several recent query-aware and query-unaware context reduction approaches on prominent datasets (arXiv papers and BBC news articles). Despite cost reductions of 37.29% to 67.81%, LeanContext's ROUGE-1 score decreases by only 1.41% to 2.65% compared to a baseline that retains the entire context (no summarization). Additionally, if free pretrained LLM-based summarizers are used to reduce the context (into human-consumable summaries), LeanContext can further modify the reduced context to enhance accuracy (ROUGE-1 score) by 13.22% to 24.61%. Comment: The paper is under review
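
    The core retrieval step lends itself to a compact illustration. Below is a minimal sketch, in the spirit of LeanContext, of keeping the k sentences most similar to a query; the hashing embedder, the function names, and the fixed k are assumptions for illustration, and the paper's reinforcement-learning choice of k is not reproduced.

```python
# Illustrative query-aware sentence selection; not the paper's pipeline.
import numpy as np

def embed(texts):
    # Stand-in embedding: hashed character trigrams. A real system would
    # call a sentence-embedding model here instead.
    vecs = np.zeros((len(texts), 256))
    for i, t in enumerate(texts):
        for j in range(len(t) - 2):
            vecs[i, hash(t[j:j + 3]) % 256] += 1.0
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.maximum(norms, 1e-9)

def select_top_k(query, sentences, k=3):
    """Keep the k sentences most similar to the query, in original order."""
    q = embed([query])[0]
    s = embed(sentences)
    scores = s @ q                      # cosine similarity (rows normalized)
    keep = sorted(np.argsort(scores)[-k:])
    return [sentences[i] for i in keep]

context = ["Flash wear depends on erase cycles.",
           "The cafeteria opens at nine.",
           "Write amplification increases erase counts.",
           "LLM APIs are billed per token."]
print(select_top_k("Why does flash memory wear out?", context, k=2))
```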

    Differentiable JPEG: The Devil is in the Details

    JPEG remains one of the most widespread lossy image coding methods. However, the non-differentiable nature of JPEG restricts its application in deep learning pipelines. Several differentiable approximations of JPEG have recently been proposed to address this issue. This paper conducts a comprehensive review of existing diff. JPEG approaches and identifies critical details that have been missed by previous methods. To this end, we propose a novel diff. JPEG approach, overcoming previous limitations. Our approach is differentiable w.r.t. the input image, the JPEG quality, the quantization tables, and the color conversion parameters. We evaluate the forward and backward performance of our diff. JPEG approach against existing methods. Additionally, extensive ablations are performed to evaluate crucial design choices. Our proposed diff. JPEG resembles the (non-diff.) reference implementation best, significantly surpassing the recent-best diff. approach by 3.47 dB (PSNR) on average. For strong compression rates, we can even improve PSNR by 9.51 dB. Our diff. JPEG also yields strong adversarial attack results, demonstrating the effective gradient approximation. Our code is available at https://github.com/necla-ml/Diff-JPEG. Comment: Accepted at WACV 2024. Project page: https://christophreich1996.github.io/differentiable_jpeg
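
    The central obstacle here, the rounding step inside JPEG quantization, is commonly handled with a straight-through estimator. The sketch below shows only that generic trick, assuming a PyTorch pipeline; the paper's full approach, which also differentiates through color conversion and the quantization tables, is not reproduced.

```python
# Generic straight-through rounding for JPEG-style quantization;
# an illustration of the common trick, not this paper's method.
import torch

def ste_round(x):
    # Forward: hard round. Backward: identity gradient, because the
    # detached term contributes nothing to d/dx.
    return x + (torch.round(x) - x).detach()

def quantize(dct_coeffs, qtable):
    """Differentiable quantize/dequantize of DCT coefficients."""
    return ste_round(dct_coeffs / qtable) * qtable

coeffs = (torch.randn(8, 8) * 50).requires_grad_()
qtable = torch.full((8, 8), 16.0)
out = quantize(coeffs, qtable)
out.sum().backward()
print(coeffs.grad)  # all ones: gradients flow through the rounding
```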

    Deep Video Codec Control

    Lossy video compression is commonly used when transmitting and storing video data. Unified video codecs (e.g., H.264 or H.265) remain the de facto standard, despite the availability of advanced (neural) compression approaches. Transmitting videos in the face of dynamic network bandwidth conditions requires video codecs to adapt to vastly different compression strengths. Rate control modules augment the codec's compression such that bandwidth constraints are satisfied and video distortion is minimized. However, while both standard video codecs and their rate control modules are developed to minimize video distortion w.r.t. human quality assessment, preserving the downstream performance of deep vision models is not considered. In this paper, we present the first end-to-end learnable deep video codec control that considers both bandwidth constraints and downstream vision performance, while not breaking existing standardization. We demonstrate for two common vision tasks (semantic segmentation and optical flow estimation), and on two different datasets, that our deep codec control better preserves downstream performance than 2-pass average bit rate control while meeting dynamic bandwidth constraints and adhering to standardizations. Comment: 22 pages, 26 figures, 6 tables
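
    As a rough sketch of the training setup such a system implies, the toy loop below couples a control network, a differentiable codec proxy, a vision model, and a bitrate penalty. Every module here is an illustrative stand-in, not the paper's architecture.

```python
# Toy end-to-end objective: task loss plus bandwidth-constraint penalty.
# All modules are placeholders assumed for illustration.
import torch
import torch.nn as nn

control_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1), nn.Sigmoid())
codec_proxy = lambda frame, q: frame * q        # stand-in differentiable codec
task_model = nn.Conv2d(3, 1, 3, padding=1)      # stand-in vision model
bitrate_of = lambda q: 10.0 * q                 # stand-in rate model
budget = 4.0                                    # dynamic bandwidth budget

frame = torch.rand(1, 3, 64, 64)
target = torch.rand(1, 1, 64, 64)

quality = control_net(frame)                    # predicted codec parameter
decoded = codec_proxy(frame, quality.view(1, 1, 1, 1))
task_loss = nn.functional.mse_loss(task_model(decoded), target)
rate_penalty = torch.relu(bitrate_of(quality) - budget).mean()
loss = task_loss + rate_penalty
loss.backward()                                 # gradients reach control_net
```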

    A survey and classification of storage deduplication systems

    The automatic elimination of duplicate data in a storage system, commonly known as deduplication, is increasingly accepted as an effective technique to reduce storage costs. Thus, it has been applied to different storage types, including archives and backups, primary storage, within solid state disks, and even to random access memory. Although the general approach to deduplication is shared by all storage types, each poses specific challenges and leads to different trade-offs and solutions. This diversity is often misunderstood, thus underestimating the relevance of new research and development. The first contribution of this paper is a classification of deduplication systems according to six criteria that correspond to key design decisions: granularity, locality, timing, indexing, technique, and scope. This classification identifies and describes the different approaches used for each of them. As a second contribution, we describe which combinations of these design decisions have been proposed and found more useful for challenges in each storage type. Finally, outstanding research challenges and unexplored design points are identified and discussed. This work is funded by the European Regional Development Fund (ERDF) through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the Fundação para a Ciência e a Tecnologia (FCT; Portuguese Foundation for Science and Technology) within project RED (FCOMP-01-0124-FEDER-010156) and by FCT PhD scholarship SFRH-BD-71372-2010.
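
    To make the design axes concrete, the toy example below implements the simplest point in that design space: fixed-size chunking (granularity), exact full-chunk hashing (technique), and an in-memory hash index (indexing). All names are illustrative; real systems often prefer content-defined chunking.

```python
# Toy deduplicating store: keep each unique chunk once, plus a recipe
# of chunk ids from which the original data can be rebuilt.
import hashlib

def dedup_store(data, chunk_size=4096, index=None, store=None):
    index = {} if index is None else index   # chunk hash -> chunk id
    store = [] if store is None else store   # unique chunk payloads
    recipe = []                              # chunk ids to reconstruct data
    for off in range(0, len(data), chunk_size):
        chunk = data[off:off + chunk_size]
        h = hashlib.sha256(chunk).hexdigest()
        if h not in index:                   # new chunk: store it
            index[h] = len(store)
            store.append(chunk)
        recipe.append(index[h])
    return recipe, index, store

data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096
recipe, index, store = dedup_store(data)
print(len(store), "unique chunks for", len(recipe), "logical chunks")  # 2 for 4
assert b"".join(store[i] for i in recipe) == data
```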

    A Dynamic Switching Flash Translation Layer Based on Page-Level Mapping


    CFTL: A Convertible Flash Translation Layer with Consideration of Data Access Patterns

    NAND flash memory-based storage devices are increasingly adopted as one of the main alternatives to magnetic disk drives. The flash translation layer (FTL) is a software/hardware interface inside NAND flash memory that allows existing disk-based applications to use flash storage without any significant modifications. Since the FTL has a critical impact on the performance of NAND flash-based devices, a variety of FTL schemes have been proposed to improve their performance. However, existing FTLs perform well for either read-intensive or write-intensive workloads, but not for both, due to their static address mapping schemes. To overcome this limitation, we propose a novel FTL addressing scheme named the Convertible Flash Translation Layer (CFTL for short). CFTL adapts to data access patterns, dynamically switching the mapping of a data block to either a read-optimized or a write-optimized mapping scheme in order to fully exploit the benefits of both. By judiciously taking advantage of both schemes, CFTL resolves the intrinsic problems of existing FTLs. In addition to this convertible scheme, we propose an efficient caching strategy that considerably improves CFTL's performance further with only a simple hint. Consequently, the convertible feature and the caching strategy together empower CFTL to achieve good read performance as well as good write performance. Our experimental evaluation with a variety of realistic workloads demonstrates that the proposed CFTL scheme outperforms other FTL schemes.
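
    A minimal sketch of the adaptive idea follows; the thresholds, block geometry, and structures are assumptions for illustration, not the paper's actual algorithm.

```python
# Track per-block access patterns and convert the mapping mode accordingly.
from collections import defaultdict

class ConvertibleFTL:
    SWITCH_THRESHOLD = 0.7   # write fraction that triggers page mapping (assumed)

    def __init__(self):
        self.mode = defaultdict(lambda: "block")   # logical block -> mapping mode
        self.reads = defaultdict(int)
        self.writes = defaultdict(int)

    def access(self, lba, is_write):
        block = lba // 64                          # 64 pages per block (assumed)
        if is_write:
            self.writes[block] += 1
        else:
            self.reads[block] += 1
        write_ratio = self.writes[block] / (self.reads[block] + self.writes[block])
        # Convert the mapping mode when the observed pattern shifts.
        if write_ratio > self.SWITCH_THRESHOLD:
            self.mode[block] = "page"    # write-intensive: fine-grained mapping
        elif write_ratio < 1 - self.SWITCH_THRESHOLD:
            self.mode[block] = "block"   # read-intensive: compact block mapping

ftl = ConvertibleFTL()
for _ in range(10):
    ftl.access(lba=5, is_write=True)
print(ftl.mode[0])   # 'page' after a write-heavy pattern
```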

    A Forest-structured Bloom Filter with Flash Memory

    A Bloom Filter (BF) is a probabilistic data structure that compactly represents/records a set of elements (keys). It is widely used to efficiently identify whether a key has been seen before with a minimal amount of recording space. BFs are heavily used in chunking-based data deduplication. Traditionally, a BF is implemented as an in-RAM data structure; hence its size is limited by the available RAM space on the machine. For certain applications like data deduplication that require a BF larger than the available RAM space, it becomes necessary to store the BF on a secondary storage device. Since BF operations are inherently random in nature, magnetic disks provide poor performance for the random read and write operations involved and are not a good fit for storing a large BF. Flash memory-based Solid State Drives (SSDs) are an emerging class of storage device with superior performance that can potentially replace disks as the preferred secondary storage. However, several special characteristics of flash memory make designing a flash memory-based BF very challenging. In this paper, our goal is to design an efficient flash memory-based BF that is fully aware of these physical characteristics. To this end, we propose a Forest-structured BF design (FBF). FBF uses a combination of RAM and flash memory: the BF is stored on flash, while RAM helps mitigate the impact of flash memory's slow write performance. In addition, the in-flash BF is organized in a forest-like structure to improve lookup performance. Our experimental results show that the FBF design achieves 2x faster processing speed with 50% fewer flash write operations compared with existing flash memory-based BF designs.
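
    For reference, a plain in-RAM Bloom filter is sketched below; FBF's contribution lies in laying such a structure out on flash as a forest with RAM write buffering, which this sketch does not attempt. Sizes and names are illustrative.

```python
# Base Bloom filter data structure that designs like FBF build on.
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.bits = bytearray(num_bits // 8)
        self.m = num_bits
        self.k = num_hashes

    def _positions(self, key):
        # Derive k bit positions from one SHA-256 digest of the key.
        digest = hashlib.sha256(key.encode()).digest()
        for i in range(self.k):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.m

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False positives are possible; false negatives are not.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

bf = BloomFilter()
bf.add("chunk-hash-deadbeef")
print(bf.might_contain("chunk-hash-deadbeef"))  # True
print(bf.might_contain("never-inserted"))       # almost certainly False
```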

    Large Block CLOCK (LB-CLOCK): A Write Caching Algorithm for Solid State Disks

    Solid State Disks (SSDs) using NAND flash memory are increasingly being adopted in the high-end servers of datacenters to improve the performance of I/O-intensive applications. Compared to traditional enterprise-class hard disks, SSDs provide faster read performance, lower cooling cost, and higher power efficiency. However, the write performance of a flash-based SSD can be up to an order of magnitude slower than its read performance. Furthermore, frequent write operations degrade the lifetime of flash memory. A nonvolatile cache can greatly help to solve these problems. Although a RAM cache is relatively high in cost, it has successfully eliminated the performance gap between fast CPUs and slow magnetic disks. Similarly, a nonvolatile cache in an SSD can alleviate the disparity between flash memory's read and write performance. A small write cache that reduces the number of flash block erase operations can lead to substantial performance gains for write-intensive applications and can extend the overall lifetime of flash-based SSDs. This paper presents a novel write caching algorithm, the Large Block CLOCK (LB-CLOCK) algorithm, which considers 'recency' and 'block space utilization' metrics to make cache management decisions. LB-CLOCK dynamically varies the priority between these two metrics to adapt to changes in workload characteristics. Our simulation-based experimental results show that LB-CLOCK outperforms the best known existing flash caching algorithms for a wide range of workloads.
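
    An illustrative CLOCK-style write cache in this spirit appears below: it prefers evicting blocks that are both cold (reference bit clear) and fully utilized, so eviction wastes little flash space. The paper's dynamic priority adjustment between the two metrics is omitted, and all structures are assumptions.

```python
# CLOCK-style write cache sketch combining recency and space utilization.
class ClockWriteCache:
    def __init__(self, capacity, pages_per_block=4):
        self.capacity = capacity                 # max cached blocks
        self.pages_per_block = pages_per_block
        self.blocks = {}                         # block id -> {"pages": set, "ref": bool}
        self.order = []                          # clock ring of block ids
        self.hand = 0

    def write(self, lba):
        block = lba // self.pages_per_block
        if block not in self.blocks:
            if len(self.blocks) >= self.capacity:
                self._evict()
            self.blocks[block] = {"pages": set(), "ref": False}
            self.order.append(block)
        self.blocks[block]["pages"].add(lba % self.pages_per_block)
        self.blocks[block]["ref"] = True         # recency bit

    def _evict(self):
        # Sweep the ring, preferring a cold block with full space utilization.
        for _ in range(2 * len(self.order)):
            block = self.order[self.hand % len(self.order)]
            entry = self.blocks[block]
            full = len(entry["pages"]) == self.pages_per_block
            if not entry["ref"] and full:
                self.order.remove(block)
                del self.blocks[block]
                return
            entry["ref"] = False                 # second chance
            self.hand += 1
        # Fallback: no cold, full block exists; evict whatever the hand points at.
        block = self.order[self.hand % len(self.order)]
        self.order.remove(block)
        del self.blocks[block]

cache = ClockWriteCache(capacity=2)
for lba in [0, 1, 2, 3, 4, 8]:   # fills block 0, then touches blocks 1 and 2
    cache.write(lba)
print(sorted(cache.blocks))      # [1, 2]: the full, cold block 0 was evicted
```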