A solvable class of quadratic 0–1 programming
We show that the minimum of the pseudo-Boolean quadratic function f(x) = x^T Q x + c^T x can be found in linear time when the graph defined by Q is transformable into a combinatorial circuit of AND, OR, NAND, NOR or NOT logic gates. A novel modeling technique is used to transform the graph defined by Q into a logic circuit. A consistent labeling of the signals in the logic circuit from the set {0, 1} corresponds to the global minimum of f, and the labeling is determined through logic simulation of the circuit. Our approach establishes a direct and constructive relationship between pseudo-Boolean functions and logic circuits. In the restricted case when all the elements of Q are nonpositive, the minimum of f can be obtained in polynomial time [15]. We show that the problem of finding the minimum of f, even in the special case when all the elements of Q are positive, is NP-complete.
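To make the objective concrete, here is a minimal brute-force sketch of minimizing f(x) = x^T Q x + c^T x over binary vectors. It is exponential in n and illustrates only the function being minimized, not the paper's linear-time logic-circuit construction; the instance values below are made up.

```python
import itertools

import numpy as np

def min_pseudo_boolean_quadratic(Q, c):
    """Brute-force minimum of f(x) = x^T Q x + c^T x over x in {0, 1}^n.

    Exponential in n: a reference for small instances only, not the
    paper's linear-time logic-circuit method.
    """
    best_x, best_f = None, float("inf")
    for bits in itertools.product((0, 1), repeat=len(c)):
        x = np.array(bits, dtype=float)
        f = x @ Q @ x + c @ x
        if f < best_f:
            best_x, best_f = bits, f
    return best_x, best_f

# Hypothetical 2-variable instance; the negative off-diagonal entry
# rewards setting both variables to 1.
Q = np.array([[0.0, -3.0], [0.0, 0.0]])
c = np.array([1.0, 1.0])
print(min_pseudo_boolean_quadratic(Q, c))  # ((1, 1), -1.0)
```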
LeanContext: Cost-Efficient Domain-Specific Question Answering Using LLMs
Question-answering (QA) is a significant application of Large Language Models
(LLMs), shaping chatbot capabilities across healthcare, education, and customer
service. However, widespread LLM integration presents a challenge for small
businesses due to the high expenses of LLM API usage. Costs rise rapidly when
domain-specific data (context) is used alongside queries for accurate
domain-specific LLM responses. One option is to use an LLM to summarize the
context and thereby reduce its size. However, summarization can also filter out
information that is necessary to answer some domain-specific queries. In this
paper, we shift from human-oriented summarizers to AI model-friendly summaries.
Our approach, LeanContext, efficiently extracts k key sentences from the
context that are closely aligned with the query. The choice of k is neither
static nor random; we introduce a reinforcement learning technique that
dynamically determines k based on the query and context. The rest of the less
important sentences are reduced using a free open source text reduction method.
We evaluate LeanContext against several recent query-aware and query-unaware
context reduction approaches on prominent datasets (arxiv papers and BBC news
articles). Despite substantial cost reductions, LeanContext's ROUGE-1 score
decreases only marginally compared to a baseline that retains the entire
context (no summarization). Additionally, if free pretrained LLM-based
summarizers are used to reduce context (into human-consumable summaries),
LeanContext can further modify the reduced context to enhance the accuracy
(ROUGE-1 score). Comment: The paper is under review.
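A minimal sketch of the query-aware extraction idea, assuming TF-IDF cosine similarity as the sentence scorer and a fixed k: rank context sentences by similarity to the query and keep the k best in their original order. The paper instead learns k with reinforcement learning and applies a separate open-source reducer to the remaining sentences, so treat this purely as an illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def top_k_sentences(sentences, query, k):
    """Keep the k context sentences most similar to the query,
    preserving their original order in the context."""
    vec = TfidfVectorizer().fit(sentences + [query])
    scores = cosine_similarity(vec.transform(sentences),
                               vec.transform([query])).ravel()
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]

context = [
    "LLM APIs are billed per token.",
    "The weather was pleasant in March.",
    "Reducing context length lowers API cost.",
]
print(top_k_sentences(context, "How can LLM API cost be reduced?", k=2))
```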
Differentiable JPEG: The Devil is in the Details
JPEG remains one of the most widespread lossy image coding methods. However,
the non-differentiable nature of JPEG restricts the application in deep
learning pipelines. Several differentiable approximations of JPEG have recently
been proposed to address this issue. This paper conducts a comprehensive review
of existing diff. JPEG approaches and identifies critical details that have
been missed by previous methods. To this end, we propose a novel diff. JPEG
approach, overcoming previous limitations. Our approach is differentiable
w.r.t. the input image, the JPEG quality, the quantization tables, and the
color conversion parameters. We evaluate the forward and backward performance
of our diff. JPEG approach against existing methods. Additionally, extensive
ablations are performed to evaluate crucial design choices. Our proposed diff.
JPEG resembles the (non-diff.) reference implementation best, significantly
surpassing the recent-best diff. approach in average PSNR. For strong
compression rates, the PSNR improvement is even larger. Strong
adversarial attack results are yielded by our diff. JPEG, demonstrating the
effective gradient approximation. Our code is available at
https://github.com/necla-ml/Diff-JPEG. Comment: Accepted at WACV 2024. Project page:
https://christophreich1996.github.io/differentiable_jpeg
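One common way to make JPEG's rounding step differentiable is a straight-through estimator, sketched below in PyTorch. This is a generic trick, not necessarily the specific approximation the paper proposes, and the uniform quantization table is arbitrary.

```python
import torch

def ste_round(x: torch.Tensor) -> torch.Tensor:
    """Straight-through rounding: the forward pass rounds, while the
    backward pass treats rounding as the identity."""
    return x + (torch.round(x) - x).detach()

def diff_quantize(dct_block: torch.Tensor, q_table: torch.Tensor) -> torch.Tensor:
    """JPEG-style quantize/dequantize of an 8x8 DCT block, made
    differentiable via ste_round."""
    return ste_round(dct_block / q_table) * q_table

block = (torch.randn(8, 8) * 50.0).requires_grad_()  # stand-in DCT coefficients
q_table = torch.full((8, 8), 16.0)                   # arbitrary quantization table
diff_quantize(block, q_table).sum().backward()
print(block.grad is not None)  # True: gradients flow through the rounding
```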
Deep Video Codec Control
Lossy video compression is commonly used when transmitting and storing video
data. Unified video codecs (e.g., H.264 or H.265) remain the de facto standard,
despite the availability of advanced (neural) compression approaches.
Transmitting videos in the face of dynamic network bandwidth conditions
requires video codecs to adapt to vastly different compression strengths. Rate
control modules augment the codec's compression such that bandwidth constraints
are satisfied and video distortion is minimized. While both standard video
codecs and their rate control modules are developed to minimize video distortion
w.r.t. human quality assessment, preserving the downstream performance of deep
vision models is not considered. In this paper, we present the first end-to-end
learnable deep video codec control considering both bandwidth constraints and
downstream vision performance, while not breaking existing standardization. We
demonstrate for two common vision tasks (semantic segmentation and optical flow
estimation) and on two different datasets that our deep codec control better
preserves downstream performance than using 2-pass average bit rate control
while meeting dynamic bandwidth constraints and adhering to standardizations. Comment: 22 pages, 26 figures, 6 tables.
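For context, the 2-pass average bit rate baseline mentioned above corresponds to a standard two-pass encode. The sketch below drives ffmpeg with libx264 from Python; the file names are hypothetical and the null-output path is Unix-style.

```python
import subprocess

def two_pass_abr(src: str, dst: str, bitrate_kbps: int) -> None:
    """Standard 2-pass average-bitrate H.264 encode with ffmpeg."""
    target = f"{bitrate_kbps}k"
    # Pass 1: collect rate statistics, discard the encoded output.
    subprocess.run(["ffmpeg", "-y", "-i", src, "-c:v", "libx264", "-b:v", target,
                    "-pass", "1", "-an", "-f", "null", "/dev/null"], check=True)
    # Pass 2: encode for real using the pass-1 statistics.
    subprocess.run(["ffmpeg", "-y", "-i", src, "-c:v", "libx264", "-b:v", target,
                    "-pass", "2", dst], check=True)

two_pass_abr("input.mp4", "output.mp4", bitrate_kbps=1500)
```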
Semantic Multi-Resolution Communications
Deep learning based joint source-channel coding (JSCC) has demonstrated
significant advancements in data reconstruction compared to separate
source-channel coding (SSCC). This superiority arises from the suboptimality of
SSCC when dealing with finite block-length data. Moreover, SSCC falls short in
reconstructing data in a multi-user and/or multi-resolution fashion, as it only
tries to satisfy the worst channel and/or the highest quality data. To overcome
these limitations, we propose a novel deep learning multi-resolution JSCC
framework inspired by the concept of multi-task learning (MTL). This proposed
framework excels at encoding data for different resolutions through
hierarchical layers and effectively decodes it by leveraging both current and
past layers of encoded data. Moreover, this framework holds great potential for
semantic communication, where the objective extends beyond data reconstruction
to preserving specific semantic attributes throughout the communication
process. These semantic features could be crucial elements such as class
labels, essential for classification tasks, or other key attributes that
require preservation. Within this framework, each level of encoded data can be
carefully designed to retain specific data semantics. As a result, the
precision of a semantic classifier can be progressively enhanced across
successive layers, emphasizing the preservation of targeted semantics
throughout the encoding and decoding stages. We conduct experiments on the
MNIST and CIFAR10 datasets. The experiments on both datasets illustrate that our
proposed method is capable of surpassing the SSCC method in reconstructing data
with different resolutions, enabling the extraction of semantic features with
heightened confidence in successive layers. This capability is particularly
advantageous for prioritizing and preserving more crucial semantic features
within the datasets.
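A toy PyTorch sketch of the layered idea, under assumed simplifications (fully connected layers, an additive white Gaussian noise channel, MNIST-sized inputs): each encoder level emits one latent, and the level-i decoder reconstructs from latents 1 through i, so later levels refine what earlier levels carry.

```python
import torch
import torch.nn as nn

class MultiResolutionJSCC(nn.Module):
    """Hierarchical JSCC sketch: one latent per resolution level; the
    level-i decoder consumes the latents of levels 1..i."""

    def __init__(self, in_dim=784, latent_dim=32, levels=3):
        super().__init__()
        self.encoders = nn.ModuleList(
            nn.Linear(in_dim if i == 0 else latent_dim, latent_dim)
            for i in range(levels))
        self.decoders = nn.ModuleList(
            nn.Linear(latent_dim * (i + 1), in_dim) for i in range(levels))

    def forward(self, x, noise_std=0.1):
        latents, h = [], x
        for enc in self.encoders:
            h = torch.tanh(enc(h))
            latents.append(h + noise_std * torch.randn_like(h))  # AWGN channel
        return [dec(torch.cat(latents[:i + 1], dim=-1))
                for i, dec in enumerate(self.decoders)]

model = MultiResolutionJSCC()
recons = model(torch.randn(4, 784))
print([r.shape for r in recons])  # one reconstruction per resolution level
```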
Why is the video analytics accuracy fluctuating, and what can we do about it?
It is a common practice to think of a video as a sequence of images (frames),
and re-use deep neural network models that are trained only on images for
similar analytics tasks on videos. In this paper, we show that this leap of
faith that deep learning models that work well on images will also work well on
videos is actually flawed. We show that even when a video camera is viewing a
scene that is not changing in any human-perceptible way, and we control for
external factors like video compression and environment (lighting), the
accuracy of video analytics applications fluctuates noticeably. These
fluctuations occur because successive frames produced by the video camera may
look similar visually, but these frames are perceived quite differently by the
video analytics applications. We observed that the root cause for these
fluctuations is the dynamic camera parameter changes that a video camera
automatically makes in order to capture and produce a visually pleasing video.
The camera inadvertently acts as an unintentional adversary because these
slight changes in the image pixel values in consecutive frames, as we show,
have a noticeably adverse impact on the accuracy of insights from video
analytics tasks that re-use image-trained deep learning models. To address this
inadvertent adversarial effect from the camera, we explore the use of transfer
learning techniques to improve learning in video analytics tasks through the
transfer of knowledge from learning on image analytics tasks. In particular, we
show that our newly trained Yolov5 model reduces fluctuation in object
detection across frames, which leads to better tracking of objects (40% fewer
mistakes in tracking). Our paper also provides new directions and techniques to
mitigate the camera's adversarial effect on deep learning models used for video
analytics applications.
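One simple way to quantify the fluctuation described above (an assumed metric, not necessarily the paper's) is a frame-to-frame consistency score: the fraction of detections in one frame that reappear with sufficient IoU in the next frame of a static scene. The boxes below are made up.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def frame_consistency(dets_t, dets_t1, thresh=0.5):
    """Fraction of frame-t detections matched (IoU >= thresh) in frame t+1.
    Near 1.0 for a stable detector on a static scene; dips flag fluctuation."""
    if not dets_t:
        return 1.0
    matched = sum(any(iou(d, e) >= thresh for e in dets_t1) for d in dets_t)
    return matched / len(dets_t)

# Consecutive frames of a static scene: one box jitters slightly, one vanishes.
frame_t = [(10, 10, 50, 50), (60, 60, 100, 100)]
frame_t1 = [(11, 9, 51, 49)]
print(frame_consistency(frame_t, frame_t1))  # 0.5
```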