Painter: Teaching Auto-regressive Language Models to Draw Sketches
Large language models (LLMs) have made tremendous progress in natural
language understanding and they have also been successfully adopted in other
domains such as computer vision, robotics, reinforcement learning, etc. In this
work, we apply LLMs to image generation tasks by directly generating the
virtual brush strokes to paint an image. We present Painter, an LLM that can
convert user prompts in text description format to sketches by generating the
corresponding brush strokes in an auto-regressive way. We construct Painter
based on an off-the-shelf LLM that is pre-trained on a large text corpus, by
fine-tuning it on the new task while preserving its language understanding
capabilities. We create a dataset of diverse multi-object sketches paired with
textual prompts, covering several object types and tasks. Painter can
generate sketches from text descriptions, remove objects from canvas, and
detect and classify objects in sketches. To our knowledge, this is the first
work to use LLMs for auto-regressive image generation, and the results are
very encouraging.
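The stroke-as-token idea can be illustrated with a minimal sketch. The token format below is a hypothetical stand-in, not Painter's actual vocabulary: strokes are emitted auto-regressively as flat token sequences, which a renderer then parses back into coordinate tuples.

```python
def parse_stroke_tokens(tokens):
    """Group a flat stream of generated tokens into (x1, y1, x2, y2)
    stroke tuples. The <stroke>...</stroke> delimiters are an assumed
    format for illustration only."""
    strokes = []
    current = []
    for tok in tokens:
        if tok == "<stroke>":
            current = []
        elif tok == "</stroke>":
            strokes.append(tuple(current))
        else:
            current.append(int(tok))
    return strokes

# A toy "generated" sequence: two straight-line strokes.
tokens = ["<stroke>", "10", "20", "30", "40", "</stroke>",
          "<stroke>", "5", "5", "25", "35", "</stroke>"]
print(parse_stroke_tokens(tokens))  # [(10, 20, 30, 40), (5, 5, 25, 35)]
```

In the paper's setting, the token stream would come from the fine-tuned LLM's decoder rather than a hard-coded list.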
A generic controller for managing TCP transfers in IEEE 802.11 infrastructure WLANs
In this paper, we present a generic controller that ensures fair and efficient operation of IEEE 802.11 infrastructure wireless local area networks (WLANs) with multiple co-channel access points. Our controller addresses performance issues of long-lived TCP transfers in multi-AP WLANs by overlaying a coarse time-slicing scheduler on top of a cascaded fair queuing scheduler. The time slices and queue weights used in our controller are obtained from the solution of a constrained utility optimization formulation. We also study the impact of coarse time-slicing on TCP, and present a methodology to improve the performance of co-existing short-lived and interactive TCP flows. Finally, we report the results of experiments performed on a real testbed, demonstrating the efficacy of our controller.
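The time-slicing layer can be sketched as follows. The weights here are assumed to be given (in the paper they come from the constrained utility optimization); the frame length and AP names are illustrative, not values from the paper.

```python
def compute_time_slices(ap_weights, frame_ms=100.0):
    """Split a scheduling frame among co-channel APs in proportion to
    their weights. A minimal sketch of coarse time-slicing: each AP
    gets exclusive channel time proportional to its weight."""
    total = sum(ap_weights.values())
    return {ap: frame_ms * w / total for ap, w in ap_weights.items()}

# Hypothetical weights for three co-channel APs.
slices = compute_time_slices({"AP1": 2.0, "AP2": 1.0, "AP3": 1.0})
print(slices)  # {'AP1': 50.0, 'AP2': 25.0, 'AP3': 25.0}
```

The fair-queuing layer below this would then schedule individual TCP flows within each AP's slice according to the per-queue weights.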
QuickSRNet: Plain Single-Image Super-Resolution Architecture for Faster Inference on Mobile Platforms
In this work, we present QuickSRNet, an efficient super-resolution
architecture for real-time applications on mobile platforms. Super-resolution
clarifies, sharpens, and upscales an image to higher resolution. Applications
such as gaming and video playback along with the ever-improving display
capabilities of TVs, smartphones, and VR headsets are driving the need for
efficient upscaling solutions. While existing deep learning-based
super-resolution approaches achieve impressive results in terms of visual
quality, enabling real-time DL-based super-resolution on mobile devices with
compute, thermal, and power constraints is challenging. To address these
challenges, we propose QuickSRNet, a simple yet effective architecture that
provides better accuracy-to-latency trade-offs than existing neural
architectures for single-image super resolution. We present training tricks to
speed up existing residual-based super-resolution architectures while
maintaining robustness to quantization. Our proposed architecture produces
1080p outputs via 2x upscaling in 2.2 ms on a modern smartphone, making it
ideal for high-fps real-time applications.
Comment: Camera-ready version (CVPR workshop - MAI'23)
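The final upscaling step in plain single-image super-resolution architectures of this kind is typically a pixel shuffle (depth-to-space) that rearranges channels into spatial resolution. A minimal NumPy sketch of that operation, for illustration (not QuickSRNet's implementation):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) array into (C, H*r, W*r),
    i.e. depth-to-space: each group of r*r channels becomes an
    r-by-r spatial block in the upscaled output."""
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)  # -> (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)

# 4 channels of 2x2 become 1 channel of 4x4 (2x upscaling).
out = pixel_shuffle(np.arange(16, dtype=float).reshape(4, 2, 2), 2)
print(out.shape)  # (1, 4, 4)
```

In a full network, a few convolutions would produce the (C*r*r)-channel feature map that feeds this rearrangement; the rearrangement itself is computation-free, which is part of why such architectures suit mobile latency budgets.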
Is end-to-end learning enough for fitness activity recognition?
End-to-end learning has taken hold of many computer vision tasks, in
particular, related to still images, with task-specific optimization yielding
very strong performance. Nevertheless, human-centric action recognition is
still largely dominated by hand-crafted pipelines, and only individual
components are replaced by neural networks that typically operate on individual
frames. As a testbed to study the relevance of such pipelines, we present a new
fully annotated video dataset of fitness activities. Any recognition
capabilities in this domain are almost exclusively a function of human poses
and their temporal dynamics, so pose-based solutions should perform well. We
show that, with this labelled data, end-to-end learning on raw pixels can
compete with state-of-the-art action recognition pipelines based on pose
estimation. We also show that end-to-end learning can support temporally
fine-grained tasks such as real-time repetition counting.
Comment: 9 pages, 4 figures, 4 tables
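A deliberately simple baseline for repetition counting makes the task concrete. The paper's end-to-end model learns counting from pixels; the sketch below is instead a hypothetical post-processing step that counts threshold crossings of a per-frame "mid-repetition" score.

```python
def count_repetitions(scores, threshold=0.5):
    """Count rising-edge threshold crossings in a per-frame score
    sequence. Each crossing from below to at-or-above the threshold
    is counted as one repetition. A toy stand-in for illustration,
    not the method used in the paper."""
    count = 0
    above = False
    for s in scores:
        if s >= threshold and not above:
            count += 1
            above = True
        elif s < threshold:
            above = False
    return count

# A toy score trace with three distinct peaks.
print(count_repetitions([0.1, 0.9, 0.2, 0.8, 0.7, 0.1, 0.6]))  # 3
```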