Statistical Analysis of a Posteriori Channel and Noise Distribution Based on HARQ Feedback
In response to a comment on one of our manuscripts, this work studies the
posterior channel and noise distributions conditioned on the NACKs and ACKs of
all previous transmissions in a HARQ system using statistical approaches. Our
main result is that, unless the coherence interval (in time or frequency) is
large, as under the block-fading assumption, the posterior distribution of the
channel and noise either remains almost identical to the prior distribution,
or it mostly follows the same class of distribution as the prior one. In the
latter case, the difference between the posterior and prior distributions can
be modeled as a parameter mismatch, which has little impact on certain types
of applications.
Comment: 15 pages, 2 figures, 4 tables
High-Performance Matrix Multiplication: Hierarchical Data Structures, Optimized Kernel Routines, and Qualitative Performance Modeling
The optimal implementation of matrix multiplication on modern computer architectures is of great importance for scientific and engineering applications. However, achieving optimal performance for matrix multiplication has been continuously challenged both by the ever-widening performance gap between the processor and the memory hierarchy and by the introduction of new architectural features in modern architectures. The conventional way of dealing with these challenges benefits significantly from the blocking algorithm, which improves data locality in the cache memory, and from highly tuned inner kernel routines, which in turn exploit the architectural aspects of the specific processor to deliver near-peak performance. A state-of-the-art improvement of the blocking algorithm is the self-tuning approach, which applies heroic combinatorial optimization over parameter spaces. Other recent research approaches include one that explicitly blocks for the TLB (Translation Lookaside Buffer) and a hierarchical formulation that employs memory-friendly Morton ordering (a space-filling curve methodology). This thesis compares and contrasts the TLB-blocking-based and Morton-order-based methods for dense matrix multiplication, and offers a qualitative model to explain the performance behavior. Comparisons to the performance of a self-tuning library and the vendor library are also offered for the Alpha architecture. The practical benchmark experiments demonstrate that neither conventional blocking-based implementations nor the self-tuning libraries achieve consistently high performance in dense matrix multiplication for relatively large square matrices. Instead, architectural constraints and issues evidently restrict the critical path and the options available for optimal performance, so that the relatively simple strategy and framework presented in this study offer higher and flatter overall performance. Interestingly, maximal inner kernel efficiency is not a guarantee of globally minimal multiplication time. Also, efficient and flat performance is possible at all problem sizes that fit in main memory, in contrast to the jagged performance curves often observed with blocking and self-tuned blocking libraries.
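The two ingredients discussed above can be illustrated compactly. The sketch below, in Python for readability, shows a Morton (Z-order) index that interleaves the bits of the row and column indices so that spatially adjacent blocks stay close in memory, and a straightforward cache-blocked multiply; the block size is an arbitrary assumption, not a tuned value from the thesis.

```python
import numpy as np

def morton_encode(i: int, j: int, bits: int = 16) -> int:
    """Interleave the bits of (i, j) into a single Z-order index."""
    z = 0
    for b in range(bits):
        z |= ((i >> b) & 1) << (2 * b + 1)
        z |= ((j >> b) & 1) << (2 * b)
    return z

def blocked_matmul(A: np.ndarray, B: np.ndarray, bs: int = 64) -> np.ndarray:
    """C = A @ B computed block by block to improve cache reuse."""
    n = A.shape[0]
    C = np.zeros((n, n), dtype=A.dtype)
    for i0 in range(0, n, bs):
        for k0 in range(0, n, bs):          # this order reuses A's block across j0
            for j0 in range(0, n, bs):
                C[i0:i0+bs, j0:j0+bs] += (
                    A[i0:i0+bs, k0:k0+bs] @ B[k0:k0+bs, j0:j0+bs]
                )
    return C

print(morton_encode(3, 5))  # 27: bits of i and j interleaved
A = np.random.rand(256, 256)
B = np.random.rand(256, 256)
assert np.allclose(blocked_matmul(A, B), A @ B)
```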
Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning
Large pre-trained vision models achieve impressive success in computer
vision. However, fully fine-tuning large models for downstream tasks,
particularly in video understanding, can be prohibitively computationally
expensive. Recent studies turn their focus towards efficient image-to-video
transfer learning. Nevertheless, existing efficient fine-tuning methods pay
little attention to training memory usage and to transferring larger models to
the video domain. In this paper, we present a novel Spatial-Temporal Side
Network for memory-efficient fine-tuning of large image models for video
understanding, named Side4Video. Specifically, we introduce a lightweight
spatial-temporal side network attached to the frozen vision model, which
avoids backpropagation through the heavy pre-trained model and utilizes
multi-level spatial features from the original image model. This extremely
memory-efficient architecture reduces memory usage by 75% compared to previous
adapter-based methods. In this way, we can transfer a huge ViT-E (4.4B
parameters), 14x larger than ViT-L (304M), to video understanding tasks. Our
approach achieves remarkable performance on various video datasets across
unimodal and cross-modal tasks (i.e., action recognition and text-video
retrieval), especially on Something-Something V1 & V2 (67.3% & 74.6%),
Kinetics-400 (88.6%), MSR-VTT (52.3%), MSVD (56.1%) and VATEX (68.8%). We
release our code at https://github.com/HJYao00/Side4Video.
Comment: Technical report
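A schematic of the mechanism described above, as a hedged PyTorch sketch rather than the released Side4Video code: the backbone is frozen and its intermediate features are detached, so the backward pass never traverses the heavy pre-trained model, which is where the memory saving comes from. All module shapes and the class count are illustrative.

```python
import torch
import torch.nn as nn

class FrozenBackbone(nn.Module):
    """Stand-in for a frozen pre-trained image model with multi-level outputs."""
    def __init__(self, dim=768, depth=4):
        super().__init__()
        self.blocks = nn.ModuleList(nn.Linear(dim, dim) for _ in range(depth))
        for p in self.parameters():
            p.requires_grad_(False)

    def forward(self, x):
        feats = []
        for blk in self.blocks:
            x = torch.relu(blk(x))
            feats.append(x.detach())  # detached: no gradient flows into backbone
        return feats

class SideNetwork(nn.Module):
    """Lightweight trainable branch fusing the multi-level backbone features."""
    def __init__(self, dim=768, side_dim=96, depth=4, num_classes=174):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(dim, side_dim) for _ in range(depth))
        self.mix = nn.ModuleList(nn.Linear(side_dim, side_dim) for _ in range(depth))
        self.head = nn.Linear(side_dim, num_classes)

    def forward(self, feats):
        s = torch.zeros(feats[0].shape[0], self.head.in_features)
        for proj, mix, f in zip(self.proj, self.mix, feats):
            s = torch.relu(mix(s + proj(f)))  # side state updated level by level
        return self.head(s)

backbone, side = FrozenBackbone(), SideNetwork()
logits = side(backbone(torch.randn(2, 768)))  # a batch of 2 frame features
logits.sum().backward()  # gradients exist only in the side network's parameters
```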
Lorentz Quantum Computer
A theoretical model of computation is proposed based on Lorentz quantum
mechanics. Besides the standard qubits, this model has an additional bit, which
we call the hyperbolic bit (or hybit for short). A set of basic logic gates
is constructed and proved to be universal. As an application, a search
algorithm is designed for this computer model and is found to be exponentially
faster than Grover's search algorithm.
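A small numerical sketch of the setting, under the assumption (standard in Lorentz quantum mechanics) that hybit states live in a space with the indefinite inner product ⟨u, v⟩ = u†ηv, η = diag(1, −1), and that gates preserve this metric (hyperbolic rather than unitary transformations). The paper's specific gate set and search algorithm are not reproduced here.

```python
import numpy as np

eta = np.diag([1.0, -1.0])  # indefinite (Minkowski-like) metric

def lorentz_inner(u, v):
    """Indefinite inner product <u, v> = u^dagger eta v."""
    return u.conj() @ eta @ v

def hyperbolic_gate(w):
    """A Lorentz 'rotation' L satisfying L^dagger eta L = eta."""
    return np.array([[np.cosh(w), np.sinh(w)],
                     [np.sinh(w), np.cosh(w)]])

psi = np.array([1.0, 0.0])        # hybit state with <psi, psi> = +1
L = hyperbolic_gate(0.7)
phi = L @ psi

print(lorentz_inner(psi, psi))    # +1
print(lorentz_inner(phi, phi))    # still +1: the gate preserves the metric
assert np.allclose(L.conj().T @ eta @ L, eta)
```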
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition
Transferring knowledge from task-agnostic pre-trained deep models for
downstream tasks is an important topic in computer vision research. Along with
the growth of computational capacity, we now have open-source vision-language
pre-trained models at large scale in terms of both model architecture and
amount of data. In this study, we focus on transferring knowledge for video
classification tasks. Conventional methods randomly initialize the linear
classifier head for vision classification, but they leave the usage of the text
encoder for downstream visual recognition tasks unexplored. In this paper, we
revise the role of the linear classifier and replace it with knowledge
transferred from the pre-trained model. We utilize the well-pretrained
language model to generate good semantic targets for efficient transfer
learning. The empirical study shows that our method improves both the
performance and the training speed of video classification, with a negligible
change in the model. Our simple yet effective tuning paradigm achieves
state-of-the-art performance and efficient training on various video
recognition scenarios, i.e., zero-shot, few-shot, and general recognition. In
particular, our paradigm achieves the state-of-the-art accuracy of 87.8% on
Kinetics-400, and also surpasses previous methods by 20~50% absolute top-1
accuracy under zero-shot and few-shot settings on five popular video datasets.
Code and models can be found at https://github.com/whwu95/Text4Vis.
Comment: Accepted by AAAI-2023. Camera Ready Version
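The core replacement described above can be sketched in a few lines: rather than a randomly initialized head, the classifier weights are taken from a text encoder's embeddings of the class names. The `text_encoder` below is a hypothetical stand-in for a frozen vision-language text tower; this is an illustration of the idea, not the released Text4Vis code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def text_encoder(class_names):
    # Placeholder: a real implementation would tokenize each name and run a
    # frozen text tower; here we just return random unit-norm embeddings.
    return F.normalize(torch.randn(len(class_names), 512), dim=-1)

class TextInitializedClassifier(nn.Module):
    """Classifier whose weights come from text embeddings of the class names."""
    def __init__(self, class_names, freeze=True):
        super().__init__()
        with torch.no_grad():
            W = text_encoder(class_names)         # (num_classes, dim)
        self.weight = nn.Parameter(W, requires_grad=not freeze)

    def forward(self, video_features):            # (batch, dim), L2-normalized
        return video_features @ self.weight.t()   # cosine-similarity logits

clf = TextInitializedClassifier(["archery", "juggling", "surfing"])
feats = F.normalize(torch.randn(4, 512), dim=-1)
print(clf(feats).shape)  # torch.Size([4, 3])
```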
In-situ electrochemical fabrication of natural contacts on single nanowires
We report a template-based in-situ electrochemical method for fabricating
natural electric contacts on single nanowires using a pair of cross-patterned
electrodes. Such electric contacts are highly stable upon thermal cycling
between room temperature and millikelvin temperatures. Direct imaging of the
single-nanowire contacts using scanning electron microscopy is also
demonstrated.
Comment: 13 pages, 4 figures