9,949 research outputs found
Gloss Attention for Gloss-free Sign Language Translation
Most sign language translation (SLT) methods to date require gloss
annotations to provide additional supervision; however, gloss annotations are
difficult to acquire. To address this problem, we first analyze existing
models to determine how gloss annotations make SLT easier. We find that they
provide two kinds of information to the model: 1) they help the model
implicitly learn the location of semantic boundaries in continuous sign
language videos, and 2) they help the model understand the sign language
video globally. We then propose \emph{gloss attention}, which enables the
model to keep its attention within video segments that locally share the same
semantics, just as gloss annotations help existing models do. Furthermore, we
transfer knowledge of sentence-to-sentence similarity from a natural language
model to our gloss attention SLT network (GASLT) to help it understand sign
language videos at the sentence level. Experimental results on multiple
large-scale sign language datasets show that our proposed GASLT model
significantly outperforms existing methods. Our code is available at
\url{https://github.com/YinAoXiong/GASLT}
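The core idea, restricting each query's attention to a local video segment, can be sketched with a windowed attention mask. This is a minimal illustration assuming a fixed symmetric window; the paper's gloss attention derives segment locality differently, so the `window` parameter here is purely illustrative.

```python
import numpy as np

def local_attention(q, k, v, window=2):
    """Attention where each position attends only to keys within a fixed
    local window, mimicking the idea of keeping attention inside video
    segments that share the same semantics. (The fixed window is an
    assumption for this sketch, not the paper's exact mechanism.)"""
    t, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    # mask out positions outside the local window
    idx = np.arange(t)
    scores[np.abs(idx[:, None] - idx[None, :]) > window] = -np.inf
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(6, 4)) for _ in range(3))
out = local_attention(q, k, v, window=1)
print(out.shape)  # (6, 4)
```

With `window=0`, each position attends only to itself and the output equals `v`; larger windows blend information from neighboring frames only.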
Approximate Computing Survey, Part I: Terminology and Software & Hardware Approximation Techniques
The rapid growth of demanding applications in domains applying multimedia
processing and machine learning has marked a new era for edge and cloud
computing. These applications involve massive data and compute-intensive tasks,
and thus, typical computing paradigms in embedded systems and data centers are
stressed to meet the worldwide demand for high performance. Concurrently, the
landscape of the semiconductor field in the last 15 years has constituted power
as a first-class design concern. As a result, the community of computing
systems is forced to find alternative design approaches to facilitate
high-performance and/or power-efficient computing. Among the examined
solutions, Approximate Computing has attracted an ever-increasing interest,
with research works applying approximations across the entire traditional
computing stack, i.e., at the software, hardware, and architectural levels.
Over the last decade, a plethora of approximation techniques has emerged in
software (programs, frameworks, compilers, runtimes, languages), hardware
(circuits, accelerators), and architecture (processors, memories). The
current article is Part I of our comprehensive survey on Approximate
Computing: it reviews the field's motivation, terminology, and principles,
and it classifies and presents the technical details of state-of-the-art
software and hardware approximation techniques.
Comment: Under review at ACM Computing Surveys
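One classic software-level approximation of the kind such surveys cover is loop perforation: skipping iterations to trade accuracy for work. The sketch below is a generic illustration, not a technique from this specific survey.

```python
def perforated_mean(xs, stride=2):
    """Loop perforation: process only every `stride`-th element, reducing
    work roughly by a factor of `stride` at the cost of a bounded error."""
    sampled = xs[::stride]
    return sum(sampled) / len(sampled)

data = list(range(1000))                    # exact mean is 499.5
approx = perforated_mean(data, stride=4)    # touches 1/4 of the data
print(approx, abs(approx - 499.5))          # small error for 4x less work
```

The accuracy/effort knob (`stride` here) is exactly the kind of tunable quality parameter that approximate-computing frameworks expose.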
Continuous Intermediate Token Learning with Implicit Motion Manifold for Keyframe Based Motion Interpolation
Deriving sophisticated 3D motions from sparse keyframes is a particularly
challenging problem, due to the requirements of continuity and exceptional
skeletal precision. Action features can often be derived accurately from the
full series of keyframes, and thus leveraging the global context with
transformers has been a promising data-driven embedding approach. However,
existing methods often take as input intermediate frames interpolated from
the keyframes with basic interpolation methods to ensure continuity, which
results in a trivial local minimum during training. In this paper, we propose
a novel framework that formulates latent motion manifolds with keyframe-based
constraints, from which the continuous nature of intermediate token
representations is considered. In particular, our proposed framework consists
of two stages for identifying a latent motion subspace, i.e., a keyframe
encoding stage and an intermediate token generation stage, and a subsequent
motion synthesis stage to extrapolate and compose motion data from the
manifolds. Through extensive experiments conducted on both the LaFAN1 and CMU
Mocap datasets, our proposed method demonstrates both superior interpolation
accuracy and high visual similarity to ground truth motions.
Comment: Accepted by CVPR 202
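The baseline the abstract criticizes, generating intermediate frames by basic interpolation between keyframes, can be sketched as a simple linear blend of pose vectors (this is the generic baseline, not the paper's proposed method):

```python
import numpy as np

def lerp_intermediate(k0, k1, n):
    """Baseline intermediate-frame generation: linearly interpolate pose
    vectors between two keyframes. The paper argues that feeding such
    trivially interpolated frames to the model yields a trivial local
    minimum during training."""
    ts = np.linspace(0.0, 1.0, n + 2)[1:-1]  # n frames strictly in between
    return np.stack([(1 - t) * k0 + t * k1 for t in ts])

k0 = np.zeros(3)
k1 = np.array([1.0, 2.0, 3.0])
frames = lerp_intermediate(k0, k1, n=3)
print(frames.shape)  # (3, 3)
```

Real pipelines typically use spherical interpolation for joint rotations; plain lerp on pose vectors is the simplest case of the "basic interpolation" being replaced.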
The regulation of digital platforms: the case of pagoPA
How can EU regulation affect innovation? Digital revolution: how big data have changed the world and the legal landscape. The regulation of digital platforms in Europe. Digital revolution: how distributed ledger technologies are changing the world and the legal landscape. Regulation of digital payments: the case of pagoPA.
Synthetic Aperture Radar (SAR) Meets Deep Learning
This reprint focuses on applications that combine synthetic aperture radar and deep learning technology. It aims to further promote the development of SAR image intelligent interpretation technology. A synthetic aperture radar (SAR) is an important active microwave imaging sensor, whose all-day and all-weather operating capability gives it an important place in the remote sensing community. Since the United States launched the first SAR satellite, SAR has received much attention in the remote sensing community, e.g., in geological exploration, topographic mapping, disaster forecasting, and traffic monitoring. It is therefore valuable and meaningful to study SAR-based remote sensing applications. In recent years, deep learning, represented by convolutional neural networks, has driven significant progress in the computer vision community, e.g., in face recognition, driverless vehicles, and the Internet of Things (IoT). Deep learning enables computational models with multiple processing layers to learn data representations at multiple levels of abstraction, which can greatly improve the performance of various applications. This reprint provides a platform for researchers to tackle these significant challenges and present their innovative and cutting-edge research results when applying deep learning to SAR, in various manuscript types, e.g., articles, letters, reviews, and technical reports.
A Differential Datalog Interpreter
The core reasoning task for datalog engines is materialization: the
evaluation of a datalog program over a database alongside its physical
incorporation into the database itself. The de facto method of computing it
is the recursive application of inference rules. Because materialization is a
costly operation, it is a must for datalog engines to provide incremental
materialization, that is, to adjust the computation to new data instead of
restarting from scratch. One of the major caveats is that deleting data is
notoriously more involved than adding it, since one has to take into account
all data that has been entailed from what is being deleted. Differential
Dataflow is a computational model that provides efficient incremental
maintenance of iterative dataflows, notably with equal performance between
additions and deletions, as well as work distribution. In this paper we
investigate the performance of materialization with three reference datalog
implementations: one built on top of a lightweight relational engine, and two
others that are differential-dataflow and non-differential versions of the
same rewrite algorithm, with the same optimizations.
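The "recursive application of inference rules" can be illustrated with semi-naive evaluation of the classic transitive-closure program. This is a textbook sketch, not one of the paper's three implementations; note how only the facts derived in the previous round feed the next join, the basic idea that incremental engines generalize.

```python
def transitive_closure(edges):
    """Semi-naive materialization of:
         reach(x, y) :- edge(x, y).
         reach(x, z) :- reach(x, y), edge(y, z).
    `delta` holds only facts new in the previous round, so each iteration
    joins the new facts against edge/2 instead of re-deriving everything."""
    reach = set(edges)
    delta = set(edges)
    while delta:
        derived = {(x, z) for (x, y) in delta for (y2, z) in edges if y == y2}
        delta = derived - reach   # keep only genuinely new facts
        reach |= delta
    return reach

edges = {(1, 2), (2, 3), (3, 4)}
print(sorted(transitive_closure(edges)))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

Handling deletions, as the abstract notes, is much harder: removing an edge requires retracting every `reach` fact whose only derivations passed through it, which is where differential approaches earn their keep.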
LMTuner: A user-friendly and highly-integrable Training Framework for fine-tuning Large Language Models
With the burgeoning development in the realm of large language models (LLMs),
the demand for efficient incremental training tailored to specific industries
and domains continues to increase. Currently, the predominantly employed
frameworks lack modular design, so it often takes substantial coding work to
kickstart the training of an LLM. To address this, we present "LMTuner", a
highly usable, integrable, and scalable system for training LLMs
expeditiously and with minimal user input. LMTuner comprises three main
modules: the Interaction, Training, and Inference Modules. We advocate that
LMTuner's usability and integrability alleviate the complexities of training
large language models. Remarkably, even a novice user can commence training
large language models within five minutes. Furthermore, it integrates the
DeepSpeed framework and supports efficient fine-tuning methodologies such as
Low Rank Adaptation (LoRA) and Quantized LoRA (QLoRA), enabling the training
of language models scaling from 300M to 130B parameters on a single server.
LMTuner's homepage (https://wengsyx.github.io/LMTuner/) and screencast video
(https://youtu.be/nsXmWOmN3rE) are now publicly available.
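The LoRA technique the framework supports can be summarized in a few lines of numpy: the pretrained weight W stays frozen, and only a low-rank update BA is trained, scaled by alpha/r. The dimensions below are illustrative, not LMTuner's defaults.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """LoRA forward pass: effective weight is W + (alpha / r) * B @ A,
    where r is the adapter rank. Only A and B are trainable; W is frozen."""
    r = A.shape[0]
    return x @ (W + (alpha / r) * B @ A).T

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 16, 2
W = rng.normal(size=(d_out, d_in))   # frozen pretrained weight
A = rng.normal(size=(r, d_in))       # trainable down-projection
B = np.zeros((d_out, r))             # zero-init: adapter starts as a no-op
x = rng.normal(size=(4, d_in))
# With B = 0 the adapted model matches the base model exactly.
print(np.allclose(lora_forward(x, W, A, B), x @ W.T))  # True
```

The payoff is parameter count: here the adapter has r*(d_in + d_out) = 48 trainable values versus 128 in W, and the gap widens dramatically at LLM scale.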
Decision-making with Gaussian processes: sampling strategies and Monte Carlo methods
We study Gaussian processes and their application to decision-making in the real world. We begin by reviewing the foundations of Bayesian decision theory and show how these ideas give rise to methods such as Bayesian optimization. We investigate practical techniques for carrying out these strategies, with an emphasis on estimating and maximizing acquisition functions. Finally, we introduce pathwise approaches to conditioning Gaussian processes and demonstrate key benefits for representing random variables in this manner.
Open Access
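The GP conditioning that underlies these decision-making methods can be sketched directly: given observations (X, y), the posterior mean and covariance at test points follow from the standard Gaussian conditioning formulas. This is the textbook (Cholesky-based) route, not the thesis's pathwise approach; the RBF kernel and lengthscale are illustrative choices.

```python
import numpy as np

def rbf(a, b, ls=1.0):
    """Squared-exponential kernel between two 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    """Posterior mean and covariance of a zero-mean GP with an RBF kernel,
    conditioned on observations (X, y), evaluated at test points Xs."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks.T @ alpha                 # posterior mean
    v = np.linalg.solve(L, Ks)
    cov = rbf(Xs, Xs) - v.T @ v       # posterior covariance
    return mu, cov

X = np.array([0.0, 1.0, 2.0])
y = np.sin(X)
mu, cov = gp_posterior(X, y, np.array([0.0, 0.5]))
print(mu.shape, cov.shape)  # (2,) (2, 2)
```

Acquisition functions for Bayesian optimization are then built from `mu` and `cov`, e.g., by maximizing expected improvement over candidate test points.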
TransAct: Transformer-based Realtime User Action Model for Recommendation at Pinterest
Sequential models that encode user activity for next action prediction have
become a popular design choice for building web-scale personalized
recommendation systems. Traditional methods of sequential recommendation either
utilize end-to-end learning on realtime user actions, or learn user
representations separately in an offline batch-generated manner. This paper (1)
presents Pinterest's ranking architecture for Homefeed, our personalized
recommendation product and the largest engagement surface; (2) proposes
TransAct, a sequential model that extracts users' short-term preferences from
their realtime activities; (3) describes our hybrid approach to ranking, which
combines end-to-end sequential modeling via TransAct with batch-generated user
embeddings. The hybrid approach allows us to combine the advantages of
responsiveness from learning directly on realtime user activity with the
cost-effectiveness of batch user representations learned over a longer time
period. We describe the results of ablation studies, the challenges we faced
during productionization, and the outcome of an online A/B experiment, which
validates the effectiveness of our hybrid ranking model. We further demonstrate
the effectiveness of TransAct on other surfaces such as contextual
recommendations and search. Our model has been deployed to production in
Homefeed, Related Pins, Notifications, and Search at Pinterest.
Comment: © ACM 2023. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in KDD'23, http://dx.doi.org/10.1145/3580305.359991
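The hybrid feature construction described above, a realtime sequence summary combined with an offline batch-generated user embedding, can be sketched schematically. This is an illustration of the general pattern only: the mean pool below stands in for the TransAct transformer encoder, and all dimensions are made up.

```python
import numpy as np

def hybrid_user_features(realtime_actions, batch_embedding):
    """Combine a summary of realtime user actions with an offline
    batch-generated user embedding, as in hybrid ranking inputs.
    (Mean pooling is a placeholder for the sequential encoder.)"""
    realtime_summary = realtime_actions.mean(axis=0)
    return np.concatenate([realtime_summary, batch_embedding])

rng = np.random.default_rng(0)
actions = rng.normal(size=(20, 32))   # last 20 action embeddings, dim 32
batch_emb = rng.normal(size=(64,))    # offline user embedding, dim 64
feats = hybrid_user_features(actions, batch_emb)
print(feats.shape)  # (96,)
```

The design tradeoff is the one the abstract names: the realtime half reacts to fresh activity, while the batch half is cheap to serve because it is precomputed over a longer horizon.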