Theoretic Analysis and Extremely Easy Algorithms for Domain Adaptive Feature Learning
Domain adaptation problems arise in a variety of applications, where a
training dataset from the \textit{source} domain and a test dataset from the
\textit{target} domain typically follow different distributions. The primary
difficulty in designing effective learning models to solve such problems lies
in how to bridge the gap between the source and target distributions. In this
paper, we provide a comprehensive analysis of feature learning algorithms used in
conjunction with linear classifiers for domain adaptation. Our analysis shows
that in order to achieve good adaptation performance, the second moments of the
source domain distribution and target domain distribution should be similar.
Based on our new analysis, a novel extremely easy feature learning algorithm
for domain adaptation is proposed. Furthermore, our algorithm is extended by
leveraging multiple layers, leading to a deep linear model. We evaluate the
effectiveness of the proposed algorithms on domain adaptation tasks using the
Amazon review dataset and the spam dataset from the ECML/PKDD 2006
discovery challenge. Comment: ijca
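The abstract's central claim, that the second moments of the source and target feature distributions should be similar, can be illustrated with a CORAL-style whitening-and-recoloring transform. This is a standard moment-matching technique sketched here for illustration only; it is not the paper's own algorithm, and all names are invented:

```python
import numpy as np

def match_second_moments(Xs, Xt, eps=1e-6):
    """Whiten the source features, then re-color them with the target
    covariance so that the second moments of the two domains match."""
    def mat_pow(C, p):
        # Symmetric matrix power via eigendecomposition.
        w, V = np.linalg.eigh(C)
        return (V * np.maximum(w, eps) ** p) @ V.T
    Cs = np.cov(Xs, rowvar=False) + eps * np.eye(Xs.shape[1])
    Ct = np.cov(Xt, rowvar=False) + eps * np.eye(Xt.shape[1])
    return (Xs - Xs.mean(0)) @ mat_pow(Cs, -0.5) @ mat_pow(Ct, 0.5) + Xt.mean(0)

rng = np.random.default_rng(0)
Xs = rng.normal(size=(500, 3)) @ np.diag([3.0, 1.0, 0.5])  # source features
Xt = rng.normal(size=(500, 3)) @ np.diag([1.0, 2.0, 1.5])  # target features
Xs_adapted = match_second_moments(Xs, Xt)
# The adapted source covariance now tracks the target covariance.
print(np.allclose(np.cov(Xs_adapted, rowvar=False),
                  np.cov(Xt, rowvar=False), atol=1e-3))
```

A linear classifier trained on `Xs_adapted` then sees features whose second moments agree with the target domain, which is exactly the condition the analysis above identifies.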
Automation of the design of obstacle limitation surfaces
Air safety is one of the most sensitive issues in the world of aeronautics. Even though the fatality rate per accident is relatively low compared with other means of transport such as the car, the media echo and economic impact of any aviation catastrophe are far higher. For this reason, international and local aviation organizations have created a set of regulations intended to reduce and prevent possible causes of disaster. The core of these regulations is the aeronautical easement: a series of restrictions that protect the airspace around airport facilities and their vicinity by banning the construction or presence of any object that would compromise the safety of aircraft operations. This document describes the development of an application that designs airport easements in accordance with ICAO and BOE regulations and standards, which in turn requires analysing and understanding those regulations. The application is written in the C# programming language, using Microsoft Visual Studio 2012 as the interface for user interaction. Finally, the relevant commands and .dll libraries must be assimilated and implemented in the application's code so that the easements can be represented in the AUTOCAD environment.
Visualizing topological edge states of single and double bilayer Bi supported on multibilayer Bi(111) films
Freestanding single-bilayer Bi(111) is a two-dimensional topological
insulator with edge states propagating along its perimeter. Given the
interlayer coupling observed experimentally, the topological nature of Bi(111)
thin films and the impact of the supporting substrate on the topmost Bi bilayer
are still under debate. Here, combining scanning tunneling microscopy and
first-principles calculations, we systematically study the electronic
properties of Bi(111) thin films grown on a NbSe2 substrate. Two types of
non-magnetic edge structures, i.e., a conventional zigzag edge and a 2x1
reconstructed edge, coexist alternately at the boundaries of single bilayer
islands, the topological edge states of which exhibit remarkably different
energy and spatial distributions. Prominent edge states are persistently
visualized at the edges of both single and double bilayer Bi islands,
regardless of the underlying thickness of Bi(111) thin films. We provide an
explanation for the topological origin of the observed edge states that is
verified with first-principles calculations. Our paper clarifies the
long-standing controversy regarding the topology of Bi(111) thin films and
reveals the tunability of topological edge states via edge modifications. Comment: 36 pages, 10 figures
MixPoet: Diverse Poetry Generation via Learning Controllable Mixed Latent Space
As an essential step towards computer creativity, automatic poetry generation
has gained increasing attention in recent years. Though recent neural models make
prominent progress on some criteria of poetry quality, generated poems still
suffer from poor diversity. Studies in the related literature show that
different factors, such as life experience and historical background, influence
the composition styles of poets, which contributes considerably to the high
diversity of human-authored poetry. Inspired by this, we propose
MixPoet, a novel model that absorbs multiple factors to create various styles
and promote diversity. Based on a semi-supervised variational autoencoder, our
model disentangles the latent space into some subspaces, with each conditioned
on one influence factor by adversarial training. In this way, the model learns
a controllable latent variable to capture and mix generalized factor-related
properties. Different factor mixtures lead to diverse styles and hence further
differentiate generated poems from each other. Experiment results on Chinese
poetry demonstrate that MixPoet improves both diversity and quality against
three state-of-the-art models. Comment: 8 pages, 5 figures, published in AAAI 202
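The idea of a latent space disentangled into factor-conditioned subspaces can be sketched as follows. This is a toy illustration with hypothetical factor names; the actual MixPoet model learns these subspaces with a semi-supervised variational autoencoder and adversarial training, none of which is reproduced here:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical influence factors, each owning one slice (subspace) of the
# latent vector, with one Gaussian prior mean per factor class.
subspace_dim = 4
priors = {
    "experience": {"military": rng.normal(size=subspace_dim),
                   "civilian": rng.normal(size=subspace_dim)},
    "era":        {"prosperous": rng.normal(size=subspace_dim),
                   "turbulent":  rng.normal(size=subspace_dim)},
}

def mix_latent(choices, noise=0.1):
    """Assemble one controllable latent variable by sampling around the
    chosen class prior of every factor subspace, then concatenating."""
    parts = [priors[factor][cls] + noise * rng.normal(size=subspace_dim)
             for factor, cls in choices.items()]
    return np.concatenate(parts)

# Different factor mixtures yield different latent codes, hence styles.
z1 = mix_latent({"experience": "military", "era": "turbulent"})
z2 = mix_latent({"experience": "civilian", "era": "turbulent"})
print(z1.shape)  # (8,)
```

Conditioning a decoder on such mixed latents is what lets different factor combinations differentiate the generated poems from each other.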
Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model
Current captioning approaches tend to generate correct but "generic"
descriptions that lack real-world knowledge, e.g., named entities and
contextual information. Considering that Vision-Language Pre-Training (VLP)
models master massive such knowledge from large-scale web-harvested data, it is
promising to utilize the generalizability of VLP models to incorporate
knowledge into image descriptions. However, using VLP models faces challenges:
zero-shot inference suffers from knowledge hallucination, leading to
low-quality descriptions, while the generic bias introduced by downstream-task
fine-tuning hinders the VLP model from expressing its knowledge. To address these concerns, we
propose a simple yet effective method called Knowledge-guided Replay
(K-Replay), which enables the retention of pre-training knowledge during
fine-tuning. Our approach consists of two parts: (1) a knowledge prediction
task on automatically collected replay exemplars to continuously awaken the VLP
model's memory about knowledge, thus preventing the model from collapsing into
the generic pattern; (2) a knowledge distillation constraint to improve the
faithfulness of generated descriptions, thereby alleviating knowledge
hallucination. To evaluate knowledge-enhanced descriptions, we construct a
novel captioning benchmark KnowCap, containing knowledge of landmarks, famous
brands, special foods and movie characters. Experimental results show that our
approach effectively incorporates knowledge into descriptions, outperforming a
strong VLP baseline by 20.9 points (78.7->99.6) in CIDEr score and 20.5
percentage points (34.0%->54.5%) in knowledge recognition accuracy. Our code
and data are available at https://github.com/njucckevin/KnowCap. Comment: Accepted at ACM Multimedia (ACMMM) 202
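The two-part objective described above (a replay-based knowledge-prediction loss plus a distillation constraint) can be sketched as a combined loss. The weights and function names below are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kl_div(p, q, eps=1e-12):
    # KL(p || q) along the last axis.
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def k_replay_loss(finetune_loss, student_logits, teacher_logits,
                  replay_loss, alpha=1.0, beta=0.5):
    """Total loss = fine-tuning loss
                  + alpha * knowledge-prediction loss on replay exemplars
                  + beta  * distillation KL to the frozen pre-trained model.
    alpha and beta are illustrative weights, not the paper's values."""
    distill = kl_div(softmax(teacher_logits), softmax(student_logits)).mean()
    return finetune_loss + alpha * replay_loss + beta * distill

logits = np.array([[2.0, 0.5, -1.0]])
# With identical student/teacher logits the distillation term vanishes.
print(k_replay_loss(0.8, logits, logits, 0.3))
```

The replay term keeps pre-training knowledge active during fine-tuning, while the distillation term pulls the student's predictions toward the frozen pre-trained teacher.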
The power of question translation training in multilingual reasoning: Broadened scope and deepened insights
Bridging the significant gap between large language models' English and non-English performance presents a great challenge. While some previous studies attempt to mitigate this gap with translated training data, the recently proposed question alignment approach leverages the model's English expertise to improve multilingual performance with minimal use of expensive, error-prone translation. In this paper, we explore how broadly this method can be applied by examining its effects on reasoning with executable code and reasoning with common sense. We also explore how to apply this approach efficiently to extremely large language models using proxy-tuning. Experimental results on the multilingual reasoning benchmarks mGSM, mSVAMP and xCSQA demonstrate that the question alignment approach can boost multilingual performance across diverse reasoning scenarios, model families, and sizes. For instance, when applied to the LLaMA2 models, our method brings an average accuracy improvement of 12.2% on mGSM, even with the 70B model. To understand the mechanism of its success, we analyze the representation space, chain-of-thought outputs and translation data scales, revealing how question translation training strengthens language alignment within LLMs and shapes their working patterns.
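Proxy-tuning, mentioned above as the route to extremely large models, steers a large base model at decoding time by adding the logit offset between a small tuned expert and its untuned counterpart. A minimal sketch with toy logits (the vocabulary and all numbers are invented):

```python
import numpy as np

def proxy_tuned_logits(base, expert, anti_expert):
    """Proxy-tuning, sketched: shift the large base model's next-token
    logits by the difference between a small tuned expert and its untuned
    counterpart, so the base model inherits the tuning without training."""
    return base + (expert - anti_expert)

# Toy next-token logits over a 4-token vocabulary (numbers are invented).
base        = np.array([2.0, 1.0, 0.5, 0.0])  # large base model
anti_expert = np.array([1.5, 1.2, 0.4, 0.1])  # small untuned model
expert      = np.array([0.5, 3.0, 0.4, 0.1])  # small question-aligned model

steered = proxy_tuned_logits(base, expert, anti_expert)
print(int(steered.argmax()))  # 1: the expert's preferred token now wins
```

Only the two small models need training, which is why the approach scales to a 70B base model at negligible tuning cost.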
A fundamental diagram based interpretable framework for traffic flow estimation and prediction by combining a Markovian model with deep learning
UniHPE: Towards Unified Human Pose Estimation via Contrastive Learning
In recent times, there has been a growing interest in developing effective
perception techniques for combining information from multiple modalities. This
involves aligning features obtained from diverse sources to enable more
efficient training with larger datasets and constraints, as well as leveraging
the wealth of information contained in each modality. 2D and 3D Human Pose
Estimation (HPE) are two critical perceptual tasks in computer vision, which
have numerous downstream applications, such as action recognition,
human-computer interaction, and object tracking. Yet, the correlation between
images and 2D/3D human poses has rarely been explicitly studied using a
contrastive paradigm. In this paper, we propose
UniHPE, a unified Human Pose Estimation pipeline, which aligns features from
all three modalities, i.e., 2D human pose estimation, lifting-based and
image-based 3D human pose estimation, in the same pipeline. To align more than
two modalities at the same time, we propose a novel singular value based
contrastive learning loss, which better aligns different modalities and further
boosts the performance. In our evaluation, UniHPE achieves remarkable
performance metrics: MPJPE mm on the Human3.6M dataset and PAMPJPE
mm on the 3DPW dataset. Our proposed method holds immense potential to
advance the field of computer vision and contribute to various applications.
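The abstract does not spell out the singular-value-based loss, so the sketch below shows only the standard symmetric InfoNCE alignment between two paired modality embeddings that such a loss would extend; treat it as background, not UniHPE's actual objective:

```python
import numpy as np

def info_nce(a, b, temp=0.1):
    """Symmetric InfoNCE over two sets of paired embeddings (a[i] matches
    b[i]). A standard cross-modal alignment loss; UniHPE's singular-value
    variant is not reproduced here."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temp                       # pairwise similarities
    def nll_of_diagonal(l):
        log_p = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.diag(log_p).mean()
    return 0.5 * (nll_of_diagonal(logits) + nll_of_diagonal(logits.T))

rng = np.random.default_rng(2)
img  = rng.normal(size=(8, 16))                   # e.g. image features
pose = img + 0.01 * rng.normal(size=(8, 16))      # well-aligned pose features
rand = rng.normal(size=(8, 16))                   # unrelated features
print(info_nce(img, pose) < info_nce(img, rand))  # aligned pairs score lower
```

Minimizing such a loss pulls matched image/2D-pose/3D-pose embeddings together in the shared space, which is the alignment the pipeline above relies on.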
Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation
Learning-based methods have dominated the 3D human pose estimation (HPE)
tasks with significantly better performance in most benchmarks than traditional
optimization-based methods. Nonetheless, 3D HPE in the wild is still the
biggest challenge of learning-based models, whether with 2D-3D lifting,
image-to-3D, or diffusion-based methods, since the trained networks implicitly
learn camera intrinsic parameters and domain-specific 3D human pose
distributions and estimate poses as a statistical average. On the other hand, the
optimization-based methods estimate results case-by-case, which can predict
more diverse and sophisticated human poses in the wild. By combining the
advantages of optimization-based and learning-based methods, we propose the
Zero-shot Diffusion-based Optimization (ZeDO) pipeline for 3D HPE to solve the
problem of cross-domain and in-the-wild 3D HPE. Our multi-hypothesis ZeDO
achieves state-of-the-art (SOTA) performance on Human3.6M as minMPJPE mm
without training with any 2D-3D or image-3D pairs. Moreover, our
single-hypothesis ZeDO achieves SOTA performance on 3DPW dataset with PA-MPJPE
mm on cross-dataset evaluation, which even outperforms learning-based
methods trained on 3DPW.
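The optimization-based half of such a pipeline can be sketched as gradient descent on the 2D reprojection error of a 3D pose under a simple orthographic camera. ZeDO's diffusion prior and multi-hypothesis machinery are omitted; everything below is an illustrative assumption:

```python
import numpy as np

def refine_pose(pose3d_init, kp2d, steps=200, lr=0.1):
    """Optimization-based refinement, sketched: gradient descent on the 2D
    reprojection error of a 3D pose under an orthographic camera (projection
    simply drops the z coordinate). ZeDO additionally alternates such updates
    with a diffusion-model prior, which is omitted here."""
    pose = pose3d_init.copy()
    for _ in range(steps):
        residual = pose[:, :2] - kp2d      # per-joint 2D reprojection error
        grad = np.zeros_like(pose)
        grad[:, :2] = 2.0 * residual       # gradient of the squared error
        pose -= lr * grad
    return pose

rng = np.random.default_rng(3)
gt = rng.normal(size=(17, 3))                    # ground-truth 3D joints
kp2d = gt[:, :2]                                 # observed 2D keypoints
init = gt + 0.5 * rng.normal(size=(17, 3))       # noisy initial 3D estimate
refined = refine_pose(init, kp2d)
print(np.abs(refined[:, :2] - kp2d).max() < 1e-3)  # x, y fit the 2D evidence
```

In this toy the depth coordinate receives no gradient, which is precisely the ambiguity a learned prior (ZeDO's diffusion model) is needed to resolve.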