314 research outputs found

    Theoretic Analysis and Extremely Easy Algorithms for Domain Adaptive Feature Learning

    Domain adaptation problems arise in a variety of applications, where a training dataset from the source domain and a test dataset from the target domain typically follow different distributions. The primary difficulty in designing effective learning models for such problems lies in how to bridge the gap between the source and target distributions. In this paper, we provide a comprehensive analysis of feature learning algorithms used in conjunction with linear classifiers for domain adaptation. Our analysis shows that, to achieve good adaptation performance, the second moments of the source domain distribution and target domain distribution should be similar. Based on this new analysis, a novel, extremely easy feature learning algorithm for domain adaptation is proposed. Furthermore, the algorithm is extended by leveraging multiple layers, leading to a deep linear model. We evaluate the effectiveness of the proposed algorithms on domain adaptation tasks using the Amazon review dataset and the spam dataset from the ECML/PKDD 2006 discovery challenge.
    Comment: ijca
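
The second-moment matching idea above can be sketched as a whitening/re-coloring transform in the spirit of correlation alignment: whiten the source features with the inverse square root of their covariance, then re-color with the square root of the target covariance. This is a minimal illustration of matching second moments, not the paper's exact algorithm; the function name and the `eps` regularizer are assumptions.

```python
import numpy as np

def align_second_moments(Xs, Xt, eps=1e-6):
    """Re-color source features so their covariance matches the target's.

    Whitens the source with Cs^{-1/2}, then re-colors with Ct^{1/2},
    so cov(Xs_aligned) ~= cov(Xt). Illustrative sketch only.
    """
    Cs = np.cov(Xs, rowvar=False) + eps * np.eye(Xs.shape[1])
    Ct = np.cov(Xt, rowvar=False) + eps * np.eye(Xt.shape[1])
    # Symmetric matrix square roots via eigendecomposition.
    ds, Us = np.linalg.eigh(Cs)
    dt, Ut = np.linalg.eigh(Ct)
    Cs_inv_sqrt = Us @ np.diag(1.0 / np.sqrt(ds)) @ Us.T
    Ct_sqrt = Ut @ np.diag(np.sqrt(dt)) @ Ut.T
    Xs_centered = Xs - Xs.mean(axis=0)
    return Xs_centered @ Cs_inv_sqrt @ Ct_sqrt + Xt.mean(axis=0)
```

After the transform, a linear classifier trained on the aligned source features sees target-like second-order statistics.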

    Automation of the design of obstacle limitation surfaces

    Air safety is one of the most sensitive issues in the fascinating world of aeronautics. Although the casualty rate per accident in aviation is relatively low compared with other means of transport such as the car, the media echo and economic impact of an aircraft catastrophe are far greater. For this reason, international and national aviation organizations have created a set of regulations to reduce and prevent any possible causes that could lead to a disaster. The principal instrument of these regulations is the aeronautical easement, which protects the airspace of airport facilities and their vicinity by applying restrictions that ban the construction or presence of any object that would infringe on the safety of aircraft operations. This document describes the development of an application that designs airport easements in accordance with ICAO and BOE regulations and standards, which requires analysing and understanding those regulations. The application is written in the C# programming language, using Microsoft Visual Studio 2012 as the interface for user interaction. Finally, the relevant commands and .dll libraries are assimilated and implemented in the application's code so that the easements can be represented in the AUTOCAD environment.
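
A simplified sketch of the kind of check such an application performs: an obstacle limitation (approach) surface bounds the allowed elevation of objects as a linear function of distance from the runway threshold. The slope and elevation parameters below are hypothetical placeholders; real ICAO Annex 14 surfaces involve multiple sloped sections, lateral divergence, and runway-code-dependent dimensions.

```python
def approach_surface_limit(distance_m, inner_edge_elev_m=0.0, slope=0.02):
    """Maximum allowed obstacle elevation under a simplified, single-section
    approach surface. Parameters are illustrative, not ICAO values."""
    return inner_edge_elev_m + slope * distance_m

def penetrates(obstacle_elev_m, distance_m):
    """True if an obstacle at the given distance pierces the surface."""
    return obstacle_elev_m > approach_surface_limit(distance_m)
```

A real tool would evaluate this kind of constraint over the full 3D surface geometry before drawing it in the CAD environment.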

    Visualizing topological edge states of single and double bilayer Bi supported on multibilayer Bi(111) films

    Freestanding single-bilayer Bi(111) is a two-dimensional topological insulator with edge states propagating along its perimeter. However, given the experimentally observed interlayer coupling, the topological nature of Bi(111) thin films and the impact of the supporting substrate on the topmost Bi bilayer are still under debate. Here, combining scanning tunneling microscopy with first-principles calculations, we systematically study the electronic properties of Bi(111) thin films grown on a NbSe2 substrate. Two types of non-magnetic edge structures, i.e., a conventional zigzag edge and a 2x1 reconstructed edge, coexist alternately at the boundaries of single-bilayer islands, and their topological edge states exhibit remarkably different energy and spatial distributions. Prominent edge states are persistently visualized at the edges of both single- and double-bilayer Bi islands, regardless of the thickness of the underlying Bi(111) film. We provide an explanation for the topological origin of the observed edge states, verified with first-principles calculations. Our paper clarifies the long-standing controversy regarding the topology of Bi(111) thin films and reveals the tunability of topological edge states via edge modifications.
    Comment: 36 pages, 10 figures

    MixPoet: Diverse Poetry Generation via Learning Controllable Mixed Latent Space

    As an essential step towards computer creativity, automatic poetry generation has gained increasing attention in recent years. Though recent neural models make prominent progress on some criteria of poetry quality, generated poems still suffer from poor diversity. Studies in the related literature show that different factors, such as life experience and historical background, influence the composition styles of poets, which contributes considerably to the high diversity of human-authored poetry. Inspired by this, we propose MixPoet, a novel model that absorbs multiple factors to create various styles and promote diversity. Based on a semi-supervised variational autoencoder, our model disentangles the latent space into several subspaces, each conditioned on one influence factor by adversarial training. In this way, the model learns a controllable latent variable to capture and mix generalized factor-related properties. Different factor mixtures lead to diverse styles and hence further differentiate generated poems from each other. Experimental results on Chinese poetry demonstrate that MixPoet improves both diversity and quality against three state-of-the-art models.
    Comment: 8 pages, 5 figures, published in AAAI 202
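
The idea of mixing factor-conditioned latent subspaces can be illustrated schematically: each influence factor contributes one subspace code, a mixture weight blends the candidate codes for that factor, and the blended codes are concatenated into the full latent. This toy sketch (function name, shapes, and weighting are assumptions) ignores the VAE encoder/decoder and the adversarial training.

```python
import numpy as np

def mix_latent(factor_latents, weights):
    """Blend candidate codes per factor, then concatenate the subspaces.

    factor_latents: list of (num_candidates, subspace_dim) arrays,
                    one per influence factor.
    weights:        list of per-factor mixture weights (unnormalized).
    """
    parts = []
    for codes, w in zip(factor_latents, weights):
        w = np.asarray(w, dtype=float)
        w = w / w.sum()                        # normalize mixture weights
        parts.append(w @ np.asarray(codes))    # convex combination of codes
    return np.concatenate(parts)
```

Varying the weights here is the schematic analogue of choosing different factor mixtures to steer style.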

    Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model

    Current captioning approaches tend to generate correct but "generic" descriptions that lack real-world knowledge, e.g., named entities and contextual information. Considering that Vision-Language Pre-Training (VLP) models acquire massive amounts of such knowledge from large-scale web-harvested data, it is promising to leverage the generalizability of VLP models to incorporate knowledge into image descriptions. However, using VLP models faces challenges: zero-shot inference suffers from knowledge hallucination, which leads to low-quality descriptions, while the generic bias introduced by downstream fine-tuning hinders the VLP model from expressing its knowledge. To address these concerns, we propose a simple yet effective method called Knowledge-guided Replay (K-Replay), which enables the retention of pre-training knowledge during fine-tuning. Our approach consists of two parts: (1) a knowledge prediction task on automatically collected replay exemplars that continuously awakens the VLP model's memory of knowledge, preventing the model from collapsing into the generic pattern; and (2) a knowledge distillation constraint that improves the faithfulness of generated descriptions, alleviating the knowledge hallucination. To evaluate knowledge-enhanced descriptions, we construct a novel captioning benchmark, KnowCap, containing knowledge of landmarks, famous brands, special foods and movie characters. Experimental results show that our approach effectively incorporates knowledge into descriptions, outperforming a strong VLP baseline by 20.9 points (78.7->99.6) in CIDEr score and 20.5 percentage points (34.0%->54.5%) in knowledge recognition accuracy. Our code and data are available at https://github.com/njucckevin/KnowCap.
    Comment: Accepted at ACM Multimedia (ACMMM) 202
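
The distillation constraint in part (2) can be sketched as a standard KL term that pulls the fine-tuned (student) model's output distribution on replay exemplars toward the frozen pre-trained (teacher) distribution, added to the ordinary task loss. The weight `lam` and this exact formulation are assumptions for illustration, not K-Replay's published objective.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def replay_objective(task_loss, student_logits, teacher_logits, lam=1.0):
    """Fine-tuning task loss plus KL(teacher || student) on replay
    exemplars, discouraging drift from pre-trained knowledge."""
    p_t = softmax(teacher_logits)
    p_s = softmax(student_logits)
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1).mean()
    return task_loss + lam * kl
```

When the student's distribution matches the teacher's, the KL term vanishes and only the task loss drives training.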

    The power of question translation training in multilingual reasoning: Broadened scope and deepened insights

    Bridging the significant gap between large language models' English and non-English performance presents a great challenge. While some previous studies attempt to mitigate this gap with translated training data, the recently proposed question alignment approach leverages the model's English expertise to improve multilingual performance with minimal use of expensive, error-prone translation. In this paper, we explore how broadly this method can be applied by examining its effects on reasoning with executable code and reasoning with common sense. We also explore how to apply the approach efficiently to extremely large language models using proxy-tuning. Experimental results on the multilingual reasoning benchmarks mGSM, mSVAMP and xCSQA demonstrate that question alignment can boost multilingual performance across diverse reasoning scenarios, model families, and sizes. For instance, when applied to the LLaMA2 models, our method brings an average accuracy improvement of 12.2% on mGSM, even with the 70B model. To understand the mechanism of its success, we analyze the representation space, chain-of-thought, and translation data scales, which reveals how question translation training strengthens language alignment within LLMs and shapes their working patterns.
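
Proxy-tuning steers a large base model at decoding time by adding the logit offset between a small tuned model and its small untuned counterpart, so the large model never needs to be fine-tuned itself. A sketch of that logit arithmetic (variable names are illustrative):

```python
import numpy as np

def proxy_tuned_logits(base_large, base_small, tuned_small):
    """Combine next-token logits from three models:
    large base + (small tuned - small base).
    The small pair's offset acts as a proxy for tuning the large model."""
    return base_large + (tuned_small - base_small)
```

The combined logits are then softmaxed and sampled exactly as for a single model.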

    UniHPE: Towards Unified Human Pose Estimation via Contrastive Learning

    In recent times, there has been a growing interest in developing effective perception techniques for combining information from multiple modalities. This involves aligning features obtained from diverse sources to enable more efficient training with larger datasets and constraints, as well as leveraging the wealth of information contained in each modality. 2D and 3D Human Pose Estimation (HPE) are two critical perceptual tasks in computer vision, which have numerous downstream applications, such as Action Recognition, Human-Computer Interaction, Object Tracking, etc. Yet, there are limited instances where the correlation between Image and 2D/3D human pose has been clearly researched using a contrastive paradigm. In this paper, we propose UniHPE, a unified Human Pose Estimation pipeline, which aligns features from all three modalities, i.e., 2D human pose estimation, lifting-based and image-based 3D human pose estimation, in the same pipeline. To align more than two modalities at the same time, we propose a novel singular-value-based contrastive learning loss, which better aligns different modalities and further boosts the performance. In our evaluation, UniHPE achieves remarkable performance metrics: MPJPE 50.5mm on the Human3.6M dataset and PA-MPJPE 51.6mm on the 3DPW dataset. Our proposed method holds immense potential to advance the field of computer vision and contribute to various applications.
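
The paper's singular-value-based loss is designed to align more than two modalities at once; as a point of comparison, the standard two-modality baseline it generalizes is a symmetric InfoNCE objective over paired embeddings, sketched below. The temperature and normalization details are assumptions, and this is not UniHPE's actual loss.

```python
import numpy as np

def info_nce_pair(A, B, temp=0.1):
    """Symmetric InfoNCE between two batches of paired embeddings:
    row i of A should match row i of B and mismatch all other rows."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    logits = (A @ B.T) / temp                  # pairwise cosine similarities

    def ce(l):
        # Cross-entropy with the diagonal as the positive targets.
        m = l.max(axis=1, keepdims=True)
        log_z = (m + np.log(np.exp(l - m).sum(axis=1, keepdims=True))).squeeze(1)
        return (log_z - np.diag(l)).mean()

    return 0.5 * (ce(logits) + ce(logits.T))
```

Extending this pairwise form to three modalities requires either summing pairwise losses or a joint objective such as the singular-value-based one proposed in the paper.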

    Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation

    Learning-based methods have dominated the 3D human pose estimation (HPE) tasks with significantly better performance in most benchmarks than traditional optimization-based methods. Nonetheless, 3D HPE in the wild is still the biggest challenge of learning-based models, whether with 2D-3D lifting, image-to-3D, or diffusion-based methods, since the trained networks implicitly learn camera intrinsic parameters and domain-based 3D human pose distributions and estimate poses by statistical average. On the other hand, the optimization-based methods estimate results case-by-case, which can predict more diverse and sophisticated human poses in the wild. By combining the advantages of optimization-based and learning-based methods, we propose the Zero-shot Diffusion-based Optimization (ZeDO) pipeline for 3D HPE to solve the problem of cross-domain and in-the-wild 3D HPE. Our multi-hypothesis ZeDO achieves state-of-the-art (SOTA) performance on Human3.6M with minMPJPE 51.4mm without training with any 2D-3D or image-3D pairs. Moreover, our single-hypothesis ZeDO achieves SOTA performance on the 3DPW dataset with PA-MPJPE 42.6mm on cross-dataset evaluation, which even outperforms learning-based methods trained on 3DPW.
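
The optimization half of such a pipeline can be caricatured as iteratively adjusting a 3D pose to minimize its 2D reprojection error. The toy loop below uses an orthographic projection and plain gradient descent, and omits the learned diffusion prior that ZeDO uses to keep poses plausible; all names and parameters are illustrative.

```python
import numpy as np

def refine_pose(pose3d_init, target2d, steps=200, lr=0.1):
    """Gradient descent on 3D joint positions to minimize squared
    orthographic reprojection error against 2D keypoints.

    pose3d_init: (J, 3) initial 3D joints.
    target2d:    (J, 2) observed 2D keypoints.
    """
    P = pose3d_init.astype(float).copy()
    for _ in range(steps):
        resid = P[:, :2] - target2d       # orthographic projection: drop z
        P[:, :2] -= lr * 2.0 * resid      # gradient step on squared error
    return P
```

Under orthographic projection the depth coordinate is unconstrained by this objective, which is exactly the ambiguity a pose prior (diffusion-based in ZeDO) must resolve.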