52 research outputs found
Return of Frustratingly Easy Domain Adaptation
Unlike human learning, machine learning often fails to handle changes between
training (source) and test (target) input distributions. Such domain shifts,
common in practical scenarios, severely damage the performance of conventional
machine learning methods. Supervised domain adaptation methods have been
proposed for the case when the target data have labels, including some that
perform very well despite being "frustratingly easy" to implement. However, in
practice, the target domain is often unlabeled, requiring unsupervised
adaptation. We propose a simple, effective, and efficient method for
unsupervised domain adaptation called CORrelation ALignment (CORAL). CORAL
minimizes domain shift by aligning the second-order statistics of source and
target distributions, without requiring any target labels. Even though it is
extraordinarily simple--it can be implemented in four lines of Matlab
code--CORAL performs remarkably well in extensive evaluations on standard
benchmark datasets.
Comment: Fixed typos. Full paper to appear in AAAI-16. Extended Abstract of
the full paper to appear in TASK-CV 2015 workshop
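The abstract above describes aligning the second-order statistics of the source and target features. A minimal sketch of one way to do this, assuming a whiten-then-recolor formulation in NumPy, is given below. The function names, the `reg` parameter, and the use of an eigendecomposition for the matrix powers are illustrative assumptions, not the authors' Matlab code.

```python
import numpy as np


def _sym_matrix_power(mat, power):
    """Raise a symmetric positive-definite matrix to a real power (eigendecomposition)."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    # V * diag(eigvals**power) * V^T
    return (eigvecs * eigvals**power) @ eigvecs.T


def coral_align(source, target, reg=1.0):
    """Transform source features so their covariance approximates the target's.

    source: array of shape (n_source, d), unlabeled or labeled source features.
    target: array of shape (n_target, d), unlabeled target features.
    reg:    weight of the identity added to each covariance (assumed regularizer).
    """
    d = source.shape[1]
    cov_s = np.cov(source, rowvar=False) + reg * np.eye(d)
    cov_t = np.cov(target, rowvar=False) + reg * np.eye(d)
    whitened = source @ _sym_matrix_power(cov_s, -0.5)  # remove source correlations
    return whitened @ _sym_matrix_power(cov_t, 0.5)     # re-color with target correlations


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xs = rng.normal(size=(200, 3))
    xt = rng.normal(size=(300, 3)) @ np.diag([1.0, 2.0, 3.0])
    xs_aligned = coral_align(xs, xt)
    print(xs_aligned.shape)
```

A classifier would then be trained on the aligned source features with the source labels and applied to the target features unchanged; no target labels are used at any step.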
Influence of Polygonal Wear on Dynamic Performance of Wheels on High-Speed Trains
With increases in train speed and traffic density, polygonal wear of railway wheels arises, induced by the high impacts between wheels and rails, a problem that mainly concerns the operational safety and ride comfort of the vehicle system. This work evaluates the effect of wheel polygon shape on the dynamic performance of the wheel set through numerical simulations. A finite element model, which includes the wheel set and the slab track, was established using ANSYS software to study the effects of polygonal wear on the dynamic behavior of the railway wheel. In the model, the wheel-rail interaction forces caused by the polygonal wheel shape were solved using Universal Mechanisms of wear and were then entered into the finite element model. Using the simulation model, the influence of the harmonic order and out-of-roundness (OOR) amplitude of the wheel polygon on the transient dynamic behavior of the wheels, namely the displacement, acceleration, and von Mises equivalent stress, was investigated. The results indicate that both the maximum dynamic displacement and the von Mises equivalent stress of the wheel plate are proportional to the OOR amplitude, the harmonic order, and the vehicle velocity. In addition, the maximum von Mises equivalent stress occurs close to the wheel center, whereas the maximum displacement occurs close to the wheel tread. The findings will provide a theoretical basis for on-board detection methods of monitoring wheel polygonal wear.
Learning Deep Object Detectors from 3D Models
Crowdsourced 3D CAD models are becoming easily accessible online, and can
potentially generate an infinite number of training images for almost any
object category. We show that augmenting the training data of contemporary Deep
Convolutional Neural Net (DCNN) models with such synthetic data can be
effective, especially when real training data is limited or not well matched to
the target domain. Most freely available CAD models capture 3D shape but are
often missing other low level cues, such as realistic object texture, pose, or
background. In a detailed analysis, we use synthetic CAD-rendered images to
probe the ability of DCNN to learn without these cues, with surprising
findings. In particular, we show that when the DCNN is fine-tuned on the target
detection task, it exhibits a large degree of invariance to missing low-level
cues, but, when pretrained on generic ImageNet classification, it learns better
when the low-level cues are simulated. We show that our synthetic DCNN training
approach significantly outperforms previous methods on the PASCAL VOC2007
dataset when learning in the few-shot scenario and improves performance in a
domain shift scenario on the Office benchmark.
LOWA: Localize Objects in the Wild with Attributes
We present LOWA, a novel method for localizing objects with attributes
effectively in the wild. It aims to address the insufficiency of current
open-vocabulary object detectors, which are limited by the lack of
instance-level attribute classification and rare class names. To train LOWA, we
propose a hybrid vision-language training strategy to learn object detection
and recognition with class names as well as attribute information. With LOWA,
users can not only detect objects by class name but also localize
objects by attributes. LOWA is built on top of a two-tower vision-language
architecture and consists of a standard vision transformer as the image encoder
and a similar transformer as the text encoder. To learn the alignment between
visual and text inputs at the instance level, we train LOWA with three training
steps: object-level training, attribute-aware learning, and free-text joint
training of objects and attributes. This hybrid training strategy first ensures
correct object detection, then incorporates instance-level attribute
information, and finally balances the object class and attribute sensitivity.
We evaluate our model's performance on attribute classification and attribute
localization on the Open-Vocabulary Attribute Detection (OVAD) benchmark and
the Visual Attributes in the Wild (VAW) dataset, and experiments indicate
strong zero-shot performance. Ablation studies additionally demonstrate the
effectiveness of each training step of our approach.
Railway Polygonized Wheel Detection Based on Numerical Time-Frequency Analysis of Axle-Box Acceleration
The increasing need for repairs of polygonized wheels on high-speed railways in China is becoming problematic. At high speeds, polygonized wheels cause abnormal vibrations at the wheel-rail interface that can be detected via axle-box accelerations. To investigate the quantitative relationship between axle-box acceleration and wheel polygonization in both the time and frequency domains under high-speed conditions, a dynamics model was developed that simulates the vehicle-track coupling system and accounts for both wheel and track flexibility. The calculated axle-box accelerations were analyzed using improved ensemble empirical mode decomposition and the Wigner-Ville distribution time-frequency method. The numerical results show that the maximum axle-box accelerations and their frequencies are quantitatively related to the harmonic order and out-of-roundness amplitude of polygonized wheels. In addition, measuring the axle-box acceleration enables both the detection of wheel polygonization and the identification of the degree of damage.
Document type: Article
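As a rough illustration of why the excitation frequency depends on the harmonic order, the sketch below assumes an idealized polygonized wheel whose radius varies sinusoidally around the circumference and which rolls without slip. Under that assumption the wheel produces one excitation cycle per polygon side per revolution. The function names and the 0.46 m radius are hypothetical and are not taken from the paper's vehicle-track model.

```python
import math


def polygon_radius(theta, nominal_radius_m, oor_amplitude_m, harmonic_order):
    """Idealized wheel radius at angle theta (rad): a single sinusoidal harmonic."""
    return nominal_radius_m + oor_amplitude_m * math.sin(harmonic_order * theta)


def polygon_excitation_frequency(harmonic_order, speed_kmh, wheel_radius_m):
    """Excitation frequency (Hz) of an N-th order polygon under pure rolling."""
    speed_ms = speed_kmh / 3.6                                  # km/h to m/s
    rotation_hz = speed_ms / (2.0 * math.pi * wheel_radius_m)   # wheel revolutions per second
    return harmonic_order * rotation_hz


if __name__ == "__main__":
    # Hypothetical values: 20th-order polygon, 300 km/h, 0.46 m wheel radius.
    freq = polygon_excitation_frequency(20, 300.0, 0.46)
    print(f"{freq:.1f} Hz")
```

In this simplified picture the frequency scales linearly with both harmonic order and speed, while the out-of-roundness amplitude changes the size of the radial deviation and not its frequency.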
Evaluation and Mitigation of Agnosia in Multimodal Large Language Models
While Multimodal Large Language Models (MLLMs) are widely used for a variety
of vision-language tasks, one observation is that they sometimes misinterpret
visual inputs or fail to follow textual instructions even in straightforward
cases, leading to irrelevant responses, mistakes, and ungrounded claims. This
observation is analogous to a phenomenon in neuropsychology known as Agnosia,
an inability to correctly process sensory modalities and recognize things
(e.g., objects, colors, relations). In our study, we adapt this similar concept
to define "agnosia in MLLMs", and our goal is to comprehensively evaluate and
mitigate such agnosia in MLLMs. Inspired by the diagnosis and treatment process
in neuropsychology, we propose a novel framework EMMA (Evaluation and
Mitigation of Multimodal Agnosia). In EMMA, we develop an evaluation module
that automatically creates fine-grained and diverse visual question answering
examples to assess the extent of agnosia in MLLMs comprehensively. We also
develop a mitigation module to reduce agnosia in MLLMs through multimodal
instruction tuning on fine-grained conversations. To verify the effectiveness
of our framework, we evaluate and analyze agnosia in seven state-of-the-art
MLLMs using 9K test samples. The results reveal that most of them exhibit
agnosia across various aspects and degrees. We further develop a fine-grained
instruction set and tune MLLMs to mitigate agnosia, which led to notable
improvement in accuracy.
Phylophenetic properties of metabolic pathway topologies as revealed by global analysis
BACKGROUND: As phenotypic features derived from heritable characters, the topologies of metabolic pathways contain both phylogenetic and phenetic components. In the post-genomic era, it is possible to measure the "phylophenetic" content of different pathway topologies from a global perspective. RESULTS: We reconstructed phylophenetic trees for all available metabolic pathways based on topological similarities, and compared them to the corresponding 16S rRNA-based trees. Similarity values for each pair of trees ranged from 0.044 to 0.297. Using the quartet method, single-pathway trees were merged into a comprehensive tree containing information from a large part of the entire metabolic network. This tree showed considerably higher similarity (0.386) to the corresponding 16S rRNA-based tree than any tree based on a single pathway, but was, on the other hand, sufficiently distinct to preserve unique phylogenetic information not reflected by the 16S rRNA tree. CONCLUSION: We observed that the topologies of different metabolic pathways provided different phylogenetic and phenetic information, depicting the compromise between phylogenetic information and the varying evolutionary pressures forming metabolic pathway topologies in different organisms. The phylogenetic information content of the comprehensive tree is substantially higher than that of any tree based on a single pathway, which also gave clues to constraints acting on the topology of the global metabolic network, information that is only partly reflected by the topologies of individual metabolic pathways.
Joint Adversarial Domain Adaptation
Domain adaptation aims to transfer the enriched label knowledge from large amounts of source data to unlabeled target data. It has raised significant interest in multimedia analysis. Existing research mainly focuses on learning domain-wise transferable representations via statistical moment matching or adversarial adaptation techniques, while ignoring the class-wise mismatch across domains, resulting in inaccurate distribution alignment. To address this issue, we propose a Joint Adversarial Domain Adaptation (JADA) approach to simultaneously align domain-wise and class-wise distributions across source and target in a unified adversarial learning process. Specifically, JADA attempts to solve two complementary minimax problems jointly. The feature generator aims not only to fool the well-trained domain discriminator in order to learn domain-invariant features, but also to minimize the disagreement between the predictions of two distinct task-specific classifiers, so that target features are synthesized near the support of the source in a class-wise manner. As a result, the learned transferable features will be equipped with more discriminative structures and will effectively avoid mode collapse. Additionally, JADA enables efficient end-to-end training via a simple back-propagation scheme. Extensive experiments on several real-world cross-domain benchmarks, including VisDA-2017, ImageCLEF, Office-31 and digits, verify that JADA can gain remarkable improvements over other state-of-the-art deep domain adaptation approaches.