799 research outputs found
The Application of Two-level Attention Models in Deep Convolutional Neural Network for Fine-grained Image Classification
Fine-grained classification is challenging because categories can only be
discriminated by subtle and local differences. Variations in pose, scale or
rotation usually make the problem more difficult. Most fine-grained
classification systems follow the pipeline of finding the foreground object or
object parts (where) to extract discriminative features (what).
In this paper, we propose to apply visual attention to the fine-grained
classification task using deep neural networks. Our pipeline integrates three
types of attention: the bottom-up attention that proposes candidate patches, the
object-level top-down attention that selects patches relevant to a certain
object, and the part-level top-down attention that localizes discriminative
parts. We combine these attentions to train domain-specific deep nets, then use
them to improve both the what and where aspects. Importantly, we avoid using
expensive annotations such as bounding boxes or part information anywhere in
the end-to-end pipeline. This weak supervision constraint makes our work easier
to generalize.
We have verified the effectiveness of the method on subsets of the ILSVRC2012
dataset and on the CUB200_2011 dataset. Our pipeline delivered significant
improvements and achieved the best accuracy under the weakest supervision
condition. Its performance is competitive with other methods that rely on
additional annotations.
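The object-level top-down step (keep only the bottom-up patches relevant to the object, then classify from them) can be illustrated with a minimal sketch. It assumes the bottom-up proposals and both networks already exist and are represented only by their outputs: a hypothetical per-patch object score from a filter network and per-patch class probabilities from a classifier. The 0.5 threshold, the averaging rule and the helper names are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np


def object_level_filter(patch_object_scores, threshold=0.5):
    """Return indices of candidate patches judged relevant to the object.

    patch_object_scores is a hypothetical per-patch probability, e.g. from a
    filter network, that a bottom-up candidate patch shows the object.
    """
    scores = np.asarray(patch_object_scores, dtype=float)
    selected = np.flatnonzero(scores >= threshold)
    # Fall back to all patches if none pass, so the average below is defined.
    return selected if selected.size > 0 else np.arange(scores.size)


def classify_from_patches(patch_class_probs, selected):
    """Average class probabilities over the selected patches; return (label, mean)."""
    probs = np.asarray(patch_class_probs, dtype=float)[selected]
    mean_probs = probs.mean(axis=0)
    return int(np.argmax(mean_probs)), mean_probs


# Toy example: four bottom-up candidate patches, two classes.
object_scores = [0.9, 0.2, 0.7, 0.1]
class_probs = [[0.6, 0.4], [0.1, 0.9], [0.7, 0.3], [0.2, 0.8]]

selected = object_level_filter(object_scores)
label, mean_probs = classify_from_patches(class_probs, selected)
print(selected.tolist(), label, mean_probs)
```

Here patches 0 and 2 pass the filter and the averaged prediction is class 0; averaging over all four patches instead gives [0.4, 0.6] and predicts class 1, so discarding background-like patches changes the decision.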
Attentional Neural Network: Feature Selection Using Cognitive Feedback
Attentional Neural Network is a new framework that integrates top-down
cognitive bias and bottom-up feature extraction in one coherent architecture.
The top-down influence is especially effective when dealing with high noise or
difficult segmentation problems. Our system is modular and extensible. It is
also easy to train and cheap to run, and yet can accommodate complex behaviors.
We obtain classification accuracy better than or competitive with
state-of-the-art results on the MNIST variation dataset, and successfully
disentangle overlaid digits with high success rates. We view such a
general-purpose framework as an essential foundation for a larger system
emulating the cognitive abilities of the whole brain.
Comment: Poster in Neural Information Processing Systems (NIPS) 201
Spectral Unsupervised Domain Adaptation for Visual Recognition
Unsupervised domain adaptation (UDA) aims to learn a well-performing model for
an unlabeled target domain by leveraging labeled data from one or multiple
related source domains. It remains a great challenge due to 1) the lack of
annotations in the target domain and 2) the large discrepancy between the
distributions of source and target data. We propose Spectral UDA (SUDA), an
efficient yet effective UDA technique that works in the spectral space and is
generic across visual recognition tasks including detection, classification
and segmentation. SUDA addresses UDA challenges from two perspectives. First,
it mitigates inter-domain discrepancies with a spectrum transformer (ST) that
maps source and target images into spectral space and learns to enhance
domain-invariant spectra while simultaneously suppressing domain-variant
spectra. To this end, we design a novel adversarial multi-head spectrum
attention that leverages contextual information to identify domain-variant and
domain-invariant spectra effectively. Second, it mitigates the lack of
annotations in the target domain by introducing multi-view spectral learning,
which aims to learn comprehensive yet confident target representations by
maximizing the mutual information among multiple ST augmentations capturing
different spectral views of each target sample. Extensive experiments over
different visual tasks (e.g., detection, classification and segmentation) show
that SUDA achieves superior accuracy and that it is complementary to
state-of-the-art UDA methods, giving consistent performance boosts with little
extra computation.
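To make "enhancing some spectra while suppressing others" concrete, here is a minimal NumPy sketch. It assumes the spectral space is the 2-D discrete Fourier transform and replaces SUDA's learned adversarial multi-head spectrum attention with a fixed, hand-chosen mask; `spectral_reweight` and `radial_mask` are hypothetical names used only for illustration.

```python
import numpy as np


def spectral_reweight(image, mask):
    """Map a 2-D image to the frequency domain, scale each frequency, map back.

    mask has the same shape as the image; an entry near 1 keeps (enhances)
    that frequency and an entry near 0 suppresses it. In SUDA this weighting
    is learned; here it is a fixed array (an assumption for illustration).
    """
    spectrum = np.fft.fft2(image)
    return np.real(np.fft.ifft2(spectrum * mask))


def radial_mask(shape, radius):
    """Binary mask keeping frequencies within `radius` of the DC component.

    Uses the unshifted FFT layout, where integer frequencies wrap around.
    """
    fy = np.fft.fftfreq(shape[0]) * shape[0]
    fx = np.fft.fftfreq(shape[1]) * shape[1]
    dist = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    return (dist <= radius).astype(float)


rng = np.random.default_rng(0)
image = rng.random((8, 8))

# An all-ones mask leaves the image unchanged (up to rounding).
identity = spectral_reweight(image, np.ones_like(image))
# Keeping only low frequencies yields a smoothed version of the image.
smoothed = spectral_reweight(image, radial_mask(image.shape, radius=2))
```

Because the radial mask is symmetric under negating frequencies, the filtered spectrum stays conjugate-symmetric and the inverse transform is real; `np.real` only discards rounding noise. Keeping just the DC term (radius 0) collapses the image to its mean value.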
Facile Preparation of Bimetallic MOF-derived Supported Tungstophosphoric Acid Composites for Biodiesel Production
In this work, a novel TPA@C-NiZr-MOF catalyst is synthesized by impregnating tungstophosphoric acid (TPA) on a NiZr-based metal-organic framework (NiZr-MOF), followed by calcination at up to 300 °C. The as-prepared catalyst materials were characterized structurally, morphologically, and texturally by XRD, FTIR, temperature-programmed desorption of NH3 (TPD-NH3), N2 physisorption, SEM, TEM, and XPS. The prepared catalyst can serve as an efficient heterogeneous catalyst for biodiesel production from oleic acid (OA) with methanol. The results indicate that, compared with TPA@NiZr-MOF, the TPA@C-NiZr-MOF catalyst calcined at 300 °C exhibits excellent catalytic performance, probably owing to the synergistic effect between TPA and the metal oxide skeletons, its high acidity, and its larger surface area and pore size. Additionally, the TPA@C-NiZr-MOF catalyst can be reused for up to six cycles with acceptable conversion. This study shows that bimetallic MOF-derived composite materials are potential alternative heterogeneous catalysts for biorefinery applications.
Vision-Language Models for Vision Tasks: A Survey
Most visual recognition studies rely heavily on crowd-labelled data for training
deep neural networks (DNNs), and they usually train a separate DNN for each
visual recognition task, leading to a laborious and time-consuming visual
recognition paradigm. To address these two challenges, Vision-Language Models
(VLMs) have recently been investigated intensively. VLMs learn rich
vision-language correlations from web-scale image-text pairs that are almost
infinitely available on the Internet, and they enable zero-shot predictions on
various visual recognition tasks with a single VLM. This paper provides a
systematic review of vision-language models for various visual recognition
tasks, including: (1) the background that introduces the development of visual
recognition paradigms; (2) the foundations of VLMs that summarize the
widely adopted network architectures, pre-training objectives, and downstream
tasks; (3) the widely adopted datasets in VLM pre-training and evaluation; (4)
the review and categorization of existing VLM pre-training methods, VLM
transfer learning methods, and VLM knowledge distillation methods; (5) the
benchmarking, analysis and discussion of the reviewed methods; and (6) several
research challenges and potential research directions that could be pursued in
future VLM studies for visual recognition. A project associated with this
survey has been created at https://github.com/jingyi0000/VLM_survey
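Zero-shot prediction with a single VLM is commonly done by comparing an image embedding with text embeddings of class prompts. The sketch below assumes the embeddings already come from a pre-trained VLM's image and text encoders (here they are hand-set toy vectors) and scores classes by cosine similarity with an assumed temperature of 0.01; `zero_shot_classify` is a hypothetical helper, not an API from the survey's repository.

```python
import numpy as np


def zero_shot_classify(image_embedding, text_embeddings, temperature=0.01):
    """Return class probabilities from cosine similarity to class-prompt embeddings.

    image_embedding: shape (d,), assumed to come from a VLM image encoder.
    text_embeddings: shape (num_classes, d), one row per prompt such as
    "a photo of a {class}", assumed to come from the matching text encoder.
    """
    img = np.asarray(image_embedding, dtype=float)
    txt = np.asarray(text_embeddings, dtype=float)
    img = img / np.linalg.norm(img)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    logits = (txt @ img) / temperature
    logits -= logits.max()  # subtract the max for a numerically stable softmax
    probs = np.exp(logits)
    return probs / probs.sum()


class_names = ["cat", "dog", "car"]
# Toy embeddings standing in for real encoder outputs.
text_embeddings = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
image_embedding = [1.0, 0.2, 0.0]

probs = zero_shot_classify(image_embedding, text_embeddings)
prediction = class_names[int(np.argmax(probs))]
```

No classifier is trained for these three classes; changing the label set only requires encoding new prompts, which is what lets one VLM serve many recognition tasks.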
The Validity of CET-6 among Chinese Students Studying Overseas
This paper examines the validity of the College English Test Band 6 (CET-6) for Chinese students living overseas, to find out whether CET-6 scores truly reflect students’ English language ability and whether CET-6 scores can be used as proof of English language proficiency. We conducted a quantitative survey with a sample of 50 participants at Universiti Putra Malaysia (UPM). After collecting and analysing the data, we identified several current issues with the CET-6 assessment standards and offer suggestions to improve the validity of CET-6.
- …