Learning Culture: Cultural Relationship in Masked Lanterns
Culture shock, or culture conflict, is the unfamiliarity or disorientation an individual experiences after encountering a culture different from their own. To better understand the people around us who share a different culture and the way of life it creates, we need first to respect and understand their culture. In general, Chinese culture stresses that individuals must see themselves as part of a larger group for the benefit of society, while American culture stresses the importance of individualism.
Based on my experiences in graphic design, I decided to further my studies in a studio art context to understand how the cultures of artists affect their artwork. It is important for us to have a basic sense of other cultures in order to appreciate the value of human development, as well as to appreciate the different forms of beauty that make the world more interesting to explore. This appreciation of beauty and human development is often encountered when experiencing works of art and design.
The issues which arise from cyberbullying reach across the globe, and I have seen them firsthand throughout my life. It is my goal to delve into this issue and compare how individuals from different cultural backgrounds react to it. For this project, I surveyed responses to the issue of cyberbullying from American, Chinese-American, and Chinese international students, in order to understand how one's culture influences the opinions people form on the issue. I created several artworks to share my results with the viewer.
The CLIP Model is Secretly an Image-to-Prompt Converter
The Stable Diffusion model is a prominent text-to-image generation model that relies on a text prompt as its input, which is encoded using the Contrastive Language-Image Pre-Training (CLIP) model. However, text prompts have limitations when it comes to incorporating implicit information from reference images. Existing methods have attempted to address this limitation by employing expensive training procedures involving millions of training samples for image-to-image generation. In contrast, this paper demonstrates that the CLIP model, as utilized in Stable Diffusion, inherently possesses the ability to instantaneously convert images into text prompts. Such an image-to-prompt conversion can be achieved by utilizing a linear projection matrix that is calculated in closed form. Moreover, the paper showcases that this capability can be further enhanced by either utilizing a small amount of similar-domain training data (approximately 100 images) or incorporating several online training steps (around 30 iterations) on the reference images. By leveraging these approaches, the proposed method offers a simple and flexible solution to bridge the gap between images and text prompts. This methodology can be applied to various tasks such as image variation and image editing, facilitating more effective and seamless interaction between images and textual prompts.
Comment: Accepted by NeurIPS 2023, 21 pages, 28 figures
WordSup: Exploiting Word Annotations for Character based Text Detection
Imagery text is usually organized as a hierarchy of visual elements, i.e. characters, words, text lines, and text blocks. Among these elements, the character is the most basic one across writing systems such as Western scripts, Chinese, Japanese, and mathematical expressions. It is therefore natural and convenient to build a common text detection engine on top of character detectors. However, training character detectors requires a vast number of location-annotated characters, which are expensive to obtain; in practice, existing real text datasets are mostly annotated at the word or line level. To remedy this dilemma, we propose a weakly supervised framework that can utilize word annotations, either tight quadrangles or looser bounding boxes, for character detector training. When applied to scene text detection, we are thus able to train a robust character detector by exploiting word annotations in rich large-scale real scene text datasets, e.g. ICDAR15 and COCO-text. The character detector plays a key role in the pipeline of our text detection engine, which achieves state-of-the-art performance on several challenging scene text detection benchmarks. We also demonstrate the flexibility of our pipeline in various scenarios, including deformed text detection and math expression recognition.
Comment: 2017 International Conference on Computer Vision
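As a rough illustration of how word-level annotations can supervise a character detector (the paper's actual framework scores and refines candidates more carefully than this), the sketch below keeps confident character predictions whose centers fall inside annotated word regions and reuses them as pseudo ground truth; all names are hypothetical.

```python
# Hedged sketch of the weak-supervision idea: character predictions become
# pseudo labels only when a word-level annotation vouches for their location,
# and the filtered set is used to retrain the character detector.
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def inside(char_box: Box, word_box: Box) -> bool:
    """True if the character box center lies inside the word box."""
    cx = (char_box[0] + char_box[2]) / 2
    cy = (char_box[1] + char_box[3]) / 2
    return word_box[0] <= cx <= word_box[2] and word_box[1] <= cy <= word_box[3]

def char_pseudo_labels(pred_chars: List[Tuple[Box, float]],
                       word_boxes: List[Box],
                       min_score: float = 0.5) -> List[Box]:
    """Filter confident character predictions by word-level annotations."""
    return [box for box, score in pred_chars
            if score >= min_score and any(inside(box, w) for w in word_boxes)]
```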
Don't Stop Learning: Towards Continual Learning for the CLIP Model
The Contrastive Language-Image Pre-training (CLIP) model is a recently proposed large-scale pre-trained model which has attracted increasing attention in the computer vision community. Benefiting from its gigantic image-text training set, the CLIP model has learned outstanding capabilities in zero-shot learning and image-text matching. To boost the recognition performance of CLIP on some target visual concepts, it is often desirable to further update the CLIP model by fine-tuning on classes of interest with extra training data. This operation, however, raises an important concern: will the update hurt the zero-shot learning or image-text matching capability of CLIP, i.e., cause catastrophic forgetting? If so, could existing continual learning algorithms be adapted to alleviate this risk? To answer these questions, this work conducts a systematic study of the continual learning issue of the CLIP model. We construct evaluation protocols to measure the impact of fine-tuning updates and explore different ways to upgrade existing continual learning methods to mitigate forgetting in the CLIP model. Our study reveals the particular challenges of the CLIP continual learning problem and lays a foundation for further research. Moreover, we propose a new algorithm, dubbed Learning without Forgetting via Replayed Vocabulary (VR-LwF), which proves effective for alleviating the forgetting issue of the CLIP model.
Comment: 12 pages, 5 figures
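The abstract does not spell out VR-LwF, so the following is only a hedged guess at the general shape of a replayed-vocabulary distillation term in the Learning-without-Forgetting style: the frozen pre-update CLIP and the model being fine-tuned both score images against a replayed word list, and their matching distributions are kept close. All names are hypothetical.

```python
# Hedged sketch of a replayed-vocabulary distillation term, illustrating the
# LwF-style idea only; it is not VR-LwF's published formulation.
import torch
import torch.nn.functional as F

def vocab_distill_loss(image_feats: torch.Tensor,
                       vocab_feats_old: torch.Tensor,
                       vocab_feats_new: torch.Tensor,
                       temperature: float = 0.07) -> torch.Tensor:
    """KL between old/new image-to-vocabulary matching distributions.

    image_feats:     (B, D) image embeddings from the model being fine-tuned
    vocab_feats_old: (V, D) replayed-word embeddings from the frozen old CLIP
    vocab_feats_new: (V, D) embeddings of the same words from the current model
    """
    img = F.normalize(image_feats, dim=-1)
    old_logits = img @ F.normalize(vocab_feats_old, dim=-1).T / temperature
    new_logits = img @ F.normalize(vocab_feats_new, dim=-1).T / temperature
    return F.kl_div(F.log_softmax(new_logits, dim=-1),
                    F.softmax(old_logits, dim=-1),
                    reduction="batchmean")
```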
Dynamic Model Identification for 6-DOF Industrial Robots
A complete and systematic procedure for the dynamic parameter identification of an industrial robot manipulator is presented. The system model of the robot, including a joint friction model, is linear with respect to the dynamic parameters. Identification experiments are carried out on a 6-degree-of-freedom (DOF) ER-16 robot. Relevant data are sampled while the robot tracks optimized trajectories designed to excite the system. The artificial bee colony algorithm is introduced to estimate the unknown parameters, and we validate the dynamic model by its torque prediction accuracy. All results are presented to demonstrate the efficiency of the proposed identification algorithm and the accuracy of the identified robot model.
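Because the model is linear in the dynamic parameters, the stacked data obey tau = Y(q, qd, qdd) theta for a regressor matrix Y. The paper estimates theta with an artificial bee colony search; as a hedged illustration of the same linear structure, the sketch below shows the classical least-squares baseline and the torque-prediction check used for validation.

```python
# Illustrative least-squares identification of a linear-in-parameters robot
# model (the paper itself uses an artificial bee colony search instead).
import numpy as np

def identify_parameters(Y: np.ndarray, tau: np.ndarray) -> np.ndarray:
    """Least-squares estimate of dynamic parameters theta.

    Y:   (N*dof, P) regressor matrix built from sampled q, qd, qdd
    tau: (N*dof,)   measured joint torques stacked over samples
    """
    theta, *_ = np.linalg.lstsq(Y, tau, rcond=None)
    return theta

def torque_prediction_error(Y: np.ndarray, tau: np.ndarray,
                            theta: np.ndarray) -> float:
    """RMS torque prediction error used to validate the identified model."""
    return float(np.sqrt(np.mean((Y @ theta - tau) ** 2)))
```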
Awesome-META+: Meta-Learning Research and Learning Platform
Artificial intelligence technology has already had a profound impact in various fields such as the economy, industry, and education, but it remains limited. Meta-learning, also known as "learning to learn", offers an opportunity for general artificial intelligence that can break through the current AI bottleneck. However, meta-learning started late, and there are fewer projects compared with fields such as CV and NLP. Each deployment requires considerable experience to configure the environment and to debug or even rewrite code, and the existing frameworks are isolated from one another. Moreover, few platforms currently focus exclusively on meta-learning or provide learning materials for novices, so the entry threshold is relatively high. To address these problems, we propose Awesome-META+, a meta-learning framework integration and learning platform that provides a complete and reliable meta-learning framework application and learning environment. The project aims to promote the development of meta-learning and the expansion of its community, including but not limited to the following functions: 1) a complete and reliable meta-learning framework that can adapt to tasks in multiple fields such as object detection, image classification, and reinforcement learning; 2) a convenient and simple model deployment scheme that provides meta-learning transfer and usage methods to lower the threshold of meta-learning and improve efficiency; 3) comprehensive learning materials; and 4) objective and credible performance analysis and discussion.
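For readers new to the area, here is a hedged sketch (not taken from the platform itself) of the kind of "learning to learn" loop such a framework packages, in the style of MAML: a shared initialization is adapted to each task with an inner gradient step, and the outer update improves the initialization from the adapted models' query losses.

```python
# Hedged MAML-style illustration of meta-learning; all details are assumptions.
import torch

def maml_step(model, tasks, loss_fn, meta_opt, inner_lr=0.01):
    """One meta-update over a batch of tasks, each a (support, query) pair
    of (inputs, targets) tuples."""
    meta_opt.zero_grad()
    for (sx, sy), (qx, qy) in tasks:
        params = dict(model.named_parameters())
        # Inner loop: one adaptation step on the task's support set.
        loss = loss_fn(torch.func.functional_call(model, params, sx), sy)
        grads = torch.autograd.grad(loss, list(params.values()),
                                    create_graph=True)
        adapted = {n: p - inner_lr * g
                   for (n, p), g in zip(params.items(), grads)}
        # Outer loss: evaluate adapted parameters on the query set; gradients
        # flow back to the shared initialization through the inner step.
        loss_fn(torch.func.functional_call(model, adapted, qx), qy).backward()
    meta_opt.step()
```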
PiP: Planning-informed Trajectory Prediction for Autonomous Driving
It is critical to predict the motion of surrounding vehicles for self-driving planning, especially in a socially compliant and flexible way. However, future prediction is challenging due to the interaction and uncertainty in driving behaviors. We propose planning-informed trajectory prediction (PiP) to tackle the prediction problem in the multi-agent setting. Our approach differs from the traditional manner of prediction, which is based only on historical information and is decoupled from planning. By informing the prediction process with the planning of the ego vehicle, our method achieves state-of-the-art multi-agent forecasting performance on highway datasets. Moreover, our approach enables a novel pipeline that couples prediction and planning by conditioning PiP on multiple candidate trajectories of the ego vehicle, which is highly beneficial for autonomous driving in interactive scenarios.
Comment: European Conference on Computer Vision (ECCV) 2020; Project page at http://haoran-song.github.io/planning-informed-prediction
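A hedged sketch of the conditioning idea (the architecture here is an assumption, not the paper's design): the predictor consumes both agent histories and one candidate ego plan, so querying it once per candidate plan lets the planner compare predicted reactions.

```python
# Illustrative plan-conditioned trajectory predictor; all layer choices and
# shapes are assumptions made for the sketch.
import torch
import torch.nn as nn

class PlanningInformedPredictor(nn.Module):
    def __init__(self, hidden=64, horizon=25):
        super().__init__()
        self.hist_enc = nn.GRU(2, hidden, batch_first=True)  # agent histories
        self.plan_enc = nn.GRU(2, hidden, batch_first=True)  # ego plan
        self.decoder = nn.Linear(2 * hidden, horizon * 2)    # future (x, y)

    def forward(self, agent_hist, ego_plan):
        # agent_hist: (A, T_hist, 2); ego_plan: (1, T_plan, 2)
        _, h_a = self.hist_enc(agent_hist)
        _, h_p = self.plan_enc(ego_plan)
        h_p = h_p.expand(-1, agent_hist.size(0), -1)
        joint = torch.cat([h_a, h_p], dim=-1).squeeze(0)
        return self.decoder(joint).view(agent_hist.size(0), -1, 2)

# Evaluate each candidate ego plan by predicting how other agents respond:
# preds = [model(agent_hist, plan.unsqueeze(0)) for plan in candidate_plans]
```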
Which Framework is Suitable for Online 3D Multi-Object Tracking for Autonomous Driving with Automotive 4D Imaging Radar?
Online 3D multi-object tracking (MOT) has recently received significant research interest due to the expanding demand for 3D perception in advanced driver assistance systems (ADAS) and autonomous driving (AD). Among the existing 3D MOT frameworks for ADAS and AD, the conventional point object tracking (POT) framework using the tracking-by-detection (TBD) strategy has been well studied and accepted for LiDAR and 4D imaging radar point clouds. In contrast, extended object tracking (EOT), another important framework which adopts the joint-detection-and-tracking (JDT) strategy, has rarely been explored for online 3D MOT applications. This paper provides the first systematic investigation of the EOT framework for online 3D MOT in real-world ADAS and AD scenarios. Specifically, the widely accepted TBD-POT framework, the recently investigated JDT-EOT framework, and our proposed TBD-EOT framework are compared via extensive evaluations on two open-source 4D imaging radar datasets: View-of-Delft and TJ4DRadSet. Experimental results demonstrate that the conventional TBD-POT framework remains preferable for online 3D MOT, with high tracking performance and low computational complexity, while the proposed TBD-EOT framework has the potential to outperform it in certain situations. However, the results also show that the JDT-EOT framework encounters multiple problems and performs inadequately in the evaluation scenarios. After analyzing the causes of these phenomena based on various evaluation metrics and visualizations, we provide possible guidelines to improve the performance of these MOT frameworks on real-world data. These results provide the first benchmark and important insights for the future development of 4D imaging radar-based online 3D MOT.
Comment: 8 pages, 5 figures, submitted to the 2024 IEEE International Conference on Robotics and Automation (ICRA 2024)
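To make the TBD side of the comparison concrete, here is a minimal sketch of the per-frame association step shared by tracking-by-detection pipelines; the gating and cost choices are assumptions, and the `predicted_center`/`center` accessors are hypothetical.

```python
# Illustrative tracking-by-detection association step: match track predictions
# to new detections with the Hungarian algorithm, gated by center distance.
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(tracks, detections, gate=2.0):
    """Return (matches, unmatched_track_ids, unmatched_detection_ids)."""
    if not tracks or not detections:
        return [], list(range(len(tracks))), list(range(len(detections)))
    cost = np.array([[np.linalg.norm(t.predicted_center() - d.center)
                      for d in detections] for t in tracks])
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] < gate]
    matched_t = {r for r, _ in matches}
    matched_d = {c for _, c in matches}
    unmatched_t = [i for i in range(len(tracks)) if i not in matched_t]
    unmatched_d = [j for j in range(len(detections)) if j not in matched_d]
    return matches, unmatched_t, unmatched_d
```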
CFGPT: Chinese Financial Assistant with Large Language Model
Large language models (LLMs) have demonstrated great potential in natural language processing tasks within the financial domain. In this work, we present a Chinese Financial Generative Pre-trained Transformer framework, named CFGPT, which includes a dataset (CFData) for pre-training and supervised fine-tuning, a financial LLM (CFLLM) to adeptly manage financial texts, and a deployment framework (CFAPP) designed to navigate real-world financial applications. CFData comprises both a pre-training dataset and a supervised fine-tuning dataset: the pre-training dataset collates Chinese financial data and analytics alongside a smaller subset of general-purpose text, with 584M documents and 141B tokens in total, while the supervised fine-tuning dataset is tailored for six distinct financial tasks, embodying various facets of financial analysis and decision-making, with 1.5M instruction pairs and 1.5B tokens in total. CFLLM, which is based on InternLM-7B to balance model capability and size, is trained on CFData in two stages: continued pre-training and supervised fine-tuning. CFAPP is centered on large language models (LLMs) and augmented with additional modules to ensure multifaceted functionality in real-world applications. Our codes are released at https://github.com/TongjiFinLab/CFGPT.
Comment: 12 pages, 5 figures
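A hedged sketch of the two-stage recipe the abstract describes (continued pre-training, then supervised fine-tuning). The base-model ID and all training settings below are assumptions for illustration, not taken from the released code.

```python
# Illustrative two-stage causal-LM training loop; datasets are assumed to be
# already tokenized with the same tokenizer. Settings are placeholders.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

BASE = "internlm/internlm-7b"  # assumed base checkpoint, per the abstract
tokenizer = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(BASE, trust_remote_code=True)

def run_stage(dataset, output_dir):
    """One training stage; called once per stage with its own dataset."""
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=1,
                             per_device_train_batch_size=1,
                             gradient_accumulation_steps=16)
    Trainer(model=model, args=args, train_dataset=dataset).train()

# Stage 1: continued pre-training on the financial + general-purpose corpus.
# run_stage(pretrain_dataset, "cfllm-pt")
# Stage 2: supervised fine-tuning on the six-task instruction dataset.
# run_stage(sft_dataset, "cfllm-sft")
```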