Live American Sign Language Letter Classification with Convolutional Neural Networks
This project centers on building a neural network that recognizes ASL
letters in images, particularly within the scope of a live video feed.
Initial results fell short of expectations: both the convolutional
network and the VGG16 transfer-learning approach failed to generalize
to settings with different backgrounds. A pre-trained hand-joint
detection model was then adopted, with the resulting joint locations
fed into a fully-connected neural network. This approach outperformed
the prior methods and generalized well to a live video feed
application.
Comment: 10 pages, 10 figures
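The final pipeline described above can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the layer sizes are placeholders, the weights are randomly initialized stand-ins for trained parameters, and the 21-joint layout is assumed from common hand-joint detectors (e.g. MediaPipe Hands).

```python
import numpy as np

rng = np.random.default_rng(0)

N_JOINTS, N_CLASSES, HIDDEN = 21, 26, 64  # assumed: 21 joints, 26 letters

# Randomly initialized weights stand in for trained parameters.
W1 = rng.standard_normal((N_JOINTS * 2, HIDDEN)) * 0.1
b1 = np.zeros(HIDDEN)
W2 = rng.standard_normal((HIDDEN, N_CLASSES)) * 0.1
b2 = np.zeros(N_CLASSES)

def classify_letter(joints: np.ndarray) -> int:
    """Map (21, 2) detected joint coordinates to a letter index 0..25."""
    x = joints.reshape(-1)             # flatten to a 42-dim feature vector
    h = np.maximum(x @ W1 + b1, 0.0)   # ReLU hidden layer
    logits = h @ W2 + b2
    return int(np.argmax(logits))

# One frame's worth of normalized joint locations from a detector.
frame_joints = rng.uniform(0.0, 1.0, size=(N_JOINTS, 2))
letter_idx = classify_letter(frame_joints)
```

The design point is that the joint detector absorbs background variation, so the classifier only sees a low-dimensional, background-free representation.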
Why Is Prompt Tuning for Vision-Language Models Robust to Noisy Labels?
Vision-language models such as CLIP learn a generic text-image embedding from
large-scale training data. A vision-language model can be adapted to a new
classification task through few-shot prompt tuning. We find that such a prompt
tuning process is highly robust to label noise. This motivates us to study the
key reasons behind the robustness of the prompt tuning paradigm. We
conducted extensive experiments to explore this property and find the key
factors are: 1) the fixed classname tokens provide strong regularization to
the optimization of the model, reducing gradients induced by noisy samples;
2) the powerful pre-trained image-text embedding learned from diverse
and generic web data provides strong prior knowledge for image classification.
Further, we demonstrate that noisy zero-shot predictions from CLIP can be used
to tune its own prompt, significantly enhancing prediction accuracy in the
unsupervised setting. The code is available at https://github.com/CEWu/PTNL.
Comment: Accepted by ICCV202
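The prompt-tuning paradigm studied above can be sketched in miniature. This is an assumption-laden toy, not CLIP: the embedding dimension, the mean-pooling "text encoder", and the random embeddings are simplifications. What it does show is the structural point from factor 1): learnable context vectors are prepended to frozen classname token embeddings, so only the context is ever optimized.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N_CTX, N_CLASSES = 8, 4, 3  # toy sizes, not CLIP's

context = rng.standard_normal((N_CTX, DIM)) * 0.02     # learnable
classname_emb = rng.standard_normal((N_CLASSES, DIM))  # frozen

def class_scores(image_feat: np.ndarray) -> np.ndarray:
    """Cosine similarity between an image feature and each class prompt."""
    scores = []
    for c in range(N_CLASSES):
        prompt = np.vstack([context, classname_emb[c]])  # [context; name]
        text_feat = prompt.mean(axis=0)                  # toy "encoder"
        cos = text_feat @ image_feat / (
            np.linalg.norm(text_feat) * np.linalg.norm(image_feat))
        scores.append(cos)
    return np.array(scores)

image_feat = rng.standard_normal(DIM)
probs = np.exp(class_scores(image_feat))
probs /= probs.sum()
```

Because the classname embeddings are shared across all training samples and never updated, a noisy label can only perturb the small shared context, which is one plausible reading of the regularization effect the abstract describes.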
Wizundry: A Cooperative Wizard of Oz Platform for Simulating Future Speech-based Interfaces with Multiple Wizards
Wizard of Oz (WoZ) as a prototyping method has been used to simulate
intelligent user interfaces, particularly for speech-based systems. However,
optimistic visions of what artificial intelligence (AI) can do place growing
demands on WoZ platforms to simulate smarter systems and more complex
interactions, raising the question of whether the typical approach of
employing a single Wizard is sufficient. Moreover, while
existing work has employed multiple Wizards in WoZ studies, a multi-Wizard
approach has not been systematically studied in terms of feasibility,
effectiveness, and challenges. We offer Wizundry, a real-time, web-based WoZ
platform that allows multiple Wizards to collaboratively operate a
speech-to-text based system remotely. We outline the design and technical
specifications of our open-source platform, which we iterated over two design
phases. We report on two studies in which participant-Wizards were tasked with
negotiating how to cooperatively simulate an interface that can handle natural
speech for dictation and text editing as well as other intelligent text
processing tasks. We offer qualitative findings on the Multi-Wizard experience
for Dyads and Triads of Wizards. Our findings reveal the promises and
challenges of the multi-Wizard approach and open up new research questions.
Comment: 34 pages
Summation invariant and its application to shape recognition
A novel summation invariant of curves under transformation group action is proposed. This new invariant is less sensitive to noise than the differential invariant and does not require an analytical expression for the curve, as the integral invariant does. We exploit this summation invariant to define a shape descriptor, called a semi-local summation invariant, and use it as a new feature for shape recognition. Tested on a database of noisy fish shapes, the summation invariant feature exhibited greater discriminating power than wavelet-based invariant features.
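The noise-robustness argument can be illustrated with a minimal sketch. This is not the paper's invariant; it is a hypothetical translation-invariant descriptor built the same way, by summing monomials of centered sample coordinates, which averages noise out rather than amplifying it as differentiation does.

```python
import numpy as np

def summation_features(points: np.ndarray) -> np.ndarray:
    """Translation-invariant sums over 2-D curve samples (illustrative)."""
    centered = points - points.mean(axis=0)   # removes translation
    x, y = centered[:, 0], centered[:, 1]
    # Sums of second-order monomials; each is a single summation over
    # all samples, so per-point noise tends to cancel.
    return np.array([np.sum(x * x), np.sum(y * y), np.sum(x * y)])

rng = np.random.default_rng(0)
curve = rng.standard_normal((50, 2))          # sampled curve points
shifted = curve + np.array([3.0, -7.0])       # translated copy

f0 = summation_features(curve)
f1 = summation_features(shifted)
```

Here `f0` and `f1` agree because centering cancels any translation, while a derivative-based feature computed from the same noisy samples would fluctuate point to point.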
Inform, involve, and interact - Bring the society and its members together