71 research outputs found

Multilevel Language and Vision Integration for Text-to-Clip Retrieval

Author: He Kun
Plummer Bryan A.
Saenko Kate
Sclaroff Stan
Sigal Leonid
Xu Huijuan
Publication date: 25/12/2018

We address the problem of text-based activity retrieval in video. Given a sentence describing an activity, our task is to retrieve matching clips from an untrimmed video. To capture the inherent structures present in both text and video, we introduce a multilevel model that integrates vision and language features earlier and more tightly than prior work. First, we inject text features early on when generating clip proposals, to help eliminate unlikely clips and thus speed up processing and boost performance. Second, to learn a fine-grained similarity metric for retrieval, we use visual features to modulate the processing of query sentences at the word level in a recurrent neural network. A multi-task loss is also employed by adding query re-generation as an auxiliary task. Our approach significantly outperforms prior work on two challenging benchmarks: Charades-STA and ActivityNet Captions. Comment: AAAI 201

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Joint Alignment and Modeling of Correlated Behavior Streams

Author: LO PRESTI L.
Rozga A.
Sclaroff S.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013

The Variable Time-Shift Hidden Markov Model (VTS-HMM) is proposed for learning and modeling pairs of correlated streams. Unlike previous coupled models for time series, the VTS-HMM accounts for varying time shifts between correlated events in pairs of streams having different properties. The VTS-HMM is learned on a set of pairs of unaligned streams and, thus, learning entails simultaneous estimation of the varying time shifts and of the parameters of the model. The formulation is demonstrated in the analysis of videos of dyadic social interactions between children and adults in the Multimodal Dyadic Behavior Dataset (MMDB). In dyadic social interactions, an agent starts an interaction with one or more “initiating behaviors” that elicit one or more “responding behaviors” from the partner within a temporal window. The proposed VTS-HMM explicitly accounts for varying time shifts between initiating and responding behaviors in these behavior streams. The experiments confirm that modeling of these varying time shifts in the VTS-HMM can yield improved estimation of the level of engagement of the child and adult and more accurate discrimination among complex activities.

Crossref

Archivio istituzionale della ricerca - Università di Palermo

MULE: Multimodal Universal Language Embedding

Author: Kim Donghyun
Plummer Bryan A.
Saenko Kate
Saito Kuniaki
Sclaroff Stan
Publication date: 28/12/2019

Existing vision-language methods typically support at most two languages at a time. In this paper, we present a modular approach which can easily be incorporated into existing vision-language methods in order to support many languages. We accomplish this by learning a single shared Multimodal Universal Language Embedding (MULE) which has been visually-semantically aligned across all languages. Then we learn to relate MULE to visual data as if it were a single language. Our method is not architecture specific, unlike prior work which typically learned separate branches for each language, enabling our approach to easily be adapted to many vision-language methods and tasks. Since MULE learns a single language branch in the multimodal model, we can also scale to support many languages, and languages with fewer annotations can take advantage of the good representation learned from other (more abundant) language data. We demonstrate the effectiveness of MULE on the bidirectional image-sentence retrieval task, supporting up to four languages in a single model. In addition, we show that Machine Translation can be used for data augmentation in multilingual learning, which, combined with MULE, improves mean recall by up to 21.9% on a single language compared to prior work, with the most significant gains seen on languages with relatively few annotations. Our code is publicly available. Comment: Accepted as an oral at AAAI 202

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Leveraging Affect Transfer Learning for Behavior Prediction in an Intelligent Tutoring System

Author: Ablavsky Vitaly
Allessio Danielle A.
Arroyo Ivon
Betke Margrit
Jalal Mona
Joshi Ajjen
Magee John J.
Murray Thomas
Ruiz Nataniel
Sclaroff Stan
Whitehill Jacob R.
Woolf Beverly P.
Yu Hao
Publication date: 12/02/2020

In the context of building an intelligent tutoring system (ITS), which improves student learning outcomes by intervention, we set out to improve prediction of student problem outcome. In essence, we want to predict the outcome of a student answering a problem in an ITS from a video feed by analyzing their face and gestures. For this, we present a novel transfer learning facial affect representation and a user-personalized training scheme that unlocks the potential of this representation. We model the temporal structure of video sequences of students solving math problems using a recurrent neural network architecture. Additionally, we extend the largest dataset of student interactions with an intelligent online math tutor by a factor of two. Our final model, coined ATL-BP (Affect Transfer Learning for Behavior Prediction), achieves an increase in mean F-score over the state of the art of 45% on this new dataset in the general case and 50% in a more challenging leave-users-out experimental setting when we use a user-personalized training scheme.

arXiv.org e-Print Archive

Clark University

Memetic electromagnetism algorithm for surface reconstruction with rational bivariate Bernstein basis functions

Author: A Gálvez
A Iglesias
Akemi Gálvez
Andrés Iglesias
D Meyers
DR Forsey
E Castillo
F Schmitt
F Yoshimoto
G Farin
H Akaike
H Fuchs
H Park
H Pottmann
I Maekawa
J Barhak
L Piegl
M Hoffmann
M Jones
MC Leu
MG Cox
NM Patrikalakis
NR Draper
P Gu
R Luus
RE Barnhill
RH Franke
RM Bolle
S Sclaroff
SI Birbil
T Varady
TA Foley
V Pratt
V Savchenko
W Li
WJ Gordon
WY Ma
X Zhao
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Surface reconstruction is a very important issue with outstanding applications in fields such as medical imaging (computer tomography, magnetic resonance), biomedical engineering (customized prosthesis and medical implants), computer-aided design and manufacturing (reverse engineering for the automotive, aerospace and shipbuilding industries), rapid prototyping (scale models of physical parts from CAD data), computer animation and film industry (motion capture, character modeling), archaeology (digital representation and storage of archaeological sites and assets), virtual/augmented reality, and many others. In this paper we address the surface reconstruction problem by using rational Bézier surfaces. This problem is far more complex than the case for curves we solved in a previous paper. In addition, we deal with data points subjected to measurement noise and irregular sampling, replicating the usual conditions of real-world applications. Our method is based on a memetic approach combining a powerful metaheuristic method for global optimization (the electromagnetism algorithm) with a local search method. This method is applied to a benchmark of five illustrative examples exhibiting challenging features. Our experimental results show that the method performs very well, and it can recover the underlying shape of surfaces with very good accuracy. This research is kindly supported by the Computer Science National Program of the Spanish Ministry of Economy and Competitiveness, Project #TIN2012-30768, Toho University, and the University of Cantabria. The authors are particularly grateful to the Department of Information Science of Toho University for all the facilities given to carry out this work. We also thank the Editor and the two anonymous reviewers who helped us to improve our paper with several constructive comments and suggestions.
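
For readers unfamiliar with the surface model named in the abstract above, the following is a minimal sketch of how a rational Bézier surface is evaluated from bivariate Bernstein basis functions. It illustrates only the standard textbook definition, not the authors' memetic fitting method; the function names `bernstein` and `rational_bezier_surface` and the sample control grid are assumptions made for illustration.

```python
from math import comb


def bernstein(i, n, t):
    """Bernstein basis polynomial B_{i,n}(t) for t in [0, 1]."""
    return comb(n, i) * t**i * (1 - t) ** (n - i)


def rational_bezier_surface(points, weights, u, v):
    """Evaluate a rational Bezier surface at parameters (u, v).

    points  : (m+1) x (n+1) grid of (x, y, z) control points (hypothetical input layout)
    weights : grid of positive weights with the same shape as points
    """
    m = len(points) - 1
    n = len(points[0]) - 1
    numerator = [0.0, 0.0, 0.0]
    denominator = 0.0
    for i in range(m + 1):
        basis_u = bernstein(i, m, u)
        for j in range(n + 1):
            # Weighted tensor-product basis value for control point (i, j)
            w = weights[i][j] * basis_u * bernstein(j, n, v)
            denominator += w
            for k in range(3):
                numerator[k] += w * points[i][j][k]
    # Dividing by the weighted basis sum makes the surface rational
    return tuple(c / denominator for c in numerator)


# Illustrative bilinear patch (degree 1 in each direction) with unit weights
sample_points = [[(0, 0, 0), (0, 1, 0)], [(1, 0, 0), (1, 1, 1)]]
sample_weights = [[1.0, 1.0], [1.0, 1.0]]
center = rational_bezier_surface(sample_points, sample_weights, 0.5, 0.5)
```

With all weights equal, the expression reduces to an ordinary polynomial Bézier surface; unequal weights pull the surface toward the more heavily weighted control points, which is what gives a fitting method the extra degrees of freedom to optimize.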

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UCrea