26 research outputs found
A computational framework for unsupervised analysis of everyday human activities
In order to make computers proactive and assistive, we must enable them to perceive, learn, and predict what is happening in their surroundings. This presents us with the challenge of formalizing computational models of everyday human activities. For a majority of environments, the structure of the in situ activities is generally not known a priori. This thesis therefore investigates knowledge representations and manipulation techniques that can facilitate learning of such everyday human activities in a minimally supervised manner.
A key step towards this end is finding appropriate representations for human activities. We posit that if we choose to describe activities as finite sequences of an appropriate set of events, then the global structure of these activities can be uniquely encoded using their local event sub-sequences. With this perspective at hand, we particularly investigate representations that characterize activities in terms of their fixed- and variable-length event subsequences. We comparatively analyze these representations in terms of their representational scope, feature cardinality and noise sensitivity.
Exploiting such representations, we propose a computational framework to discover the various activity-classes taking place in an environment. We model these activity-classes as maximally similar activity-cliques in a completely connected graph of activities, and describe how to discover them efficiently. Moreover, we propose methods for finding concise characterizations of these discovered activity-classes, both from a holistic as well as a by-parts perspective. Using such characterizations, we present an incremental method to classify
a new activity instance to one of the discovered activity-classes, and to automatically detect if it is anomalous with respect to the general characteristics of its membership class. Our results show the efficacy of our framework in a variety of everyday environments. Ph.D. Committee Chair: Aaron Bobick; Committee Member: Charles Isbell; Committee Member: David Hogg; Committee Member: Irfan Essa; Committee Member: James Reh
Compact Random Feature Maps
Kernel approximation using randomized feature maps has recently gained a lot
of interest. In this work, we identify that previous approaches for polynomial
kernel approximation create maps that are rank deficient, and therefore do not
utilize the capacity of the projected feature space effectively. To address
this challenge, we propose compact random feature maps (CRAFTMaps) to
approximate polynomial kernels more concisely and accurately. We prove the
error bounds of CRAFTMaps demonstrating their superior kernel reconstruction
performance compared to the previous approximation schemes. We show how
structured random matrices can be used to efficiently generate CRAFTMaps, and
present a single-pass algorithm using CRAFTMaps to learn non-linear multi-class
classifiers. We present experiments on multiple standard data-sets with
performance competitive with state-of-the-art results. Comment: 9 page
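The abstract does not spell out how CRAFTMaps are constructed, so the following is only a generic illustration of what a randomized feature map for a polynomial kernel can look like, not the authors' method. It assumes the homogeneous kernel k(x, y) = (x · y)^p and uses products of Rademacher projections, whose inner product is an unbiased estimate of that kernel. The function name `random_poly_features` and all parameters are hypothetical.

```python
import random


def random_poly_features(x, degree, num_features, seed=0):
    """Map x to num_features random features whose inner products
    approximate the homogeneous polynomial kernel (x . y) ** degree.

    Assumption: each feature is a product of `degree` independent
    Rademacher (+1/-1) projections of x. The same seed must be used
    for every input so that all inputs share the same projections.
    """
    rng = random.Random(seed)
    scale = num_features ** 0.5
    features = []
    for _ in range(num_features):
        prod = 1.0
        for _ in range(degree):
            # Fresh random sign vector for each factor of the product.
            w = [rng.choice((-1.0, 1.0)) for _ in x]
            prod *= sum(wi * xi for wi, xi in zip(w, x))
        features.append(prod / scale)
    return features


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


x = [0.6, 0.8, 0.0]
y = [0.8, 0.6, 0.0]
exact = dot(x, y) ** 2
approx = dot(random_poly_features(x, 2, 20000),
             random_poly_features(y, 2, 20000))
print(exact, approx)
```

The estimate tightens as `num_features` grows; the rank-deficiency issue and the compact maps described in the abstract concern how efficiently such a feature space is used, which this sketch does not attempt to reproduce.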
Audio-Visual Flow - A Variational Approach to Multi-Modal Flow Estimation
Presented at the 2004 IEEE International Conference on Image Processing (ICIP 2004), 24-27 October 2004, Singapore. DOI: 10.1109/ICIP.2004.1421626
Just as a motion field is associated with a moving object, an
audio field can be associated with an object that can behave
as a sound source. The flow field of such a sound source
which moves over time would not only have an optical component,
but also an audio component; something we call
audio-visual flow. In this paper we present a common structure
tensor based variational framework for dense audio-visual
flow-field estimation. The proposed scheme improves
the rank of the local structure tensor by incorporating an audio
information channel which is substantially uncorrelated
with the complementing visual information channel. The
scheme allows ascribing weights to individual sensor modalities
based on the confidence in their corresponding measurements.
Results are presented to demonstrate how combining
multiple modalities in our proposed framework can
provide a possible solution to temporary full visual occlusions
LEMaRT: Label-Efficient Masked Region Transform for Image Harmonization
We present a simple yet effective self-supervised pre-training method for
image harmonization which can leverage large-scale unannotated image datasets.
To achieve this goal, we first generate pre-training data online with our
Label-Efficient Masked Region Transform (LEMaRT) pipeline. Given an image,
LEMaRT generates a foreground mask and then applies a set of transformations to
perturb various visual attributes, e.g., defocus blur, contrast, saturation, of
the region specified by the generated mask. We then pre-train image
harmonization models by recovering the original image from the perturbed image.
Secondly, we introduce an image harmonization model, namely SwinIH, by
retrofitting the Swin Transformer [27] with a combination of local and global
self-attention mechanisms. Pre-training SwinIH with LEMaRT results in a new
state of the art for image harmonization, while being label-efficient, i.e.,
consuming less annotated data for fine-tuning than existing methods. Notably,
on the iHarmony4 dataset [8], SwinIH outperforms the state of the art, i.e., SCS-Co
[16], by a margin of 0.4 dB when it is fine-tuned on only 50% of the training
data, and by 1.0 dB when it is trained on the full training dataset. Comment: Accepted by CVPR'23, 19 page
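The data-generation step described above (generate a mask, perturb the masked region, recover the original) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the LEMaRT pipeline itself: the image is a grayscale grid of floats in [0, 1], the mask is a random rectangle, and the only perturbation is a contrast and brightness change. The name `make_pretraining_pair` and the parameter ranges are hypothetical.

```python
import random


def make_pretraining_pair(image, rng):
    """Return (perturbed, mask, target) for one self-supervised example.

    image: list of rows of floats in [0, 1], at least 2x2 (assumption).
    The mask generator (a random rectangle) and the perturbation
    (contrast gain plus brightness bias) are illustrative stand-ins.
    """
    height, width = len(image), len(image[0])
    top = rng.randrange(0, height // 2)
    left = rng.randrange(0, width // 2)
    bottom = rng.randrange(top + 1, height + 1)
    right = rng.randrange(left + 1, width + 1)
    mask = [[1 if top <= r < bottom and left <= c < right else 0
             for c in range(width)] for r in range(height)]

    gain = rng.uniform(0.5, 1.5)    # contrast change inside the mask
    bias = rng.uniform(-0.2, 0.2)   # brightness change inside the mask
    perturbed = [
        [min(1.0, max(0.0, gain * (p - 0.5) + 0.5 + bias)) if m else p
         for p, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]
    # A model would be trained to map (perturbed, mask) back to the target.
    return perturbed, mask, image


rng = random.Random(0)
image = [[0.1 * (r + c) for c in range(4)] for r in range(4)]
perturbed, mask, target = make_pretraining_pair(image, rng)
```

Because the pair is generated from an unannotated image, no harmonization labels are needed at this stage, which is the label-efficiency point the abstract makes.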
A Visualization Framework for Team Sports Captured using Multiple Static Cameras
DOI: http://dx.doi.org/10.1016/j.cviu.2013.09.006
We present a novel approach for robust localization of multiple people observed using a set of static cameras. We use this
location information to generate a visualization of the virtual offside line in soccer games. To compute the position of the offside line,
we need to localize players' positions, and identify their team roles. We solve the problem of fusing corresponding players' positional
information by finding minimum weight K-length cycles in a complete K-partite graph. Each partite of the graph corresponds to one of
the K cameras, whereas each node of a partite encodes the position and appearance of a player observed from a particular camera.
To find the minimum weight cycles in this graph, we use a dynamic programming based approach that varies over a continuum from
maximally to minimally greedy in terms of the number of graph-paths explored at each iteration. We present proofs for the efficiency
and performance bounds of our algorithms. Finally, we demonstrate the robustness of our framework by testing it on 82,000 frames of
soccer footage captured over eight different illumination conditions, play types, and team attire. Our framework runs in near-real time,
and processes video from 3 full HD cameras in about 0.4 seconds for each set of corresponding 3 frames
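To make the graph formulation concrete, here is a minimal sketch of finding one minimum-weight K-length cycle with dynamic programming. It is not the algorithm from the paper: it assumes the cycle visits the cameras in a fixed order, uses Euclidean distance between positions as the only edge weight (appearance is ignored), and exhaustively tries every start node rather than trading off greediness. The function name `min_weight_cycle` is hypothetical.

```python
import math


def min_weight_cycle(partites):
    """Find the cheapest cycle that picks one node from each partite.

    partites: list of K >= 2 lists of (x, y) positions, one per camera.
    Assumption: the cycle visits partites in the order 0, 1, ..., K-1, 0.
    Returns (cost, indices), where indices[k] is the chosen node in
    partite k.
    """
    K = len(partites)
    best_cost, best_path = math.inf, None
    for s, start in enumerate(partites[0]):
        # cost[j]: cheapest path from start to node j of the current partite.
        cost = [math.dist(start, p) for p in partites[1]]
        back = []  # back-pointers into the previous partite, one list per step
        for k in range(2, K):
            new_cost, ptr = [], []
            for p in partites[k]:
                i = min(range(len(cost)),
                        key=lambda i: cost[i] + math.dist(partites[k - 1][i], p))
                new_cost.append(cost[i] + math.dist(partites[k - 1][i], p))
                ptr.append(i)
            cost = new_cost
            back.append(ptr)
        # Close the cycle by returning to the start node.
        j = min(range(len(cost)),
                key=lambda j: cost[j] + math.dist(partites[K - 1][j], start))
        total = cost[j] + math.dist(partites[K - 1][j], start)
        if total < best_cost:
            path = [j]
            for ptr in reversed(back):
                path.append(ptr[path[-1]])
            path.append(s)
            best_cost, best_path = total, path[::-1]
    return best_cost, best_path


# Three cameras, each reporting two players in a different order.
cameras = [
    [(0.0, 0.0), (10.0, 10.0)],
    [(10.5, 10.0), (0.1, 0.0)],
    [(0.0, 0.1), (10.0, 10.5)],
]
cost, indices = min_weight_cycle(cameras)
```

A full system would repeat this to extract one cycle per player and would fold appearance into the edge weights; the sketch only returns the single cheapest correspondence.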
Unsupervised Activity Discovery and Characterization for Sensor-Rich Environments
This thesis presents an unsupervised method for discovering and analyzing the different
kinds of activities in an active environment. Drawing from natural language processing, a
novel representation of activities as bags of event n-grams is introduced, in which the global
structural information of activities is analyzed using their local event statistics. It is demonstrated how maximal cliques in an undirected edge-weighted graph of activities can be used, in an unsupervised manner, to discover the different activity-classes. Drawing on work in computer networks and bioinformatics, it is shown how to characterize these discovered activity-classes from a holistic as well as a by-parts viewpoint. A definition of anomalous activities is formulated, along with a way to detect them based on the difference of an activity instance from each of the discovered activity-classes. Finally, an information-theoretic method to explain the detected anomalies in a human-interpretable form is presented. Results over extensive data-sets collected from multiple active environments are
presented to show the competence and generalizability of the proposed framework. M.S. Committee Chair: Aaron Bobick; Committee Member: Charles Isbell; Committee Member: Irfan Ess
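As a rough illustration of the bag-of-event-n-grams idea, the sketch below counts the n-grams of an event sequence and compares two activities by their histograms. The similarity measure used here (one minus the normalized sum of absolute count differences) is an assumption chosen for simplicity and is not necessarily the one used in the thesis; the function names are hypothetical.

```python
from collections import Counter


def event_ngrams(events, n):
    """Return a histogram (bag) of the length-n event subsequences."""
    return Counter(tuple(events[i:i + n])
                   for i in range(len(events) - n + 1))


def similarity(activity_a, activity_b, n=3):
    """Compare two activities by their n-gram histograms.

    Assumed measure: 1 - (sum of absolute count differences) /
    (total number of n-grams in both activities). Identical bags
    score 1.0; bags with no n-gram in common score 0.0.
    """
    ha, hb = event_ngrams(activity_a, n), event_ngrams(activity_b, n)
    total = sum(ha.values()) + sum(hb.values())
    if total == 0:
        return 1.0
    diff = sum(abs(ha[k] - hb[k]) for k in set(ha) | set(hb))
    return 1.0 - diff / total


# Events are encoded as integers; each list is one activity instance.
a = [1, 2, 3, 4, 5]
b = [1, 2, 3, 4, 6]
print(similarity(a, b))
```

Pairwise similarities of this kind would supply the edge weights of the activity graph in which the abstract's cliques are searched for.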