SALSA: A Novel Dataset for Multimodal Group Behavior Analysis
Studying free-standing conversational groups (FCGs) in unstructured social
settings (e.g., a cocktail party) is rewarding because of the wealth of
information available at the group level (mining social networks) and at the
individual level (recognizing native behavioral and personality traits).
However, analyzing social scenes involving FCGs is also highly challenging:
crowding and extreme occlusions make it difficult to extract behavioral cues
such as target locations, speaking activity and head/body pose. To this end,
we propose SALSA, a novel dataset facilitating multimodal and Synergetic
sociAL Scene Analysis, and make two main contributions to research on
automated social interaction analysis: (1) SALSA records social interactions
among 18 participants in a natural, indoor environment for over 60 minutes,
under poster presentation and cocktail party contexts that present
difficulties in the form of low-resolution images, lighting variations,
numerous occlusions, reverberations and interfering sound sources; (2) to
alleviate these problems, we facilitate multimodal analysis by recording the
social interplay with four static surveillance cameras and with sociometric
badges worn by each participant, each badge comprising a microphone, an
accelerometer, and Bluetooth and infrared sensors. In addition to the raw data,
we also provide annotations of each individual's personality, as well as their
position, head and body orientation, and F-formation information over the
entire event duration. Through extensive experiments with state-of-the-art
approaches, we show (a) the limitations of current methods and (b) how the
recorded multiple cues synergetically aid the automatic analysis of social
interactions. SALSA is available at http://tev.fbk.eu/salsa.
Comment: 14 pages, 11 figures
Going Deeper into Action Recognition: A Survey
Understanding human actions in visual data is tied to advances in
complementary research areas including object recognition, human dynamics,
domain adaptation and semantic segmentation. Over the last decade, human action
analysis has evolved from early schemes that were often limited to controlled
environments to today's advanced solutions that can learn from millions of
videos and apply to almost all daily activities. Given the broad range of
applications, from video surveillance to human-computer interaction, milestones
in action recognition are being reached ever more rapidly, so that methods
considered state of the art quickly become obsolete. This motivated us to
provide a comprehensive review of the notable steps taken towards recognizing
human actions. To this end, we start our discussion with the pioneering methods
that use handcrafted representations and then move into the realm of deep
learning based approaches. We aim to remain objective throughout this survey,
touching upon encouraging improvements as well as inevitable setbacks, in the
hope of raising fresh questions and motivating new research directions for the
reader.
Spectators' aesthetic experiences of sound and movement in dance performance
In this paper we present a study of spectators' aesthetic experiences of sound and movement in live dance performance. A multidisciplinary team comprising a choreographer, neuroscientists and qualitative researchers investigated the effects of different sound scores on dance spectators. What would be the impact of auditory stimulation on kinesthetic experience and/or aesthetic appreciation of the dance? What would be the effect of removing music altogether, so that spectators watched dance while hearing only the performers' breathing and footfalls? We investigated audience experience through qualitative research, using post-performance focus groups, while a separately conducted functional brain imaging (fMRI) study measured the synchrony in brain activity across spectators when they watched dance with sound or with breathing only. When audiences watched dance accompanied by music, the fMRI data revealed evidence of greater intersubject synchronisation in a brain region consistent with complex auditory processing. The audience research found that some spectators derived pleasure from finding convergences between two complex stimuli (dance and music). The removal of music and the resulting audibility of the performers' breathing had a significant impact on spectators' aesthetic experience. In this breathing-only condition, the fMRI analysis showed increased synchronisation among observers, suggesting a greater influence of the body when interpreting the dance stimuli. The audience research found evidence of a similarly corporeally focused experience. The paper discusses possible connections between the findings of our different approaches and considers the implications of this study for interdisciplinary research collaborations between the arts and sciences.
The THUMOS Challenge on Action Recognition for Videos "in the Wild"
Automatically recognizing and localizing wide ranges of human actions is of
crucial importance for video understanding. Towards this goal, the THUMOS
challenge was introduced in 2013 to serve as a benchmark for action
recognition. Until then, video action recognition, including the THUMOS
challenge, had focused primarily on the classification of pre-segmented
(i.e., trimmed) videos, which is an artificial task. In THUMOS 2014, we elevated
action recognition to a more practical level by introducing temporally untrimmed
videos. These also include "background videos", which share similar scenes and
backgrounds with action videos but are devoid of the specific actions. The three
editions of the challenge organized from 2013 to 2015 have made THUMOS a common
benchmark for action classification and detection, and the annual challenge is
widely attended by teams from around the world.
In this paper we describe the THUMOS benchmark in detail and give an overview
of data collection and annotation procedures. We present the evaluation
protocols used to quantify results in the two THUMOS tasks of action
classification and temporal detection. We also present results of submissions
to the THUMOS 2015 challenge and review the participating approaches.
Additionally, we include a comprehensive empirical study evaluating the
differences in action recognition between trimmed and untrimmed videos, and how
well methods trained on trimmed videos generalize to untrimmed videos. We
conclude by proposing several directions and improvements for future THUMOS
challenges.
Comment: Preprint submitted to Computer Vision and Image Understanding
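The temporal detection task requires deciding when a predicted segment counts as a correct detection. A minimal Python sketch, assuming the widely used convention of matching detections to ground truth by temporal intersection-over-union (tIoU) and scoring with non-interpolated average precision; the 0.5 threshold, the greedy matching order and the function names are illustrative assumptions, not the official THUMOS evaluation code.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds)


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections: List[Tuple[Segment, float]],
                      ground_truth: List[Segment],
                      threshold: float = 0.5) -> float:
    """Non-interpolated AP for one action class (simplified, hypothetical protocol).

    Detections are ranked by confidence; each one is a true positive if it
    overlaps a not-yet-matched ground-truth segment with tIoU >= threshold.
    """
    if not ground_truth:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[1], reverse=True)
    matched = [False] * len(ground_truth)
    tp_count = 0
    precision_sum = 0.0
    for rank, (seg, _score) in enumerate(ranked, start=1):
        # Find the best still-unmatched ground-truth segment for this detection.
        best_iou, best_idx = 0.0, -1
        for i, gt in enumerate(ground_truth):
            if matched[i]:
                continue
            iou = temporal_iou(seg, gt)
            if iou > best_iou:
                best_iou, best_idx = iou, i
        if best_idx >= 0 and best_iou >= threshold:
            matched[best_idx] = True
            tp_count += 1
            precision_sum += tp_count / rank  # precision at this recall step
    return precision_sum / len(ground_truth)


gt = [(10.0, 20.0), (30.0, 40.0)]
dets = [((11.0, 19.0), 0.9), ((50.0, 60.0), 0.8), ((31.0, 42.0), 0.7)]
print(round(average_precision(dets, gt), 4))  # 0.8333
```

Here the first and third detections are hits (tIoU 0.8 and 0.75) and the second is a false positive, so AP is (1/1 + 2/3) / 2. Evaluating at several thresholds and averaging over classes would give a mean AP of the kind detection benchmarks report.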
The TREC-2002 video track report
TREC-2002 saw the second running of the Video Track, the goal of which was to promote progress in content-based retrieval from digital video via open, metrics-based evaluation. The track used 73.3 hours of publicly available digital video (in MPEG-1/VCD format), downloaded by the participants directly from the Internet Archive (Prelinger Archives) (internetarchive, 2002) and, in part, from the Open Video Project (Marchionini, 2001). The material comprised advertising, educational, industrial, and amateur films produced between the 1930s and the 1970s by corporations, nonprofit organizations, trade associations, community and interest groups, educational institutions, and individuals. Seventeen teams representing 5 companies and 12 universities (4 from Asia, 9 from Europe, and 4 from the US) participated in one or more of three tasks in the 2002 video track: shot boundary determination, feature extraction, and search (manual or interactive). Results were scored by NIST using manually created truth data for shot boundary determination and manual assessment of feature extraction and search results. This paper is an introduction to, and an overview of, the track framework (the tasks, data, and measures), the approaches taken by the participating groups, the results, and issues regarding the evaluation. For detailed information about the approaches and results, the reader should see the various site reports in the final workshop proceedings.
The TRECVID 2007 BBC rushes summarization evaluation pilot
This paper provides an overview of a pilot evaluation of
video summaries using rushes from several BBC dramatic series, carried out
under the auspices of TRECVID. Twenty-two research teams submitted video
summaries, each at most 4% of the original duration, of 42 individual rushes
video files, aiming to compress out redundant and insignificant material.
The output of two baseline systems built on straightforward
content reduction techniques was contributed by Carnegie
Mellon University as a control. Procedures for creating
ground truth lists of important segments from each video
were developed at Dublin City University and applied to
the BBC video. At NIST, each summary was judged by
three humans with respect to how much of the ground truth
was included, how easy the summary was to understand,
and how much repeated material the summary contained.
Additional objective measures included how long it took
the system to create the summary, how long it took the assessor to judge it
against the ground truth, and the summary's duration. Assessor agreement on
finding desired segments averaged 78%, and the results indicate that while it
is difficult to exceed the performance of the baselines, a few systems did.
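Two of the quantities above reduce to simple arithmetic: the duration cap on each summary and the share of ground-truth segments a judge found in it. The Python sketch below only illustrates that arithmetic under stated assumptions; in the pilot the inclusion judgments were made by human assessors, and the function names and segment ids here are hypothetical.

```python
from typing import Set


def summary_budget_seconds(video_seconds: float, max_fraction: float = 0.04) -> float:
    """Maximum summary length when a summary may be at most `max_fraction`
    of the source video's duration (assumed default: the 4% cap)."""
    if video_seconds < 0 or not 0 < max_fraction <= 1:
        raise ValueError("need video_seconds >= 0 and 0 < max_fraction <= 1")
    return video_seconds * max_fraction


def ground_truth_inclusion(found: Set[str], ground_truth: Set[str]) -> float:
    """Fraction of ground-truth segments that a judge marked as present."""
    if not ground_truth:
        return 0.0
    return len(found & ground_truth) / len(ground_truth)


# A 30-minute rushes file allows at most 72 seconds of summary.
print(round(summary_budget_seconds(30 * 60), 2))  # 72.0
# Hypothetical segment ids: the judge found 3 of the 4 ground-truth items.
print(ground_truth_inclusion({"gt1", "gt2", "gt4"}, {"gt1", "gt2", "gt3", "gt4"}))  # 0.75
```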
PhD dissertation
The purpose of this study is to develop a critical perspective that can be used to address the social reality of nursing. The study uses the tradition of the Frankfurt School and critical theory as the primary intellectual frameworks for developing this perspective. The investigation is primarily an exploratory, reflective study which seeks to develop a critical consciousness about nursing. The methodology used in this study differs significantly from empirical-analytic modes of inquiry: the investigation proceeds via the process of reflection. Radical reflection, the method used in this study, comprises five stages: bracketing, historical recovery, critique, dialectical imagination and negotiation. The study proceeds in an exploratory way through each of these stages. As in other forms of reflection, the findings produced in this study take the form of hypotheses: the study generates insights or interpretive hypotheses about the social construction of reality in nursing, whose confirmation depends upon continued negotiation among nurses. Specific findings generated in this study include 1) a critique of scientistic consciousness in nursing, 2) a critique of bourgeois professional ideology in nursing and 3) a critique of sexism in nursing.
The arts of action
The theory and culture of the arts has largely focused on the arts of objects, and neglected the arts of action: the "process arts". In the process arts, artists create artifacts to engender activity in their audience, for the sake of the audience's aesthetic appreciation of their own activity. This includes appreciating their own deliberations, choices, reactions, and movements. The process arts include games, urban planning, improvised social dance, cooking, and social food rituals. In the traditional object arts, the central aesthetic properties occur in the artistic artifact itself. It is the painting that is beautiful; the novel that is dramatic. In the process arts, the aesthetic properties occur in the activity of the appreciator. It is the game player's own decisions that are elegant, the rock climber's own movement that is graceful, and the tango dancers' rapport that is beautiful. The artifact's role is to call forth and shape that activity, guiding it along aesthetic lines. I offer a theory of the process arts. Crucially, we must distinguish between the designed artifact and the prescribed focus of aesthetic appreciation. In the object arts, these are one and the same. The designed artifact is the painting, which is also the prescribed focus of appreciation. In the process arts, they are different. The designed artifact is the game, but the appreciator is prescribed to appreciate their own activity in playing the game. Next, I address the complex question of who the artist really is in a piece of process art: the designer or the active appreciator? Finally, I diagnose the lowly status of the process arts.
Multiple view human activity recognition
Ankara: The Department of Computer Engineering and the Graduate School of Engineering and Science of Bilkent University, 2012.
Thesis (Ph. D.) -- Bilkent University, 2012.
Includes bibliographical references (leaves 94-100).
This thesis explores the human activity recognition problem when multiple
views are available. We follow two main directions: we first present a system
that performs volume matching using 3D volumes constructed from calibrated
cameras, and then we present a flexible system based on frame matching that
works directly on multiple views. We compare the multiple-view systems with
single-view systems and, through various experiments, measure the improvement
in recognition performance gained by using more views.
The first part of the thesis introduces compact representations for volumetric
data obtained through reconstruction. The video frames recorded by many cameras
with significant overlap are fused by reconstruction, and the reconstructed
volumes are used as substitutes for action poses. We propose new pose descriptors
over these three-dimensional volumes. Our first descriptor is based on the histogram
of oriented cylinders of various sizes and orientations. We then propose
another descriptor which is view-independent and does not require pose
alignment. We show the importance of discriminative pose representations within
simpler activity classification schemes. The activity recognition framework based
on volume matching achieves promising results compared to the state of the art.
Volume reconstruction is one natural approach to multi-camera data fusion,
but in practice there may be only a few cameras with overlapping views. In the
second part of the thesis, we introduce an architecture that adapts to varying
numbers of cameras and features. The system collects and fuses activity judgments
from the cameras using a voting scheme, and it requires no camera calibration.
Performance generally improves with more cameras and more features;
training and test cameras do not need to overlap; and cameras dropping in or out
are handled easily with little penalty. Experiments demonstrate both the
performance penalties and the advantages of using multiple views versus a single view.
Pehlivan, Selen
Ph.D.
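The abstract does not specify the voting rule, so the following Python sketch assumes the simplest reading: each camera contributes one activity label per clip, and the fused decision is an (optionally weighted) majority vote. Camera ids, labels and weights are hypothetical. It shows why such a scheme needs no calibration and tolerates cameras dropping in or out: an absent camera simply casts no vote.

```python
from collections import Counter
from typing import Mapping, Optional


def fuse_camera_votes(
    judgments: Mapping[str, Optional[str]],
    weights: Optional[Mapping[str, float]] = None,
) -> Optional[str]:
    """Fuse per-camera activity labels by (optionally weighted) majority vote.

    `judgments` maps a hypothetical camera id to the label predicted from that
    camera's view, or None when the camera has dropped out for this clip.
    `weights` optionally gives each camera a vote weight (default 1.0).
    Returns None if no camera voted. On a tie, Counter.most_common keeps
    the label that was encountered first.
    """
    votes: Counter = Counter()
    for camera, label in judgments.items():
        if label is None:  # dropped-out camera: it simply casts no vote
            continue
        weight = 1.0 if weights is None else weights.get(camera, 1.0)
        votes[label] += weight
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Four cameras, one of which is offline for this clip.
clip = {"cam1": "walk", "cam2": "walk", "cam3": "wave", "cam4": None}
print(fuse_camera_votes(clip))                 # walk
print(fuse_camera_votes(clip, {"cam3": 3.0}))  # wave
```

Weighting is one way to let a more reliable view count for more; a score-level variant would sum per-class classifier confidences instead of hard labels.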