148,046 research outputs found
Action Recognition in Videos: from Motion Capture Labs to the Web
This paper presents a survey of human action recognition approaches based on
visual data recorded from a single video camera. We propose an organizing
framework which puts in evidence the evolution of the area, with techniques
moving from heavily constrained motion capture scenarios towards more
challenging, realistic, "in the wild" videos. The proposed organization is
based on the representation used as input for the recognition task, emphasizing
the hypothesis assumed and thus, the constraints imposed on the type of video
that each technique is able to address. Expliciting the hypothesis and
constraints makes the framework particularly useful to select a method, given
an application. Another advantage of the proposed organization is that it
allows categorizing newest approaches seamlessly with traditional ones, while
providing an insightful perspective of the evolution of the action recognition
task up to now. That perspective is the basis for the discussion in the end of
the paper, where we also present the main open issues in the area.Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4
table
Unsupervised Action Proposal Ranking through Proposal Recombination
Recently, action proposal methods have played an important role in action
recognition tasks, as they reduce the search space dramatically. Most
unsupervised action proposal methods tend to generate hundreds of action
proposals which include many noisy, inconsistent, and unranked action
proposals, while supervised action proposal methods take advantage of
predefined object detectors (e.g., human detector) to refine and score the
action proposals, but they require thousands of manual annotations to train.
Given the action proposals in a video, the goal of the proposed work is to
generate a few better action proposals that are ranked properly. In our
approach, we first divide action proposal into sub-proposal and then use
Dynamic Programming based graph optimization scheme to select the optimal
combinations of sub-proposals from different proposals and assign each new
proposal a score. We propose a new unsupervised image-based actioness detector
that leverages web images and employs it as one of the node scores in our graph
formulation. Moreover, we capture motion information by estimating the number
of motion contours within each action proposal patch. The proposed method is an
unsupervised method that neither needs bounding box annotations nor video level
labels, which is desirable with the current explosion of large-scale action
datasets. Our approach is generic and does not depend on a specific action
proposal method. We evaluate our approach on several publicly available trimmed
and un-trimmed datasets and obtain better performance compared to several
proposal ranking methods. In addition, we demonstrate that properly ranked
proposals produce significantly better action detection as compared to
state-of-the-art proposal based methods
Pre and Post-hoc Diagnosis and Interpretation of Malignancy from Breast DCE-MRI
We propose a new method for breast cancer screening from DCE-MRI based on a
post-hoc approach that is trained using weakly annotated data (i.e., labels are
available only at the image level without any lesion delineation). Our proposed
post-hoc method automatically diagnosis the whole volume and, for positive
cases, it localizes the malignant lesions that led to such diagnosis.
Conversely, traditional approaches follow a pre-hoc approach that initially
localises suspicious areas that are subsequently classified to establish the
breast malignancy -- this approach is trained using strongly annotated data
(i.e., it needs a delineation and classification of all lesions in an image).
Another goal of this paper is to establish the advantages and disadvantages of
both approaches when applied to breast screening from DCE-MRI. Relying on
experiments on a breast DCE-MRI dataset that contains scans of 117 patients,
our results show that the post-hoc method is more accurate for diagnosing the
whole volume per patient, achieving an AUC of 0.91, while the pre-hoc method
achieves an AUC of 0.81. However, the performance for localising the malignant
lesions remains challenging for the post-hoc method due to the weakly labelled
dataset employed during training.Comment: Submitted to Medical Image Analysi
About the nature of Kansei information, from abstract to concrete
Designer’s expertise refers to the scientific fields of emotional design and kansei information. This paper aims to answer to a scientific major issue which is, how to formalize designer’s knowledge, rules, skills into kansei information systems. Kansei can be considered as a psycho-physiologic, perceptive, cognitive and affective process through a particular experience. Kansei oriented methods include various approaches which deal with semantics and emotions, and show the correlation with some design properties. Kansei words may include semantic, sensory, emotional descriptors, and also objects names and product attributes. Kansei levels of information can be seen on an axis going from abstract to concrete dimensions. Sociological value is the most abstract information positioned on this axis. Previous studies demonstrate the values the people aspire to drive their emotional reactions in front of particular semantics. This means that the value dimension should be considered in kansei studies. Through a chain of value-function-product attributes it is possible to enrich design generation and design evaluation processes. This paper describes some knowledge structures and formalisms we established according to this chain, which can be further used for implementing computer aided design tools dedicated to early design. These structures open to new formalisms which enable to integrate design information in a non-hierarchical way. The foreseen algorithmic implementation may be based on the association of ontologies and bag-of-words.AN
STV-based Video Feature Processing for Action Recognition
In comparison to still image-based processes, video features can provide rich and intuitive information about dynamic events occurred over a period of time, such as human actions, crowd behaviours, and other subject pattern changes. Although substantial progresses have been made in the last decade on image processing and seen its successful applications in face matching and object recognition, video-based event detection still remains one of the most difficult challenges in computer vision research due to its complex continuous or discrete input signals, arbitrary dynamic feature definitions, and the often ambiguous analytical methods. In this paper, a Spatio-Temporal Volume (STV) and region intersection (RI) based 3D shape-matching method has been proposed to facilitate the definition and recognition of human actions recorded in videos. The distinctive characteristics and the performance gain of the devised approach stemmed from a coefficient factor-boosted 3D region intersection and matching mechanism developed in this research. This paper also reported the investigation into techniques for efficient STV data filtering to reduce the amount of voxels (volumetric-pixels) that need to be processed in each operational cycle in the implemented system. The encouraging features and improvements on the operational performance registered in the experiments have been discussed at the end
Going Deeper into Action Recognition: A Survey
Understanding human actions in visual data is tied to advances in
complementary research areas including object recognition, human dynamics,
domain adaptation and semantic segmentation. Over the last decade, human action
analysis evolved from earlier schemes that are often limited to controlled
environments to nowadays advanced solutions that can learn from millions of
videos and apply to almost all daily activities. Given the broad range of
applications from video surveillance to human-computer interaction, scientific
milestones in action recognition are achieved more rapidly, eventually leading
to the demise of what used to be good in a short time. This motivated us to
provide a comprehensive review of the notable steps taken towards recognizing
human actions. To this end, we start our discussion with the pioneering methods
that use handcrafted representations, and then, navigate into the realm of deep
learning based approaches. We aim to remain objective throughout this survey,
touching upon encouraging improvements as well as inevitable fallbacks, in the
hope of raising fresh questions and motivating new research directions for the
reader
- …