Recurrent 3D Pose Sequence Machines
3D human articulated pose recovery from monocular image sequences is very
challenging due to diverse appearances, viewpoints, and occlusions, and
because the 3D human pose is inherently ambiguous in monocular imagery. It is
thus critical to exploit rich spatial and temporal long-range dependencies
among body joints for accurate 3D pose sequence prediction. Existing approaches
usually manually design some elaborate prior terms and human body kinematic
constraints for capturing structures, which are often insufficient to capture
all intrinsic structures and do not scale to all scenarios. In contrast, this
paper presents a Recurrent 3D Pose Sequence Machine (RPSM) to automatically
learn the image-dependent structural constraint and sequence-dependent temporal
context by using a multi-stage sequential refinement. At each stage, our RPSM
is composed of three modules to predict the 3D pose sequences based on the
previously learned 2D pose representations and 3D poses: (i) a 2D pose module
extracting the image-dependent pose representations, (ii) a 3D pose recurrent
module regressing 3D poses and (iii) a feature adaption module serving as a
bridge between module (i) and (ii) to enable the representation transformation
from 2D to 3D domain. These three modules are then assembled into a sequential
prediction framework to refine the predicted poses with multiple recurrent
stages. Extensive evaluations on the Human3.6M dataset and HumanEva-I dataset
show that our RPSM outperforms all state-of-the-art approaches for 3D pose
estimation.
Comment: Published in CVPR 201
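The staged data flow described above can be sketched as a toy loop. This is a minimal illustration, not the authors' implementation: the paper's modules are deep CNN/LSTM networks, while here hypothetical linear maps (`W_2d`, `W_adapt`, `W_3d`) and assumed dimensions stand in to show how the three modules chain and how each stage refines the previous 3D estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
N_JOINTS = 17   # assumed joint count (Human3.6M uses 17)
FEAT_DIM = 32   # assumed image-feature size

# Hypothetical stand-in weights for the three modules; the real model
# learns deep networks, these only illustrate the data flow.
W_2d = rng.standard_normal((N_JOINTS * 2, FEAT_DIM)) * 0.1               # (i)
W_adapt = rng.standard_normal((FEAT_DIM, N_JOINTS * 2)) * 0.1            # (iii)
W_3d = rng.standard_normal((N_JOINTS * 3, FEAT_DIM + N_JOINTS * 3)) * 0.1  # (ii)

def pose_module_2d(image_feat):
    """(i) Extract an image-dependent 2D pose representation."""
    return np.tanh(W_2d @ image_feat)

def feature_adaption(rep_2d):
    """(iii) Bridge: map the 2D representation into the 3D feature domain."""
    return np.tanh(W_adapt @ rep_2d)

def recurrent_module_3d(feat_3d, prev_pose):
    """(ii) Regress a 3D pose, conditioned on the previous estimate."""
    return np.tanh(W_3d @ np.concatenate([feat_3d, prev_pose]))

def rpsm_forward(image_feat, n_stages=3):
    pose = np.zeros(N_JOINTS * 3)      # initial 3D pose estimate
    for _ in range(n_stages):          # multi-stage sequential refinement
        rep_2d = pose_module_2d(image_feat)
        feat = feature_adaption(rep_2d)
        pose = recurrent_module_3d(feat, pose)  # refine using previous pose
    return pose.reshape(N_JOINTS, 3)
```

The key structural point the sketch captures is that the previous stage's 3D pose is fed back into module (ii), so each stage refines rather than recomputes the estimate.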
End-to-End Knowledge-Routed Relational Dialogue System for Automatic Diagnosis
Beyond current conversational chatbots or task-oriented dialogue systems that
have attracted increasing attention, we move forward to develop a dialogue
system for automatic medical diagnosis that converses with patients to collect
additional symptoms beyond their self-reports and automatically makes a
diagnosis. Besides the challenges for conversational dialogue systems (e.g.
topic transition coherency and question understanding), automatic medical
diagnosis further poses more critical requirements for the dialogue rationality
in the context of medical knowledge and symptom-disease relations. Existing
dialogue systems (Madotto, Wu, and Fung 2018; Wei et al. 2018; Li et al. 2017)
mostly rely on data-driven learning and cannot encode extra expert
knowledge such as a medical knowledge graph. In this work, we propose an End-to-End Knowledge-routed
Relational Dialogue System (KR-DS) that seamlessly incorporates rich medical
knowledge graph into the topic transition in dialogue management, and makes it
cooperative with natural language understanding and natural language
generation. A novel Knowledge-routed Deep Q-network (KR-DQN) is introduced to
manage topic transitions, which integrates a relational refinement branch for
encoding relations among different symptoms and symptom-disease pairs, and a
knowledge-routed graph branch for topic decision-making. Extensive experiments
on a public medical dialogue dataset show our KR-DS significantly beats
state-of-the-art methods (by more than 8% in diagnosis accuracy). We further
show the superiority of our KR-DS on a newly collected medical dialogue system
dataset, which is more challenging as it retains the original self-reports and
conversational data between patients and doctors.
Comment: 8 pages, 5 figures, AAA
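The two branches of the KR-DQN described above can be illustrated with a toy numpy sketch. Everything here is a hypothetical stand-in: the symptom/disease counts, the `rel` co-occurrence matrix (playing the role of the medical knowledge graph), and the linear Q branch are placeholders for the learned networks, used only to show how raw Q-values over dialogue actions are adjusted by relational knowledge.

```python
import numpy as np

rng = np.random.default_rng(1)
N_SYM, N_DIS = 6, 3
N_ACT = N_SYM + N_DIS  # actions: inquire a symptom, or output a diagnosis

# Stand-in symptom->disease prior (in the paper, derived from a medical
# knowledge graph); values here are random placeholders.
rel = rng.random((N_SYM, N_DIS))
W_q = rng.standard_normal((N_ACT, N_SYM)) * 0.1  # stand-in basic DQN branch

def kr_dqn_q(sym_state):
    """sym_state: +1 confirmed, -1 denied, 0 unknown, per symptom."""
    q_raw = W_q @ sym_state                        # data-driven Q-values
    dis_prior = rel.T @ np.clip(sym_state, 0, 1)   # knowledge-routed branch:
    sym_prior = rel @ dis_prior                    # route evidence through
    prior = np.concatenate([sym_prior, dis_prior]) # symptom-disease relations
    return q_raw + prior                           # relational refinement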
An Expressive Deep Model for Human Action Parsing from A Single Image
This paper aims at one newly raising task in vision and multimedia research:
recognizing human actions from still images. Its main challenges lie in the
large variations in human poses and appearances, as well as the lack of
temporal motion information. Addressing these problems, we propose to develop
an expressive deep model to naturally integrate human layout and surrounding
contexts for higher level action understanding from still images. In
particular, a Deep Belief Net is trained to fuse information from different
noisy sources such as body part detection and object detection. To bridge the
semantic gap, we used manually labeled data to greatly improve the
effectiveness and efficiency of the pre-training and fine-tuning stages of the
DBN training. The resulting framework is shown to be robust to sometimes
unreliable inputs (e.g., imprecise detections of human parts and objects), and
outperforms the state-of-the-art approaches.Comment: 6 pages, 8 figures, ICME 201
- …