Unsupervised Body Part Regression via Spatially Self-ordering Convolutional Neural Networks
Automatic body part recognition for CT slices can benefit various medical
image applications. Recent deep learning methods demonstrate promising
performance, with the requirement of large amounts of labeled images for
training. The intrinsic structural or superior-inferior slice ordering
information in CT volumes is not fully exploited. In this paper, we propose a
convolutional neural network (CNN) based Unsupervised Body part Regression
(UBR) algorithm to address this problem. A novel unsupervised learning method
and two inter-sample CNN loss functions are presented. Distinct from previous
work, UBR builds a coordinate system for the human body and outputs a
continuous score for each axial slice, representing the normalized position of
the body part in the slice. The training process of UBR resembles a
self-organization process: slice scores are learned from inter-slice
relationships. The training samples are unlabeled CT volumes that are abundant,
thus no extra annotation effort is needed. UBR is simple, fast, and accurate.
Quantitative and qualitative experiments validate its effectiveness. In
addition, we show two applications of UBR in network initialization and anomaly
detection. Comment: Oral presentation in ISBI1
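The abstract above says slice scores are learned from inter-slice relationships through two inter-sample loss functions, but it does not define them. The following is a minimal sketch of what such losses could look like, not the authors' formulation. It assumes that `scores` holds the network outputs for equally spaced slices of one volume, listed in anatomical order, and that scores should increase along that order. The names `order_loss` and `distance_loss` are hypothetical.

```python
import math

def order_loss(scores):
    """Penalize adjacent slice scores that fail to increase along the volume.

    Assumption: uses -log(sigmoid(d)) for each adjacent difference d,
    which is small when d is large and positive.
    """
    total = 0.0
    for lower, upper in zip(scores, scores[1:]):
        d = upper - lower
        # -log(sigmoid(d)) == log(1 + exp(-d)), computed in a stable form
        total += max(-d, 0.0) + math.log1p(math.exp(-abs(d)))
    return total

def distance_loss(scores):
    """Penalize uneven score gaps between equally spaced slices.

    Assumption: squared difference of consecutive gaps; zero when the
    scores are evenly spaced.
    """
    gaps = [b - a for a, b in zip(scores, scores[1:])]
    return sum((g2 - g1) ** 2 for g1, g2 in zip(gaps, gaps[1:]))

if __name__ == "__main__":
    ordered = [0.0, 1.0, 2.0, 3.0]
    shuffled = [3.0, 1.0, 2.0, 0.0]
    print("order loss, ordered: ", order_loss(ordered))
    print("order loss, shuffled:", order_loss(shuffled))
    print("distance loss, even gaps:", distance_loss(ordered))
```

Neither loss needs a label: both depend only on the relative positions of slices within the same volume, which is the sense in which training on unlabeled CT volumes is possible.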
Generic 3D Representation via Pose Estimation and Matching
Though a large body of computer vision research has investigated developing
generic semantic representations, efforts towards developing a similar
representation for 3D have been limited. In this paper, we learn a generic 3D
representation through solving a set of foundational proxy 3D tasks:
object-centric camera pose estimation and wide baseline feature matching. Our
method is based upon the premise that by providing supervision over a set of
carefully selected foundational tasks, generalization to novel tasks and
abstraction capabilities can be achieved. We empirically show that the internal
representation of a multi-task ConvNet trained to solve the above core problems
generalizes to novel 3D tasks (e.g., scene layout estimation, object pose
estimation, surface normal estimation) without the need for fine-tuning and
shows traits of abstraction abilities (e.g., cross-modality pose estimation).
In the context of the core supervised tasks, we demonstrate our representation
achieves state-of-the-art wide baseline feature matching results without
requiring a priori rectification (unlike SIFT and the majority of learned
features). We also show 6DOF camera pose estimation given a pair of local image
patches. The accuracy of both supervised tasks is comparable to that of humans.
Finally, we contribute a large-scale dataset composed of object-centric street
view scenes along with point correspondences and camera pose information, and
conclude with a discussion on the learned representation and open research
questions. Comment: Published in ECCV16. See the project website
http://3drepresentation.stanford.edu/ and dataset website
https://github.com/amir32002/3D_Street_Vie
Thirty Years of Machine Learning: The Road to Pareto-Optimal Wireless Networks
Future wireless networks have a substantial potential in terms of supporting
a broad range of complex compelling applications both in military and civilian
fields, where the users are able to enjoy high-rate, low-latency, low-cost and
reliable information services. Achieving this ambitious goal requires new radio
techniques for adaptive learning and intelligent decision making because of the
complex heterogeneous nature of the network structures and wireless services.
Machine learning (ML) algorithms have had great success in supporting big data
analytics, efficient parameter estimation and interactive decision making.
Hence, in this article, we review the thirty-year history of ML by elaborating
on supervised learning, unsupervised learning, reinforcement learning and deep
learning. Furthermore, we investigate their employment in the compelling
applications of wireless networks, including heterogeneous networks (HetNets),
cognitive radios (CR), Internet of things (IoT), machine to machine networks
(M2M), and so on. This article aims to assist readers in clarifying the
motivation and methodology of the various ML algorithms, so as to invoke them
for hitherto unexplored services as well as scenarios of future wireless
networks. Comment: 46 pages, 22 fig
Predictive Coding for Dynamic Visual Processing: Development of Functional Hierarchy in a Multiple Spatio-Temporal Scales RNN Model
The current paper proposes a novel predictive coding type neural network
model, the predictive multiple spatio-temporal scales recurrent neural network
(P-MSTRNN). The P-MSTRNN learns to predict visually perceived human whole-body
cyclic movement patterns by exploiting multiscale spatio-temporal constraints
imposed on network dynamics by using differently sized receptive fields as well
as different time constant values for each layer. After learning, the network
becomes able to proactively imitate target movement patterns by inferring or
recognizing corresponding intentions by means of the regression of prediction
error. Results show that the network can develop a functional hierarchy by
developing a different type of dynamic structure at each layer. The paper
examines how model performance during pattern generation as well as predictive
imitation varies depending on the stage of learning. The number of limit cycle
attractors corresponding to target movement patterns increases as learning
proceeds. Moreover, transient dynamics developing early in the learning process
successfully perform pattern generation and predictive imitation tasks. The
paper concludes that exploitation of transient dynamics facilitates successful
task performance during early learning periods. Comment: Accepted in Neural Computation (MIT press
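The abstract above mentions different time constant values for each layer but does not give the update rule. As an illustration only, the sketch below uses a leaky-integrator update, one common way to give recurrent units a time constant; whether the P-MSTRNN uses exactly this form is an assumption. The names `leaky_step`, `run_layer`, and `tau` are hypothetical.

```python
import math

def leaky_step(u_prev, drive, tau):
    """One leaky-integrator update of an internal state.

    Assumption: u_t = (1 - 1/tau) * u_{t-1} + drive / tau, so a larger
    tau makes the state change more slowly.
    """
    return (1.0 - 1.0 / tau) * u_prev + drive / tau

def run_layer(drives, tau):
    """Apply the update over a sequence of inputs; return tanh activations."""
    u = 0.0
    outputs = []
    for drive in drives:
        u = leaky_step(u, drive, tau)
        outputs.append(math.tanh(u))
    return outputs

if __name__ == "__main__":
    step_input = [1.0] * 5
    print("fast layer (tau=2): ", run_layer(step_input, tau=2.0))
    print("slow layer (tau=20):", run_layer(step_input, tau=20.0))
```

With the same constant input, the layer with the small time constant approaches its steady state within a few steps while the layer with the large one responds slowly, which is the kind of separation of timescales a layered hierarchy can build on.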