198,295 research outputs found
Model-based clustering via linear cluster-weighted models
A novel family of twelve mixture models with random covariates, nested in the
linear cluster-weighted model (CWM), is introduced for model-based
clustering. The linear CWM was recently presented as a robust alternative
to the better known linear Gaussian CWM. The proposed family of models provides
a unified framework that also includes the linear Gaussian CWM as a special
case. Maximum likelihood parameter estimation is carried out within the EM
framework, and both the BIC and the ICL are used for model selection. A simple
and effective hierarchical random initialization is also proposed for the EM
algorithm. The novel model-based clustering technique is illustrated in some
applications to real data. Finally, a simulation study for evaluating the
performance of the BIC and the ICL is presented
Synthetic-Neuroscore: Using A Neuro-AI Interface for Evaluating Generative Adversarial Networks
Generative adversarial networks (GANs) are increasingly attracting attention
in the computer vision, natural language processing, speech synthesis and
similar domains. Arguably the most striking results have been in the area of
image synthesis. However, evaluating the performance of GANs is still an open
and challenging problem. Existing evaluation metrics primarily measure the
dissimilarity between real and generated images using automated statistical
methods. They often require large sample sizes for evaluation and do not
directly reflect human perception of image quality. In this work, we describe
an evaluation metric we call Neuroscore, for evaluating the performance of
GANs, that more directly reflects psychoperceptual image quality through the
utilization of brain signals. Our results show that Neuroscore has superior
performance to the current evaluation metrics in that: (1) It is more
consistent with human judgment; (2) The evaluation process needs much smaller
numbers of samples; and (3) It is able to rank the quality of images on a per
GAN basis. A convolutional neural network (CNN) based neuro-AI interface is
proposed to predict Neuroscore from GAN-generated images directly without the
need for neural responses. Importantly, we show that including neural responses
during the training phase of the network can significantly improve the
prediction capability of the proposed model. Materials related to this work are
provided at https://github.com/villawang/Neuro-AI-Interface
Generic 3D Representation via Pose Estimation and Matching
Though a large body of computer vision research has investigated developing
generic semantic representations, efforts towards developing a similar
representation for 3D has been limited. In this paper, we learn a generic 3D
representation through solving a set of foundational proxy 3D tasks:
object-centric camera pose estimation and wide baseline feature matching. Our
method is based upon the premise that by providing supervision over a set of
carefully selected foundational tasks, generalization to novel tasks and
abstraction capabilities can be achieved. We empirically show that the internal
representation of a multi-task ConvNet trained to solve the above core problems
generalizes to novel 3D tasks (e.g., scene layout estimation, object pose
estimation, surface normal estimation) without the need for fine-tuning and
shows traits of abstraction abilities (e.g., cross-modality pose estimation).
In the context of the core supervised tasks, we demonstrate our representation
achieves state-of-the-art wide baseline feature matching results without
requiring apriori rectification (unlike SIFT and the majority of learned
features). We also show 6DOF camera pose estimation given a pair local image
patches. The accuracy of both supervised tasks come comparable to humans.
Finally, we contribute a large-scale dataset composed of object-centric street
view scenes along with point correspondences and camera pose information, and
conclude with a discussion on the learned representation and open research
questions.Comment: Published in ECCV16. See the project website
http://3drepresentation.stanford.edu/ and dataset website
https://github.com/amir32002/3D_Street_Vie
Robotic Pick-and-Place of Novel Objects in Clutter with Multi-Affordance Grasping and Cross-Domain Image Matching
This paper presents a robotic pick-and-place system that is capable of
grasping and recognizing both known and novel objects in cluttered
environments. The key new feature of the system is that it handles a wide range
of object categories without needing any task-specific training data for novel
objects. To achieve this, it first uses a category-agnostic affordance
prediction algorithm to select and execute among four different grasping
primitive behaviors. It then recognizes picked objects with a cross-domain
image classification framework that matches observed images to product images.
Since product images are readily available for a wide range of objects (e.g.,
from the web), the system works out-of-the-box for novel objects without
requiring any additional training data. Exhaustive experimental results
demonstrate that our multi-affordance grasping achieves high success rates for
a wide variety of objects in clutter, and our recognition algorithm achieves
high accuracy for both known and novel grasped objects. The approach was part
of the MIT-Princeton Team system that took 1st place in the stowing task at the
2017 Amazon Robotics Challenge. All code, datasets, and pre-trained models are
available online at http://arc.cs.princeton.eduComment: Project webpage: http://arc.cs.princeton.edu Summary video:
https://youtu.be/6fG7zwGfIk
A framework for evaluating stereo-based pedestrian detection techniques
Automated pedestrian detection, counting, and tracking have received significant attention in the computer vision community of late. As such, a variety of techniques have been investigated using both traditional 2-D computer vision techniques and, more recently, 3-D stereo information. However, to date, a quantitative assessment of the performance of stereo-based pedestrian detection has been problematic, mainly due to the lack of standard stereo-based test data and an agreed methodology for carrying out the evaluation. This has forced researchers into making subjective comparisons between competing approaches. In this paper, we propose a framework for the quantitative evaluation of a short-baseline stereo-based pedestrian detection system. We provide freely available synthetic and real-world test data and recommend a set of evaluation metrics. This allows researchers to benchmark systems, not only with respect to other stereo-based approaches, but also with more traditional 2-D approaches. In order to illustrate its usefulness, we demonstrate the application of this framework to evaluate our own recently proposed technique for pedestrian detection and tracking
An Estimation and Analysis Framework for the Rasch Model
The Rasch model is widely used for item response analysis in applications
ranging from recommender systems to psychology, education, and finance. While a
number of estimators have been proposed for the Rasch model over the last
decades, the available analytical performance guarantees are mostly asymptotic.
This paper provides a framework that relies on a novel linear minimum
mean-squared error (L-MMSE) estimator which enables an exact, nonasymptotic,
and closed-form analysis of the parameter estimation error under the Rasch
model. The proposed framework provides guidelines on the number of items and
responses required to attain low estimation errors in tests or surveys. We
furthermore demonstrate its efficacy on a number of real-world collaborative
filtering datasets, which reveals that the proposed L-MMSE estimator performs
on par with state-of-the-art nonlinear estimators in terms of predictive
performance.Comment: To be presented at ICML 201
- …