3 research outputs found
Object-Proposal Evaluation Protocol is 'Gameable'
Object proposals have quickly become the de-facto pre-processing step in a
number of vision pipelines (for object detection, object discovery, and other
tasks). Their performance is usually evaluated on partially annotated datasets.
In this paper, we argue that the choice of using a partially annotated dataset
for evaluation of object proposals is problematic -- as we demonstrate via a
thought experiment, the evaluation protocol is 'gameable', in the sense that
progress under this protocol does not necessarily correspond to a "better"
category independent object proposal algorithm.
To alleviate this problem, we: (1) Introduce a nearly-fully annotated version
of PASCAL VOC dataset, which serves as a test-bed to check if object proposal
techniques are overfitting to a particular list of categories. (2) Perform an
exhaustive evaluation of object proposal methods on our introduced nearly-fully
annotated PASCAL dataset and perform cross-dataset generalization experiments;
and (3) Introduce a diagnostic experiment to detect the bias capacity in an
object proposal algorithm. This tool circumvents the need to collect a densely
annotated dataset, which can be expensive and cumbersome to collect. Finally,
we plan to release an easy-to-use toolbox which combines various publicly
available implementations of object proposal algorithms which standardizes the
proposal generation and evaluation so that new methods can be added and
evaluated on different datasets. We hope that the results presented in the
paper will motivate the community to test the category independence of various
object proposal methods by carefully choosing the evaluation protocol.Comment: 15 pages, 11 figures, 4 table
PaLM 2 Technical Report
We introduce PaLM 2, a new state-of-the-art language model that has better
multilingual and reasoning capabilities and is more compute-efficient than its
predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture
of objectives. Through extensive evaluations on English and multilingual
language, and reasoning tasks, we demonstrate that PaLM 2 has significantly
improved quality on downstream tasks across different model sizes, while
simultaneously exhibiting faster and more efficient inference compared to PaLM.
This improved efficiency enables broader deployment while also allowing the
model to respond faster, for a more natural pace of interaction. PaLM 2
demonstrates robust reasoning capabilities exemplified by large improvements
over PaLM on BIG-Bench and other reasoning tasks. PaLM 2 exhibits stable
performance on a suite of responsible AI evaluations, and enables
inference-time control over toxicity without additional overhead or impact on
other capabilities. Overall, PaLM 2 achieves state-of-the-art performance
across a diverse set of tasks and capabilities.
When discussing the PaLM 2 family, it is important to distinguish between
pre-trained models (of various sizes), fine-tuned variants of these models, and
the user-facing products that use these models. In particular, user-facing
products typically include additional pre- and post-processing steps.
Additionally, the underlying models may evolve over time. Therefore, one should
not expect the performance of user-facing products to exactly match the results
reported in this report