Object recognition and localization are important for automatically interpreting video and for better querying
of its content. We propose a method for object localization that learns incrementally and addresses four key
aspects. Firstly, we show that for certain applications, recognition is feasible with only a few training samples.
Secondly, we show that novel objects can be added incrementally without retraining existing objects, which is
important for fast interaction. Thirdly, we show that an unbalanced number of positive training samples leads
to biased classifier scores that can be corrected by modifying weights. Fourthly, we show that the detector
performance can deteriorate due to hard-negative mining for similar or closely related classes (e.g., for Barbie
and dress, because the doll is wearing a dress). This can be solved by our hierarchical classification. We introduce
a new dataset, which we call TOSO, and use it to demonstrate the effectiveness of the proposed method for the
localization and recognition of multiple objects in images.

This research was performed in the GOOSE project, which is jointly funded by the enabling technology program
Adaptive Multi Sensor Networks (AMSN) and the MIST research program of the Dutch Ministry of Defense.
This publication was supported by the research program Making Sense of Big Data (MSoBD).