Natural language image description: data, models, and evaluation
Automatically describing an image with a concise natural language description
is an ambitious and emerging task bringing together the Natural Language
Processing and Computer Vision communities. As with any emerging task, the
necessary groundwork of developing appropriate datasets, strong baseline models,
and evaluation frameworks is key. In this thesis, we introduce the first
large datasets specifically designed with image description in mind, focusing
on concrete descriptions that can be gleaned from the image alone. Furthermore,
we develop strong baseline models that show the need to model
language beyond a simple bag-of-words approach to increase performance.
Most importantly, we introduce a ranking-based framework for comparing
image description models. We show that this framework is more reliable and
accurate than the conventional approach of evaluating on novel model-generated
text. As this task has gained popularity recently, we further analyze
the drawbacks of current evaluation methods and put forth concrete extensions
to our ranking framework that will guide progress towards modeling
the association between natural language and the images it describes.
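The abstract does not spell out the ranking framework's metric, but ranking-based evaluation of image description is commonly scored with recall@k: each image ranks a pool of candidate descriptions by model score, and we measure how often the true description lands in the top k. The sketch below is illustrative only (the function name, score matrix, and the convention that description i matches image i are assumptions, not from the thesis):

```python
def recall_at_k(similarity, k):
    """Fraction of images whose true description ranks in the top k.

    similarity[i][j] is a model score between image i and candidate
    description j; the correct description for image i is assumed to
    sit at index i (an illustrative convention).
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank all candidate descriptions for image i, best score first.
        ranked = sorted(range(len(row)), key=lambda j: -row[j])
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)

# Toy scores for 3 images x 3 candidate descriptions.
scores = [
    [0.9, 0.1, 0.2],  # correct description ranked 1st
    [0.3, 0.2, 0.8],  # correct description ranked 2nd
    [0.1, 0.4, 0.7],  # correct description ranked 1st
]
print(recall_at_k(scores, 1))
```

Because ranking compares against a fixed candidate pool rather than judging free-form generated text, it sidesteps the unreliability of text-overlap metrics that the abstract criticizes.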
Structured prediction with indirect supervision
Structured tasks, which often involve many interdependent decisions for each example, are the backbone of many important applications, such as those in natural language processing. The models built for structured tasks need to be capable of assigning values to a set of interdependent variables. In this thesis, we point out that the strong dependencies between the decisions in structured tasks can be exploited to simplify both the learning task and the annotation effort --- it is sometimes possible to supply partial and indirect supervision to only some of the target variables, or to other variables that are derivatives of the target variables, and thus reduce the supervision effort significantly.
Based on this intuition, this thesis addresses the problem of reducing the cost of labeling for structured tasks. We tackle this problem by developing advanced machine learning algorithms that can learn and generalize from indirect supervision in addition to labeled data. Indirect supervision can come in the form of constraints or weaker supervision signals. Our proposed learning frameworks can handle both structured output problems and problems with latent structures. We demonstrate the effectiveness of the learning-with-indirect-supervision framework for many natural language processing tasks.
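One way to make the idea of constraint-based indirect supervision concrete is a structured perceptron that, for unlabeled examples, uses a structural constraint to pick a constraint-satisfying pseudo-gold structure and trains toward it. This is a minimal sketch of that general idea, not the thesis's actual algorithm; every name, the toy tagging task, and the "exactly one entity" constraint are illustrative assumptions:

```python
import itertools

def score(weights, x, y):
    # Linear score of a structure y (tuple of 0/1 tags) for input x.
    return sum(weights.get((xi, yi), 0.0) for xi, yi in zip(x, y))

def predict(weights, x, constraint=None):
    # Exhaustive argmax over candidate structures (fine for tiny toy inputs);
    # a real system would use dynamic programming or ILP inference.
    candidates = itertools.product([0, 1], repeat=len(x))
    if constraint:
        candidates = (y for y in candidates if constraint(y))
    return max(candidates, key=lambda y: score(weights, x, y))

def update(weights, x, gold, pred):
    # Standard perceptron update toward gold, away from the prediction.
    if gold != pred:
        for xi, gi, pi in zip(x, gold, pred):
            weights[(xi, gi)] = weights.get((xi, gi), 0.0) + 1.0
            weights[(xi, pi)] = weights.get((xi, pi), 0.0) - 1.0

def train(labeled, constrained, constraint, epochs=5):
    # Labeled examples supply full structures; "constrained" examples
    # supply only the constraint, as indirect supervision.
    weights = {}
    for _ in range(epochs):
        for x, gold in labeled:
            update(weights, x, gold, predict(weights, x))
        for x in constrained:
            # Pseudo-gold: best structure that satisfies the constraint.
            pseudo = predict(weights, x, constraint)
            update(weights, x, pseudo, predict(weights, x))
    return weights

# Toy task: tag each word 1 ("entity") or 0; the indirect signal is
# only that exactly one word per example is an entity.
constraint = lambda y: sum(y) == 1
w = train([(("the", "cat"), (0, 1))], [("a", "dog")], constraint)
print(predict(w, ("a", "dog"), constraint))
```

The constrained examples never reveal which word is the entity, yet the constraint alone narrows the candidate structures enough to drive useful updates, which is the cost-reduction intuition the abstract describes.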