
    Natural language image description: data, models, and evaluation

    Automatically describing an image with a concise natural language description is an ambitious and emerging task bringing together the Natural Language and Computer Vision communities. As with any emerging task, the necessary groundwork of developing appropriate datasets, strong baseline models, and evaluation frameworks is key. In this thesis, we introduce the first large datasets specifically designed with image description in mind, focusing on concrete descriptions that can be gleaned from the image alone. Furthermore, we develop strong baseline models showing that language must be modeled beyond a simple bag-of-words approach to increase performance. Most importantly, we introduce a ranking-based framework for comparing image description models. We show that this framework is more reliable and accurate than the conventional approach of evaluating novel, model-generated text. As this task has gained popularity recently, we further analyze the drawbacks of current evaluation methods and put forth concrete extensions to our ranking framework that will guide progress towards modeling the association between natural language and the images the language describes.
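The ranking-based evaluation mentioned in the abstract can be pictured as a recall-at-K computation over image-caption similarity scores: each image is scored against a pool of candidate descriptions, and we measure how often its correct description ranks in the top K. This is a minimal sketch under assumed conventions (the function name, the NumPy dependency, and the convention that caption i is the ground truth for image i are illustrative, not the thesis's actual code):

```python
import numpy as np

def recall_at_k(scores, k):
    """scores[i, j] is a model's similarity between image i and caption j;
    caption i is assumed to be the correct one for image i (diagonal).
    Returns the fraction of images whose correct caption ranks in the top k."""
    n = scores.shape[0]
    # sort captions for each image by descending similarity
    order = np.argsort(-scores, axis=1)
    # 1-based rank of the correct (diagonal) caption for each image
    ranks = np.array([int(np.where(order[i] == i)[0][0]) + 1 for i in range(n)])
    return float(np.mean(ranks <= k))
```

Because the metric only asks the model to rank fixed human-written captions, it sidesteps the difficulty of judging the quality of freely generated text, which is one reason ranking-based evaluation can be more reliable than scoring novel output.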

    Structured prediction with indirect supervision

    Structured tasks, which often involve many interdependent decisions for each example, are the backbone of many important applications such as natural language processing tasks. The models built for structured tasks need to be capable of assigning values to a set of interdependent variables. In this thesis, we point out that the strong dependencies between the decisions in structured tasks can be exploited to simplify both the learning task and the annotation effort: it is sometimes possible to supply partial and indirect supervision to only some of the target variables, or to other variables that are derivatives of the target variables, and thus reduce the supervision effort significantly. Based on this intuition, this thesis addresses the problem of reducing the cost of labeling for structured tasks. We tackle this problem by developing advanced machine learning algorithms that can learn and generalize from indirect supervision in addition to labeled data. Indirect supervision can come in the form of constraints or weaker supervision signals. Our proposed learning frameworks can handle both structured output problems and problems with latent structures. We demonstrate the effectiveness of the learning-with-indirect-supervision framework on many natural language processing tasks.
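One way to picture constraints as indirect supervision: inference returns the best-scoring structure that satisfies a declarative constraint, so a single constraint can stand in for many token-level labels. This is a toy sketch with hypothetical names and brute-force enumeration (real systems in this line of work use ILP or dynamic-programming inference, not exhaustive search):

```python
from itertools import product

def constrained_best(x, weights, labels, constraint):
    """Score every label sequence for input x and return the highest-scoring
    one that satisfies the constraint. Scores are simple sums of per-token
    (word, label) weights; a constraint is any predicate over a full sequence."""
    best, best_score = None, float("-inf")
    for y in product(labels, repeat=len(x)):
        if not constraint(y):
            continue  # constraint prunes structures no labeled data was needed to rule out
        score = sum(weights.get((xi, yi), 0.0) for xi, yi in zip(x, y))
        if score > best_score:
            best, best_score = y, score
    return best
```

For example, with weights that favor labeling every token "O", the constraint "at least one token must be labeled B" redirects inference to the best structure containing a B, without any token-level annotation saying which token that is.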