    Probabilistic approaches for modeling text structure and their application to text-to-text generation

    Since the early days of generation research, it has been acknowledged that modeling the global structure of a document is crucial for producing coherent, readable output. However, traditional knowledge-intensive approaches have been of limited utility in addressing this problem, since they cannot be effectively scaled to operate in domain-independent, large-scale applications. Due to this difficulty, existing text-to-text generation systems rarely rely on such structural information when producing an output text. Consequently, texts generated by these methods do not match the quality of those written by humans; they are often fraught with severe coherence violations and disfluencies. In this chapter, I will present probabilistic models of document structure that can be effectively learned from raw document collections. This feature distinguishes these new models from the traditional knowledge-intensive approaches used in symbolic concept-to-text generation. Our results demonstrate that these probabilistic models can be directly applied to content organization, and suggest that they can prove useful in an even broader range of text-to-text applications than we have considered here. Funding: National Science Foundation (U.S.) (CAREER grant IIS-0448168); Microsoft Research New Faculty Fellowship.

    From media crossing to media mining

    This paper reviews how the concept of media crossing has contributed to the advancement of information access as an application domain and explores directions for a future research agenda. These directions include themes that could help to broaden the scope of media crossing and incorporate it into a more general approach, one that not only combines medium-specific processing but also exploits more abstract, medium-independent representations, partly based on the foundational work on statistical language models for information retrieval. Three examples of successful applications of media crossing are presented, with a focus on the aspects that could be considered a first step towards a generalized form of media mining.

    Joint Video and Text Parsing for Understanding Events and Answering Queries

    We propose a framework for parsing video and text jointly for understanding events and answering user queries. Our framework produces a parse graph that represents the compositional structures of spatial information (objects and scenes), temporal information (actions and events) and causal information (causalities between events and fluents) in the video and text. The knowledge representation of our framework is based on a spatial-temporal-causal And-Or graph (S/T/C-AOG), which jointly models possible hierarchical compositions of objects, scenes and events as well as their interactions and mutual contexts, and specifies the prior probability distribution of the parse graphs. We present a probabilistic generative model for joint parsing that captures the relations between the input video/text, their corresponding parse graphs and the joint parse graph. Based on the probabilistic model, we propose a joint parsing system consisting of three modules: video parsing, text parsing and joint inference. Video parsing and text parsing produce two parse graphs from the input video and text respectively. The joint inference module produces a joint parse graph by performing matching, deduction and revision on the video and text parse graphs. The proposed framework has the following objectives: first, we aim at deep semantic parsing of video and text that goes beyond the traditional bag-of-words approaches; second, we perform parsing and reasoning across the spatial, temporal and causal dimensions based on the joint S/T/C-AOG representation; third, we show that deep joint parsing facilitates subsequent applications such as generating narrative text descriptions and answering queries in the forms of who, what, when, where and why. We empirically evaluated our system by comparison against ground truth as well as by the accuracy of query answering, and obtained satisfactory results.