148,065 research outputs found

    AI2D-RST : A multimodal corpus of 1000 primary school science diagrams

    Get PDF
    This article introduces AI2D-RST, a multimodal corpus of 1000 English-language diagrams that represent topics in primary school natural sciences, such as food webs, life cycles, moon phases and human physiology. The corpus is based on the Allen Institute for Artificial Intelligence Diagrams (AI2D) dataset, a collection of diagrams with crowdsourced descriptions, which was originally developed to support research on automatic diagram understanding and visual question answering. Building on the segmentation of diagram layouts in AI2D, the AI2D-RST corpus presents a new multi-layer annotation schema that provides a rich description of their multimodal structure. Annotated by trained experts, the layers describe (1) the grouping of diagram elements into perceptual units, (2) the connections set up by diagrammatic elements such as arrows and lines, and (3) the discourse relations between diagram elements, which are described using Rhetorical Structure Theory (RST). Each annotation layer in AI2D-RST is represented using a graph. The corpus is freely available for research and teaching.Peer reviewe

    A topic-oriented syntactic component extraction model for social media

    Full text link
    Topic-oriented understanding is to extract information from various language instances, which reflects the characteristics or trends of semantic information related to the topic via statistical analysis. The syntax analysis and modeling is the basis of such work. Traditional syntactic formalization approaches widely used in natural language understanding could not be simply applied to the text modeling in the context of topic-oriented understanding. In this paper, we review the information extraction mode, and summarize its inherent relationship with the "Subject- Predicate" syntactic structure in Aryan language. And we propose a syntactic element extraction model based on the "topic-description" structure, which contains six kinds of core elements, satisfying the desired requirement for topic-oriented understanding. This paper also describes the model composition, the theoretical framework of understanding process, the extraction method of syntactic components, and the prototype system of generating syntax diagrams. The proposed model is evaluated on the Reuters 21578 and SocialCom2009 data sets, and the results show that the recall and precision of syntactic component extraction are up to 93.9% and 88%, respectively, which further justifies the feasibility of generating syntactic component through the word dependencies. © 2012 Springer Science+Business Media

    Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding

    Full text link
    Visually-situated language is ubiquitous -- sources range from textbooks with diagrams to web pages with images and tables, to mobile apps with buttons and forms. Perhaps due to this diversity, previous work has typically relied on domain-specific recipes with limited sharing of the underlying data, model architectures, and objectives. We present Pix2Struct, a pretrained image-to-text model for purely visual language understanding, which can be finetuned on tasks containing visually-situated language. Pix2Struct is pretrained by learning to parse masked screenshots of web pages into simplified HTML. The web, with its richness of visual elements cleanly reflected in the HTML structure, provides a large source of pretraining data well suited to the diversity of downstream tasks. Intuitively, this objective subsumes common pretraining signals such as OCR, language modeling, image captioning. In addition to the novel pretraining strategy, we introduce a variable-resolution input representation and a more flexible integration of language and vision inputs, where language prompts such as questions are rendered directly on top of the input image. For the first time, we show that a single pretrained model can achieve state-of-the-art results in six out of nine tasks across four domains: documents, illustrations, user interfaces, and natural images

    A Diagram Is Worth A Dozen Images

    Full text link
    Diagrams are common tools for representing complex concepts, relationships and events, often when it would be difficult to portray the same information with natural images. Understanding natural images has been extensively studied in computer vision, while diagram understanding has received little attention. In this paper, we study the problem of diagram interpretation and reasoning, the challenging task of identifying the structure of a diagram and the semantics of its constituents and their relationships. We introduce Diagram Parse Graphs (DPG) as our representation to model the structure of diagrams. We define syntactic parsing of diagrams as learning to infer DPGs for diagrams and study semantic interpretation and reasoning of diagrams in the context of diagram question answering. We devise an LSTM-based method for syntactic parsing of diagrams and introduce a DPG-based attention model for diagram question answering. We compile a new dataset of diagrams with exhaustive annotations of constituents and relationships for over 5,000 diagrams and 15,000 questions and answers. Our results show the significance of our models for syntactic parsing and question answering in diagrams using DPGs

    Generating natural language specifications from UML class diagrams

    Get PDF
    Early phases of software development are known to be problematic, difficult to manage and errors occurring during these phases are expensive to correct. Many systems have been developed to aid the transition from informal Natural Language requirements to semistructured or formal specifications. Furthermore, consistency checking is seen by many software engineers as the solution to reduce the number of errors occurring during the software development life cycle and allow early verification and validation of software systems. However, this is confined to the models developed during analysis and design and fails to include the early Natural Language requirements. This excludes proper user involvement and creates a gap between the original requirements and the updated and modified models and implementations of the system. To improve this process, we propose a system that generates Natural Language specifications from UML class diagrams. We first investigate the variation of the input language used in naming the components of a class diagram based on the study of a large number of examples from the literature and then develop rules for removing ambiguities in the subset of Natural Language used within UML. We use WordNet,a linguistic ontology, to disambiguate the lexical structures of the UML string names and generate semantically sound sentences. Our system is developed in Java and is tested on an independent though academic case study

    The influence of conceptual user models on the creation and interpretation of diagrams representing reactive systems

    Get PDF
    In system design, many diagrams of many different types are used. Diagrams communicate design aspects between members of the development team, and between these experts and the non-expert customers and future users. Mastering the creation of diagrams is often a challenging task, judging by particular errors persistently found in diagrams created by undergraduate computer science students. We assume a possible misalignment between human perception and cognition on the one hand and the diagrams’ structure and syntax on the other. This article presents the results of an investigation of such a misalignment. We focus on the deployment of so-called 'conceptual user models' (mental models, created by users in their mind) at the creation of diagrams. We propose a taxonomy for mental mappings, used for categorization of representations. We describe an experiment where naive and novice subjects created one or several diagrams of a familiar task. We use our taxonomy for analysing these diagrams, both for the represented task structure and the symbols used. The results indeed show a mismatch between mental models and currently used diagram techniques

    What is a logical diagram?

    Get PDF
    Robert Brandom’s expressivism argues that not all semantic content may be made fully explicit. This view connects in interesting ways with recent movements in philosophy of mathematics and logic (e.g. Brown, Shin, Giaquinto) to take diagrams seriously - as more than a mere “heuristic aid” to proof, but either proofs themselves, or irreducible components of such. However what exactly is a diagram in logic? Does this constitute a semiotic natural kind? The paper will argue that such a natural kind does exist in Charles Peirce’s conception of iconic signs, but that fully understood, logical diagrams involve a structured array of normative reasoning practices, as well as just a “picture on a page”

    Layer by layer - Combining Monads

    Full text link
    We develop a method to incrementally construct programming languages. Our approach is categorical: each layer of the language is described as a monad. Our method either (i) concretely builds a distributive law between two monads, i.e. layers of the language, which then provides a monad structure to the composition of layers, or (ii) identifies precisely the algebraic obstacles to the existence of a distributive law and gives a best approximant language. The running example will involve three layers: a basic imperative language enriched first by adding non-determinism and then probabilistic choice. The first extension works seamlessly, but the second encounters an obstacle, which results in a best approximant language structurally very similar to the probabilistic network specification language ProbNetKAT
    corecore