Search CORE

AI2D-RST: A multimodal corpus of 1000 primary school science diagrams

Author: Alikhani Malihe
Bateman John A.
Haverinen Jonas
Hiippala Tuomo
Kalliokoski Timo
Logacheva Evanfiya
Orekhova Serafina
Stone Matthew
Tuomainen Aino
Publication date: 20/03/2020

This article introduces AI2D-RST, a multimodal corpus of 1000 English-language diagrams that represent topics in primary school natural sciences, such as food webs, life cycles, moon phases and human physiology. The corpus is based on the Allen Institute for Artificial Intelligence Diagrams (AI2D) dataset, a collection of diagrams with crowdsourced descriptions, which was originally developed to support research on automatic diagram understanding and visual question answering. Building on the segmentation of diagram layouts in AI2D, the AI2D-RST corpus presents a new multi-layer annotation schema that provides a rich description of the diagrams' multimodal structure. Annotated by trained experts, the layers describe (1) the grouping of diagram elements into perceptual units, (2) the connections set up by diagrammatic elements such as arrows and lines, and (3) the discourse relations between diagram elements, which are described using Rhetorical Structure Theory (RST). Each annotation layer in AI2D-RST is represented using a graph. The corpus is freely available for research and teaching. Peer reviewed.
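The abstract states that each of the three annotation layers is represented as a graph. A minimal Python sketch of that idea, assuming one labelled, directed graph per layer over shared element ids, is shown below. The class names, the element ids (`B0`, `T0`, `A0`), the group nodes (`G0`, `G1`) and the relation labels are all hypothetical; they do not reproduce the corpus's actual file format or its relation inventory.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Layer:
    """One annotation layer: labelled, directed edges between node ids (hypothetical structure)."""
    name: str
    edges: list[tuple[str, str, str]] = field(default_factory=list)  # (source, target, label)

    def add(self, source: str, target: str, label: str) -> None:
        self.edges.append((source, target, label))

    def targets(self, node: str, label: str | None = None) -> list[str]:
        """Nodes reached from `node`, optionally filtered by edge label."""
        return [t for s, t, lab in self.edges if s == node and (label is None or lab == label)]


@dataclass
class Diagram:
    """Diagram elements shared by several independent annotation layers."""
    elements: dict[str, str]  # element id -> element type (blob, text, arrow, ...)
    layers: dict[str, Layer] = field(default_factory=dict)

    def layer(self, name: str) -> Layer:
        if name not in self.layers:
            self.layers[name] = Layer(name)
        return self.layers[name]


def example_life_cycle() -> Diagram:
    """Hypothetical two-stage life cycle (egg -> larva); group nodes G0/G1 live only in layers."""
    d = Diagram({"B0": "blob", "T0": "text", "B1": "blob", "T1": "text", "A0": "arrow"})
    grouping = d.layer("grouping")  # (1) perceptual units
    for group, members in {"G0": ["B0", "T0"], "G1": ["B1", "T1"]}.items():
        for m in members:
            grouping.add(group, m, "member")
    d.layer("connectivity").add("G0", "G1", "A0")  # (2) arrow A0 links the two groups
    discourse = d.layer("discourse")  # (3) RST-style relations; names are illustrative
    discourse.add("T0", "B0", "identification")
    discourse.add("T1", "B1", "identification")
    discourse.add("G0", "G1", "sequence")
    return d
```

Keeping the layers as separate graphs over the same element ids means each layer can be queried on its own: for example, `example_life_cycle().layer("grouping").targets("G0")` returns `["B0", "T0"]`.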

arXiv.org e-Print Archive

Helsingin yliopiston digitaalinen arkisto

A topic-oriented syntactic component extraction model for social media

Author: Luo T
Pan R
Xu G
Xu Y
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

Topic-oriented understanding aims to extract information from various language instances that reflects, through statistical analysis, the characteristics or trends of semantic information related to a topic. Syntactic analysis and modeling form the basis of such work. Traditional syntactic formalization approaches, widely used in natural language understanding, cannot simply be applied to text modeling in the context of topic-oriented understanding. In this paper, we review the information extraction mode and summarize its inherent relationship with the "Subject-Predicate" syntactic structure of Aryan languages. We then propose a syntactic element extraction model based on the "topic-description" structure, which contains six kinds of core elements and satisfies the requirements of topic-oriented understanding. This paper also describes the model composition, the theoretical framework of the understanding process, the method for extracting syntactic components, and a prototype system that generates syntax diagrams. The proposed model is evaluated on the Reuters-21578 and SocialCom2009 data sets, and the results show that the recall and precision of syntactic component extraction reach up to 93.9% and 88%, respectively, which further justifies the feasibility of generating syntactic components from word dependencies. © 2012 Springer Science+Business Media
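The abstract reports recall and precision as two separate figures. For readers who want a single summary number, here is a small helper for the standard precision, recall and F1 definitions. Applying it to the two reported maxima assumes they were measured at the same operating point, which the abstract does not state. The function names are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = tp / (tp + fp); recall = tp / (tp + fn); 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Reported maxima from the abstract (precision 88%, recall 93.9%),
# combined under the same-operating-point assumption stated above.
print(round(f1_score(0.88, 0.939), 3))
```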

OPUS - University of Technology Sydney

VBN

Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding

Author: Chang Ming-Wei
Eisenschlos Julian
Hu Hexiang
Joshi Mandar
Khandelwal Urvashi
Lee Kenton
Liu Fangyu
Shaw Peter
Toutanova Kristina
Turc Iulia
Publication date: 07/10/2022

Visually-situated language is ubiquitous: sources range from textbooks with diagrams to web pages with images and tables, to mobile apps with buttons and forms. Perhaps due to this diversity, previous work has typically relied on domain-specific recipes with limited sharing of the underlying data, model architectures, and objectives. We present Pix2Struct, a pretrained image-to-text model for purely visual language understanding, which can be finetuned on tasks containing visually-situated language. Pix2Struct is pretrained by learning to parse masked screenshots of web pages into simplified HTML. The web, with its richness of visual elements cleanly reflected in the HTML structure, provides a large source of pretraining data well suited to the diversity of downstream tasks. Intuitively, this objective subsumes common pretraining signals such as OCR, language modeling, and image captioning. In addition to the novel pretraining strategy, we introduce a variable-resolution input representation and a more flexible integration of language and vision inputs, where language prompts such as questions are rendered directly on top of the input image. For the first time, we show that a single pretrained model can achieve state-of-the-art results in six out of nine tasks across four domains: documents, illustrations, user interfaces, and natural images.
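The abstract mentions a variable-resolution input representation but gives no details. One plausible scheme, assumed here rather than taken from the paper, avoids resizing every input to a fixed square. Instead it rescales each image, preserving its aspect ratio, so that the grid of fixed-size patches fits a sequence budget. The sketch assumes 16 by 16 pixel patches and a budget of 2,048 patches. Both values and the function name `patch_grid` are hypothetical and are not taken from the released implementation.

```python
import math


def patch_grid(height: int, width: int, patch: int = 16, max_patches: int = 2048) -> tuple[int, int]:
    """Return (rows, cols) of fixed-size patches after an aspect-ratio-preserving rescale.

    The scale is chosen so that the unrounded grid has exactly `max_patches` cells;
    flooring (and clamping to at least one row/column) keeps rows * cols <= max_patches.
    The image may be scaled up or down. Patch size and budget are assumed defaults.
    """
    # Solve (s*h/p) * (s*w/p) = max_patches for the scale s.
    scale = math.sqrt(max_patches * (patch / height) * (patch / width))
    rows = max(min(math.floor(scale * height / patch), max_patches), 1)
    cols = max(min(math.floor(scale * width / patch), max_patches), 1)
    return rows, cols


if __name__ == "__main__":
    for h, w in [(1024, 2048), (100, 100), (3000, 400)]:
        rows, cols = patch_grid(h, w)
        print(f"{h}x{w} -> {rows}x{cols} patches ({rows * cols} of 2048)")
```

A wide 1024x2048 screenshot maps to a 32x64 grid rather than being squashed into a square. This is the kind of distortion a variable-resolution input is meant to avoid for documents and user interfaces.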

arXiv.org e-Print Archive

A Diagram Is Worth A Dozen Images

Author: B Alexe
CL Zitnick
F Pedregosa
J von Engelhardt
JRR Uijlings
M Twyman
R Horn
R Koncel-Kedziorski
RK Srihari
RW Ferguson
S Antol
S Hochreiter
SC Zhu
SK Card
Publication date: 23/03/2016

Diagrams are common tools for representing complex concepts, relationships and events, often when it would be difficult to portray the same information with natural images. Understanding natural images has been extensively studied in computer vision, while diagram understanding has received little attention. In this paper, we study the problem of diagram interpretation and reasoning, the challenging task of identifying the structure of a diagram and the semantics of its constituents and their relationships. We introduce Diagram Parse Graphs (DPGs) as our representation to model the structure of diagrams. We define syntactic parsing of diagrams as learning to infer DPGs for diagrams and study semantic interpretation and reasoning of diagrams in the context of diagram question answering. We devise an LSTM-based method for syntactic parsing of diagrams and introduce a DPG-based attention model for diagram question answering. We compile a new dataset of over 5,000 diagrams with exhaustive annotations of constituents and relationships, together with 15,000 questions and answers. Our results show the significance of our models for syntactic parsing and question answering in diagrams using DPGs.

arXiv.org e-Print Archive

Crossref

Generating natural language specifications from UML class diagrams

Author: A Abbott
AV Gervasi
CL Heitmeyer
E Brill
E Goldberg
Farid Meziane
G Booch
HM Harmain
K Walden
L Goldin
L Mich
MD Lubars
Nikos Athanasakis
P Martin-Löf
PPS Chen
Sophia Ananiadou
SW Ambler
W Ahrendt
WC Mann
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2008

Early phases of software development are known to be problematic and difficult to manage, and errors occurring during these phases are expensive to correct. Many systems have been developed to aid the transition from informal Natural Language requirements to semi-structured or formal specifications. Furthermore, many software engineers see consistency checking as the way to reduce the number of errors occurring during the software development life cycle and to allow early verification and validation of software systems. However, consistency checking is confined to the models developed during analysis and design and fails to include the early Natural Language requirements. This excludes proper user involvement and creates a gap between the original requirements and the updated and modified models and implementations of the system. To improve this process, we propose a system that generates Natural Language specifications from UML class diagrams. We first investigate the variation of the input language used in naming the components of a class diagram, based on a study of a large number of examples from the literature, and then develop rules for removing ambiguities in the subset of Natural Language used within UML. We use WordNet, a linguistic ontology, to disambiguate the lexical structures of the UML string names and generate semantically sound sentences. Our system is developed in Java and is tested on an independent, though academic, case study.
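The abstract describes turning the names used in a class diagram into English sentences. Below is a deliberately naive Python sketch of that step. It assumes camelCase, PascalCase or snake_case names and two hand-picked sentence templates. The function names and templates are hypothetical. Unlike the system described, the sketch does no WordNet-based disambiguation, and it is not the authors' Java implementation.

```python
import re


def split_identifier(name: str) -> list[str]:
    """Split camelCase, PascalCase or snake_case names into lower-case words."""
    # Acronym before a capitalised word | (capitalised) lower-case word | acronym | digits
    words = re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", name)
    return [w.lower() for w in words]


def phrase(name: str) -> str:
    return " ".join(split_identifier(name))


def with_article(noun: str) -> str:
    """Naive a/an choice by first letter (a real system would need a lexicon)."""
    first = noun[:1]
    return ("an " if first and first in "aeiou" else "a ") + noun


def describe_class(name: str, attributes: list[str]) -> str:
    """Hypothetical template: '<A class> has <an attribute> and <an attribute>.'"""
    subject = with_article(phrase(name)).capitalize()
    attrs = [with_article(phrase(a)) for a in attributes]
    if not attrs:
        return f"{subject} is a class."
    listed = attrs[0] if len(attrs) == 1 else ", ".join(attrs[:-1]) + " and " + attrs[-1]
    return f"{subject} has {listed}."


def describe_association(source: str, verb: str, target: str) -> str:
    """Hypothetical template: '<A source> <verb> <a target>.'"""
    return f"{with_article(phrase(source)).capitalize()} {phrase(verb)} {with_article(phrase(target))}."


print(describe_class("Customer", ["name", "emailAddress"]))  # A customer has a name and an email address.
print(describe_association("Customer", "places", "Order"))   # A customer places an order.
```

The fixed templates show why the paper's disambiguation step matters. A name such as `order` could be a noun or a verb, and a splitter alone cannot tell which reading the modeller intended.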

CiteSeerX

University of Salford Institutional Repository

Crossref

The University of Manchester - Institutional Repository

The influence of conceptual user models on the creation and interpretation of diagrams representing reactive systems

Author: Tabachneck-Schijf H.J.M.
Verpoorten J.H.
Weg R.L.W. van de
Wieringa R.J.
Publication venue: Open Institute of Knowledge
Publication date: 01/01/2006

In system design, many diagrams of many different types are used. Diagrams communicate design aspects between members of the development team, and between these experts and non-expert customers and future users. Mastering the creation of diagrams is often a challenging task, judging by particular errors persistently found in diagrams created by undergraduate computer science students. We hypothesize a possible misalignment between human perception and cognition on the one hand and the diagrams’ structure and syntax on the other. This article presents the results of an investigation of such a misalignment. We focus on the deployment of so-called 'conceptual user models' (mental models created by users in their minds) in the creation of diagrams. We propose a taxonomy of mental mappings, used for categorizing representations. We describe an experiment in which naive and novice subjects created one or several diagrams of a familiar task. We use our taxonomy to analyse these diagrams, with respect to both the represented task structure and the symbols used. The results indeed show a mismatch between mental models and currently used diagram techniques.

University of Twente Research Information

Utrecht University Repository

What is a logical diagram?

Author: Legg Catherine
Publication date: 01/01/2011

Robert Brandom’s expressivism argues that not all semantic content may be made fully explicit. This view connects in interesting ways with recent movements in philosophy of mathematics and logic (e.g. Brown, Shin, Giaquinto) that take diagrams seriously, treating them as more than a mere “heuristic aid” to proof: either as proofs themselves or as irreducible components of proofs. However, what exactly is a diagram in logic? Does it constitute a semiotic natural kind? The paper will argue that such a natural kind does exist in Charles Peirce’s conception of iconic signs, but that, fully understood, logical diagrams involve a structured array of normative reasoning practices in addition to a mere “picture on a page”.

Deakin Research Online

Research Commons@Waikato

Layer by layer - Combining Monads

Author: A Balan
A Kock
B Jacobs
B Klin
DJ King
E Manes
E Moggi
G Plotkin
J Beck
M Hyland
N Benton
N Foster
N Gautam
S Awodey
S Liang
S Mac Lane
S Milius
T Sato
Publication date: 05/10/2018

We develop a method to incrementally construct programming languages. Our approach is categorical: each layer of the language is described as a monad. Our method either (i) concretely builds a distributive law between two monads, i.e. layers of the language, which then provides a monad structure to the composition of layers, or (ii) identifies precisely the algebraic obstacles to the existence of a distributive law and gives a best approximant language. The running example involves three layers: a basic imperative language enriched first by adding non-determinism and then by adding probabilistic choice. The first extension works seamlessly, but the second encounters an obstacle, which results in a best approximant language structurally very similar to the probabilistic network specification language ProbNetKAT.
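The key step in the abstract is that a distributive law between two monads gives their composite a monad structure. The Python sketch below is a toy instance of that general construction. It does not model the paper's imperative, non-deterministic and probabilistic layers. Here S is nondeterminism, modelled as lists, and T is failure, modelled as an option type. The law sends "nothing" to a one-element list containing nothing, and sends Just(xs) to a list of Justs. The composite join applies the law inside the outer list, flattens the two option layers, and then flattens the two list layers. All names (`Just`, `lam`, `join_st`, `bind_st`, ...) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


# Layer T (hypothetical encoding): possible failure; a value is Just(x) or None ("nothing").
@dataclass(frozen=True)
class Just:
    value: Any


def fmap_t(f: Callable[[Any], Any], m: Just | None) -> Just | None:
    return Just(f(m.value)) if isinstance(m, Just) else None


def join_t(mm: Just | None) -> Just | None:
    # Maybe(Maybe X) -> Maybe X
    return mm.value if isinstance(mm, Just) else None


# Layer S: nondeterminism, modelled by Python lists.
def fmap_s(f: Callable[[Any], Any], xs: list) -> list:
    return [f(x) for x in xs]


def join_s(xss: list[list]) -> list:
    return [x for xs in xss for x in xs]


# Distributive law lam: T(S X) -> S(T X), i.e. Maybe(list) -> list(Maybe).
def lam(m: Just | None) -> list:
    return [Just(x) for x in m.value] if isinstance(m, Just) else [None]


# Composite monad S(T X) = list of Maybe X.
def unit_st(x: Any) -> list:
    return [Just(x)]


def join_st(m: list) -> list:
    # S T S T X --S(lam)--> S S T T X --S S join_t--> S S T X --join_s--> S T X
    return join_s(fmap_s(lambda t: fmap_s(join_t, lam(t)), m))


def bind_st(m: list, k: Callable[[Any], list]) -> list:
    return join_st(fmap_s(lambda t: fmap_t(k, t), m))


def step(x: int) -> list:
    """Example program: the branch x == 2 fails, the others return 10 * x."""
    return [None] if x == 2 else [Just(10 * x)]


print(bind_st([Just(1), Just(2), Just(3)], step))  # [Just(value=10), None, Just(value=30)]
```

Because failure sits inside the list, a failing branch yields `None` without discarding the other branches. This is the same shape as Haskell's `MaybeT` transformer applied to the list monad. Not every pair of layers admits such a law, which is the kind of obstacle the abstract reports for the probabilistic extension.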

arXiv.org e-Print Archive

Crossref

UCL Discovery