
    Arrow R-CNN for handwritten diagram recognition


    Sketch recognition of digital ink diagrams: a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Palmerston North, New Zealand

    Sketch recognition of digital ink diagrams is the process of automatically identifying hand-drawn elements in a diagram. This research focuses on the simultaneous grouping and recognition of shapes in digital ink diagrams. In order to recognise a shape, we need to group the strokes belonging to that shape; however, strokes cannot be grouped until the shape is identified. Therefore, we treat grouping and recognition as a simultaneous task. Our grouping technique uses spatial proximity to hypothesise shape candidates. Many of the hypothesised shape candidates are invalid, so we need a way to reject them. We present a novel rejection technique based on novelty detection. The rejection method uses proximity measures to validate a shape candidate. In addition, we investigate improving the accuracy of the current shape recogniser by adding extra features. We also present a novel connector recognition system that localises connector heads around recognised shapes. We perform a full comparative study on two datasets. The results show that our approach is significantly more accurate in finding shapes, and faster on process diagrams, than the approach of Stahovich et al. (2014), demonstrating the superiority of our approach in terms of computation time and accuracy. Furthermore, we evaluate our system on two public datasets and compare our results with other approaches reported in the literature that have used these datasets. The results show that our approach is more accurate in finding and recognising the shapes in the FC dataset (finding and recognising 91.7% of the shapes) compared to the results reported in the literature.
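The grouping-and-rejection idea can be sketched in a few lines of Python. This is a minimal illustration, not the thesis's actual algorithm: the proximity radius and bounding-box span threshold below are hypothetical stand-ins for its proximity measures.

```python
def stroke_centroid(stroke):
    """Mean (x, y) of a stroke's points."""
    xs, ys = zip(*stroke)
    return sum(xs) / len(xs), sum(ys) / len(ys)

def hypothesise_groups(strokes, radius):
    """Greedily group strokes whose centroid lies within `radius`
    of an existing group's centroid (spatial proximity)."""
    groups = []
    for stroke in strokes:
        cx, cy = stroke_centroid(stroke)
        for group in groups:
            gx, gy = stroke_centroid([p for s in group for p in s])
            if (cx - gx) ** 2 + (cy - gy) ** 2 <= radius ** 2:
                group.append(stroke)
                break
        else:
            groups.append([stroke])
    return groups

def reject_candidates(groups, max_span):
    """Rejection step: discard candidates whose bounding-box diagonal
    exceeds `max_span` (a toy stand-in for a proximity measure)."""
    valid = []
    for group in groups:
        pts = [p for s in group for p in s]
        xs, ys = zip(*pts)
        span = ((max(xs) - min(xs)) ** 2 + (max(ys) - min(ys)) ** 2) ** 0.5
        if span <= max_span:
            valid.append(group)
    return valid
```

Shrinking `max_span` rejects candidates whose strokes are spread too far apart; a trained novelty detector would replace this fixed threshold.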

    Phrasing Bimanual Interaction for Visual Design

    Architects and other visual thinkers create external representations of their ideas to support early-stage design. They compose visual imagery with sketching to form abstract diagrams as representations. When working with digital media, they apply various visual operations to transform representations, often engaging in complex sequences. This research investigates how to build interactive capabilities to support designers in putting together, that is phrasing, sequences of operations using both hands. In particular, we examine how phrasing interactions with pen and multi-touch input can support modal switching among different visual operations that in many commercial design tools require using menus and tool palettes—techniques originally designed for the mouse, not pen and touch. We develop an interactive bimanual pen+touch diagramming environment and study its use in landscape architecture design studio education. We observe interesting forms of interaction that emerge, and how our bimanual interaction techniques support visual design processes. Based on the needs of architects, we develop LayerFish, a new bimanual technique for layering overlapping content. We conduct a controlled experiment to evaluate its efficacy. We explore the use of wearables to identify which user, and distinguish which hand, is touching, to support phrasing together direct-touch interactions on large displays. From design and development of the environment and both field and controlled studies, we derive a set of methods, based upon human bimanual specialization theory, for phrasing modal operations through bimanual interactions without menus or tool palettes.

    Recognizing hand-drawn diagrams in images

    Diagrams are an essential tool in any organization. They are used to create conceptual models of anything ranging from business processes to software architectures. Despite the abundance of diagram modeling tools available, the creation of conceptual models often starts by sketching on a whiteboard or paper. However, starting with a hand-drawn diagram introduces the need to eventually digitize it, so that it can be further edited in modeling tools. To reduce the effort associated with the manual digitization of diagrams, research in hand-drawn diagram recognition aims to automate this task. While there is a large body of methods for recognizing diagrams drawn on tablets, there is a notable gap for recognizing diagrams sketched on paper or whiteboard. To close this research gap, this doctoral thesis addresses the problem of recognizing hand-drawn diagrams in images. In particular, it provides the following five main contributions. First, we collect and publish a dataset of business process diagrams sketched on paper. Given that the dataset originates from conceptual modeling tasks solved by 107 participants, it has a high degree of diversity, as reflected in various drawing styles, paper types, pens, and image-capturing methods. Second, we provide an overview of the challenges in recognizing conceptual diagrams sketched on paper. We find that conceptual modeling leads to diagrams with chaotic layouts, making the recognition of edges and labels especially challenging. Third, we propose an end-to-end system for recognizing diagrams modeled with BPMN, the standard language for modeling business processes. Given an image of a hand-drawn BPMN diagram, our system produces a BPMN XML file that can be imported into process modeling tools. The system consists of an object detection neural network, which we extend with network components for recognizing edges and labels. The following two contributions are related to these components. 
Fourth, we present several deep learning methods for edge recognition, which recognize the drawn path and connected shapes of each arrow. Last, we describe a label recognition method that consists of three steps, one of which features a network that predicts whether a label belongs to a specific shape or edge. To demonstrate the performance of the proposed methods, we evaluate them on both our collected dataset and existing diagram datasets.
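As a rough illustration of the final serialisation step, recognised shapes and arrows can be written out as a minimal BPMN XML process using only the standard library. The element kinds and ids here are hypothetical, and the actual system's output is richer (e.g. it also carries labels and layout), but the namespace is the one BPMN 2.0 defines.

```python
import xml.etree.ElementTree as ET

BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"

def shapes_to_bpmn(shapes, flows):
    """Serialise recognised diagram elements into minimal BPMN XML.
    `shapes` maps element ids to BPMN kinds ('startEvent', 'task',
    'endEvent'); `flows` lists (source_id, target_id) arrows."""
    ET.register_namespace("", BPMN_NS)
    defs = ET.Element(f"{{{BPMN_NS}}}definitions")
    proc = ET.SubElement(defs, f"{{{BPMN_NS}}}process", id="proc_1")
    for sid, kind in shapes.items():
        ET.SubElement(proc, f"{{{BPMN_NS}}}{kind}", id=sid)
    for i, (src, tgt) in enumerate(flows):
        ET.SubElement(proc, f"{{{BPMN_NS}}}sequenceFlow",
                      id=f"flow_{i}", sourceRef=src, targetRef=tgt)
    return ET.tostring(defs, encoding="unicode")
```

A file produced this way is structurally importable by process modeling tools, though real imports usually also expect the BPMN DI (diagram interchange) section for layout.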

    User-directed sketch interpretation

    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004. Includes bibliographical references (p. 91-92). I present a novel approach to creating structured diagrams (such as flow charts and object diagrams) by combining an off-line sketch recognition system with the user interface of a traditional structured graphics editor. The system, called UDSI (user-directed sketch interpretation), aims to provide drawing freedom by allowing the user to sketch entirely off-line using a pure pen-and-paper interface. The results of the drawing can then be presented to UDSI, which recognizes shapes, lines, and text areas that the user can then polish as desired. The system can infer multiple interpretations for a given sketch, to aid during the user's polishing stage. The UDSI program offers three novel features. First, it implements a greedy algorithm for determining alternative interpretations of the user's original pen drawing. Second, it introduces a user interface for selecting from these multiple candidate interpretations. Third, it implements a circle recognizer using a novel circle-detection algorithm and combines it with other hand-coded recognizers to provide a robust sketch recognition system. By Matthew J. Notowidigdo. M.Eng.
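A circle recognizer of the kind mentioned can be approximated with a simple heuristic (a generic stand-in, not the thesis's novel algorithm): take the centroid of the pen points as the candidate centre and accept when the distances from the points to it are nearly constant.

```python
from math import hypot

def is_circle(points, tol=0.2):
    """Heuristic circle test: accept when the coefficient of
    variation of point-to-centroid distances is below `tol`."""
    xs, ys = zip(*points)
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
    dists = [hypot(x - cx, y - cy) for x, y in points]
    mean = sum(dists) / len(dists)
    if mean == 0:
        return False
    var = sum((d - mean) ** 2 for d in dists) / len(dists)
    return (var ** 0.5) / mean < tol
```

Points sampled along a circle give a near-zero coefficient of variation, while a straight stroke's distances to its centroid vary widely and are rejected.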

    Rethinking Pen Input Interaction: Enabling Freehand Sketching Through Improved Primitive Recognition

    Online sketch recognition uses machine learning and artificial intelligence techniques to interpret markings made by users via an electronic stylus or pen. The goal of sketch recognition is to understand the intention and meaning of a particular user's drawing. Diagramming applications have been the primary beneficiaries of sketch recognition technology, as it is commonplace for the users of these tools to first create a rough sketch of a diagram on paper before translating it into a machine-understandable model, using computer-aided design tools, which can then be used to perform simulations or other meaningful tasks. Traditional methods for performing sketch recognition can be broken down into three distinct categories: appearance-based, gesture-based, and geometric-based. Although each approach has its advantages and disadvantages, geometric-based methods have proven to be the most generalizable for multi-domain recognition. Tools such as the LADDER symbol description language have been shown to be capable of recognizing sketches from over 30 different domains using generalizable, geometric techniques. The LADDER system is limited, however, by the fact that it uses a low-level recognizer that supports only a few primitive shapes, the building blocks for describing higher-level symbols. Systems which support a larger number of primitive shapes have been shown to have questionable accuracies as the number of primitives increases, or they place constraints on how users must input shapes (e.g. circles can only be drawn in a clockwise motion; rectangles must be drawn starting at the top-left corner). This dissertation significantly expands the possibilities of free-sketch recognition systems, those which place little to no drawing constraints on users. In this dissertation, we describe multiple techniques to recognize upwards of 18 primitive shapes while maintaining high accuracy. We also provide methods for producing confidence values and generating multiple interpretations, and explore the difficulties of recognizing multi-stroke primitives. In addition, we show the need for a standardized data repository for sketch recognition algorithm testing and propose SOUSA (sketch-based online user study application), our online system for performing and sharing user study sketch data. Finally, we show how the principles we have learned through our work extend to other domains, including activity recognition using trained hand posture cues.

    Automatic interpretation of clock drawings for computerised assessment of dementia

    The clock drawing test (CDT) is a standard neurological test for detection of cognitive impairment. A computerised version of the test has potential to improve test accessibility and accuracy. CDT sketch interpretation is one of the first stages in the analysis of the computerised test. It produces a set of recognised digits and symbols together with their positions on the clock face. Subsequently, these are used in the test scoring. This is a challenging problem because the average CDT taker has a high likelihood of cognitive impairment, and writing is one of the first functional activities to be affected. Current interpretation systems perform less well on this kind of data due to its unintelligibility. In this thesis, a novel automatic interpretation system for CDT sketches is proposed and developed. The proposed interpretation system and all the related algorithms developed in this thesis are evaluated using a CDT dataset collected for this study. The data consist of two sets, the first consisting of 65 drawings made by healthy people, and the second consisting of 100 drawings reproduced from drawings of dementia patients. This thesis has four main contributions. The first is a conceptual model of the proposed CDT sketch interpretation system based on integrating prior knowledge of the expected CDT sketch structure and human reasoning into the drawing interpretation system. The second is a novel CDT sketch segmentation algorithm based on supervised machine learning and a new set of temporal and spatial features automatically extracted from the CDT data. The evaluation of the proposed method shows that it outperforms the current state-of-the-art method for CDT drawing segmentation. The third contribution is a new handwritten digit recognition algorithm based on a set of static and dynamic features extracted from handwritten data. The algorithm combines two classifiers, a fuzzy k-nearest-neighbours classifier and a Convolutional Neural Network (CNN), to take advantage of both static and dynamic data representations. The proposed digit recognition algorithm is shown to outperform each classifier individually in terms of recognition accuracy. The final contribution of this study is the probabilistic Situational Bayesian Network (SBN), which is a new hierarchical probabilistic model for addressing the problem of fusing diverse data sources, such as CDT sketches created by healthy volunteers and dementia patients, in a probabilistic Bayesian network. The evaluation of the proposed SBN-based CDT sketch interpretation system on CDT data shows highly promising results, with 100% recognition accuracy for healthy CDT drawings and 97.15% for dementia data. To conclude, the proposed automatic CDT sketch interpretation system shows high accuracy in terms of recognising different sketch objects and thus paves the way for further research in dementia and clinical computer-assisted diagnosis of dementia.
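One common way to combine two such classifiers is late fusion of their per-class scores. The weighted average below is a generic sketch under that assumption, not the exact combination rule used in the thesis, and the class labels are illustrative.

```python
def fuse_scores(knn_scores, cnn_scores, weight=0.5):
    """Late fusion: weighted average of per-class scores from a
    fuzzy k-NN (static features) and a CNN (dynamic features),
    returning the highest-scoring class label."""
    assert knn_scores.keys() == cnn_scores.keys()
    fused = {c: weight * knn_scores[c] + (1 - weight) * cnn_scores[c]
             for c in knn_scores}
    return max(fused, key=fused.get)
```

Tuning `weight` on a validation set decides how much to trust the static-feature classifier relative to the dynamic one.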

    A Formal Framework for Strategic Representations and Conceptual Reorganization

    In this paper, we introduce a formal language for modeling the structure of strategic representations and operations that conceptualize change on the basis of them. Strategic representations are lower-dimensional representations of the world that underlie the understanding of what business environments are, how they may change, and attempts to shape them. We start by discussing known strategic representations like Porter's five forces model or the strategy canvas. We elicit the conceptual structure underlying these representations by capturing them in our formal language. We demonstrate that our formal language can express operations of conceptual change of strategies such as stretching (the extension of value ranges), lifting (deleting dimensions), extending (adding dimensions), amalgamation (enabling new combinations of features by amalgamating different domains), and transferring structure (exploring analogies). These operations can be the basis for strategizing: for seeing possible reorganizations of strategies and even becoming aware of new opportunities. We apply these operations to explain classical business cases, including a detailed study of the conceptual structure underlying Steve Jobs' digital hub concept. Our formal language is, to our knowledge, the first attempt to capture the variety of conceptual operations underlying strategic change using one comprehensive model.
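The flavour of three of these operations can be illustrated with a toy encoding, where a strategic representation is a mapping from dimensions to value ranges. The dimension names and values below are invented for illustration; the paper's formal language is far richer.

```python
def lift(rep, dim):
    """Lifting: delete a dimension from the representation."""
    return {d: v for d, v in rep.items() if d != dim}

def extend(rep, dim, values):
    """Extending: add a new dimension with its value range."""
    return {**rep, dim: set(values)}

def stretch(rep, dim, extra):
    """Stretching: widen an existing dimension's value range."""
    return {**rep, dim: rep[dim] | set(extra)}
```

Each operation returns a new representation, so sequences of conceptual changes can be explored without mutating the original.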