
    Math Search for the Masses: Multimodal Search Interfaces and Appearance-Based Retrieval

    We summarize the math search engines and search interfaces produced by the Document and Pattern Recognition Lab in recent years, in particular the min math search interface and the Tangent search engine. Source code for both systems is publicly available. "The Masses" refers to our emphasis on creating systems for mathematical non-experts, who may be looking for definitions of unfamiliar notation, or who may want to browse documents based on the visual appearance of formulae rather than their mathematical semantics. Comment: Paper for Invited Talk at 2015 Conference on Intelligent Computer Mathematics (July, Washington DC)
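    To make "appearance rather than semantics" concrete, here is a minimal sketch that represents a formula by its visual layout, as (symbol, symbol, relative position) pairs read off a small layout tree, and ranks candidates by the Dice coefficient of shared pairs. The pair idea is in the spirit of symbol-pair indexing, but the tree encoding, the relation names "next", "sup" and "sub", and the functions `symbol_pairs`, `dice` and `rank` are assumptions made for illustration, not the Tangent implementation.

```python
from typing import Dict, Set, Tuple

# A layout node is (symbol, {relation: child}). The relations "next"
# (right neighbour), "sup" and "sub" are assumptions for this sketch.
Node = Tuple[str, Dict[str, "Node"]]
Pair = Tuple[str, str, str]


def symbol_pairs(root: Node) -> Set[Pair]:
    """Collect (ancestor, descendant, relation path) tuples from a layout tree."""
    pairs: Set[Pair] = set()

    def walk(node: Node) -> None:
        symbol, children = node
        # Pair this symbol with every symbol below it, keyed by the path taken.
        stack = [(child, rel) for rel, child in children.items()]
        while stack:
            (below, below_children), path = stack.pop()
            pairs.add((symbol, below, path))
            for rel, grandchild in below_children.items():
                stack.append((grandchild, path + "." + rel))
        for child in children.values():
            walk(child)

    walk(root)
    return pairs


def dice(a: Set[Pair], b: Set[Pair]) -> float:
    """Dice coefficient: 2|A & B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def rank(query: Node, candidates: Dict[str, Node]):
    """Order candidate formulae by how many appearance pairs they share with the query."""
    q = symbol_pairs(query)
    scored = [(dice(q, symbol_pairs(tree)), name) for name, tree in candidates.items()]
    return sorted(scored, reverse=True)


# x^2 + y as the query; x^2 + z and x_2 + y as candidates.
query = ("x", {"sup": ("2", {}), "next": ("+", {"next": ("y", {})})})
candidates = {
    "x^2 + z": ("x", {"sup": ("2", {}), "next": ("+", {"next": ("z", {})})}),
    "x_2 + y": ("x", {"sub": ("2", {}), "next": ("+", {"next": ("y", {})})}),
}
print(rank(query, candidates))  # [(0.75, 'x_2 + y'), (0.5, 'x^2 + z')]
```

    Because only layout is compared, x_2 + y outranks x^2 + z here: moving the 2 from superscript to subscript breaks one pair, while changing the last variable breaks two.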

    Automated recognition of design patterns for framework understanding

    System design is one of the most important tasks in the software development cycle, but it is also one of the most complex and time-consuming. Reuse of existing designs is therefore very important. Object-oriented frameworks are generic designs for specific application domains that enable the reuse of both designs and domain experts' experience. Even so, frameworks are not simple to reuse, because they are difficult to comprehend, mainly due to a lack of good documentation and supporting tools. In this work, we present an approach to framework comprehension based on the automated recognition and visualization of design patterns. A tool was built to support this approach; it tries to automatically identify and explain the potential patterns in a given design. Experimental results and conclusions from using the tool are also presented.
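    As a rough illustration of structural pattern recognition, the sketch below scans a simplified class model for the core shape of the Composite pattern: a class that both inherits from a component type and aggregates a collection of that type, and then produces a one-line explanation. The `ClassInfo` model, the detection rule and the `explain` wording are hypothetical simplifications; the abstract does not say which patterns the tool recognizes or how it matches them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ClassInfo:
    """Hypothetical, simplified view of one class in a framework's design."""
    name: str
    bases: Set[str] = field(default_factory=set)
    # Types the class holds in a collection-valued field (e.g. a list of T).
    aggregates: Set[str] = field(default_factory=set)


def ancestors(name: str, model: Dict[str, ClassInfo]) -> Set[str]:
    """All direct and indirect base classes of `name` reachable in the model."""
    seen: Set[str] = set()
    stack = list(model[name].bases)
    while stack:
        base = stack.pop()
        if base in seen:
            continue
        seen.add(base)
        if base in model:
            stack.extend(model[base].bases)
    return seen


def find_composites(model: Dict[str, ClassInfo]) -> List[Tuple[str, str]]:
    """Report (composite, component) pairs: a class that is-a and has-many component."""
    hits = []
    for cls in model.values():
        for component in cls.aggregates & ancestors(cls.name, model):
            hits.append((cls.name, component))
    return sorted(hits)


def explain(hit: Tuple[str, str]) -> str:
    composite, component = hit
    return (f"{composite} may play the Composite role: it is a {component} "
            f"and contains a collection of {component} children.")


design = {c.name: c for c in [
    ClassInfo("Widget"),
    ClassInfo("Button", {"Widget"}),
    ClassInfo("Panel", {"Widget"}, {"Widget"}),
    ClassInfo("Playlist", set(), {"Song"}),  # aggregates Songs but is not one
]}
for hit in find_composites(design):
    print(explain(hit))
```

    A real recognizer would need more evidence (for example, operations forwarded to the children) to avoid false positives; this check only captures the inheritance-plus-aggregation skeleton.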

    Creating the Perception-based LADDER sketch recognition language

    Sketch recognition is the automated understanding of hand-drawn diagrams. Current sketch recognition systems exist for only a handful of domains, which contain on the order of 10 to 20 shapes. Our goal was to create a generalized method for recognition that could work for many domains, increasing the number of shapes that could be recognized in real time while maintaining high accuracy. In an effort to recognize shapes effectively while allowing drawing freedom (both drawing-style freedom and perceptually valid variations), we created a shape description language modeled after the way people naturally describe shapes, in order to 1) create an intuitive, easy-to-understand description that provides transparency into the underlying recognition process, and 2) improve recognition by providing recognition flexibility (drawing freedom) that is aligned with how humans perceive shapes. This paper describes the results of a study performed to see how users naturally describe shapes. A sample of 35 subjects described or drew approximately 16 shapes each. Results show a common vocabulary related to Gestalt grouping and singularities. Results also show that perception, similarity, and context play an important role in how people describe shapes. This study resulted in a language (LADDER) that allows shape recognizers for any domain to be automatically generated from a single hand-drawn example of each shape. Sketch systems for over 30 different domains have been automatically generated based on this language. The largest domain contained 923 distinct shapes and achieved a recognition accuracy of 83% (and a top-3 accuracy of 87%) on a corpus of over 11,000 sketches; this system recognizes almost two orders of magnitude more shapes than any other existing system. National Science Foundation (U.S.) (grant 0757557); National Science Foundation (U.S.) (grant 0943499)
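    The idea of describing a shape the way people do, as named parts plus perceptual constraints, can be sketched with an arrow. This is loosely modeled on a component-and-constraint style of description; the constraint names, the 5-pixel tolerance, the 80% "equal length" ratio and the brute-force search are all assumptions for illustration, not LADDER's syntax or its recognizer.

```python
import math
from itertools import permutations, product
from typing import Callable, Dict, List, Optional, Tuple

Point = Tuple[float, float]
Line = Tuple[Point, Point]  # (start, end) of a stroke already fitted as a line
Assignment = Dict[str, Line]
Constraint = Callable[[Assignment], bool]

TOLERANCE = 5.0  # pixels; assumed threshold for "touching" endpoints


def coincident(p: Point, q: Point) -> bool:
    return math.dist(p, q) <= TOLERANCE


def length(line: Line) -> float:
    return math.dist(line[0], line[1])


def equal_length(a: Line, b: Line, ratio: float = 0.8) -> bool:
    # Perceptually equal: the shorter line is at least 80% of the longer one.
    la, lb = length(a), length(b)
    return min(la, lb) >= ratio * max(la, lb)


def acute_meet(a: Line, b: Line) -> bool:
    # Both lines leave their start point in roughly the same direction
    # (positive dot product, i.e. an angle under 90 degrees).
    ax, ay = a[1][0] - a[0][0], a[1][1] - a[0][1]
    bx, by = b[1][0] - b[0][0], b[1][1] - b[0][1]
    return ax * bx + ay * by > 0


# Arrow: a shaft and two head strokes, all starting at the tip.
ARROW_COMPONENTS = ["shaft", "head1", "head2"]
ARROW_CONSTRAINTS: List[Constraint] = [
    lambda s: coincident(s["shaft"][0], s["head1"][0]),
    lambda s: coincident(s["shaft"][0], s["head2"][0]),
    lambda s: equal_length(s["head1"], s["head2"]),
    lambda s: length(s["shaft"]) > length(s["head1"]),
    lambda s: acute_meet(s["shaft"], s["head1"]),
    lambda s: acute_meet(s["shaft"], s["head2"]),
]


def recognize(components: List[str], constraints: List[Constraint],
              lines: List[Line]) -> Optional[Assignment]:
    """Try every stroke-to-component mapping; return the first that satisfies all constraints."""
    for chosen in permutations(lines, len(components)):
        # Strokes may be drawn in either direction, so try both endpoint orders.
        for flips in product([False, True], repeat=len(chosen)):
            oriented = [(ln[1], ln[0]) if f else ln for ln, f in zip(chosen, flips)]
            s = dict(zip(components, oriented))
            if all(check(s) for check in constraints):
                return s
    return None


strokes = [((0, 50), (100, 50)), ((85, 40), (100, 50)), ((100, 50), (85, 60))]
match = recognize(ARROW_COMPONENTS, ARROW_CONSTRAINTS, strokes)
print(match["shaft"])  # ((100, 50), (0, 50))
triangle = [((0, 0), (10, 0)), ((10, 0), (5, 8)), ((5, 8), (0, 0))]
print(recognize(ARROW_COMPONENTS, ARROW_CONSTRAINTS, triangle))  # None
```

    The tolerant constraints are what give drawing freedom here: stroke order and direction do not matter. The exhaustive search grows factorially with the number of strokes, so it only illustrates the matching criterion, not a real-time recognizer.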

    Visual Question Answering: A Survey of Methods and Datasets

    Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities. Given an image and a question in natural language, it requires reasoning over visual elements of the image and general knowledge to infer the correct answer. In the first part of this survey, we examine the state of the art by comparing modern approaches to the problem. We classify methods by the mechanism they use to connect the visual and textual modalities. In particular, we examine the common approach of combining convolutional and recurrent neural networks to map images and questions to a common feature space. We also discuss memory-augmented and modular architectures that interface with structured knowledge bases. In the second part of this survey, we review the datasets available for training and evaluating VQA systems. The various datasets contain questions at different levels of complexity, which require different capabilities and types of reasoning. We examine in depth the question/answer pairs from the Visual Genome project, and evaluate the relevance of the structured annotations of images with scene graphs for VQA. Finally, we discuss promising future directions for the field, in particular the connection to structured knowledge bases and the use of natural language processing models. Comment: 25 pages
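    The joint-embedding approach described above (CNN image features and an RNN question encoding mapped into a common feature space) can be sketched in plain Python to show the data flow. Assumptions: the image features stand in for the output of a pretrained CNN and are random here, the question encoder is a minimal Elman-style RNN rather than an LSTM, fusion is an element-wise product (one common choice among several), and all weights are untrained, so the resulting distribution is meaningless beyond its shape.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def encode_question(words: List[Vector], w_x: Matrix, w_h: Matrix, hidden: int) -> Vector:
    # Elman-style RNN: h_t = tanh(W_x x_t + W_h h_{t-1}); the final state encodes the question.
    h = [0.0] * hidden
    for x in words:
        pre = [a + b for a, b in zip(matvec(w_x, x), matvec(w_h, h))]
        h = [math.tanh(p) for p in pre]
    return h


def softmax(scores: Vector) -> Vector:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init_params(img_dim: int, word_dim: int, hidden: int, joint: int,
                n_answers: int, seed: int = 0) -> Dict:
    rng = random.Random(seed)
    return {
        "hidden": hidden,
        "w_x": random_matrix(hidden, word_dim, rng),
        "w_h": random_matrix(hidden, hidden, rng),
        "w_img": random_matrix(joint, img_dim, rng),
        "w_q": random_matrix(joint, hidden, rng),
        "w_out": random_matrix(n_answers, joint, rng),
    }


def answer_distribution(image_features: Vector, words: List[Vector], p: Dict) -> Vector:
    q = encode_question(words, p["w_x"], p["w_h"], p["hidden"])
    # Project both modalities into the common space, then fuse element-wise.
    img = [math.tanh(v) for v in matvec(p["w_img"], image_features)]
    que = [math.tanh(v) for v in matvec(p["w_q"], q)]
    joint = [a * b for a, b in zip(img, que)]
    # Treat answering as classification over a fixed answer vocabulary.
    return softmax(matvec(p["w_out"], joint))


answers = ["yes", "no", "two"]  # hypothetical answer vocabulary
params = init_params(img_dim=8, word_dim=4, hidden=6, joint=5, n_answers=len(answers))
rng = random.Random(1)
image = [rng.random() for _ in range(8)]                    # stand-in for CNN features
question = [[rng.random() for _ in range(4)] for _ in range(5)]  # five word embeddings
probs = answer_distribution(image, question, params)
print(answers[probs.index(max(probs))], probs)
```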

    The Larch Environment - Python programs as visual, interactive literature

    The Larch Environment is designed for the creation of programs that take the form of interactive technical literature. We introduce a novel approach to combined textual and visual programming by allowing visual, interactive objects to be embedded within textual source code, and segments of source code to be further embedded within those objects. We retain the strengths of text-based source code, while enabling visual programming where it is beneficial. Additionally, embedded objects and code provide a simple object-oriented approach to extending the syntax of a language, in a similar fashion to LISP macros. We provide a rapid prototyping and experimentation environment in the form of an active document system which mixes rich text with executable source code. Larch is supported by a simple type-coercion-based presentation protocol that displays normal Java and Python objects in a visual, interactive form. The ability to freely combine objects and source code within one another allows for the construction of rich interactive documents and experimentation with novel programming language extensions.
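    A small Python sketch of what a type-coercion-based presentation protocol can look like: an object either supplies its own presentation through a hook method, or a presenter registered for its type (or any base class, following the MRO) is used, with a plain-text fallback so that every object can be displayed. The `__present__` hook name, the `Pres` class and the `presenter_for` registry are hypothetical stand-ins for this example, not Larch's actual API.

```python
from typing import Any, Callable, Dict


class Pres:
    """Hypothetical stand-in for a visual element: a kind tag plus text content."""

    def __init__(self, kind: str, content: str):
        self.kind = kind
        self.content = content

    def __repr__(self) -> str:
        return f"<{self.kind}: {self.content}>"


_presenters: Dict[type, Callable[[Any], Pres]] = {}


def presenter_for(cls: type):
    """Decorator registering a presenter for instances of `cls` and its subclasses."""
    def register(fn: Callable[[Any], Pres]) -> Callable[[Any], Pres]:
        _presenters[cls] = fn
        return fn
    return register


def coerce(obj: Any) -> Pres:
    """Coerce any object into something displayable."""
    if isinstance(obj, Pres):
        return obj
    # 1. Objects may present themselves via a (hypothetical) __present__ hook.
    hook = getattr(type(obj), "__present__", None)
    if hook is not None:
        return coerce(hook(obj))
    # 2. Otherwise use a registered presenter, most specific type first.
    for cls in type(obj).__mro__:
        if cls in _presenters:
            return _presenters[cls](obj)
    # 3. Fall back to the plain textual representation.
    return Pres("text", repr(obj))


@presenter_for(int)
def _present_int(n: int) -> Pres:
    return Pres("number", str(n))


@presenter_for(list)
def _present_list(items: list) -> Pres:
    # Containers present their elements recursively.
    return Pres("list", " | ".join(repr(coerce(x)) for x in items))


class Colour:
    def __init__(self, r: int, g: int, b: int):
        self.r, self.g, self.b = r, g, b

    def __present__(self) -> Pres:
        return Pres("swatch", f"#{self.r:02x}{self.g:02x}{self.b:02x}")


print(coerce([1, Colour(255, 0, 128), "hi"]))
```

    Because lookup walks the MRO, a presenter registered for a base class also covers user subclasses, while the fallback guarantees that an unfamiliar Python object still renders as text.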