Math Search for the Masses: Multimodal Search Interfaces and Appearance-Based Retrieval
We summarize math search engines and search interfaces produced by the
Document and Pattern Recognition Lab in recent years, and in particular the min
math search interface and the Tangent search engine. Source code for both
systems is publicly available. "The Masses" refers to our emphasis on creating
systems for mathematical non-experts, who may be looking to define unfamiliar
notation, or browse documents based on the visual appearance of formulae rather
than their mathematical semantics. Comment: Paper for Invited Talk at 2015 Conference on Intelligent Computer
Mathematics (July, Washington DC)
Automated recognition of design patterns for framework understanding
System design is one of the most important tasks in the software development cycle, but it is also one of the most complex and time-consuming. Thus, reuse of existing designs becomes very important. Object-oriented frameworks are generic designs for specific application domains that enable the reuse of designs and domain expert experience. In spite of this, frameworks are not simple to reuse because they are difficult to comprehend, mainly due to a lack of good documentation and supporting tools. In this work, an approach to framework comprehension based on the automated recognition and visualization of design patterns is presented. A tool was built to support this approach by trying to automatically identify and explain the potential patterns existing in a given design. Experimental results and conclusions of tool utilization are also presented.
Creating the Perception-based LADDER sketch recognition language
Sketch recognition is automated understanding of hand-drawn diagrams. Current sketch recognition systems exist for only a handful of domains, which contain on the order of 10--20 shapes. Our goal was to create a generalized method for recognition that could work for many domains, increasing the number of shapes that could be recognized in real-time, while maintaining a high accuracy. In an effort to effectively recognize shapes while allowing drawing freedom (both drawing-style freedom and perceptually-valid variations), we created a shape description language modeled after the way people naturally describe shapes to 1) create an intuitive and easy to understand description, providing transparency to the underlying recognition process, and 2) improve recognition by providing recognition flexibility (drawing freedom) that is aligned with how humans perceive shapes. This paper describes the results of a study performed to see how users naturally describe shapes. A sample of 35 subjects described or drew approximately 16 shapes each. Results show a common vocabulary related to Gestalt grouping and singularities. Results also show that perception, similarity, and context play an important role in how people describe shapes. This study resulted in a language (LADDER) that allows shape recognizers for any domain to be automatically generated from a single hand-drawn example of each shape. Sketch systems for over 30 different domains have been automatically generated based on this language. The largest domain contained 923 distinct shapes, and achieved a recognition accuracy of 83% (and a top-3 accuracy of 87%) on a corpus of over 11,000 sketches, which recognizes almost two orders of magnitude more shapes than any other existing system. National Science Foundation (U.S.) (grant 0757557); National Science Foundation (U.S.) (grant 0943499)
Visual Question Answering: A Survey of Methods and Datasets
Visual Question Answering (VQA) is a challenging task that has received
increasing attention from both the computer vision and the natural language
processing communities. Given an image and a question in natural language, it
requires reasoning over visual elements of the image and general knowledge to
infer the correct answer. In the first part of this survey, we examine the
state of the art by comparing modern approaches to the problem. We classify
methods by their mechanism to connect the visual and textual modalities. In
particular, we examine the common approach of combining convolutional and
recurrent neural networks to map images and questions to a common feature
space. We also discuss memory-augmented and modular architectures that
interface with structured knowledge bases. In the second part of this survey,
we review the datasets available for training and evaluating VQA systems. The
various datasets contain questions at different levels of complexity, which
require different capabilities and types of reasoning. We examine in depth the
question/answer pairs from the Visual Genome project, and evaluate the
relevance of the structured annotations of images with scene graphs for VQA.
Finally, we discuss promising future directions for the field, in particular
the connection to structured knowledge bases and the use of natural language
processing models. Comment: 25 pages
The Larch Environment - Python programs as visual, interactive literature
'The Larch Environment' is designed for the creation of programs that take the
form of interactive technical literature. We introduce a novel approach to combined
textual and visual programming by allowing visual, interactive objects
to be embedded within textual source code, and segments of source code to be
further embedded within those objects. We retain the strengths of text-based
source code, while enabling visual programming where it is beneficial. Additionally,
embedded objects and code provide a simple object-oriented approach
to extending the syntax of a language, in a similar fashion to LISP macros. We
provide a rapid prototyping and experimentation environment in the form of
an active document system which mixes rich text with executable source code.
Larch is supported by a simple type coercion based presentation protocol that
displays normal Java and Python objects in a visual, interactive form. The
ability to freely combine objects and source code within one another allows for
the construction of rich interactive documents and experimentation with novel
programming language extensions.