36 research outputs found

    Sketch interpretation using multiscale stochastic models of temporal patterns

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006.Includes bibliographical references (p. 102-114).Sketching is a natural mode of interaction used in a variety of settings. For example, people sketch during early design and brainstorming sessions to guide the thought process; when we communicate certain ideas, we use sketching as an additional modality to convey ideas that can not be put in words. The emergence of hardware such as PDAs and Tablet PCs has enabled capturing freehand sketches, enabling the routine use of sketching as an additional human-computer interaction modality. But despite the availability of pen based information capture hardware, relatively little effort has been put into developing software capable of understanding and reasoning about sketches. To date, most approaches to sketch recognition have treated sketches as images (i.e., static finished products) and have applied vision algorithms for recognition. However, unlike images, sketches are produced incrementally and interactively, one stroke at a time and their processing should take advantage of this. This thesis explores ways of doing sketch recognition by extracting as much information as possible from temporal patterns that appear during sketching.(cont.) We present a sketch recognition framework based on hierarchical statistical models of temporal patterns. We show that in certain domains, stroke orderings used in the course of drawing individual objects contain temporal patterns that can aid recognition. We build on this work to show how sketch recognition systems can use knowledge of both common stroke orderings and common object orderings. We describe a statistical framework based on Dynamic Bayesian Networks that can learn temporal models of object-level and stroke-level patterns for recognition. Our framework supports multi-object strokes, multi-stroke objects, and allows interspersed drawing of objects - relaxing the assumption that objects are drawn one at a time. Our system also supports real-valued feature representations using a numerically stable recognition algorithm. We present recognition results for hand-drawn electronic circuit diagrams. The results show that modeling temporal patterns at multiple scales provides a significant increase in correct recognition rates, with no added computational penalties.by Tevfik Metin Sezgin.Ph.D

    Multi-domain sketch understanding

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004.This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.Includes bibliographical references (p. 121-128).by Christine J. Alvarado.Ph.D

    Pen-based Methods For Recognition and Animation of Handwritten Physics Solutions

    Get PDF
    There has been considerable interest in constructing pen-based intelligent tutoring systems due to the natural interaction metaphor and low cognitive load afforded by pen-based interaction. We believe that pen-based intelligent tutoring systems can be further enhanced by integrating animation techniques. In this work, we explore methods for recognizing and animating sketched physics diagrams. Our methodologies enable an Intelligent Tutoring System (ITS) to understand the scenario and requirements posed by a given problem statement and to couple this knowledge with a computational model of the student\u27s handwritten solution. These pieces of information are used to construct meaningful animations and feedback mechanisms that can highlight errors in student solutions. We have constructed a prototype ITS that can recognize mathematics and diagrams in a handwritten solution and infer implicit relationships among diagram elements, mathematics and annotations such as arrows and dotted lines. We use natural language processing to identify the domain of a given problem, and use this information to select one or more of four domain-specific physics simulators to animate the user\u27s sketched diagram. We enable students to use their answers to guide animation behavior and also describe a novel algorithm for checking recognized student solutions. We provide examples of scenarios that can be modeled using our prototype system and discuss the strengths and weaknesses of our current prototype. Additionally, we present the findings of a user study that aimed to identify animation requirements for physics tutoring systems. We describe a taxonomy for categorizing different types of animations for physics problems and highlight how the taxonomy can be used to define requirements for 50 physics problems chosen from a university textbook. We also present a discussion of 56 handwritten solutions acquired from physics students and describe how suitable animations could be constructed for each of them

    An Algorithmic Walk from Static to Dynamic Graph Clustering

    Get PDF

    Perceptually-based language to simplify sketch recognition user interface development

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007.This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.Includes bibliographical references (p. 473-495).Diagrammatic sketching is a natural modality of human-computer interaction that can be used for a variety of tasks, for example, conceptual design. Sketch recognition systems are currently being developed for many domains. However, they require signal-processing expertise if they are to handle the intricacies of each domain, and they are time-consuming to build. Our goal is to enable user interface designers and domain experts who may not have expertise in sketch recognition to be able to build these sketch systems. We created and implemented a new framework (FLUID - f acilitating user interface development) in which developers can specify a domain description indicating how domain shapes are to be recognized, displayed, and edited. This description is then automatically transformed into a sketch recognition user interface for that domain. LADDER, a language using a perceptual vocabulary based on Gestalt principles, was developed to describe how to recognize, display, and edit domain shapes. A translator and a customizable recognition system (GUILD - a generator of user interfaces using ladder descriptions) are combined with a domain description to automatically create a domain specific recognition system.(cont.) With this new technology, by writing a domain description, developers are able to create a new sketch interface for a domain, greatly reducing the time and expertise for the task Continuing in pursuit of our goal to facilitate UI development, we noted that 1) human generated descriptions contained syntactic and conceptual errors, and that 2) it is more natural for a user to specify a shape by drawing it than by editing text. However, computer generated descriptions from a single drawn example are also flawed, as one cannot express all allowable variations in a single example. In response, we created a modification of the traditional model of active learning in which the system selectively generates its own near-miss examples and uses the human teacher as a source of labels. System generated near-misses offer a number of advantages. Human generated examples are tedious to create and may not expose problems in the current concept. It seems most effective for the near-miss examples to be generated by whichever learning participant (teacher or student) knows better where the deficiencies lie; this will allow the concepts to be more quickly and effectively refined.(cont.) When working in a closed domain such as this one, the computer learner knows exactly which conceptual uncertainties remain, and which hypotheses need to be tested and confirmed. The system uses these labeled examples to automatically build a LADDER shape description, using a modification of the version spaces algorithm that handles interrelated constraints, and which also has the ability to learn negative and disjunctive constraints.by Tracy Anne Hammond.Ph.D

    Scalable software and models for large-scale extracellular recordings

    Get PDF
    The brain represents information about the world through the electrical activity of populations of neurons. By placing an electrode near a neuron that is firing (spiking), it is possible to detect the resulting extracellular action potential (EAP) that is transmitted down an axon to other neurons. In this way, it is possible to monitor the communication of a group of neurons to uncover how they encode and transmit information. As the number of recorded neurons continues to increase, however, so do the data processing and analysis challenges. It is crucial that scalable software and analysis tools are developed and made available to the neuroscience community to keep up with the large amounts of data that are already being gathered. This thesis is composed of three pieces of work which I develop in order to better process and analyze large-scale extracellular recordings. My work spans all stages of extracellular analysis from the processing of raw electrical recordings to the development of statistical models to reveal underlying structure in neural population activity. In the first work, I focus on developing software to improve the comparison and adoption of different computational approaches for spike sorting. When analyzing neural recordings, most researchers are interested in the spiking activity of individual neurons, which must be extracted from the raw electrical traces through a process called spike sorting. Much development has been directed towards improving the performance and automation of spike sorting. This continuous development, while essential, has contributed to an over-saturation of new, incompatible tools that hinders rigorous benchmarking and complicates reproducible analysis. To address these limitations, I develop SpikeInterface, an open-source, Python framework designed to unify preexisting spike sorting technologies into a single toolkit and to facilitate straightforward benchmarking of different approaches. With this framework, I demonstrate that modern, automated spike sorters have low agreement when analyzing the same dataset, i.e. they find different numbers of neurons with different activity profiles; This result holds true for a variety of simulated and real datasets. Also, I demonstrate that utilizing a consensus-based approach to spike sorting, where the outputs of multiple spike sorters are combined, can dramatically reduce the number of falsely detected neurons. In the second work, I focus on developing an unsupervised machine learning approach for determining the source location of individually detected spikes that are recorded by high-density, microelectrode arrays. By localizing the source of individual spikes, my method is able to determine the approximate position of the recorded neuriii ons in relation to the microelectrode array. To allow my model to work with large-scale datasets, I utilize deep neural networks, a family of machine learning algorithms that can be trained to approximate complicated functions in a scalable fashion. I evaluate my method on both simulated and real extracellular datasets, demonstrating that it is more accurate than other commonly used methods. Also, I show that location estimates for individual spikes can be utilized to improve the efficiency and accuracy of spike sorting. After training, my method allows for localization of one million spikes in approximately 37 seconds on a TITAN X GPU, enabling real-time analysis of massive extracellular datasets. In my third and final presented work, I focus on developing an unsupervised machine learning model that can uncover patterns of activity from neural populations associated with a behaviour being performed. Specifically, I introduce Targeted Neural Dynamical Modelling (TNDM), a statistical model that jointly models the neural activity and any external behavioural variables. TNDM decomposes neural dynamics (i.e. temporal activity patterns) into behaviourally relevant and behaviourally irrelevant dynamics; the behaviourally relevant dynamics constitute all activity patterns required to generate the behaviour of interest while behaviourally irrelevant dynamics may be completely unrelated (e.g. other behavioural or brain states), or even related to behaviour execution (e.g. dynamics that are associated with behaviour generally but are not task specific). Again, I implement TNDM using a deep neural network to improve its scalability and expressivity. On synthetic data and on real recordings from the premotor (PMd) and primary motor cortex (M1) of a monkey performing a center-out reaching task, I show that TNDM is able to extract low-dimensional neural dynamics that are highly predictive of behaviour without sacrificing its fit to the neural data

    Between syntax and morphology

    Get PDF
    Synopsis: This volume collects novel contributions to comparative generative linguistics that “rethink” existing approaches to an extensive range of phenomena, domains, and architectural questions in linguistic theory. At the heart of the contributions is the tension between descriptive and explanatory adequacy which has long animated generative linguistics and which continues to grow thanks to the increasing amount and diversity of data available to us. The chapters address research questions in comparative morphosyntax, including the modelling of syntactic categories, relative clauses, and demonstrative systems. Many of these contributions show the influence of research by Ian Roberts and collaborators and give the reader a sense of the lively nature of current discussion of topics in morphosyntax and morphosyntactic variation. This book is complemented by volume I available at https://langsci-press.org/catalog/book/275 and volume III available at https://langsci-press.org/catalog/book/277

    Syntactic architecture and its consequences II

    Get PDF
    This volume collects novel contributions to comparative generative linguistics that “rethink” existing approaches to an extensive range of phenomena, domains, and architectural questions in linguistic theory. At the heart of the contributions is the tension between descriptive and explanatory adequacy which has long animated generative linguistics and which continues to grow thanks to the increasing amount and diversity of data available to us. The chapters address research questions in comparative morphosyntax, including the modelling of syntactic categories, relative clauses, and demonstrative systems. Many of these contributions show the influence of research by Ian Roberts and collaborators and give the reader a sense of the lively nature of current discussion of topics in morphosyntax and morphosyntactic variation. This book is complemented by volume I available at https://langsci-press.org/catalog/book/275 and volume III available at https://langsci-press.org/catalog/book/277
    corecore