
    DS-UCAT: A multimodal and multilingual dialogue system for an educational environment

    Actas de las IV Jornadas de Tecnología del Habla (JTH 2006). In this article we present a multimodal and multilingual dialogue system that we are developing to provide assistance to students and teachers in some of their usual activities in an educational environment, e.g. a university faculty. We plan for the system, in addition to interacting with the user, to interact with the environment in which the user is located at any given moment, which may change over the course of an interaction as the user moves around the educational centre. The article describes the system architecture, shows how interaction with the current version is carried out, and discusses how we plan to use ambient intelligence techniques to improve its performance. This work has been funded by the Ministerio de Ciencia y Tecnología through project TIN2004-03140 Ubiquitous Collaborative Training.

    Mixing Modalities of 3D Sketching and Speech for Interactive Model Retrieval in Virtual Reality

    Sketch and speech are intuitive interaction methods that convey complementary information and have been independently used for 3D model retrieval in virtual environments. While sketch has been shown to be an effective retrieval method, not all collections are easily navigable using this modality alone. To overcome this, we implement a multimodal interface for querying 3D model databases within a virtual environment. We design a new challenging database for sketch retrieval comprised of 3D chairs where each of the components (arms, legs, seat, back) is independently coloured. We base the sketch interface on the state of the art in 3D sketch retrieval, and use a Wizard-of-Oz style experiment to process the voice input. In this way, we avoid the complexities of natural language processing, which frequently requires fine-tuning to be robust. We conduct two user studies and show that hybrid search strategies emerge from the combination of interactions, fostering the advantages provided by both modalities.
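
    The hybrid search strategy described above can be pictured as a re-ranking step: sketch similarity supplies a base score, and terms taken from the spoken query (standing in here for the Wizard-of-Oz operator) boost models whose attributes match. The following Python sketch is only illustrative; the model names, attribute tags, and weighting are assumptions, not the paper's implementation.

        # Hypothetical sketch of a hybrid sketch+speech retrieval loop.
        # Names and scoring are illustrative; the paper's actual pipeline
        # (3D sketch descriptors, Wizard-of-Oz speech handling) is not reproduced here.

        from dataclasses import dataclass, field

        @dataclass
        class ChairModel:
            name: str
            sketch_score: float  # similarity to the user's 3D sketch (assumed precomputed)
            attributes: set = field(default_factory=set)  # e.g. {"red arms", "metal legs"}

        def hybrid_rank(models, spoken_terms, boost=0.5):
            """Re-rank sketch results using attribute terms extracted from speech."""
            def score(m):
                matches = sum(1 for t in spoken_terms if t in m.attributes)
                return m.sketch_score + boost * matches
            return sorted(models, key=score, reverse=True)

        if __name__ == "__main__":
            db = [
                ChairModel("chair_01", 0.82, {"red arms", "wooden legs"}),
                ChairModel("chair_02", 0.85, {"blue seat", "metal legs"}),
                ChairModel("chair_03", 0.70, {"red arms", "red back"}),
            ]
            # Spoken query (as a wizard might transcribe it): "the one with red arms"
            for m in hybrid_rank(db, spoken_terms={"red arms"}):
                print(m.name)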

    Learning to describe multimodally from parallel unimodal data? A pilot study on verbal and sketched object descriptions

    Han T, Zarrieß S, Komatani K, Schlangen D. Learning to describe multimodally from parallel unimodal data? A pilot study on verbal and sketched object descriptions. In: Proceedings of the 22nd Workshop on the Semantics and Pragmatics of Dialogue (AixDial). 2018.

    Enabling collaboration in the sketching domain

    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. Includes bibliographical references (leaf 47). Sketching, although deceptively simple and seemingly primitive, is a powerful paradigm for designing and understanding many types of engineering systems. Many problem domains, such as designing electrical circuits, developing flow charts, and modeling simple mechanical devices, rely heavily on the ability to produce sketches efficiently in order to bring out the most salient features. Engineers working in these domains usually rely on pen and paper to generate their design sketches. They do this because more advanced technologies (such as notebook computers) are often unavailable, hard to learn, or cumbersome. It is important for engineers to collaborate with their colleagues while working on their sketches. Unfortunately, collaboration on sketches that exist only as pen and paper often proves to be tedious, requiring a minimum of a fax machine and scanner. Engineers could benefit from a more efficient means of collaboration when dealing with pen and paper sketches. The technology exists to improve the current situation and make pen and paper sketches a more effective medium for collaborative design. This thesis presents an implementation of a system that achieves three goals. First, the system allows two users to collaborate on the production of a sketch in much the same way they would collaborate when composing a document (with one user composing a sketch, then accepting or rejecting the changes of his collaborator). Second, it allows users to watch a collaborator's additions play in real time, like watching a movie. And finally, it links the sketch recognition and simulation software developed by the Design Rationale Group at MIT with a simple pen and paper interface, allowing engineers to run simulations of their design sketches. These goals are achieved by using a commercial pen produced by the Anoto Group that is capable of storing the strokes it draws. In essence, the user creates both a hard and soft copy of the sketch simultaneously, and can share the soft copy with any collaborator. Using this model of production, sketches can be collaboratively generated, edited, and reviewed quickly and easily, all using only a pen, paper, and a standard printer. By Jesse Michael Smithnosky. M.Eng.
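
    The accept/reject collaboration model described above can be sketched as a shared stroke store: a collaborator proposes timestamped strokes, and the sketch owner reviews each proposal, with timestamps preserving the order needed for real-time replay. The Python below is a minimal illustration under those assumptions; the thesis's actual Anoto-pen pipeline is not reproduced.

        # Illustrative model of accept/reject collaboration on a stroke-based sketch.
        # The data layout and API are assumptions, not the thesis's implementation.

        from dataclasses import dataclass
        from typing import List, Tuple

        @dataclass
        class Stroke:
            points: List[Tuple[float, float]]
            author: str
            timestamp: float   # lets additions be replayed in drawing order

        class SharedSketch:
            def __init__(self):
                self.accepted: List[Stroke] = []
                self.pending: List[Stroke] = []

            def propose(self, stroke: Stroke) -> None:
                self.pending.append(stroke)

            def review(self, accept) -> None:
                """accept(stroke) -> bool; the owner keeps or discards each proposal."""
                for s in sorted(self.pending, key=lambda s: s.timestamp):
                    if accept(s):
                        self.accepted.append(s)
                self.pending.clear()

        sketch = SharedSketch()
        sketch.propose(Stroke([(0, 0), (1, 1)], author="collaborator", timestamp=3.2))
        sketch.review(accept=lambda s: s.author == "collaborator")
        print(len(sketch.accepted))  # 1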

    Speech-controlled animation system

    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004. Includes bibliographical references (p. 66-68). In order to make the task of animation creation easy, a different approach should be presented to the user. This thesis describes a new approach to animation creation. In this approach, the representation of animation is based on changes in state and not descriptions of the state. Users issue commands, which change the state of objects, to quickly and easily create new and interesting animations. The implemented system, called COMMANIMATION, also includes speech recognition technology, which allows for a more natural user interface. In addition to exploring animation, COMMANIMATION, a system that runs over three networked computers, provides a platform to study issues in pervasive computing, such as the use of multimodal inputs, error control, and error detection. By Nancy Ellen Kho. M.Eng.
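
    The core idea of representing an animation as changes of state rather than descriptions of state can be illustrated with a short command script: each spoken command maps to a function that mutates object state, and replaying the script produces the animation. The commands and object below are invented for illustration and are not COMMANIMATION's actual vocabulary.

        # Minimal illustration of representing an animation as a list of
        # state-changing commands rather than keyframed state.

        class Ball:
            def __init__(self):
                self.state = {"x": 0, "y": 0, "visible": True}

        def move(obj, dx=0, dy=0):
            obj.state["x"] += dx
            obj.state["y"] += dy

        def hide(obj):
            obj.state["visible"] = False

        # An utterance such as "move the ball right, then hide it" would be
        # translated (by the speech front end) into commands like these:
        script = [(move, {"dx": 5}), (move, {"dy": -2}), (hide, {})]

        ball = Ball()
        for command, args in script:   # replaying the script *is* the animation
            command(ball, **args)
            print(ball.state)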

    Application of Machine Learning within Visual Content Production

    We are living in an era where digital content is being produced at a dazzling pace. The heterogeneity of contents and contexts is so varied that numerous applications have been created to respond to people's and the market's demands. The visual content production pipeline is the generalisation of the process that allows a content editor to create and evaluate their product, such as a video, an image, or a 3D model. Such data is then displayed on one or more devices such as TVs, PC monitors, virtual reality head-mounted displays, tablets, mobiles, or even smartwatches. Content creation can be as simple as clicking a button to film a video and then sharing it on a social network, or as complex as managing a dense user interface full of parameters with keyboard and mouse to generate a realistic 3D model for a VR game. In this second example, such sophistication results in a steep learning curve for beginner-level users. In contrast, expert users regularly need to refine their skills via expensive lessons, time-consuming tutorials, or experience. Thus, user interaction plays an essential role in the diffusion of content creation software, primarily when it is targeted at untrained people. In particular, with the fast spread of virtual reality devices into the consumer market, new opportunities for designing reliable and intuitive interfaces have been created. Such new interactions need to take a step beyond the point-and-click interaction typical of the 2D desktop environment. The interactions need to be smart, intuitive and reliable, able to interpret 3D gestures, and therefore more accurate algorithms are needed to recognise patterns. In recent years, machine learning, and in particular deep learning, has achieved outstanding results in many branches of computer science, such as computer graphics and human-computer interfaces, outperforming algorithms that were considered state of the art; however, there have been only fleeting efforts to translate this into virtual reality. In this thesis, we seek to apply and take advantage of deep learning models in two areas of the content production pipeline, embracing the following subjects of interest: advanced methods for user interaction and visual quality assessment. First, we focus on 3D sketching to retrieve models from an extensive database of complex geometries and textures, while the user is immersed in a virtual environment. We explore both 2D and 3D strokes as tools for model retrieval in VR. Therefore, we implement a novel system for improving accuracy in searching for a 3D model. We contribute an efficient method to describe models through 3D sketches via iterative descriptor generation, focusing both on accuracy and user experience. To evaluate it, we design a user study to compare different interactions for sketch generation. Second, we explore the combination of sketch input and vocal description to correct and fine-tune the search for 3D models in a database containing fine-grained variation. We analyse sketch and speech queries, identifying a way to incorporate both of them into our system's interaction loop. Third, in the context of the visual content production pipeline, we present a detailed study of visual metrics. We propose a novel method for detecting rendering-based artefacts in images. It exploits deep learning algorithms analogous to those used when extracting features from sketches.
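
    The descriptor-based retrieval step mentioned in the thesis can be pictured as nearest-neighbour search over fixed-length embeddings of the query sketch and the database models. The Python sketch below assumes the embeddings already exist (they are random here purely for illustration) and shows only the ranking step.

        # Hedged sketch of descriptor-based retrieval: both the query sketch and the
        # database models are assumed to have been embedded (by some learned network)
        # into fixed-length vectors; retrieval is then nearest-neighbour search.

        import numpy as np

        rng = np.random.default_rng(0)
        database = rng.normal(size=(1000, 128))   # 1000 models, 128-d descriptors
        query = rng.normal(size=128)              # descriptor of the user's sketch

        def top_k(query, database, k=5):
            q = query / np.linalg.norm(query)
            d = database / np.linalg.norm(database, axis=1, keepdims=True)
            scores = d @ q                        # cosine similarity
            return np.argsort(-scores)[:k]

        print(top_k(query, database))             # indices of the best matches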

    Multimodal Interactive DialOgue System

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (p. 239-243). Interactions between people are typically conversational, multimodal, and symmetric. In conversational interactions, information flows in both directions. In multimodal interactions, people use multiple channels. In symmetric interactions, both participants communicate multimodally, with the integration of and switching between modalities basically effortless. In contrast, consider typical human-computer interaction. It is almost always unidirectional: we're telling the machine what to do; it's almost always unimodal (can you type and use the mouse simultaneously?); and it's symmetric only in the disappointing sense that when you type, it types back at you. There are a variety of things wrong with this picture. Perhaps chief among them is that if communication is unidirectional, it must be complete and unambiguous, exhaustively anticipating every detail and every misinterpretation. In brief, it's exhausting. This thesis examines the benefits of creating multimodal human-computer dialogues that employ sketching and speech, aimed initially at the task of describing early stage designs of simple mechanical devices. The goal of the system is to be a collaborative partner, facilitating design conversations. Two initial user studies provided key insights into multimodal communication: simple questions are powerful, color choices are deliberate, and modalities are closely coordinated. These observations formed the basis for our multimodal interactive dialogue system, or Midos. Midos makes possible a dynamic dialogue, i.e., one in which it asks questions to resolve uncertainties or ambiguities. The benefits of a dialogue in reducing the cognitive overhead of communication have long been known. We show here that having the system able to ask questions is good, but for an unstructured task like describing a design, knowing what questions to ask is crucial. We describe an architecture that enables the system to accept partial information from the user, then request details it considers relevant, noticeably lowering the cognitive overhead of communicating. The multimodal questions Midos asks are in addition purposefully designed to use the same multimodal integration pattern that people exhibited in our study. Our evaluation of the system showed that Midos successfully engages the user in a dialogue and produces the same conversational features as our initial human-human conversation studies. By Aaron Daniel Adler. Ph.D.
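
    The architecture described, which accepts partial information and then asks only about what remains unresolved, can be caricatured as a slot-filling loop: known details are taken as given, and a question is generated for each missing slot. The slots and questions below are invented examples, not Midos's actual representation of mechanical-device descriptions.

        # Toy dialogue loop in the spirit of "accept partial information, then ask
        # about what is still ambiguous". Slot names and questions are assumptions.

        REQUIRED = {
            "body": "What is the main body of the device?",
            "connection": "How are the two parts connected?",
            "motion": "Which part moves, and in what direction?",
        }

        def dialogue(initial_description: dict) -> dict:
            known = dict(initial_description)
            for slot, question in REQUIRED.items():
                if slot not in known:              # only ask about genuine gaps
                    print("SYSTEM:", question)
                    known[slot] = input("USER: ")  # stands in for speech + sketch input
            return known

        if __name__ == "__main__":
            # The user starts with a partial, multimodal description:
            print(dialogue({"body": "a pendulum on a pivot"}))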

    Study and integration of a dynamic dialogue system in an intelligent environment

    Unpublished doctoral thesis read at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática. Date of defence: 6-05-200

    Perceptually-based language to simplify sketch recognition user interface development

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 473-495). Diagrammatic sketching is a natural modality of human-computer interaction that can be used for a variety of tasks, for example, conceptual design. Sketch recognition systems are currently being developed for many domains. However, they require signal-processing expertise if they are to handle the intricacies of each domain, and they are time-consuming to build. Our goal is to enable user interface designers and domain experts who may not have expertise in sketch recognition to be able to build these sketch systems. We created and implemented a new framework (FLUID - facilitating user interface development) in which developers can specify a domain description indicating how domain shapes are to be recognized, displayed, and edited. This description is then automatically transformed into a sketch recognition user interface for that domain. LADDER, a language using a perceptual vocabulary based on Gestalt principles, was developed to describe how to recognize, display, and edit domain shapes. A translator and a customizable recognition system (GUILD - a generator of user interfaces using ladder descriptions) are combined with a domain description to automatically create a domain specific recognition system. With this new technology, by writing a domain description, developers are able to create a new sketch interface for a domain, greatly reducing the time and expertise for the task. Continuing in pursuit of our goal to facilitate UI development, we noted that 1) human generated descriptions contained syntactic and conceptual errors, and that 2) it is more natural for a user to specify a shape by drawing it than by editing text. However, computer generated descriptions from a single drawn example are also flawed, as one cannot express all allowable variations in a single example. In response, we created a modification of the traditional model of active learning in which the system selectively generates its own near-miss examples and uses the human teacher as a source of labels. System generated near-misses offer a number of advantages. Human generated examples are tedious to create and may not expose problems in the current concept. It seems most effective for the near-miss examples to be generated by whichever learning participant (teacher or student) knows better where the deficiencies lie; this will allow the concepts to be more quickly and effectively refined. When working in a closed domain such as this one, the computer learner knows exactly which conceptual uncertainties remain, and which hypotheses need to be tested and confirmed. The system uses these labeled examples to automatically build a LADDER shape description, using a modification of the version spaces algorithm that handles interrelated constraints, and which also has the ability to learn negative and disjunctive constraints. By Tracy Anne Hammond. Ph.D.
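
    The declarative idea behind a LADDER-style domain description can be illustrated by defining a shape as named components plus perceptual constraints, with a generic checker testing candidate strokes against the declaration. The Python below is not LADDER syntax; the constraint predicates and the arrow example are simplified assumptions.

        # Illustrative only (not actual LADDER syntax): a shape is declared as
        # components plus perceptual constraints, and a generic recogniser checks
        # candidates against the declaration.

        import math

        def length(line):
            (x1, y1), (x2, y2) = line
            return math.hypot(x2 - x1, y2 - y1)

        def coincident(a, b, tol=5.0):
            return math.dist(a, b) <= tol

        ARROW = {
            "components": ["shaft", "head1", "head2"],
            "constraints": [
                lambda s: length(s["shaft"]) > 2 * length(s["head1"]),
                lambda s: coincident(s["shaft"][1], s["head1"][0]),
                lambda s: coincident(s["shaft"][1], s["head2"][0]),
            ],
        }

        def matches(description, strokes):
            return all(check(strokes) for check in description["constraints"])

        candidate = {
            "shaft": ((0, 0), (100, 0)),
            "head1": ((100, 0), (85, 10)),
            "head2": ((100, 0), (85, -10)),
        }
        print(matches(ARROW, candidate))   # True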

    Speech and sketching for multimodal design

    While sketches are commonly and effectively used in the early stages of design, some information is far more easily conveyed verbally than by sketching. In response, we have combined sketching with speech, enabling a more natural form of communication. We studied the behavior of people sketching and speaking, and from this derived a set of rules for segmenting and aligning the signals from both modalities. Once the inputs are aligned, we use both modalities in interpretation. The result is a more natural interface to our system.
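
    The alignment step described, segmenting and aligning signals from the two modalities, can be approximated by grouping speech segments and pen strokes whose time intervals overlap or nearly touch. The rule and threshold in the Python sketch below are assumptions for illustration, not the rule set derived in the study.

        # Simple temporal alignment of speech segments and pen strokes: events from
        # the two modalities are grouped when their time intervals overlap or nearly
        # touch. The 0.5 s gap threshold is an assumption, not the study's rule set.

        def overlaps(a, b, max_gap=0.5):
            """a and b are (start, end) times in seconds."""
            return a[0] <= b[1] + max_gap and b[0] <= a[1] + max_gap

        def align(speech_segments, strokes):
            pairs = []
            for text, speech_span in speech_segments:
                for stroke_id, stroke_span in strokes:
                    if overlaps(speech_span, stroke_span):
                        pairs.append((text, stroke_id))
            return pairs

        speech = [("this is the spring", (0.0, 1.2)),
                  ("and it pushes the block", (2.0, 3.4))]
        ink = [("stroke_7", (0.4, 1.0)), ("stroke_8", (2.1, 2.9))]
        print(align(speech, ink))   # [('this is the spring', 'stroke_7'), ...]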