183 research outputs found

    Image sense disambiguation : a multimodal approach

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Cataloged from the PDF version of the thesis. Includes bibliographical references (p. 131-136).
    If a picture is worth a thousand words, can a thousand words be worth a training image? Most successful object recognition algorithms require manually annotated images of objects to be collected for training. The amount of human effort required to collect training data has limited most approaches to the several hundred object categories available in labeled datasets. While human-annotated image data is scarce, additional sources of information can be used as weak labels, reducing the need for human supervision. In this thesis, we use three types of information to learn models of object categories: speech, text, and dictionaries. We demonstrate that our use of non-traditional information sources facilitates automatic acquisition of visual object models for arbitrary words without requiring any labeled image examples. Spoken object references occur in many scenarios: interaction with an assistant robot, voice-tagging of photos, etc. Existing reference resolution methods are unimodal, relying only on image features or only on speech recognition. We propose a method that uses both the image of the object and the speech segment referring to it to disambiguate the underlying object label. We show that even noisy speech input helps visual recognition, and vice versa. We also explore two sources of linguistic sense information: the words surrounding images on web pages, and dictionary entries for nouns that refer to objects. Keywords that index images on the web have been used as weak object labels, but these tend to produce noisy datasets with many unrelated images. We use unlabeled text, dictionary definitions, and semantic relations between concepts to learn a refined model of image sense. Our model can work with as little supervision as a single English word. We apply this model to a dataset of web images indexed by polysemous keywords, and show that it improves both retrieval of specific senses and the resulting object classifiers.
    by Kate Saenko. Ph.D.
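    A minimal sketch of the kind of late fusion such a multimodal disambiguator can perform, assuming hypothetical unimodal models that each return a posterior over candidate senses; the scorers and the weight alpha are illustrative assumptions, not the thesis's actual models.

        # Hedged sketch: combine per-sense posteriors from two noisy modalities
        # (image and speech) and keep the sense favored by their combination.
        import math

        def fuse_senses(image_probs, speech_probs, alpha=0.5):
            """Pick the sense with the best log-linear combination of posteriors.

            image_probs, speech_probs: dicts mapping sense -> P(sense | modality).
            """
            senses = image_probs.keys() & speech_probs.keys()
            def score(s):
                return (alpha * math.log(image_probs[s] + 1e-12)
                        + (1 - alpha) * math.log(speech_probs[s] + 1e-12))
            return max(senses, key=score)

        # Example: a noisy ASR hypothesis alone slightly favors "bass (fish)",
        # but the image model pushes the fused decision toward the instrument.
        image_probs  = {"bass_fish": 0.2,  "bass_instrument": 0.8}
        speech_probs = {"bass_fish": 0.55, "bass_instrument": 0.45}
        print(fuse_senses(image_probs, speech_probs))  # -> bass_instrument

    This illustrates the abstract's claim that even noisy speech helps visual recognition: neither modality needs to be reliable alone, only complementary.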

    Developmentally deep perceptual system for a humanoid robot

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2003. Includes bibliographical references (p. 139-152). This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
    This thesis presents a perceptual system for a humanoid robot that integrates abilities such as object localization and recognition with the deeper developmental machinery required to forge those competences out of raw physical experiences. It shows that a robotic platform can build up and maintain a system for object localization, segmentation, and recognition, starting from very little. What the robot starts with is a direct solution to achieving figure/ground separation: it simply 'pokes around' in a region of visual ambiguity and watches what happens. If the arm passes through an area, that area is recognized as free space. If the arm collides with an object, causing it to move, the robot can use that motion to segment the object from the background. Once the robot can acquire reliable segmented views of objects, it learns from them, and from then on recognizes and segments those objects without further contact. Both low-level and high-level visual features can also be learned in this way, and examples are presented for both: orientation detection and affordance recognition, respectively. The motivation for this work is simple. Training on large corpora of annotated real-world data has proven crucial for creating robust solutions to perceptual problems such as speech recognition and face detection. But the powerful tools used during training of such systems are typically stripped away at deployment. Ideally they should remain, particularly for unstable tasks such as object detection, where the set of objects needed in a task tomorrow might be different from the set of objects needed today. The key limiting factor is access to training data, but as this thesis shows, that need not be a problem on a robotic platform that can actively probe its environment and carry out experiments to resolve ambiguity. This work is an instance of a general approach to learning a new perceptual judgment: find special situations in which the perceptual judgment is easy and study these situations to find correlated features that can be observed more generally.
    by Paul Michael Fitzpatrick. Ph.D.
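    A minimal sketch of the poke-and-watch idea, assuming simple frame differencing with OpenCV; thresholds and function names are illustrative, not the thesis's implementation (which must also discount the arm's own motion).

        # Hedged sketch: segment an object by the motion its displacement causes
        # when the robot's poke makes it move against a static background.
        import cv2
        import numpy as np

        def segment_by_motion(frame_before, frame_after, thresh=25):
            """Return a binary mask of pixels that moved between two grayscale frames."""
            diff = cv2.absdiff(frame_before, frame_after)
            _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
            # Morphological cleanup so the mask covers the displaced object,
            # not sensor noise or isolated speckle.
            kernel = np.ones((5, 5), np.uint8)
            mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
            mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
            return mask

    The segmented views produced this way are exactly the "reliable segmented views" the abstract describes as training data for later contact-free recognition.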

    From First Contact to Close Encounters: A Developmentally Deep Perceptual System for a Humanoid Robot

    This thesis presents a perceptual system for a humanoid robot that integrates abilities such as object localization and recognition with the deeper developmental machinery required to forge those competences out of raw physical experiences. It shows that a robotic platform can build up and maintain a system for object localization, segmentation, and recognition, starting from very little. What the robot starts with is a direct solution to achieving figure/ground separation: it simply 'pokes around' in a region of visual ambiguity and watches what happens. If the arm passes through an area, that area is recognized as free space. If the arm collides with an object, causing it to move, the robot can use that motion to segment the object from the background. Once the robot can acquire reliable segmented views of objects, it learns from them, and from then on recognizes and segments those objects without further contact. Both low-level and high-level visual features can also be learned in this way, and examples are presented for both: orientation detection and affordance recognition, respectively. The motivation for this work is simple. Training on large corpora of annotated real-world data has proven crucial for creating robust solutions to perceptual problems such as speech recognition and face detection. But the powerful tools used during training of such systems are typically stripped away at deployment. Ideally they should remain, particularly for unstable tasks such as object detection, where the set of objects needed in a task tomorrow might be different from the set of objects needed today. The key limiting factor is access to training data, but as this thesis shows, that need not be a problem on a robotic platform that can actively probe its environment and carry out experiments to resolve ambiguity. This work is an instance of a general approach to learning a new perceptual judgment: find special situations in which the perceptual judgment is easy and study these situations to find correlated features that can be observed more generally.

    3D terrain generation using neural networks

    With the increase in computational power, coupled with advancements in the field in the form of GANs and cGANs, neural networks have become an attractive proposition for content generation. This opened opportunities for Procedural Content Generation (PCG) algorithms to tap neural networks' generative power and create tools that let developers shed part of the creative and developmental burden imposed throughout the gaming industry, whether by investors looking for a return on their investment or by consumers who want more and better content, fast. This dissertation sets out to develop a mixed-initiative PCG tool, leveraging cGANs, to create authored 3D terrains, allowing users to directly influence the generated content without formal training in terrain generation or complex interactions with the tool, as opposed to state-of-the-art generative algorithms that only allow random content generation or are needlessly complex. Testing with 113 people online, as well as in-person testing with 30 people, revealed that it is indeed possible to develop a tool that allows users at any level of terrain-creation knowledge, with minimal tool training, to easily create a 3D terrain that looks more realistic than those generated by state-of-the-art solutions such as Perlin Noise.
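    A minimal sketch of the mixed-initiative idea, assuming a pix2pix-style conditional generator that maps a rough user sketch to a detailed terrain heightmap; the architecture, sizes, and names below are illustrative assumptions in PyTorch, not the dissertation's actual network.

        # Hedged sketch: a toy encoder-decoder generator conditioned on a
        # 1-channel user sketch (rough strokes indicating ridges/valleys).
        import torch
        import torch.nn as nn

        class SketchToHeightmap(nn.Module):
            def __init__(self):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),   # 256 -> 128
                    nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 128 -> 64
                    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                    nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Tanh(),
                )

            def forward(self, sketch):
                return self.net(sketch)  # heightmap values in [-1, 1]

        sketch = torch.randn(1, 1, 256, 256)     # stand-in for user strokes
        heightmap = SketchToHeightmap()(sketch)  # 1x1x256x256, meshed as terrain later

    The user-drawn sketch is what makes the output authored rather than random: the same generator, conditioned differently, yields different terrains under direct user control.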

    Entities with quantities : extraction, search, and ranking

    Quantities are more than numeric values. They denote measures of the world's entities, such as heights of buildings, running times of athletes, energy efficiency of car models, or energy production of power plants, all expressed as numbers with associated units. Entity-centric search and question answering (QA) are well supported by modern search engines. However, they do not work well when queries involve quantity filters, such as searching for athletes who ran 200m under 20 seconds or companies with quarterly revenue above $2 billion. State-of-the-art systems fail to understand the quantities, including the condition (less than, above, etc.), the unit of interest (seconds, dollars, etc.), and the context of the quantity (200m race, quarterly revenue, etc.). QA systems based on structured knowledge bases (KBs) also fail, as quantities are poorly covered by state-of-the-art KBs. In this dissertation, we develop new methods to advance the state of the art in quantity knowledge extraction and search. Our main contributions are the following:
    • First, we present Qsearch [Ho et al., 2019, Ho et al., 2020], a system that can handle advanced queries with quantity filters by using cues present both in the query and in the text sources. Qsearch comprises two main contributions: a deep neural network model designed to extract quantity-centric tuples from text sources, and a novel query-matching model for finding and ranking matching tuples.
    • Second, to incorporate heterogeneous tables into the process, we present QuTE [Ho et al., 2021a, Ho et al., 2021b], a system for extracting quantity information from web sources, in particular ad-hoc web tables in HTML pages. QuTE contributes a method for linking quantity and entity columns that draws on external text sources. For question answering, we contextualize the extracted entity-quantity pairs with informative cues from the table and present a new method for consolidating and re-ranking answer candidates via inter-fact consistency.
    • Third, we present QL [Ho et al., 2022], a recall-oriented method for enriching knowledge bases (KBs) with quantity facts. Modern KBs such as Wikidata or YAGO cover many entities and their relevant information, but often miss important quantity properties. QL is query-driven and based on iterative learning, with two main contributions for improving KB coverage: a method for query expansion that captures a larger pool of fact candidates, and a self-consistency technique that takes the value distributions of quantities into account.
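    A minimal sketch of the structured form a quantity query must be reduced to, per the abstract: a comparison condition, a numeric value, a normalized unit, and a context. The dataclass and conversion table are illustrative assumptions, not Qsearch's actual representation.

        # Hedged sketch: represent a quantity filter and evaluate candidate facts
        # against it after converting both sides to a common base unit.
        from dataclasses import dataclass

        TO_BASE = {"seconds": 1.0, "minutes": 60.0, "usd": 1.0, "billion_usd": 1e9}

        @dataclass
        class QuantityFilter:
            operator: str    # "<", ">", or "="
            value: float
            unit: str        # key into TO_BASE
            context: str     # e.g. "200m race", "quarterly revenue"

            def matches(self, value, unit):
                lhs = value * TO_BASE[unit]
                rhs = self.value * TO_BASE[self.unit]
                return {"<": lhs < rhs, ">": lhs > rhs, "=": lhs == rhs}[self.operator]

        # "athletes who ran 200m under 20 seconds"
        f = QuantityFilter("<", 20, "seconds", "200m race")
        print(f.matches(19.19, "seconds"))   # True:  19.19 s is under 20 s
        print(f.matches(0.34, "minutes"))    # False: 0.34 min = 20.4 s

    Unit normalization is what keyword search lacks here: "0.34 minutes" and "20.4 seconds" must compare equal before the condition can be applied.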

    Recent Developments in Smart Healthcare

    Medicine is undergoing a sector-wide transformation thanks to advances in computing and networking technologies. Healthcare is changing from reactive and hospital-centered to preventive and personalized, and from disease-focused to well-being-centered. In essence, healthcare systems, as well as fundamental medical research, are becoming smarter. We anticipate significant improvements in areas ranging from molecular genomics and proteomics, to decision support for healthcare professionals through big data analytics, to support for behavior change through technology-enabled self-management and social and motivational support. Furthermore, with smart technologies, healthcare delivery could also be made more efficient, higher quality, and lower cost. For this special issue, we received a total of 45 submissions and accepted 19 outstanding papers that span several topics in smart healthcare, including public health, health information technology (Health IT), and smart medicine.

    Accessible On-Body Interaction for People With Visual Impairments

    While mobile devices offer people with disabilities new opportunities to gain independence in everyday activities, modern touchscreen-based interfaces can present accessibility challenges for low-vision and blind users. Even with state-of-the-art screen readers, it can be difficult or time-consuming to select specific items without visual feedback. The smooth surface of the touchscreen provides little tactile feedback compared to physical button-based phones. Furthermore, in a mobile context, hand-held devices present additional accessibility issues when both of the user's hands are not available for interaction (e.g., one hand may be holding a cane or a dog leash). To improve mobile accessibility for people with visual impairments, I investigate on-body interaction, which employs the user's own skin surface as the input space. On-body interaction may offer an alternative or complementary means of mobile interaction for people with visual impairments by enabling non-visual interaction with extra tactile and proprioceptive feedback compared to a touchscreen. In addition, on-body input may free users' hands and offer efficient interaction, as it can eliminate the need to pull out or hold the device. Despite this potential, little work has investigated the accessibility of on-body interaction for people with visual impairments. Thus, I begin by identifying needs and preferences for accessible on-body interaction. From there, I evaluate user performance in target acquisition and shape drawing tasks on the hand compared to on a touchscreen. Building on these studies, I focus on the design, implementation, and evaluation of an accessible on-body interaction system for visually impaired users. The contributions of this dissertation are: (1) identification of perceived advantages and limitations of on-body input compared to a touchscreen phone, (2) empirical evidence of the performance benefits of on-body input over touchscreen input in terms of speed and accuracy, (3) implementation and evaluation of an on-body gesture recognizer using finger- and wrist-mounted sensors, and (4) design implications for accessible, non-visual on-body interaction for people with visual impairments.
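    A minimal sketch of what contribution (3) above involves, assuming windowed features from finger- and wrist-mounted inertial sensors fed to an off-the-shelf classifier; the feature choice and classifier are illustrative assumptions, not the dissertation's recognizer.

        # Hedged sketch: summarize an IMU window into a feature vector and
        # classify it as a gesture ("tap", "swipe", ...) with scikit-learn.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        def window_features(imu_window):
            """imu_window: (n_samples, n_channels) accelerometer/gyro readings."""
            return np.concatenate([
                imu_window.mean(axis=0),                            # per-channel mean
                imu_window.std(axis=0),                             # per-channel variability
                np.abs(np.diff(imu_window, axis=0)).mean(axis=0),   # jerkiness
            ])

        # Hypothetical training data: X_train rows are window features,
        # y_train holds gesture labels collected from study participants.
        # clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
        # gesture = clf.predict([window_features(live_window)])[0]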

    Mastering Uncertainty in Mechanical Engineering

    This open access book reports on innovative methods, technologies, and strategies for mastering uncertainty in technical systems. While current research on uncertainty focuses mainly on uncertainty quantification and analysis, this book emphasizes innovative ways to master uncertainty in engineering design, production, and product usage alike. It gathers authoritative contributions by more than 30 scientists reporting on years of research in engineering, applied mathematics, and law, thus offering a timely, comprehensive, and multidisciplinary account of theories and methods for quantifying data, model, and structural uncertainty, and of fundamental strategies for mastering uncertainty. It covers key concepts such as robustness, flexibility, and resilience in detail. All the described methods, technologies, and strategies have been validated on three technical systems, i.e. the Modular Active Spring-Damper System, the Active Air Spring, and the 3D Servo Press, which were in turn developed and tested during more than ten years of cooperative research. Overall, this book offers a timely, practice-oriented reference guide for graduate students, researchers, and professionals dealing with uncertainty in the broad field of mechanical engineering.

    HandSight: A Touch-Based Wearable System to Increase Information Accessibility for People with Visual Impairments

    Many activities of daily living, such as getting dressed, preparing food, wayfinding, or shopping, rely heavily on visual information, and the inability to access that information can negatively impact the quality of life of people with vision impairments. While numerous researchers have explored solutions for assisting with visual tasks that can be performed at a distance, such as identifying landmarks for navigation or recognizing people and objects, few have attempted to provide access to nearby visual information through touch. Touch is a highly attuned means of acquiring tactile and spatial information, especially for people with vision impairments. By supporting touch-based access to information, we may help users better understand how a surface appears (e.g., document layout, clothing patterns), thereby improving their quality of life. To address this gap in research, this dissertation explores methods to augment a visually impaired user's sense of touch with interactive, real-time computer vision in order to access information about the physical world. These explorations span three application areas: reading and exploring printed documents, controlling mobile devices, and identifying colors and visual textures. At the core of each application is a system called HandSight that uses wearable cameras and other sensors to detect touch events and identify surface content beneath the user's finger. To create HandSight, we designed and implemented the physical hardware, developed signal processing and computer vision algorithms, and designed real-time feedback that enables users to interpret visual or digital content. We involved visually impaired users throughout the design and development process, conducting several user studies to assess usability and robustness and to improve our prototype designs. The contributions of this dissertation include: (i) developing and iteratively refining HandSight, a novel wearable system to assist visually impaired users in their daily lives; (ii) evaluating HandSight across a diverse set of tasks, and identifying tradeoffs of a finger-worn approach in terms of physical design, algorithmic complexity and robustness, and usability; and (iii) identifying broader design implications for future wearable systems and for the fields of accessibility, computer vision, augmented and virtual reality, and human-computer interaction.
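    A minimal sketch of the document-reading loop such a system performs: when a touch event fires, OCR the surface patch under the fingertip and speak the result. The sensor and feedback helpers are hypothetical placeholders, not HandSight's API; only pytesseract is a real library (and assumes a local Tesseract install).

        # Hedged sketch: touch-triggered reading of printed text under the finger.
        import pytesseract

        def on_touch(finger_camera_frame, speak):
            """finger_camera_frame: image patch from a finger-mounted camera;
            speak: text-to-speech callback providing the audio feedback."""
            text = pytesseract.image_to_string(finger_camera_frame).strip()
            if text:
                speak(text)            # voice the recognized line of text
            else:
                speak("no text here")  # explicit cue so silence is never ambiguous

        # Main loop (schematic): poll the touch sensor; on each contact event,
        # call on_touch with the current camera frame and the speech callback.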