
    Content-Based Image Retrieval Using Self-Organizing Maps


    Action-oriented Scene Understanding

    To allow robots to act autonomously, it is crucial that they not only describe their environment accurately but also identify how to interact with their surroundings. While we have witnessed tremendous progress in descriptive computer vision, approaches that explicitly target action are scarcer. This cumulative dissertation approaches the goal of interpreting visual scenes “in the wild” with respect to actions implied by the scene. We call this approach action-oriented scene understanding. It involves identifying and judging opportunities for interaction with constituents of the scene (e.g. objects and their parts) as well as understanding object functions and how interactions will impact the future. All of these aspects are addressed on three levels of abstraction: elements, perception and reasoning. On the elementary level, we investigate semantic and functional grouping of objects by analyzing annotated natural image scenes. We compare object label-based and visual context definitions with respect to their suitability for generating meaningful object class representations. Our findings suggest that representations generated from visual context are on par in terms of semantic quality with those generated from large quantities of text. The perceptive level concerns action identification. We propose a system to identify possible interactions for robots and humans with the environment (affordances) on a pixel level using state-of-the-art machine learning methods. Pixel-wise part annotations of images are transformed into 12 affordance maps. Using these maps, a convolutional neural network is trained to densely predict affordance maps from unknown RGB images. In contrast to previous work, this approach operates exclusively on RGB images during both training and testing, and yet achieves state-of-the-art performance. At the reasoning level, we extend the question from asking what actions are possible to what actions are plausible.
For this, we gathered a dataset of household images associated with human ratings of the likelihoods of eight different actions. Based on the judgements provided by the human raters, we train convolutional neural networks to generate plausibility scores from unseen images. Furthermore, having considered only static scenes previously in this thesis, we propose a system that takes video input and predicts plausible future actions. Since this requires careful identification of relevant features in the video sequence, we analyze this particular aspect in detail using a synthetic dataset for several state-of-the-art video models. We identify feature learning as a major obstacle for anticipation in natural video data. The presented projects analyze the role of action in scene understanding from various angles and in multiple settings while highlighting the advantages of assuming an action-oriented perspective. We conclude that action-oriented scene understanding can augment classic computer vision in many real-life applications, in particular robotics.
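
The dense affordance prediction described in the abstract maps each pixel of an RGB image to scores for 12 affordance classes. As a rough sketch of that output format, the following replaces the trained CNN with a single per-pixel linear map (a 1x1 convolution); the function names and parameters are our own illustrative assumptions, not the dissertation's implementation.

```python
import numpy as np

N_AFFORDANCES = 12  # the abstract's 12 affordance maps


def predict_affordances(rgb, weights, bias):
    """Toy dense prediction: rgb is (H, W, 3) in [0, 1],
    weights is (3, N_AFFORDANCES), bias is (N_AFFORDANCES,).
    Returns (H, W, N_AFFORDANCES) sigmoid scores, one map per affordance."""
    logits = rgb @ weights + bias          # per-pixel linear map (1x1 conv)
    return 1.0 / (1.0 + np.exp(-logits))   # sigmoid -> scores in (0, 1)


rng = np.random.default_rng(0)
img = rng.random((4, 4, 3))                     # stand-in RGB patch
W = rng.standard_normal((3, N_AFFORDANCES))     # untrained weights
b = np.zeros(N_AFFORDANCES)
maps = predict_affordances(img, W, b)
print(maps.shape)  # (4, 4, 12)
```

A real system would learn the mapping with many stacked convolutional layers, but the output contract is the same: per-pixel, per-affordance scores.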

    A review of technical factors to consider when designing neural networks for semantic segmentation of Earth Observation imagery

    Semantic segmentation (classification) of Earth Observation imagery is a crucial task in remote sensing. This paper presents a comprehensive review of technical factors to consider when designing neural networks for this purpose. The review focuses on Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Generative Adversarial Networks (GANs), and transformer models, discussing prominent design patterns for these ANN families and their implications for semantic segmentation. Common pre-processing techniques for ensuring optimal data preparation are also covered. These include methods for image normalization and chipping, as well as strategies for addressing data imbalance in training samples, and techniques for overcoming limited data, including augmentation techniques, transfer learning, and domain adaptation. By encompassing both the technical aspects of neural network design and the data-related considerations, this review provides researchers and practitioners with a comprehensive and up-to-date understanding of the factors involved in designing effective neural networks for semantic segmentation of Earth Observation imagery. (Comment: 145 pages with 32 figures)
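
Two of the pre-processing steps the review covers, per-band normalization and chipping a large scene into fixed-size tiles, can be sketched as follows. The function names, tile size, and band count are illustrative assumptions, not specifics from the review.

```python
import numpy as np


def normalize_bands(scene):
    """Scale each band of an (H, W, C) scene to zero mean, unit std."""
    mean = scene.mean(axis=(0, 1), keepdims=True)
    std = scene.std(axis=(0, 1), keepdims=True)
    return (scene - mean) / (std + 1e-8)


def chip(scene, size):
    """Split an (H, W, C) scene into non-overlapping (size, size, C)
    chips, discarding any partial border tiles."""
    h, w, _ = scene.shape
    return [scene[r:r + size, c:c + size]
            for r in range(0, h - size + 1, size)
            for c in range(0, w - size + 1, size)]


scene = np.random.default_rng(1).random((256, 300, 4))  # toy 4-band scene
chips = chip(normalize_bands(scene), 128)
print(len(chips))  # 2 rows x 2 cols of full tiles -> 4 chips
```

Production pipelines often chip with overlap and normalize using statistics computed over the whole training set rather than a single scene; this sketch only shows the basic shape of both operations.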

    Brain-machine interface coupled cognitive sensory fusion with a Kohonen and reservoir computing scheme

    Artificial Intelligence (AI) has been a source of great intrigue and has spawned many questions regarding the human condition and the core of what it means to be a sentient entity. The field has bifurcated into so-called “weak” and “strong” artificial intelligence. In weak artificial intelligence reside the forms of automation and data mining that we interact with on a daily basis. Strong artificial intelligence can be best defined as a “synthetic” being with cognitive abilities and the capacity for presence of mind that we would normally associate with humankind. We feel that this distinction is misguided. First, we begin with the statement that intelligence lies on a spectrum, even in artificial systems. The fact that our systems currently can be considered weak artificial intelligence does not preclude our ability to develop an understanding that can lead us to more complex behavior. In this research, we utilized neural feedback via electroencephalogram (EEG) data to develop an emotional landscape for linguistic interaction via the android's sensory fields, which we consider to be part and parcel of embodied cognition. We have also given the iCub child android the instinct to babble the words it has learned. This is a skill that we leveraged for low-level linguistic acquisition in the latter part of this research, which pursues the slightly stronger artificial intelligence goal. This research is motivated by two main questions regarding intelligence: Is intelligence an emergent phenomenon? And, if so, can multi-modal sensory information, together with what we term “co-intelligence” (a shared sensory experience created by coupling EEG input), assist in the development of representations in the mind that we colloquially refer to as language?
Given that it is not reasonable to program all of the activities needed to foster intelligence in artificial systems, our hope is that these types of forays will set the stage for further development of stronger artificial intelligence constructs. We have incorporated self-organizing processes - i.e. Kohonen maps and hidden Markov models - for speech, language development, and emotional information derived from neural data, to help lay the substrate for emergence. Next, homage is paid to the central and unique role that language plays in intellectual study. We have also developed a rudimentary associative memory for the iCub that is derived from the aforementioned collected sensory input. We formalized this process only as needed, and only under the assumption that mind, brain, and language can be represented using the mathematics and logic of the day without contradiction. We have some reservations regarding this statement, but unfortunately a proof is a task beyond the scope of this Ph.D. Finally, the data from coupling the EEG with the other sensory modes of embodied cognition are used to interact with a reservoir computing recurrent neural network in an attempt to produce simple language interaction, e.g. babbling, from the child android.
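
The Kohonen maps mentioned here (and in the first listed output above) are self-organizing maps: a grid of units whose weight vectors are pulled toward input samples, with a neighborhood function that preserves topology. A minimal training loop, with grid size, learning rate, and neighborhood schedule chosen purely for illustration (not the parameters used in the thesis), might look like this:

```python
import numpy as np


def train_som(data, grid=(5, 5), epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a toy Kohonen self-organizing map on (N, D) data."""
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.random((h, w, data.shape[1]))
    # Grid coordinates of every unit, for the neighborhood function.
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w),
                                  indexing="ij"), axis=-1)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.permutation(data):
            t = step / n_steps
            lr = lr0 * (1.0 - t)               # decaying learning rate
            sigma = sigma0 * (1.0 - t) + 0.5   # shrinking neighborhood
            # Best-matching unit: grid node whose weight is closest to x.
            d = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(d.argmin(), d.shape)
            # Gaussian neighborhood pulls nearby units toward x as well.
            g = np.exp(-((coords - np.array(bmu)) ** 2).sum(-1)
                       / (2.0 * sigma ** 2))
            weights += lr * g[..., None] * (x - weights)
            step += 1
    return weights


data = np.random.default_rng(1).random((100, 3))  # toy 3-D feature vectors
som = train_som(data)
print(som.shape)  # (5, 5, 3)
```

After training, mapping each input to its best-matching unit gives the kind of low-dimensional, topology-preserving organization that makes SOMs useful as a substrate for clustering multi-modal sensory data.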

    Computational Intelligence and Human–Computer Interaction: Modern Methods and Applications

    The present book contains all of the articles that were accepted and published in the Special Issue of MDPI’s journal Mathematics titled "Computational Intelligence and Human–Computer Interaction: Modern Methods and Applications". This Special Issue covered a wide range of topics connected to the theory and application of different computational intelligence techniques to the domain of human–computer interaction, such as automatic speech recognition, speech processing and analysis, virtual reality, emotion-aware applications, digital storytelling, natural language processing, smart cars and devices, and online learning. We hope that this book will be interesting and useful for those working in various areas of artificial intelligence, human–computer interaction, and software engineering as well as for those who are interested in how these domains are connected in real-life situations.