50 research outputs found

    Designing Network Design Strategies Through Gradient Path Analysis

    Designing a high-efficiency and high-quality expressive network architecture has always been the most important research topic in the field of deep learning. Most of today's network design strategies focus on how to integrate features extracted from different layers, and how to design computing units to effectively extract these features, thereby enhancing the expressiveness of the network. This paper proposes a new network design strategy, i.e., to design the network architecture based on gradient path analysis. On the whole, most of today's mainstream network design strategies are based on the feed-forward path, that is, the network architecture is designed based on the data path. In this paper, we hope to enhance the expressive ability of the trained model by improving the network's learning ability. Because the mechanism driving network parameter learning is the backpropagation algorithm, we design network design strategies based on the backpropagation path. We propose gradient path design strategies for the layer level, the stage level, and the network level, and the design strategies are proved to be superior and feasible by theoretical analysis and experiments. Comment: 12 pages, 9 figures

    Holistic indoor scene understanding, modelling and reconstruction from single images.

    3D indoor scene understanding in computer vision refers to perceiving the semantic and geometric information in a 3D indoor environment from partial observations (e.g. images or depth scans). Semantics in a scene generally involve conceptual knowledge such as the room layout, object categories, and their interrelationships (e.g. support relationships). These scene semantics are usually coupled with object and room geometry for 3D scene understanding, for example, the layout plan (i.e. location of walls, ceiling and floor), the shape of in-room objects, and the camera pose of the observer. This thesis focuses on the problem of holistic 3D scene understanding from single images to model or reconstruct the indoor geometry with enriched scene semantics. This challenging task requires computers to perform on par with the human vision system in perceiving and understanding indoor contents from colour intensities. Existing works either focus on a sub-problem (e.g. layout estimation, 3D detection or object reconstruction) or address the entire problem with independent subtasks, while this thesis aims at an integrated and unified solution for semantic scene understanding and reconstruction. In this thesis, scene semantics and geometry are regarded as intertwined and complementary. Understanding each part (semantics or geometry) helps to perceive the other, which enables joint scene understanding, modelling and reconstruction. We start with the problem of semantic scene modelling. To estimate the object semantics and shapes from a single image, a feasible scene modelling pipeline is proposed. It is built on fully convolutional networks that learn 2D semantics and geometry, and is powered by top-down shape retrieval for object modelling. After this, we build a unified and more efficient visual system for semantic scene modelling. Scene semantics are divided into relational (i.e. support relationship) and non-relational (i.e. object segmentation and geometry, room layout) knowledge. A Relation Network is proposed to estimate the support relations between objects to guide the object modelling process. Afterwards, we focus on the problem of holistic and end-to-end scene understanding and reconstruction. Instead of modelling scenes by top-down shape retrieval, this method bridges the gap between scene understanding and object mesh reconstruction. It does not rely on any external CAD repositories. Camera poses, room layout, object bounding boxes and meshes are predicted end-to-end from an RGB image with a single network architecture. Finally, we extend our work by using a different input modality, a single-view depth scan, to explore object reconstruction performance. A skeleton-bridged approach is proposed to predict the meso-skeleton of shapes as an intermediate representation to guide surface reconstruction, which outperforms prior art in shape completion. Overall, this thesis provides a series of novel approaches towards holistic 3D indoor scene understanding, modelling and reconstruction. It aims at automatic 3D scene perception that enables machines to understand and predict 3D contents as human vision does, which we hope could advance the boundaries of 3D vision in machine perception, robotics and Artificial Intelligence

    Survey of contemporary trends in color image segmentation


    Multi-modal surrogates for retrieving and making sense of videos: is synchronization between the multiple modalities optimal?

    Video surrogates can help people quickly make sense of the content of a video before downloading or seeking more detailed information. Visual and audio features of a video are primary information carriers and might become important components of video retrieval and video sense-making. In the past decades, most research and development efforts on video surrogates have focused on visual features of the video, and comparatively little work has been done on audio surrogates and examining their pros and cons in aiding users' retrieval and sense-making of digital videos. Even less work has been done on multi-modal surrogates, where more than one modality is employed for consuming the surrogates, for example, the audio and visual modalities. This research examined the effectiveness of a number of multi-modal surrogates, and investigated whether synchronization between the audio and visual channels is optimal. A user study was conducted to evaluate six different surrogates on a set of six recognition and inference tasks to answer two main research questions: (1) How do automatically-generated multi-modal surrogates compare to manually-generated ones in video retrieval and video sense-making? and (2) Does synchronization between multiple surrogate channels enhance or inhibit video retrieval and video sense-making? Forty-eight participants took part in the study, in which the surrogates were measured on the time participants spent experiencing the surrogates, the time participants spent doing the tasks, participants' performance accuracy on the tasks, participants' confidence in their task responses, and participants' subjective ratings of the surrogates. On average, the uncoordinated surrogates were more helpful than the coordinated ones, but the manually-generated surrogates were more helpful than the automatically-generated ones only in terms of task completion time. 
Participants' subjective ratings were more favorable for the coordinated surrogate C2 (Magic A + V) and the uncoordinated surrogate U1 (Magic A + Storyboard V) with respect to usefulness, usability, enjoyment, and engagement. The post-session questionnaire comments demonstrated participants' preference for the coordinated surrogates, but the comments also revealed the value of having uncoordinated sensory channels

    Tie-zone : the bridge between watershed transforms and fuzzy connectedness

    Advisor: Roberto de Alencar Lotufo. Thesis (doctorate), Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação. Abstract: This thesis introduces the new concept of the tie-zone transform, which unifies the multiple solutions of a watershed transform by conserving only the parts common to all of them, such that the differing parts constitute the tie zone. The tie zone applied to the watershed via the image-foresting transform (TZ-IFT-WT) proves to be a novel link between watershed transforms based on very different paradigms: drop of water, flooding, optimal paths and minimum-weight forest. For all these paradigms and the derived algorithms, it is a challenge to obtain a unique and thin solution that is consistent with a definition. That is why we propose a unique and consistent thinning of the tie zone. In addition, we demonstrate that the TZ-IFT-WT is also the dual of segmentation methods based on fuzzy connectedness. Thus, the bridge between the morphological and the fuzzy approaches makes it possible to benefit from advances in both. As a consequence, the concept of cores of robustness for the seeds is exploited in the case of watersheds. Degree: Doctorate, Computer Engineering; Doctor of Electrical Engineering
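
The tie-zone idea in the abstract above (keep only what all watershed solutions agree on, and mark the rest as the tie zone) can be illustrated with a small sketch. This is not the thesis's TZ-IFT-WT algorithm: it assumes, purely for illustration, that the competing labelings have already been enumerated as 2D lists of positive integer labels, and the function name `tie_zone` and the reserved label `0` are hypothetical choices.

```python
from typing import List

Labeling = List[List[int]]


def tie_zone(solutions: List[Labeling], tie_label: int = 0) -> Labeling:
    """Keep a pixel's label only where every solution agrees.

    Pixels on which the solutions disagree are marked with `tie_label`
    (assumed here to be a value no real region uses).
    """
    if not solutions:
        raise ValueError("need at least one labeling")
    first = solutions[0]
    result = []
    for r, row in enumerate(first):
        out_row = []
        for c, label in enumerate(row):
            # A pixel is "common" only if all solutions give it this label.
            agree = all(s[r][c] == label for s in solutions)
            out_row.append(label if agree else tie_label)
        result.append(out_row)
    return result


# Two hypothetical watershed solutions that differ on two pixels.
a = [[1, 1, 2], [1, 2, 2]]
b = [[1, 2, 2], [1, 1, 2]]
merged = tie_zone([a, b])
```

Enumerating every solution is only practical for toy inputs; the sketch shows the definition, not an efficient way to compute it.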

    Human-Centric Deep Generative Models: The Blessing and The Curse

    Over the past years, deep neural networks have achieved significant progress in a wide range of real-world applications. In particular, my research focuses on deep generative models, a neural network solution that proves effective in visual (re)creation. But is generative modeling a niche topic that should be researched on its own? My answer is a firm no. In this thesis, I present the two sides of deep generative models: their blessing and their curse to human beings. Regarding what deep generative models can do for us, I demonstrate improvements in the performance and steerability of visual (re)creation. Regarding what we can do for deep generative models, my answer is to mitigate the security concerns of DeepFakes and improve the minority inclusion of deep generative models. For the performance of deep generative models, I explore applying attention modules and a dual contrastive loss to generative adversarial networks (GANs), which pushes photorealistic image generation to a new state of the art. For steerability, I introduce Texture Mixer, a simple yet effective approach to steerable texture synthesis and blending. For security, my research spans a series of GAN fingerprinting solutions that enable the detection and attribution of misused GAN-generated images. For inclusion, I investigate the biased misbehavior of generative models and present my solution for enhancing the minority inclusion of GAN models over underrepresented image attributes. All in all, I aim to bring actionable insights to the applications of deep generative models and, ultimately, to contribute to human-generator interaction

    Collaborative design and feasibility assessment of computational nutrient sensing for simulated food-intake tracking in a healthcare environment

    One in four older adults (65 years and over) is living with some form of malnutrition. This increases their odds of hospitalization four-fold and is associated with decreased quality of life and increased mortality. In long-term care (LTC), residents have more complex care needs and the proportion affected is a staggering 54%, primarily due to low intake. Tracking intake is important for monitoring whether residents are meeting their nutritional needs; however, current methods are time-consuming, subjective, and prone to large margins of error. This reduces the utility of tracked data and makes it challenging to identify at-risk individuals in a timely fashion. While technologies exist for tracking food intake, they have not been designed for use within the LTC context and impose a large time burden on the user. Especially in light of the machine learning boom, there is great opportunity to harness learnings from this domain and apply them to the field of nutrition for enhanced food-intake tracking. Additionally, current approaches to food-intake tracking are limited by the nutritional database to which they are linked, making generalizability a challenge. Drawing inspiration from current methods, the desires of end users (primary users: personal support workers, registered staff, dietitians), and machine learning approaches suitable for a context in which limited data is available, we investigated novel methods for assessing needs in this environment and imagined an alternative approach. We leveraged image processing and machine learning to remove subjectivity while increasing accuracy and precision to support higher-quality food-intake tracking. This thesis presents the ideation, collaborative design, development, evaluation, and feasibility assessment of computational nutrient sensing for simulated food-intake tracking in the LTC environment. 
We sought to remove potential barriers to uptake through collaborative design and ongoing end user engagement for developing solution concepts for a novel Automated Food Imaging and Nutrient Intake Tracking (AFINI-T) system while implementing the technology in parallel. More specifically, we demonstrated the effectiveness of applying a modified participatory iterative design process modeled from the Google Sprint framework in the LTC context which identified priority areas and established functional criteria for usability and feasibility. Concurrently, we developed the novel AFINI-T system through the co-integration of image processing and machine learning and guided by the application of food-intake tracking in LTC to address three questions: (1) where is there food? (i.e., food segmentation), (2) how much food was consumed? (i.e., volume estimation) using a fully automatic imaging system for quantifying food-intake. We proposed a novel deep convolutional encoder-decoder food network with depth-refinement (EDFN-D) using an RGB-D camera for quantifying a plate’s remaining food volume relative to reference portions in whole and modified texture foods. To determine (3) what foods are present (i.e., feature extraction and classification), we developed a convolutional autoencoder to learn meaningful food-specific features and developed classifiers which leverage a priori information about when certain foods would be offered and the level of texture modification prescribed to apply real-world constraints of LTC. We sought to address real-world complexity by assessing a wide variety of food items through the construction of a simulated food-intake dataset emulating various degrees of food-intake and modified textures (regular, minced, puréed). To ensure feasibility-related barriers to uptake were mitigated, we employed a feasibility assessment using the collaboratively designed prototype. 
Finally, this thesis explores the feasibility of applying biophotonic principles to food as a first step to enhancing food database estimates. Motivated by a theoretical optical dilution model, a novel deep neural network (DNN) was evaluated for estimating relative nutrient density of commercially prepared purées. For deeper analysis we describe the link between color and two optically active nutrients, vitamin A, and anthocyanins, and suggest it may be feasible to utilize optical properties of foods to enhance nutritional estimation. This research demonstrates a transdisciplinary approach to designing and implementing a novel food-intake tracking system which addresses several shortcomings of the current method. Upon translation, this system may provide additional insights for supporting more timely nutritional interventions through enhanced monitoring of nutritional intake status among LTC residents

    Computational Intelligence and Human-Computer Interaction: Modern Methods and Applications

    The present book contains all of the articles that were accepted and published in the Special Issue of MDPI’s journal Mathematics titled "Computational Intelligence and Human–Computer Interaction: Modern Methods and Applications". This Special Issue covered a wide range of topics connected to the theory and application of different computational intelligence techniques to the domain of human–computer interaction, such as automatic speech recognition, speech processing and analysis, virtual reality, emotion-aware applications, digital storytelling, natural language processing, smart cars and devices, and online learning. We hope that this book will be interesting and useful for those working in various areas of artificial intelligence, human–computer interaction, and software engineering as well as for those who are interested in how these domains are connected in real-life situations