
    IST Austria Thesis

    Get PDF
    Modern computer vision systems rely heavily on statistical machine learning models, which typically require large amounts of labeled data to be learned reliably. Moreover, computer vision research has recently adopted representation learning techniques on a wide scale, which further increases the demand for labeled data. However, for many important practical problems only a relatively small amount of labeled data is available, so it is difficult to exploit the full potential of representation learning methods. One way to overcome this obstacle is to invest substantial resources into producing large labeled datasets, but this can be prohibitively expensive in practice. In this thesis we focus on an alternative way of tackling the issue: methods that make use of weakly labeled or even unlabeled data. The first half of the thesis is dedicated to the semantic image segmentation task. We develop a technique that achieves competitive segmentation performance while requiring only global image-level labels instead of dense segmentation masks. Subsequently, we present a new methodology that further improves segmentation performance by leveraging a small amount of additional feedback from a human annotator. Using our methods, practitioners can greatly reduce the annotation effort required to learn modern image segmentation models. In the second half of the thesis we focus on methods for learning from unlabeled visual data. We study a family of autoregressive models for modeling the structure of natural images and discuss their potential applications. Moreover, we conduct an in-depth study of one of these applications and develop a state-of-the-art model for the probabilistic image colorization task.
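
    As a concrete illustration of the weak-supervision idea above, the sketch below shows one common strategy for learning pixel-level predictions from image-level labels alone: a classifier with global average pooling whose class activation maps serve as coarse segmentation masks. The network, shapes, and 21-class setup are illustrative assumptions, not the thesis's actual method.

```python
# Hypothetical sketch (PyTorch): a classifier trained with image-level tags only,
# whose per-class activation maps double as coarse segmentation masks.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeakSegNet(nn.Module):
    def __init__(self, num_classes, feat_dim=64):
        super().__init__()
        # stand-in for a pretrained fully convolutional backbone
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
        )
        self.score = nn.Conv2d(feat_dim, num_classes, 1)  # per-pixel class scores

    def forward(self, x):
        maps = self.score(self.backbone(x))   # B x C x H x W activation maps
        logits = maps.mean(dim=(2, 3))        # global average pooling -> image-level logits
        return logits, maps

model = WeakSegNet(num_classes=21)
images = torch.randn(2, 3, 64, 64)
tags = torch.zeros(2, 21)                     # multi-hot image-level labels, no masks
tags[0, 3] = 1.0
tags[1, 7] = 1.0
logits, maps = model(images)
loss = F.binary_cross_entropy_with_logits(logits, tags)
loss.backward()
coarse_masks = maps.argmax(dim=1)             # B x H x W coarse per-pixel predictions
```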

    IST Austria Thesis

    Get PDF
    Deep neural networks have established a new standard for data-dependent feature extraction pipelines in the computer vision literature. Despite their remarkable performance in the standard supervised learning scenario, i.e. when models are trained with labeled data and tested on samples that follow a similar distribution, neural networks have been shown to struggle with more advanced generalization abilities, such as transferring knowledge across visually different domains or generalizing to new, unseen combinations of known concepts. In this thesis we argue that, in contrast to the usual black-box behavior of neural networks, leveraging more structured internal representations is a promising direction for tackling such problems. In particular, we focus on two forms of structure. First, we tackle modularity: we show that (i) compositional architectures are a natural tool for modeling reasoning tasks, in that they efficiently capture their combinatorial nature, which is key to generalizing beyond the compositions seen during training. We investigate how to learn such models, both formally and experimentally, for the task of abstract visual reasoning. We then show that (ii) in some settings, modularity allows us to break complex tasks down into smaller, easier modules, thereby improving computational efficiency; we study this behavior in the context of generative models for colorization, as well as for small object detection. Second, we investigate the inherently layered structure of representations learned by neural networks and analyze its role in the context of transfer learning and domain adaptation across visually dissimilar domains.
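
    To make the notion of compositionality concrete, the hypothetical sketch below composes a small library of reusable modules according to a task "program"; the module names, dimensions, and residual composition are assumptions for illustration, not the architecture studied in the thesis.

```python
# Hypothetical sketch (PyTorch): a library of reusable modules composed according to
# a "program", so that unseen combinations of known operations can still be executed.
import torch
import torch.nn as nn

class ModularNet(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.library = nn.ModuleDict({
            name: nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for name in ("compare", "count", "relate")
        })

    def forward(self, x, program):
        # 'program' lists which modules to apply and in what order;
        # test-time programs may recombine modules never paired during training.
        for name in program:
            x = x + self.library[name](x)   # residual composition of module outputs
        return x

net = ModularNet()
features = torch.randn(8, 64)
out = net(features, program=["relate", "count"])  # a composition unseen at training time
```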

    DCTM: Discrete-Continuous Transformation Matching for Semantic Flow

    Full text link
    Techniques for dense semantic correspondence have provided limited ability to deal with the geometric variations that commonly exist between semantically similar images. While variations due to scale and rotation have been examined, practical solutions are lacking for more complex deformations such as affine transformations because of the tremendous size of the associated solution space. To address this problem, we present a discrete-continuous transformation matching (DCTM) framework in which dense affine transformation fields are inferred through a discrete label optimization whose labels are iteratively updated via continuous regularization. In this way, our approach draws solutions from the continuous space of affine transformations in a manner that can be computed efficiently through constant-time edge-aware filtering and a proposed affine-varying CNN-based descriptor. Experimental results show that this model outperforms state-of-the-art methods for dense semantic correspondence on various benchmarks.
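
    For intuition, the hypothetical sketch below illustrates the general alternating scheme: a discrete step selects the best candidate affine transform per pixel, and a continuous step regularizes the resulting field. It is not the paper's implementation.

```python
# Hypothetical sketch (NumPy) of a discrete-continuous loop over per-pixel affine
# transforms: the cost function, candidate set, and neighborhood smoothing are toy
# stand-ins for the paper's edge-aware filtering and CNN-based descriptor.
import numpy as np

def match_affine_field(cost_fn, candidates, field, iters=5, step=0.5):
    """cost_fn maps an H x W x 6 affine field to an H x W per-pixel matching cost;
    candidates is a K x 6 array of discrete affine proposals."""
    for _ in range(iters):
        # discrete step: for each pixel keep the candidate with the lowest cost
        costs = np.stack([cost_fn(np.broadcast_to(c, field.shape)) for c in candidates])
        field = candidates[costs.argmin(axis=0)]          # H x W x 6 selected transforms
        # continuous step: regularize towards a locally averaged (smoothed) field
        smoothed = 0.25 * (np.roll(field, 1, 0) + np.roll(field, -1, 0) +
                           np.roll(field, 1, 1) + np.roll(field, -1, 1))
        field = (1 - step) * field + step * smoothed
    return field

# toy usage with a synthetic target field
H, W = 32, 32
target = np.random.rand(H, W, 6)
cost = lambda f: np.abs(f - target).sum(axis=2)
field = match_affine_field(cost, np.random.rand(8, 6), np.zeros((H, W, 6)))
```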

    How Can Play-based Learning with Authentic Assessment Practices Support Healthy Development in Preschool Classrooms?

    Get PDF
    Early childhood educators need to recognize the importance of implementing play-based learning as a developmentally appropriate practice to support young children’s development. Play provides children opportunities to demonstrate knowledge and understanding through materials in the environment, expand on prior experiences, and learn alongside peers. Teachers can authentically gain valuable information from children’s play to plan appropriate lessons that foster development (NAEYC, 2009). Yet instead of using natural routines to assess and gather information about a child’s knowledge and behavior, educators often use standardized methods to determine a child’s capabilities. This paper synthesizes available qualitative and quantitative studies that analyzed the effects of authentic assessment on educators’ understanding of a child’s developmental domains. The research examined how naturally assessing children in the environment informed teachers in creating a meaningful, appropriate curriculum and in monitoring individuals’ progress continuously. Other studies evaluated the effects of play-based learning on promoting development. The reviewed research supported preschool teachers in using play and authentic assessment for a more accurate understanding of an individual’s knowledge. An authentic assessment tool supports the collection of ongoing data that gives teachers feedback for making instructional changes.

    IST Austria Thesis

    Get PDF
    The human ability to recognize objects in complex scenes has driven research in the computer vision field over the past couple of decades. This thesis focuses on the object recognition task in images: given an image, we want the computer system to be able to predict the class of the object that appears in it. A recent successful attempt to bridge the semantic understanding of an image as perceived by humans and by computers uses attribute-based models. Attributes are semantic properties of objects shared across different categories, which both humans and computers can decide on. To explore attribute-based models we take a statistical machine learning approach and address two key learning challenges in the context of the object recognition task: learning augmented attributes as a mid-level discriminative feature representation, and learning with attributes as privileged information. Our main contributions are parametric and non-parametric models and algorithms for these two settings. In the parametric approach, we explore an autoencoder model combined with the large-margin nearest neighbor principle for mid-level feature learning, and linear support vector machines for learning with privileged information. In the non-parametric approach, we propose a supervised Indian Buffet Process for automatic augmentation of semantic attributes, and explore the Gaussian Process classification framework for learning with privileged information. A thorough experimental analysis shows the effectiveness of the proposed models in both the parametric and non-parametric settings.
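
    As a toy illustration of the learning-with-privileged-information setting, the sketch below uses training-only features to guide an ordinary classifier; the teacher-confidence weighting is an assumed simplification rather than the thesis's SVM+ or Gaussian Process formulations.

```python
# Hypothetical sketch (scikit-learn): privileged features x_star exist only at
# training time and are used here to weight samples via a teacher's confidence.
import numpy as np
from sklearn.svm import SVC

def fit_with_privileged(x, x_star, y):
    # a teacher trained in the privileged space estimates how "easy" each example is
    teacher = SVC(probability=True).fit(x_star, y)
    confidence = teacher.predict_proba(x_star)[np.arange(len(y)), y]
    # the student sees only the ordinary features, with easy examples weighted higher
    student = SVC(kernel="linear")
    student.fit(x, y, sample_weight=confidence)
    return student

# toy usage: privileged features are unavailable at test time
x = np.random.randn(100, 10)
x_star = np.random.randn(100, 4)            # privileged (training-only) representation
y = np.random.randint(0, 2, size=100)
model = fit_with_privileged(x, x_star, y)
predictions = model.predict(np.random.randn(5, 10))
```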

    Animation of Hand-drawn Faces using Machine Learning

    Get PDF
    Today's research in artificial vision has brought new and exciting possibilities for the production and analysis of multimedia content. Pose estimation is an artificial vision technology that detects and identifies a human body's position and orientation within a picture or video. It locates key points on the body and uses them to create three-dimensional models. In digital animation, pose estimation has paved the way for new visual effects and 3D renderings. By detecting human movements, it is now possible to create fluid, realistic animations from still images. This bachelor thesis discusses the development of a pose-estimation-based program that is able to animate hand-drawn faces, in particular the caricatured faces in Papiri di Laurea, using machine learning and image manipulation. Building on existing techniques for motion capture and 3D animation, and making use of computer vision libraries such as OpenCV and dlib, the project produced a satisfying result in the form of a short video of a hand-drawn caricatured figure that assumes the facial expressions fed to the program through an input video. The First Order Motion Model was used to create this facial animation; it is based on the idea of transferring the movement detected in a source video to an image. The model works best on close-ups of faces: the larger the background, the more it becomes distorted. Possible future developments include the creation of a website where the user uploads their drawing and a video of themselves to get a GIF version of their papiro. This could become a new feature to add to portraits and caricatures and, more specifically to this thesis, a new way to celebrate graduates in Padova.
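
    As an illustration of the kind of preprocessing such a pipeline relies on, the hypothetical sketch below extracts per-frame facial landmarks from a driving video using OpenCV and dlib.

```python
# Hypothetical sketch (OpenCV + dlib): extracting 68 facial landmarks per frame of a
# driving video; the predictor file is assumed to be downloaded locally, and the
# motion transfer itself (First Order Motion Model) is not reproduced here.
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # assumed local file

def driving_keypoints(video_path):
    cap = cv2.VideoCapture(video_path)
    keypoints_per_frame = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for face in detector(gray):
            shape = predictor(gray, face)
            keypoints_per_frame.append([(shape.part(i).x, shape.part(i).y) for i in range(68)])
    cap.release()
    return keypoints_per_frame  # one 68-point list per detected face per frame
```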

    Content warehouses

    Get PDF
    Nowadays, content management systems are an established technology. Based on experience from several application scenarios, we discuss the points of contact between content management systems and other disciplines of information systems engineering, such as data warehouses, data mining, and data integration. We derive a system architecture called a "content warehouse" that integrates these technologies and defines a more general and more sophisticated view of content management. As a case study, we present a system for the collection, maintenance, and evaluation of biological content such as survey data and multimedia resources.