Learning to See with Minimal Human Supervision
Deep learning has significantly advanced computer vision in the past decade, paving the way for practical applications such as facial recognition and autonomous driving. However, current techniques depend heavily on human supervision, limiting their broader deployment. This dissertation tackles this problem by introducing algorithms and theories to minimize human supervision in three key areas: data, annotations, and neural network architectures, in the context of various visual understanding tasks such as object detection, image restoration, and 3D generation.
First, we present self-supervised learning algorithms to handle in-the-wild images and videos that traditionally require time-consuming manual curation and labeling. We demonstrate that when a deep network is trained to be invariant to geometric and photometric transformations, representations from its intermediate layers are highly predictive of object semantic parts such as eyes and noses. This insight offers a simple unsupervised learning framework that significantly improves the efficiency and accuracy of few-shot landmark prediction and matching. We then present a technique for learning single-view 3D object pose estimation models by utilizing in-the-wild videos where objects turn (e.g., cars in roundabouts). This technique achieves performance competitive with the existing state of the art without requiring any manual labels during training. We also contribute the Accidental Turntables Dataset, a challenging set of 41,212 images of cars with cluttered backgrounds, motion blur, and illumination changes that serves as a benchmark for 3D pose estimation.
Second, we address variations in labeling styles across different annotators, which lead to a type of noisy label referred to as heterogeneous labels. This variability in human annotation can cause subpar performance during both the training and testing phases. To mitigate this, we have developed a framework that models the labeling styles of individual annotators, reducing the impact of human annotation variation and enhancing the performance of standard object detection models. We have also applied this framework to analyze ecological data, which are often collected opportunistically across different case studies without consistent annotation guidelines. Through this application, we have gained several insights into large-scale bird migration behaviors and their relationship to climate change.
Our next study explores the challenges of designing neural networks, an area that lacks a comprehensive theoretical understanding. By linking deep neural networks with Gaussian processes, we propose a novel Bayesian interpretation of the deep image prior, which parameterizes a natural image as the output of a convolutional network with random parameters and random input. This approach offers valuable insights into optimizing the design of neural networks for various image restoration tasks.
Lastly, we introduce several machine-learning techniques to reconstruct and edit 3D shapes from 2D images with minimal human effort. We first present a generic multi-modal generative model that bridges 2D images and 3D shapes via a shared latent space, and demonstrate its applications in versatile 3D shape generation and manipulation tasks. Additionally, we develop a framework for the joint estimation of a 3D neural scene representation and camera poses. This approach outperforms prior works and allows us to operate in the general SE(3) camera pose setting, unlike the baselines. The results also indicate that this method can complement classical structure-from-motion (SfM) pipelines, as it compares favorably to SfM on low-texture and low-resolution images.
A New Texture Based Segmentation Method to Extract Object from Background
Extraction of object regions from a complex background is a hard task and an essential part of image segmentation and recognition. Image segmentation denotes the process of dividing an image into different regions, and it plays a vital role in image analysis. Several segmentation approaches for images have been developed; according to several authors, segmentation terminates when the observer's goal is satisfied. The first problem of segmentation is that a unique general method still does not exist: algorithm performance varies with the application. This paper studies insect segmentation in complex backgrounds. The segmentation methodology for insect images consists of five steps. Firstly, the original RGB image is converted into the Lab color space. In the second step, the 'a' component of the Lab color space is extracted. Then segmentation by two-dimensional Otsu automatic thresholding of the 'a' channel is performed. Based on the color segmentation result and the texture differences between the background and the required object, the object is extracted using the gray-level co-occurrence matrix for texture segmentation. The algorithm was tested on the Dreamstime image database and the results prove satisfactory.
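The thresholding and texture steps above can be sketched in a few lines of numpy. The following is a minimal illustration, not the paper's implementation: it assumes the 'a' channel is available as an 8-bit array, and it uses a small synthetic image in place of an insect photograph.

```python
import numpy as np

def otsu_threshold(channel):
    """Find the threshold maximizing between-class variance (Otsu's method)."""
    hist, _ = np.histogram(channel, bins=256, range=(0, 256))
    p = hist / hist.sum()
    cum_p = np.cumsum(p)
    cum_mean = np.cumsum(p * np.arange(256))
    mu_total = cum_mean[-1]
    best_t, best_var = 0, -1.0
    for t in range(1, 255):
        w0, w1 = cum_p[t], 1.0 - cum_p[t]
        if w0 == 0 or w1 == 0:
            continue
        mu0 = cum_mean[t] / w0
        mu1 = (mu_total - cum_mean[t]) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def glcm(gray, dx=1, dy=0, levels=8):
    """Normalized gray-level co-occurrence matrix for one pixel offset."""
    q = (gray.astype(int) * levels) // 256   # quantize to `levels` gray bins
    h, w = q.shape
    m = np.zeros((levels, levels), dtype=float)
    for y in range(h - dy):
        for x in range(w - dx):
            m[q[y, x], q[y + dy, x + dx]] += 1
    return m / m.sum()

# Synthetic 'a' channel: dark background, bright object region.
rng = np.random.default_rng(0)
img = rng.integers(0, 60, size=(64, 64))
img[16:48, 16:48] = rng.integers(180, 220, size=(32, 32))
t = otsu_threshold(img)
mask = img > t            # color-based foreground mask
texture = glcm(img)       # texture statistics for the second-stage extraction
```

In a full pipeline, GLCM statistics (contrast, homogeneity, etc.) computed per region would then separate object texture from background texture.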
Skin Colour Segmentation using Finite Bivariate Pearsonian Type-IVa Mixture Model
Human-computer interaction with respect to skin colour is an important area of research due to its ready applications in several areas like face recognition, surveillance, image retrieval, identification, gesture analysis, and human tracking. For efficient skin colour segmentation, statistical modeling is essential. Skin colour segmentation is generally based on a Gaussian mixture model (GMM); owing to the GMM's limitations, namely its symmetric and mesokurtic nature, the accuracy of the skin colour segmentation is affected. To improve the accuracy of the skin colour segmentation system, in this paper the skin colour is modeled by a finite bivariate Pearsonian type-IVa mixture distribution in the HSI colour space of the image. The model parameters are estimated by the EM algorithm, and a segmentation algorithm is proposed within a Bayesian framework. Through experimentation it is observed that the proposed skin colour segmentation algorithm performs better with respect to segmentation quality metrics like PRI, GCE and VOI. The ROC curves plotted for the system also reveal that the developed algorithm segments pixels in the image more efficiently. Keywords: Skin colour segmentation, HSI colour space, Bivariate Pearson type-IVa mixture model, Image segmentation metrics
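The EM-plus-Bayes-rule pipeline the abstract describes can be sketched with Gaussian components standing in for the paper's Pearson type-IVa components (the E/M alternation and the MAP classification step are the same shape). The hue/saturation data below are synthetic, purely for illustration.

```python
import numpy as np

def mvn_pdf(X, mu, cov):
    """Bivariate Gaussian density, evaluated row-wise."""
    d = X.shape[1]
    D = X - mu
    e = np.einsum('ij,jk,ik->i', D, np.linalg.inv(cov), D)
    return np.exp(-0.5 * e) / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))

def em_mixture(X, iters=50):
    """EM for a 2-component bivariate mixture (Gaussian stand-in for type-IVa)."""
    n, d = X.shape
    w = np.array([0.5, 0.5])
    mu = np.stack([X[0], X[-1]])          # crude init: two data points
    cov = np.stack([np.cov(X.T)] * 2)
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each pixel
        r = np.stack([w[k] * mvn_pdf(X, mu[k], cov[k]) for k in range(2)], axis=1)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and covariances
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r.T @ X) / nk[:, None]
        for k in range(2):
            D = X - mu[k]
            cov[k] = (r[:, k, None] * D).T @ D / nk[k] + 1e-6 * np.eye(d)
    return w, mu, cov, r

# Synthetic hue/saturation pixels: a "skin" cluster and a "background" cluster.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0.05, 0.5], 0.02, (200, 2)),
               rng.normal([0.55, 0.2], 0.05, (200, 2))])
w, mu, cov, r = em_mixture(X)
labels = r.argmax(axis=1)   # Bayes (MAP) pixel classification
```

Swapping the density in `mvn_pdf` for a type-IVa density (and its M-step updates) recovers the paper's model; the surrounding machinery is unchanged.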
An Analysis of Facial Expression Recognition Techniques
In the present era of technology, we need applications that are user-friendly and easy to use, so that even people with specific disabilities can use them easily. Facial expression recognition plays a vital and challenging role in the computer vision and pattern recognition communities, and receives much attention due to potential applications in many areas such as human-machine interaction, surveillance, robotics, driver safety, non-verbal communication, entertainment, health care, and psychology. Facial expression recognition is also a major component of face recognition in significant image understanding and analysis applications. Many algorithms have been implemented under different static (uniform background, identical poses, similar illumination) and dynamic (position variation, partial occlusion, orientation, varying lighting) conditions. In general, facial expression recognition consists of three main steps: first face detection, then feature extraction, and finally classification. In this survey paper we discuss different types of facial expression recognition techniques, the various methods they use, and their performance measures.
Particle Filters for Colour-Based Face Tracking Under Varying Illumination
Automatic human face tracking is the basis of robotic and active vision systems used for facial feature analysis, automatic surveillance, video conferencing, intelligent transportation, human-computer interaction and many other applications. Superior human face tracking will allow future safety surveillance systems which monitor drowsy drivers, or patients and elderly people at risk of seizure or sudden falls, and will perform with lower risk of failure in unexpected situations. This area has been actively researched in the current literature in an attempt to make automatic face trackers more stable in challenging real-world environments. To detect faces in video sequences, features like colour, texture, intensity, shape or motion are used. Among these features, colour has been the most popular because of its insensitivity to orientation and size changes and its fast processability. The challenge for colour-based face trackers, however, has been dealing with the instability of trackers when colours change due to drastic variation in environmental illumination. Probabilistic tracking and the employment of particle filters as powerful Bayesian stochastic estimators, on the other hand, are increasing in the visual tracking field thanks to their ability to handle multi-modal distributions in cluttered scenes. Traditional particle filters utilize the transition prior as the importance sampling function, but this can result in poor posterior sampling. The objective of this research is to investigate and propose a stable face tracker capable of dealing with challenges like rapid and random head motion, scale changes when people move closer to or further from the camera, motion of multiple people with close skin tones in the vicinity of the model person, presence of clutter, and occlusion of the face. The main focus has been on investigating an efficient method to address the sensitivity of colour-based trackers to gradual or drastic illumination variations.
The particle filter is used to overcome the instability of face trackers due to nonlinear and random head motions. To increase the traditional particle filter's sampling efficiency, an improved version of the particle filter is introduced that considers the latest measurements. This improved particle filter employs a new colour-based bottom-up approach that leads particles to generate an effective proposal distribution. The colour-based bottom-up approach is a classification technique for fast skin colour segmentation. This method is independent of the distribution's shape and does not require excessive memory storage or exhaustive prior training. Finally, to address the adaptability of the colour-based face tracker to illumination changes, an original likelihood model is proposed based on spatial rank information that considers both the illumination-invariant colour ordering of a face's pixels in an image or video frame and the spatial interaction between them. The original contribution of this work lies in the unique mixture of existing and proposed components to improve colour-based recognition and tracking of faces in complex scenes, especially where drastic illumination changes occur. Experimental results of the final version of the proposed face tracker, which combines the methods developed, are provided in the last chapter of this manuscript.
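The "traditional particle filter" baseline the thesis improves on can be sketched as follows: sample from the transition prior, weight by the measurement likelihood, resample. The improvement described above (folding the latest measurement into the proposal) is deliberately not shown; the Gaussian likelihood and the drifting synthetic track below stand in for a colour likelihood and a real face trajectory.

```python
import numpy as np

def bootstrap_pf(observations, n=500, motion_std=2.0, obs_std=3.0, seed=0):
    """Plain bootstrap particle filter over 2-D positions."""
    rng = np.random.default_rng(seed)
    particles = rng.normal(observations[0], obs_std, size=(n, 2))
    estimates = []
    for z in observations:
        # Predict with a random-walk motion model (the transition prior)
        particles = particles + rng.normal(0.0, motion_std, size=particles.shape)
        # Weight by a Gaussian measurement likelihood (colour-likelihood stand-in)
        d2 = ((particles - z) ** 2).sum(axis=1)
        w = np.exp(-0.5 * d2 / obs_std ** 2)
        w /= w.sum()
        estimates.append(w @ particles)     # posterior-mean state estimate
        # Resample to avoid weight degeneracy
        particles = particles[rng.choice(n, n, p=w)]
    return np.array(estimates)

# A face centre drifting across the frame, observed with noise.
rng = np.random.default_rng(2)
truth = np.stack([np.linspace(0, 60, 60), np.linspace(0, 30, 60)], axis=1)
obs = truth + rng.normal(0.0, 3.0, truth.shape)
est = bootstrap_pf(obs)
```

Because particles are proposed blindly from the motion model, few of them land where the likelihood is high; conditioning the proposal on the current measurement, as the thesis does, concentrates particles and reduces this waste.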
People objects: 3-D modeling of heads in real-time
Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 1998. By Thomas E. Slowe. Includes bibliographical references (p. 54-59).
Semantic classification of rural and urban images using learning vector quantization
One of the major hurdles in semantic image classification is that only low-level features can be reliably extracted from images, as opposed to higher-level features (objects present in the scene and their inter-relationships). The main challenge lies in grouping images into semantically meaningful categories based on the available low-level visual features of the images. It is important to have a classification method that can handle a complex image dataset with not-so-well-defined boundaries between clusters. Learning Vector Quantization (LVQ) neural networks offer a great deal of robustness in clustering complex datasets. This study presents a semantic image classification using an LVQ neural network that uses low-level texture, shape, and color features extracted from images of rural and urban domains using the Box Counting Dimension method (Peitgen et al. 1992), the Fast Fourier Transform, and the HSV color space. The performance measures precision and recall were calculated over various ranges of input parameters such as the learning rate, number of iterations, and number of hidden neurons of the LVQ network. The study also tested feature robustness to image object orientation (rotation and position) and image size. Our method was compared against the method given in Prabhakar et al., 2002. The precision and recall for our method, using various combinations of texture, shape, and color features, were between 0.68 and 0.88 and between 0.64 and 0.90 respectively, compared against a precision and recall (on our image data set) of 0.59 and 0.62 for the method of Prabhakar et al., 2002.
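The LVQ training rule is compact enough to sketch directly. The following is a minimal LVQ1 variant (one prototype per class, decaying learning rate) on toy two-class feature vectors standing in for the paper's rural/urban texture-shape-color features; it is an illustration of the algorithm family, not the study's network.

```python
import numpy as np

def lvq1_train(X, y, lr=0.1, epochs=30, seed=0):
    """LVQ1: pull the winning prototype toward same-class samples,
    push it away from different-class samples."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    P = np.stack([X[y == c].mean(axis=0) for c in classes])  # init at class means
    pl = classes.copy()
    for e in range(epochs):
        a = lr * (1 - e / epochs)                 # decaying learning rate
        for i in rng.permutation(len(X)):
            j = ((P - X[i]) ** 2).sum(axis=1).argmin()   # nearest prototype
            sign = 1.0 if pl[j] == y[i] else -1.0
            P[j] += sign * a * (X[i] - P[j])
    return P, pl

def lvq_predict(P, pl, X):
    """Classify by nearest prototype."""
    d = ((X[:, None, :] - P[None, :, :]) ** 2).sum(axis=2)
    return pl[d.argmin(axis=1)]

# Toy "rural" vs "urban" feature vectors (e.g. texture + color summaries).
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 0.3, (100, 2)), rng.normal(2.0, 0.3, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
P, pl = lvq1_train(X, y)
acc = (lvq_predict(P, pl, X) == y).mean()
```

In practice, the number of prototypes, learning rate, and iteration count are exactly the parameters the study sweeps when reporting precision and recall.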
Enhanced face detection framework based on skin color and false alarm rejection
Fast and precise face detection is a challenging task in computer vision. Human face detection plays an essential role in the first stage of face processing applications such as recognition, tracking, and image database management. In these applications, faces often occupy only a small part of images that contain variations, namely differing illumination, pose, and occlusion. These variations can decrease the face detection rate noticeably. Besides that, detection time is an important factor, especially in real-time systems. Most existing face detection approaches are not accurate, as they have not been able to handle unstructured images with large appearance variations and can only detect human faces under one particular variation. Existing face detection frameworks need enhancement to detect human faces under the stated variations, in order to improve the detection rate and reduce detection time. In this study, an enhanced face detection framework is proposed to improve the detection rate based on skin color and to provide a validity process. A preliminary segmentation of input images based on skin color can significantly reduce the search space and accelerate the procedure of human face detection. The main detection process is based on Haar-like features and the Adaboost algorithm. A validity process is introduced to reject non-face objects that may be selected during face detection; it is based on two-stage Extended Local Binary Patterns. Experimental results on the CMU-MIT and Caltech 10000 datasets, over a wide range of facial variations in different colors, positions, scales, and lighting conditions, indicate a successful face detection rate. In conclusion, the proposed enhanced face detection framework for color images, in the presence of varying lighting conditions and under different poses, resulted in a high detection rate and reduced overall detection time.
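The skin-color pre-segmentation stage that prunes the search space can be illustrated with an explicit RGB rule. The rule below is the well-known Peer et al. heuristic, used here as a simple stand-in for the study's own skin model; thresholds like these are normally re-tuned per dataset, and the tiny two-patch image is synthetic.

```python
import numpy as np

def skin_mask(rgb):
    """Explicit RGB skin rule (Peer et al.): a cheap first-pass classifier
    that keeps only pixels plausibly belonging to skin."""
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    mx = np.maximum(np.maximum(r, g), b)
    mn = np.minimum(np.minimum(r, g), b)
    return ((r > 95) & (g > 40) & (b > 20)      # bright, warm pixel
            & (mx - mn > 15)                    # enough colourfulness
            & (np.abs(r - g) > 15)              # red clearly dominates green
            & (r > g) & (r > b))

# Tiny synthetic image: a skin-like patch above a blue background patch.
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[:2] = [200, 120, 90]   # skin-like
img[2:] = [30, 60, 200]    # background
mask = skin_mask(img)
```

Only regions surviving this mask would be passed to the Haar/Adaboost detector and then to the LBP-based validity stage, which is what cuts overall detection time.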
Analysis and synthesis of iris images
Of all the physiological traits of the human body that help in personal identification, the iris is probably the most robust and accurate. Although numerous iris recognition algorithms have been proposed, the underlying processes that define the texture of irises have not been extensively studied. In this thesis, multiple pair-wise pixel interactions are used to describe the textural content of the iris image, resulting in a Markov Random Field (MRF) model for the iris image. This information is expected to be useful for the development of user-specific models for iris images, i.e. the matcher could be tuned to accommodate the characteristics of each user's iris image in order to improve matching performance. We also use MRF modeling to construct synthetic irises based on iris primitives extracted from real iris images. The synthesis procedure is deterministic and avoids sampling a probability distribution, making it computationally simple. We demonstrate that iris textures in general are significantly different from other irregular textural patterns. Clustering experiments indicate that the synthetic irises generated using the proposed technique are similar in textural content to real iris images.
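To see how pair-wise pixel interactions generate texture, a toy MRF helps: the sketch below Gibbs-samples a binary 4-neighbour model (the Ising model). This is far simpler than the thesis's multi-interaction, deterministic construction — it is only meant to show how local pair-wise couplings alone produce coherent texture.

```python
import numpy as np

def gibbs_texture(shape=(32, 32), beta=0.8, sweeps=30, seed=0):
    """Gibbs sampling from a binary 4-neighbour MRF (Ising model).
    Each pixel is resampled from its conditional given its neighbours."""
    rng = np.random.default_rng(seed)
    s = rng.choice([-1, 1], size=shape)
    h, w = shape
    for _ in range(sweeps):
        for y in range(h):
            for x in range(w):
                # Sum of the four (toroidal) neighbours
                nb = (s[(y - 1) % h, x] + s[(y + 1) % h, x]
                      + s[y, (x - 1) % w] + s[y, (x + 1) % w])
                p = 1.0 / (1.0 + np.exp(-2.0 * beta * nb))  # P(s=+1 | neighbours)
                s[y, x] = 1 if rng.random() < p else -1
    return s

s = gibbs_texture()   # beta > 0 couples neighbours, so patches form
```

Larger neighbourhoods and gray-level (rather than binary) states move this toy toward the kind of MRF used to model and synthesize iris primitives.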