669 research outputs found

    Towards generalization of semi-supervised place classification over generalized Voronoi graph

    Full text link
    With the progress of human-robot interaction (HRI), the ability of a robot to perform high-level tasks in complex environments is fast becoming an essential requirement. To this end, it is desirable for a robot to understand the environment at both geometric and semantic levels. Therefore in recent years, research towards place classification has been gaining in popularity. After the era of heuristic and rule-based approaches, supervised learning algorithms have been extensively used for this purpose, showing satisfactory performance levels. However, most of those approaches have only been trained and tested in the same environments and thus impede a generalized solution. In this paper, we have proposed a semi-supervised place classification over a generalized Voronoi graph (SPCoGVG) which is a semi-supervised learning framework comprised of three techniques: support vector machine (SVM), conditional random field (CRF) and generalized Voronoi graph (GVG), in order to improve the generalizability. The inherent problem of training CRF with partially labeled data has been solved using a novel parameter estimation algorithm. The effectiveness of the proposed algorithm is validated through extensive analysis of data collected in international university environments. © 2013 Elsevier B.V. All rights reserved

    Indoor place classification for intelligent mobile systems

    Full text link
    University of Technology Sydney. Faculty of Engineering and Information Technology.Place classification is an emerging theme in the study of human-robot interaction which requires common understanding of human-defined concepts between the humans and machines. The requirement posts a significant challenge to the current intelligent mobile systems which are more likely to be operating in absolute coordinate systems, and hence unaware of the semantic labels. Aimed at filling this gap, the objective of the research is to develop an approach for intelligent mobile systems to understand and label the indoor environments in a holistic way based on the sensory observations. Focusing on commonly available sensors and machine learning based solutions which play a significant role in the research of place classification, solutions to train a machine to assign unknown instances with concepts understandable to human beings, like room, office and corridor, in both independent and structured prediction ways, have been proposed in this research. The solution modelling dependencies between random variables, which takes the spatial relationship between observations into consideration, is further extended by integrating the logical coexistence of the objects and the places to provide the machine with the additional object detection ability. The main techniques involve logistic regression, support vector machine, and conditional random field, in both supervised and semi-supervised learning frameworks. Experiments in a variety of environments show convincing place classification results through machine learning based approaches on data collected with either single or multiple sensory modalities; modelling spatial dependencies and introducing semi-supervised learning paradigm further improve the accuracy of the prediction and the generalisation ability of the system; and vision-based object detection can be seamlessly integrated into the learning framework to enhance the discrimination ability and the flexibility of the system. The contributions of this research lie in the in-depth studies on the place classification solutions with independent predictions, the improvements on the generalisation ability of the system through semi-supervised learning paradigm, the formulation of training a conditional random field with partially labelled data, and the integration of multiple cues in two sensory modalities to improve the system's functionality. It is anticipated that the findings of this research will significantly enhance the current capabilities of the human robot interaction and robot-environment interaction

    Multiple Instance Learning: A Survey of Problem Characteristics and Applications

    Full text link
    Multiple instance learning (MIL) is a form of weakly supervised learning where training instances are arranged in sets, called bags, and a label is provided for the entire bag. This formulation is gaining interest because it naturally fits various problems and allows to leverage weakly labeled data. Consequently, it has been used in diverse application fields such as computer vision and document classification. However, learning from bags raises important challenges that are unique to MIL. This paper provides a comprehensive survey of the characteristics which define and differentiate the types of MIL problems. Until now, these problem characteristics have not been formally identified and described. As a result, the variations in performance of MIL algorithms from one data set to another are difficult to explain. In this paper, MIL problem characteristics are grouped into four broad categories: the composition of the bags, the types of data distribution, the ambiguity of instance labels, and the task to be performed. Methods specialized to address each category are reviewed. Then, the extent to which these characteristics manifest themselves in key MIL application areas are described. Finally, experiments are conducted to compare the performance of 16 state-of-the-art MIL methods on selected problem characteristics. This paper provides insight on how the problem characteristics affect MIL algorithms, recommendations for future benchmarking and promising avenues for research

    Stochastic Methods for Fine-Grained Image Segmentation and Uncertainty Estimation in Computer Vision

    Get PDF
    In this dissertation, we exploit concepts of probability theory, stochastic methods and machine learning to address three existing limitations of deep learning-based models for image understanding. First, although convolutional neural networks (CNN) have substantially improved the state of the art in image understanding, conventional CNNs provide segmentation masks that poorly adhere to object boundaries, a critical limitation for many potential applications. Second, training deep learning models requires large amounts of carefully selected and annotated data, but large-scale annotation of image segmentation datasets is often prohibitively expensive. And third, conventional deep learning models also lack the capability of uncertainty estimation, which compromises both decision making and model interpretability. To address these limitations, we introduce the Region Growing Refinement (RGR) algorithm, an unsupervised post-processing algorithm that exploits Monte Carlo sampling and pixel similarities to propagate high-confidence labels into regions of low-confidence classification. The probabilistic Region Growing Refinement (pRGR) provides RGR with a rigorous mathematical foundation that exploits concepts of Bayesian estimation and variance reduction techniques. Experiments demonstrate both the effectiveness of (p)RGR for the refinement of segmentation predictions, as well as its suitability for uncertainty estimation, since its variance estimates obtained in the Monte Carlo iterations are highly correlated with segmentation accuracy. We also introduce FreeLabel, an intuitive open-source web interface that exploits RGR to allow users to obtain high-quality segmentation masks with just a few freehand scribbles, in a matter of seconds. Designed to benefit the computer vision community, FreeLabel can be used for both crowdsourced or private annotation and has a modular structure that can be easily adapted for any image dataset. The practical relevance of methods developed in this dissertation are illustrated through applications on agricultural and healthcare-related domains. We have combined RGR and modern CNNs for fine segmentation of fruit flowers, motivated by the importance of automated bloom intensity estimation for optimization of fruit orchard management and, possibly, automatizing procedures such as flower thinning and pollination. We also exploited an early version of FreeLabel to annotate novel datasets for segmentation of fruit flowers, which are currently publicly available. Finally, this dissertation also describes works on fine segmentation and gaze estimation for images collected from assisted living environments, with the ultimate goal of assisting geriatricians in evaluating health status of patients in such facilities

    Transfer learning: bridging the gap between deep learning and domain-specific text mining

    Get PDF
    Inspired by the success of deep learning techniques in Natural Language Processing (NLP), this dissertation tackles the domain-specific text mining problems for which the generic deep learning approaches would fail. More specifically, the domain-specific problems are: (1) success prediction in crowdfunding, (2) variants identification in biomedical literature, and (3) text data augmentation for domains with low-resources. In the first part, transfer learning in a multimodal perspective is utilized to facilitate solving the project success prediction on the crowdfunding application. Even though the information in a project profile can be of different modalities such as text, images, and metadata, most existing prediction approaches leverage only the text modality. It is promising to utilize the visual images in project profiles to find out how images could contribute to the success prediction. An advanced neural network scheme is designed and evaluated combining information learned from different modalities for project success prediction. In the second part, transfer learning is combined with deep learning techniques to solve genomic variants Named Entity Recognition (NER) problems in biomedical literature. Most of the advanced generic NER algorithms can fail due to the restricted training corpus. However, those generic deep learning algorithms are capable of learning from a canonical corpus, without any effort on feature engineering. This work aims to build an end-to-end deep learning approach to transfer the domain-specific knowledge to those advanced generic NER algorithms, addressing the challenges in low-resource training and requiring neither hand-crafted features nor post-processing rules. For the last part, transfer learning with knowledge distillation and active learning are utilized to solve text augmentation for domains with low-resources. Most of the recent text augmentation methods heavily rely on large external resources. This work is dedicates to solving the text augmentation problem adaptively and consistently with minimal resources for token-level tasks like NER. The solution can also assure the reliability of machine labels for noisy data and can enhance training consistency with noisy labels. All the works are evaluated on different domain-specific benchmarks, respectively. Experimental results demonstrate the effectiveness of those proposed methods. The advantages also indicate promising potential for transfer learning in domain-specific applications

    Methods for Learning Structured Prediction in Semantic Segmentation of Natural Images

    Get PDF
    Automatic segmentation and recognition of semantic classes in natural images is an important open problem in computer vision. In this work, we investigate three different approaches to recognition: without supervision, with supervision on level of images, and with supervision on the level of pixels. The thesis comprises three parts. The first part introduces a clustering algorithm that optimizes a novel information-theoretic objective function. We show that the proposed algorithm has clear advantages over standard algorithms from the literature on a wide array of datasets. Clustering algorithms are an important building block for higher-level computer vision applications, in particular for semantic segmentation. The second part of this work proposes an algorithm for automatic segmentation and recognition of object classes in natural images, that learns a segmentation model solely from annotation in the form of presence and absence of object classes in images. The third and main part of this work investigates one of the most popular approaches to the task of object class segmentation and semantic segmentation, based on conditional random fields and structured prediction. We investigate several learning algorithms, in particular in combination with approximate inference procedures. We show how structured models for image segmentation can be learned exactly in practical settings, even in the presence of many loops in the underlying neighborhood graphs. The introduced methods provide results advancing the state-of-the-art on two complex benchmark datasets for semantic segmentation, the MSRC-21 Dataset of RGB images and the NYU V2 Dataset or RGB-D images of indoor scenes. Finally, we introduce a software library that al- lows us to perform extensive empirical comparisons of state-of-the-art structured learning approaches. This allows us to characterize their practical properties in a range of applications, in particular for semantic segmentation and object class segmentation.Methoden zum Lernen von Strukturierter Vorhersage in Semantischer Segmentierung von Natürlichen Bildern Automatische Segmentierung und Erkennung von semantischen Klassen in natür- lichen Bildern ist ein wichtiges offenes Problem des maschinellen Sehens. In dieser Arbeit untersuchen wir drei möglichen Ansätze der Erkennung: ohne Überwachung, mit Überwachung auf Ebene von Bildern und mit Überwachung auf Ebene von Pixeln. Diese Arbeit setzt sich aus drei Teilen zusammen. Im ersten Teil der Arbeit schlagen wir einen Clustering-Algorithmus vor, der eine neuartige, informationstheoretische Zielfunktion optimiert. Wir zeigen, dass der vorgestellte Algorithmus üblichen Standardverfahren aus der Literatur gegenüber klare Vorteile auf vielen verschiedenen Datensätzen hat. Clustering ist ein wichtiger Baustein in vielen Applikationen des machinellen Sehens, insbesondere in der automatischen Segmentierung. Der zweite Teil dieser Arbeit stellt ein Verfahren zur automatischen Segmentierung und Erkennung von Objektklassen in natürlichen Bildern vor, das mit Hilfe von Supervision in Form von Klassen-Vorkommen auf Bildern in der Lage ist ein Segmentierungsmodell zu lernen. Der dritte Teil der Arbeit untersucht einen der am weitesten verbreiteten Ansätze zur semantischen Segmentierung und Objektklassensegmentierung, Conditional Random Fields, verbunden mit Verfahren der strukturierten Vorhersage. Wir untersuchen verschiedene Lernalgorithmen des strukturierten Lernens, insbesondere im Zusammenhang mit approximativer Vorhersage. Wir zeigen, dass es möglich ist trotz des Vorhandenseins von Kreisen in den betrachteten Nachbarschaftsgraphen exakte strukturierte Modelle zur Bildsegmentierung zu lernen. Mit den vorgestellten Methoden bringen wir den Stand der Kunst auf zwei komplexen Datensätzen zur semantischen Segmentierung voran, dem MSRC-21 Datensatz von RGB-Bildern und dem NYU V2 Datensatz von RGB-D Bildern von Innenraum-Szenen. Wir stellen außerdem eine Software-Bibliothek vor, die es erlaubt einen weitreichenden Vergleich der besten Lernverfahren für strukturiertes Lernen durchzuführen. Unsere Studie erlaubt uns eine Charakterisierung der betrachteten Algorithmen in einer Reihe von Anwendungen, insbesondere der semantischen Segmentierung und Objektklassensegmentierung

    Line Based Multi-Range Asymmetric Conditional Random Field For Terrestrial Laser Scanning Data Classification

    Get PDF
    Terrestrial Laser Scanning (TLS) is a ground-based, active imaging method that rapidly acquires accurate, highly dense three-dimensional point cloud of object surfaces by laser range finding. For fully utilizing its benefits, developing a robust method to classify many objects of interests from huge amounts of laser point clouds is urgently required. However, classifying massive TLS data faces many challenges, such as complex urban scene, partial data acquisition from occlusion. To make an automatic, accurate and robust TLS data classification, we present a line-based multi-range asymmetric Conditional Random Field algorithm. The first contribution is to propose a line-base TLS data classification method. In this thesis, we are interested in seven classes: building, roof, pedestrian road (PR), tree, low man-made object (LMO), vehicle road (VR), and low vegetation (LV). The line-based classification is implemented in each scan profile, which follows the line profiling nature of laser scanning mechanism.Ten conventional local classifiers are tested, including popular generative and discriminative classifiers, and experimental results validate that the line-based method can achieve satisfying classification performance. However, local classifiers implement labeling task on individual line independently of its neighborhood, the inference of which often suffers from similar local appearance across different object classes. The second contribution is to propose a multi-range asymmetric Conditional Random Field (maCRF) model, which uses object context as post-classification to improve the performance of a local generative classifier. The maCRF incorporates appearance, local smoothness constraint, and global scene layout regularity together into a probabilistic graphical model. The local smoothness enforces that lines in a local area to have the same class label, while scene layout favours an asymmetric regularity of spatial arrangement between different object classes within long-range, which is considered both in vertical (above-bellow relation) and horizontal (front-behind) directions. The asymmetric regularity allows capturing directional spatial arrangement between pairwise objects (e.g. it allows ground is lower than building, not vice-versa). The third contribution is to extend the maCRF model by adding across scan profile context, which is called Across scan profile Multi-range Asymmetric Conditional Random Field (amaCRF) model. Due to the sweeping nature of laser scanning, the sequentially acquired TLS data has strong spatial dependency, and the across scan profile context can provide more contextual information. The final contribution is to propose a sequential classification strategy. Along the sweeping direction of laser scanning, amaCRF models were sequentially constructed. By dynamically updating posterior probability of common scan profiles, contextual information propagates through adjacent scan profiles
    corecore