16 research outputs found

    Deep learning for object detection in robotic grasping contexts

    In the last decade, deep convolutional neural networks have become the standard for computer vision applications. As opposed to classical methods, which are based on rules and hand-designed features, neural networks are optimized and learned directly from a set of labeled training data specific to a given task. In practice, both obtaining sufficient labeled training data and interpreting network outputs can be problematic. Additionally, a neural network has to be retrained for new tasks or new sets of objects. Overall, while they perform very well, deep neural network approaches can be challenging to deploy. In this thesis, we propose strategies aimed at solving or getting around these limitations for object instance detection. First, we propose a cascade approach in which a neural network is used as a prefilter to a template matching approach, increasing performance while keeping the interpretability of the matching method. Second, we propose another cascade approach in which a weakly supervised network generates object-specific heatmaps that can be used to infer each object's position in an image. This approach simplifies the training process and decreases the number of training images required to reach state-of-the-art performance. Finally, we propose a neural network architecture and a training procedure that allow detection of objects not seen during training, thus removing the need to retrain networks for new objects.

    Neural Radiance Fields: Past, Present, and Future

    Modeling and interpreting 3D environments and surroundings has driven research in 3D Computer Vision, Computer Graphics, and Machine Learning. The paper by Mildenhall et al. on NeRFs (Neural Radiance Fields) led to a boom in Computer Graphics, Robotics, and Computer Vision, and the prospect of high-resolution, low-storage Augmented Reality and Virtual Reality-based 3D models has gained traction among researchers, with more than 1000 preprints related to NeRFs published. This paper serves as a bridge for people starting to study these fields, building from the basics of Mathematics, Geometry, Computer Vision, and Computer Graphics up to the difficulties encountered in Implicit Representations at the intersection of all these disciplines. This survey provides the history of rendering, Implicit Learning, and NeRFs, the progression of research on NeRFs, and the potential applications and implications of NeRFs in today's world. In doing so, this survey categorizes all the NeRF-related research in terms of the datasets used, objective functions, applications solved, and evaluation criteria for these applications.
    Comment: 413 pages, 9 figures, 277 citations

    Object detection and sim-to-real 6D pose estimation

    Deep Learning has led to significant advances in computer vision, making perception an important component in many fields such as robotics, medicine, agriculture, and remote sensing. Object detection has been a major part of computer vision research and has led to further developments such as object pose, grasp, and depth estimation. However, even object detectors suffer from a lack of data, which calls for a well-defined data pipeline that first labels and then augments data. Based on the conducted review, no available labeling tool supports export in the benchmark (COCO) format for multi-label ground truth, and no augmentation library supports transformations for the combination of polygon segmentation, bounding boxes, and key points. Having determined the need for an updated data pipeline, this project presents a novel approach that spans from labeling to augmentation and includes data visualization, manipulation, and cleaning. In addition, this work focuses mainly on the use of object detectors in an industrial use case and further uses multitask learning to develop a state-of-the-art multitask architecture. This pipeline and the architecture are then used to infer industrial object pose in the world coordinate frame. Finally, after a comparison among multiple object detectors and pose estimators, a multitask architecture with a pose estimation methodology is found to be better suited to the industrial use case.

    Large Scale Kernel Methods for Fun and Profit

    Kernel methods are among the most flexible classes of machine learning models with strong theoretical guarantees. Wide classes of functions can be approximated arbitrarily well with kernels, while fast convergence and learning rates have been formally shown to hold. Exact kernel methods are known to scale poorly with increasing dataset size, and we believe that one of the factors limiting their usage in modern machine learning is the lack of scalable and easy-to-use algorithms and software. The main goal of this thesis is to study kernel methods from the point of view of efficient learning, with particular emphasis on large-scale data, but also on low-latency training and user efficiency. We improve the state of the art for scaling kernel solvers to datasets with billions of points using the Falkon algorithm, which combines random projections with fast optimization. Running it on GPUs, we show how to fully utilize available computing power for training kernel machines. To boost the ease of use of approximate kernel solvers, we propose an algorithm for automated hyperparameter tuning. By minimizing a penalized loss function, a model can be learned together with its hyperparameters, reducing the time needed for user-driven experimentation. In the setting of multi-class learning, we show that, under stringent but realistic assumptions on the separation between classes, a wide set of algorithms needs far fewer data points than in the more general setting (without assumptions on class separation) to reach the same accuracy. The first part of the thesis develops a framework for efficient and scalable kernel machines. This raises the question of whether our approaches can be used successfully in real-world applications, especially compared to alternatives based on deep learning, which are often deemed hard to beat. The second part investigates this question on two main applications, chosen because of the paramount importance of having an efficient algorithm. First, we consider the problem of instance segmentation of images taken from the iCub robot. Here Falkon is used as part of a larger pipeline, but the efficiency afforded by our solver is essential to ensure smooth human-robot interactions. Second, we consider time-series forecasting of wind speed, analysing the relevance of different physical variables to the predictions themselves. We investigate different schemes to adapt i.i.d. learning to the time-series setting. Overall, this work aims to demonstrate, through novel algorithms and examples, that kernel methods are up to computationally demanding tasks, and that there are concrete applications in which their use is warranted and more efficient than that of other, more complex, and less theoretically grounded models.

    Deep Learning-Based 6-DoF Object Pose Estimation With Synthetic Data: A Case Study in Underwater Environments

    In this thesis we aim to address the image-based 6-DoF pose estimation problem, or 3D pose estimation problem, for Autonomous Underwater Vehicles (AUVs). The results of the object pose estimation will be used, for example, to estimate the global location of the AUV or to approach underwater infrastructures more accurately. An autonomous robot or a team of autonomous robots needs accurate localization skills to move safely and effectively within an underwater environment, where communications are sparse and unreliable, and to accomplish high-level tasks such as underwater exploration, mapping of the surrounding environment, multi-robot conveyance, and many other multi-robot problems. Several state-of-the-art approaches will be analysed and tested on real datasets. Collecting underwater images and providing them with an accurate ground-truth estimate of the object's pose is an expensive and extremely time-consuming activity. For this reason, we addressed the problem using only synthetic datasets. It was not possible to use the standard datasets used in the analyzed papers, since their objects and conditions are very different from those in which the AUVs operate. Hence, we employed an unpaired image-to-image translation network to bridge the gap between the rendered and the real images, producing photorealistic synthetic training images. Promising preliminary results support the choices made.

    On Improving Generalization of CNN-Based Image Classification with Delineation Maps Using the CORF Push-Pull Inhibition Operator

    Deployed image classification pipelines typically depend on images captured in real-world environments. This means that images might be affected by different sources of perturbation (e.g. sensor noise in low-light environments). The main challenge arises from the fact that image quality directly impacts the reliability and consistency of classification tasks. This challenge has therefore attracted wide interest within the computer vision community. We propose a transformation step that attempts to enhance the generalization ability of CNN models in the presence of unseen noise in the test set. Concretely, the delineation maps of given images are determined using the CORF push-pull inhibition operator. This operation transforms an input image into a space that is more robust to noise before it is processed by a CNN. We evaluated our approach on the Fashion MNIST data set with an AlexNet model. The proposed CORF-augmented pipeline achieved results on noise-free images comparable to those of a conventional AlexNet classification model without CORF delineation maps, but it consistently achieved significantly superior performance on test images perturbed with different levels of Gaussian and uniform noise.

    The Machine as Art/ The Machine as Artist

    The articles collected in this volume from the two companion Arts Special Issues, “The Machine as Art (in the 20th Century)” and “The Machine as Artist (in the 21st Century)”, represent a unique scholarly resource: analyses by artists, scientists, and engineers, as well as art historians, covering not only the current (and astounding) rapprochement between art and technology but also the vital post-World War II period that has led up to it. This collection is also distinguished by several of the contributors being prominent individuals within their own fields, or artists who have actually participated in the still unfolding events with which it is concerned.

    Deep Neural Networks and Data for Automated Driving

    This open access book brings together the latest developments from industry and research on automated driving and artificial intelligence. Environment perception for highly automated driving heavily employs deep neural networks, facing many challenges. How much data do we need for training and testing? How can we use synthetic data to save labeling costs for training? How do we increase robustness and decrease memory usage? For inevitably poor conditions: How do we know that the network is uncertain about its decisions? Can we understand a bit more about what actually happens inside neural networks? This leads to a very practical problem, particularly for DNNs employed in automated driving: What are useful validation techniques, and what about safety? This book unites the views from both academia and industry, where computer vision and machine learning meet environment perception for highly automated driving. Naturally, aspects of data, robustness, uncertainty quantification, and, last but not least, safety are at the core of it. This book is unique: In its first part, an extended survey of all the relevant aspects is provided. The second part contains the detailed technical elaboration of the various questions mentioned above.

    Class-incremental lifelong object learning for domestic robots

    Traditionally, robots have been confined to settings where they operate in isolation and in highly controlled and structured environments to execute well-defined, non-varying tasks. As a result, they usually operate without the need to perceive their surroundings or to adapt to changing stimuli. However, as robots start to move towards human-centred environments and share the physical space with people, there is an urgent need to endow them with the flexibility to learn and adapt given the changing nature of the stimuli they receive and the evolving requirements of their users. Standard machine learning is not suitable for these types of applications because it operates under the assumption that data samples are independent and identically distributed, and requires access to all the data in advance. If any of these assumptions is broken, the model fails catastrophically, i.e., either it does not learn or it forgets all that was previously learned. Therefore, different strategies are required to address this problem. The focus of this thesis is on lifelong object learning, whereby a model is able to learn from data that becomes available over time. In particular, we address the problem of class-incremental learning with an emphasis on algorithms that can enable interactive learning with a user. In class-incremental learning, models learn from sequential data batches where each batch ideally contains samples from a single class. The emphasis on interactive learning capabilities poses additional requirements in terms of the speed with which model updates are performed as well as how the interaction is handled. The work presented in this thesis can be divided into two main lines. First, we propose two versions of a lifelong learning algorithm composed of a feature extractor based on pre-trained residual networks, an array of growing self-organising networks, and a classifier. Self-organising networks are able to adapt their structure based on the input data distribution and learn representative prototypes of the data. These prototypes can then be used to train a classifier. The proposed approaches are evaluated on various benchmarks under several conditions, and the results show that they outperform competing approaches in each case. Second, we propose a robot architecture to address lifelong object learning through interactions with a human partner using natural language. The architecture consists of an object segmentation, tracking, and preprocessing pipeline, a dialogue system, and a learning module based on the algorithm developed in the first part of the thesis. Finally, the thesis also includes an exploration of the contributions that different preprocessing operations make to performance when learning from both RGB and depth images.
    James Watt Scholarship