
    Active recognition and pose estimation of rigid and deformable objects in 3D space

    Object recognition and pose estimation are fundamental problems in computer vision and of utmost importance in robotic applications. Object recognition refers to the problem of recognizing particular object instances, or of categorizing objects into specific classes. Pose estimation deals with estimating the position and orientation of an object in 3D space, with orientation usually expressed in Euler angles. There are generally two types of objects that require special care when designing solutions to these problems: rigid and deformable. Dealing with deformable objects is a much harder problem, and solutions that apply to rigid objects usually fail on deformable ones because of the assumptions made during design. In this thesis we deal with object categorization, instance recognition, and pose estimation of both rigid and deformable objects. In particular, we are interested in a special type of deformable object: clothes. We tackle the problem of autonomously recognizing and unfolding articles of clothing using a dual-arm manipulator. This problem consists of grasping an article from a random point, recognizing it, and then bringing it into an unfolded state with a dual-arm robot. We propose a data-driven method for clothes recognition from depth images using Random Decision Forests. We also propose a method for unfolding an article of clothing after estimating and grasping two key points, using Hough Forests. Both methods are integrated into a POMDP framework that allows the robot to interact optimally with the garments, taking into account uncertainty in the recognition and point estimation process. This active recognition and unfolding makes our system very robust to noisy observations. Our methods were tested on regular-sized clothes using a dual-arm manipulator, and outperform state-of-the-art approaches in both accuracy and speed.

    In order to take advantage of the robotic manipulator and increase the accuracy of our system, we developed a novel approach to generic active vision problems, called Active Random Forests. While the state of the art focuses on selecting the best viewing parameters using single-view classifiers, we propose a multi-view classifier in which the decision mechanism for optimally changing the viewing parameters is inherent to the classification process. This has several advantages: a) the classifier exploits the entire set of captured images and does not simply aggregate per-view hypotheses probabilistically; b) actions are based on disambiguating features learnt from all views and are optimally selected using the powerful voting scheme of Random Forests; and c) the classifier can take into account the costs of actions. The proposed framework was applied to the same task of autonomously unfolding clothes by a robot, addressing the problems of best viewpoint selection for classification, grasp point estimation, and pose estimation of garments. We show a large performance improvement compared to state-of-the-art methods and to our previous POMDP formulation.
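
    Both the POMDP formulation and Active Random Forests revolve around the same loop: maintain a belief over hypotheses, update it with each noisy observation, and choose the next action so as to reduce uncertainty. The following is a minimal sketch of such an active-recognition loop; the garment classes, action set, and confusion-matrix observation model are invented for illustration and are not the thesis' actual models.

```python
import numpy as np

# Hypothetical ingredients: garment classes, viewing actions, and a
# per-action confusion model P(observation | true class, action).
CLASSES = ["shirt", "trousers", "towel"]           # illustrative only
ACTIONS = ["rotate_left", "rotate_right", "lift"]  # illustrative only
rng = np.random.default_rng(0)
OBS_MODEL = {a: rng.dirichlet(np.ones(3), size=3) for a in ACTIONS}

def update_belief(belief, action, obs_idx):
    """Bayes' rule: posterior over classes given one noisy observation."""
    likelihood = OBS_MODEL[action][:, obs_idx]
    posterior = belief * likelihood
    return posterior / posterior.sum()

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def expected_entropy(belief, action):
    """Expected posterior entropy if we execute `action` next."""
    total = 0.0
    for obs_idx in range(len(CLASSES)):
        p_obs = (belief * OBS_MODEL[action][:, obs_idx]).sum()
        if p_obs > 0:
            total += p_obs * entropy(update_belief(belief, action, obs_idx))
    return total

belief = np.full(len(CLASSES), 1.0 / len(CLASSES))  # uniform prior
best = min(ACTIONS, key=lambda a: expected_entropy(belief, a))
print("next action:", best)
```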
    Moving from deformable to rigid objects, while keeping our focus on domestic robotic applications, we turn to object instance recognition and 3D pose estimation of household objects. We are particularly interested in realistic scenes that are very crowded and in which objects are perceived under severe occlusions. Single-shot 6D pose estimators with manually designed features are still unable to tackle such difficult scenarios for a variety of objects, motivating research towards unsupervised feature learning and next-best-view estimation. We present a complete framework for both single-shot 6D object pose estimation and next-best-view prediction based on Hough Forests, the state-of-the-art object pose estimator that performs classification and regression jointly. Rather than using manually designed features, we propose unsupervised features learnt from depth-invariant patches using a Sparse Autoencoder. Furthermore, taking advantage of the clustering performed in the leaf nodes of Hough Forests, we learn to estimate the reduction of uncertainty in other views, formulating the problem of selecting the next best view. To further improve 6D object pose estimation, we propose an improved joint registration and hypothesis verification module as a final refinement step to reject false detections. We provide two additional challenging datasets inspired by realistic scenarios to extensively evaluate the state of the art and our framework: one is related to domestic environments and the other depicts a bin-picking scenario mostly found in industrial settings. We show that our framework significantly outperforms the state of the art both on public datasets and on our own.

    Unsupervised feature learning, although efficient, might produce sub-optimal features for our particular task. Therefore, in our last work, we leverage the power of Convolutional Neural Networks to tackle the problem of estimating the pose of rigid objects with an end-to-end deep regression network. To improve the moderate performance of the standard regression objective, we introduce the Siamese Regression Network: for a given image pair, we enforce a similarity measure between the representations of the sample images in the feature space and in the pose space, respectively, which is shown to boost regression performance. Furthermore, we argue that pose-guided feature learning with our Siamese Regression Network generates more discriminative features that outperform the state of the art. Last, our feature learning formulation can learn features that perform well under severe occlusions, demonstrating high performance on our novel hand-object dataset.

    In conclusion, this work is a study of object detection and pose estimation in 3D space across a variety of object types. We further investigate how accuracy can be improved by applying active vision techniques that optimally move the camera view to minimize detection error.
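
    As a rough illustration of the pose-guided feature learning idea, the sketch below combines a standard pose-regression loss with a Siamese term that encourages distances in feature space to mirror distances in pose space. The trade-off weight `alpha` and the choice of Euclidean distances are assumptions for illustration; the exact objective in the thesis may differ.

```python
import torch
import torch.nn as nn

class SiameseRegressionLoss(nn.Module):
    """Pose regression plus a similarity term tying feature-space
    distances to pose-space distances (illustrative formulation)."""

    def __init__(self, alpha: float = 0.1):
        super().__init__()
        self.alpha = alpha  # assumed trade-off weight, not from the thesis
        self.mse = nn.MSELoss()

    def forward(self, feat1, feat2, pred1, pred2, pose1, pose2):
        # Direct regression on both branches of the Siamese pair.
        regression = self.mse(pred1, pose1) + self.mse(pred2, pose2)
        # Enforce: similar poses -> similar features (and vice versa).
        feat_dist = torch.norm(feat1 - feat2, dim=1)
        pose_dist = torch.norm(pose1 - pose2, dim=1)
        similarity = self.mse(feat_dist, pose_dist)
        return regression + self.alpha * similarity
```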

    New Approach of Indoor and Outdoor Localization Systems

    Accurate determination of the mobile position constitutes the basis of many new applications. This book provides a detailed account of wireless systems for positioning, signal processing, radio localization techniques (such as Time Difference Of Arrival), performance evaluation, and localization applications. The first section is dedicated to satellite positioning systems such as GPS and GNSS. The second section addresses localization applications using wireless sensor networks. Several techniques are introduced for localization systems, especially for indoor positioning, such as Ultra Wide Band (UWB) and Wi-Fi. The last section is dedicated to GPS coupled with other sensors. Results of simulations, implementations, and tests are given to help readers grasp the presented techniques. This is an ideal book for students, PhD students, academics, and engineers in the fields of communication, localization, and signal processing, especially in indoor and outdoor localization domains.
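
    To give one concrete flavor of the radio localization techniques covered, a Time Difference Of Arrival (TDOA) fix can be computed by nonlinear least squares: each TDOA measurement constrains the emitter to a hyperbola relative to a pair of anchors. The anchor layout, noise level, and reference-anchor convention below are invented for the demonstration.

```python
import numpy as np
from scipy.optimize import least_squares

# Invented 2D anchor layout (metres) and true emitter position.
anchors = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0], [100.0, 100.0]])
true_pos = np.array([37.0, 61.0])

# TDOA measurements expressed as range differences (c * delta-t) between
# each anchor and reference anchor 0, with additive measurement noise.
ranges = np.linalg.norm(anchors - true_pos, axis=1)
rng = np.random.default_rng(1)
tdoa = (ranges[1:] - ranges[0]) + rng.normal(0.0, 0.1, len(anchors) - 1)

def residuals(p):
    """Mismatch between predicted and measured range differences."""
    r = np.linalg.norm(anchors - p, axis=1)
    return (r[1:] - r[0]) - tdoa

estimate = least_squares(residuals, x0=np.array([50.0, 50.0])).x
print("estimated position:", estimate)  # should land near (37, 61)
```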

    Map-based localization for urban service mobile robotics

    Mobile robotics research is currently interested in exporting autonomous navigation results achieved in indoor environments to more challenging environments such as, for instance, urban pedestrian areas. Developing mobile robots with autonomous navigation capabilities in such urban environments is a basic requirement for a set of higher-level services that could be provided to a community of users. However, exporting indoor techniques to outdoor urban pedestrian scenarios is not straightforward, due to the larger size of the environment, the dynamism of the scene caused by pedestrians and other moving obstacles, the sunlight conditions, and the strong presence of three-dimensional elements such as ramps, steps, curbs, or holes. Moreover, GPS-based mobile robot localization has demonstrated insufficient performance for robust long-term navigation in urban environments. One of the key modules within autonomous navigation is localization. If localization assumes an a priori map, even if it is not a complete model of the environment, it is called map-based localization. This assumption is realistic, since the current trend of city councils is to build precise maps of their cities, especially of the most interesting places such as city downtowns. Having robots localized within a map allows for high-level planning and monitoring, so that robots can reach goal points expressed on the map by following, in a deliberative way, a previously planned route. This thesis deals with map-based mobile robot localization in urban pedestrian areas. The approach uses the particle filter algorithm, a well-known and widely used probabilistic and recursive method for data fusion and state estimation. The main contributions of the thesis are divided into four aspects: (1) long-term experiments of mobile robot 2D and 3D position tracking in real urban pedestrian scenarios within a fully autonomous navigation framework, (2) a fast and accurate technique to compute on-line range observation models in 3D environments, a basic step required for the real-time performance of the developed particle filter, (3) the formulation of a particle filter that integrates asynchronous data streams, and (4) a theoretical proposal to solve the global localization problem in an active and cooperative way, defining cooperation as either information sharing among the robots or planning joint actions to solve a common goal.
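
    To make the particle filter's role concrete, here is a minimal 2D position-tracking sketch: particles are propagated with a noisy motion model, re-weighted by a Gaussian range-observation likelihood against known beacons, and resampled. The beacon layout, noise parameters, and motion model are illustrative assumptions, far simpler than the thesis' 3D range observation models.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 500                                                     # particle count
beacons = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.0]])  # assumed map

particles = rng.uniform(0, 10, size=(N, 2))  # initial belief over (x, y)
weights = np.full(N, 1.0 / N)

def predict(particles, motion, sigma=0.2):
    """Motion update: displace every particle, adding process noise."""
    return particles + motion + rng.normal(0, sigma, particles.shape)

def update(particles, weights, measured_ranges, sigma=0.5):
    """Weight particles by the likelihood of the observed beacon ranges."""
    w = weights.copy()
    for b, z in zip(beacons, measured_ranges):
        expected = np.linalg.norm(particles - b, axis=1)
        w *= np.exp(-0.5 * ((z - expected) / sigma) ** 2)
    w += 1e-300                      # avoid total weight degeneracy
    return w / w.sum()

def resample(particles, weights):
    """Duplicate likely particles, drop unlikely ones."""
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# One filter step with fabricated ground truth for the demo.
true_pos = np.array([3.0, 4.0])
ranges = np.linalg.norm(beacons - true_pos, axis=1) + rng.normal(0, 0.5, 3)
particles = predict(particles, motion=np.zeros(2))
weights = update(particles, weights, ranges)
particles, weights = resample(particles, weights)
print("estimate:", particles.mean(axis=0))  # should approach (3, 4)
```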

    On Improving Generalization of CNN-Based Image Classification with Delineation Maps Using the CORF Push-Pull Inhibition Operator

    Deployed image classification pipelines typically depend on images captured in real-world environments, which means that the images may be affected by different sources of perturbation (e.g. sensor noise in low-light environments). The main challenge arises from the fact that image quality directly impacts the reliability and consistency of classification tasks. This challenge has therefore attracted wide interest within the computer vision community. We propose a transformation step that attempts to enhance the generalization ability of CNN models in the presence of unseen noise in the test set. Concretely, the delineation maps of given images are determined using the CORF push-pull inhibition operator. Such an operation transforms an input image into a representation that is more robust to noise before it is processed by a CNN. We evaluated our approach on the Fashion MNIST data set with an AlexNet model. The proposed CORF-augmented pipeline achieved results on noise-free images comparable to those of a conventional AlexNet classification model without CORF delineation maps, but it consistently achieved significantly superior performance on test images perturbed with different levels of Gaussian and uniform noise.
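
    The general preprocessing pattern can be pictured as follows: an excitatory edge response is suppressed by a response of opposite polarity before the result is fed to the classifier. The sketch below is a deliberately simplified push-pull-style stand-in built from difference-of-Gaussians filtering; it is not the actual CORF operator, which involves learned combinations of receptive-field responses, and every parameter here is an assumption.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def push_pull_map(img, sigma=1.0, k=2.0, inhibition=0.8):
    """Simplified push-pull-style delineation map (NOT the real CORF
    operator): a difference-of-Gaussians edge response in which the
    anti-preferred polarity inhibits the excitatory one."""
    dog = gaussian_filter(img, sigma) - gaussian_filter(img, k * sigma)
    push = np.maximum(dog, 0.0)    # excitatory (preferred polarity)
    pull = np.maximum(-dog, 0.0)   # anti-preferred polarity
    response = np.maximum(push - inhibition * pull, 0.0)
    return response / (response.max() + 1e-8)

# The transformed image would then replace the raw one as CNN input, e.g.:
# x = push_pull_map(noisy_image); logits = model(x[None, None])
```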

    3D Recording and Interpretation for Maritime Archaeology

    This open-access, peer-reviewed volume was inspired by the UNESCO UNITWIN Network for Underwater Archaeology International Workshop held at Flinders University, Adelaide, Australia in November 2016. Content is based on, but not limited to, the work presented at the workshop, which was dedicated to 3D recording and interpretation for maritime archaeology. The volume consists of contributions from leading international experts as well as up-and-coming early career researchers from around the globe. The book covers the recording and analysis of maritime archaeology through emerging technologies, including both practical and theoretical contributions. Topics include photogrammetric recording, laser scanning, marine geophysical 3D survey techniques, virtual reality, 3D modelling and reconstruction, data integration, and Geographic Information Systems. The principal incentive for this publication is the rapid shift in the methodologies of maritime archaeology in recent years and a marked increase in the use of 3D and digital approaches. This convergence of digital technologies such as underwater photography and photogrammetry, 3D sonar, 3D virtual reality, and 3D printing has highlighted a pressing need for these new methodologies to be considered together, both in terms of defining the state of the art and in considering future directions. As a scholarly publication, the audience for the book includes students and researchers, as well as professionals working in various aspects of archaeology, heritage management, education, museums, and public policy. It will be of special interest to those working in coastal cultural resource management and underwater archaeology, but will also be of broader interest to anyone interested in archaeology and to those in other disciplines who are now engaging with 3D recording and visualization.

    Enhanced quality reconstruction of erroneous video streams using packet filtering based on non-desynchronizing bits and UDP checksum-filtered list decoding

    The latest video coding standards, such as H.264 and H.265, are extremely vulnerable in error-prone networks. Due to their sophisticated spatial and temporal prediction tools, the effect of an error is not limited to the erroneous area but can easily propagate spatially to the neighboring blocks and temporally to the following frames. Thus, reconstructed video packets at the decoder side may exhibit significant visual quality degradation. Error concealment and error correction are two mechanisms that have been developed to improve the quality of reconstructed frames in the presence of errors. In most existing error concealment approaches, the corrupted packets are ignored and only the correctly received information of the surrounding areas (spatially and/or temporally) is used to recover the erroneous area. This is because there is no perfect error detection mechanism to identify correctly received blocks within a corrupted packet, and because of the desynchronization problem that transmission errors cause in variable-length codes (VLC). But, as many studies have shown, corrupted packets may contain valuable information that can be used to adequately reconstruct the lost area (e.g. when the error is located at the end of a slice). Error correction approaches such as list decoding, on the other hand, exploit the corrupted packets to generate several candidate transmitted packets from the corrupted received packet. They then select, among these candidates, the one with the highest likelihood of being the transmitted packet, based on the available soft information (e.g. the log-likelihood ratio (LLR) of each bit). However, list decoding approaches suffer from a large solution space of candidate transmitted packets. This is worsened when soft information is not available at the application layer, a more realistic scenario in practice: since it is unknown which bits have higher probabilities of having been modified during transmission, the candidate packets cannot be ranked by likelihood.

    In this thesis, we propose various strategies to improve the quality of reconstructed packets that have been lightly damaged during transmission (e.g. at most a single error per packet). We first propose a simple but efficient mechanism to filter damaged packets in order to retain those likely to lead to a very good reconstruction and discard the others. This method can complement most existing concealment approaches to enhance their performance. The method is based on the novel concept of non-desynchronizing bits (NDBs), defined, in the context of an H.264 context-adaptive variable-length coding (CAVLC) coded sequence, as bits whose inversion causes neither desynchronization at the bitstream level nor a change in the number of decoded macroblocks. We establish that, on typical coded bitstreams, the NDBs constitute about one-third (roughly 30%) of a bitstream, and that the effect on visual quality of flipping one of them in a packet is mostly insignificant: in most cases (90%), the quality of the reconstructed packet when an individual NDB is modified is almost the same as that of the intact one. We thus demonstrate that keeping, under certain conditions, a corrupted packet as a candidate for the lost area can provide better visual quality than the concealment approaches.
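
    The NDB check can be pictured as follows: flip a bit, re-decode the slice, and keep the packet only if the bitstream stays synchronized and the macroblock count is unchanged. The `decode_slice` helper below is hypothetical, standing in for a real H.264 CAVLC decoder; the sketch only shows the filtering logic around it.

```python
from dataclasses import dataclass

@dataclass
class DecodeResult:
    synchronized: bool   # did the CAVLC parse reach the slice end cleanly?
    macroblocks: int     # number of macroblocks actually decoded

def decode_slice(bitstream: bytes) -> DecodeResult:
    """Hypothetical hook into an H.264 CAVLC slice decoder."""
    raise NotImplementedError

def is_non_desynchronizing_bit(packet: bytes, bit_index: int,
                               expected_mbs: int) -> bool:
    """A bit is an NDB if flipping it neither desynchronizes the
    bitstream nor changes the number of decoded macroblocks."""
    flipped = bytearray(packet)
    flipped[bit_index // 8] ^= 0x80 >> (bit_index % 8)  # MSB-first flip
    result = decode_slice(bytes(flipped))
    return result.synchronized and result.macroblocks == expected_mbs
```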
    We finally propose a non-desync-based decoding framework, which retains a corrupted packet under the condition that it causes no desynchronization and does not alter the number of expected macroblocks. The framework can be combined with most current concealment approaches. The proposed approach is compared to the frame copy (FC) concealment of the Joint Model (JM) software (JM-FC) and to a state-of-the-art concealment approach using the spatiotemporal boundary matching algorithm (STBMA), in the case of one bit in error, and provides average gains of 3.5 dB and 1.42 dB over them, respectively. We then propose a novel list decoding approach called checksum-filtered list decoding (CFLD), which can correct a packet at the bitstream level by exploiting the receiver-side user datagram protocol (UDP) checksum value. The proposed approach identifies the possible locations of errors by analyzing the pattern of the UDP checksum computed on the corrupted packet. This makes it possible to considerably reduce the number of candidate transmitted packets in comparison to conventional list decoding approaches, especially when no soft information is available. When a packet composed of N bits contains a single bit in error, instead of considering N candidate packets, as conventional list decoding approaches do, the proposed approach considers approximately N/32 candidate packets, a 97% reduction in the number of candidates. This reduction increases to 99.6% in the case of a two-bit error. The method's performance is evaluated using the H.264 and high efficiency video coding (HEVC) test model software. We show that, for H.264 coded sequences, the CFLD approach is able to correct the packet 66% of the time on average. It also offers a 2.74 dB gain over JM-FC, and gains of 1.14 dB and 1.42 dB over STBMA and hard-output maximum likelihood decoding (HO-MLD), respectively. Additionally, in the case of HEVC, the CFLD approach corrects the corrupted packet 91% of the time and offers gains of 2.35 dB and 4.97 dB over our implementation of FC concealment in the HEVC test model software (HM-FC) on class B (1920×1080) and class C (832×480) sequences, respectively.
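
    The candidate-pruning arithmetic behind such checksum filtering can be illustrated with the one's-complement structure of the UDP checksum: a single flipped bit at position k inside some 16-bit word shifts the checksum sum by ±2^k (mod 2^16 − 1), so the mismatch reveals both k and the flip direction, leaving only the words whose bit k currently matches that direction, roughly N/32 of the N bit positions. The sketch below is a simplified illustration over a raw payload, ignoring the UDP pseudo-header.

```python
def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum with end-around carry (zero-padded)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)
    return total

def candidate_single_bit_flips(corrupted: bytes, original_sum: int):
    """Bit indices (MSB-first) whose flip would reconcile the checksum,
    assuming exactly one bit error and ignoring the pseudo-header."""
    delta = (ones_complement_sum(corrupted) - original_sum) % 0xFFFF
    candidates = []
    for k in range(16):                       # bit position within a word
        if delta == (1 << k) % 0xFFFF:
            flipped_to = 1                    # a 0 became 1 in transit
        elif delta == (-(1 << k)) % 0xFFFF:
            flipped_to = 0                    # a 1 became 0 in transit
        else:
            continue
        for w in range(0, len(corrupted) - 1, 2):
            word = (corrupted[w] << 8) | corrupted[w + 1]
            if (word >> k) & 1 == flipped_to:  # current value matches flip
                candidates.append(w * 8 + (15 - k))
    return candidates  # roughly len(corrupted) * 8 / 32 positions
```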

    A SEASAT report. Volume 1: Program summary

    The program background and experiment objectives are summarized, and a description of the organization and interfaces of the project is provided. The mission plan and history are included, as well as user activities and a brief description of the data system. A financial and manpower summary and preliminary results of the mission are also given.