64 research outputs found

    μ-MAR: Multiplane 3D Marker based Registration for depth-sensing cameras

    Get PDF
    Many applications including object reconstruction, robot guidance, and. scene mapping require the registration of multiple views from a scene to generate a complete geometric and appearance model of it. In real situations, transformations between views are unknown and it is necessary to apply expert inference to estimate them. In the last few years, the emergence of low-cost depth-sensing cameras has strengthened the research on this topic, motivating a plethora of new applications. Although they have enough resolution and accuracy for many applications, some situations may not be solved with general state-of-the-art registration methods due to the signal-to-noise ratio (SNR) and the resolution of the data provided. The problem of working with low SNR data, in general terms, may appear in any 3D system, then it is necessary to propose novel solutions in this aspect. In this paper, we propose a method, μ-MAR, able to both coarse and fine register sets of 3D points provided by low-cost depth-sensing cameras, despite it is not restricted to these sensors, into a common coordinate system. The method is able to overcome the noisy data problem by means of using a model-based solution of multiplane registration. Specifically, it iteratively registers 3D markers composed by multiple planes extracted from points of multiple views of the scene. As the markers and the object of interest are static in the scenario, the transformations obtained for the markers are applied to the object in order to reconstruct it. Experiments have been performed using synthetic and real data. The synthetic data allows a qualitative and quantitative evaluation by means of visual inspection and Hausdorff distance respectively. The real data experiments show the performance of the proposal using data acquired by a Primesense Carmine RGB-D sensor. The method has been compared to several state-of-the-art methods. The results show the good performance of the μ-MAR to register objects with high accuracy in presence of noisy data outperforming the existing methods.This work has been supported by grant University of Alicante projects GRE11-01 and grant Valencian Government GV/2013/005

    Few-shot learning for COVID-19 chest X-ray classification with imbalanced data: an inter vs. intra domain study

    Get PDF
    Medical image datasets are essential for training models used in computer-aided diagnosis, treatment planning, and medical research. However, some challenges are associated with these datasets, including variability in data distribution, data scarcity, and transfer learning issues when using models pre-trained from generic images. This work studies the effect of these challenges at the intra- and inter-domain level in few-shot learning scenarios with severe data imbalance. For this, we propose a methodology based on Siamese neural networks in which a series of techniques are integrated to mitigate the effects of data scarcity and distribution imbalance. Specifically, different initialization and data augmentation methods are analyzed, and four adaptations to Siamese networks of solutions to deal with imbalanced data are introduced, including data balancing and weighted loss, both separately and combined, and with a different balance of pairing ratios. Moreover, we also assess the inference process considering four classifiers, namely Histogram, kNN, SVM, and Random Forest. Evaluation is performed on three chest X-ray datasets with annotated cases of both positive and negative COVID-19 diagnoses. The accuracy of each technique proposed for the Siamese architecture is analyzed separately. The results are compared to those obtained using equivalent methods on a state-of-the-art CNN, achieving an average F1 improvement of up to 3.6%, and up to 5.6% of F1 for intra-domain cases. We conclude that the introduced techniques offer promising improvements over the baseline in almost all cases and that the technique selection may vary depending on the amount of data available and the level of imbalance.Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature

    A Novel Prediction Method for Early Recognition of Global Human Behaviour in Image Sequences

    Get PDF
    Human behaviour recognition has been, and still remains, a challenging problem that involves different areas of computational intelligence. The automated understanding of people activities from video sequences is an open research topic in which the computer vision and pattern recognition areas have made big efforts. In this paper, the problem is studied from a prediction point of view. We propose a novel method able to early detect behaviour using a small portion of the input, in addition to the capabilities of it to predict behaviour from new inputs. Specifically, we propose a predictive method based on a simple representation of trajectories of a person in the scene which allows a high level understanding of the global human behaviour. The representation of the trajectory is used as a descriptor of the activity of the individual. The descriptors are used as a cue of a classification stage for pattern recognition purposes. Classifiers are trained using the trajectory representation of the complete sequence. However, partial sequences are processed to evaluate the early prediction capabilities having a specific observation time of the scene. The experiments have been carried out using the three different dataset of the CAVIAR database taken into account the behaviour of an individual. Additionally, different classic classifiers have been used for experimentation in order to evaluate the robustness of the proposal. Results confirm the high accuracy of the proposal on the early recognition of people behaviours.This work was supported in part by the University of Alicante, Valencian Government and Spanish government under grants GRE11-01, GV/2013/005 and DPI2013-40534-R

    Three-dimensional planar model estimation using multi-constraint knowledge based on k-means and RANSAC

    Get PDF
    Plane model extraction from three-dimensional point clouds is a necessary step in many different applications such as planar object reconstruction, indoor mapping and indoor localization. Different RANdom SAmple Consensus (RANSAC)-based methods have been proposed for this purpose in recent years. In this study, we propose a novel method-based on RANSAC called Multiplane Model Estimation, which can estimate multiple plane models simultaneously from a noisy point cloud using the knowledge extracted from a scene (or an object) in order to reconstruct it accurately. This method comprises two steps: first, it clusters the data into planar faces that preserve some constraints defined by knowledge related to the object (e.g., the angles between faces); and second, the models of the planes are estimated based on these data using a novel multi-constraint RANSAC. We performed experiments in the clustering and RANSAC stages, which showed that the proposed method performed better than state-of-the-art methods

    A Compilation of Methods and Datasets for Group and Crowd Action Recognition

    Get PDF
    The human behaviour analysis has been a subject of study in various fields of science (e.g. sociology, psychology, computer science). Specifically, the automated understanding of the behaviour of both individuals and groups remains a very challenging problem from the sensor systems to artificial intelligence techniques. Being aware of the extent of the topic, the objective of this paper is to review the state of the art focusing on machine learning techniques and computer vision as sensor system to the artificial intelligence techniques. Moreover, a lack of review comparing the level of abstraction in terms of activities duration is found in the literature. In this paper, a review of the methods and techniques based on machine learning to classify group behaviour in sequence of images is presented. The review takes into account the different levels of understanding and the number of people in the group

    Best Viewpoint Tracking for Camera Mounted on Robotic Arm with Dynamic Obstacles

    Get PDF
    The problem of finding a next best viewpoint for 3D modeling or scene mapping has been explored in computer vision over the last decade. This paper tackles a similar problem, but with different characteristics. It proposes a method for dynamic next best viewpoint recovery of a target point while avoiding possible occlusions. Since the environment can change, the method has to iteratively find the next best view with a global understanding of the free and occupied parts. We model the problem as a set of possible viewpoints which correspond to the centers of the facets of a virtual tessellated hemisphere covering the scene. Taking into account occlusions, distances between current and future viewpoints, quality of the viewpoint and joint constraints (robot arm joint distances or limits), we evaluate the next best viewpoint. The proposal has been evaluated on 8 different scenarios with different occlusions and a short 3D video sequence to validate its dynamic performance.Comment: 10 pages, 6 figures, poster in 3DV conferenc

    Architecture for automatic recognition of group activities using local motions and context

    Get PDF
    Currently, the ability to automatically detect human behavior in image sequences is one of the most important challenges in the area of computer vision. Within this broad field of knowledge, the recognition of activities of people groups in public areas is receiving special attention due to its importance in many aspects including safety and security. This paper proposes a generic computer vision architecture with the ability to learn and recognize different group activities using mainly the local group’s movements. Specifically, a multi-stream deep learning architecture is proposed whose two main streams correspond to a representation based on a descriptor capable of representing the trajectory information of a sequence of images as a collection of local movements that occur in specific regions of the scene. Additional information (e.g. location, time, etc.) to strengthen the classification of activities by including it as additional streams. The proposed architecture is capable of classifying in a robust way different activities of a group as well to deal with the one-class problems. Moreover, the use of a simple descriptor that transforms a sequence of color images into a sequence of two-image streams can reduce the curse of dimensionality using a deep learning approach. The generic deep learning architecture has been evaluated with different datasets outperforming the state-of-the-art approaches providing an efficient architecture for single and multi-class classification problems
    • …
    corecore