92 research outputs found

    Depth Enhancement and Surface Reconstruction with RGB/D Sequence

    Surface reconstruction and 3D modeling are challenging tasks that have been explored for decades by the computer vision, computer graphics, and machine learning communities. They are fundamental to many applications such as robot navigation, animation, scene understanding, industrial control and medical diagnosis. In this dissertation, I take advantage of consumer depth sensors for surface reconstruction. Given their limited ability to capture detailed surface geometry, a depth enhancement approach is first proposed to recover small, rich geometric details from the captured depth and color sequences. In addition to enhancing the spatial resolution, I present a hybrid camera to improve the temporal resolution of consumer depth sensors and propose an optimization framework to capture high-speed motion and generate high-speed depth streams. Given partial scans from the depth sensor, we also develop a novel fusion approach that builds complete, watertight human models with a template-guided registration method. Finally, the problem of surface reconstruction for non-Lambertian objects, on which current depth sensors fail, is addressed by exploiting multi-view images captured with a hand-held color camera; we propose a visual-hull-based approach to recover the 3D model.
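    A common building block for this kind of RGB-guided depth enhancement, shown here only as an illustrative baseline rather than the dissertation's actual optimization, is the joint bilateral filter: depth is smoothed with weights derived from the registered color image, so geometric edges that coincide with color edges are preserved. All parameter values below are assumptions.

```python
# Illustrative joint bilateral filter for RGB-guided depth smoothing.
import numpy as np

def joint_bilateral_depth(depth, color, radius=3, sigma_s=2.0, sigma_r=10.0):
    """Filter `depth` (H, W) guided by `color` (H, W, 3); both float arrays."""
    h, w = depth.shape
    out = np.zeros_like(depth)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs**2 + ys**2) / (2 * sigma_s**2))  # spatial kernel
    pad_d = np.pad(depth, radius, mode='edge')
    pad_c = np.pad(color, ((radius, radius), (radius, radius), (0, 0)), mode='edge')
    for y in range(h):
        for x in range(w):
            win_d = pad_d[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            win_c = pad_c[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            # Range kernel from the *color* image: depth is smoothed within
            # color-homogeneous regions but not across color edges.
            diff = win_c - color[y, x]
            rng = np.exp(-np.sum(diff**2, axis=2) / (2 * sigma_r**2))
            wgt = spatial * rng
            out[y, x] = np.sum(wgt * win_d) / np.sum(wgt)
    return out
```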

    Intensive-care unit patients monitoring by computer vision system

    Bachelor's thesis in Computer Engineering (Treballs Finals de Grau d'Enginyeria Informàtica), Facultat de Matemàtiques, Universitat de Barcelona, year: 2013, advisor: Santi Seguí Mesquida. In this project, we propose an automatic computer vision system for patient monitoring in the Intensive Care Unit (ICU). These patients require constant monitoring and, given the high costs of the equipment and staff involved, an automatic system would be helpful. Depth imaging technology has advanced dramatically over the last few years, finally reaching a consumer price point with the launch of the Kinect. Depth images are unaffected by lighting conditions and provide good visibility even in complete darkness, so patients can be monitored 24 hours a day. In this project, we worked on two parts of the object detection system: the descriptor and the classifier. Concerning the descriptor, we analyzed the performance of one of the most widely used descriptors for object detection in RGB images, the Histogram of Oriented Gradients (HOG), and we propose a descriptor designed for depth images. We show that the combination of these two descriptors increases system accuracy. As to detection, we ran various tests: we analyzed the detection of patient body parts separately, and we used a model in which the patient is divided into multiple parts, each modeled with a set of templates, demonstrating that the use of such a model improves detection.
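    To make the descriptor-combination idea concrete, the following minimal sketch concatenates HOG features computed on a grayscale patch and on the corresponding depth patch before feeding a standard linear classifier. The parameter values and the choice of classifier are assumptions, not the thesis's exact configuration.

```python
# Hedged sketch: combined RGB + depth HOG descriptor for patch classification.
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def rgbd_descriptor(gray_patch, depth_patch):
    """Concatenated HOG descriptor over a grayscale patch and a depth patch."""
    f_rgb = hog(gray_patch, orientations=9, pixels_per_cell=(8, 8),
                cells_per_block=(2, 2))
    f_depth = hog(depth_patch, orientations=9, pixels_per_cell=(8, 8),
                  cells_per_block=(2, 2))
    return np.concatenate([f_rgb, f_depth])

# X: descriptors for training patches, y: body-part / background labels
# clf = LinearSVC().fit(X, y)
```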

    3D modeling by low-cost range cameras: methods and potentialities

    Nowadays, the demand for 3D models for the documentation and visualization of objects and environments is continually increasing. However, traditional 3D modeling techniques and systems (i.e. photogrammetry and laser scanners) can be very expensive and/or onerous, as they often require qualified technicians and specific post-processing phases. It is therefore important to find new instruments able to provide low-cost 3D data in real time and in a user-friendly way. Range cameras seem one of the most promising tools to achieve this goal: they are low-cost 3D scanners able to easily collect dense point clouds at high frame rate, at short range (a few meters) from the imaged objects. Such sensors, though, remain a relatively new 3D measurement technology that has not yet been exhaustively studied, so it is essential to assess the metric quality of the depth data they retrieve. This thesis fits precisely into this background: the aim is to evaluate the potential of range cameras for geomatic applications and to provide useful indications for their practical use. The three most popular and/or promising low-cost range cameras, namely the Microsoft Kinect v1, the Microsoft Kinect v2 and the Occipital Structure Sensor, were first characterized from a geomatic point of view in order to assess the metric quality of their depth data. These investigations showed that such sensors exhibit a depth precision and a depth accuracy ranging from a few millimeters to a few centimeters, depending both on the operating principle of the device (Structured Light or Time of Flight) and on the depth itself. On this basis, two different models were identified for precision and accuracy vs. depth: parabolic for the Structured Light sensors (the Kinect v1 and the Structure Sensor) and linear for the Time of Flight sensor (the Kinect v2). The accuracy models were then shown to be globally consistent with the precision models for all three sensors. Furthermore, the proposed calibration model was validated for the Structure Sensor: with calibration, the overall RMSE decreased from 27 to 16 mm. Finally, four case studies were carried out in order to evaluate:
    • the performance of the Kinect v2 sensor for monitoring oscillatory motions (relevant for structural and/or industrial monitoring), demonstrating a good ability of the system to detect movements and displacements;
    • the feasibility of integrating the Kinect v2 with a classical stereo system, highlighting the need to integrate range cameras into classical 3D photogrammetric systems, especially to overcome limitations in acquisition completeness;
    • the potential of the Structure Sensor for the 3D surveying of indoor environments, showing a more than sufficient accuracy for most applications;
    • the potential of the Structure Sensor to document small archaeological finds, where the metric accuracy appears rather good while the textured models show some misalignments.
    In conclusion, although the experimental results demonstrated that range cameras can give good and encouraging results, traditional 3D modeling techniques are still superior in terms of accuracy and precision and must be preferred when the accuracy requirements are strict. But for a very wide and continuously increasing range of applications, where the required accuracy ranges from a few millimeters (very close range) to a few centimeters, range cameras can be a valuable alternative, especially when non-expert users are involved. Furthermore, the technology on which these sensors are based is continually evolving, driven also by the new generation of AR/VR kits, and their geometric performance will certainly improve soon.
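    The two error models above can be made concrete with a short fitting sketch: precision (or accuracy) as a function of depth is fitted with a second-degree polynomial for Structured Light sensors and a first-degree polynomial for Time of Flight. The numbers below are placeholders, not the thesis measurements.

```python
# Fitting parabolic (Structured Light) vs. linear (Time of Flight) models
# of depth precision, using placeholder data.
import numpy as np

depths = np.array([0.8, 1.2, 1.6, 2.0, 2.4, 2.8])    # [m], assumed test range
sigma  = np.array([2.1, 3.4, 5.2, 7.8, 11.0, 14.9])  # [mm], placeholder precision

sl_coeffs  = np.polyfit(depths, sigma, deg=2)  # sigma(d) ~ a*d^2 + b*d + c
tof_coeffs = np.polyfit(depths, sigma, deg=1)  # sigma(d) ~ a*d + b

print("Structured Light model:", np.poly1d(sl_coeffs))
print("Time of Flight model:  ", np.poly1d(tof_coeffs))
```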

    Visual Human-Computer Interaction


    New strategies for row-crop management based on cost-effective remote sensors

    Agricultural technology can be an excellent antidote to resource scarcity. Its growth has led to the extensive study of spatial and temporal in-field variability. The challenge of accurate management has been addressed in recent years through accurate, high-cost measurement instruments used by researchers. However, low rates of technological adoption by farmers motivate the development of alternative technologies based on affordable sensors, in order to improve the sustainability of agricultural biosystems. The main objective of this doctoral thesis is the development and evaluation of systems based on affordable sensors, addressing two of the main needs of producers: accurate plant water status characterization for proper irrigation management, and precise weed control. To address the first objective, two data acquisition methodologies based on aerial platforms were developed, comparing infrared thermometry and thermal imaging for determining the water status of two of the most relevant row crops in the region, sugar beet and super high-density olive orchards. From the data obtained, the use of an airborne low-cost infrared sensor to determine canopy temperature was validated, and the reliability of sugar beet canopy temperature as an indicator of its water status was confirmed. The empirical development of the Crop Water Stress Index (CWSI) was also carried out from aerial thermal imaging combined with infrared temperature sensors and ground measurements of factors such as water potential and stomatal conductance, validating its usefulness as an indicator of water status in super high-density olive orchards. To contribute to the development of precise weed control systems, a system for detecting tomato plants and measuring the space between them was developed, aiming to perform intra-row treatments in a localized and precise way. To this end, low-cost optical sensors were used and compared with a commercial LiDAR laser scanner. Correct detection rates close to 95% show that these sensors can lead to promising advances in the automation of weed control. The micro-level field data collected by the evaluated affordable sensors can help farmers target operations precisely before plant stress sets in or weed infestation occurs, paving the way to increased adoption of Precision Agriculture techniques.
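    For reference, a minimal sketch of the standard empirical CWSI computation (Idso-style baselines) is given below; the thesis derives its own baselines for sugar beet and super high-density olive orchards, so all coefficients here are placeholders.

```python
# Hedged sketch of the standard empirical Crop Water Stress Index.
def cwsi(t_canopy, t_air, vpd, a=-1.9, b=2.5):
    """CWSI from canopy/air temperature [degC] and vapour pressure deficit [kPa].

    Lower baseline (well-watered): dT_ll = a * vpd + b (placeholder coefficients).
    Upper baseline (non-transpiring): dT_ul, here a fixed offset (assumed).
    """
    dt = t_canopy - t_air
    dt_ll = a * vpd + b          # non-water-stressed baseline
    dt_ul = 5.0                  # placeholder upper limit [degC]
    return (dt - dt_ll) / (dt_ul - dt_ll)

# e.g. cwsi(t_canopy=31.0, t_air=28.0, vpd=2.0) -> ~0.68, a fraction of
# maximum stress in [0, 1] for these placeholder baselines.
```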

    Markerless facial motion capture: deep learning approaches on RGBD data

    Facial expressions are a series of fast, complex and interconnected movements that cause an array of skin deformations, such as stretching, compressing and folding. Identifying expressions is a natural process in human vision, but the diversity of faces makes it challenging for computer vision. Research in markerless facial motion capture using a single Red Green Blue (RGB) camera has gained popularity due to the wide availability of such data, for example from mobile phones. The motivation behind this work is that much of the existing work attempts to infer 3-Dimensional (3D) data from 2-Dimensional (2D) images; in motion capture, for instance, multiple 2D cameras are calibrated to allow some depth prediction. By contrast, Red Green Blue Depth (RGBD) sensors that give ground-truth depth data could yield a better understanding of the human face and how expressions are visualised. The aim of this thesis is to investigate and develop novel methods of markerless facial motion capture, focusing on the inclusion of RGBD data to provide 3D information. The contributions are: a tool to aid in the annotation of 3D facial landmarks; a novel neural network that demonstrates the ability to predict 2D and 3D landmarks by merging RGBD data; a working application that demonstrates a complex deep learning network on portable handheld devices; a review of existing methods for denoising fine detail in depth maps using neural networks; and a network for the complete analysis of facial landmarks and expressions in 3D. The 3D annotator was developed to overcome the difficulty of feature identification in existing 3D modelling software. The technique of predicting 2D and 3D landmarks with auxiliary information allowed highly accurate 3D landmarking without the need for full model generation, and it outperformed other recent landmarking techniques. The networks running on handheld devices serve as a proof of concept that, even without much optimisation, a complex task can be performed in near real time. Denoising Time of Flight (ToF) depth maps proved much more complex than traditional RGB denoising; we reviewed and applied an array of techniques to the task. The full facial analysis showed that training neural networks on a wide range of related tasks for auxiliary information allows a deeper understanding of the overall task. Research on facial processing is vast, but many new problems and challenges remain. While RGB cameras are widely used, highly accurate and cost-effective depth-sensing devices are now available, allowing a better understanding of facial features and expressions. By merging RGB and depth data, facial landmarking and expression intensity recognition can be improved.
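    As a rough illustration of the RGBD-merging idea (not the thesis architecture), the sketch below uses two small convolutional encoders, one for the RGB image and one for the registered depth map, whose features are concatenated before a regression head that predicts 3D landmark coordinates. All layer sizes and the landmark count are assumptions.

```python
# Generic two-branch RGB + depth landmark regression network (illustrative).
import torch
import torch.nn as nn

def encoder(in_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(4), nn.Flatten())  # -> 64 * 4 * 4 features

class RGBDLandmarkNet(nn.Module):
    def __init__(self, n_landmarks=68):
        super().__init__()
        self.n = n_landmarks
        self.rgb_enc = encoder(3)    # encodes the color image
        self.depth_enc = encoder(1)  # encodes the registered depth map
        self.head = nn.Sequential(
            nn.Linear(2 * 64 * 4 * 4, 256), nn.ReLU(),
            nn.Linear(256, n_landmarks * 3))  # (x, y, z) per landmark

    def forward(self, rgb, depth):
        feat = torch.cat([self.rgb_enc(rgb), self.depth_enc(depth)], dim=1)
        return self.head(feat).view(-1, self.n, 3)

# net = RGBDLandmarkNet()
# out = net(torch.randn(1, 3, 128, 128), torch.randn(1, 1, 128, 128))  # (1, 68, 3)
```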

    Pedestrian detection for underground mine vehicles using thermal imaging

    Vehicle accidents are one of the major causes of death in South African underground mines. A computer vision-based pedestrian detection and tracking system is presented in this research to assist locomotive drivers in operating their vehicles more safely. The system uses a combination of thermal and three-dimensional (3D) imagery for the detection and tracking of people. It follows a segment-classify-track methodology, which eliminates computationally expensive multi-scale classification. A minimum error thresholding algorithm for segmentation is shown to be effective in a wide range of environments, with temperatures up to 26 °C and in a 1000 m deep mine. The classifier uses principal component analysis and a support vector classifier to achieve 95% accuracy and 97% specificity in classifying the segmented images. It is shown that each detection is not independent of the previous one, but the probability of missing two detections in a row is 0.6%, which is considered acceptably low. The tracker uses the Kinect's structured-light 3D sensor to track the identified people. It is shown that the useful range of the Kinect is insufficient to provide timeous warning of a collision: the error in the Kinect depth measurements increases quadratically with depth, resulting in very noisy velocity estimates at longer ranges. The use of the Kinect demonstrates the principle of the tracker, but due to budgetary constraints the replacement of the Kinect with a long-range sensor remains future work.
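    The classify stage described above maps naturally onto a short scikit-learn pipeline; the component count and kernel below are assumptions rather than the study's tuned values.

```python
# PCA + support vector classifier for segmented thermal images (sketch).
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# X: flattened thermal image segments, shape (n_samples, n_pixels)
# y: 1 = person, 0 = background
clf = make_pipeline(PCA(n_components=30), SVC(kernel='rbf'))
# clf.fit(X_train, y_train); accuracy = clf.score(X_test, y_test)
```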

    3D Morphable Face Models – Past, Present and Future

    In this paper, we provide a detailed survey of 3D Morphable Face Models over the 20 years since they were first proposed. The challenges in building and applying these models, namely capture, modeling, image formation, and image analysis, are still active research topics, and we review the state-of-the-art in each of these areas. We also look ahead, identifying unsolved challenges, proposing directions for future research and highlighting the broad range of current and future applications
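    The construction shared by the models this survey covers is linear: a face shape is the mean shape plus a linear combination of PCA basis vectors, with an analogous model for texture. A minimal sketch, with illustrative sizes and random placeholder data:

```python
# Core 3D Morphable Model construction: S(alpha) = mean + U @ alpha.
import numpy as np

n_vertices, n_components = 5000, 80
mean_shape = np.zeros(3 * n_vertices)                   # stacked (x, y, z) mean
basis = np.random.randn(3 * n_vertices, n_components)   # PCA shape basis U (placeholder)
alpha = np.random.randn(n_components)                   # shape coefficients

# Generate a face shape and reshape to per-vertex coordinates.
shape = (mean_shape + basis @ alpha).reshape(n_vertices, 3)
```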

    Vision as inverse graphics for detailed scene understanding

    An image of a scene can be described by the shape, pose and appearance of the objects within it, as well as the illumination and the camera that captured it. A fundamental goal in computer vision is to recover such descriptions from an image. These representations can be useful for tasks such as autonomous robotic interaction with an environment, but obtaining them can be very challenging due to the large variability of objects present in natural scenes. A long-standing approach in computer vision is to use generative models of images in order to infer the descriptions that generated the image; these methods are referred to as “vision as inverse graphics” or “inverse graphics”. We apply this approach to scene understanding by making use of a generative model (GM) in the form of a graphics renderer. Since searching over scene factors to obtain the best match for an image is very inefficient, we make use of convolutional neural networks, which we refer to as recognition models (RM), trained on synthetic data to initialize the search.
    First, we address the effect that occlusions of objects have on the performance of predictive models of images. We propose an inverse graphics approach to predicting shape, pose, appearance and illumination with a GM that includes an outlier model to account for occlusions. We study how the inferences are affected by the degree of occlusion of the foreground object, and show that a robust GM with such an outlier model works significantly better than a non-robust one. We then characterize the performance of the RM and the gains that can be made by refining the search using the robust GM, using a new synthetic dataset that includes background clutter and occlusions. We find that pose and shape are predicted very well by the RM, but appearance and especially illumination less so; however, accuracy on these latter two factors can be clearly improved with the generative model.
    Next, we apply our inverse graphics approach to scenes with multiple objects. We propose a method to efficiently and differentiably model self-shadowing, which improves the realism of the GM renders, as well as a way to render object occlusion boundaries that yields more accurate gradients of the rendering function. We evaluate these improvements on a dataset with multiple objects and show that the GM refinement step clearly improves on the RM predictions for the latent variables of shape, pose, appearance and illumination.
    Finally, we tackle the task of learning generative models of 3D objects from a collection of meshes. We present a latent variable architecture that learns to separately capture the underlying factors of shape and appearance from the meshes. To do so, we first transform the meshes of a given class into a data representation that sidesteps the need for landmark correspondences across meshes when learning the GM. The ability and usefulness of learning a disentangled latent representation of objects is demonstrated via an experiment where the appearance of one object is transferred onto the shape of another.
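    The RM-plus-GM loop described above can be summarized schematically: a trained recognition network proposes initial scene latents, and a differentiable renderer refines them by gradient descent on the image reconstruction error. In this sketch, `recognition_model` and `render` are hypothetical placeholders, not the thesis's actual components.

```python
# Schematic analysis-by-synthesis refinement loop (illustrative only).
import torch

def refine(image, recognition_model, render, steps=200, lr=1e-2):
    # RM initialization: predicted latents become the optimization variables.
    latents = recognition_model(image).detach().requires_grad_(True)
    opt = torch.optim.Adam([latents], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((render(latents) - image) ** 2).mean()  # pixel reconstruction error
        loss.backward()   # gradients flow through the differentiable renderer
        opt.step()
    return latents.detach()
```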

    State of the art of audio- and video based solutions for AAL

    Working Group 3: Audio- and Video-based AAL Applications. It is a matter of fact that Europe is facing more and more crucial challenges regarding health and social care due to demographic change and the current economic context. The recent COVID-19 pandemic has stressed this situation even further, highlighting the need for action. Active and Assisted Living (AAL) technologies come as a viable approach to help face these challenges, thanks to their high potential for enabling remote care and support. Broadly speaking, AAL refers to the use of innovative and advanced Information and Communication Technologies to create supportive, inclusive and empowering applications and environments that enable older, impaired or frail people to live independently and stay active longer in society. AAL capitalizes on the growing pervasiveness and effectiveness of sensing and computing facilities to supply persons in need with smart assistance, responding to their need for autonomy, independence, comfort, security and safety. The application scenarios addressed by AAL are complex, due to the inherent heterogeneity of the end-user population, their living arrangements, and their physical conditions or impairments. Despite aiming at diverse goals, AAL systems should share some common characteristics: they are designed to provide support in daily life in an invisible, unobtrusive and user-friendly manner; they are conceived to be intelligent, to learn and adapt to the requirements and requests of the assisted people, and to synchronise with their specific needs. Nevertheless, to ensure the uptake of AAL in society, potential users must be willing to use AAL applications and to integrate them into their daily environments and lives.
    In this respect, video- and audio-based AAL applications have several advantages in terms of unobtrusiveness and information richness. Cameras and microphones are far less obtrusive than wearable sensors, which may hinder one's activities, and a single camera placed in a room can record most of the activities performed there, thus replacing many other non-visual sensors. Currently, video-based applications are effective in recognising and monitoring the activities, the movements, and the overall condition of the assisted individuals, as well as in assessing their vital parameters (e.g., heart rate, respiratory rate). Similarly, audio sensors have the potential to become one of the most important modalities for interaction with AAL systems, as they have a large sensing range, do not require physical presence at a particular location, and are physically intangible. Moreover, relevant information about individuals' activities and health status can be derived from processing audio signals (e.g., speech recordings). As the other side of the coin, however, cameras and microphones are often perceived as the most intrusive technologies from the viewpoint of the privacy of the monitored individuals, due to the richness of the information they convey and the intimate settings where they may be deployed. Solutions able to ensure privacy preservation by context and by design, as well as high legal and ethical standards, are in high demand. After the review of the current state of play and the discussion in GoodBrother, we may claim that the first solutions in this direction are starting to appear in the literature. A multidisciplinary debate among experts and stakeholders is paving the way towards AAL that ensures ergonomics, usability, acceptance and privacy preservation. The DIANA, PAAL, and VisuAAL projects are examples of this fresh approach.
    This report provides the reader with a review of the most recent advances in audio- and video-based monitoring technologies for AAL. It has been drafted as a collective effort of WG3 to supply an introduction to AAL, its evolution over time and its main functional and technological underpinnings. In this respect, the report contributes to the field with an outline of a new generation of ethics-aware AAL technologies and a proposal for a novel comprehensive taxonomy of AAL systems and applications. Moreover, the report allows non-technical readers to gather an overview of the main components of an AAL system and how they function and interact with end-users. The report illustrates the state of the art of the most successful AAL applications and functions based on audio and video data, namely (i) lifelogging and self-monitoring, (ii) remote monitoring of vital signs, (iii) emotional state recognition, (iv) food intake monitoring, activity and behaviour recognition, (v) activity and personal assistance, (vi) gesture recognition, (vii) fall detection and prevention, (viii) mobility assessment and frailty recognition, and (ix) cognitive and motor rehabilitation. For these application scenarios, the report illustrates the state of play in terms of scientific advances, available products and research projects, and highlights the open challenges. The report ends with an overview of the challenges, hindrances and opportunities posed by the uptake of AAL technologies in real-world settings. Here the report illustrates the current procedural and technological approaches to coping with acceptability, usability and trust in AAL technology, surveying strategies and approaches to co-design, to privacy preservation in video and audio data, to transparency and explainability in data processing, and to data transmission and communication. User acceptance and ethical considerations are also debated. Finally, the potential of the silver economy is overviewed.
