39 research outputs found
Explicit Edge Inconsistency Evaluation Model for Color-Guided Depth Map Enhancement
© 2016 IEEE. Color-guided depth enhancement refines depth maps under the assumption that depth edges and color edges at corresponding locations are consistent. Among methods for such low-level vision tasks, the Markov random field (MRF), including its variants, is one of the major approaches and has dominated this area for several years. However, the assumption above is not always true. To tackle the problem, state-of-the-art solutions adjust the weighting coefficient inside the smoothness term of the MRF model. These methods lack an explicit evaluation model to quantitatively measure the inconsistency between the depth edge map and the color edge map, so they cannot adaptively control the strength of the guidance from the color image for depth enhancement, leading to defects such as texture-copy artifacts and blurred depth edges. In this paper, we propose a quantitative measurement of this inconsistency and explicitly embed it into the smoothness term. The proposed method demonstrates promising experimental results compared with benchmark and state-of-the-art methods on the Middlebury, ToF-Mark, and NYU datasets.
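A minimal sketch of the idea, an explicit inconsistency measure between the depth and color edge maps that modulates the smoothness weight, might look as follows. The energy form, the weight formula, and the Jacobi-style solver are illustrative assumptions, not the paper's exact model:

```python
import numpy as np

def enhance_depth(depth, color_edges, depth_edges, lam=0.5, iters=100):
    """Jacobi-style minimisation of a toy MRF energy:
    (d - depth)^2 + lam * w * (d - neighbour_mean)^2 per pixel,
    where the smoothness weight w is reduced wherever the color and
    depth edge maps disagree (the explicit inconsistency measure)."""
    inconsistency = np.abs(color_edges - depth_edges)  # in [0, 1] for edge maps in [0, 1]
    w = np.exp(-color_edges) * (1.0 - inconsistency)   # trust color guidance only where consistent
    d = depth.copy()
    for _ in range(iters):
        # Mean of the 4-neighbours (np.roll wraps at the border; fine for a toy).
        nb = (np.roll(d, 1, 0) + np.roll(d, -1, 0)
              + np.roll(d, 1, 1) + np.roll(d, -1, 1)) / 4.0
        d = (depth + lam * w * nb) / (1.0 + lam * w)
    return d

rng = np.random.default_rng(0)
noisy = 1.0 + 0.1 * rng.standard_normal((16, 16))
flat = np.zeros((16, 16))  # no edges anywhere: smoothing is fully trusted
refined = enhance_depth(noisy, flat, flat)
```

Where the two edge maps disagree, `w` shrinks toward zero and the data term dominates, which is the mechanism that avoids texture-copy artifacts in this toy setting.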
Privacy aware human action recognition: an exploration of temporal salience modelling and neuromorphic vision sensing
Privacy has emerged as a significant concern in vision-based home monitoring. State-of-the-art studies provide privacy protection by filtering or covering the most sensitive content, which in this scenario is identity. However, enabling a machine to extract utility from the obfuscated data remains a challenge. Insights from the human visual system are useful for addressing this problem: a high level of visual abstraction can be obtained from a visual scene by constructing saliency maps that highlight the most useful content and attenuate the rest. One way to maintain privacy while keeping useful information about the action is to discover the most significant region and remove the redundancy. Another solution is motivated by a new visual sensor technology, the neuromorphic vision sensor. In this thesis, we first introduce a novel method for vision-based privacy preservation. In particular, we propose a new temporal salience-based anonymisation method that preserves privacy while maintaining the usefulness of the anonymised data. This anonymisation method achieves a higher level of privacy than current work. The second contribution is a new descriptor for human action recognition (HAR) that explores the anonymised domain produced by the temporal salience method. The proposed descriptor tests the utility of the anonymised data without referring to the RGB intensities of the original data. Features extracted with the proposed descriptor improve action recognition accuracy, outperforming existing state-of-the-art methods by 3.04%, 3.14%, 0.83%, 3.67%, and 16.71% on the DHA, KTH, UIUC1, UCF Sports, and HMDB51 datasets, respectively. The third contribution proposes a new method for the neuromorphic vision domain, in which the privacy issue is already solved by the sensor itself. The output of this domain is exploited by further exploring the local and global details of the log intensity changes. The empirical evaluation shows that exploring the neuromorphic domain provides useful details, increasing accuracy rates on E-KTH, E-UCF11 and E-HMDB5 by 0.54%, 19.42% and 25.61%, respectively.
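A toy illustration of temporal salience-based anonymisation: pixels with little frame-to-frame change (static background, including identity cues) are blanked, while moving content survives. The simple frame-difference salience and the threshold are stand-ins for the thesis's actual salience model:

```python
import numpy as np

def anonymise(frames, thresh=0.1):
    """Keep only temporally salient pixels (large frame-to-frame intensity
    change) and blank out everything else, including static identity cues."""
    out = np.zeros_like(frames)
    for t in range(1, len(frames)):
        salience = np.abs(frames[t] - frames[t - 1])  # crude temporal salience
        mask = salience > thresh
        out[t][mask] = frames[t][mask]  # motion survives, background is removed
    return out

# A static 0.5-grey scene with one "moving" pixel per frame.
frames = np.full((3, 8, 8), 0.5)
frames[1, 2, 2] = 1.0
frames[2, 5, 5] = 1.0
anon = anonymise(frames)
```

A downstream descriptor would then operate on `anon` only, never touching the original RGB intensities, which is the sense in which utility is tested in the anonymised domain.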
Development of in-field data acquisition systems and machine learning-based data processing and analysis approaches for turfgrass quality rating and peanut flower detection
Digital image processing and machine vision techniques provide scientists with an objective measure of crop quality that adds to the validity of study results without burdening the evaluation process. This dissertation aimed to develop in-field data acquisition systems and supervised machine learning-based data processing and analysis approaches for turfgrass quality classification and peanut flower detection. The 3D Scanner App, used with the Apple iPhone 12 Pro's camera and LiDAR sensor, provided high-resolution rendered turfgrass images. The battery lasted for the entire data acquisition session over an experimental field (49 m × 15 m) containing 252 warm-season turfgrass plots. Used as an image acquisition tool, the smartphone achieved outcomes similar to the traditional image acquisition methods described in other studies. Experiments were carried out on turfgrass quality classification grouped into two classes (“Poor”, “Acceptable”) and four classes (“Very poor”, “Poor”, “Acceptable”, “High”) using supervised machine learning techniques. The Gray-Level Co-occurrence Matrix (GLCM) feature extractor with a Random Forest classifier achieved the highest accuracy rate (81%) on the testing dataset for two classes. For four classes, the Gabor filter was the best feature extractor, achieving 82% accuracy with both Support Vector Machine (SVM) and XGBoost classifiers. The presented method will further assist researchers in developing a smartphone application for turfgrass quality rating. The study also applied deep learning-based features to feed machine learning classifiers. The ResNet-101 deep feature extractor with an SVM classifier achieved an accuracy rate of 91% for two classes; the ResNet-152 deep feature extractor with the SVM classifier achieved 86% for four classes. YOLOX-L and YOLOX-X models were compared with different data augmentation configurations to find the best YOLOX object detector for peanut flower detection. Peanut flowers were detected in images collected from a research field. YOLOX-X with weak data augmentation achieved the highest mean average precision at an Intersection over Union threshold of 50%. The presented method will further assist researchers in developing a method for counting flowers in images. With minor modifications, the presented detection technique can be applied to other crops or flowers.
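The texture pipeline described above (GLCM features feeding a Random Forest) can be sketched in a few lines. The gray-level count, the three chosen GLCM statistics, and the synthetic "turf" patches are illustrative assumptions, not the dissertation's settings:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def glcm_features(img, levels=8):
    """Contrast, homogeneity and energy from a gray-level co-occurrence
    matrix built for the horizontal offset (0, 1)."""
    q = (img * levels).clip(0, levels - 1).astype(int)  # quantise gray levels
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (q[:, :-1].ravel(), q[:, 1:].ravel()), 1)
    glcm /= glcm.sum()
    i, j = np.indices(glcm.shape)
    contrast = np.sum(glcm * (i - j) ** 2)
    homogeneity = np.sum(glcm / (1.0 + np.abs(i - j)))
    energy = np.sqrt(np.sum(glcm ** 2))
    return [contrast, homogeneity, energy]

rng = np.random.default_rng(0)
# Synthetic stand-ins: smooth patches for acceptable turf, noisy ones for poor.
smooth = [rng.random((32, 32)) * 0.2 + 0.4 for _ in range(20)]
noisy = [rng.random((32, 32)) for _ in range(20)]
X = np.array([glcm_features(p) for p in smooth + noisy])
y = np.array([0] * 20 + [1] * 20)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
acc = clf.score(X, y)
```

In practice one would use `skimage.feature.graycomatrix`/`graycoprops` over several offsets and angles and evaluate on a held-out split rather than on the training data.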
Rich probabilistic models for semantic labeling
The goal of this monograph is to explore the methods and applications of semantic labeling. Our contributions to this rapidly evolving topic concern particular aspects of modelling and inference in probabilistic models, and their applications in the interdisciplinary areas of computer vision, medical image processing and remote sensing.
State of the art of audio- and video based solutions for AAL
Working Group 3: Audio- and Video-based AAL Applications. It is a matter of fact that Europe is facing more and more crucial challenges regarding health and social care due to demographic change and the current economic context. The recent COVID-19 pandemic has stressed this situation even further, thus highlighting the need for taking action. Active and Assisted Living (AAL) technologies come as a viable approach to help face these challenges, thanks to the high potential they have in enabling remote care and support. Broadly speaking, AAL can be referred to as the use of innovative and advanced Information and Communication Technologies to create supportive, inclusive and empowering applications and environments that enable older, impaired or frail people to live independently and stay active longer in society. AAL capitalizes on the growing pervasiveness and effectiveness of sensing and computing facilities to supply the persons in need with smart assistance, by responding to their necessities of autonomy, independence, comfort, security and safety. The application scenarios addressed by AAL are complex, due to the inherent heterogeneity of the end-user population, their living arrangements, and their physical conditions or impairments. Despite aiming at diverse goals, AAL systems should share some common characteristics. They are designed to provide support in daily life in an invisible, unobtrusive and user-friendly manner. Moreover, they are conceived to be intelligent, to be able to learn and adapt to the requirements and requests of the assisted people, and to synchronise with their specific needs. Nevertheless, to ensure the uptake of AAL in society, potential users must be willing to use AAL applications and to integrate them in their daily environments and lives. In this respect, video- and audio-based AAL applications have several advantages, in terms of unobtrusiveness and information richness.
Indeed, cameras and microphones are far less obtrusive than the hindrance other wearable sensors may cause to one's activities. In addition, a single camera placed in a room can record most of the activities performed in the room, thus replacing many other non-visual sensors. Currently, video-based applications are effective in recognising and monitoring the activities, the movements, and the overall conditions of the assisted individuals, as well as in assessing their vital parameters (e.g., heart rate, respiratory rate). Similarly, audio sensors have the potential to become one of the most important modalities for interaction with AAL systems, as they can have a large sensing range, do not require physical presence at a particular location and are physically intangible. Moreover, relevant information about individuals' activities and health status can be derived from processing audio signals (e.g., speech recordings). Nevertheless, as the other side of the coin, cameras and microphones are often perceived as the most intrusive technologies from the viewpoint of the privacy of the monitored individuals. This is due to the richness of the information these technologies convey and the intimate settings where they may be deployed. Solutions able to ensure privacy preservation by context and by design, and to meet high legal and ethical standards, are in high demand. After the review of the current state of play and the discussion in GoodBrother, we may claim that the first solutions in this direction are starting to appear in the literature. A multidisciplinary debate among experts and stakeholders is paving the way towards AAL that ensures ergonomics, usability, acceptance and privacy preservation. The DIANA, PAAL, and VisuAAL projects are examples of this fresh approach.
This report provides the reader with a review of the most recent advances in audio- and video-based monitoring technologies for AAL. It has been drafted as a collective effort of WG3 to supply an introduction to AAL, its evolution over time and its main functional and technological underpinnings. In this respect, the report contributes to the field with the outline of a new generation of ethical-aware AAL technologies and a proposal for a novel comprehensive taxonomy of AAL systems and applications. Moreover, the report allows non-technical readers to gather an overview of the main components of an AAL system and how these function and interact with the end-users.
The report illustrates the state of the art of the most successful AAL applications and functions based on audio and video data, namely (i) lifelogging and self-monitoring, (ii) remote monitoring of vital signs, (iii) emotional state recognition, (iv) food intake monitoring, activity and behaviour recognition, (v) activity and personal assistance, (vi) gesture recognition, (vii) fall detection and prevention, (viii) mobility assessment and frailty recognition, and (ix) cognitive and motor rehabilitation. For these application scenarios, the report illustrates the state of play in terms of scientific advances, available products and research projects. The open challenges are also highlighted.
The report ends with an overview of the challenges, the hindrances and the opportunities posed by the uptake of AAL technologies in real-world settings. In this respect, the report illustrates the current procedural and technological approaches to cope with acceptability, usability and trust in AAL technology, by surveying strategies and approaches to co-design, to privacy preservation in video and audio data, to transparency and explainability in data processing, and to data transmission and communication. User acceptance and ethical considerations are also debated. Finally, the potential offered by the silver economy is reviewed.
Contributions to region-based image and video analysis: feature aggregation, background subtraction and description constraining
Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Tecnología Electrónica y de las Comunicaciones. Defence date: 22-01-2016. Full-text access to this thesis is embargoed until 22-07-2017.
The use of regions for image and video analysis has traditionally been motivated by their ability to reduce the number of processed units and, hence, the number of required decisions. However, as we explore in this thesis, this is just one of the potential advantages that regions may provide. When dealing with regions, two description spaces may be differentiated: the decision space, on which regions are shaped (region segmentation), and the feature space, on which regions are used for analysis (region-based applications). These two spaces are highly related: the solutions taken in the decision space severely affect performance in the feature space. Accordingly, in this thesis we propose contributions in both spaces. The contributions to region segmentation are two-fold. Firstly, we give a twist to a classical region segmentation technique, the Mean-Shift, by exploring new solutions to automatically set the spectral kernel bandwidth. Secondly, we propose a method to describe the micro-texture of a pixel neighbourhood using an easily customisable filter-bank methodology based on the discrete cosine transform (DCT). The rest of the thesis is devoted to region-based approaches to several highly topical issues in computer vision; two broad tasks are explored: background subtraction (BS) and local descriptors (LD). Concerning BS, regions are used as complementary cues to refine pixel-based BS algorithms: by providing illumination-robust cues and by storing the background dynamics in a region-driven background model. Regarding LD, the region is used to reshape the description area, which is usually fixed, of local descriptors. Region-masked versions of classical two-dimensional and three-dimensional local descriptions are designed. The resulting descriptions are proposed for the task of object identification under a novel neural-oriented strategy. Furthermore, a local description scheme based on a fuzzy use of the region membership is derived. This characterisation scheme has been geometrically adapted to account for projective deformations, providing a suitable tool for finding corresponding points in wide-baseline scenarios. Experiments have been conducted for every contribution, discussing the potential benefits and the limitations of the proposed schemes. Overall, the obtained results suggest that the region, conditioned on successful aggregation processes, is a reliable and useful tool to extrapolate pixel-level results, reduce semantic noise, isolate significant object cues and constrain local descriptions. The methods and approaches described in this thesis present alternative or complementary solutions to pixel-based image processing. This work was partially supported by the Spanish Government through its FPU grant programme and the projects TEC2007-65400 (SemanticVideo), TEC2011-25995 (EventVideo) and TEC2014-53176-R (HAVideo); the European Commission (IST-FP6-027685, Mesh); the Comunidad de Madrid (S-0505/TIC-0223, ProMultiDis-CM); and the Spanish Administration Agency CENIT 2007-1007 (VISION).
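The DCT-based micro-texture description mentioned above can be sketched as a filter bank given by the rows of the orthonormal DCT-II matrix. The patch size and the choice to drop the DC coefficient are illustrative assumptions, not the thesis's exact design:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix; its rows act as a 1-D filter bank."""
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0] /= np.sqrt(2.0)
    return m

def micro_texture(patch):
    """Describe a pixel neighbourhood by the magnitudes of its 2-D DCT
    coefficients; discarding the DC term makes the descriptor invariant
    to additive illumination offsets."""
    m = dct_matrix(patch.shape[0])
    coeffs = m @ patch @ m.T           # separable 2-D DCT as a filter bank
    return np.abs(coeffs).ravel()[1:]  # drop the DC coefficient

rng = np.random.default_rng(0)
patch = rng.random((8, 8))
desc = micro_texture(patch)
```

Because the DCT is linear and a constant image projects only onto the DC basis function, adding a brightness offset to the patch leaves the descriptor unchanged, which is one reason such filter banks make convenient illumination-robust texture cues.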
Single View 3D Reconstruction using Deep Learning
One of the major challenges in the field of Computer Vision has been the reconstruction of a 3D object or scene from a single 2D image. While there are many notable examples, traditional methods for single view reconstruction often fail to generalise due to the presence of many brittle hand-crafted engineering solutions, limiting their applicability to real-world problems. Recently, deep learning has taken over the field of Computer Vision, and "learning to reconstruct" has become the dominant technique for addressing the limitations of traditional methods when performing single view 3D reconstruction. Deep learning allows our reconstruction methods to learn generalisable image features and monocular cues that would otherwise be difficult to engineer through ad-hoc hand-crafted approaches. However, it can often be difficult to efficiently integrate the various 3D shape representations within the deep learning framework. In particular, 3D volumetric representations can be adapted to work with Convolutional Neural Networks, but they are computationally expensive and memory inefficient when using local convolutional layers. Also, the successful learning of generalisable feature representations for 3D reconstruction requires large amounts of diverse training data. In practice, this is challenging for 3D training data, as it entails a costly and time-consuming manual data collection and annotation process. Researchers have attempted to address these issues by utilising self-supervised learning and generative modelling techniques; however, these approaches often produce suboptimal results when compared with models trained on larger datasets. This thesis addresses several key challenges incurred when using deep learning for "learning to reconstruct" 3D shapes from single view images.
We observe that it is possible to learn a compressed representation for multiple categories of the 3D ShapeNet dataset, improving the computational and memory efficiency when working with 3D volumetric representations. To address the challenge of data acquisition, we leverage deep generative models to "hallucinate" hidden or latent novel viewpoints for a given input image. Combining these images with depths estimated by a self-supervised depth estimator and the known camera properties allowed us to reconstruct textured 3D point clouds without any ground-truth 3D training data. Furthermore, we show that it is possible to improve upon the previous self-supervised monocular depth estimator by adding self-attention and a discrete volumetric representation, significantly improving accuracy on the KITTI 2015 dataset and enabling uncertainty estimates for depth predictions. Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 202
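The "discrete volumetric representation" for depth can be illustrated with a minimal sketch: each pixel predicts a categorical distribution over depth bins, whose expectation gives the depth and whose variance gives a simple per-pixel uncertainty. The bin range, bin count, and variance-based uncertainty are assumptions for illustration, not the thesis's exact design:

```python
import numpy as np

def expected_depth(logits, d_min=1.0, d_max=80.0):
    """Discrete volumetric depth head: softmax over K depth bins per pixel;
    depth is the expectation of the bin centres under that distribution,
    and the distribution's variance serves as an uncertainty estimate."""
    k = logits.shape[-1]
    centres = np.linspace(d_min, d_max, k)
    p = np.exp(logits - logits.max(axis=-1, keepdims=True))  # stable softmax
    p /= p.sum(axis=-1, keepdims=True)
    depth = (p * centres).sum(axis=-1)
    var = (p * (centres - depth[..., None]) ** 2).sum(axis=-1)
    return depth, var

# Uniform logits: maximum uncertainty, depth at the middle of the range.
depth_u, var_u = expected_depth(np.zeros((2, 2, 8)))
# Logits sharply peaked on one bin: confident prediction, low variance.
peaked = np.full((2, 2, 8), -50.0)
peaked[..., 3] = 50.0
depth_p, var_p = expected_depth(peaked)
```

Taking an expectation over bins keeps the prediction differentiable (unlike an argmax over bins), which is what lets such a head be trained end-to-end inside a depth network.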
Sensing and Signal Processing in Smart Healthcare
In the last decade, we have witnessed the rapid development of electronic technologies that are transforming our daily lives. Such technologies are often integrated with various sensors that facilitate the collection of human motion and physiological data, and are equipped with wireless communication modules such as Bluetooth, radio frequency identification, and near-field communication. In smart healthcare applications, designing ergonomic and intuitive human–computer interfaces is crucial, because a system that is not easy to use will create a huge obstacle to adoption and may significantly reduce the efficacy of the solution. Signal and data processing is another important consideration in smart healthcare applications, because it must ensure high accuracy with a high level of confidence in order for the applications to be useful to clinicians in making diagnosis and treatment decisions. This Special Issue is a collection of 10 articles selected from a total of 26 contributions. These contributions span the areas of signal processing and smart healthcare systems, mostly contributed by authors from Europe, including Italy, Spain, France, Portugal, Romania, Sweden, and the Netherlands. Authors from China, Korea, Taiwan, Indonesia, and Ecuador are also included.