Local depth patterns for fine-grained activity recognition in depth videos
© 2016 IEEE. Fine-grained activities are human activities involving small objects and small movements. Automatic recognition of such activities can prove useful for many applications, including detailed diarization of meetings and training sessions, assistive human-computer interaction and robotics interfaces. Existing approaches to fine-grained activity recognition typically leverage the combined use of multiple sensors including cameras, RFID tags, gyroscopes and accelerometers borne by the monitored people and target objects. Although effective, the downside of these solutions is that they require minute instrumentation of the environment that is intrusive and hard to scale. To this end, this paper investigates fine-grained activity recognition in a kitchen setting by solely using a depth camera. The primary contribution of this work is an aggregated depth descriptor that effectively captures the shape of the objects and the actors. Experimental results over the challenging '50 Salads' dataset of kitchen activities show an accuracy comparable to that of a state-of-the-art approach based on multiple sensors, thereby validating a less intrusive and more practical way of monitoring fine-grained activities
Prototype-based budget maintenance for tracking in depth videos
© 2016, Springer Science+Business Media New York. The use of conventional video tracking based on color or gray-level videos often raises concerns about the privacy of the tracked targets. To alleviate this issue, this paper presents a novel tracker that operates solely from depth data. The proposed tracker is designed as an extension of the popular Struck algorithm which leverages the effective framework of structural SVM. The main contributions of our paper are: i) a dedicated depth feature based on local depth patterns, ii) a heuristic for handling view occlusions in depth frames, and iii) a technique for keeping the number of the support vectors within a given "budget" so as to limit computational costs. Experimental results over the challenging Princeton Tracking Benchmark (PTB) dataset report a remarkable accuracy compared to the original Struck tracker and other state-of-the-art trackers using depth and RGB data
Joint action recognition and summarization by sub-modular inference
© 2016 IEEE. Action recognition and video summarization are two important multimedia tasks that are useful for applications such as video indexing and retrieval, video surveillance, human-computer interaction and home intelligence. While many approaches exist in the literature for these two tasks, to date they have always been addressed separately. Instead, in this paper we move from the assumption that these two tasks should be tackled as a joint objective: on the one hand, action recognition can drive the selection of meaningful and informative summaries; on the other, recognizing actions from a summary rather than the entire video can in principle reduce noise and prove more accurate. To this aim, we propose a novel approach for joint action recognition-summarization based on the performing latent structural SVM framework, together with an efficient algorithm for inferring the action and the summary based on the property of sub-modularity. Experimental results on a challenging benchmark, MSR DailyActivity3D, show that the approach is capable of achieving remarkable action recognition accuracy while providing appealing video summaries
The use of privacy-protected computer vision to measure the quality of healthcare worker hand hygiene
© 2018 The Author(s). Objectives: (i) To demonstrate the feasibility of automated, direct observation and collection of hand hygiene data, (ii) to develop computer visual methods capable of reporting compliance with moment 1 (the performance of hand hygiene before touching a patient) and (iii) to report the diagnostic accuracy of automated, direct observation of moment 1. Design: Observation of simulated hand hygiene encounters between a healthcare worker and a patient. Setting: Computer laboratory in a university. Participants: Healthy volunteers. Main outcome measures: Sensitivity and specificity of automatic detection of the first moment of hand hygiene. Methods: We captured video and depth images using a Kinect camera and developed computer visual methods to automatically detect the use of alcohol-based hand rub (ABHR), rubbing together of hands and subsequent contact of the patient by the healthcare worker using depth imagery. Results: We acquired images from 18 different simulated hand hygiene encounters where the healthcare worker complied with the first moment of hand hygiene, and 8 encounters where they did not. The diagnostic accuracy of determining that ABHR was dispensed and that the patient was touched was excellent (sensitivity 100%, specificity 100%). The diagnostic accuracy of determining that the hands were rubbed together after dispensing ABHR was good (sensitivity 83%, specificity 88%). Conclusions: We have demonstrated that it is possible to automate the direct observation of hand hygiene performance in a simulated clinical setting. We used cheap, widely available consumer technology and depth imagery which potentially increases clinical application and decreases privacy concerns
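As a rough check on the hand-rubbing figures above, the following sketch computes sensitivity and specificity from confusion counts. The counts used (15 of 18 and 7 of 8) are an assumption, not reported data: they are the whole-number counts that reproduce 83% and 88% if the 18 compliant encounters are treated as positives and the 8 non-compliant ones as negatives, which the abstract does not state explicitly.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of actual positives that were detected."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of actual negatives that were correctly rejected."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts (assumed, not taken from the paper):
# 18 compliant encounters as positives, 8 non-compliant as negatives.
rub_sensitivity = sensitivity(true_pos=15, false_neg=3)
rub_specificity = specificity(true_neg=7, false_pos=1)

print(f"sensitivity {rub_sensitivity:.1%}, specificity {rub_specificity:.1%}")
# prints: sensitivity 83.3%, specificity 87.5%
```

Under these assumed counts, the values round to the 83% and 88% quoted in the abstract.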
The Relationships between Cognitive Styles and Creativity: The Role of Field Dependence-Independence on Visual Creative Production
Previous studies explored the relationships between field dependent-independent cognitive style (FDI) and creativity, providing misleading and unclear results. The present research explored this problematic interplay through the lens of the Geneplore model, employing a product oriented task: the Visual Creative Synthesis Task (VCST). The latter requires creating objects belonging to pre-established categories, starting from triads of visual components and consists of two steps: the preinventive phase and the inventive phase. Following the Amabile's consensual assessment technique, three independent judges evaluated preinventive structures in terms of originality and synthesis whereas inventions were evaluated in terms of originality and appropriateness. The Embedded Figure Test (EFT) was employed in order to measure the individual's predisposition toward the field dependence or the field independence. Sixty undergraduate college students (31 females) took part in the experiment. Results revealed that field independent individuals outperformed field dependent ones in each of the four VCST scores, showing higher levels of creativity. Results were discussed in light of the better predisposition of field independent individuals in mental imagery, mental manipulation of abstract objects, as well as in using their knowledge during complex tasks that require creativity. Future research directions were also discussed
The neural correlates of orienting to walking direction in 6-month-old infants: an ERP study
The ability to detect social signals represents a first step to enter our social world. Behavioral evidence has demonstrated that 6-month-old infants are able to orient their attention toward the position indicated by walking direction, showing faster orienting responses toward stimuli cued by the direction of motion than toward uncued stimuli. The present study investigated the neural mechanisms underpinning this attentional priming effect by using a spatial cueing paradigm and recording EEG (Geodesic System 128 channels) from 6-month-old infants. Infants were presented with a central point-light walker followed by a single peripheral target. The target appeared randomly at a position either congruent or incongruent with the walking direction of the cue. We examined infants' target-locked event-related potential (ERP) responses and we used cortical source analysis to explore which brain regions gave rise to the ERP responses. The P1 component and saccade latencies toward the peripheral target were modulated by the congruency between the walking direction of the cue and the position of the target. Infants' saccade latencies were faster in response to targets appearing at congruent spatial locations. The P1 component was larger in response to congruent than to incongruent targets and a similar congruency effect was found with cortical source analysis in the parahippocampal gyrus and the anterior fusiform gyrus. Overall, these findings suggest that a type of biological motion like the one of a vertebrate walking on the legs can trigger covert orienting of attention in 6-month-old infants, enabling enhancement of neural activity related to visual processing of potentially relevant information as well as a facilitation of oculomotor responses to stimuli appearing at the attended location
Well-M³N: A Maximum-Margin Approach to Unsupervised Structured Prediction
Unsupervised structured prediction is of fundamental importance for the clustering and classification of unannotated structured data. To date, its most common approach still relies on the use of structural probabilistic models and the expectation-maximization (EM) algorithm. Conversely, structural maximum-margin approaches, despite their extensive success in supervised and semi-supervised classification, have not raised equivalent attention in the unsupervised case. For this reason, in this paper we propose a novel approach that extends the maximum-margin Markov networks (M3N) to an unsupervised training framework. The main contributions of our extension are new formulations for the feature map and loss function of M3N that decouple the labels from the measurements and support multiple ground-truth training. Experiments on two challenging segmentation datasets have achieved competitive accuracy and generalization compared to other unsupervised algorithms such as k-means, EM and unsupervised structural SVM, and comparable performance to a contemporary deep learning-based approach
Improving Adversarial Text Generation with n-Gram Matching
In the past few years, generative adversarial networks (GANs) have become increasingly important in natural language generation. However, their performance seems to still have a significant margin for improvement. For this reason, in this paper we propose a new adversarial training method that tackles some of the limitations of GAN training in unconditioned generation tasks. In addition to the commonly used reward signal from the discriminator, our approach leverages another reward signal which is based on the occurrence of n-gram matches between the generated sentences and the training corpus. Thanks to the inherent correlation of this reward signal with the commonly used evaluation metrics such as BLEU, our approach implicitly bridges the gap between the objectives used during training and inference. To circumvent the non-differentiability issues associated with a discrete objective, our approach leverages the reinforcement learning policy gradient theorem. Our experimental results show that the model trained with mixed rewards from both n-gram matching and the discriminator has been able to outperform other GAN-based models in terms of BLEU score and quality-diversity trade-off at a parity of computational budget
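The n-gram matching reward described above might be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the choice of scoring a sentence by the fraction of its n-grams found in the training corpus, and the convex mixing weight in `mixed_reward` are all assumptions made for the example.

```python
from typing import List, Sequence, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: Sequence[str], n: int) -> List[NGram]:
    """All contiguous n-grams of a token sequence, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_ngram_set(corpus: Sequence[Sequence[str]], n: int) -> Set[NGram]:
    """Set of every n-gram occurring in the (tokenized) training corpus."""
    return {gram for sentence in corpus for gram in ngrams(sentence, n)}


def ngram_match_reward(generated: Sequence[str],
                       corpus_ngrams: Set[NGram], n: int) -> float:
    """Assumed reward: fraction of generated n-grams seen in the corpus."""
    grams = ngrams(generated, n)
    if not grams:
        return 0.0
    return sum(gram in corpus_ngrams for gram in grams) / len(grams)


def mixed_reward(disc_score: float, match_score: float,
                 weight: float = 0.5) -> float:
    """Hypothetical convex mix of discriminator and n-gram rewards."""
    return weight * disc_score + (1.0 - weight) * match_score


corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
bigram_set = build_ngram_set(corpus, n=2)
match = ngram_match_reward(["the", "cat", "ran"], bigram_set, n=2)
print(match)  # prints: 0.5 (one of the two generated bigrams is in the corpus)
```

In a policy-gradient setup, a scalar such as `mixed_reward(disc_score, match)` would weight the log-probability of the sampled sentence; how the two signals are actually combined in the paper is not specified in the abstract.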
A simulated annealing-based maximum-margin clustering algorithm
© 2018 Wiley Periodicals, Inc. Maximum-margin clustering is an extension of the support vector machine (SVM) to clustering. It partitions a set of unlabeled data into multiple groups by finding hyperplanes with the largest margins. Although existing algorithms have shown promising results, there is no guarantee of convergence of these algorithms to global solutions due to the nonconvexity of the optimization problem. In this paper, we propose a simulated annealing-based algorithm that is able to mitigate the issue of local minima in the maximum-margin clustering problem. The novelty of our algorithm is twofold, i.e., (i) it comprises a comprehensive cluster modification scheme based on simulated annealing, and (ii) it introduces a new approach based on the combination of k-means++ and SVM at each step of the annealing process. More precisely, k-means++ is initially applied to extract subsets of the data points. Then, an unsupervised SVM is applied to improve the clustering results. Experimental results on various benchmark data sets (of up to over a million points) give evidence that the proposed algorithm is more effective at solving the clustering problem than a number of popular clustering algorithms
Does spatial locative comprehension predict landmark-based navigation?
In the present study we investigated the role of spatial locative comprehension in learning and retrieving pathways when landmarks were available and when they were absent in a sample of typically developing 6- to 11-year-old children. Our results show that the more proficient children are in understanding spatial locatives the more they are able to learn pathways, retrieve them after a delay and represent them on a map when landmarks are present in the environment. These findings suggest that spatial language is crucial when individuals rely on sequences of landmarks to drive their navigation towards a given goal but that it is not involved when navigational representations based on the geometrical shape of the environment or the coding of body movements are sufficient for memorizing and recalling short pathways