Search CORE


Recognising Complex Activities with Histograms of Relative Tracklets

Author: Sebastian Stein
Stephen J. McKenna
Publication venue: Elsevier BV
Publication date: 01/09/2016

One approach to the recognition of complex human activities is to use feature descriptors that encode visual interactions by describing properties of local visual features with respect to trajectories of tracked objects. We explore an example of such an approach in which dense tracklets are described relative to multiple reference trajectories, providing a rich representation of complex interactions between objects of which only a subset can be tracked. Specifically, we report experiments in which reference trajectories are provided by tracking inertial sensors in a food preparation scenario. Additionally, we provide baseline results for HOG, HOF and MBH, and combine these features with others for multi-modal recognition. The proposed histograms of relative tracklets (RETLETS) showed better activity recognition performance than dense tracklets, HOG, HOF, MBH, or their combination. Our comparative evaluation of features from accelerometers and video highlighted a performance gap between visual and accelerometer-based motion features and showed a substantial performance gain when combining features from these sensor modalities. A considerable further performance gain was observed in combination with RETLETS and reference tracklet features.
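To make the idea of describing a tracklet relative to a reference trajectory concrete, here is a toy Python sketch. It assumes each tracklet and each reference trajectory is a list of (x, y) image positions over the same frames, bins the direction of frame-to-frame relative motion into a magnitude-weighted orientation histogram, and concatenates one histogram per reference. The function names and the binning scheme are hypothetical illustrations, not the RETLETS descriptor defined in the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def relative_offsets(tracklet: Sequence[Point], reference: Sequence[Point]) -> List[Point]:
    """Express each tracklet point relative to the reference position in the same frame."""
    if len(tracklet) != len(reference):
        raise ValueError("tracklet and reference must cover the same frames")
    return [(px - rx, py - ry) for (px, py), (rx, ry) in zip(tracklet, reference)]


def relative_histogram(
    tracklet: Sequence[Point], reference: Sequence[Point], n_angle_bins: int = 8
) -> List[float]:
    """Magnitude-weighted histogram of relative-motion directions (hypothetical binning)."""
    offsets = relative_offsets(tracklet, reference)
    hist = [0.0] * n_angle_bins
    for (x0, y0), (x1, y1) in zip(offsets, offsets[1:]):
        dx, dy = x1 - x0, y1 - y0
        mag = math.hypot(dx, dy)
        if mag == 0.0:
            continue  # no motion relative to this reference in this step
        angle = math.atan2(dy, dx) % (2 * math.pi)
        b = int(angle / (2 * math.pi) * n_angle_bins) % n_angle_bins
        hist[b] += mag
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def multi_reference_descriptor(
    tracklet: Sequence[Point], references: Sequence[Sequence[Point]], n_angle_bins: int = 8
) -> List[float]:
    """Concatenate one relative histogram per reference trajectory."""
    descriptor: List[float] = []
    for ref in references:
        descriptor.extend(relative_histogram(tracklet, ref, n_angle_bins))
    return descriptor
```

Because offsets are taken frame by frame, motion shared by the tracklet and a reference (for example, a tracklet on a hand that moves together with a tracked sensor) cancels out, so only motion relative to that reference contributes to its histogram.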

Elsevier - Publisher Connector

Crossref

Enlighten

University of Dundee Online Publications

DeepMatching: Hierarchical Deformable Dense Matching

Author: Harchaoui Zaid
Revaud Jerome
Schmid Cordelia
Weinzaepfel Philippe
Publication date: 08/10/2015

We introduce a novel matching algorithm, called DeepMatching, to compute dense correspondences between images. DeepMatching relies on a hierarchical, multi-layer, correlational architecture designed for matching images and inspired by deep convolutional approaches. The proposed matching algorithm can handle non-rigid deformations and repetitive textures, and efficiently determines dense correspondences in the presence of significant changes between images. We evaluate the performance of DeepMatching, in comparison with state-of-the-art matching algorithms, on the Mikolajczyk (Mikolajczyk et al. 2005), MPI-Sintel (Butler et al. 2012) and KITTI (Geiger et al. 2013) datasets. DeepMatching outperforms the state-of-the-art algorithms and shows excellent results, in particular for repetitive textures. We also propose a method for estimating optical flow, called DeepFlow, by integrating DeepMatching into the large displacement optical flow (LDOF) approach of Brox and Malik (2011). Compared to existing matching algorithms, our matching approach provides additional robustness to large displacements and complex motion. DeepFlow obtains competitive performance on public benchmarks for optical flow estimation.

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Real-Time Purchase Prediction Using Retail Video Analytics

Author: Ghose Anindya
Li Beibei
Li Rubing
Xu Kaiquan
Publication venue: AIS Electronic Library (AISeL)
Publication date: 12/12/2022

The proliferation of video data in retail marketing brings opportunities for researchers to study customer behavior using rich video information. Our study demonstrates how to understand multiple dimensions of customer behavior using video analytics at scale. We obtained unique video footage from in-store cameras, covering approximately 20,000 customers and over 6,000 recorded payments. Using state-of-the-art computer vision techniques, we extracted features on the demographic, appearance, emotion, and contextual dimensions of customer behavior from the video, and proposed a novel framework using machine learning and deep learning models to predict consumer purchase decisions. Results showed that our framework makes accurate predictions and indicated the importance of incorporating emotional response into prediction. Our findings reveal multi-dimensional drivers of purchase decisions and provide an implementable video analytics tool for marketers. The work also points to the possibility of personalized recommendations that would potentially integrate our framework into the omnichannel landscape.

AIS Electronic Library (AISeL)

The Wits intelligent teaching system (WITS): a smart lecture theatre to assess audience engagement

Author: Klein Richard
Publication date: 01/01/2017

A thesis submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Doctor of Philosophy, 2017.
The utility of lectures is directly related to the engagement of the students therein. To ensure the value of lectures, one needs to be certain that they are engaging to students. In small classes, experienced lecturers develop an intuition of how engaged the class is as a whole and can then react appropriately to remedy the situation through various strategies such as breaks or changes in style, pace and content. As both the number of students and the size of the venue grow, this type of contingent teaching becomes increasingly difficult and less precise. Furthermore, relying on intuition alone gives no way to recall and analyse previous classes or to objectively investigate trends over time. To address these problems, this thesis presents the WITS INTELLIGENT TEACHING SYSTEM (WITS) to highlight disengaged students during class. A web-based mobile application called Engage was developed to try to elicit anonymous engagement information directly from students. However, the majority of students were unwilling or unable to self-report their engagement levels during class, owing to a number of cultural and practical issues related to social display rules, unreliable internet connections, data costs, and distractions. This result highlights the need for a non-intrusive system that does not require the active participation of students, so a non-intrusive approach based on computer vision and machine learning is proposed. To support its development, a labelled video dataset of students was built by recording a number of first-year lectures. Students were labelled across a number of affects, including boredom, frustration, confusion, and fatigue, but poor inter-rater reliability meant that these labels could not be used as ground truth.
Based on manual coding methods identified in the literature, a number of actions, gestures, and postures were identified as proxies of behavioural engagement. These proxies are then used in an observational checklist to mark students as engaged or not. A Support Vector Machine (SVM) was trained on Histogram of Oriented Gradients (HOG) features to classify the students based on the identified behaviours. The results suggest a high temporal correlation between a single subject's video frames, which leads to extremely high accuracies on seen subjects. However, this approach generalised poorly to unseen subjects, and more careful feature engineering is required. The use of Convolutional Neural Networks (CNNs) improved the classification accuracy substantially, both for a single subject and when generalising to unseen subjects. While more computationally expensive than the SVM, the CNN approach lends itself to parallelism on Graphics Processing Units (GPUs). With GPU hardware acceleration, the system is able to run in near real time, and with further optimisations a real-time classifier is feasible. The classifier provides engagement values, which can be displayed to the lecturer live during class as an Interest Map that highlights spatial areas of disengagement. The lecturer can then make informed decisions about how to progress with the class, which teaching styles to employ, and on which students to focus. An Interest Map was presented to lecturers and professors at the University of the Witwatersrand, yielding 131 responses. The vast majority of respondents indicated that they would like to receive live engagement feedback during class, that they found the Interest Map an intuitive visualisation tool, and that they would be interested in using such technology.
Contributions of this thesis include the development of a labelled video dataset; the development of a web-based system to allow students to self-report engagement; the development of cross-platform, open-source software for spatial, action and affect labelling; the application of Histogram of Oriented Gradients-based Support Vector Machines and Deep Convolutional Neural Networks to classify this data; the development of an Interest Map to intuitively display engagement information to presenters; and finally an analysis of the acceptance of such a system by educators.
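The abstract attributes the gap between seen and unseen subjects to strong temporal correlation among one subject's frames. One common way to measure generalisation to unseen subjects is a subject-wise split such as leave-one-subject-out, sketched below in Python under the assumption that every frame carries a subject identifier. The thesis does not necessarily use this exact protocol, and the function name and example labels are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_subject_out(
    subject_ids: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_subject, train_indices, test_indices) for each subject.

    Every frame of the held-out subject goes to the test set, so correlated
    frames from one person never appear on both sides of the split.
    """
    for held_out in sorted(set(subject_ids)):
        train = [i for i, sid in enumerate(subject_ids) if sid != held_out]
        test = [i for i, sid in enumerate(subject_ids) if sid == held_out]
        yield held_out, train, test


if __name__ == "__main__":
    frames = ["s1", "s1", "s2", "s3", "s2"]  # hypothetical subject label per frame
    for subject, train, test in leave_one_subject_out(frames):
        print(subject, train, test)
    # s1 [2, 3, 4] [0, 1]
    # s2 [0, 1, 3] [2, 4]
    # s3 [0, 1, 2, 4] [3]
```

A random frame-level split, by contrast, can place neighbouring, near-identical frames of the same student in both the training and test sets, which tends to inflate accuracy in the way the "seen subject" results above suggest.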

Wits Institutional Repository on DSPACE

Development of artificial neural network-based object detection algorithms for low-cost hardware devices

Author: De Benito Picazo José Jesús
Publication venue: UMA Editorial
Publication date: 21/07/2021

The human brain is the most complex, powerful and versatile learning machine ever known. Consequently, many scientists from various disciplines are fascinated by its structures and information-processing methods. Because of the quality and quantity of information extracted from the sense of sight, images are one of the main information channels used by humans. However, the massive amount of video footage generated nowadays makes it difficult to process those data manually fast enough. Computer vision systems therefore represent a fundamental tool for extracting information from digital images, as well as a major challenge for scientists and engineers. This thesis' primary objective is automatic foreground object detection and classification through digital image analysis, using artificial neural network-based techniques specifically designed and optimised for deployment on low-cost hardware devices. This objective is complemented by developing methods for estimating individuals' movement using unsupervised learning and artificial neural network-based models. These objectives have been addressed through research work presented in four publications supporting this thesis. The first was published in the “ICAE” journal in 2018 and consists of a neural network-based movement detection system for Pan-Tilt-Zoom (PTZ) cameras deployed on a Raspberry Pi board. The second was published at the “WCCI” conference in 2018 and consists of a deep learning-based automatic video surveillance system for PTZ cameras deployed on low-cost hardware.
The third was published in the “ICAE” journal in 2020 and consists of an anomalous foreground object detection and classification system for panoramic cameras, based on deep learning and supported by low-cost hardware. Finally, the fourth was published at the “WCCI” conference in 2020 and consists of an algorithm for estimating individuals' positions, based on a novel neural network model for environments with forbidden regions, named “Forbidden Regions Growing Neural Gas”.

Repositorio Institucional Universidad de Málaga

Recent Advances in Deep Learning Techniques for Face Recognition

Author: Al-rakhami Mabrook S.
Fime Awal Ahmed
Fuad Md. Tahmid Hasan
Fuad Mohtasim
Gumae Abdu
Iftee Md. Akil Raihan
Islam Md. Nazrul
Rabbi Jakaria
Sen Ovishake
Sikder Delowar
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2021

In recent years, researchers have proposed many deep learning (DL) methods for various tasks, and face recognition (FR) in particular has made an enormous leap using these techniques. Deep FR systems benefit from the hierarchical architecture of DL methods to learn discriminative face representations. As a result, DL techniques have significantly improved state-of-the-art performance of FR systems and encouraged diverse and efficient real-world applications. In this paper, we present a comprehensive analysis of various FR systems that leverage different types of DL techniques, summarizing 168 recent contributions from this area. We discuss papers related to different algorithms, architectures, loss functions, activation functions, datasets, challenges, improvement ideas, and current and future trends of DL-based FR systems. We provide a detailed discussion of various DL methods to understand the current state of the art, and then we discuss various activation and loss functions for the methods. Additionally, we summarize datasets widely used for FR tasks and discuss challenges related to illumination, expression, pose variations, and occlusion. Finally, we discuss improvement ideas and current and future trends of FR tasks.
Comment: 32 pages. Citation: M. T. H. Fuad et al., "Recent Advances in Deep Learning Techniques for Face Recognition," in IEEE Access, vol. 9, pp. 99112-99142, 2021, doi: 10.1109/ACCESS.2021.309613

arXiv.org e-Print Archive

Directory of Open Access Journals