Fuzzy Logic in Surveillance Big Video Data Analysis: Comprehensive Review, Challenges, and Research Directions
CCTV cameras installed for continuous surveillance generate enormous amounts of data daily, giving rise to the term “Big Video Data” (BVD). Practical uses of BVD include intelligent surveillance and activity recognition, among other challenging tasks. To address these tasks efficiently, the computer vision research community has provided monitoring systems, activity recognition methods, and many other computationally complex solutions for the purposeful usage of BVD. Unfortunately, the limited capabilities of these methods, their high computational complexity, and stringent installation requirements hinder their practical implementation in real-world scenarios, which still demand human operators sitting in front of cameras to monitor activities or make actionable decisions based on BVD. Human-like logic, known as fuzzy logic, has been increasingly employed for various data science applications such as control systems, image processing, decision making, routing, and advanced safety-critical systems. This is due to its ability to handle various sources of real-world domain and data uncertainty, generating easily adaptable and explainable data-based models. Fuzzy logic can be effectively used for surveillance as a complement to huge artificial intelligence models and their tiresome training procedures. In this paper, we draw researchers’ attention towards the usage of fuzzy logic for surveillance in the context of BVD. We carry out a comprehensive literature survey of methods for vision sensory data analytics that resort to fuzzy logic concepts. Our overview highlights the advantages, downsides, and challenges of existing fuzzy-logic-based video analysis methods for surveillance applications. We enumerate and discuss the datasets used by these methods, and finally provide an outlook towards future research directions derived from our critical assessment of the efforts invested so far in this exciting field.
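To make the appeal of fuzzy logic concrete, here is a minimal, illustrative Python sketch (not taken from the survey) of how a tiny Sugeno-style rule base could map two uncertain scene measurements to an alert level without any trained model. The variables, membership functions, and rules are assumptions for illustration only.

```python
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def alert_level(motion, density):
    """Map motion intensity and crowd density in [0, 1] to an alert level in
    [0, 1] via a tiny Sugeno-style fuzzy rule base (illustrative only)."""
    low_m, high_m = tri(motion, -0.5, 0.0, 0.6), tri(motion, 0.4, 1.0, 1.5)
    low_d, high_d = tri(density, -0.5, 0.0, 0.6), tri(density, 0.4, 1.0, 1.5)
    # Each rule: (firing strength, consequent alert value)
    rules = [
        (min(high_m, high_d), 1.0),   # heavy motion in a crowd -> high alert
        (min(high_m, low_d), 0.6),    # heavy motion, sparse scene -> medium
        (min(low_m, high_d), 0.4),    # static crowd -> mild interest
        (min(low_m, low_d), 0.0),     # quiet scene -> no alert
    ]
    num = sum(w * v for w, v in rules)
    den = sum(w for w, _ in rules)
    return num / den if den else 0.0  # weighted-average defuzzification

print(alert_level(motion=0.9, density=0.8))  # -> 1.0
print(alert_level(motion=0.1, density=0.2))  # -> 0.0
```

Because the rules are human-readable, the resulting model stays explainable and easy to adapt, which is precisely the property the survey highlights.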
Multimodal Data Analysis of Dyadic Interactions for an Automated Feedback System Supporting Parent Implementation of Pivotal Response Treatment
Parents fulfill a pivotal role in early childhood development of social and communication
skills. In children with autism, the development of these skills can be delayed. Applied
behavioral analysis (ABA) techniques have been created to aid in skill acquisition.
Among these, pivotal response treatment (PRT) has been empirically shown to foster
improvements. Research into PRT implementation has also shown that parents can be
trained to be effective interventionists for their children. The current difficulty in PRT
training is how to disseminate training to parents who need it, and how to support and
motivate practitioners after training.
Evaluation of the parents’ fidelity to implementation is often undertaken using video
probes that depict the dyadic interaction occurring between the parent and the child during
PRT sessions. These videos are time-consuming for clinicians to process, and often result
in only minimal feedback for the parents. Current trends in technology could be utilized to
alleviate the manual cost of extracting data from the videos, affording greater
opportunities for providing clinician created feedback as well as automated assessments.
The naturalistic context of the video probes along with the dependence on ubiquitous
recording devices creates a difficult scenario for classification tasks. The domain of the
PRT video probes can be expected to have high levels of both aleatory and epistemic
uncertainty. Addressing these challenges requires examination of the multimodal data
along with implementation and evaluation of classification algorithms. This is explored
through the use of a new dataset of PRT videos.
The relationship between the parent and the clinician is important. The clinician can
provide support and help build self-efficacy in addition to providing knowledge and
modeling of treatment procedures. Facilitating this relationship along with automated
feedback not only provides the opportunity to present expert feedback to the parent, but
also allows the clinician to aid in personalizing the classification models. By utilizing a
human-in-the-loop framework, clinicians can aid in addressing the uncertainty in the
classification models by providing additional labeled samples. This will allow the system
to improve classification and provides a person-centered approach to extracting
multimodal data from PRT video probes.
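As a rough illustration of the human-in-the-loop idea described above, the Python sketch below simulates an uncertainty-sampling loop: the classifier flags its least-confident segments for expert labeling and is refit. The synthetic feature vectors, the `clinician_label` stand-in, and the logistic-regression model are hypothetical; the dissertation's actual multimodal pipeline is not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def clinician_label(x):
    """Stand-in for an expert judgment on one video segment (hypothetical)."""
    return int(x.sum() > 0)  # simulated ground truth, for the demo only

# Tiny synthetic feature vectors: a labeled seed set and an unlabeled pool.
X_seed = rng.normal(size=(20, 4))
y_seed = np.array([clinician_label(x) for x in X_seed])
X_pool = rng.normal(size=(200, 4))

model = LogisticRegression().fit(X_seed, y_seed)
for _ in range(3):                                  # a few feedback rounds
    proba = model.predict_proba(X_pool)
    margin = np.abs(proba[:, 1] - proba[:, 0])      # small margin = uncertain
    ask = np.argsort(margin)[:5]                    # 5 least-confident segments
    y_new = [clinician_label(x) for x in X_pool[ask]]  # expert supplies labels
    X_seed = np.vstack([X_seed, X_pool[ask]])
    y_seed = np.concatenate([y_seed, y_new])
    X_pool = np.delete(X_pool, ask, axis=0)
    model.fit(X_seed, y_seed)                       # refit with expert input
```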
An Investigation into the Performance of Ethnicity Verification Between Humans and Machine Learning Algorithms
There has been a significant increase in interest in the task of classifying
demographic profiles, i.e. race and ethnicity. Ethnicity is a significant human
characteristic, and applying facial image data to the discrimination of
ethnicity is integral to face-related biometric systems. Given the diversity in
the application of ethnicity-specific information, such as face recognition and
iris recognition, and the availability of image datasets for more commonly
studied human populations, i.e. Caucasians, African-Americans, Asians, and
South-Asian Indians, a gap has been identified: the development of a system
which analyses the full face and its individual feature components (eyes, nose,
and mouth) for the Pakistani ethnic group. An efficient system is proposed for the verification of the
Pakistani ethnicity, which incorporates a two-tier (computer vs human) approach.
Firstly, hand-crafted features were used to ascertain how descriptive frontal
and profile facial images are of the Pakistani ethnicity. A total of 26 facial
landmarks was selected (16 frontal and 10 profile), combined with two models
for redundant-information removal and a linear classifier for the binary task.
The experimental results concluded that the facial profile of a Pakistani face
is distinct amongst other ethnicities. However, the methodology had
limitations, for example low accuracy, the laborious nature of manual
facial-landmark annotation, and the small facial image dataset. To make the
system more accurate and robust, Deep Learning models
are employed for ethnicity classification. Various state-of-the-art Deep models
are trained on a range of facial image conditions, i.e. full face and partial-face
images, plus standalone feature components such as the nose and mouth. Since
ethnicity is pertinent to the research, a novel facial image database, entitled
the Pakistani Face Database (PFDB), was created using a criterion-specific
selection process to ensure confidence in each of the assigned class
memberships, i.e. Pakistani and Non-Pakistani. A comparative analysis of six
Deep Learning models was carried out on augmented image datasets, and it
demonstrated that Deep Learning yields better accuracy than low-level
features. The human phase of the ethnicity classification framework
tested the discrimination ability of novice Pakistani and Non-Pakistani
participants, using a computerised ethnicity task. The results suggest that
humans are better at discriminating between Pakistani and Non-Pakistani full
face images, relative to individual face-feature components (eyes, nose, mouth),
struggling the most with the nose when making judgements of ethnicity. To
understand the effects of display conditions on ethnicity discrimination
accuracy, two conditions were tested: (i) Two-Alternative Forced Choice (2-AFC) and (ii)
Single image procedure. The results concluded that participants perform
significantly better in trials where the target (Pakistani) image is shown alongside
a distractor (Non-Pakistani) image. To conclude the proposed framework,
directions for future study are suggested to advance the current understanding of
image-based ethnicity verification.
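As an illustration of the first tier described above, the following Python sketch assembles a landmark-based pipeline: flattened coordinates for 26 landmarks, a dimensionality-reduction step standing in for the redundant-information-removal models, and a linear classifier for the binary task. The synthetic data and the specific choices (PCA, a linear SVM) are assumptions, not the thesis's exact models.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 26 * 2))   # 26 (x, y) landmarks per face (synthetic)
y = rng.integers(0, 2, size=100)     # binary class membership (synthetic)

clf = make_pipeline(
    StandardScaler(),                # put landmark coordinates on one scale
    PCA(n_components=10),            # strip redundant information
    LinearSVC(),                     # linear classifier for the binary task
).fit(X, y)
print(clf.predict(X[:5]))
```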
WiFi-Based Human Activity Recognition Using Attention-Based BiLSTM
Recently, significant efforts have been made to explore human activity recognition (HAR) techniques that use information gathered by existing indoor wireless infrastructures through WiFi signals, without requiring the monitored subject to carry a dedicated device. The key intuition is that different activities introduce different multi-paths in WiFi signals and generate different patterns in the time series of channel state information (CSI). In this paper, we propose and evaluate a full pipeline for a CSI-based human activity recognition framework covering 12 activities in three different spatial environments, using two deep learning models: ABiLSTM and CNN-ABiLSTM. Evaluation experiments have demonstrated that the proposed models outperform state-of-the-art models. The experiments also show that the proposed models can be applied to other environments with different configurations, albeit with some caveats. The proposed ABiLSTM model achieves overall accuracies of 94.03%, 91.96%, and 92.59% across the three target environments, while the proposed CNN-ABiLSTM model reaches accuracies of 98.54%, 94.25%, and 95.09% across those same environments.
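For a sense of the architecture, here is a minimal PyTorch sketch of an attention-based BiLSTM of the kind described: BiLSTM states are pooled over time with learned attention weights before a 12-class head. The CSI dimensionality, layer sizes, and sequence length are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ABiLSTM(nn.Module):
    def __init__(self, csi_dim=90, hidden=128, num_classes=12):
        super().__init__()
        self.bilstm = nn.LSTM(csi_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)       # scores each time step
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):                          # x: (batch, time, csi_dim)
        h, _ = self.bilstm(x)                      # (batch, time, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)     # attention over time steps
        ctx = (w * h).sum(dim=1)                   # weighted temporal pooling
        return self.head(ctx)                      # (batch, num_classes)

logits = ABiLSTM()(torch.randn(4, 200, 90))       # 4 clips, 200 CSI frames
print(logits.shape)                               # torch.Size([4, 12])
```

The attention layer lets the classifier weight the CSI frames where the activity's multi-path signature is strongest, rather than relying only on the final LSTM state.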
Proceedings of the 1st Doctoral Consortium at the European Conference on Artificial Intelligence (DC-ECAI 2020)
1st Doctoral Consortium at the European Conference on
Artificial Intelligence (DC-ECAI 2020), 29-30 August, 2020
Santiago de Compostela, Spain

The DC-ECAI 2020 provides a unique opportunity for PhD students who are close to finishing their doctoral research to interact with experienced researchers in the field. Senior members of the community are assigned as mentors for each group of students based on the students’ research topics or similarity of research interests. The DC-ECAI 2020, held virtually this year, allows students from all over the world to present and discuss their ongoing research and career plans with their mentor, to network with other participants, and to receive training and mentoring about career planning and career options.
Emotion-aware voice interfaces based on speech signal processing
Voice interfaces (VIs) will become increasingly widespread in daily life as AI techniques progress. VIs can be incorporated into smart devices like smartphones, as well as integrated into cars, home automation systems, computer operating systems, and home appliances, among other things. Current speech interfaces, however, are unaware of users’ emotional states and hence cannot support natural communication. To overcome these limitations, it is necessary to implement emotional awareness in future VIs.
This thesis focuses on how speech signal processing (SSP) and speech emotion recognition (SER) can enable VIs to gain emotional awareness. Following an explanation of what emotion is and how neural networks are implemented, this thesis presents the results of several user studies and surveys.
Emotions are complicated, and they are typically characterized using categorical and dimensional models. They can be expressed verbally or nonverbally. Although existing voice interfaces are unaware of users’ emotional states and cannot support natural conversations, future VIs could perceive users’ emotions from speech based on SSP.
One section of this thesis, based on SSP, investigates mental restorative effects on humans and how they can be measured from speech signals. SSP is less intrusive and more accessible than traditional measures such as attention scales or response tests, and it can provide a reliable assessment of attention and mental restoration. SSP can be implemented in future VIs and utilized in future HCI user research.
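As an illustration of the kind of non-intrusive measures SSP can supply, the sketch below extracts pitch, energy, and MFCC statistics from a recording with librosa. The file name and the particular feature set are assumptions for illustration, not the thesis's actual measures.

```python
import numpy as np
import librosa

y, sr = librosa.load("session.wav", sr=16000)      # hypothetical recording
f0 = librosa.yin(y, fmin=80, fmax=400, sr=sr)      # frame-level pitch (Hz)
rms = librosa.feature.rms(y=y)[0]                  # frame-level energy
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13) # spectral envelope

features = np.hstack([
    [np.nanmean(f0), np.nanstd(f0)],               # pitch level & variability
    [rms.mean(), rms.std()],                       # loudness dynamics
    mfcc.mean(axis=1),                             # average timbre profile
])
print(features.shape)                              # one vector per recording
```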
The thesis then moves on to present a novel attention neural network based on sparse correlation features. Its accuracy in detecting emotions in continuous speech was demonstrated in a user study utilizing recordings from a real classroom, with promising results.
In SER research, it is unknown whether existing emotion detection methods detect acted emotions or the genuine emotion of the speaker. Another section of this thesis is therefore concerned with humans’ ability to act out emotions. In a user study, participants were instructed to imitate five fundamental emotions. The results revealed that they struggled with this task; nevertheless, certain emotions were easier to replicate than others.
A further research concern is how VIs should respond to users’ emotions once SER techniques are implemented in VIs and can recognize them. The thesis includes research on ways of dealing with users’ emotions. In a user study, users were instructed to make sad, angry, and terrified VI avatars happy and were asked whether they would like to be treated the same way if the situation were reversed. According to the results, the majority of participants tended to respond to these unpleasant emotions with neutral emotion, but emotion selection differed between genders.
For a human-centered design approach, it is important to understand users’ preferences for future VIs. A questionnaire-based survey on users’ attitudes towards and preferences for emotion-aware VIs was conducted in three distinct cultures. It was discovered that there are almost no gender differences. Cluster analysis found three fundamental user types that exist in all cultures: Enthusiasts, Pragmatists, and Sceptics. As a result, future VI development should consider these diverse types of users.
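For illustration, the cluster-analysis step could look like the following Python sketch, which groups questionnaire respondents into three attitude types with k-means. The k=3 choice mirrors the reported Enthusiast/Pragmatist/Sceptic split, but the synthetic ratings and preprocessing are assumptions.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
ratings = rng.integers(1, 6, size=(300, 12)).astype(float)  # 12 Likert items

z = StandardScaler().fit_transform(ratings)        # put items on one scale
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(z)
print(np.bincount(labels))                         # respondents per user type
```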
In conclusion, future VI systems should be designed for various types of users and be able to detect users’ disguised or actual emotions using SER and SSP technologies. Furthermore, many other applications, such as restorative-effects assessment, can be integrated into such VI systems.
Artificial Intelligence for Multimedia Signal Processing
Artificial intelligence technologies are being actively applied to broadcasting and multimedia processing. A great deal of research has been conducted in a wide variety of fields, such as content creation, transmission, and security; over the past two to three years, these efforts have sought to improve the compression efficiency of image, video, speech, and other data in areas related to MPEG media processing technology. Additionally, technologies for media creation, processing, editing, and scenario generation are very important areas of research in multimedia processing and engineering. This book contains a collection of topics spanning advanced computational intelligence algorithms and technologies for emerging multimedia signal processing: computer vision, speech/sound/text processing, and content analysis/information mining.