An Indoor Video Surveillance System with Intelligent Fall Detection Capability
This work presents a novel indoor video surveillance system capable of detecting human falls. The proposed system can also detect and evaluate human posture. To evaluate human movements, the background model is developed using the codebook method, and the possible positions of moving objects are extracted using background subtraction and shadow elimination. The extracted foreground image contains noise and damaged regions; the noise is eliminated using morphological and size filters, and the damaged regions are repaired. Once the image object of a human is extracted, whether or not the posture has changed is evaluated using the aspect ratio and height of the human body. When the proposed system detects a change of posture, it extracts the histogram of the object projection to represent its appearance. The histogram becomes the input vector of a K-Nearest Neighbor (K-NN) algorithm and is used to evaluate the posture of the object. Capable of accurately detecting different human postures, the proposed system increases fall detection accuracy. Importantly, the proposed method detects the posture using the frame ratio and the displacement of height in an image. Experimental results demonstrate that the proposed system improves both overall system performance and fall identification accuracy.
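The appearance step described above (a projection histogram fed to a K-NN classifier) can be sketched as follows; the descriptor layout, bin count, and the plain nearest-neighbour vote are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def projection_histogram(mask: np.ndarray, bins: int = 16) -> np.ndarray:
    """Concatenated horizontal/vertical projection histograms of a binary
    foreground mask, resampled to a fixed length and normalised
    (an illustrative stand-in for the paper's appearance descriptor)."""
    rows = mask.sum(axis=1).astype(float)
    cols = mask.sum(axis=0).astype(float)
    # Resample both projections to a fixed length so frames of any size compare.
    rows = np.interp(np.linspace(0, len(rows) - 1, bins), np.arange(len(rows)), rows)
    cols = np.interp(np.linspace(0, len(cols) - 1, bins), np.arange(len(cols)), cols)
    feat = np.concatenate([rows, cols])
    return feat / (feat.sum() + 1e-9)

def knn_posture(feat, train_feats, train_labels, k=1):
    """Classify a posture descriptor with a plain k-NN majority vote."""
    dists = np.linalg.norm(np.asarray(train_feats) - feat, axis=1)
    votes = [train_labels[i] for i in np.argsort(dists)[:k]]
    return max(set(votes), key=votes.count)
```

A standing silhouette (tall, narrow mask) and a lying one (short, wide mask) produce clearly different projections, which is what makes the vote work.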
AFFECT-PRESERVING VISUAL PRIVACY PROTECTION
The prevalence of wireless networks and the convenience of mobile cameras enable many new video applications other than security and entertainment. From behavioral diagnosis to wellness monitoring, cameras are increasingly used for observations in various educational and medical settings. Videos collected for such applications are considered protected health information under privacy laws in many countries. Visual privacy protection techniques, such as blurring or object removal, can be used to mitigate privacy concerns, but they also obliterate important visual cues of affect and social behaviors that are crucial for the target applications. In this dissertation, we propose to balance privacy protection and the utility of the data by preserving the privacy-insensitive information, such as pose and expression, which is useful in many applications involving visual understanding.
The Intellectual Merits of the dissertation include a novel framework for visual privacy protection by manipulating the facial image and body shape of individuals, which: (1) is able to conceal the identity of individuals; (2) provides a way to preserve the utility of the data, such as expression and pose information; and (3) balances the utility of the data against the capacity of the privacy protection.
The Broader Impacts of the dissertation focus on the significance of privacy protection for visual data, and the inadequacy of current privacy-enhancing technologies in preserving affect and behavioral attributes of the visual content, which are highly useful for behavior observation in educational and medical settings. The work in this dissertation represents one of the first attempts at achieving both goals simultaneously.
Image Classification of High Variant Objects in Fast Industrial Applications
Recent advances in machine learning and image processing have expanded the applications of computer vision
in many industries. In industrial applications, image classification is a crucial task, and high variant objects
pose difficult problems because of their variety and constantly changing attributes. Computer vision algorithms
can function effectively in complex environments, working alongside human operators to enhance efficiency and
data accuracy. However, there are still many industries facing difficulties with automation that have not yet been
properly solved and put into practice. They have the need for more accurate, convenient, and faster methods.
These solutions drove my interest in combining multiple learning strategies as well as sensors and image formats
to enable the use of computer vision for these applications. The motivation for this work is to answer a number of
research questions that aim to mitigate current problems in hinder their practical application. This work therefore
aims to present solutions that contribute to enabling these solutions. I demonstrate why standard methods cannot
simply be applied to an existing problem. Each method must be customized to the specific application scenario
in order to obtain a working solution.
One example is face recognition where the classification performance is crucial for the system’s ability to
correctly identify individuals. Additional features would allow higher accuracy, robustness, and safety, and make
presentation attacks more difficult. The detection of attempted attacks is critical for the acceptance of such
systems and significantly impacts the applicability of biometrics. Another application is tailgating detection
at automated entrance gates. Especially in high-security environments, it is important to prevent authorized
persons from taking an unauthorized person into the secured area. There is a plethora of technologies that seem potentially
suitable, but several practical factors increase or decrease their applicability depending on
which method is used. The third application covered in this thesis is the classification of textiles when they are
not spread out. Finding certain properties on them is complex, as these properties might be inside a fold, or differ
in appearance because of shadows and position.
The first part of this work provides in-depth analysis of the three individual applications, including background
information that is needed to understand the research topic and its proposed solutions. It includes the state of
the art in the area for all researched applications. In the second part of this work, methods are presented to
facilitate or enable the industrial applicability of the presented applications. New image databases are initially
presented for all three application areas. In the case of biometrics, three methods that identify and improve
specific performance parameters are shown. It will be shown how melanin face pigmentation (MFP) features
can be extracted and used for classification in face recognition and PAD applications. In the entrance control
application, the focus is on the sensor information with six methods being presented in detail. This includes the
use of thermal images to detect humans based on their body heat, depth images in the form of RGB-D images and
2D image series, as well as data from a floor-mounted sensor grid. For textile defect detection, several methods and
a novel classification procedure for textiles in free fall are presented.
In summary, this work examines computer vision applications for their practical industrial applicability and
presents solutions to mitigate the identified problems. In contrast to previous work, the proposed approaches are
(a) effective in improving classification performance, (b) fast in execution, and (c) easily integrated into existing
processes and equipment.
Object Tracking
Object tracking consists in estimating the trajectories of moving objects in a sequence of images. Automating computer object tracking is a difficult task: the dynamics of multiple changing parameters representing the features and motion of the objects, as well as temporary partial or full occlusion of the tracked objects, have to be considered. This monograph presents the development of object tracking algorithms, methods, and systems. Both the state of the art of object tracking methods and the new trends in research are described in this book. Fourteen chapters are split into two sections. Section 1 presents new theoretical ideas, whereas Section 2 presents real-life applications. Despite the variety of topics contained in this monograph, it constitutes a consistent body of knowledge in the field of computer object tracking. The editor's intention was to follow up on the very quick progress in the development of methods as well as the extension of their applications.
Latent Dependency Mining for Solving Regression Problems in Computer Vision
Regression-based frameworks, learning the direct mapping between low-level imagery features
and vector/scalar-formed continuous labels, have been widely exploited in computer vision, e.g.
in crowd counting, age estimation and human pose estimation. In the last decade, many efforts
have been dedicated by researchers in computer vision for better regression fitting. Nevertheless,
solving these computer vision problems with regression frameworks remained a formidable
challenge due to 1) feature variation and 2) imbalanced and sparse data. On one hand, large feature
variation can be caused by changes in extrinsic conditions (e.g. images taken under
different lighting conditions and viewing angles) and also intrinsic conditions (e.g. different aging
process of different persons in age estimation and inter-object occlusion in crowd density
estimation). On the other hand, imbalanced and sparse data distributions can also have an important
effect on regression performance. Apparently, these two challenges existing in regression
learning are related in the sense that the feature inconsistency problem is compounded by sparse
and imbalanced training data and vice versa, and they need to be tackled jointly in modelling and
explicitly in representation. This thesis firstly mines an intermediary feature representation consisting
of concatenated spatially localised features that share information from neighbouring
localised cells in the frames. This thesis secondly introduces the cumulative attribute concept
constructed for learning a regression model by exploiting the latent cumulative dependent nature
of label space in regression, in the application of facial age and crowd density estimation.
The thesis thirdly demonstrates the effectiveness of a discriminative structured-output regression
framework to learn the inherent latent correlation between each element of output variables in
the application of 2D human upper body pose estimation. The effectiveness of the proposed regression
frameworks for crowd counting, age estimation, and human pose estimation is validated
with public benchmarks.
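The cumulative attribute concept can be illustrated concretely: a scalar label such as age is encoded as a binary vector whose first y entries are one, so neighbouring labels share most of their bits and the regressor can exploit the ordering of the label space. This integer-label sketch is illustrative, not the thesis's exact construction:

```python
import numpy as np

def cumulative_attribute(y: int, max_label: int) -> np.ndarray:
    """Encode a scalar label (e.g. age) as a cumulative binary vector:
    entries below y are 1, the rest 0, so nearby labels share most bits."""
    vec = np.zeros(max_label, dtype=float)
    vec[:y] = 1.0
    return vec

def decode(vec: np.ndarray) -> int:
    """Recover the scalar label as the (possibly soft) count of active entries."""
    return int(round(float(vec.sum())))
```

Because ages 30 and 32 differ in only two of the 100 entries, a regressor trained on these vectors sees sparse labels as small perturbations of their neighbours rather than unrelated classes.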
Neural Radiance Fields: Past, Present, and Future
The various aspects like modeling and interpreting 3D environments and
surroundings have enticed humans to progress their research in 3D Computer
Vision, Computer Graphics, and Machine Learning. The paper by Mildenhall
et al. on NeRFs (Neural Radiance Fields) led to a boom in
Computer Graphics, Robotics, and Computer Vision, and the possible scope of
high-resolution, low-storage Augmented Reality and Virtual Reality-based 3D
models has gained traction from researchers, with more than 1000 preprints related to
NeRFs published. This paper serves as a bridge for people starting to study
these fields by building on the basics of Mathematics, Geometry, Computer
Vision, and Computer Graphics to the difficulties encountered in Implicit
Representations at the intersection of all these disciplines. This survey
provides the history of rendering, Implicit Learning, and NeRFs, the
progression of research on NeRFs, and the potential applications and
implications of NeRFs in today's world. In doing so, this survey categorizes
all the NeRF-related research in terms of the datasets used, objective
functions, applications solved, and evaluation criteria for these applications.
Comment: 413 pages, 9 figures, 277 citations
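As an example of the implicit-representation machinery such surveys cover, the positional encoding from the original NeRF paper maps each input coordinate to sine/cosine features at exponentially spaced frequencies so the MLP can fit high-frequency detail; a minimal sketch (the frequency count is a hyperparameter, 10 for positions in the original paper):

```python
import numpy as np

def positional_encoding(p: np.ndarray, num_freqs: int = 10) -> np.ndarray:
    """NeRF-style positional encoding: gamma(p) concatenates
    sin(2^k * pi * p) and cos(2^k * pi * p) for k = 0..num_freqs-1,
    lifting each coordinate into a higher-frequency feature space."""
    feats = []
    for k in range(num_freqs):
        feats.append(np.sin(2.0**k * np.pi * p))
        feats.append(np.cos(2.0**k * np.pi * p))
    return np.concatenate(feats, axis=-1)
```

A 3D point thus becomes a 60-dimensional feature vector (3 coordinates x 10 frequencies x 2 functions), the input size of the first MLP layer in the original architecture.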
Virtual Reality Games for Motor Rehabilitation
This paper presents a fuzzy logic based method to track user satisfaction without the need for devices to monitor users' physiological conditions. User satisfaction is the key to any product's acceptance; computer applications and video games provide a unique opportunity to tailor the environment to each user to better suit their needs. We have implemented a non-adaptive fuzzy logic model of emotion, based on the emotional component of the Fuzzy Logic Adaptive Model of Emotion (FLAME) proposed by El-Nasr, to estimate player emotion in Unreal Tournament 2004. In this paper we describe the implementation of this system and present the results of one of several play tests. Our research contradicts the current literature that suggests physiological measurements are needed. We show that it is possible to use a software-only method to estimate user emotion.
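A fuzzy-logic emotion estimate of this kind can be sketched with triangular membership functions and a min-based rule (fuzzy AND); the variable names and the single rule below are illustrative assumptions, not the actual FLAME rules used in the study:

```python
def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def joy_level(goal_progress: float, expectation: float) -> float:
    """Toy fuzzy rule in the spirit of FLAME: joy is high when goal
    progress is high AND progress exceeds expectation (min as fuzzy AND).
    Inputs in [0, 1]; names and shapes are illustrative, not from the paper."""
    progress_high = tri(goal_progress, 0.4, 1.0, 1.6)
    surprise_pos = tri(goal_progress - expectation, 0.0, 0.5, 1.0)
    return min(progress_high, surprise_pos)
```

Such a rule base needs only game-state inputs, which is what allows a software-only estimate without physiological sensors.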
Designing a Contactless, AI System to Measure the Human Body using a Single Camera for the Clothing and Fashion Industry
Using a single RGB camera to obtain accurate body dimensions rather than measuring these manually or via more complex multi-camera or more expensive 3D scanners, has a high application potential for the apparel industry.
In this thesis, a system is presented that estimates upper human body measurements using a set of computer vision and machine learning techniques. The main steps involve: (1) using a portable camera; (2) improving image quality; (3) isolating the human body from the surrounding environment; (4) performing a calibration step; (5) extracting body features from the image; (6) indicating markers on the image; and (7) producing refined final results.
In this research, a unique geometric shape, namely the ellipse, is favored to approximate the main cross sections of the human body. We focus on the upper-body horizontal slices (i.e. from head to hips) which, we show, can be well represented by varying an ellipse's eccentricity per individual. Evaluating each fitted ellipse's perimeter then allows us to obtain better results than the current state-of-the-art for use in the fashion and online retail industry.
In this study, I selected a set of two equations, out of many other possible choices, that best estimate upper human body horizontal cross sections via the perimeters of fitted ellipses. I experimented with the system on a diverse sample of 78 participants. Compared to the traditional manual method of tape measurements used as a reference, the upper human body measurements show average differences of ±1 cm, sufficient for many applications, including online retail.
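An ellipse's perimeter has no closed form, so it is typically evaluated with an approximation; Ramanujan's second approximation is a standard choice. Whether it is one of the two equations selected in the thesis is not stated, so treat this as an illustrative candidate:

```python
import math

def ellipse_perimeter(a: float, b: float) -> float:
    """Approximate the perimeter of an ellipse with semi-axes a and b
    using Ramanujan's second approximation:
    P ~ pi*(a+b) * (1 + 3h / (10 + sqrt(4 - 3h))), h = ((a-b)/(a+b))^2."""
    h = ((a - b) / (a + b)) ** 2
    return math.pi * (a + b) * (1 + 3 * h / (10 + math.sqrt(4 - 3 * h)))
```

For a circle (a = b) the formula reduces exactly to 2*pi*a, and its error stays tiny for the moderate eccentricities typical of body cross sections.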
Soft Biometric Analysis: Multi-Person and Real-Time Pedestrian Attribute Recognition in Crowded Urban Environments
Traditionally, recognition systems were based only on human hard biometrics. However,
the ubiquitous CCTV cameras have raised the desire to analyze human biometrics from
far distances, without people's attendance in the acquisition process. High-resolution
face close-shots are rarely available at far distances, such that face-based systems cannot
provide reliable results in surveillance applications. Human soft biometrics, such as body
and clothing attributes, are believed to be more effective in analyzing human data collected
by security cameras.
This thesis contributes to human soft biometric analysis in uncontrolled environments
and mainly focuses on two tasks: Pedestrian Attribute Recognition (PAR) and person re-identification
(re-id). We first review the literature of both tasks and highlight the history
of advancements, recent developments, and the existing benchmarks. PAR and person re-id
difficulties are due to significant distances between intra-class samples, which originate
from variations in several factors such as body pose, illumination, background, occlusion,
and data resolution. Recent state-of-the-art approaches present end-to-end models that
can extract discriminative and comprehensive feature representations from people. The
correlation between different regions of the body and dealing with limited learning data
are also the objectives of many recent works. Moreover, class imbalance and correlation
between human attributes are specific challenges associated with the PAR problem.
We collect a large surveillance dataset to train a novel gender recognition model suitable
for uncontrolled environments. We propose a deep residual network that extracts several
pose-wise patches from samples and obtains a comprehensive feature representation. In
the next step, we develop a model for recognizing multiple attributes at once. Considering
the correlation between human semantic attributes and class imbalance, we respectively
use a multi-task model and a weighted loss function. We also propose a multiplication
layer on top of the backbone feature extraction layers to exclude the background features
from the final representation of samples and draw the attention of the model to the
foreground area.
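For the class-imbalance issue, a common recipe in pedestrian-attribute work is a weighted binary cross-entropy whose positive/negative weights depend on each attribute's positive ratio in the training set. The exponential weighting below is a sketch of that recipe, not necessarily the exact loss used in the thesis:

```python
import numpy as np

def weighted_bce(pred: np.ndarray, target: np.ndarray, pos_ratio: np.ndarray) -> float:
    """Weighted binary cross-entropy for imbalanced multi-label attribute
    recognition. pos_ratio[i] is the fraction of training samples where
    attribute i is positive; rare positives get exponentially larger weight."""
    pred = np.clip(pred, 1e-7, 1 - 1e-7)      # numerical stability
    w_pos = np.exp(1.0 - pos_ratio)           # rare attribute -> weight near e
    w_neg = np.exp(pos_ratio)                 # common attribute -> negatives weighted up
    loss = -(w_pos * target * np.log(pred) + w_neg * (1 - target) * np.log(1 - pred))
    return float(loss.mean())
```

With this weighting, a confident mistake on a rare attribute costs more than the same mistake on a frequent one, counteracting the imbalance.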
We address the problem of person re-id by implicitly defining the receptive fields of
deep learning classification frameworks. The receptive fields of deep learning models
determine the most significant regions of the input data for providing correct decisions.
Therefore, we synthesize a set of learning data in which the destructive regions (e.g.,
background) in each pair of instances are interchanged. A segmentation module
determines destructive and useful regions in each sample, and the label of each synthesized
instance is inherited from the sample that contributed the useful regions to the synthesized
image. The synthesized learning data are then used in the learning phase and help
the model rapidly learn that the identity and background regions are not correlated.
Meanwhile, the proposed solution can be seen as a data augmentation approach that
fully preserves the label information and is compatible with other data augmentation
techniques.
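The background-interchange idea can be sketched as a simple mask-based composition, where the synthesized image takes its identity label from the foreground (person) sample; the toy arrays below stand in for real images and for the thesis's segmentation module:

```python
import numpy as np

def swap_background(img_fg: np.ndarray, img_bg: np.ndarray,
                    mask_fg: np.ndarray) -> np.ndarray:
    """Compose the person (foreground) of one sample onto the background of
    another. mask_fg is a binary HxW array (1 = person). The synthesized
    image inherits the identity label of the foreground sample."""
    # Broadcast the mask over colour channels if the images are HxWxC.
    m = mask_fg[..., None] if img_fg.ndim == 3 else mask_fg
    return m * img_fg + (1 - m) * img_bg
```

Training on pairs (original, synthesized) with the same identity label teaches the network that background pixels carry no identity information.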
When re-id methods are learned in scenarios where the target person appears with identical garments in the gallery, the visual appearance of clothes is given the most
importance in the final feature representation. Cloth-based representations are not
reliable in long-term re-id settings, as people may change their clothes. Therefore,
solutions that ignore clothing cues and focus on identity-relevant features are
in demand. We transform the original data such that the identity-relevant information of
people (e.g., face and body shape) is removed, while the identity-unrelated cues (i.e.,
color and texture of clothes) remain unchanged. A model learned on the synthesized
dataset predicts the identity-unrelated cues (short-term features). We then train a
second model, coupled with the first, that learns the embeddings of the original data
such that the similarity between the embeddings of the original and synthesized data is
minimized. This way, the second model predicts based on the identity-related (long-term)
representation of people.
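The coupling between the two models can be sketched as minimizing the cosine similarity between the second model's embedding of the original image and the first model's clothing-cue embedding of the synthesized image; the function below is an illustrative loss term with hypothetical inputs, not the thesis's exact objective:

```python
import numpy as np

def decoupling_loss(emb_orig: np.ndarray, emb_syn: np.ndarray) -> float:
    """Cosine similarity between the identity embedding of the original image
    and the clothing-cue embedding of the synthesized image; driving this
    toward zero pushes the identity model away from clothing features."""
    a = emb_orig / np.linalg.norm(emb_orig)
    b = emb_syn / np.linalg.norm(emb_syn)
    return float(a @ b)
```

At convergence the two embeddings are near-orthogonal, so the identity branch carries little of the short-term clothing signal.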
To evaluate the performance of the proposed models, we use PAR and person re-id
datasets, namely BIODI, PETA, RAP, Market-1501, MSMT-V2, PRCC, LTCC, and MIT,
and compare our experimental results with state-of-the-art methods in the field.
In conclusion, the data collected from surveillance cameras have low resolution, such
that the extraction of hard biometric features is not possible, and face-based approaches
produce poor results. In contrast, soft biometrics are robust to variations in data quality.
Therefore, we propose approaches for both PAR and person re-id to learn discriminative features
from each instance and evaluate our proposed solutions on several publicly available
benchmarks.
This thesis was prepared at the University of Beira Interior, IT - Instituto de Telecomunicações, Soft Computing and Image Analysis Laboratory (SOCIA Lab), Covilhã Delegation, and was submitted to the University of Beira Interior for defense in a public examination session.
Design and semantics of form and movement (DeSForM 2006)
Design and Semantics of Form and Movement (DeSForM) grew from applied research exploring emerging design methods and practices to support new generation product and interface design. The products and interfaces are concerned with: the context of ubiquitous computing and ambient technologies and the need for greater empathy in the pre-programmed behaviour of the ‘machines’ that populate our lives. Such explorative research in the CfDR has been led by Young, supported by Kyffin, Visiting Professor from Philips Design and sponsored by Philips Design over a period of four years (research funding £87k). DeSForM1 was the first of a series of three conferences that enable the presentation and debate of international work within this field: • 1st European conference on Design and Semantics of Form and Movement (DeSForM1), Baltic, Gateshead, 2005, Feijs L., Kyffin S. & Young R.A. eds. • 2nd European conference on Design and Semantics of Form and Movement (DeSForM2), Evoluon, Eindhoven, 2006, Feijs L., Kyffin S. & Young R.A. eds. • 3rd European conference on Design and Semantics of Form and Movement (DeSForM3), New Design School Building, Newcastle, 2007, Feijs L., Kyffin S. & Young R.A. eds. Philips sponsorship of practice-based enquiry led to research by three teams of research students over three years and on-going sponsorship of research through the Northumbria University Design and Innovation Laboratory (nuDIL). Young has been invited on the steering panel of the UK Thinking Digital Conference concerning the latest developments in digital and media technologies. Informed by this research is the work of PhD student Yukie Nakano who examines new technologies in relation to eco-design textiles