AI Oriented Large-Scale Video Management for Smart City: Technologies, Standards and Beyond
Deep learning has achieved substantial success in a series of tasks in
computer vision. Intelligent video analysis, which can be broadly applied to
video surveillance in various smart city applications, can also be driven by
such powerful deep learning engines. However, practically deploying deep neural
network models for large-scale video analysis still poses unprecedented
challenges for large-scale video data management. Deep feature coding,
instead of video coding, provides a practical solution for handling the
large-scale video surveillance data. To enable interoperability in the context
of deep feature coding, standardization is urgent and important. However, due
to the explosion of deep learning algorithms and the particularity of feature
coding, there are numerous remaining problems in the standardization process.
This paper envisions the future deep feature coding standard for AI-oriented
large-scale video management, and discusses existing techniques, standards, and
possible solutions for these open problems.
Comment: 8 pages, 8 figures, 5 tables
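The deep-feature-coding idea above can be sketched as follows: instead of compressing and transmitting the surveillance video itself, only a compact deep feature vector is quantized and stored. This is a toy uniform-quantization illustration with hypothetical names, not the envisioned standard's actual codec.

```python
import numpy as np

def encode_feature(feat, num_bits=8):
    """Uniformly quantize a float feature vector into num_bits-wide codes."""
    lo, hi = float(feat.min()), float(feat.max())
    levels = 2 ** num_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = np.round((feat - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def decode_feature(codes, lo, scale):
    """Reconstruct an approximate feature vector from the integer codes."""
    return codes.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
feat = rng.standard_normal(512).astype(np.float32)  # e.g. a CNN embedding
codes, lo, scale = encode_feature(feat)
recon = decode_feature(codes, lo, scale)
# 8-bit codes take 1/4 the space of float32; error is bounded by ~scale/2
print(codes.nbytes, feat.nbytes)
```

Transmitting only such codes (plus `lo` and `scale`) is what makes interoperable feature coding, rather than video coding, attractive at surveillance scale.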
Face Recognition in Low Quality Images: A Survey
Low-resolution face recognition (LRFR) has received increasing attention over
the past few years. Its applications lie widely in the real-world environment
when high-resolution or high-quality images are hard to capture. One of the
biggest demands for LRFR technologies is video surveillance. As the number
of surveillance cameras in cities increases, the captured videos will
need to be processed automatically. However, those videos or images are usually
captured with large standoffs, arbitrary illumination condition, and diverse
angles of view. Faces in these images are generally small in size. Several
studies addressing this problem employed techniques such as super-resolution,
deblurring, or learning a relationship between different resolution domains. In
this paper, we provide a comprehensive review of approaches to low-resolution
face recognition in the past five years. First, a general problem definition is
given. Then, a systematic analysis of the works on this topic is presented
by category. In addition to describing the methods, we also focus on datasets
and experiment settings. We further address related works on unconstrained
low-resolution face recognition and compare them with results that use
synthetic low-resolution data. Finally, we summarize the general limitations
and outline priorities for future effort.
Comment: There are some mistakes in this paper that may mislead the reader, and
we will not have a new version in the short term. We will resubmit once it is
corrected.
Minor Privacy Protection Through Real-time Video Processing at the Edge
The collection of large amounts of personal information about individuals,
including the minor members of a family, by closed-circuit television (CCTV)
cameras raises serious privacy concerns. In particular, revealing children's
identities or activities may compromise their well-being. In this paper, we
investigate lightweight solutions affordable to edge surveillance systems that
make it feasible to accurately identify minors so that appropriate
privacy-preserving measures can be applied. State-of-the-art deep learning
architectures are modified and re-purposed in a cascaded
fashion to maximize the accuracy of our model. A pipeline extracts faces from
the input frames and classifies each one to be of an adult or a child. Over
20,000 labeled sample points are used for classification. We explore the timing
and resources needed for such a model to be used in the Edge-Fog architecture
at the edge of the network, where we can achieve near real-time performance on
the CPU. Quantitative experimental results show the superiority of our proposed
model with an accuracy of 92.1% in classification compared to some other face
recognition-based child detection approaches.
Comment: Accepted by the 2nd International Workshop on Smart City
Communication and Networking at the ICCCN 202
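The cascaded pipeline this abstract describes, namely extract faces, classify each as adult or child, then apply privacy masking to minors only, could be sketched as below. The detector, classifier, and masking here are toy stand-ins (a mean-fill "blur" and hardcoded boxes), not the paper's actual models.

```python
import numpy as np

def blur_region(frame, box):
    """Crude privacy mask: flatten a face region to its mean intensity."""
    x0, y0, x1, y1 = box
    frame[y0:y1, x0:x1] = frame[y0:y1, x0:x1].mean()
    return frame

def protect_minors(frame, detect_faces, classify_age):
    """Cascade: detect faces, classify each, mask only the children."""
    for box in detect_faces(frame):
        if classify_age(frame, box) == "child":
            frame = blur_region(frame, box)
    return frame

# hypothetical stand-ins for the real face detector and age classifier
frame = np.arange(100.0).reshape(10, 10)
faces = lambda f: [(0, 0, 4, 4), (5, 5, 9, 9)]
ages = lambda f, b: "child" if b[0] == 0 else "adult"
out = protect_minors(frame.copy(), faces, ages)
```

Running the classifier only on detected face crops, rather than whole frames, is what keeps such a cascade cheap enough for edge hardware.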
SeqFace: Make full use of sequence information for face recognition
Deep convolutional neural networks (CNNs) have greatly improved the Face
Recognition (FR) performance in recent years. Almost all CNNs in FR are trained
on the carefully labeled datasets containing plenty of identities. However,
such high-quality datasets are very expensive to collect, which prevents many
researchers from achieving state-of-the-art performance. In this paper, we propose
a framework, called SeqFace, for learning discriminative face features. Besides
a traditional identity training dataset, the designed SeqFace can train CNNs by
using an additional dataset which includes a large number of face sequences
collected from videos. Moreover, the label smoothing regularization (LSR) and a
new proposed discriminative sequence agent (DSA) loss are employed to enhance
discrimination power of deep face features via making full use of the sequence
data. Our method achieves excellent performance on Labeled Faces in the Wild
(LFW) and YouTube Faces (YTF) with only a single ResNet. The code and models are
publicly available online (https://github.com/huangyangyu/SeqFace).
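Of the two ingredients the abstract names, label smoothing regularization (LSR) is standard enough to sketch; the DSA loss is not reproduced here. The target construction below, with our own variable names, shows the LSR idea: spread a small probability mass uniformly over all classes.

```python
import numpy as np

def smoothed_targets(labels, num_classes, eps=0.1):
    """LSR targets: the true class gets 1 - eps + eps/C, others eps/C."""
    t = np.full((len(labels), num_classes), eps / num_classes)
    t[np.arange(len(labels)), labels] += 1.0 - eps
    return t

def cross_entropy(logits, targets):
    """Mean cross-entropy between softmax(logits) and soft targets."""
    logits = logits - logits.max(axis=1, keepdims=True)  # stability shift
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-(targets * logp).sum(axis=1).mean())

targets = smoothed_targets(np.array([0, 2]), num_classes=4, eps=0.1)
loss = cross_entropy(np.zeros((2, 4)), targets)  # uniform logits -> log(4)
```

Soft targets of this kind discourage the network from becoming over-confident on noisy sequence labels, which is the setting SeqFace targets.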
Image-to-Video Person Re-Identification by Reusing Cross-modal Embeddings
Image-to-video person re-identification identifies a target person by a probe
image from quantities of pedestrian videos captured by non-overlapping cameras.
Despite the great progress achieved, it is still challenging to match in the
multimodal scenario, i.e., between image and video. Currently, state-of-the-art
approaches mainly focus on task-specific data, neglecting the extra
information from different but related tasks. In this paper, we propose an
end-to-end neural network framework for image-to-video person re-identification
by leveraging cross-modal embeddings learned from extra information. Concretely
speaking, cross-modal embeddings from image captioning and video captioning
models are reused to help the learned features be projected into a coordinated
space, where similarity can be computed directly. Besides, training steps from
the fixed model reuse approach are integrated into our framework, which can
incorporate beneficial information and eventually make the target networks
independent of existing models. Apart from that, our proposed framework resorts
to CNNs and LSTMs for extracting visual and spatiotemporal features, and
combines the strengths of identification and verification models to improve the
discriminative ability of the learned features. The experimental results
demonstrate the effectiveness of our framework in narrowing the gap between
heterogeneous data and obtaining observable improvement in image-to-video
person re-identification.
Comment: under review for Pattern Recognition Letters
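The coordinated-space matching the abstract describes can be sketched minimally: features from each modality are projected (here with hypothetical linear maps `W_img` and `W_vid`, not the paper's learned networks), L2-normalized, and scored by cosine similarity.

```python
import numpy as np

def project(x, W):
    """Map a feature into the shared space and L2-normalize it."""
    z = x @ W
    return z / np.linalg.norm(z)

def cross_modal_similarity(img_feat, vid_feat, W_img, W_vid):
    """Cosine similarity between image and video features after projection."""
    return float(project(img_feat, W_img) @ project(vid_feat, W_vid))

rng = np.random.default_rng(1)
img_feat, vid_feat = rng.standard_normal(128), rng.standard_normal(256)
W_img = rng.standard_normal((128, 64))
W_vid = rng.standard_normal((256, 64))
score = cross_modal_similarity(img_feat, vid_feat, W_img, W_vid)
```

Because both modalities land in one normalized space, ranking a probe image against a gallery of videos reduces to sorting these scores.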
Deep Learning Architectures for Face Recognition in Video Surveillance
Face recognition (FR) systems for video surveillance (VS) applications
attempt to accurately detect the presence of target individuals over a
distributed network of cameras. In video-based FR systems, facial models of
target individuals are designed a priori during enrollment using a limited
number of reference still images or video data. These facial models are not
typically representative of faces being observed during operations due to large
variations in illumination, pose, scale, occlusion, blur, and camera
interoperability. Specifically, in the still-to-video FR application, a single
high-quality reference still image captured with a still camera under controlled
conditions is employed to generate a facial model to be matched later against
lower-quality faces captured with video cameras under uncontrolled conditions.
Current video-based FR systems can perform well in controlled scenarios, but
their performance is not satisfactory in uncontrolled scenarios, mainly because
of the differences between the source (enrollment) and the target (operational)
domains. Most of the efforts in this area have been toward the design of robust
video-based FR systems in unconstrained surveillance environments. This chapter
presents an overview of recent advances in still-to-video FR scenario through
deep convolutional neural networks (CNNs). In particular, deep learning
architectures proposed in the literature based on triplet-loss function (e.g.,
cross-correlation matching CNN, trunk-branch ensemble CNN and HaarNet) and
supervised autoencoders (e.g., canonical face representation CNN) are reviewed
and compared in terms of accuracy and computational complexity.
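The triplet-loss objective underlying several of the reviewed architectures can be stated in its plain form (variable names are ours, not the chapter's): pull the anchor toward a positive of the same identity and push it from a negative by at least a margin.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on squared-Euclidean distances: max(0, d_ap - d_an + margin)."""
    d_ap = float(np.sum((anchor - positive) ** 2))  # anchor-positive distance
    d_an = float(np.sum((anchor - negative) ** 2))  # anchor-negative distance
    return max(0.0, d_ap - d_an + margin)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # same identity, nearby embedding
n = np.array([1.0, 1.0])   # different identity, far away
loss = triplet_loss(a, p, n)  # margin already satisfied here
```

When the margin is already satisfied the loss is zero and the triplet contributes no gradient, which is why triplet-based FR training depends heavily on how triplets are mined.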
ActionXPose: A Novel 2D Multi-view Pose-based Algorithm for Real-time Human Action Recognition
We present ActionXPose, a novel 2D pose-based algorithm for posture-level
Human Action Recognition (HAR). The proposed approach exploits 2D human poses
provided by the OpenPose detector from RGB videos. ActionXPose processes the
pose data and feeds it to a Long Short-Term Memory neural network and a
1D convolutional neural network, which solve the classification problem.
ActionXPose is one of the first algorithms that exploits 2D human poses for
HAR. The algorithm has real-time performance and is robust to camera
movement, subject proximity changes, viewpoint changes, and subject appearance
changes, and it generalizes well. In fact, extensive simulations
show that ActionXPose can be successfully trained using different datasets at
once. State-of-the-art performance on popular datasets for posture-related HAR
problems (i3DPost, KTH) is reported, and results are compared with those
obtained by other methods, including the selected ActionXPose baseline.
Moreover, we also propose two novel datasets, MPOSE and ISLD, recorded in
our Intelligent Sensing Lab, to show the generalization performance of ActionXPose.
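One common way to obtain the robustness to camera movement and subject proximity that the abstract claims is to normalize each 2D pose sequence before feeding it to the networks: center on a root joint and scale by the pose extent. The scheme and root-joint choice below are our assumptions, not ActionXPose's actual preprocessing.

```python
import numpy as np

def normalize_pose_sequence(seq, root=0):
    """seq: (T, J, 2) array of J 2D joints over T frames.
    Centering removes translation; dividing by the extent removes scale."""
    centered = seq - seq[:, root:root + 1, :]
    scale = np.abs(centered).max(axis=(1, 2), keepdims=True)
    scale[scale == 0] = 1.0  # guard against degenerate all-zero poses
    return centered / scale

rng = np.random.default_rng(2)
seq = rng.standard_normal((30, 18, 2))       # 30 frames, 18 joints
moved = 3.0 * seq + np.array([5.0, -2.0])    # zoomed and shifted camera
```

A sequence and its shifted, rescaled copy normalize to the same array, so the downstream LSTM/1D-CNN classifier sees camera-invariant input.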
Improved Hard Example Mining by Discovering Attribute-based Hard Person Identity
In this paper, we propose Hard Person Identity Mining (HPIM) that attempts to
refine the hard example mining to improve the exploration efficacy in person
re-identification. It is motivated by the following observation: the more
attributes some people share, the more difficult it is to separate their
identities. Based on this observation, we develop HPIM via a transferred
attribute describer, a deep multi-attribute classifier trained on noisy source
person-attribute datasets. We encode each image in the target person re-ID
dataset into an attribute probabilistic description. Afterwards, in the
attribute code space, we consider each person as a distribution to generate his
view-specific attribute codes in different practical scenarios. Hence we
estimate the person-specific statistical moments from zeroth to higher order,
which are further used to calculate the central moment discrepancies between
persons. Such a discrepancy provides a ground for choosing hard identities to
organize proper mini-batches, without concern for the person representation
changing during metric learning. It serves as a complementary tool to hard
example mining, helping to explore the global instead of the local hard-example
constraint in mini-batches built from randomly sampled identities. Extensive
experiments on two person re-identification benchmarks validate the
effectiveness of our proposed algorithm.
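The central-moment-discrepancy idea can be sketched as follows: treat each person's attribute codes as a distribution, summarize it by per-dimension central moments, and use the moment differences to rank identity pairs by hardness. The unweighted moment sum below is our simplification, not HPIM's exact formulation.

```python
import numpy as np

def central_moment_discrepancy(a, b, order=3):
    """Sum of per-dimension differences between the means and the
    central moments (orders 2..order) of two attribute-code sets."""
    d = float(np.abs(a.mean(0) - b.mean(0)).sum())   # first moments
    for k in range(2, order + 1):                    # higher central moments
        ma = ((a - a.mean(0)) ** k).mean(0)
        mb = ((b - b.mean(0)) ** k).mean(0)
        d += float(np.abs(ma - mb).sum())
    return d

rng = np.random.default_rng(3)
codes_a = rng.random((50, 8))        # attribute codes of person A's images
codes_b = rng.random((50, 8)) + 0.5  # person B, shifted attribute profile
```

Pairs with a small discrepancy share attribute statistics and are therefore the hard identities worth grouping into the same mini-batch.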
PVSS: A Progressive Vehicle Search System for Video Surveillance Networks
This paper focuses on the task of searching for a specific vehicle that
appeared in a surveillance network. Existing methods usually assume that the
vehicle images are well cropped from the surveillance videos, then use visual
attributes, like colors and types, or license plate numbers to match the target
vehicle in the image set. However, a complete vehicle search system should
consider the problems of vehicle detection, representation, indexing, storage,
matching, and so on. Besides, attribute-based search cannot accurately find the
same vehicle due to intra-instance changes in different cameras and the
extremely uncertain environment. Moreover, the license plates may be
misrecognized in surveillance scenes due to the low resolution and noise. In
this paper, a Progressive Vehicle Search System, named PVSS, is designed to
solve the above problems. PVSS consists of three modules: the crawler,
the indexer, and the searcher. The vehicle crawler aims to detect and track
vehicles in surveillance videos and transfer the captured vehicle images,
metadata and contextual information to the server or cloud. Then multi-grained
attributes, such as the visual features and license plate fingerprints, are
extracted and indexed by the vehicle indexer. At last, a query triplet with an
input vehicle image, the time range, and the spatial scope is taken as the
input by the vehicle searcher. The target vehicle will be searched in the
database through a progressive process. Extensive experiments on a public
dataset from a real surveillance network validate the effectiveness of PVSS.
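The progressive query flow the abstract describes, cheap metadata filters first and expensive visual matching only on the survivors, could be sketched like this. The record fields and the similarity callback are hypothetical, not PVSS's actual index schema.

```python
def progressive_search(records, query_feat, t_range, cameras,
                       similarity, top_k=5):
    """Filter by time range and spatial scope, then rank by visual score."""
    t0, t1 = t_range
    survivors = [r for r in records
                 if t0 <= r["time"] <= t1 and r["camera"] in cameras]
    survivors.sort(key=lambda r: similarity(query_feat, r["feat"]),
                   reverse=True)
    return survivors[:top_k]

records = [
    {"time": 10, "camera": "c1", "feat": 0.9},
    {"time": 12, "camera": "c2", "feat": 0.4},
    {"time": 99, "camera": "c1", "feat": 1.0},  # outside the time range
]
hits = progressive_search(records, 1.0, (0, 20), {"c1", "c2"},
                          lambda q, f: -abs(q - f))
```

Ordering the stages from cheapest to most expensive is what lets such a system scale to a whole surveillance network's footage.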
Person Identification with Visual Summary for a Safe Access to a Smart Home
SafeAccess is an integrated system designed to provide easier and safer
access to a smart home for people with or without disabilities. The system is
designed to enhance safety and promote the independence of people with
disabilities (i.e., the visually impaired). The key functionality of the system
includes detecting and identifying humans and generating a contextual
visual summary from the real-time video streams obtained from cameras
placed in strategic locations around the house. In addition, the system
classifies humans into groups (i.e., friends/families/caregivers versus
intruders/burglars/unknowns). These features allow the user to grant or deny
remote access to the premises, or to call emergency services. In this paper, we
focus on designing a prototype system for the smart home and building a robust
recognition engine that meets the system criteria and addresses speed,
accuracy, deployment and environmental challenges under a wide variety of
practical and real-life situations. To interact with the system, we implemented
a dialog enabled interface to create a personalized profile using face images
or videos of friends/families/caregivers. To improve computational efficiency, we
apply change detection to filter out frames and use Faster-RCNN to detect the
human presence and extract faces using Multitask Cascaded Convolutional
Networks (MTCNN). Subsequently, we apply LBP/FaceNet to identify a person and
groups by matching the extracted faces with the profile. SafeAccess sends a
visual summary to the users via MMS containing the person's name if a match is
found (or "Unknown" otherwise), a scene image, a facial description, and
contextual information. SafeAccess identifies friends/families/caregivers
versus intruders/unknowns with an average F-score of 0.97 and generates a
visual summary from 10 classes with an average accuracy of 98.01%.
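The change-detection gate this abstract places in front of the detection stage can be sketched in its simplest form: a frame is forwarded to the expensive Faster-RCNN stage only if it differs enough from the previous one. The mean-absolute-difference test and threshold below are our assumptions, not SafeAccess's actual detector.

```python
import numpy as np

def changed(prev, curr, thresh=10.0):
    """True if the mean absolute pixel difference exceeds the threshold."""
    diff = np.abs(curr.astype(np.float32) - prev.astype(np.float32))
    return float(diff.mean()) > thresh

still = np.zeros((120, 160), dtype=np.uint8)   # empty hallway frame
person_enters = still.copy()
person_enters[40:80, 60:100] = 255             # a bright region appears
```

Dropping unchanged frames this way is what keeps the downstream Faster-RCNN plus MTCNN plus LBP/FaceNet pipeline within real-time budgets on home hardware.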