265 research outputs found
Social Eavesdropping in Zebrafish
Group-living animals may eavesdrop on signalling interactions between
conspecifics. This enables them to collect adaptively relevant
information about others without incurring the costs of first-hand
information acquisition. Such an ability, known as social eavesdropping, is
expected to affect Darwinian fitness and hence predicts the evolution
of cognitive processes that enable social animals to use the social
information available in their environment.(...
Face Image and Video Analysis in Biometrics and Health Applications
Computer Vision (CV) enables computers and systems to derive meaningful information from acquired visual inputs, such as images and videos, and to make decisions based on the extracted information. Its goal is to acquire, process, analyze, and understand that information by developing theoretical and algorithmic models. Biometrics are distinctive and measurable human characteristics used to label or describe individuals, combining computer vision with knowledge of human physiology (e.g., face, iris, fingerprint) and behavior (e.g., gait, gaze, voice). The face is one of the most informative biometric traits, and many studies have investigated it from the perspectives of different disciplines, ranging from computer vision and deep learning to neuroscience and biometrics. In this work, we analyze face characteristics from digital images and videos in the areas of morphing attack and defense, and autism diagnosis. For face morphing attack generation, we propose a transformer-based generative adversarial network that generates more visually realistic morphing attacks by combining several losses: face matching distance, a facial-landmark-based loss, perceptual loss, and pixel-wise mean squared error. For face morphing attack detection, we design a fusion-based few-shot learning (FSL) method to learn discriminative features from face images for few-shot morphing attack detection (FS-MAD), and extend the current binary detection task into multiclass classification, namely few-shot morphing attack fingerprinting (FS-MAF). In the autism diagnosis study, we develop a discriminative few-shot learning method to analyze hour-long video data and explore the fusion of facial dynamics for facial trait classification of autism spectrum disorder (ASD) at three severity levels. The results show outstanding performance of the proposed fusion-based few-shot framework on the dataset.
In addition, we explore the possibility of performing face micro-expression spotting and feature analysis on autism video data to classify ASD and control groups. The results indicate the effectiveness of subtle facial expression changes for autism diagnosis.
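As a rough illustration, the multi-term training objective described above can be sketched as a weighted sum of loss components. The weights and function names below are illustrative placeholders, not the authors' actual values or implementation; in practice, the perceptual, landmark, and face-matching terms would come from dedicated networks.

```python
import numpy as np

def pixel_mse(morph, target):
    # Pixel-wise mean squared error between the morph and a target image.
    diff = np.asarray(morph, float) - np.asarray(target, float)
    return float(np.mean(diff ** 2))

def combined_morph_loss(morph, target, perceptual, landmark, match_dist,
                        weights=(1.0, 0.1, 0.05, 0.5)):
    # Weighted combination of the four loss terms named in the abstract.
    # The weights here are hypothetical; they would be tuned in practice.
    terms = (pixel_mse(morph, target), perceptual, landmark, match_dist)
    return sum(w * t for w, t in zip(weights, terms))
```

The key design point such a combined objective captures is that pixel fidelity alone does not guarantee that a morph fools a face matcher; the identity-related terms pull the generator in that direction.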
Robust Knowledge Adaptation for Federated Unsupervised Person ReID
Person Re-identification (ReID) has been extensively studied in recent years due to the increasing demand in the public security sector. However, collecting and modelling with sensitive personal data raises privacy concerns.
Therefore, federated learning has been studied for Person ReID, which aims to share minimal sensitive data between different parties (clients).
However, the statistical heterogeneity between client domains under a federated setting remains a challenge, which limits knowledge aggregation across clients and often leads to inferior identification accuracy.
Additionally, existing federated learning-based Person ReID methods generally rely on laborious and time-consuming data annotation, which raises scalability issues for deploying them in real-world Person ReID.
Therefore, this thesis aims to address the unsupervised person ReID problem under a federated learning scheme. Specifically, two methods are devised:
1. In Chapter 3, we introduce a federated unsupervised cluster-contrastive (FedUCC) learning method for Person ReID. FedUCC adopts a three-stage modelling strategy in a coarse-to-fine manner: generic knowledge, specialized knowledge, and patch knowledge are discovered using a deep neural network. This enables mutual knowledge sharing among clients while retaining local domain-specific knowledge, based on the categories of the network components and their parameter settings.
2. In Chapter 4, we propose a novel transformer-based architecture, the Context and Camera Invariant Transformer (CCIT), in pursuit of homogeneous image representations that ignore irrelevant context and camera bias.
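The selective knowledge sharing described for FedUCC in Chapter 3 can be caricatured as follows: the server averages only the parameters designated as generic across clients, while specialized, domain-specific parameters stay local. This is a minimal sketch under assumed names, not the thesis implementation.

```python
import numpy as np

def aggregate_shared(client_models, shared_keys):
    # Server side: average only the generic (shared) parameters
    # across all participating clients.
    return {k: np.mean([m[k] for m in client_models], axis=0)
            for k in shared_keys}

def merge_update(client_model, aggregated):
    # Client side: adopt the averaged generic parameters while
    # keeping the local specialized parameters untouched.
    updated = dict(client_model)
    updated.update(aggregated)
    return updated
```

Splitting the parameter dictionary this way is what lets each client benefit from the other domains without having its domain-specific statistics averaged away.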
Multimedia Forensics
This book is open access. Media forensics has never been more relevant to societal life. Not only does media content represent an ever-increasing share of the data traveling on the net and the preferred communication means for most users, it has also become an integral part of the most innovative applications in the digital information ecosystem that serves various sectors of society, from entertainment to journalism to politics. Undoubtedly, the advances in deep learning and computational imaging contributed significantly to this outcome. The underlying technologies that drive this trend, however, also pose a profound challenge in establishing trust in what we see, hear, and read, and make media content the preferred target of malicious attacks. In this new threat landscape, powered by innovative imaging technologies and sophisticated tools based on autoencoders and generative adversarial networks, this book fills an important gap. It presents a comprehensive review of state-of-the-art forensic capabilities that relate to media attribution, integrity and authenticity verification, and counter-forensics. Its content is developed to give practitioners, researchers, photo and video enthusiasts, and students a holistic view of the field.
Volume XI, 1984 Speech Association of Minnesota Journal
Complete digitized volume (Volume 11, 1984) of the Speech Association of Minnesota Journal.
Computer Vision Applications for Autonomous Aerial Vehicles
Undoubtedly, unmanned aerial vehicles (UAVs) have taken a great leap forward over the last decade. It is no longer surprising to see a UAV accomplish a task previously carried out by humans or an earlier technology. The proliferation of specialized vision sensors, such as depth cameras, lidar sensors, and thermal cameras, together with major breakthroughs in computer vision and machine learning, has accelerated UAV research and technology. However, certain challenges unique to UAVs, such as limited payload capacity, unreliable communication links with ground stations, and data safety, compel UAVs to perform many tasks on their onboard embedded processing units, which makes it difficult to readily implement the most advanced algorithms on them. This thesis focuses on computer vision and machine learning applications for UAVs equipped with onboard embedded platforms, and presents algorithms that utilize data from multiple modalities. The presented work covers a broad spectrum of algorithms and applications for UAVs, such as indoor UAV perception, 3D understanding with deep learning, UAV localization, and structural inspection with UAVs.
Visual guidance and scene understanding without relying on pre-installed tags or markers is the desired approach for fully autonomous navigation of UAVs, either in conjunction with global positioning systems (GPS) or, especially, when GPS information is unavailable or unreliable. Thus, semantic and geometric understanding of the surroundings becomes vital for using vision as guidance in autonomous navigation pipelines. In this context, robust altitude measurement, safe landing zone detection, and doorway detection methods are first presented for autonomous UAVs operating indoors. These approaches are implemented on the Google Project Tango platform, an embedded platform equipped with various sensors including a depth camera. Next, a modified capsule network for 3D object classification is presented, with weight optimization so that the network can fit and run on memory-constrained platforms. Then, a semantic segmentation method for 3D point clouds is developed for more general visual perception on a UAV equipped with a 3D vision sensor.
Next, this thesis presents algorithms for structural health monitoring applications involving UAVs. First, a 3D point cloud-based, drift-free, and lightweight localization method is presented for depth camera-equipped UAVs that perform bridge inspection, where the GPS signal is unreliable. Next, a thermal leakage detection algorithm is presented for detecting thermal anomalies on building envelopes using aerial thermography from UAVs. Then, building on the thermal anomaly identification expertise gained in the previous task, a novel anomaly identification metric (AIM) is presented for more reliable performance evaluation of thermal anomaly identification methods.
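A naive stand-in for the thermal anomaly detection step (not the thesis's actual algorithm) is to flag pixels whose temperature deviates strongly from the image statistics; the threshold `k` below is an illustrative assumption.

```python
import numpy as np

def thermal_anomalies(temperature_map, k=2.0):
    # Flag pixels more than k standard deviations away from the
    # mean temperature of the thermal image. Real methods would
    # account for local context, materials, and weather conditions.
    t = np.asarray(temperature_map, float)
    mu, sigma = t.mean(), t.std()
    return np.abs(t - mu) > k * sigma
```

Even this crude global-statistics detector illustrates why a dedicated evaluation metric matters: the choice of `k` trades missed leaks against false alarms, and different methods sit at different points on that curve.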
Design of Attention Mechanisms for Robust and Efficient Vehicle Re-Identification from Images and Videos
In this work, we explore the problem of Vehicle Re-identification using images and videos, with applications in smart transportation systems. Transportation is one of the sectors that can benefit greatly from the value of data captured by sensors on the road. Data-driven algorithms enable transportation systems to realise intelligent applications that improve operations, safety, and the experience of road users.
Vehicle Re-identification refers to the task of retrieving images of a particular vehicle identity from a large gallery set composed of images taken at different times and locations, in diverse orientations, and within a network of traffic cameras. This task is extremely challenging: not only can vehicles with different identities share the same make, model, and color, but a given vehicle can also appear different depending on the viewpoint, occlusion, and lighting conditions, making it hard to either distinguish or associate vehicle instances. To tackle these problems, in this dissertation we develop a series of attention mechanisms that account for local discriminative regions and generate more robust visual representations of vehicles.
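The common thread of the attention mechanisms developed here can be sketched, very roughly, as attention-weighted pooling of local descriptors; the shapes and names below are assumptions for illustration, not the dissertation's architectures.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_pool(local_features, attention_logits):
    # local_features: (N, D) descriptors from N local regions
    # (e.g., around key-points or salient parts).
    # attention_logits: (N,) unnormalized attention scores.
    # Returns a single D-dim embedding that emphasizes the
    # regions the attention mechanism deems discriminative.
    weights = softmax(attention_logits)
    return weights @ local_features
```

What differs between the models described below is chiefly how the attention scores are obtained: from supervised key-point annotations, from a self-supervised saliency network, or implicitly during hybrid training.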
In our first work, we propose the Adaptive Attention Vehicle Re-identification (AAVER) model, which is equipped with an attention mechanism learned in a supervised manner to locate local regions in the form of vehicle key-points and to extract discriminative features along two parallel paths. The model combines the embeddings of the two paths and outputs a single visual representation of the input image.
While AAVER highlights how attention can improve the discriminative capability of a re-identification system by identifying identity-dependent cues such as key-points or vehicle parts, we note that this requires access to abundant additional annotations that are expensive to collect and, more often than not, noisy. In an effort to redesign the vehicle re-identification pipeline without the need for such expensive annotations, we propose the Self-supervised Vehicle Re-identification (SAVER) model to automatically highlight salient regions in a vehicle image and mine discriminative representations.
SAVER generates robust embeddings; however, it requires a forward pass through a computationally expensive network to generate points of attention at inference time, which imposes a bottleneck and limits its potential adoption in real-time and large-scale applications. Therefore, in our next work, we formulate a training strategy inspired by the notion of curriculum learning and design the Excited Vehicle Re-identification (EVER) model, which benefits from a semi-supervised attention mechanism and relies on the attention generated by SAVER only during training.
Recent advances in self-supervised representation learning have closed the performance gap between self-supervised and fully supervised methods in spectacular fashion. This motivated us to explore these findings in the context of vehicle re-identification and devise a design that preserves the lightweight nature of EVER while matching or beating the performance of SAVER. Based on this, in our follow-up work, we propose the Self-supervised Boosted Vehicle Re-identification (SSBVER) model, which is trained in a hybrid manner and learns an implicit attention mechanism.
Finally, we propose a real-time, city-scale multi-camera vehicle tracking system that detects, tracks, and re-identifies vehicles across traffic cameras at large scale. The proposed system has been integrated into the Regional Integrated Transportation Information System (RITIS), a data-driven platform from the University of Maryland for transportation analysis, monitoring, and data visualization.
- …