    Deep Learning for Real-time Information Hiding and Forensics

    Robust Mobile Visual Recognition System: From Bag of Visual Words to Deep Learning

    With billions of images captured by mobile users every day, automatically recognizing the contents of such images has become a particularly important feature for various mobile apps, including augmented reality, product search, and visual-based authentication. Traditionally, a client-server architecture is adopted in which the mobile client sends captured images or video frames to a cloud server, which runs a set of task-specific computer vision algorithms and sends back the recognition results. However, such a scheme may cause problems related to user privacy, network stability/availability, and device energy. In this dissertation, we investigate the problem of building a robust mobile visual recognition system that achieves high accuracy, low latency, low energy cost, and privacy protection. We study two broad types of recognition methods: bag of visual words (BOVW) based retrieval methods, which search for the nearest-neighbor image to a query image, and state-of-the-art deep learning based methods, which recognize a given image using a trained deep neural network. The challenges of deploying BOVW-based retrieval methods include the size of the indexed image database, query latency, feature extraction efficiency, and re-ranking performance. To address these challenges, we first proposed EMOD, which enables efficient on-device image retrieval on a downloaded, context-dependent partial image database. The efficiency is achieved by analyzing the BOVW processing pipeline and optimizing each module with algorithmic improvements. Recent deep learning based recognition approaches have been shown to greatly exceed the performance of traditional approaches. We identify several challenges of applying deep learning based recognition methods in mobile scenarios, namely energy efficiency and privacy protection for real-time visual processing, and mobile visual domain biases. 
Thus, we proposed two techniques to address them: (i) efficiently splitting the workload across heterogeneous computing resources, i.e., mobile devices and the cloud, using our Moca framework, and (ii) using mobile visual domain adaptation as proposed in our collaborative edge-mediated platform DeepCham. Our extensive experiments on large-scale benchmark datasets and off-the-shelf mobile devices show that our solutions provide better results than state-of-the-art solutions.
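
The BOVW retrieval idea described in this abstract (quantize local descriptors against a visual vocabulary, then return the nearest-neighbor database image) can be illustrated with a minimal sketch. This is not the EMOD implementation: the toy 2-D descriptors, the three-word `vocabulary`, the image names, and the use of cosine similarity are all assumptions made for illustration.

```python
import math
from collections import Counter

def nearest_word(descriptor, vocabulary):
    """Index of the visual word (centroid) closest to a descriptor."""
    return min(range(len(vocabulary)),
               key=lambda i: math.dist(descriptor, vocabulary[i]))

def bovw_histogram(descriptors, vocabulary):
    """L2-normalized histogram of visual-word counts for one image."""
    counts = Counter(nearest_word(d, vocabulary) for d in descriptors)
    hist = [counts.get(i, 0) for i in range(len(vocabulary))]
    norm = math.sqrt(sum(c * c for c in hist)) or 1.0
    return [c / norm for c in hist]

def retrieve(query_hist, database):
    """Key of the database histogram with the highest cosine similarity."""
    return max(database,
               key=lambda k: sum(q * h for q, h in zip(query_hist, database[k])))

# Hypothetical toy data: 2-D "descriptors" and a 3-word vocabulary.
vocabulary = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
database = {
    "poster": bovw_histogram([(0.5, 0.2), (0.1, 0.9), (9.0, 9.5)], vocabulary),
    "logo": bovw_histogram([(9.8, 10.1), (10.3, 9.7), (0.2, 9.9)], vocabulary),
}
query_hist = bovw_histogram([(10.0, 9.0), (9.5, 10.5), (1.0, 9.0)], vocabulary)
best = retrieve(query_hist, database)
print(best)  # logo
```

A real system would use high-dimensional descriptors, a vocabulary of many thousands of words, and an inverted index rather than a linear scan; the linear scan is kept here only to make the nearest-neighbor step explicit.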

    Smart Road Danger Detection and Warning

    Road dangers have caused numerous accidents, so detecting them and warning users is critical to improving traffic safety. However, it is challenging to recognize road dangers among large volumes of normal data and to warn road users, because of cluttered real-world backgrounds, ever-changing road danger appearances, high intra-class differences, limited data for any one party, and a high risk of leaking sensitive information. To address these challenges, this thesis proposes three novel road danger detection and warning frameworks that improve the performance of real-time road danger prediction and notification in challenging real-world environments in four main aspects: accuracy, latency, communication efficiency, and privacy. First, many existing road danger detection systems process data mainly in the cloud. However, they cannot warn users about road dangers in a timely manner because of the long distance to the cloud. Meanwhile, these systems usually rely on supervised machine learning algorithms, which require large and precisely labeled datasets to perform well. To improve latency and reduce labeling cost, EcRD is proposed: an Edge-cloud-based Road Damage detection and warning framework that leverages the fast response of edges and the large storage and computation resources of the cloud. In EcRD, a simple yet efficient road segmentation algorithm is introduced for fast and accurate road area detection by filtering out noisy backgrounds. Additionally, a lightweight road damage detector based on Gray Level Co-occurrence Matrix (GLCM) features is developed on edges for rapid detection of, and warning about, hazardous road damage. Further, a multi-type road damage detection model is proposed for long-term road management on the cloud, embedded with a novel image-label generator based on Cycle-Consistent Adversarial Networks, which automatically generates images with corresponding labels to further improve road damage detection accuracy. 
EcRD achieves 91.96% accuracy with only 0.0043 s latency, which is around 579 times faster than cloud-based approaches, without affecting users' experience and while requiring very low storage and labeling cost. Second, although EcRD relieves the problem of high latency through edge computing, road users can only receive warnings of hazardous road damage within a small area because of the limited communication range of edges. Besides, untrusted edges might misuse users' personal information. A novel framework named FedRD is developed to widen the coverage of warning information and protect data privacy. In FedRD, a new hazardous road damage detection model is proposed that leverages the advantages of feature fusion. A novel adaptive federated learning strategy is designed for high-performance model learning from different edges. A new individualized differential privacy approach with pixelization is proposed to protect users' privacy before data are shared. Simulation results show that FedRD achieves similarly high detection performance (i.e., 90.32% accuracy) but with more than 1000 times wider coverage than the state-of-the-art, and works well when some edges have only limited samples; it also largely preserves users' privacy. Finally, despite the success of EcRD and FedRD in improving latency and protecting privacy, they are based on only a single modality (i.e., image/video), while data in different modalities have become ubiquitous. Also, the communication costs of EcRD and FedRD are very high because of undifferentiated data transmission (both normal and dangerous data) and frequent model exchanges in the federated learning setting, respectively. A novel edge-cloud-based privacy-preserving Federated Multimodal learning framework for Road Danger detection and warning, named FedMRD, is introduced to leverage real-world multimodal data and reduce communication costs. 
In FedMRD, a novel multimodal road danger detection model that considers both inter- and intra-class relations is developed. A communication-efficient federated learning strategy is proposed for collaborative model learning from edges with non-iid and imbalanced data. Further, a new multimodal differential privacy technique for high-dimensional multimodal data with multiple attributes is introduced to protect data privacy directly on users' devices before the data are uploaded to edges. Experimental results demonstrate that FedMRD achieves around 96.42% higher accuracy with only 0.0351 s latency and up to 250 times less communication cost compared with the state-of-the-art, and enables collaborative learning from multiple edges with non-iid and imbalanced data in different modalities while preserving users' privacy. 2021-11-2
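
The abstract says the edge-side detector in EcRD is based on Gray Level Co-occurrence Matrix (GLCM) features but does not state which features, offsets, or classifier are used. As a rough sketch of how GLCM texture features are computed in general, the following assumes a horizontal offset of one pixel, four gray levels, and three common statistics (contrast, angular second moment, homogeneity); the helper names and the toy patches are hypothetical.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix for pixel pairs at offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    total = total or 1  # avoid division by zero on degenerate inputs
    return [[n / total for n in row] for row in counts]

def glcm_features(p):
    """Contrast, angular second moment (ASM), and homogeneity of a GLCM."""
    contrast = asm = homogeneity = 0.0
    for i, row in enumerate(p):
        for j, value in enumerate(row):
            contrast += value * (i - j) ** 2
            asm += value * value
            homogeneity += value / (1 + (i - j) ** 2)
    return {"contrast": contrast, "asm": asm, "homogeneity": homogeneity}

# Hypothetical 3x3 patches quantized to 4 gray levels (0..3).
smooth = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]   # uniform surface
rough = [[0, 3, 0], [3, 0, 3], [0, 3, 0]]    # strongly alternating texture

smooth_features = glcm_features(glcm(smooth, levels=4))
rough_features = glcm_features(glcm(rough, levels=4))
print(smooth_features["contrast"], rough_features["contrast"])  # 0.0 9.0
```

In this toy case the uniform patch has zero contrast and the alternating patch has high contrast, which is the kind of separation a lightweight texture-based detector could threshold or feed to a small classifier.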

    A Survey of Multimodal Information Fusion for Smart Healthcare: Mapping the Journey from Data to Wisdom

    Multimodal medical data fusion has emerged as a transformative approach in smart healthcare, enabling a comprehensive understanding of patient health and personalized treatment plans. In this paper, a journey from data to information to knowledge to wisdom (DIKW) is explored through multimodal fusion for smart healthcare. We present a comprehensive review of multimodal medical data fusion focused on the integration of various data modalities. The review explores different approaches, such as feature selection, rule-based systems, machine learning, deep learning, and natural language processing, for fusing and analyzing multimodal data. This paper also highlights the challenges associated with multimodal fusion in healthcare. By synthesizing the reviewed frameworks and theories, it proposes a generic framework for multimodal medical data fusion that aligns with the DIKW model. Moreover, it discusses future directions related to the four pillars of healthcare: Predictive, Preventive, Personalized, and Participatory approaches. The components of the comprehensive survey presented in this paper form the foundation for more successful implementation of multimodal fusion in smart healthcare. Our findings can guide researchers and practitioners in leveraging the power of multimodal fusion with state-of-the-art approaches to revolutionize healthcare and improve patient outcomes. Comment: This work has been submitted to Elsevier for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

    Proceedings of the 2019 Joint Workshop of Fraunhofer IOSB and Institute for Anthropomatics, Vision and Fusion Laboratory

    In 2019, the annual workshop of Fraunhofer IOSB and the Chair for Interactive Real-Time Systems (Lehrstuhl für Interaktive Echtzeitsysteme) of the Karlsruhe Institute of Technology took place again. The doctoral students of both institutions presented the progress of their research on the topics of machine learning, machine vision, metrology, network security, and usage control. The ideas from this workshop are collected in this book in the form of technical reports.