6 research outputs found

    Deep Features and Clustering Based Keyframes Selection with Security

    The digital world is developing more quickly than ever. Multimedia processing and distribution, however, become vulnerable due to the enormous quantity and significance of vital information. Therefore, extensive technologies and algorithms are required for the safe transmission of messages, images, and video files. This paper proposes a secure framework that integrates video summarization and image encryption. The proposed cryptosystem framework comprises three parts. The informative frames are first extracted using an efficient and lightweight technique that makes use of the processing capabilities of the color histogram-clustering (RGB-HSV) approach. Each frame of a video is represented by deep features, which are based on an enhanced pre-trained Inception-v3 network. After that, the summary is obtained using the K-means optimal clustering algorithm. The representative keyframes are then extracted using the clusters' highest possible entropy nodes. Experimental validation on two well-known standard datasets demonstrates the proposed method's superiority to numerous state-of-the-art approaches. Finally, the proposed framework performs an efficient image encryption and decryption algorithm by employing a general linear group function GLn(F). The analysis and testing outcomes prove the superiority of the proposed adaptive RSA
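The entropy-based keyframe step described in this abstract can be illustrated with a minimal sketch. It assumes that cluster labels (for example from K-means) and a per-frame color histogram are already available; the function names `shannon_entropy` and `select_keyframes` are hypothetical and are not taken from the paper, and the paper's exact entropy definition may differ.

```python
import math


def shannon_entropy(histogram):
    """Shannon entropy (in bits) of a histogram of non-negative counts."""
    total = sum(histogram)
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in histogram:
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


def select_keyframes(histograms, labels):
    """Return, for each cluster, the index of its highest-entropy frame.

    histograms: one histogram (list of counts) per frame.
    labels: one cluster label per frame, assumed to come from a prior
            clustering step such as K-means (not implemented here).
    """
    best = {}  # cluster label -> (entropy, frame index)
    for index, (hist, label) in enumerate(zip(histograms, labels)):
        score = shannon_entropy(hist)
        if label not in best or score > best[label][0]:
            best[label] = (score, index)
    # Keep the keyframes in temporal order.
    return sorted(index for _, index in best.values())
```

With four toy frames split into two clusters, `select_keyframes([[4, 0, 0, 0], [1, 1, 1, 1], [2, 2, 0, 0], [3, 1, 0, 0]], [0, 0, 1, 1])` picks the frame with the flattest histogram in each cluster.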

    A hybrid egocentric video summarization method to improve the healthcare for Alzheimer patients

    Alzheimer patients have difficulty remembering the identity of people and performing daily life activities. This paper presents a hybrid method to generate an egocentric video summary of important people, objects, and medicines to help Alzheimer patients recall their lost memories. Lifelogging video data analysis is used to support human memory recall; however, the massive amount of lifelogging data makes it challenging to select the most relevant content for the Alzheimer patient. To address the challenges associated with massive lifelogging content, a static video summarization approach is applied to select the key-frames that are most relevant to recalling the lost memories of Alzheimer patients. The proposed method consists of three main modules: face, object, and medicine recognition. Histogram of oriented gradients (HOG) features are used to train a multi-class SVM for face recognition. SURF descriptors are employed to extract features from the input video frames, which are then used to find the corresponding points between the objects in the input video and the reference objects stored in the database. Morphological operators are applied, followed by optical character recognition, to recognize and tag the medicines for Alzheimer patients. The performance of the proposed system is evaluated on 18 real-world homemade videos. Experimental results indicate the effectiveness of the proposed system in providing the most relevant content to enhance the memory of Alzheimer patients
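As a rough illustration of the histogram-of-oriented-gradients idea mentioned in this abstract, the sketch below computes a magnitude-weighted orientation histogram for one grayscale cell. It is a simplification under stated assumptions: it omits the block normalization of full HOG and the SVM training, and the function name `orientation_histogram` and the 9-bin default are illustrative choices, not details from the paper.

```python
import math


def orientation_histogram(cell, num_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    cell: 2D list of grayscale intensities (at least 3x3).
    Gradients use central differences on interior pixels only.
    Orientations are folded into [0, 180) degrees (unsigned gradients).
    """
    height, width = len(cell), len(cell[0])
    hist = [0.0] * num_bins
    bin_width = 180.0 / num_bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The modulo guards against a rounding result of exactly 180.0.
            hist[int(angle // bin_width) % num_bins] += magnitude
    return hist
```

A horizontal intensity ramp puts all of its gradient energy in the first bin, while a vertical ramp lands in the bin that covers 90 degrees.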

    Generative AI in the Construction Industry: A State-of-the-art Analysis

    The construction industry is a vital sector of the global economy, but it faces many productivity challenges in various processes, such as design, planning, procurement, inspection, and maintenance. Generative artificial intelligence (AI), which can create novel and realistic data or content, such as text, image, video, or code, based on some input or prior knowledge, offers innovative and disruptive solutions to address these challenges. However, there is a gap in the literature on the current state, opportunities, and challenges of generative AI in the construction industry. This study aims to fill this gap by providing a state-of-the-art analysis of generative AI in construction, with three objectives: (1) to review and categorize the existing and emerging generative AI opportunities and challenges in the construction industry; (2) to propose a framework for construction firms to build customized generative AI solutions using their own data, comprising steps such as data collection, dataset curation, training a custom large language model (LLM), model evaluation, and deployment; and (3) to demonstrate the framework via a case study of developing a generative model for querying contract documents. The results show that retrieval augmented generation (RAG) improves the baseline LLM by 5.2%, 9.4%, and 4.8% in terms of quality, relevance, and reproducibility, respectively. This study provides academics and construction professionals with a comprehensive analysis and practical framework to guide the adoption of generative AI techniques to enhance productivity, quality, safety, and sustainability across the construction industry. Comment: 74 pages, 11 figures, 20 tables
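The retrieval step of a RAG pipeline for querying contract documents might look roughly like the sketch below. It is not the study's implementation: it assumes a simple bag-of-words cosine similarity in place of learned embeddings, it stops at building the prompt rather than calling an LLM, and the names `retrieve` and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, passages, top_k=1):
    """Return the top_k passages most similar to the question."""
    query = Counter(tokenize(question))
    scored = [(cosine(query, Counter(tokenize(p))), i)
              for i, p in enumerate(passages)]
    # Highest score first; ties keep the original passage order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [passages[i] for _, i in scored[:top_k]]


def build_prompt(question, passages, top_k=1):
    """Prepend the retrieved passages to the question as context."""
    context = "\n".join(retrieve(question, passages, top_k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

The resulting prompt string would then be passed to whichever LLM is in use; grounding the answer in retrieved clauses is the part that distinguishes RAG from querying the baseline model directly.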

    Image Restoration Under Adverse Illumination for Various Applications

    Many images are captured in sub-optimal environments, resulting in various kinds of degradation, such as noise, blur, and shadow. Adverse illumination is one of the most important factors in image degradation, causing color and illumination distortion or even unidentifiable image content. Degradation caused by adverse illumination lowers the visual quality of images and might also have negative effects on high-level perception tasks, e.g., object detection. Image restoration under adverse illumination is an effective way to remove such degradation and obtain visually pleasing images. Existing state-of-the-art deep neural network (DNN) based image restoration methods have achieved impressive performance in improving image visual quality. However, different real-world applications require image restoration under adverse illumination to achieve different goals. For example, in the computational photography field, a visually pleasing image is desired in smartphone photography. For traffic surveillance and autonomous driving in low-light or nighttime scenarios, however, high-level perception tasks, e.g., object detection, become more important to ensure safe and robust driving performance. Therefore, in this dissertation, we explore DNN-based image restoration solutions for images captured under adverse illumination in three important applications: 1) image visual quality enhancement, 2) object detection improvement, and 3) enhanced image visual quality and better detection performance simultaneously. First, in the computational photography field, visually pleasing images are desired. We take the shadow removal task as an example to fully explore image visual quality enhancement. Shadow removal is still a challenging task due to its inherent background-dependent and spatially variant properties, which lead to unknown and diverse shadow patterns. We propose a novel solution by formulating this task as an exposure fusion problem to address the challenges. We propose a shadow-aware FusionNet to 'smartly' fuse multiple over-exposure images with pixel-wise fusion weight maps, and a boundary-aware RefineNet to further eliminate the remaining shadow trace. Experimental results show that our method outperforms other CNN-based methods on three datasets. Second, we explore the application of CNN-based night-to-day image translation for improving vehicle detection in traffic surveillance, which is important for safe and robust driving. We propose a detail-preserving method to implement the nighttime-to-daytime image translation and thus adapt a daytime-trained detection model to nighttime vehicle detection. We utilize the StyleMix method to acquire paired daytime and nighttime images for training the nighttime-to-daytime image translation. The translation is implemented based on a kernel prediction network to avoid texture corruption. Experimental results show that the proposed method can better address the nighttime vehicle detection task by reusing the daytime domain knowledge. Third, we explore improving image visual quality and facial landmark detection simultaneously. For portrait images captured in the wild, facial landmark detection can be affected by cast shadows. We construct a novel benchmark, SHAREL, covering diverse face shadow patterns with different intensities, sizes, shapes, and locations to study the effects of shadow removal on facial landmark detection. Moreover, we propose a novel adversarial shadow attack to mine hard shadow patterns. We conduct extensive analysis on three shadow removal methods and three landmark detectors. Then, we design a novel landmark detection-aware shadow removal framework, which empowers shadow removal to achieve higher restoration quality and enhances the shadow robustness of deployed facial landmark detectors
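The exposure fusion formulation mentioned in this abstract, in which several over-exposed versions of an image are blended with pixel-wise weight maps, can be sketched as a normalized weighted sum. In the dissertation the weight maps are predicted by a network (FusionNet); here they are simply given as inputs, and the function name `fuse_exposures` and the fallback to a plain average are assumptions made for illustration.

```python
def fuse_exposures(images, weights):
    """Blend same-sized grayscale images with pixel-wise weight maps.

    images:  list of 2D lists of intensities, one per exposure.
    weights: list of 2D lists of non-negative weights, same shapes.
    Each output pixel is the weight-normalized sum of the inputs.
    """
    height, width = len(images[0]), len(images[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = sum(w[y][x] for w in weights)
            if total == 0:
                # Assumed fallback: plain average when all weights are zero.
                fused[y][x] = sum(img[y][x] for img in images) / len(images)
            else:
                fused[y][x] = sum(
                    w[y][x] * img[y][x] for img, w in zip(images, weights)
                ) / total
    return fused
```

A pixel where both exposures carry equal weight ends up at their mean, while a pixel where one weight is zero takes the other exposure's value unchanged.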

    Video Summarization Using Unsupervised Deep Learning

    In this thesis, we address the task of video summarization using unsupervised deep-learning architectures. Video summarization aims to generate a short summary by selecting the most informative and important frames (key-frames) or fragments (key-fragments) of the full-length video, and presenting them in a temporally ordered fashion. Our objective is to overcome observed weaknesses of existing video summarization approaches that utilize RNNs for modeling the temporal dependence of frames, related to: i) the small influence of the estimated frame-level importance scores on the created video summary, ii) the insufficiency of RNNs to model long-range frames' dependence, and iii) the small amount of parallelizable operations during the training of RNNs. To address the first weakness, we propose a new unsupervised network architecture, called AC-SUM-GAN, which formulates the selection of important video fragments as a sequence generation task and learns this task by embedding an Actor-Critic model in a Generative Adversarial Network. The feedback of a trainable Discriminator is used as a reward by the Actor-Critic model in order to explore a space of actions and learn a value function (Critic) and a policy (Actor) for video fragment selection. To tackle the remaining weaknesses, we investigate the use of attention mechanisms for video summarization and propose a new supervised network architecture, called PGL-SUM, that combines global and local multi-head attention mechanisms which take into account the temporal position of the video frames, in order to discover different modelings of the frames' dependencies at different levels of granularity. Based on the acquired experience, we then propose a new unsupervised network architecture, called CA-SUM, which estimates the frames' importance using a novel concentrated attention mechanism that focuses on non-overlapping blocks in the main diagonal of the attention matrix and takes into account the attentive uniqueness and diversity of the associated frames of the video. All the proposed architectures have been extensively evaluated on the most commonly used benchmark datasets, demonstrating their competitiveness against other approaches and documenting the contribution of our proposals to advancing the current state of the art in video summarization. Finally, we make a first attempt at producing explanations for the video summarization results. Inspired by relevant works in the Natural Language Processing domain, we propose an attention-based method for explainable video summarization and we evaluate the performance of various explanation signals using our CA-SUM architecture and two benchmark datasets for video summarization. The experimental results indicate the advanced performance of explanation signals formed using the inherent attention weights, and demonstrate the ability of the proposed method to explain the video summarization results using clues about the focus of the attention mechanism
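The idea of concentrating attention on non-overlapping blocks along the main diagonal of the attention matrix can be illustrated with a simple mask. This is only a sketch of the block structure: it does not compute the attentive uniqueness or diversity terms that CA-SUM also uses, and the names `block_diagonal_mask` and `concentrate` as well as the `block_size` parameter are assumptions for illustration.

```python
def block_diagonal_mask(num_frames, block_size):
    """Binary mask that is 1 inside non-overlapping diagonal blocks.

    Frames i and j share a block when they fall into the same
    consecutive group of block_size frames.
    """
    return [
        [1 if i // block_size == j // block_size else 0
         for j in range(num_frames)]
        for i in range(num_frames)
    ]


def concentrate(attention, block_size):
    """Zero out attention weights that fall outside the diagonal blocks."""
    mask = block_diagonal_mask(len(attention), block_size)
    return [
        [value * keep for value, keep in zip(row, mask_row)]
        for row, mask_row in zip(attention, mask)
    ]
```

For a four-frame video with `block_size=2`, only the two 2x2 blocks on the diagonal survive, so each frame attends solely to frames in its own temporal neighborhood.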