9 research outputs found

    Can you tell a face from a HEVC bitstream?

    Full text link
    Image and video analytics are increasingly being used on a massive scale. Not only is the amount of data growing, but the complexity of the data processing pipelines is also increasing, thereby exacerbating the problem. It is becoming ever more important to save computational resources wherever possible. We focus on one of the poster problems of visual analytics -- face detection -- and approach the issue of reducing the computation by asking: is it possible to detect a face without full image reconstruction from the High Efficiency Video Coding (HEVC) bitstream? We demonstrate that this is indeed possible, with accuracy comparable to conventional face detection, by training a Convolutional Neural Network on the output of the HEVC entropy decoder.
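The idea of skipping pixel reconstruction can be sketched as follows: rasterise the syntax elements emitted by the entropy decoder (here, per-block intra prediction modes and bit counts; field names and the grid resolution are illustrative assumptions, not the paper's actual features) into coarse feature planes that a CNN could consume instead of decoded pixels.

```python
import numpy as np

# Hypothetical entropy-decoder output for one frame: a list of coding-block
# records (x, y, size, intra_mode, bits). Illustrative, not the HEVC
# reference decoder's real data structures.
def blocks_to_feature_planes(blocks, frame_w, frame_h, grid=8):
    """Rasterise decoded syntax elements onto feature planes (one channel per
    element type) at a coarse grid resolution -- a plausible CNN input that
    avoids full image reconstruction."""
    gw, gh = frame_w // grid, frame_h // grid
    planes = np.zeros((2, gh, gw), dtype=np.float32)  # [mode, bit-density]
    for x, y, size, mode, bits in blocks:
        gx, gy, gs = x // grid, y // grid, max(size // grid, 1)
        planes[0, gy:gy + gs, gx:gx + gs] = mode / 34.0  # 35 HEVC intra modes
        planes[1, gy:gy + gs, gx:gx + gs] = bits / float(size * size)
    return planes

# Toy frame with two 16x16 coding blocks.
planes = blocks_to_feature_planes(
    [(0, 0, 16, 10, 64), (16, 0, 16, 26, 32)], frame_w=32, frame_h=16)
print(planes.shape)  # → (2, 2, 4)
```

The resulting tensor keeps the spatial layout of the frame, so an off-the-shelf CNN can be trained on it directly.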

    SLEPX: An Efficient Lightweight Cipher for Visual Protection of Scalable HEVC Extension

    Get PDF
    This paper proposes a lightweight cipher scheme aimed at the scalable extension of the High Efficiency Video Coding (HEVC) codec, referred to as the Scalable HEVC (SHVC) standard. This stream cipher, the Symmetric Cipher for Lightweight Encryption based on Permutation and EXclusive OR (SLEPX), applies Selective Encryption (SE) over suitable coding syntax elements in the SHVC layers. This is achieved with minimal computational complexity and delay. The algorithm also conserves most SHVC functionalities, i.e. preservation of bit-length, decoder format compliance, and error resilience. For comparative analysis, results were taken and compared with other state-of-the-art ciphers, i.e. the Exclusive-OR (XOR) cipher and the Advanced Encryption Standard (AES). The performance of SLEPX is also compared with existing video SE solutions to confirm the efficiency of the adopted scheme. The experimental results demonstrate that SLEPX is as secure as AES in terms of visual protection, while being computationally comparable with a basic XOR cipher. Visual quality assessment, security analysis and extensive cryptanalysis (based on numerical values of selected binstrings) also showed the effectiveness of SLEPX's visual protection scheme for SHVC compared to previously employed cryptographic techniques.
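The permutation + XOR idea behind such a cipher can be illustrated with a minimal sketch: encrypt one selected syntax-element binstring with a keyed bit permutation followed by a keyed XOR keystream. Bit-length is preserved, so the surrounding bitstream stays format-compliant. This is a toy assumption-laden illustration, not the real SLEPX key schedule or element selection.

```python
import random

def slepx_like_encrypt(binstring, key):
    """Toy permutation + XOR stream cipher over one selected syntax-element
    binstring (e.g. coefficient sign bits). Illustrative only."""
    rng = random.Random(key)
    perm = list(range(len(binstring)))
    rng.shuffle(perm)                                   # keyed bit permutation
    keystream = [rng.randint(0, 1) for _ in binstring]  # keyed XOR keystream
    return ''.join(str(int(binstring[p]) ^ k) for p, k in zip(perm, keystream))

def slepx_like_decrypt(cipher, key):
    rng = random.Random(key)            # regenerate permutation and keystream
    perm = list(range(len(cipher)))
    rng.shuffle(perm)
    keystream = [rng.randint(0, 1) for _ in cipher]
    plain = [''] * len(cipher)
    for p, c, k in zip(perm, cipher, keystream):
        plain[p] = str(int(c) ^ k)      # undo the XOR, then un-permute
    return ''.join(plain)

ct = slepx_like_encrypt("1011001110", key=42)
assert len(ct) == 10                                  # bit-length preserved
assert slepx_like_decrypt(ct, key=42) == "1011001110" # round-trip succeeds
```

Because ciphertext and plaintext have identical length and alphabet, a format-compliant decoder can still parse the encrypted stream, which is the property the paper exploits.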

    Beyond the pixels: learning and utilising video compression features for localisation of digital tampering.

    Get PDF
    Video compression is pervasive in digital society. With the rising usage of deep convolutional neural networks (CNNs) in the fields of computer vision, video analysis and video tampering detection, it is important to investigate how patterns invisible to human eyes may be influencing modern computer vision techniques and how they can be used advantageously. This work thoroughly explores how video compression influences the accuracy of CNNs and shows how optimal performance is achieved when compression levels in the training set closely match those of the test set. A novel method is then developed, using CNNs, to derive compression features directly from the pixels of video frames. It is then shown that these features can be readily used to detect inauthentic video content with good accuracy across multiple different video tampering techniques. Moreover, the ability to explain these features allows predictions to be made about their effectiveness against future tampering methods. The problem is motivated with a novel investigation into recent video manipulation methods, which shows that there is a consistent drive to produce convincing, photorealistic, manipulated or synthetic video. Humans, blind to the presence of video tampering, are also blind to the type of tampering. New detection techniques are required and, in order to compensate for human limitations, they should be broadly applicable to multiple tampering types. This thesis details the steps necessary to develop and evaluate such techniques.
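A crude hand-crafted stand-in for such learned compression features is a blockiness score: block-based codecs leave luminance discontinuities at block boundaries, so the ratio of gradients at boundary columns to gradients elsewhere hints at the compression level. This sketch is an assumption-laden illustration of the feature family, not the thesis's learned CNN features.

```python
import numpy as np

def blockiness(gray, block=8):
    """Ratio of mean absolute horizontal luminance jumps at block-boundary
    columns to jumps elsewhere. Values well above 1 suggest strong
    block-based compression; ~1 suggests none."""
    d = np.abs(np.diff(gray.astype(np.float32), axis=1))  # horizontal gradients
    cols = np.arange(d.shape[1])
    at_boundary = (cols % block) == (block - 1)   # diffs straddling block edges
    return float(d[:, at_boundary].mean() / (d[:, ~at_boundary].mean() + 1e-9))

# A smooth horizontal gradient has no block structure: score is ~1.
smooth = np.tile(np.arange(32.0), (32, 1))
print(round(blockiness(smooth), 3))  # → 1.0
```

A learned feature extractor can pick up far subtler traces than this, but the principle is the same: the statistic must respond to boundary-aligned artefacts while ignoring natural image content.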

    Low-cost system to detect people who wear a mask and their temperature through neural networks

    Get PDF
    Master's thesis (Trabajo de Fin de Máster) in Ingeniería Informática, Facultad de Informática UCM, Departamento de Arquitectura de Computadores y Automática, academic year 2020/2021. This document proposes a system for the detection of people wearing a mask, and for measuring skin temperature, using a low-cost AIoT device built from the MaixCube and the AMG8833 sensor. For mask detection, convolutional neural networks are used, receiving as input an image taken with the MaixCube camera. For object classification, two convolutional classification models were generated, based on MobileNet and Tiny YOLO v2 as the internal architecture of the YOLO system. These networks execute the inference phase on the MaixCube's dedicated neural-network accelerator, achieving 5-6 FPS with an accuracy above 80% in the case of MobileNet. Beyond delivering a functional system, the main contributions of the work include modifications to the MaixPy firmware and the creation of a driver for the AMG8833 sensor, both of which have been integrated into the official MaixPy repository.
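The temperature half of such a system can be sketched from the AMG8833's documented register format: 64 pixel registers of 2 bytes each (little-endian, 12-bit two's complement, 0.25 °C per LSB) decode to an 8x8 Celsius grid. I2C bus access and the thesis's actual driver code are omitted; the fever threshold below is an illustrative choice, not the work's calibration.

```python
def amg8833_pixels_to_celsius(raw):
    """Decode the AMG8833's 64 pixel registers (128 bytes, little-endian,
    12-bit two's complement, 0.25 degC/LSB per the Panasonic datasheet)
    into an 8x8 grid of Celsius values."""
    temps = []
    for lo, hi in zip(raw[0::2], raw[1::2]):
        value = (hi << 8) | lo
        if value & 0x800:               # sign-extend the 12-bit reading
            value -= 0x1000
        temps.append(value * 0.25)
    return [temps[r * 8:(r + 1) * 8] for r in range(8)]

def fever_alert(grid, threshold=37.5):
    """Flag when the hottest pixel (ideally the forehead) exceeds threshold."""
    return max(max(row) for row in grid) >= threshold

# Fake register dump: 64 pixels at 30.0 degC (120 counts), one at 38.0 degC.
raw = bytearray([120, 0] * 64)
raw[20:22] = (152).to_bytes(2, "little")   # pixel 10 reads 152 * 0.25 = 38.0
grid = amg8833_pixels_to_celsius(raw)
print(grid[1][2], fever_alert(grid))       # → 38.0 True
```

In the full system, the alert would only fire when the mask-detection CNN also reports a face in the corresponding camera region, so the hot pixel can be attributed to a person.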

    Modes of feedback in ESL writing: Implications of shifting from text to screencast

    Get PDF
    For second language writing (SLW) instructors, decisions regarding technology-mediated feedback are particularly complex as they must also navigate student language proficiency, which may vary across different areas such as reading or listening. Yet technology-mediated feedback remains an underexplored realm in SLW, especially with regard to how modes of technology affect feedback and how students interact with and understand it. With the expanding pervasiveness of video and increased access to screencasting (screen recording), SLW instructors have ever-growing access to video modes for feedback, yet little research to inform their choices. Further, with video potentially requiring substantial investment from institutions through hosting solutions, a research-informed perspective for adoption is advisable. However, few existing studies address SLW feedback given in the target language (common in ESL) or standalone (rather than supplemental) screencast feedback. This dissertation begins to expand SLW feedback research and fill this void through three investigations of screencast (video) and text (MS Word comments) feedback in ESL writing. The first paper uses a crossover design to investigate student perceptions and use of screencast feedback over four assignments given to 12 students in an intermediate ESL writing class, through a combination of a series of surveys, a group interview and screen-recorded observations of students working with the feedback. The second paper argues for appraisal, an outgrowth of systemic functional linguistics (SFL) focused on evaluative language and interpersonal meaning, as a framework for understanding interpersonal differences in modes of feedback through an analysis of 16 text and 16 video feedback files from Paper 1. Paper 3 applies a more intricate version of the appraisal framework to the analysis of video and text feedback collected in a similar crossover design from three ESL writing instructors.
Paper 1 demonstrates the added insights offered by recording students’ screens and their spoken interactions and shows that students needed to ask for help and switched to the L1 when working with text feedback but not video. The screencast feedback was found to be easier to understand and use, as MS Word comments were seen as being difficult to connect to the text. While students found both types of feedback to be helpful, they championed video feedback for its efficiency, clarity, ease of use and heightened understanding and would greatly prefer it for future feedback. Successful changes were made at similar rates for both types of feedback. The results of Paper 2 suggest possible variation between the video and text feedback in reviewer positioning and feedback purpose. Specifically, video seems to position the reviewer as holding only one of many possible perspectives with feedback focused on possibility and suggestion while the text seems to position the reviewer as authority with feedback focused on correctness. The findings suggest that appraisal can aid in the understanding of multimodal feedback and identifying differences between feedback modes. Building on these findings, Paper 3 shows substantial reduction in negative appreciation of the student text overall and for each instructor individually in video feedback as compared to text. Text feedback showed a higher proportion of negative attitude overall and positioned the instructor as a single authority. Video feedback, on the other hand, preserved student autonomy in its balanced use of praise and criticism, offered suggestion and advice and positioned the instructor as one of many possible opinions. Findings held true in sum and for each instructor individually suggesting that interpersonal considerations varied across modes. This study offers future feedback research a way to consider the interpersonal aspects of feedback across multiple modes and situations. 
    It provides standardization procedures for applying and quantifying appraisal analysis in feedback that allow for comparability across studies. Future work applying the framework to other modes, such as audio, and situations, such as instructor conferences, peer review, or tutoring, is encouraged. The study also posits the framework as a tool in instructor reflection and teacher training. Taken together, the three studies deepen our understanding of the impact of our technological choices in the context of feedback. Video feedback seems to be a viable replacement for text feedback as it was found to be at least as effective for revision, while being greatly preferred by students for its ease of use and understanding. With the understanding of how students use feedback in different modes, instructors can better craft feedback and training for their students. For instance, instructors must remember to pause after comments in screencast feedback to allow students time to hit pause or revise. Video was also seen to allow for greater student agency in their work and to position instructor feedback as suggestions that the student could act upon. These insights can help instructors choose and employ technology in ways that will best support their pedagogical purposes.

    Benchmarking of Embedded Object Detection in Optical and RADAR Scenes

    Get PDF
    A portable, real-time vital sign estimation prototype is developed using neural network-based localization, multi-object tracking, and embedded processing optimizations. The system estimates heart and respiration rates of multiple subjects using direction-of-arrival techniques on RADAR data. This system is useful in many civilian and military applications including search and rescue. The primary contribution from this work is the implementation and benchmarking of neural networks for real-time detection and localization on various systems, including the testing of eight neural networks on a discrete GPU and Jetson Xavier devices. Mean average precision (mAP) and inference speed benchmarks were performed. We have shown fast and accurate detection and tracking using synthetic and real RADAR data. Another major contribution is the quantification of the relationship between neural network mAP performance and data augmentations. As an example, we focused on image and video compression methods, such as JPEG, WebP, H264, and H265. The results show WebP at a quantization level of 50 and H265 at a constant rate factor of 30 provide the best balance between compression and acceptable mAP. Other minor contributions are achieved in enhancing the functionality of the real-time prototype system. This includes the implementation and benchmarking of neural network optimizations, such as quantization and pruning. Furthermore, appearance-based synthetic and real RADAR datasets are developed. The latter contains simultaneous optical and RADAR data capture and cross-modal labels. Finally, multi-object tracking methods are benchmarked and a support vector machine is utilized for cross-modal association. In summary, the implementation, benchmarking, and optimization of methods for detection and tracking helped create a real-time vital sign system on a low-profile embedded device. 
    Additionally, this work established a relationship between compression methods and different neural networks for optimal file compression and network performance. Finally, methods for RADAR and optical data collection and cross-modal association are implemented.
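The selection rule implied by the WebP-q50 / H265-CRF30 recommendation can be sketched as a simple sweep: benchmark each codec setting for compressed size and detector mAP, then take the smallest setting whose mAP stays above an acceptability floor. The numbers below are made up for illustration, not the thesis's measurements.

```python
def best_operating_point(points, min_map):
    """From (setting, bytes_per_frame, mAP) benchmark rows, pick the row with
    the smallest compressed size whose mAP clears the floor, or None."""
    ok = [p for p in points if p[2] >= min_map]
    return min(ok, key=lambda p: p[1]) if ok else None

bench = [  # (codec setting, avg compressed bytes, detector mAP) -- illustrative
    ("webp_q90", 41000, 0.71),
    ("webp_q50", 18000, 0.69),
    ("webp_q10",  6000, 0.52),
]
print(best_operating_point(bench, min_map=0.65))  # → ('webp_q50', 18000, 0.69)
```

Running the same sweep per codec (JPEG quality, WebP quality, H264/H265 CRF) yields one operating point per codec family, which is how findings like "WebP q=50" and "H265 CRF 30" arise.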

    Técnicas de compresión de imágenes hiperespectrales sobre hardware reconfigurable

    Get PDF
    PhD thesis, Universidad Complutense de Madrid, Facultad de Informática, defended on 18-12-2020. Sensors are nowadays present in all aspects of human life. When possible, sensing is done remotely: this is less intrusive, avoids interference with the measuring process, and is more convenient for the scientist. One of the most recurrent concerns of the last decades has been the sustainability of the planet, and how the changes it is facing can be monitored. Remote sensing of the Earth has seen an explosion in activity, with satellites now being launched on a weekly basis to perform remote analysis of the Earth, and planes surveying vast areas for closer analysis...