5,944 research outputs found

    Efficient Method For Scratch Lines Noise Removal From Video

    The digitization and transfer of older films into high-definition (HD) formats demand high-quality restoration. Digital film restoration is therefore of growing interest to researchers and film archives alike. Old films, including cultural heritage masterpieces, are being digitally premastered, transferred into newer, higher-quality formats, and distributed through media such as DVD, Blu-ray, or HD pictures. Detecting line scratches in old movies is particularly difficult because of the variable spatiotemporal characteristics of this defect. The main problems include sensitivity to noise and texture, and false detections caused by thin vertical structures that belong to the scene. Automatically locating damaged image regions is the key to automatic video inpainting. Vertical scratches are a common form of damage in old film. Because a film is a sequence of many frames that together produce motion, processing any video format is lengthy and tedious work. A scratch or noise on film usually persists across many consecutive frames, and the removal process can exploit this by first checking the noise area found in the preceding frame. The proposed system therefore aims to detect line scratches in old films and remove them. A line scratch detection algorithm based on edge detection is proposed. Edge detection is an image processing technique for finding the boundaries of objects within images. The proposed algorithm first detects edges using the Sobel operator with the largest response to vertical edges, and then refines the edges with the Canny operator. Third, it detects vertical lines in the image with the probabilistic Hough transform. Finally, it obtains the true locations of the vertical line scratches through morphology and width constraints. For the restoration stage, we contribute a scratch removal approach based on a new nonlinear continued fraction method that uses both spatial and temporal information around the scratch.
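The detection stages described in this abstract can be illustrated with a much simplified sketch. It keeps only the vertical Sobel kernel and the width constraint; the Canny step, the probabilistic Hough transform, and the morphology step are replaced here by a plain per-column vote, which only works for perfectly vertical scratches that span the whole frame. All function names, thresholds, and the synthetic frame are assumptions made for illustration and are not taken from the paper.

```python
# Sobel kernel that responds most strongly to vertical edges.
SOBEL_X = ((-1, 0, 1),
           (-2, 0, 2),
           (-1, 0, 1))


def sobel_x(img, y, x):
    """Signed horizontal gradient at (y, x) for a 2D list of grey levels."""
    return sum(SOBEL_X[j][i] * img[y + j - 1][x + i - 1]
               for j in range(3) for i in range(3))


def column_signs(img, edge_thresh, min_fraction):
    """Mark a column +1 / -1 if most rows hold a rising / falling edge, else 0."""
    h, w = len(img), len(img[0])
    needed = min_fraction * (h - 2)
    signs = [0] * w
    for x in range(1, w - 1):
        grads = [sobel_x(img, y, x) for y in range(1, h - 1)]
        if sum(g >= edge_thresh for g in grads) >= needed:
            signs[x] = 1
        elif sum(g <= -edge_thresh for g in grads) >= needed:
            signs[x] = -1
    return signs


def find_scratches(img, edge_thresh=100, min_fraction=0.8, max_width=3):
    """Return (first_col, last_col) spans of thin vertical lines.

    The column vote is a crude stand-in for the Hough transform
    (hypothetical simplification, not the paper's method).
    """
    signs = column_signs(img, edge_thresh, min_fraction)
    # Collapse adjacent columns with the same sign into runs [sign, first, last].
    runs = []
    for x, s in enumerate(signs):
        if s == 0:
            continue
        if runs and runs[-1][0] == s and runs[-1][2] == x - 1:
            runs[-1][2] = x
        else:
            runs.append([s, x, x])
    scratches = []
    # A thin line shows up as two neighbouring edge runs of opposite sign.
    for (s1, left, _), (s2, _, right) in zip(runs, runs[1:]):
        width = right - left - 1  # the Sobel response spills one column each side
        if s1 != s2 and 1 <= width <= max_width:  # width constraint
            scratches.append((left + 1, right - 1))
    return scratches


def demo_frame(height=12, width=24):
    """Synthetic frame: a 1-pixel bright scratch at column 5 and a wide object."""
    frame = [[50] * width for _ in range(height)]
    for row in frame:
        row[5] = 200
        for x in range(14, 20):
            row[x] = 200
    return frame


print(find_scratches(demo_frame()))
```

On the synthetic frame, the one-pixel line is reported and the six-pixel-wide object is rejected by the width constraint. The sketch uses no temporal information, so it would still confuse a genuine thin vertical scene structure with a scratch, which is the false-detection problem the abstract mentions.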

    Review Paper on Automatic Scratch Lines Noise Removal from Video

    The digitization and transfer of older films into high-definition (HD) formats demand high-quality restoration. Digital film restoration is therefore of growing interest to researchers and film archives alike. Old films, including cultural heritage masterpieces, are being digitally premastered, transferred into newer, higher-quality formats, and distributed through media such as DVD, Blu-ray, or HD pictures. Detecting line scratches in old movies is particularly difficult because of the variable spatiotemporal characteristics of this defect. The main problems include sensitivity to noise and texture, and false detections caused by thin vertical structures that belong to the scene. Automatically locating damaged image regions is the key to automatic video inpainting. Vertical scratches are a common form of damage in old film. Because a film is a sequence of many frames that together produce motion, processing any video format is lengthy and tedious work. A scratch or noise on film usually persists across many consecutive frames, and the removal process can exploit this by first checking the noise area found in the preceding frame. The proposed system therefore aims to detect line scratches in old films and remove them.

    Video inpainting for non-repetitive motion

    Master's (Master of Science)

    Grid computing for the numerical reconstruction of digital holograms

    Digital holography has the potential to greatly extend holography's applications and move it from the lab into the field: a single CCD or other solid-state sensor can capture any number of holograms, while numerical reconstruction within a computer eliminates the need for chemical processing and readily allows further processing and visualisation of the holographic image. The steady increase in sensor pixel count and resolution opens up the possibilities of larger sample volumes and of higher spatial resolution sampling, enabling the practical use of digital off-axis holography. However, this increase in pixel count also drives a corresponding expansion of the computational effort needed to numerically reconstruct such holograms, to the extent that reconstructing a single depth slice takes significantly longer than capturing a single hologram. Grid computing, a recent innovation in large-scale distributed processing, provides a convenient means of harnessing significant computing resources in an ad hoc fashion that might match the field deployment of a holographic instrument. In this paper we consider the computational needs of digital holography and discuss the deployment of numerical reconstruction software over an existing Grid testbed. The analysis of marine organisms is used as an exemplar for the workflow and job execution of in-line digital holography.

    Learning Representations for Controllable Image Restoration

    Deep Convolutional Neural Networks have sparked a renaissance in all the sub-fields of computer vision. Tremendous progress has been made in the area of image restoration. The research community has pushed the boundaries of image deblurring, super-resolution, and denoising. However, given a distorted image, most existing methods typically produce a single restored output. The tasks mentioned above are inherently ill-posed, leading to an infinite number of plausible solutions. This thesis focuses on designing image restoration techniques capable of producing multiple restored results and granting users more control over the restoration process. Towards this goal, we demonstrate how one could leverage the power of unsupervised representation learning. Image restoration is vital when applied to distorted images of human faces due to their social significance. Generative Adversarial Networks enable an unprecedented level of generated facial details combined with smooth latent space. We leverage the power of GANs towards the goal of learning controllable neural face representations. We demonstrate how to learn an inverse mapping from image space to these latent representations, tuning these representations towards a specific task, and finally manipulating latent codes in these spaces. For example, we show how GANs and their inverse mappings enable the restoration and editing of faces in the context of extreme face super-resolution and the generation of novel view sharp videos from a single motion-blurred image of a face. This thesis also addresses more general blind super-resolution, denoising, and scratch removal problems, where blur kernels and noise levels are unknown. We resort to contrastive representation learning and first learn the latent space of degradations. We demonstrate that the learned representation allows inference of ground-truth degradation parameters and can guide the restoration process. 
Moreover, it enables control over the amount of deblurring and denoising in the restoration via manipulation of latent degradation features.
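The abstract does not say which contrastive objective is used to learn the latent space of degradations. As a hedged illustration only, the sketch below uses the InfoNCE loss, a common choice for contrastive representation learning. It assumes that two patches with the same degradation form a positive pair and that patches with other degradations serve as negatives. The function names, the temperature value, and the toy embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: -log softmax of the positive similarity."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]


# Toy 2D degradation embeddings (assumed values, for illustration only).
same_blur_a = [1.0, 0.0]
same_blur_b = [1.0, 0.0]
other_noise = [0.0, 1.0]

aligned = info_nce(same_blur_a, same_blur_b, [other_noise])
misaligned = info_nce(same_blur_a, other_noise, [same_blur_b])
print(aligned < misaligned)
```

The loss is small when the anchor lies close to the embedding that shares its degradation and far from the others, which is the property that would let such a latent space separate blur kernels and noise levels.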

    Preventing School-bullying through Automated Video Analysis

    Society today strives to prevent discrimination, whether it takes the form of offensive words or violent attitudes. Most teenagers who suffer bullying at school have difficulties in the learning process and, consequently, low grades. Recent studies by health professionals show that the marks left by such events can lead to illnesses such as depression, low self-esteem, and self-destructive behaviors. To address this problem, non-profit institutions have emerged that try to prevent such actions through awareness campaigns. However, these institutions have limitations that make it impossible to identify most of these occurrences, leaving victims without assistance. This motivates us to search for new solutions based on automated systems that can detect, at the exact moment, the people involved in bullying actions on school property. With the help of a Portuguese non-profit organization working on bullying, a study was carried out to collect information about the best-known behaviors of people involved in bullying actions and their effects on society, in order to obtain good guidelines for identifying these events. Next, we investigated the computer vision and artificial intelligence technologies that allow the analysis of videos captured by surveillance cameras and can predict which type of action each one contains. We present a variety of architectures, from the first model capable of classifying human behavior in videos to current state-of-the-art architectures composed of two streams of 3D convolutions that extract spatial and temporal features. A search for previous deep learning studies on bullying recognition in school videos found three scientific papers that had already investigated this kind of problem.
Our analysis of these studies shows the need to create a novel dataset that represents all types of existing bullying actions and a new model architecture capable of identifying these events with high accuracy. Following the studies in Chapters 2 and 3, guidelines were created to stage bullying behavior on school grounds with a group of teenagers. Three hundred fifty clips were shot in bathrooms, classrooms, hallways, and canteens with five kids aged 7 to 18 years old. Another 200 videos were acquired from the Internet and categorized alongside the recorded videos, producing a balanced dataset of 550 trimmed videos. The data cleaning process removed audio and black sidebars. The Kinetics 400 dataset was downloaded and used for fine-tuning the deep learning pipelines. In terms of models, the SlowFast, I3D, C2D, and FGN architectures were used to build the application. FGN was the only model that produced plausible results when trained from scratch, finishing training with an accuracy of around 70% on the test dataset. However, when the optimal threshold is used, this value drops to around 51%. Following the successful training from scratch with FGN, K-fold cross-validation was applied, dividing the dataset into ten folds so that the entire dataset is used for testing. The final result is the average over the ten models, which attained an accuracy of 65.67%. When trained from scratch, the other three models could not converge to a minimum and only reached satisfactory performance when fine-tuned from the Kinetics 400 weights. These three models do not perform well when trained from scratch because they contain numerous parameters that must be updated, which indicates that larger datasets are required. The SlowFast model obtained approximately 83% accuracy when selecting the class with the highest probability, and this score was unchanged when the optimal threshold was adopted.
The I3D model scored 81% on the test dataset when the class with the highest probability was selected. With the appropriate threshold, however, it achieved the best accuracy, approximately 87%. Finally, the C2D model obtained approximately 77% accuracy on the test dataset and maintained this performance with the optimal threshold. These thresholds were determined using the ROC curve, searching for the threshold with the highest number of true positives and the lowest number of false positives. Ultimately, this study offers a unique bullying dataset with activities that highlight the bullying theme and have more attributes than well-known conflict datasets. After cleaning and labeling, the dataset comprised 550 trimmed bullying and non-bullying videos. Because of the sensitivity of the topic and the need for authorization from the entity responsible for the students, the filming process, including securing school locations and students, was challenging. For future work, network compression through knowledge distillation is suggested: a smaller student model is taught with knowledge derived from a large model, reducing the number of parameters and thus the computing resources required while maintaining accuracy. This approach allows the model to run in inference mode on IoT devices rather than transferring data over the Internet to large data centers, which adds a security layer to the application given the sensitive bullying topic and school video information. Another proposed enhancement is to record new bullying and non-bullying videos to add more features and variation to the dataset.
Today, humanity fights against discrimination, whether practiced through offensive words or violent attitudes.
Many of the adolescents who suffer bullying at school have difficulties in the learning process and, consequently, poor results. The most recent studies by health professionals show that bullying can leave marks on adolescents' lives through the onset of illnesses such as depression, low self-esteem, and self-destructive behaviors, among others. Naturally, these problems drastically reduce a person's quality of life, since they can trigger social, physical, and psychological trauma in the victim. Non-profit organizations were created with the aim of preventing bullying in schools through awareness campaigns. Beyond those campaigns, however, the institutions have difficulty identifying these events, which prevents correct and rapid support from being given to the victim. These factors lead us to look for new solutions based on automatic systems capable of detecting, at the exact moment, the occurrence of an act of bullying in a school and, consequently, the people involved in it. With the help of a Portuguese non-profit association, a study was carried out to identify the most common behaviors of people involved in these acts and the effects they can have on society, with the goal of clarifying the patterns intrinsic to acts of bullying and thus making these actions easier to recognize. Next, an in-depth study was carried out of the technologies and tools used in computer vision and artificial intelligence that make it possible to analyze videos captured by surveillance cameras and, consequently, to identify the types of human actions present.
This study begins with the classical deep learning approaches, 2D convolutional neural networks, and ends with advanced networks that implement two 3D convolutional neural networks with different roles, one responsible for extracting static features and the other for analyzing motion. Before proceeding to development, a review was made of several existing works that addressed bullying in the context of deep learning technologies. Three articles were found that studied the possibility of using various convolutional network architectures and different datasets to address the problem. From reading and analyzing these documents, it was concluded that there is a need to create a dataset that characterizes the problem through a wide range of videos with bullying actions, and a need to develop a model that can identify these actions with high accuracy in videos captured in realistic scenarios. After the study carried out in the two previous chapters, several scripts were written to plan staged scenarios of bullying and non-bullying actions with students on school property. The recordings produced 350 videos, set in bathrooms, classrooms, canteens, and outdoor parks. Another 200 videos were downloaded from the Internet through the World Star HipHop website. Afterwards, the 550 videos went through a cleaning process in which the sound and the black bars on the sides were removed. The annotation process produced videos with durations between 5 s and 12 s. The Kinetics 400 dataset was also downloaded and used for the knowledge distillation methods and for fine-tuning the weights with the YNF dataset. Regarding the models used in the development phase, the SlowFast, I3D, C2D, and FGN architectures were implemented.
FGN was the only model capable of converging to a minimum when trained with randomly initialized weights. At the end of the training and validation process, the model reached an accuracy on the test set close to 70%, which fell significantly to 51% when the optimal separation value between the two classes was used. This reduction occurred because the initial accuracy was computed with a separation value of 0.5, whereas the value that guarantees the highest number of true positives and the lowest number of false positives is approximately 0.87. Since the collected dataset has only 550 videos, which implies a small number of test instances, K-fold cross-validation was applied to the FGN model. This process reached an accuracy of 65.67%. The remaining three models were initialized with the weights from the Kinetics 400 dataset and fine-tuned by training on the YNF dataset. Because these models have a large number of parameters to update during training, they need large datasets to converge to a minimum when trained with randomly initialized weights. The fact that the collected dataset has only 550 videos prevented them from reaching good performance when trained without any prior knowledge. The SlowFast network architecture reached an accuracy of approximately 83% with a separation value of 0.5 between the two classes. The accuracy on the test set was the same when the optimal separation value obtained from the ROC curve was used. The second model, I3D, reached an accuracy of 81% on the test set, and its performance increased to approximately 87% when the optimal separation value was applied.
The last model trained, C2D, reached an accuracy of approximately 77% on the test set and kept the same accuracy when the optimal separation value between classes was applied. The optimal separation values were computed from the ROC curve, searching for the value that reduces the number of false positives and increases the number of true positives. In conclusion, this work presented a dataset that captures various bullying and non-bullying actions between students on school property. It was created because no existing data portrays the bullying problem in its entirety, beyond physical violence, covering situations of mockery, theft, and intimidation. Once annotated and cleaned, the dataset was used to train and validate 5 deep learning models for video analysis, with the aim of creating an application capable of distinguishing bullying from non-bullying actions. The model that made this distinction with the best accuracy was the I3D architecture initialized with the Kinetics 400 weights, reaching 87% on the test set with the optimal separation value between classes. For future work, knowledge distillation is mentioned as a technique for reducing the size of deep networks and, consequently, the computational resources needed to run the models. One advantage of this technique is that it makes it possible to develop artificial intelligence applications on IoT devices with limited energy and processing resources while maintaining the accuracy obtained with larger models.
Given the community's sensitivity to the topic of bullying and to the sharing of visual data of minors in schools, the ability to perform inference without sending data over the Internet to large data centers adds a layer of security to applications. Another suggestion for improving the performance of the application presented in this dissertation is to record new videos, substantially increasing the variety of actions.
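The abstract describes choosing a decision threshold from the ROC curve by seeking many true positives and few false positives, but it does not name the exact criterion. The sketch below assumes Youden's J statistic (true positive rate minus false positive rate), one common way to express that trade-off. The function names, scores, and labels are invented for illustration and are not results from the thesis.

```python
def best_threshold(scores, labels):
    """Return (threshold, J) maximizing J = TPR - FPR over the observed scores."""
    positives = sum(labels)
    negatives = len(labels) - positives
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        # A clip is predicted as bullying when its score reaches the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / positives - fp / negatives
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def accuracy(scores, labels, threshold):
    """Fraction of clips classified correctly at the given threshold."""
    correct = sum(1 for s, y in zip(scores, labels) if (s >= threshold) == (y == 1))
    return correct / len(labels)


# Hypothetical model scores and ground-truth labels (1 = bullying).
scores = [0.95, 0.90, 0.88, 0.70, 0.60, 0.55, 0.30, 0.10]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

threshold, j = best_threshold(scores, labels)
print(threshold, accuracy(scores, labels, 0.5), accuracy(scores, labels, threshold))
```

On this toy data the selected threshold raises accuracy over the default 0.5 cut. Maximizing the ROC trade-off is not the same as maximizing accuracy, though, so accuracy can move in either direction once the threshold changes, as the FGN and I3D figures reported above show.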