
    The Role of Multiple Articulatory Channels of Sign-Supported Speech Revealed by Visual Processing

    Get PDF
    Purpose The use of sign-supported speech (SSS) in the education of deaf students has recently been discussed in relation to its usefulness with deaf children using cochlear implants. To clarify the benefits of SSS for comprehension, 2 eye-tracking experiments aimed to detect the extent to which signs are actively processed in this mode of communication. Method Participants were 36 deaf adolescents, including cochlear implant users and native deaf signers. Experiment 1 attempted to shift observers' foveal attention to the linguistic source in SSS from which most information is extracted, lip movements or signs, by magnifying the face area, thus modifying the perceptual accessibility of lip movements (magnified condition), and by constraining the visual field to either the face or the sign through a moving-window paradigm (gaze-contingent condition). Experiment 2 aimed to explore the reliance on signs in SSS by occasionally producing a mismatch between sign and speech. Participants were required to concentrate on the orally transmitted message. Results In Experiment 1, analyses revealed a greater number of fixations toward the signs and a reduction in accuracy in the gaze-contingent condition across all participants. Fixations toward signs also increased in the magnified condition. In Experiment 2, results indicated lower accuracy in the mismatching condition across all participants. Participants looked more at the sign when it was inconsistent with speech. Conclusions All participants, even those with residual hearing, rely on signs when attending to SSS, either peripherally or through overt attention, depending on the perceptual conditions.
    Funding: Unión Europea, Grant Agreement 31674

    Selective Scene Text Removal

    Full text link
    Scene text removal (STR) is the image-transformation task of removing text regions from scene images. Conventional STR methods remove all scene text, which means that existing methods cannot select the text to be removed. In this paper, we propose a novel task setting named selective scene text removal (SSTR), which removes only target words specified by the user. Although SSTR is a more complex task than STR, the proposed multi-module structure enables efficient training for SSTR. Experimental results show that the proposed method can remove target words as expected.
    Comment: 12 pages, 8 figures, accepted at the 34th British Machine Vision Conference

    From North to South: African Librarianship in the new millennium

    Get PDF
    Annual Public Lecture on African Librarianship in the 21st Century, hosted by Unisa Library in partnership with the IFLA Regional Office for Africa

    A Review of Fog Computing Concept, Architecture, Application, Parameters and Challenges

    Get PDF
    The Internet of Things (IoT) has become an integral part of our daily lives, growing exponentially from a convenience into a necessity. IoT has been used extensively through cloud computing and has proven an excellent technology to deploy in various fields. The data generated by IoT devices is transmitted to the cloud for processing and storage. However, this approach raises specific issues such as latency, energy consumption, availability of computation resources, bandwidth, heterogeneity, storage, and network failure. To overcome these obstacles, fog computing is used as a middle tier: it gathers and processes the generated data closer to the user end before transmitting it to the cloud. This paper aims to conduct a structured review of the current state of fog computing and of its architectures deployed across multiple industries. It also focuses on the implementation of fog computing in IoT-cloud architectures and the critical parameters for introducing it. A detailed comparative analysis is carried out for five different architectures, considering various crucial parameters, to identify how the quality of service and quality of experience for end users can be optimized. Finally, the paper examines the multiple challenges that fog computing faces in a structured six-level approach; these challenges also point the way for future research in resource management, green computing, and security.
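The middle-tier role described above can be illustrated with a minimal sketch (the aggregation scheme and function names are illustrative assumptions, not taken from the paper): a fog node reduces each window of raw sensor readings to a single summary before anything is sent upstream, so the cloud receives far less traffic.

```python
from statistics import mean

def fog_aggregate(readings, window=5):
    """Aggregate raw IoT readings at the fog tier: each window of
    `window` samples is reduced to one mean value before being
    forwarded to the cloud, cutting upstream bandwidth by ~1/window."""
    summaries = []
    for i in range(0, len(readings), window):
        batch = readings[i:i + window]
        summaries.append(round(mean(batch), 2))
    return summaries

# 20 raw temperature samples become 4 cloud-bound summaries
raw = [21.0, 21.2, 20.9, 21.1, 21.3] * 4
print(fog_aggregate(raw))  # [21.1, 21.1, 21.1, 21.1]
```

Real fog nodes would also filter, cache, and react locally to anomalies, but the bandwidth-saving pattern is the same.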

    Box2Poly: Memory-Efficient Polygon Prediction of Arbitrarily Shaped and Rotated Text

    Full text link
    Recently, Transformer-based text detection techniques have sought to predict polygons by encoding the coordinates of individual boundary vertices using distinct query features. However, this approach incurs a significant memory overhead and struggles to effectively capture the intricate relationships between vertices belonging to the same instance. Consequently, irregular text layouts often lead to the prediction of outlined vertices, diminishing the quality of results. To address these challenges, we present an innovative approach rooted in Sparse R-CNN: a cascade decoding pipeline for polygon prediction. Our method ensures precision by iteratively refining polygon predictions, considering both the scale and location of preceding results. Leveraging this stabilized regression pipeline, even a single feature vector used to guide polygon instance regression yields promising detection results. At the same time, leveraging instance-level feature proposals substantially enhances memory efficiency (>50% less vs. the state-of-the-art method DPText-DETR) and reduces inference time (>40% less vs. DPText-DETR) with a minor performance drop on benchmarks.

    A Fuzzy Logic based Privacy Preservation Clustering method for achieving K- Anonymity using EMD in dLink Model

    Get PDF
    Privacy preservation is a data-mining technique applied to databases without violating the privacy of individuals. A sensitive attribute is selected from the numerical data and modified by a data-modification technique; the modified data can then be released to any agency. If data-mining techniques such as clustering or classification are applied to the modified data for analysis, the results are not affected by the modification. In the proposed privacy-preservation technique, the sensitive data is transformed using an S-shaped fuzzy membership function, and k-means clustering is applied to both the original and the modified data to obtain clusters. t-closeness requires that the distribution of a sensitive attribute in any equivalence class be close to the distribution of that attribute in the overall table: the Earth Mover's Distance (EMD) between the two distributions should be no more than a threshold t. Hence privacy is preserved while the accuracy of the data is maintained.
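The t-closeness check described above can be sketched for the one-dimensional ordered case (a minimal illustration, assuming both distributions are given over the same ordered domain of m values; the normalisation by m-1 follows the standard ordered-distance EMD used in t-closeness):

```python
def emd_ordered(p, q):
    """Ordered-distance Earth Mover's Distance between two discrete
    distributions over the same ordered domain of m values:
    EMD = (1 / (m - 1)) * sum over i of |cumulative difference up to i|."""
    assert len(p) == len(q) and len(p) > 1
    total, cum = 0.0, 0.0
    for pi, qi in zip(p, q):
        cum += pi - qi
        total += abs(cum)
    return total / (len(p) - 1)

def satisfies_t_closeness(class_dist, table_dist, t):
    """An equivalence class satisfies t-closeness when the EMD between
    its sensitive-attribute distribution and the overall table's
    distribution is no more than the threshold t."""
    return emd_ordered(class_dist, table_dist) <= t

# A class holding only the lowest sensitive value vs. a uniform table
print(emd_ordered([1.0, 0.0], [0.5, 0.5]))  # 0.5
```

For nominal (unordered) attributes, t-closeness instead uses the equal-distance ground metric, which reduces EMD to half the total variation distance.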

    Backbones-Review: Feature Extraction Networks for Deep Learning and Deep Reinforcement Learning Approaches

    Full text link
    To understand the real world from various types of data, Artificial Intelligence (AI) is today the most widely used approach, and finding patterns in the analysed data is its main task. This is performed by a feature-extraction step, traditionally carried out with statistical algorithms or specific hand-crafted filters. However, selecting useful features from large-scale data remained a crucial challenge. With the development of convolutional neural networks (CNNs), feature extraction has become more automatic and easier: CNNs scale to large data and cover different scenarios for a specific task. In computer vision, convolutional networks are used to extract features and also serve as components of other parts of a deep learning (DL) model. Selecting a suitable network for feature extraction, or for the other parts of a DL model, is not arbitrary: the choice depends on the target task as well as on computational complexity. Many networks have been proposed and have become standard components of DL models across AI tasks. Such a network, used for feature extraction at the start of a DL model, is called a backbone: a known network pre-trained on other tasks that has demonstrated its effectiveness. In this paper, an overview of the existing backbones, e.g. VGGs, ResNets, DenseNet, etc., is given with a detailed description. A couple of computer vision tasks are also discussed by reviewing the backbones used for each, and a comparison in terms of performance, based on the backbone used for each task, is provided.
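The backbone/head split described above can be caricatured in a few lines (a toy sketch; these functions are illustrative stand-ins for real networks such as a ResNet backbone feeding a classification head, and none of the names come from the paper):

```python
# The backbone maps raw input to a feature vector; a task-specific
# head consumes those features. Swapping the backbone changes the
# features, not the head.

def mean_pool_backbone(pixels):
    """Toy 'backbone': reduce an image (list of rows) to per-row means."""
    return [sum(row) / len(row) for row in pixels]

def max_pool_backbone(pixels):
    """Alternative toy backbone: per-row maxima."""
    return [max(row) for row in pixels]

def threshold_head(features, threshold=0.5):
    """Toy task head: classify from the pooled features."""
    score = sum(features) / len(features)
    return 1 if score > threshold else 0

def model(pixels, backbone=mean_pool_backbone):
    return threshold_head(backbone(pixels))

img = [[1.0, 0.0], [1.0, 0.0]]
print(model(img, mean_pool_backbone), model(img, max_pool_backbone))  # 0 1
```

The same interface discipline is what lets DL frameworks swap a VGG for a ResNet under an unchanged detection or segmentation head.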

    Visual Objectification in Films: Towards a New AI Task for Video Interpretation

    Full text link
    In film gender studies, the concept of the 'male gaze' refers to the way characters are portrayed on-screen as objects of desire rather than as subjects. In this article, we introduce a novel video-interpretation task to detect character objectification in films. The purpose is to reveal and quantify the use of the complex temporal patterns operated in cinema to produce the cognitive perception of objectification. We introduce the ObyGaze12 dataset, made of 1914 movie clips densely annotated by experts for objectification concepts identified in film studies and psychology. We evaluate recent vision models, show the feasibility of the task, and show where challenges remain with concept bottleneck models. Our new dataset and code are made available to the community.
    Comment: 12 pages, 3 figures, 2 tables

    Advances in AI-Generated Images and Videos.

    Get PDF
    In recent years, generative AI models and tools have seen significant growth, especially techniques for generating synthetic multimedia content such as images and videos. These methodologies open up a wide range of possibilities; however, they can also present several risks that should be taken into account. In this survey we describe in detail different techniques for generating synthetic multimedia content, and we also analyse the most recent techniques for detecting it. A key aspect in achieving these objectives is the availability of datasets, so we also describe the main datasets available in the state of the art. Finally, from our analysis we extract the main trends for the future, such as transparency and interpretability, the generation of multimodal multimedia content, the robustness of models, and the increased use of diffusion models. We find a roadmap of deep challenges, including temporal consistency, computation requirements, generalizability, ethical aspects, and constant adaptation.

    Deep Learning for activity recognition in real-time video streams

    Get PDF
    Integrated master's dissertation in Informatics Engineering. In an ever more connected world, smart cities are becoming increasingly present in our society. In these smart cities, use cases for innovations that will benefit their inhabitants are also growing, improving their quality of life. One of these areas is safety, where Machine Learning (ML) models show potential for analysing video streams in real time to determine whether they contain violence. These ML approaches belong to the field of Computer Vision, which is responsible for interpreting digital images and videos and extracting knowledge and understandable information from them for use in diverse contexts. Some of the available approaches to recognising actions in video streams are based on ML, such as Deep Learning (DL), which has grown in popularity in recent years as its massive potential was realised in several applications that could benefit from having a machine recognise diverse human actions. In this project, the creation of an ML model that can determine whether violence exists in a video stream is proposed. This model leverages technology used in state-of-the-art methods, such as video classifiers, but also audio classifiers, and Early/Late Fusion (EF/LF) schemes that allow merging different modalities, in this case audio and video. Conclusions are also drawn about the accuracy rates of the different types of classifiers, to determine whether any other type of classifier should have more prominence in the state of the art. This document begins with an introduction to the work, explaining its context, motivation, and objectives. The methodology used to conduct the research in this thesis is then clarified, followed by a review of the state of the art in ML-based approaches to Action Recognition and Violence Detection. The next chapter details the training method employed for the models considered the best candidates for detecting violence. The selected models are then scrutinised to better understand their architecture and why they are suited to detecting violence, and the results they achieved are explored to assess how well they performed. Lastly, the conclusions reached are stated, together with possibilities for extending this work further. The obtained results demonstrate the success and prevalence of video classifiers, and also show the efficacy of models that make use of some kind of fusion.
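The Late Fusion scheme mentioned in the abstract can be sketched minimally (the weights and threshold here are illustrative assumptions, not values from the thesis): each modality's classifier scores the clip independently, and the per-modality scores are combined only afterwards.

```python
def late_fusion(video_prob, audio_prob, w_video=0.6, w_audio=0.4):
    """Late fusion: combine independently produced per-modality
    violence probabilities with a weighted average."""
    return w_video * video_prob + w_audio * audio_prob

def is_violent(video_prob, audio_prob, threshold=0.5):
    """Flag a clip as violent when the fused score reaches the threshold."""
    return late_fusion(video_prob, audio_prob) >= threshold

print(is_violent(0.8, 0.3))  # video strongly flags violence -> True
```

Early fusion would instead concatenate the audio and video features before a single classifier; late fusion keeps the two classifiers separate, which makes it easy to swap either one.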