The Role of Multiple Articulatory Channels of Sign-Supported Speech Revealed by Visual Processing
Purpose
The use of sign-supported speech (SSS) in the education of deaf students has been recently discussed in relation to its usefulness with deaf children using cochlear implants. To clarify the benefits of SSS for comprehension, 2 eye-tracking experiments aimed to detect the extent to which signs are actively processed in this mode of communication.
Method
Participants were 36 deaf adolescents, including cochlear implant users and native deaf signers. Experiment 1 attempted to shift observers' foveal attention to the linguistic source in SSS from which most information is extracted, lip movements or signs, by magnifying the face area, thus modifying the perceptual accessibility of lip movements (magnified condition), and by constraining the visual field to either the face or the sign through a moving-window paradigm (gaze-contingent condition). Experiment 2 aimed to explore the reliance on signs in SSS by occasionally producing a mismatch between sign and speech. Participants were required to concentrate on the orally transmitted message.
Results
In Experiment 1, analyses revealed a greater number of fixations toward the signs and a reduction in accuracy in the gaze-contingent condition across all participants. Fixations toward signs also increased in the magnified condition. In Experiment 2, results indicated lower accuracy in the mismatching condition across all participants, and participants looked more at the sign when it was inconsistent with speech.
Conclusions
All participants, even those with residual hearing, rely on signs when attending to SSS, either peripherally or through overt attention, depending on the perceptual conditions.
Funding: Unión Europea, Grant Agreement 31674
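The gaze-contingent moving-window manipulation described in the Method can be pictured with a minimal sketch: everything outside a window centred on the current gaze sample is blanked, so only the fixated source (face or sign) remains visible. The circular window shape, radius, and function name below are illustrative assumptions, not the authors' actual stimulus software.

```python
import numpy as np

def gaze_contingent_frame(frame, gaze_xy, radius=80):
    """Blank everything outside a circular window centred on the gaze point.

    frame:   H x W x 3 uint8 video frame of the SSS recording
    gaze_xy: (x, y) gaze coordinates in pixels, from the eye tracker
    radius:  visible-window radius in pixels (a free parameter here)
    """
    h, w = frame.shape[:2]
    ys, xs = np.ogrid[:h, :w]
    gx, gy = gaze_xy
    inside = (xs - gx) ** 2 + (ys - gy) ** 2 <= radius ** 2
    masked = np.zeros_like(frame)   # everything outside the window is blanked
    masked[inside] = frame[inside]
    return masked
```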
Selective Scene Text Removal
Scene text removal (STR) is the image transformation task of removing text regions from scene images. Conventional STR methods remove all scene text, which means the existing methods cannot select which text to remove. In this paper, we propose a novel task setting named selective scene text removal (SSTR) that removes only target words specified by the user. Although SSTR is a more complex task than STR, the proposed multi-module structure enables efficient training for SSTR. Experimental results show that the proposed method can remove target words as expected.
Comment: 12 pages, 8 figures. Accepted at the 34th British Machine Vision Conference.
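A minimal sketch of the selective-removal idea: spot words, keep only those matching the user's target list, and reconstruct the background in their place. The detect_words callable and the classical inpainting step are stand-ins for illustration; the paper's actual method is a trained multi-module network, not this post-hoc pipeline.

```python
import cv2
import numpy as np

def selective_text_removal(image, target_words, detect_words):
    """Remove only user-specified words from a scene image.

    detect_words(image) is a stand-in for any word spotter and is assumed to
    return (word_string, binary_mask) pairs, where binary_mask is a uint8
    H x W array marking that word's pixels.
    """
    removal_mask = np.zeros(image.shape[:2], dtype=np.uint8)
    targets = {w.lower() for w in target_words}
    for word, word_mask in detect_words(image):
        if word.lower() in targets:          # non-target text is left intact
            removal_mask |= word_mask
    # Classical inpainting as a simple background-reconstruction stand-in
    return cv2.inpaint(image, removal_mask, 5, cv2.INPAINT_TELEA)
```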
From North to South: African Librarianship in the new millennium
ANNUAL PUBLIC LECTURE ON AFRICAN LIBRARIANSHIP IN THE 21ST CENTURY, HOSTED BY UNISA LIBRARY IN PARTNERSHIP WITH THE IFLA REGIONAL OFFICE FOR AFRICA
A Review of Fog Computing Concept, Architecture, Application, Parameters and Challenges
The Internet of Things (IoT) has become an integral part of our daily lives, growing exponentially from a convenience into a necessity. IoT has been utilized extensively through cloud computing and has proven to be an excellent technology for deployment in various fields. The data generated by IoT devices is transmitted to the cloud for processing and storage. However, this approach raises specific issues such as latency, energy consumption, availability of computation resources, bandwidth, heterogeneity, storage, and network failure. To overcome these obstacles, fog computing is utilized as a middle tier: it gathers and processes the generated data closer to the user before transmitting it to the cloud. This paper conducts a structured review of the current state of fog computing and its architectures deployed across multiple industries. It also focuses on the implementation of, and the critical parameters for, introducing fog computing into an IoT-cloud architecture. A detailed comparative analysis is carried out for five different architectures, considering various crucial parameters, to identify how the quality of service and quality of experience for end users can be optimized. Finally, the paper examines the multiple challenges that fog computing faces in a structured six-level approach; these challenges also point the way for future research in resource management, green computing, and security.
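As a rough illustration of the middle tier the review describes, the sketch below buffers raw IoT readings at a fog node and forwards only compact summaries to the cloud, trading raw-data fidelity for lower latency and bandwidth. The send_to_cloud uplink and the window size are placeholders, not a real API or a recommended configuration.

```python
import statistics
import time

def send_to_cloud(summary):
    """Stand-in for a real cloud uplink (e.g. an HTTPS or MQTT publish)."""
    print("uplink:", summary)

class FogNode:
    """Toy fog-tier node: buffers raw IoT readings near the edge and forwards
    only compact summaries to the cloud, reducing bandwidth and latency."""

    def __init__(self, window=10):
        self.window = window   # readings aggregated per uplink message
        self.buffer = []

    def ingest(self, reading):
        """Called for every raw sensor reading arriving at the fog node."""
        self.buffer.append(reading)
        if len(self.buffer) >= self.window:
            self.flush()

    def flush(self):
        """Summarise the buffered readings and push only the summary upstream."""
        summary = {
            "t": time.time(),
            "n": len(self.buffer),
            "mean": statistics.fmean(self.buffer),
            "max": max(self.buffer),
        }
        self.buffer.clear()
        send_to_cloud(summary)
```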
Box2Poly: Memory-Efficient Polygon Prediction of Arbitrarily Shaped and Rotated Text
Recently, Transformer-based text detection techniques have sought to predict polygons by encoding the coordinates of individual boundary vertices using distinct query features. However, this approach incurs a significant memory overhead and struggles to effectively capture the intricate relationships between vertices belonging to the same instance. Consequently, irregular text layouts often lead to the prediction of outlined vertices, diminishing the quality of results. To address these challenges, we present an innovative approach rooted in Sparse R-CNN: a cascade decoding pipeline for polygon prediction. Our method ensures precision by iteratively refining polygon predictions, considering both the scale and location of preceding results. Leveraging this stabilized regression pipeline, even a single feature vector guiding polygon instance regression yields promising detection results. Simultaneously, leveraging instance-level feature proposals substantially enhances memory efficiency (>50% less vs. the state-of-the-art method DPText-DETR) and reduces inference time (>40% less vs. DPText-DETR) with a minor performance drop on benchmarks.
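The cascade decoding idea can be caricatured in a few lines: each stage regresses bounded, per-vertex offsets that are rescaled by the extent of the previous polygon, so refinement stays relative to the scale and location of the preceding result. The head shapes and the tanh step size are assumptions for illustration, not the paper's exact parameterisation.

```python
import torch

def cascade_polygon_decode(feat, init_poly, heads):
    """Sketch of cascade polygon refinement in the spirit of Box2Poly.

    feat:      (D,) instance-level feature vector guiding the regression
    init_poly: (V, 2) initial vertices, e.g. sampled from a box proposal
    heads:     one nn.Linear(D, V * 2) refinement head per decoding stage
    """
    poly = init_poly
    for head in heads:
        offsets = head(feat).view(-1, 2)                 # per-vertex deltas
        extent = poly.max(dim=0).values - poly.min(dim=0).values
        # Bounded step, rescaled by the current polygon's size, so each stage
        # refines relative to the scale and location of the previous result.
        poly = poly + torch.tanh(offsets) * extent / 2
    return poly
```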
A Fuzzy Logic-based Privacy Preservation Clustering method for achieving K-Anonymity using EMD in dLink Model
Privacy preservation is a data mining technique applied to databases without violating the privacy of individuals. A sensitive attribute is selected from the numerical data and modified by a data modification technique; the modified data can then be released to any agency. When data mining techniques such as clustering or classification are applied for analysis, the modification does not affect the result. In this privacy preservation technique, the sensitive data are converted into modified data using an S-shaped fuzzy membership function. K-means clustering is applied to both the original and the modified data to obtain the clusters. t-closeness requires that the distribution of a sensitive attribute in any equivalence class be close to the distribution of that attribute in the overall table; Earth Mover's Distance (EMD) is used to verify that the distance between the two distributions is no more than a threshold t. Hence privacy is preserved and the accuracy of the data is maintained.
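Two of the ingredients above are easy to make concrete: the S-shaped fuzzy membership function used to modify sensitive values, and the EMD check behind t-closeness. This is a minimal sketch; the breakpoints a and b and the 1-D ordered-bin form of EMD are simplifying assumptions, not the paper's exact setup.

```python
import numpy as np

def s_membership(x, a, b):
    """Standard S-shaped fuzzy membership function, rising from 0 at a to 1 at b."""
    x = np.asarray(x, dtype=float)
    mid = (a + b) / 2.0
    return np.where(x <= a, 0.0,
           np.where(x >= b, 1.0,
           np.where(x <= mid,
                    2 * ((x - a) / (b - a)) ** 2,
                    1 - 2 * ((x - b) / (b - a)) ** 2)))

def emd_1d(p, q):
    """Earth Mover's Distance between two histograms over ordered bins."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    return np.abs(np.cumsum(p - q)).sum()

def satisfies_t_closeness(class_hist, table_hist, t):
    """t-closeness holds when the equivalence-class distribution stays within
    EMD t of the sensitive attribute's distribution in the overall table."""
    return emd_1d(class_hist, table_hist) <= t
```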
Backbones-Review: Feature Extraction Networks for Deep Learning and Deep Reinforcement Learning Approaches
Artificial Intelligence (AI) is nowadays the most widely used technique for understanding the real world through various types of data, and finding patterns within the analyzed data is its main task. This is performed by a feature extraction step, carried out using statistical algorithms or specific filters. However, selecting useful features from large-scale data remains a crucial challenge. With the development of convolutional neural networks (CNNs), feature extraction has become more automatic and easier. CNNs can work on large-scale data and cover different scenarios for a specific task. In computer vision, convolutional networks are used to extract features that also feed the other parts of a deep learning (DL) model. Selecting a suitable network for feature extraction, or for the other parts of a DL model, is not arbitrary: the choice depends on the target task and on the computational complexity of the model. Many networks have been proposed and have become the standard choices for DL models across AI tasks. These networks, exploited for feature extraction at the beginning of a DL model, are called backbones. A backbone is a known network, previously trained on many other tasks, that has demonstrated its effectiveness. In this paper, an overview of the existing backbones, e.g. VGGs, ResNets, DenseNet, etc., is given with a detailed description. Several computer vision tasks are also discussed, with a review of the backbones used for each task. In addition, a performance comparison based on the backbone used for each task is provided.
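The backbone pattern the survey describes, reusing a network pretrained elsewhere as a frozen feature extractor with a small task head on top, might look like the following torchvision sketch; the head size, input batch, and freezing policy are illustrative choices, not the survey's recommendation.

```python
import torch
from torchvision.models import resnet50, ResNet50_Weights

# Reuse an ImageNet-pretrained network as a frozen backbone.
backbone = resnet50(weights=ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()       # drop the classifier, keep 2048-d features
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False             # freeze: only the head would be trained

head = torch.nn.Linear(2048, 10)        # small task-specific head (10 classes here)

x = torch.randn(4, 3, 224, 224)         # dummy image batch
with torch.no_grad():
    feats = backbone(x)                 # (4, 2048) backbone features
logits = head(feats)                    # task predictions from frozen features
```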
Visual Objectification in Films: Towards a New AI Task for Video Interpretation
In film gender studies, the concept of the 'male gaze' refers to the way characters are portrayed on screen as objects of desire rather than subjects. In this article, we introduce a novel video-interpretation task: detecting character objectification in films. The purpose is to reveal and quantify the use of complex temporal patterns in cinema that produce the cognitive perception of objectification. We introduce the ObyGaze12 dataset, made of 1914 movie clips densely annotated by experts for objectification concepts identified in film studies and psychology. We evaluate recent vision models, show the feasibility of the task, and show where the challenges remain with concept bottleneck models. Our new dataset and code are made available to the community.
Comment: 12 pages, 3 figures, 2 tables.
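A concept bottleneck model of the kind evaluated here forces the final decision through interpretable concept scores. The sketch below is a generic two-layer version; the feature dimension, the number of concepts, and the class count are assumptions for illustration, not the ObyGaze12 label set.

```python
import torch
import torch.nn as nn

class ConceptBottleneck(nn.Module):
    """Minimal concept-bottleneck classifier: the label decision is forced to
    pass through a small vector of interpretable concept scores."""

    def __init__(self, feat_dim=512, n_concepts=12, n_classes=2):
        super().__init__()
        self.to_concepts = nn.Linear(feat_dim, n_concepts)  # features -> concepts
        self.to_label = nn.Linear(n_concepts, n_classes)    # concepts -> label

    def forward(self, clip_features):
        concepts = torch.sigmoid(self.to_concepts(clip_features))
        logits = self.to_label(concepts)
        return logits, concepts   # concept scores are inspectable at test time
```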
Advances in AI-Generated Images and Videos
In recent years, generative AI models and tools have increased significantly, especially techniques for generating synthetic multimedia content such as images and videos. These methodologies offer a wide range of possibilities; however, they can also present several risks that should be taken into account. In this survey we describe in detail different techniques for generating synthetic multimedia content, and we also analyse the most recent techniques for their detection. A key aspect of achieving these objectives is the availability of datasets, so we also describe the main datasets available in the state of the art. Finally, from our analysis we extract the main trends for the future, such as transparency and interpretability, the generation of multimodal multimedia content, the robustness of models, and the increased use of diffusion models. We find a roadmap of deep challenges, including temporal consistency, computation requirements, generalizability, ethical aspects, and constant adaptation.
Deep Learning for activity recognition in real-time video streams
Master's dissertation in Informatics Engineering (Dissertação de mestrado integrado em Engenharia Informática).
In an ever more connected world, smart cities are becoming ever more present in our society. In these smart cities, use cases in which innovations will benefit their inhabitants are also growing, improving their quality of life. One of these areas is safety, in which Machine Learning (ML) models show potential for real-time video-stream analysis to determine whether violence is present.
These ML approaches belong to the field of Computer Vision, which is responsible for interpreting digital images and videos and extracting knowledge and understandable information from them for use in diverse contexts. Some of the available alternatives for recognising actions in video streams are based on ML approaches such as Deep Learning (DL), which has grown in popularity in recent years as its massive potential became clear in several applications that could benefit from a machine recognising diverse human actions.
In this project, the creation of an ML model that can determine whether violence exists in a video stream is proposed. This model leverages technology used in State of the Art methods, such as video classifiers, but also audio classifiers and Early/Late Fusion (EF/LF) schemes that allow merging different modalities, in the case of the present work audio and video. Conclusions are also drawn about the accuracy rates of the different types of classifiers, to determine whether any other type of classifier should have more prominence in the State of the Art.
This document begins with an introduction to the work conducted, explaining its context, motivation, and objectives. The methodology used to conduct the research in this Thesis efficiently is then clarified. Following that, the State of the Art concerning ML-based approaches to Action Recognition and Violence Detection is explored. The next chapter details the training method employed for the models considered the best candidates to detect violence. Subsequently, the selected models are scrutinised to better understand their architecture and why they are suited to detecting violence. The results achieved by these models are then explored to assess how well they performed. Lastly, the conclusions reached in this research are stated, and possibilities for expanding this work further are presented.
The obtained results demonstrate the success and prevalence of video classifiers, and also show the efficacy of models that make use of some kind of fusion.
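A late-fusion scheme of the kind compared in this thesis can be sketched as follows: independent video and audio classifiers score the same clip, and their class probabilities are combined by a weighted average. Both sub-models and the fusion weight are placeholders for the classifiers actually trained in this work.

```python
import torch
import torch.nn as nn

class LateFusionViolenceDetector(nn.Module):
    """Late fusion: independent video and audio classifiers score the same
    clip, and their class probabilities are merged by a weighted average."""

    def __init__(self, video_model, audio_model, w_video=0.5):
        super().__init__()
        self.video_model = video_model   # placeholder video classifier
        self.audio_model = audio_model   # placeholder audio classifier
        self.w_video = w_video           # fusion weight for the video branch

    def forward(self, video_clip, audio_clip):
        p_video = torch.softmax(self.video_model(video_clip), dim=-1)
        p_audio = torch.softmax(self.audio_model(audio_clip), dim=-1)
        return self.w_video * p_video + (1 - self.w_video) * p_audio
```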
