17,597 research outputs found
Security and Privacy Problems in Voice Assistant Applications: A Survey
Voice assistant applications have become omniscient nowadays. Two models that
provide the two most important functions for real-life applications (i.e.,
Google Home, Amazon Alexa, Siri, etc.) are Automatic Speech Recognition (ASR)
models and Speaker Identification (SI) models. According to recent studies,
security and privacy threats have also emerged with the rapid development of
the Internet of Things (IoT). The security issues researched include attack
techniques toward machine learning models and other hardware components widely
used in voice assistant applications. The privacy issues include technical-wise
information stealing and policy-wise privacy breaches. The voice assistant
application takes a steadily growing market share every year, but their privacy
and security issues never stopped causing huge economic losses and endangering
users' personal sensitive information. Thus, it is important to have a
comprehensive survey to outline the categorization of the current research
regarding the security and privacy problems of voice assistant applications.
This paper concludes and assesses five kinds of security attacks and three
types of privacy threats in the papers published in the top-tier conferences of
cyber security and voice domain.Comment: 5 figure
The Metaverse: Survey, Trends, Novel Pipeline Ecosystem & Future Directions
The Metaverse offers a second world beyond reality, where boundaries are
non-existent, and possibilities are endless through engagement and immersive
experiences using the virtual reality (VR) technology. Many disciplines can
benefit from the advancement of the Metaverse when accurately developed,
including the fields of technology, gaming, education, art, and culture.
Nevertheless, developing the Metaverse environment to its full potential is an
ambiguous task that needs proper guidance and directions. Existing surveys on
the Metaverse focus only on a specific aspect and discipline of the Metaverse
and lack a holistic view of the entire process. To this end, a more holistic,
multi-disciplinary, in-depth, and academic and industry-oriented review is
required to provide a thorough study of the Metaverse development pipeline. To
address these issues, we present in this survey a novel multi-layered pipeline
ecosystem composed of (1) the Metaverse computing, networking, communications
and hardware infrastructure, (2) environment digitization, and (3) user
interactions. For every layer, we discuss the components that detail the steps
of its development. Also, for each of these components, we examine the impact
of a set of enabling technologies and empowering domains (e.g., Artificial
Intelligence, Security & Privacy, Blockchain, Business, Ethics, and Social) on
its advancement. In addition, we explain the importance of these technologies
to support decentralization, interoperability, user experiences, interactions,
and monetization. Our presented study highlights the existing challenges for
each component, followed by research directions and potential solutions. To the
best of our knowledge, this survey is the most comprehensive and allows users,
scholars, and entrepreneurs to get an in-depth understanding of the Metaverse
ecosystem to find their opportunities and potentials for contribution
Multi-modal Facial Affective Analysis based on Masked Autoencoder
Human affective behavior analysis focuses on analyzing human expressions or
other behaviors to enhance the understanding of human psychology. The CVPR 2023
Competition on Affective Behavior Analysis in-the-wild (ABAW) is dedicated to
providing high-quality and large-scale Aff-wild2 for the recognition of
commonly used emotion representations, such as Action Units (AU), basic
expression categories(EXPR), and Valence-Arousal (VA). The competition is
committed to making significant strides in improving the accuracy and
practicality of affective analysis research in real-world scenarios. In this
paper, we introduce our submission to the CVPR 2023: ABAW5. Our approach
involves several key components. First, we utilize the visual information from
a Masked Autoencoder(MAE) model that has been pre-trained on a large-scale face
image dataset in a self-supervised manner. Next, we finetune the MAE encoder on
the image frames from the Aff-wild2 for AU, EXPR and VA tasks, which can be
regarded as a static and uni-modal training. Additionally, we leverage the
multi-modal and temporal information from the videos and implement a
transformer-based framework to fuse the multi-modal features. Our approach
achieves impressive results in the ABAW5 competition, with an average F1 score
of 55.49\% and 41.21\% in the AU and EXPR tracks, respectively, and an average
CCC of 0.6372 in the VA track. Our approach ranks first in the EXPR and AU
tracks, and second in the VA track. Extensive quantitative experiments and
ablation studies demonstrate the effectiveness of our proposed method
One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on ChatGPT in AIGC Era
OpenAI has recently released GPT-4 (a.k.a. ChatGPT plus), which is
demonstrated to be one small step for generative AI (GAI), but one giant leap
for artificial general intelligence (AGI). Since its official release in
November 2022, ChatGPT has quickly attracted numerous users with extensive
media coverage. Such unprecedented attention has also motivated numerous
researchers to investigate ChatGPT from various aspects. According to Google
scholar, there are more than 500 articles with ChatGPT in their titles or
mentioning it in their abstracts. Considering this, a review is urgently
needed, and our work fills this gap. Overall, this work is the first to survey
ChatGPT with a comprehensive review of its underlying technology, applications,
and challenges. Moreover, we present an outlook on how ChatGPT might evolve to
realize general-purpose AIGC (a.k.a. AI-generated content), which will be a
significant milestone for the development of AGI.Comment: A Survey on ChatGPT and GPT-4, 29 pages. Feedback is appreciated
([email protected]
Sign Language Translation from Instructional Videos
The advances in automatic sign language translation (SLT) to spoken languages
have been mostly benchmarked with datasets of limited size and restricted
domains. Our work advances the state of the art by providing the first baseline
results on How2Sign, a large and broad dataset.
We train a Transformer over I3D video features, using the reduced BLEU as a
reference metric for validation, instead of the widely used BLEU score. We
report a result of 8.03 on the BLEU score, and publish the first open-source
implementation of its kind to promote further advances.Comment: Paper accepted at WiCV @CVPR2
Audio-Visual Automatic Speech Recognition Towards Education for Disabilities
Education is a fundamental right that enriches everyone’s life. However, physically challenged people often debar from the general and advanced education system. Audio-Visual Automatic Speech Recognition (AV-ASR) based system is useful to improve the education of physically challenged people by providing hands-free computing. They can communicate to the learning system through AV-ASR. However, it is challenging to trace the lip correctly for visual modality. Thus, this paper addresses the appearance-based visual feature along with the co-occurrence statistical measure for visual speech recognition. Local Binary Pattern-Three Orthogonal Planes (LBP-TOP) and Grey-Level Co-occurrence Matrix (GLCM) is proposed for visual speech information. The experimental results show that the proposed system achieves 76.60 % accuracy for visual speech and 96.00 % accuracy for audio speech recognition
Corporate Social Responsibility: the institutionalization of ESG
Understanding the impact of Corporate Social Responsibility (CSR) on firm performance as it relates to industries reliant on technological innovation is a complex and perpetually evolving challenge. To thoroughly investigate this topic, this dissertation will adopt an economics-based structure to address three primary hypotheses. This structure allows for each hypothesis to essentially be a standalone empirical paper, unified by an overall analysis of the nature of impact that ESG has on firm performance. The first hypothesis explores the evolution of CSR to the modern quantified iteration of ESG has led to the institutionalization and standardization of the CSR concept. The second hypothesis fills gaps in existing literature testing the relationship between firm performance and ESG by finding that the relationship is significantly positive in long-term, strategic metrics (ROA and ROIC) and that there is no correlation in short-term metrics (ROE and ROS). Finally, the third hypothesis states that if a firm has a long-term strategic ESG plan, as proxied by the publication of CSR reports, then it is more resilience to damage from controversies. This is supported by the finding that pro-ESG firms consistently fared better than their counterparts in both financial and ESG performance, even in the event of a controversy. However, firms with consistent reporting are also held to a higher standard than their nonreporting peers, suggesting a higher risk and higher reward dynamic. These findings support the theory of good management, in that long-term strategic planning is both immediately economically beneficial and serves as a means of risk management and social impact mitigation. Overall, this contributes to the literature by fillings gaps in the nature of impact that ESG has on firm performance, particularly from a management perspective
Neural Architecture Search: Insights from 1000 Papers
In the past decade, advances in deep learning have resulted in breakthroughs
in a variety of areas, including computer vision, natural language
understanding, speech recognition, and reinforcement learning. Specialized,
high-performing neural architectures are crucial to the success of deep
learning in these areas. Neural architecture search (NAS), the process of
automating the design of neural architectures for a given task, is an
inevitable next step in automating machine learning and has already outpaced
the best human-designed architectures on many tasks. In the past few years,
research in NAS has been progressing rapidly, with over 1000 papers released
since 2020 (Deng and Lindauer, 2021). In this survey, we provide an organized
and comprehensive guide to neural architecture search. We give a taxonomy of
search spaces, algorithms, and speedup techniques, and we discuss resources
such as benchmarks, best practices, other surveys, and open-source libraries
A scoping review of natural language processing of radiology reports in breast cancer
Various natural language processing (NLP) algorithms have been applied in the literature to analyze radiology reports pertaining to the diagnosis and subsequent care of cancer patients. Applications of this technology include cohort selection for clinical trials, population of large-scale data registries, and quality improvement in radiology workflows including mammography screening. This scoping review is the first to examine such applications in the specific context of breast cancer. Out of 210 identified articles initially, 44 met our inclusion criteria for this review. Extracted data elements included both clinical and technical details of studies that developed or evaluated NLP algorithms applied to free-text radiology reports of breast cancer. Our review illustrates an emphasis on applications in diagnostic and screening processes over treatment or therapeutic applications and describes growth in deep learning and transfer learning approaches in recent years, although rule-based approaches continue to be useful. Furthermore, we observe increased efforts in code and software sharing but not with data sharing
Self-Consistent Learning: Cooperation between Generators and Discriminators
Using generated data to improve the performance of downstream discriminative
models has recently gained popularity due to the great development of
pre-trained language models. In most previous studies, generative models and
discriminative models are trained separately and thus could not adapt to any
changes in each other. As a result, the generated samples can easily deviate
from the real data distribution, while the improvement of the discriminative
model quickly reaches saturation. Generative adversarial networks (GANs) train
generative models via an adversarial process with discriminative models to
achieve joint training. However, the training of standard GANs is notoriously
unstable and often falls short of convergence. In this paper, to address these
issues, we propose a framework, in which a
discriminator and a generator are cooperatively trained in a closed-loop form.
The discriminator and the generator enhance each other during multiple rounds
of alternating training until a scoring consensus is reached. This framework
proves to be easy to train and free from instabilities such as mode collapse
and non-convergence. Extensive experiments on sentence semantic matching
demonstrate the effectiveness of the proposed framework: the discriminator
achieves 10+ AP of improvement on the zero-shot setting and new
state-of-the-art performance on the full-data setting
- …