31 research outputs found
Collaborative Chinese Text Recognition with Personalized Federated Learning
In Chinese text recognition, to compensate for the insufficient local data
and improve the performance of local few-shot character recognition, it is
often necessary for one organization to collect a large amount of data from
similar organizations. However, due to the natural presence of private
information in text data, such as addresses and phone numbers, different
organizations are unwilling to share private data. Therefore, it becomes
increasingly important to design a privacy-preserving collaborative training
framework for the Chinese text recognition task. In this paper, we introduce
personalized federated learning (pFL) into the Chinese text recognition task
and propose the pFedCR algorithm, which significantly improves the model
performance of each client (organization) without sharing private data.
Specifically, pFedCR comprises two stages: a global model training stage run
over multiple rounds and a local personalization stage. During stage 1, an
attention mechanism is incorporated into the CRNN model to adapt to various
client data distributions. Leveraging inherent character data characteristics,
a balanced dataset is created on the server to mitigate character imbalance. In
the personalization phase, the global model is fine-tuned for one epoch to
create a local model. Parameter averaging between local and global models
combines personalized and global feature extraction capabilities. Finally, we
fine-tune only the attention layers to enhance its focus on local personalized
features. The experimental results on three real-world industrial scenario
datasets show that the pFedCR algorithm can improve the performance of local
personalized models by about 20% while also improving their generalization
performance on other client data domains. Compared to other state-of-the-art
personalized federated learning methods, pFedCR improves performance by 6% to
8%.
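The parameter-averaging step described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `average_params`, the dict-of-lists parameter format, the layer names, and the equal mixing weight `alpha = 0.5` are all assumptions made for illustration, since the abstract only says that local and global parameters are averaged.

```python
from typing import Dict, List

# Hypothetical parameter format: layer name -> flat list of weights.
Params = Dict[str, List[float]]


def average_params(local: Params, global_: Params, alpha: float = 0.5) -> Params:
    """Element-wise mix: alpha * local + (1 - alpha) * global.

    alpha = 0.5 (a plain average) is an assumption; the abstract does not
    state the mixing weight.
    """
    if local.keys() != global_.keys():
        raise ValueError("models must share the same parameter names")
    mixed: Params = {}
    for name in local:
        if len(local[name]) != len(global_[name]):
            raise ValueError(f"size mismatch for parameter {name!r}")
        mixed[name] = [
            alpha * l + (1 - alpha) * g
            for l, g in zip(local[name], global_[name])
        ]
    return mixed


# Toy models with made-up layer names and values.
local_model = {"conv.weight": [1.0, 3.0], "attn.weight": [2.0, 4.0]}
global_model = {"conv.weight": [3.0, 1.0], "attn.weight": [0.0, 0.0]}
personalized = average_params(local_model, global_model)
```

In a real framework the same interpolation would be applied tensor by tensor over the model's parameter dictionary; the final attention-only fine-tuning mentioned in the abstract is not shown here.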
Orientation-Independent Chinese Text Recognition in Scene Images
Scene text recognition (STR) has attracted much attention due to its broad
applications. Previous works pay more attention to dealing with the
recognition of Latin text images with complex backgrounds by introducing
language models or other auxiliary networks. Different from Latin texts, many
vertical Chinese texts exist in natural scenes, which brings difficulties to
current state-of-the-art STR methods. In this paper, we make the first attempt
to extract orientation-independent visual features by disentangling content and
orientation information of text images, thus recognizing both horizontal and
vertical texts robustly in natural scenes. Specifically, we introduce a
Character Image Reconstruction Network (CIRN) to recover corresponding printed
character images with disentangled content and orientation information. We
conduct experiments on a scene dataset for benchmarking Chinese text
recognition, and the results demonstrate that the proposed method can indeed
improve performance through disentangling content and orientation information.
To further validate the effectiveness of our method, we additionally collect a
Vertical Chinese Text Recognition (VCTR) dataset. The experimental results show
that the proposed method achieves a 45.63% improvement on VCTR when introducing
CIRN to the baseline model. Comment: IJCAI 202
Leveraging Stack4Things for Federated Learning in Intelligent Cyber Physical Systems
During the last decade, the Internet of Things has acted as a catalyst for the big data phenomenon. As a result, modern edge devices can access a huge amount of data that can be exploited to build useful services. In such a context, artificial intelligence plays a key role in developing intelligent systems (e.g., intelligent cyber physical systems) that create a bridge to the physical world. However, as time goes by, machine and deep learning applications are becoming more complex, requiring increasing amounts of data and training time, which makes centralized approaches unsuitable. Federated learning is an emerging paradigm that enables edge devices to cooperate in learning a shared model while keeping their training data private, thereby reducing the training time. Although federated learning is a promising technique, its implementation is difficult and poses many challenges. In this paper, we present an extension of Stack4Things, a cloud platform developed in our department; leveraging its functionalities, we enable the deployment of federated learning on edge devices regardless of their heterogeneity. Experimental results compare the proposed approach with a centralized one and demonstrate its effectiveness in terms of both training time and model accuracy.
A Survey on Deep Multi-modal Learning for Body Language Recognition and Generation
Body language (BL) refers to the non-verbal communication expressed through
physical movements, gestures, facial expressions, and postures. It is a form of
communication that conveys information, emotions, attitudes, and intentions
without the use of spoken or written words. It plays a crucial role in
interpersonal interactions and can complement or even override verbal
communication. Deep multi-modal learning techniques have shown promise in
understanding and analyzing these diverse aspects of BL. The survey emphasizes
their applications to BL generation and recognition. Several common BLs are
considered, i.e., Sign Language (SL), Cued Speech (CS), Co-speech (CoS), and
Talking Head (TH), and we analyze and establish the
connections among these four BLs for the first time. Their generation and
recognition often involve multi-modal approaches. Benchmark datasets for BL
research are well collected and organized, along with the evaluation of SOTA
methods on these datasets. The survey highlights challenges such as limited
labeled data, multi-modal learning, and the need for domain adaptation to
generalize models to unseen speakers or languages. Future research directions
are presented, including exploring self-supervised learning techniques,
integrating contextual information from other modalities, and exploiting
large-scale pre-trained multi-modal models. In summary, this survey paper
provides a comprehensive understanding of deep multi-modal learning for various
BL generation and recognition tasks for the first time. By analyzing advancements,
challenges, and future directions, it serves as a valuable resource for
researchers and practitioners in advancing this field. In addition, we maintain
a continuously updated paper list for deep multi-modal learning for BL
recognition and generation: https://github.com/wentaoL86/awesome-body-language
Beyond Transmitting Bits: Context, Semantics, and Task-Oriented Communications
Communication systems to date primarily aim at reliably communicating bit
sequences. Such an approach provides efficient engineering designs that are
agnostic to the meanings of the messages or to the goal that the message
exchange aims to achieve. Next generation systems, however, can be potentially
enriched by folding message semantics and goals of communication into their
design. Further, these systems can be made cognizant of the context in which
communication exchange takes place, providing avenues for novel design
insights. This tutorial summarizes the efforts to date, starting from its early
adaptations, semantic-aware and task-oriented communications, covering the
foundations, algorithms and potential implementations. The focus is on
approaches that utilize information theory to provide the foundations, as well
as the significant role of learning in semantics and task-aware communications. Comment: 28 pages, 14 figures
Federated learning for edge computing: A survey
New technologies bring opportunities to deploy AI and machine learning to the edge of the network, allowing edge devices to train simple models that can then be deployed in practice. Federated learning (FL) is a distributed machine learning technique to create a global model by learning from multiple decentralized edge clients. Although FL methods offer several advantages, including scalability and data privacy, they also introduce some risks and drawbacks in terms of computational complexity in the case of heterogeneous devices. Internet of Things (IoT) devices may have limited computing resources, poorer connection quality, or may use different operating systems. This paper provides an overview of the methods used in FL with a focus on edge devices with limited computational resources. This paper also presents FL frameworks that are currently popular and that provide communication between clients and servers. In this context, various topics are described, which include contributions and trends in the literature. This includes basic models and designs of system architecture, possibilities of application in practice, privacy and security, and resource management. Challenges related to the computational requirements of edge devices such as hardware heterogeneity, communication overload or limited resources of devices are discussed. Web of Science, 1218, art. no. 912
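As a rough illustration of how a server might build a global model from decentralized edge clients, the sketch below uses sample-weighted averaging in the style of the widely used FedAvg scheme. The abstract does not name a specific aggregation rule, so the choice of FedAvg, the function name `federated_average`, the flat-list weight format, and the toy client data are all assumptions.

```python
from typing import List, Sequence, Tuple

# Hypothetical client update: (flat list of model weights, number of local samples).
ClientUpdate = Tuple[List[float], int]


def federated_average(updates: Sequence[ClientUpdate]) -> List[float]:
    """Average client weights, weighting each client by its local sample count."""
    total_samples = sum(n for _, n in updates)
    if total_samples <= 0:
        raise ValueError("need at least one client with a positive sample count")
    dim = len(updates[0][0])
    aggregated = [0.0] * dim
    for weights, n in updates:
        if len(weights) != dim:
            raise ValueError("all clients must send weight vectors of equal length")
        for i, w in enumerate(weights):
            aggregated[i] += w * n / total_samples
    return aggregated


# Two toy edge clients; the larger dataset has proportionally more influence.
client_updates = [([1.0, 2.0], 100), ([3.0, 4.0], 300)]
global_weights = federated_average(client_updates)
```

Only weights and sample counts reach the server in this sketch, and the raw training data stays on each device. Handling of stragglers, limited bandwidth, and heterogeneous hardware, which the survey discusses, is left out.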