124 research outputs found
Multi-modal Machine Learning in Engineering Design: A Review and Future Directions
In the rapidly advancing field of multi-modal machine learning (MMML), the
convergence of multiple data modalities has the potential to reshape various
applications. This paper presents a comprehensive overview of the current
state, advancements, and challenges of MMML within the sphere of engineering
design. The review begins with a deep dive into five fundamental concepts of
MMML: multi-modal information representation, fusion, alignment, translation,
and co-learning. Following this, we explore the cutting-edge applications of
MMML, placing a particular emphasis on tasks pertinent to engineering design,
such as cross-modal synthesis, multi-modal prediction, and cross-modal
information retrieval. Through this comprehensive overview, we highlight the
inherent challenges in adopting MMML in engineering design, and proffer
potential directions for future research. To spur on the continued evolution of
MMML in engineering design, we advocate for concentrated efforts to construct
extensive multi-modal design datasets, develop effective data-driven MMML
techniques tailored to design applications, and enhance the scalability and
interpretability of MMML models. As the next generation of intelligent design tools, MMML models hold significant promise for reshaping how products are designed.
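To make the fusion concept above concrete, the following is a minimal, illustrative sketch (not taken from the paper) of late fusion of an image embedding and a text embedding for a design-property prediction task; the module names and dimensions are assumptions, written in PyTorch.

```python
import torch
import torch.nn as nn

class LateFusionPredictor(nn.Module):
    """Minimal late-fusion sketch: project per-modality embeddings,
    concatenate them, and predict a design property from the joint
    representation. Dimensions and names are illustrative assumptions."""
    def __init__(self, img_dim=512, txt_dim=768, hidden=256, n_outputs=1):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hidden)   # project image features
        self.txt_proj = nn.Linear(txt_dim, hidden)   # project text features
        self.head = nn.Sequential(
            nn.ReLU(),
            nn.Linear(2 * hidden, n_outputs),        # joint prediction head
        )

    def forward(self, img_feat, txt_feat):
        fused = torch.cat([self.img_proj(img_feat), self.txt_proj(txt_feat)], dim=-1)
        return self.head(fused)

# Usage with random stand-in features for a batch of 4 designs
model = LateFusionPredictor()
pred = model(torch.randn(4, 512), torch.randn(4, 768))
print(pred.shape)  # torch.Size([4, 1])
```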
Learning from Very Few Samples: A Survey
Few sample learning (FSL) is significant and challenging in the field of
machine learning. The capability of learning and generalizing from very few
samples successfully is a noticeable demarcation separating artificial
intelligence and human intelligence since humans can readily establish their
cognition to novelty from just a single or a handful of examples whereas
machine learning algorithms typically entail hundreds or thousands of
supervised samples to guarantee generalization ability. Despite a long
history dating back to the early 2000s and the widespread attention it has
received in recent years with the rise of deep learning, few surveys or reviews of
FSL have been available to date. In this context, we extensively review 300+ papers
of FSL spanning from the 2000s to 2019 and provide a timely and comprehensive
survey for FSL. In this survey, we review the evolution history as well as the
current progress on FSL, categorize FSL approaches into generative model
based and discriminative model based families, and place particular
emphasis on meta learning based FSL approaches. We also summarize
several recently emerging extensional topics of FSL and review the latest
advances on these topics. Furthermore, we highlight the important FSL
applications covering many research hotspots in computer vision, natural
language processing, audio and speech, reinforcement learning and robotics, data
analysis, etc. Finally, we conclude the survey with a discussion of promising
trends, in the hope of providing guidance and insights for follow-up research.
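To illustrate the meta-learning flavour of FSL that the survey emphasizes, here is a minimal, self-contained sketch (not from the survey) of prototype-based few-shot classification: each class prototype is the mean of its few support embeddings, and a query is assigned to the nearest prototype.

```python
import numpy as np

def prototypical_predict(support_x, support_y, query_x):
    """Nearest-prototype few-shot classification (illustrative sketch).

    support_x: (n_support, d) embeddings of the few labelled samples
    support_y: (n_support,) integer class labels
    query_x:   (n_query, d) embeddings to classify
    """
    classes = np.unique(support_y)
    # One prototype per class: the mean of that class's support embeddings
    prototypes = np.stack([support_x[support_y == c].mean(axis=0) for c in classes])
    # Euclidean distance from every query to every prototype
    dists = np.linalg.norm(query_x[:, None, :] - prototypes[None, :, :], axis=-1)
    return classes[dists.argmin(axis=1)]

# Toy 2-way 3-shot example with 2-D "embeddings"
sx = np.array([[0., 0.], [0.1, 0.], [0., 0.1], [1., 1.], [0.9, 1.], [1., 0.9]])
sy = np.array([0, 0, 0, 1, 1, 1])
print(prototypical_predict(sx, sy, np.array([[0.05, 0.05], [0.95, 0.95]])))  # [0 1]
```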
Semantics-Empowered Communication: A Tutorial-cum-Survey
With the rapid emergence of semantics-empowered communication (SemCom)
research, a wide range of aspects (e.g., theories, applications, metrics and
implementations) is attracting unprecedented and growing interest in both
academia and industry. In this work, we primarily aim
to provide a comprehensive survey on both the background and research taxonomy,
as well as a detailed technical tutorial. Specifically, we start by reviewing
the literature and answering the "what" and "why" questions in semantic
transmissions. Afterwards, we present the ecosystems of SemCom, including
history, theories, metrics, datasets and toolkits, on top of which the taxonomy
for research directions is presented. Furthermore, we propose to categorize the
critical enabling techniques by explicit and implicit reasoning-based methods,
and elaborate on how they evolve and contribute to modern content & channel
semantics-empowered communications. Besides reviewing and summarizing the
latest efforts in SemCom, we discuss the relations with other communication
levels (e.g., conventional communications) from a holistic and unified
viewpoint. Subsequently, in order to facilitate future developments and
industrial applications, we also highlight advanced practical techniques for
boosting semantic accuracy, robustness, and large-scale scalability, just to
mention a few. Finally, we discuss the technical challenges that shed light on
future research opportunities.
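As a toy illustration of what semantic transmission means (our own sketch, not from the survey): the transmitter sends a compact semantic vector instead of the raw message symbols, the channel corrupts it with noise, and the receiver recovers the intended meaning by nearest-neighbour matching against a shared knowledge base.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared "knowledge base": each known message meaning has a semantic vector
knowledge_base = {
    "open the valve":  rng.normal(size=16),
    "close the valve": rng.normal(size=16),
    "report pressure": rng.normal(size=16),
}
meanings = list(knowledge_base)
vectors = np.stack([knowledge_base[m] for m in meanings])

def transmit(meaning, snr_db=10.0):
    """Send the semantic vector of `meaning` through an AWGN channel (toy model)."""
    x = knowledge_base[meaning]
    noise_power = np.mean(x**2) / (10 ** (snr_db / 10))
    return x + rng.normal(scale=np.sqrt(noise_power), size=x.shape)

def receive(y):
    """Recover the most likely meaning by nearest neighbour in semantic space."""
    dists = np.linalg.norm(vectors - y, axis=1)
    return meanings[int(dists.argmin())]

print(receive(transmit("close the valve")))  # typically recovers "close the valve"
```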
Artificial Intelligence for Multimedia Signal Processing
Artificial intelligence technologies are being actively applied to broadcasting and multimedia processing. A great deal of research has been conducted across a wide variety of fields, such as content creation, transmission, and security, and over the past two to three years these efforts have aimed at improving the compression efficiency of image, video, speech, and other data in areas related to MPEG media processing technology. Additionally, technologies such as media creation, processing, editing, and scenario generation are very important areas of research in multimedia processing and engineering. This book contains a collection of topics spanning advanced computational intelligence algorithms and technologies for emerging multimedia signal processing, including computer vision, speech/sound/text processing, and content analysis/information mining.
False textual information detection, a deep learning approach
Many approaches exist for fact checking in fake news identification, which is the focus of this thesis. Current approaches still perform poorly at scale due to a lack of authority, insufficient evidence, or, in certain cases, reliance on a single piece of evidence.
To address the lack of evidence and the inability of models to generalise across domains, we propose a style-aware model for detecting false information and improving existing performance. We discovered that our model was effective at detecting false information when we evaluated its generalisation ability using news articles and Twitter corpora.
We then propose to improve fact checking performance by incorporating warrants. We developed a highly efficient prediction model based on the results and demonstrated that incorporating warrants is beneficial for fact checking. Due to a lack of external warrant data, we develop a novel model for generating warrants that aid in determining the credibility of a claim. The results indicate that when a pre-trained language model is combined with a multi-agent model, high-quality, diverse warrants are generated that contribute to task performance improvement.
To counter biased opinions and support rational judgments, we propose a model that can generate multiple perspectives on a claim. Experiments confirm that our Perspectives Generation model generates diverse perspectives with a higher degree of quality and diversity than any baseline model.
Additionally, we propose to improve the model's detection capability by generating an explainable alternative factual claim that assists the reader in identifying the subtle issues that result in factual errors. Our evaluation demonstrates that the generated alternative does indeed increase the veracity of the claim.
Finally, whereas current research has treated stance detection and fact checking as separate tasks, we propose a unified model that integrates both. Classification results demonstrate that our proposed model outperforms state-of-the-art methods.
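A minimal sketch of the unified-model idea closing this abstract (a shared encoder with separate stance and veracity heads): the architecture, dimensions, and label sets here are illustrative assumptions, not the thesis's actual model.

```python
import torch
import torch.nn as nn

class StanceFactChecker(nn.Module):
    """Illustrative multi-task sketch: a shared claim-evidence encoder feeds
    two heads, one for stance detection and one for claim veracity."""
    def __init__(self, d_model=256, n_stances=4, n_verdicts=3):
        super().__init__()
        self.encoder = nn.GRU(input_size=d_model, hidden_size=d_model, batch_first=True)
        self.stance_head = nn.Linear(d_model, n_stances)    # e.g. agree/disagree/discuss/unrelated
        self.verdict_head = nn.Linear(d_model, n_verdicts)  # e.g. true/false/not enough info

    def forward(self, claim_evidence_embeddings):
        _, h = self.encoder(claim_evidence_embeddings)      # final hidden state
        shared = h.squeeze(0)
        return self.stance_head(shared), self.verdict_head(shared)

# Batch of 8 claim-evidence sequences of 20 token embeddings each
model = StanceFactChecker()
stance_logits, verdict_logits = model(torch.randn(8, 20, 256))
print(stance_logits.shape, verdict_logits.shape)  # torch.Size([8, 4]) torch.Size([8, 3])
```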
A Review of Deep Learning Techniques for Speech Processing
The field of speech processing has undergone a transformative shift with the
advent of deep learning. The use of multiple processing layers has enabled the
creation of models capable of extracting intricate features from speech data.
This development has paved the way for unparalleled advancements in automatic
speech recognition, text-to-speech synthesis, and
emotion recognition, propelling the performance of these tasks to unprecedented
heights. The power of deep learning techniques has opened up new avenues for
research and innovation in the field of speech processing, with far-reaching
implications for a range of industries and applications. This review paper
provides a comprehensive overview of the key deep learning models and their
applications in speech-processing tasks. We begin by tracing the evolution of
speech processing research, from early approaches, such as MFCC and HMM, to
more recent advances in deep learning architectures, such as CNNs, RNNs,
transformers, conformers, and diffusion models. We categorize the approaches
and compare their strengths and weaknesses for solving speech-processing tasks.
Furthermore, we extensively cover various speech-processing tasks, datasets,
and benchmarks used in the literature and describe how different deep-learning
networks have been utilized to tackle these tasks. Additionally, we discuss the
challenges and future directions of deep learning in speech processing,
including the need for more parameter-efficient, interpretable models and the
potential of deep learning for multimodal speech processing. By examining the
field's evolution, comparing and contrasting different approaches, and
highlighting future directions and challenges, we hope to inspire further
research in this exciting and rapidly advancing field.
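To ground the contrast the review draws between early front ends and end-to-end deep models, here is a minimal sketch of classic MFCC feature extraction using librosa; the synthetic sine wave merely stands in for real speech audio, and the parameter choices are assumptions.

```python
import numpy as np
import librosa

# Synthetic 1-second 440 Hz tone as a stand-in for real speech audio
sr = 16000
t = np.linspace(0, 1.0, sr, endpoint=False)
y = 0.5 * np.sin(2 * np.pi * 440 * t)

# 13 MFCCs per frame, the classic front end of HMM/GMM-era ASR systems
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
print(mfcc.shape)  # (13, n_frames), e.g. (13, 32) with the default hop length
```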
OmniForce: On Human-Centered, Large Model Empowered and Cloud-Edge Collaborative AutoML System
Automated machine learning (AutoML) seeks to build ML models with minimal
human effort. While considerable research has been conducted in the area of
AutoML in general, aiming to take humans out of the loop when building
artificial intelligence (AI) applications, scant literature has focused on how
AutoML works well in open-environment scenarios such as the process of training
and updating large models, industrial supply chains or the industrial
metaverse, where people often face open-loop problems during the search
process: they must continuously collect data, update data and models, satisfy
the requirements of the development and deployment environment, support massive
devices, modify evaluation metrics, etc. Addressing the open-environment issue
with pure data-driven approaches requires considerable data, computing
resources, and effort from dedicated data engineers, making current AutoML
systems and platforms inefficient and computationally intractable.
Human-computer interaction is a practical and feasible way to tackle the
problem of open-environment AI. In this paper, we introduce OmniForce, a
human-centered AutoML (HAML) system that yields both human-assisted ML and
ML-assisted human techniques, to put an AutoML system into practice and build
adaptive AI in open-environment scenarios. Specifically, we present OmniForce
in terms of ML version management; pipeline-driven development and deployment
collaborations; a flexible search strategy framework; and widely provisioned
and crowdsourced application algorithms, including large models. Furthermore,
the (large) models constructed by OmniForce can be automatically turned into
remote services in a few minutes; this process is dubbed model as a service
(MaaS). Experimental results obtained in multiple search spaces and real-world
use cases demonstrate the efficacy and efficiency of OmniForce.
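As a generic illustration of the model-as-a-service idea (this is not OmniForce's actual MaaS mechanism; the file name, endpoint, and model are hypothetical), a trained model can be wrapped as a small HTTP prediction service, for example with FastAPI.

```python
# Generic sketch of exposing a trained model as a web service.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PredictRequest(BaseModel):
    features: list[float]   # input feature vector for the deployed model

def dummy_model(features: list[float]) -> float:
    """Stand-in for a model produced by an AutoML search."""
    return sum(features) / max(len(features), 1)

@app.post("/predict")
def predict(req: PredictRequest):
    return {"prediction": dummy_model(req.features)}

# Run locally with:  uvicorn maas_sketch:app --port 8000  (hypothetical module name)
```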
Computational Approaches to Explainable Artificial Intelligence: Advances in Theory, Applications and Trends
Deep Learning (DL), a groundbreaking branch of Machine Learning (ML), has emerged as a driving force in both theoretical and applied Artificial Intelligence (AI). DL algorithms, rooted in complex and non-linear artificial neural systems, excel at extracting high-level features from data. DL has demonstrated human-level performance in real-world tasks, including clinical diagnostics, and has unlocked solutions to previously intractable problems in virtual agent design, robotics, genomics, neuroimaging, computer vision, and industrial automation. In this paper, the most relevant advances from the last few years in AI and several applications to neuroscience, neuroimaging, computer vision, and robotics are presented, reviewed and discussed. In this way, we summarize the state-of-the-art in AI methods, models and applications within a collection of works presented at the 9th International Conference on the Interplay between Natural and Artificial Computation (IWINAC). The works presented in this paper are excellent examples of new scientific discoveries made in laboratories that have successfully transitioned to real-life applications.
Extracting knowledge from complex unstructured corpora: Text classification and a case study on the safeguarding domain
Advances in internet, data collection and sharing technologies have led to an increase in the amount of unstructured information in the form of news, articles, and
social media. Additionally, many specialised domains such as the medical, law, and social science-related domains use unstructured documents as a main platform for collecting, storing and sharing domain-specific knowledge. However, the manual processing of these documents is a resource-consuming and error-prone process. This is especially apparent when the volume of the documents that need annotating constantly
increases over time. Therefore, automated information extraction techniques have been widely used to efficiently analyse text and discover patterns. Specifically, text classification methods have become valuable in specialised domains for organising content, such as patient notes, and for supporting fast topic-based retrieval of information. However,
many specialised domains suffer from a lack of data and class imbalance problems because documents are hard to obtain. In addition, manual annotation needs to be
performed by experts, which can be costly. This makes the application of supervised classification approaches a challenging task.
In this thesis, we research methods for improving the performance of text classifiers
for specialised domains with limited amounts of data and highly domain-specific terminology, where the annotation of documents is performed by domain experts. First,
we study the applicability of traditional feature enhancement approaches that use publicly available resources to improve classifier performance in specialised domains.
Then, we conduct extensive research into the suitability of existing classification algorithms and the importance of both domain- and task-specific data for few-shot classification,
which helps identify classification strategies applicable to small datasets. This gives the
basis for the development of a methodology for improving a classifier’s performance
in few-shot settings using text generation-based data augmentation techniques. Specifically, we aim to improve the quality of generated data by using strategies for selecting
class-representative samples from the original dataset, which are then used to produce additional training instances. We perform extensive analysis, considering multiple strategies, datasets,
and few-shot text classification settings.
Our study uses a corpus of safeguarding reports as an exemplary case study of a specialised domain with a small volume of data. The safeguarding reports contain valuable information about learning experiences and reflections on tackling serious crimes
involving children and vulnerable adults. They carry great potential to improve multi-agency work and help develop better crime prevention strategies. However, the lack
of centralised access and the constant growth of the collection make manual analysis of the reports unfeasible. Therefore, we collaborated with the Crime and Security
Research Institute (CSRI) at Cardiff University on the creation of a Wales Safeguarding Repository (WSR), providing centralised access to the safeguarding reports
and means for automatic information extraction. The aim of the repository is to facilitate efficient searchability of the collection and thus help free up resources and
assist practitioners from health and social care agencies in making faster and more accurate decisions. In particular, we apply methods identified in the thesis, in order to
support automated annotation of the documents using a thematic framework, created
by subject-matter experts. Our close work with domain experts throughout the thesis
allowed us to incorporate experts' knowledge into classification and augmentation techniques, which proved beneficial for improving automated supervised methods for specialised domains.
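As an illustration of the sample-selection idea described in this thesis summary (a minimal sketch under assumed strategies, not the thesis's actual method), one simple way to pick class-representative seed documents for generation-based augmentation is to choose those closest to the TF-IDF centroid of their class.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def select_representatives(texts, labels, k=2):
    """Pick the k documents per class closest to that class's TF-IDF centroid.
    These can then seed a text-generation model to produce augmented samples."""
    X = TfidfVectorizer().fit_transform(texts)
    selected = {}
    for c in sorted(set(labels)):
        idx = np.flatnonzero(np.array(labels) == c)
        centroid = np.asarray(X[idx].mean(axis=0))
        sims = cosine_similarity(X[idx], centroid).ravel()
        selected[c] = [idx[i] for i in sims.argsort()[::-1][:k]]
    return selected

# Toy example with invented texts and labels (illustrative only)
texts = ["neglect of a child reported", "child left unsupervised at home",
         "financial abuse of an elderly person", "misuse of an adult's pension funds"]
labels = ["child", "child", "adult", "adult"]
print(select_representatives(texts, labels, k=1))
```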
- …