Recommended from our members
Ensuring Access to Safe and Nutritious Food for All Through the Transformation of Food Systems
BotMoE: Twitter Bot Detection with Community-Aware Mixtures of Modal-Specific Experts
Twitter bot detection has become a crucial task in efforts to combat online
misinformation, mitigate election interference, and curb malicious propaganda.
However, advanced Twitter bots often attempt to mimic the characteristics of
genuine users through feature manipulation and disguise themselves to fit in
diverse user communities, posing challenges for existing Twitter bot detection
models. To this end, we propose BotMoE, a Twitter bot detection framework that
jointly utilizes multiple user information modalities (metadata, textual
content, network structure) to improve the detection of deceptive bots.
Furthermore, BotMoE incorporates a community-aware Mixture-of-Experts (MoE)
layer to improve domain generalization and adapt to different Twitter
communities. Specifically, BotMoE constructs modal-specific encoders for
metadata features, textual content, and graphical structure, which jointly
model Twitter users from three modal-specific perspectives. We then employ a
community-aware MoE layer to automatically assign users to different
communities and leverage the corresponding expert networks. Finally, user
representations from metadata, text, and graph perspectives are fused with an
expert fusion layer, combining all three modalities while measuring the
consistency of user information. Extensive experiments demonstrate that BotMoE
significantly advances the state-of-the-art on three Twitter bot detection
benchmarks. Studies also confirm that BotMoE captures advanced and evasive
bots, alleviates the reliance on training data, and better generalizes to new
and previously unseen user communities. Comment: Accepted at SIGIR 202
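To make the idea of a Mixture-of-Experts layer concrete, the following is a minimal sketch of dense soft gating in plain Python. It is not BotMoE's implementation: the abstract does not specify the routing scheme, and every name here (`moe_layer`, `gate_weights`, `expert_weights`) and every dimension is an assumption made for illustration. A gate scores each expert for a given user vector, and the layer output is the gate-weighted sum of the expert outputs.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Matrix-vector product; `weights` is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def moe_layer(x, gate_weights, expert_weights):
    """Hypothetical dense MoE layer: gate-weighted sum of linear experts.

    Returns the mixed representation and the gate distribution, which can
    be read as a soft assignment of the user to experts ("communities").
    """
    gate = softmax(linear(gate_weights, x))
    outputs = [linear(w, x) for w in expert_weights]
    dim = len(outputs[0])
    mixed = [sum(g * out[d] for g, out in zip(gate, outputs)) for d in range(dim)]
    return mixed, gate


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
dim, n_experts = 4, 3                      # illustrative sizes only
gate_w = rand_matrix(n_experts, dim)       # one gate row per expert
experts = [rand_matrix(dim, dim) for _ in range(n_experts)]
user = [0.5, -1.0, 0.25, 2.0]              # stand-in for one modality's user vector
mixed, gate = moe_layer(user, gate_w, experts)
```

In a trained model the gate and expert weights would be learned, and a sparse variant would keep only the top-scoring experts per user; the sketch uses random weights and keeps all experts for brevity.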
Learning disentangled speech representations
A variety of informational factors are contained within the speech signal and a single short recording of speech reveals much more than the spoken words. The best method to extract and represent informational factors from the speech signal ultimately depends on which informational factors are desired and how they will be used. In addition, sometimes methods will capture more than one informational factor at the same time such as speaker identity, spoken content, and speaker prosody.
The goal of this dissertation is to explore different ways to deconstruct the speech signal into abstract representations that can be learned and later reused in various speech technology tasks. This task of deconstructing, also known as disentanglement, is a form of distributed representation learning. As a general approach to disentanglement, there are some guiding principles that elaborate what a learned representation should contain as well as how it should function. In particular, learned representations should contain all of the requisite information in a more compact manner, be interpretable, remove nuisance factors of irrelevant information, be useful in downstream tasks, and be independent of the task at hand. The learned representations should also be able to answer counter-factual questions.
In some cases, learned speech representations can be re-assembled in different ways according to the requirements of downstream applications. For example, in a voice conversion task, the speech content is retained while the speaker identity is changed. And in a content-privacy task, some targeted content may be concealed without affecting how surrounding words sound. While there is no single-best method to disentangle all types of factors, some end-to-end approaches demonstrate a promising degree of generalization to diverse speech tasks.
This thesis explores a variety of use-cases for disentangled representations including phone recognition, speaker diarization, linguistic code-switching, voice conversion, and content-based privacy masking. Speech representations can also be utilised for automatically assessing the quality and authenticity of speech, such as automatic MOS ratings or detecting deep fakes. The meaning of the term "disentanglement" is not well defined in previous work, and it has acquired several meanings depending on the domain (e.g. image vs. speech). Sometimes the term "disentanglement" is used interchangeably with the term "factorization". This thesis proposes that disentanglement of speech is distinct, and offers a viewpoint of disentanglement that can be considered both theoretically and practically
Data-to-text generation with neural planning
In this thesis, we consider the task of data-to-text generation, which takes non-linguistic
structures as input and produces textual output. The inputs can take the form of
database tables, spreadsheets, charts, and so on. The main application of data-to-text
generation is to present information in a textual format which makes it accessible to
a layperson who may otherwise find it problematic to understand numerical figures.
The task can also automate routine document generation jobs, thus improving human
efficiency. We focus on generating long-form text, i.e., documents with multiple paragraphs. Recent approaches to data-to-text generation have adopted the very successful
encoder-decoder architecture or its variants. These models generate fluent (but often
imprecise) text and perform quite poorly at selecting appropriate content and ordering
it coherently. This thesis focuses on overcoming these issues by integrating content
planning with neural models. We hypothesize data-to-text generation will benefit from
explicit planning, which manifests itself in (a) micro planning, (b) latent entity planning, and (c) macro planning. Throughout this thesis, we assume the inputs to our
generator are tables (with records) in the sports domain, and the outputs are summaries
describing what happened in the game (e.g., who won/lost, ..., scored, etc.).
We first describe our work on integrating fine-grained or micro plans with data-to-text generation. As part of this, we generate a micro plan highlighting which records
should be mentioned and in which order, and then generate the document while taking
the micro plan into account.
We then show how data-to-text generation can benefit from higher level latent entity planning. Here, we make use of entity-specific representations which are dynamically updated. The text is generated conditioned on entity representations and the
records corresponding to the entities by using hierarchical attention at each time step.
We then combine planning with the high level organization of entities, events, and
their interactions. Such coarse-grained macro plans are learnt from data and given
as input to the generator. Finally, we present work on making macro plans latent
while incrementally generating a document paragraph by paragraph. We infer latent
plans sequentially with a structured variational model while interleaving the steps of
planning and generation. Text is generated by conditioning on previous variational
decisions and previously generated text.
Overall, our results show that planning makes data-to-text generation more interpretable, improves the factuality and coherence of the generated documents, and reduces redundancy in the output document.
Brain signal recognition using deep learning
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.
Brain Computer Interface (BCI) has the potential to offer a new generation of applications independent of
muscular activity and controlled by the human brain. Brain imaging technologies are used to transfer the
cognitive tasks into control commands for a BCI system. The electroencephalography (EEG) technology
serves as the best available non-invasive solution for extracting signals from the brain. On the other hand,
speech is the primary means of communication, but for patients suffering from locked-in syndrome, there
is no easy way to communicate. Therefore, an ideal communication system for locked-in patients is a
thought-to-speech BCI system.
This research aims to investigate methods for the recognition of imagined speech from EEG signals
using deep learning techniques. In order to design an optimal imagined speech recognition BCI, a variety
of issues have been addressed. These include 1) proposing a new feature extraction and classification
framework for recognition of imagined speech from EEG signals, 2) grammatical class recognition of
imagined words from EEG signals, and 3) discriminating different cognitive tasks associated with speech in
the brain such as overt speech, covert speech, and visual imagery. In this work, machine learning and deep
learning methods were used to analyze EEG signals.
For recognition of imagined speech from EEG signals, a new EEG database was collected while the
participants mentally spoke (imagined speech) the presented words. Along with imagined speech, EEG
data was recorded for visual imagery (imagining a scene or an image) and overt speech (verbal speech).
Spectro-temporal and spatio-temporal domain features were investigated for the classification of imagined
words from EEG signals. Further, a deep learning framework using the convolutional network
and attention mechanism was implemented for learning features in the spatial, temporal, and spectral
domains. The method achieved a recognition rate of 76.6% for three binary word pairs. These experiments
show that deep learning algorithms are ideal for imagined speech recognition from EEG signals
due to their ability to interpret features from non-linear and non-stationary signals. Grammatical classes
of imagined words from EEG signals were also recognized using a multi-channel convolution network
framework. This method was extended to a multi-level recognition system for multi-class classification
of imagined words which achieved an accuracy of 52.9% for 10 words, which is much better in
comparison to previous work.
In order to investigate the differences between imagined speech, verbal speech, and visual imagery
from EEG signals, we used multivariate pattern analysis (MVPA). MVPA provided the time segments
when the neural oscillation for the different cognitive tasks was linearly separable. Further, frequencies
that result in most discrimination between the different cognitive tasks were also explored. A framework
was proposed to discriminate two cognitive tasks based on the spatio-temporal patterns in EEG signals.
The proposed method used the K-means clustering algorithm to find the best electrode combination and
convolutional-attention network for feature extraction and classification. The proposed method achieved
high recognition rates of 82.9% and 77.7%.
The results in this research suggest that a communication-based BCI system can be designed using
deep learning methods. Further, this work adds knowledge to the existing work in the field of communication-based
BCI systems.
Linguistic- and Acoustic-based Automatic Dementia Detection using Deep Learning Methods
Dementia can affect a person's speech and language abilities, even in the early stages. Dementia is incurable, but early detection can enable treatment that can slow down and maintain mental function. Therefore, early diagnosis of dementia is of great importance. However, current dementia detection procedures in clinical practice are expensive, invasive, and sometimes inaccurate. In comparison, computational tools based on the automatic analysis of spoken language have the potential to be applied as a cheap, easy-to-use, and objective clinical assistance tool for dementia detection.
In recent years, several studies have shown promise in this area. However, most studies focus heavily on the machine learning aspects and, as a consequence, often lack sufficient incorporation of clinical knowledge. Many studies also concentrate on clinically less relevant tasks such as the distinction between HC and people with AD which is relatively easy and therefore less interesting both in terms of the machine learning and the clinical application.
The studies in this thesis concentrate on automatically identifying signs of neurodegenerative dementia in the early stages and distinguishing them from other clinical, diagnostic categories related to memory problems (FMD, MCI, and HC). A key focus when designing the proposed systems has been to better consider (and incorporate) currently used clinical knowledge and also to bear in mind how these machine-learning based systems could be translated for use in real clinical settings.
Firstly, a state-of-the-art end-to-end system is constructed for extracting linguistic information from automatically transcribed spontaneous speech. The system's architecture is based on hierarchical principles thereby mimicking those used in clinical practice where information at both word-, sentence- and paragraph-level is used when extracting information to be used for diagnosis. Secondly, hand-crafted features are designed that are based on clinical knowledge of the importance of pausing and rhythm. These are successfully joined with features extracted from the end-to-end system. Thirdly, different classification tasks are explored, each set up so as to represent the types of diagnostic decision-making that is relevant in clinical practice. Finally, experiments are conducted to explore how to better deal with the known problem of confounding and overlapping symptoms on speech and language from age and cognitive decline. A multi-task system is constructed that takes age into account while predicting cognitive decline. The studies use the publicly available DementiaBank dataset as well as the IVA dataset, which has been collected by our collaborators at the Royal Hallamshire Hospital, UK. In conclusion, this thesis proposes multiple methods of using speech and language information for dementia detection with state-of-the-art deep learning technologies, confirming the automatic system's potential for dementia detection
Overview of Abusive and Threatening Language Detection in Urdu at FIRE 2021
With the growth of social media platform influence, the effect of their
misuse becomes more and more impactful. The importance of automatic detection
of threatening and abusive language cannot be overstated. However, most of
the existing studies and state-of-the-art methods focus on English as the
target language, with limited work on low- and medium-resource languages. In
this paper, we present two shared tasks of abusive and threatening language
detection for the Urdu language, which has more than 170 million speakers
worldwide. Both are posed as binary classification tasks where participating
systems are required to classify tweets in Urdu into two classes, namely: (i)
Abusive and Non-Abusive for the first task, and (ii) Threatening and
Non-Threatening for the second. We present two manually annotated datasets
containing tweets labelled as (i) Abusive and Non-Abusive, and (ii) Threatening
and Non-Threatening. The abusive dataset contains 2400 annotated tweets in the
train part and 1100 annotated tweets in the test part. The threatening dataset
contains 6000 annotated tweets in the train part and 3950 annotated tweets in
the test part. We also provide logistic regression and BERT-based baseline
classifiers for both tasks. In this shared task, 21 teams from six countries
(India, Pakistan, China, Malaysia, United Arab Emirates, and Taiwan) registered
for participation; 10 teams submitted runs for Subtask A (Abusive Language
Detection), 9 teams submitted runs for Subtask B (Threatening Language
Detection), and seven teams submitted technical reports. The best performing
system achieved an F1-score of 0.880 for Subtask A and 0.545 for Subtask B.
For both subtasks, an m-BERT-based transformer model showed the best
performance.
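For readers unfamiliar with the metric reported above, here is a minimal sketch of how an F1-score is computed for one class of a binary task. It is an illustration only: the abstract does not say which averaging scheme the shared task used, and the function name `f1_score`, the label encoding, and the toy labels are assumptions, not data from the task.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for the `positive` class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (1 = positive class, 0 = negative class); not real task data.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
# Here tp = 2, fp = 1, fn = 1, so precision = recall = 2/3 and F1 = 2/3.
score = f1_score(y_true, y_pred)
```

Because F1 ignores true negatives, it is more informative than accuracy when one class is much rarer than the other.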
Graphical scaffolding for the learning of data wrangling APIs
In order for students across the sciences to avail themselves of modern data streams, they must first know how to wrangle data: how to reshape ill-organised, tabular data into another format, and how to do this programmatically, in languages such as Python and R. Despite the cross-departmental demand and the ubiquity of data wrangling in analytical workflows, the research on how to optimise the instruction of it has been minimal. Although data wrangling as a programming domain presents distinctive challenges - characterised by on-the-fly syntax lookup and code example integration - it also presents opportunities. One such opportunity is how tabular data structures are easily visualised. To leverage the inherent visualisability of data wrangling, this dissertation evaluates three types of graphics that could be employed as scaffolding for novices: subgoal graphics, thumbnail graphics, and parameter graphics. Using a specially built e-learning platform, this dissertation documents a multi-institutional, randomised, and controlled experiment that investigates the pedagogical effects of these. Our results indicate that the graphics are well-received, that subgoal graphics boost the completion rate, and that thumbnail graphics improve navigability within a command menu. We also obtained several non-significant results, and indications that parameter graphics are counter-productive. We will discuss these findings in the context of general scaffolding dilemmas, and how they fit into a wider research programme on data wrangling instruction
Characterization of Scintillation Light in Large Liquid Argon Detectors and the Implications for Proton Decay Searches
The Deep Underground Neutrino Experiment (DUNE) is a planned long baseline neutrino experiment. The detector will comprise four modules with 10 kt of active volume each, making it an ideal target for neutrino oscillation physics and searches for proton decay. ProtoDUNE-SP was a single-phase liquid argon time projection chamber, a prototype for the first far detector module of DUNE, with an active volume of 700 tons, operating until 2020. It was installed at the CERN Neutrino Platform and took particle beam and cosmic ray data over its two-year lifespan. Liquid argon scintillation light is still an active subject of study, with open questions about the impact of scattering and absorption in such a large detector. Here, we combine ProtoDUNE-SP cosmic-ray data with its large photon detector coverage and large drift volume to measure the Rayleigh scattering length of pure liquid argon, nitrogen-contaminated argon, and a xenon-doped nitrogen-argon mixture. The Rayleigh scattering length of the xenon mixture was then implemented in a study of the proton decay sensitivity of a single DUNE module, to see the effects of xenon doping.