ALBERTI, a Multilingual Domain Specific Language Model for Poetry Analysis
The computational analysis of poetry is limited by the scarcity of tools to
automatically analyze and scan poems. In a multilingual setting, the problem
is exacerbated as scansion and rhyme systems only exist for individual
languages, making comparative studies very challenging and time consuming. In
this work, we present Alberti, the first multilingual pre-trained
large language model for poetry. Through domain-specific pre-training (DSP), we
further trained multilingual BERT on a corpus of over 12 million verses from 12
languages. We evaluated its performance on two structural poetry tasks: Spanish
stanza type classification, and metrical pattern prediction for Spanish,
English and German. In both cases, Alberti outperforms multilingual
BERT and other transformer-based models of similar size, and even achieves
state-of-the-art results for German when compared to rule-based systems,
demonstrating the feasibility and effectiveness of DSP in the poetry domain.
Comment: Accepted for publication at SEPLN 2023: 39th International Conference
of the Spanish Society for Natural Language Processing
Towards Suicide Prevention from Bipolar Disorder with Temporal Symptom-Aware Multitask Learning
Bipolar disorder (BD) is closely associated with an increased risk of
suicide. However, while prior work has revealed valuable insights into
the behavior of BD patients on social media, little attention has
been paid to developing a model that can predict the future suicidality of a BD
patient. Therefore, this study proposes a multi-task learning model for
predicting the future suicidality of BD patients by jointly learning current
symptoms. We build a novel BD dataset clinically validated by psychiatrists,
including 14 years of posts on bipolar-related subreddits written by 818 BD
patients, along with the annotations of future suicidality and BD symptoms. We
also suggest a temporal symptom-aware attention mechanism to determine which
symptoms are the most influential for predicting future suicidality over time
through a sequence of BD posts. Our experiments demonstrate that the proposed
model outperforms the state-of-the-art models in both BD symptom identification
and future suicidality prediction tasks. In addition, the proposed temporal
symptom-aware attention provides interpretable attention weights, helping
clinicians to understand BD patients more comprehensively and to provide timely
intervention by tracking mental state progression.
Comment: KDD 2023 accepted
Modular lifelong machine learning
Deep learning has drastically improved the state-of-the-art in many important fields, including computer vision and natural language processing (LeCun et al., 2015). However, it is expensive to train a deep neural network on a machine learning problem. The overall training cost further increases when one wants to solve additional problems. Lifelong machine learning (LML) develops algorithms that aim to efficiently learn to solve a sequence of problems, which become available one at a time. New problems are solved with fewer resources by transferring previously learned knowledge. At the same time, an LML algorithm needs to retain good performance on all encountered problems, thus avoiding catastrophic forgetting. Current approaches do not possess all the desired properties of an LML algorithm. First, they primarily focus on preventing catastrophic forgetting (Diaz-Rodriguez et al., 2018; Delange et al., 2021). As a result, they neglect some knowledge transfer properties. Furthermore, they assume that all problems in a sequence share the same input space. Finally, scaling these methods to a large sequence of problems remains a challenge.
Modular approaches to deep learning decompose a deep neural network into sub-networks, referred to as modules. Each module can then be trained to perform an atomic transformation, specialised in processing a distinct subset of inputs. This modular approach to storing knowledge makes it easy to only reuse the subset of modules which are useful for the task at hand.
This thesis introduces a line of research which demonstrates the merits of a modular approach to lifelong machine learning, and its ability to address the aforementioned shortcomings of other methods. Compared to previous work, we show that a modular approach can be used to achieve more LML properties than previously demonstrated. Furthermore, we develop tools which allow modular LML algorithms to scale in order to retain said properties on longer sequences of problems.
First, we introduce HOUDINI, a neurosymbolic framework for modular LML. HOUDINI represents modular deep neural networks as functional programs and accumulates a library of pre-trained modules over a sequence of problems. Given a new problem, we use program synthesis to select a suitable neural architecture, as well as a high-performing combination of pre-trained and new modules. We show that our approach has most of the properties desired from an LML algorithm. Notably, it can perform forward transfer, avoid negative transfer and prevent catastrophic forgetting, even across problems with disparate input domains and problems which require different neural architectures.
Second, we produce a modular LML algorithm which retains the properties of HOUDINI but can also scale to longer sequences of problems. To this end, we fix the choice of a neural architecture and introduce a probabilistic search framework, PICLE, for searching through different module combinations. To apply PICLE, we introduce two probabilistic models over neural modules which allow us to efficiently identify promising module combinations.
Third, we phrase the search over module combinations in modular LML as black-box optimisation, which allows one to make use of methods from the setting of hyperparameter optimisation (HPO). We then develop a new HPO method which marries a multi-fidelity approach with model-based optimisation. We demonstrate that this leads to improvement in anytime performance in the HPO setting and discuss how this can in turn be used to augment modular LML methods.
Overall, this thesis identifies a number of important LML properties, which have not all been attained in past methods, and presents an LML algorithm which can achieve all of them, apart from backward transfer.
Automatic Caption Generation for Aerial Images: A Survey
Aerial images have long attracted the attention of the research community. Generating a caption that comprehensively describes the content of an aerial image is a less studied but important task, with applications in agriculture, defence, disaster management and many other areas. Although various approaches have been followed for natural image caption generation, generating a caption for an aerial image remains challenging because of its special nature. The use of emerging techniques from Artificial Intelligence (AI) and Natural Language Processing (NLP) has resulted in captions of acceptable quality for aerial images. However, much remains to be done to fully realise the potential of aerial image caption generation. This paper presents a detailed survey of the approaches researchers have followed for aerial image caption generation. The datasets available for experimentation, the criteria used for performance evaluation and future directions are also discussed.
The Globalization of Artificial Intelligence: African Imaginaries of Technoscientific Futures
Imaginaries of artificial intelligence (AI) have transcended geographies of the Global North and become increasingly entangled with narratives of economic growth, progress, and modernity in Africa. This raises several issues such as the entanglement of AI with global technoscientific capitalism and its impact on the dissemination of AI in Africa. The lack of African perspectives on the development of AI exacerbates concerns of raciality and inclusion in the scientific research, circulation, and adoption of AI. My argument in this dissertation is that innovation in AI, in both its sociotechnical imaginaries and political economies, excludes marginalized countries, nations and communities in ways that bar them not only from participating in the reception of AI, but also from being part and parcel of its creation.
Underpinned by decolonial thinking, and perspectives from science and technology studies and African studies, this dissertation looks at how AI is reconfiguring the debate about development and modernization in Africa and the implications for local sociotechnical practices of AI innovation and governance. I examined AI in international development and industry across Kenya, Ghana, and Nigeria, by tracing Canada's AI4D Africa program and following AI start-ups at AfriLabs. I used multi-sited case studies and discourse analysis to examine the data collected from interviews, participant observations, and documents.
In the empirical chapters, I first examine how local actors understand the notion of decolonizing AI and show that it has become a sociotechnical imaginary. I then investigate the political economy of AI in Africa and argue that despite Western efforts to integrate the African AI ecosystem globally, the AI epistemic communities in the continent continue to be excluded from dominant AI innovation spaces. Finally, I examine the emergence of a Pan-African AI imaginary and argue that AI governance can be understood as a state-building experiment in post-colonial Africa. The main issue at stake is that the lack of African perspectives in AI leads to negative impacts on innovation and limits the fair distribution of the benefits of AI across nations, countries, and communities, while at the same time excluding globally marginalized epistemic communities from the imagination and creation of AI.
A view of colonial life in South Australia: An osteological investigation of the health status among 19th-century migrant settlers
Studies of human skeletal remains contribute to understanding the extent to which conditions
prevailing in various past communities were detrimental to health. Few of these studies have
evaluated the situation in which the first European colonists of South Australia lived.
Colonial Australian skeletal collections are scarce, especially for research purposes. This
makes the 19th-century skeletal remains of individuals, excavated from St Mary's Cemetery,
South Australia, a rare and valuable collection.
The overarching aim of this thesis was to investigate the general and oral health of this
specific group of 19th-century settlers, through the examination of their skeletons and
dentitions. Four research papers in this thesis address this overarching aim. The first two
papers determine the general skeletal health of the settlers, with a focus on pathological
manifestations on bones associated with metabolic deficiencies and the demands of
establishing an industrial society. Paper 3 investigated whether Large Volume Micro-
Computed Tomography (LV Micro-CT) could be used as a single technique for the analysis
of the in situ dentoalveolar complex of individuals from St Mary's. This led to a detailed
investigation of the dentitions of the St Mary's sample, in paper 4, with the aims of
determining the oral health status of these individuals, and understanding how oral conditions
may have influenced their general health.
The skeletal remains of 65 individuals (20 adults and 45 subadults) from the St Mary's sample
were available for the four component investigations using non-destructive techniques -
macroscopic, radiographic and micro-CT methods.
Signs of nutritional deficiencies (vitamin C and iron) were identified in Paper 1. The findings
of paper 2 showed joint diseases and traumatic fractures were seen and that gastrointestinal and pulmonary conditions were the leading causes of death in subadults and adults
respectively. Paper 3 found that the LV Micro-CT technique was the only method able to
generate images that allowed the full range of detailed measurements across all the oral
health categories studied. A combination of macroscopic and radiographic techniques
covered a number of these categories, but was more time-consuming, and did not provide the
same level of accuracy or include all measurements. Results for paper 4 confirmed that
extensive carious lesions, antemortem tooth loss and evidence of periodontal disease were
present in the St Mary's sample. Developmental defects of enamel (EH) and areas of
interglobular dentine (IGD) were identified. Many individuals with dental defects also had
skeletal signs of co-morbidities. St Mary's individuals had a similar percentage of carious
lesions as the British sample, which was more than other historic Australian samples, but less
than a contemporary New Zealand sample.
The 19th-century migrants to the colony of South Australia were faced with multiple
challenges such as adapting to local environmental conditions as well as participating in the
development of settlements, infrastructure and new industries. Evidence of joint diseases,
traumatic injuries and health insults, seen as pathological changes and/or abnormalities on
the bone and/or teeth, confirmed that the settlers' health had been affected. The number of
burials in the "free ground" area between the 1840s and 1870s was greater than the number in the
leased plots, reflecting the economic problems of the colony during these early years.
Validation of the reliability and accuracy of the LV Micro-CT system for the analysis of the
dentoalveolar complex, in situ within archaeological human skull samples, provided a
microanalytical approach for the in-depth investigations of the St Mary's dentition. Extensive
carious lesions, antemortem tooth loss and periodontal disease seen in this group would have
affected their general health status. The presence of developmental defects (EH and IGD)
indicated that many of the settlers had suffered health insults in childhood to young adulthood. Contemporaneous Australian, New Zealand and British samples had comparable
findings suggesting that little improvement had occurred in their oral health since arriving in
South Australia.
In conclusion, the findings of this investigation largely fulfilled the initial aims. Our
understanding of the extent to which conditions prevailing in the new colony were
detrimental to human health has increased, as has our knowledge of why pathological
manifestations and/or abnormalities were seen on the bones and teeth of individuals from the
St Mary's sample. A multiple-method approach to deriving enhanced information has been
shown to be effective, whilst establishing a new methodology (LV Micro-CT) for the analysis
of dentition in situ in human archaeological skulls. Further, this investigation has digitally
preserved data relating to this historical group of individuals for future comparisons.
Thesis (Ph.D.) -- University of Adelaide, School of Biomedicine, 202
Deep Transfer Learning Applications in Intrusion Detection Systems: A Comprehensive Review
Globally, the external Internet is increasingly being connected to the
contemporary industrial control system. As a result, there is an immediate need
to protect the network from several threats. The key infrastructure of
industrial activity may be protected from harm by using an intrusion detection
system (IDS), a preventive measure mechanism, to recognize new kinds of
dangerous threats and hostile activities. The most recent artificial
intelligence (AI) techniques used to create IDS in many kinds of industrial
control networks are examined in this study, with a particular emphasis on
IDS-based deep transfer learning (DTL). The latter can be seen as a type of
information fusion that merges and/or adapts knowledge from multiple domains to
enhance the performance of the target task, particularly when the labeled data
in the target domain is scarce. Publications issued after 2015 were taken into
account. These selected publications were divided into three categories:
DTL-only and IDS-only are involved in the introduction and background, and
DTL-based IDS papers are involved in the core papers of this review.
Researchers will be able to have a better grasp of the current state of DTL
approaches used in IDS in many different types of networks by reading this
review paper. Other useful information, such as the datasets used, the sort of
DTL employed, the pre-trained network, IDS techniques, the evaluation metrics
including accuracy/F-score and false alarm rate (FAR), and the improvement
gained, were also covered. The algorithms and methods used in several studies,
or those that illustrate deeply and clearly the principle of any DTL-based IDS
subcategory, are presented to the reader.
A novel Auto-ML Framework for Sarcasm Detection
Many domains have sarcasm or verbal irony present in the text of reviews, tweets, comments, and dialog discussions. The purpose of this research is to classify sarcasm across multiple domains using a deep-learning-based AutoML framework. The proposed AutoML framework has five models in its model search pipeline; these models are combinations of a convolutional neural network (CNN), Long Short-Term Memory (LSTM), a deep neural network (DNN), and Bidirectional Long Short-Term Memory (BiLSTM). The hybrid combinations of the CNN, LSTM, and DNN models are presented as CNN-LSTM-DNN, LSTM-DNN, BiLSTM-DNN, and CNN-BiLSTM-DNN. This work proposes algorithms that contrast polarities between terms and phrases, which are categorized into implicit and explicit incongruity categories. The incongruity features and pragmatic features, such as punctuation and exclamation marks, are integrated into the AutoML DeepConcat framework models. That integration is possible when the DeepConcat AutoML framework initiates a model search pipeline over the five models to achieve better performance. Conceptually, DeepConcat means that a model is integrated with generalized features. The pretrained BiLSTM model achieved a better performance of 0.98 F1 when compared with the other five models. Similarly, the AutoML-based BiLSTM-DNN model achieved the best performance of 0.98 F1, which is better than core approaches and the existing state of the art on the Twitter tweet dataset, Amazon reviews, and dialog discussion comments. The proposed AutoML framework compared the performance metrics F1 and AUC and found that F1 is better than AUC. The integration of all feature categories achieved better performance than the individual categories of pragmatic and incongruity features.
This research also evaluated the dropout layer hyperparameter, which achieved better performance than a fixed dropout percentage, such as 10%, in the AutoML-based Bayesian optimization. The proposed AutoML framework DeepConcat evaluated the best pretrained models, BiLSTM-DNN and CNN-CNN-DNN, for transferring knowledge across domains such as Amazon reviews and dialog discussion comments (text) using the last strategy, the full-layer strategy, and our fade-out freezing strategy. In transfer learning, the fade-out strategy outperformed the existing state-of-the-art model BiLSTM-DNN, with a performance of 0.98 F1 on tweets, 0.85 F1 on Amazon reviews, and 0.87 F1 on the dialog discussion SCV2-Gen dataset. Further, all strategies can be compared across domains for the best model selection.
Neural Natural Language Generation: A Survey on Multilinguality, Multimodality, Controllability and Learning
Developing artificial learning systems that can understand and generate natural language has been one of the long-standing goals of artificial intelligence. Recent decades have witnessed impressive progress on both of these problems, giving rise to a new family of approaches. In particular, the advances in deep learning over the past couple of years have led to neural approaches to natural language generation (NLG). These methods combine generative language learning techniques with neural-network-based frameworks. With a wide range of applications in natural language processing, neural NLG (NNLG) is a new and fast-growing field of research. In this state-of-the-art report, we investigate the recent developments and applications of NNLG in its full extent from a multidimensional view, covering critical perspectives such as multimodality, multilinguality, controllability and learning strategies. We summarize the fundamental building blocks of NNLG approaches from these aspects and provide detailed reviews of commonly used preprocessing steps and basic neural architectures. This report also focuses on the seminal applications of these NNLG models such as machine translation, description generation, automatic speech recognition, abstractive summarization, text simplification, question answering and generation, and dialogue generation. Finally, we conclude with a thorough discussion of the described frameworks by pointing out some open research directions.
This work has been partially supported by the European Commission ICT COST Action "Multi-task, Multilingual, Multi-modal Language Generation" (CA18231). AE was supported by BAGEP 2021 Award of the Science Academy. EE was supported in part by TUBA GEBIP 2018 Award. BP is in part funded by Independent Research Fund Denmark (DFF) grant 9063-00077B. IC has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 838188.
EL is partly funded by Generalitat Valenciana and the Spanish Government through projects PROMETEU/2018/089 and RTI2018-094649-B-I00, respectively. SMI is partly funded by UNIRI project uniri-drustv-18-20. GB is partly supported by the Ministry of Innovation and the National Research, Development and Innovation Office within the framework of the Hungarian Artificial Intelligence National Laboratory Programme. COT is partially funded by the Romanian Ministry of European Investments and Projects through the Competitiveness Operational Program (POC) project "HOLOTRAIN" (grant no. 29/221 ap2/07.04.2020, SMIS code: 129077) and by the German Academic Exchange Service (DAAD) through the project "AWAKEN: content-Aware and netWork-Aware faKE News mitigation" (grant no. 91809005). ESA is partially funded by the German Academic Exchange Service (DAAD) through the project "Deep-Learning Anomaly Detection for Human and Automated Users Behavior" (grant no. 91809358).