5,967 research outputs found
Machine Learning
Machine Learning can be defined in various ways related to a scientific domain concerned with the design and development of theoretical and implementation tools that allow building systems with some Human Like intelligent behavior. Machine learning addresses more specifically the ability to improve automatically through experience
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
This paper surveys the current state of the art in Natural Language
Generation (NLG), defined as the task of generating text or speech from
non-linguistic input. A survey of NLG is timely in view of the changes that the
field has undergone over the past decade or so, especially in relation to new
(usually data-driven) methods, as well as new applications of NLG technology.
This survey therefore aims to (a) give an up-to-date synthesis of research on
the core tasks in NLG and the architectures adopted in which such tasks are
organised; (b) highlight a number of relatively recent research topics that
have arisen partly as a result of growing synergies between NLG and other areas
of artificial intelligence; (c) draw attention to the challenges in NLG
evaluation, relating them to similar challenges faced in other areas of Natural
Language Processing, with an emphasis on different evaluation methods and the
relationships between them.Comment: Published in Journal of AI Research (JAIR), volume 61, pp 75-170. 118
pages, 8 figures, 1 tabl
A review of natural language processing in contact centre automation
Contact centres have been highly valued by organizations for a long time. However, the COVID-19 pandemic has highlighted their critical importance in ensuring business continuity, economic activity, and quality customer support. The pandemic has led to an increase in customer inquiries related to payment extensions, cancellations, and stock inquiries, each with varying degrees of urgency. To address this challenge, organizations have taken the opportunity to re-evaluate the function of contact centres and explore innovative solutions. Next-generation platforms that incorporate machine learning techniques and natural language processing, such as self-service voice portals and chatbots, are being implemented to enhance customer service. These platforms offer robust features that equip customer agents with the necessary tools to provide exceptional customer support. Through an extensive review of existing literature, this paper aims to uncover research gaps and explore the advantages of transitioning to a contact centre that utilizes natural language solutions as the norm. Additionally, we will examine the major challenges faced by contact centre organizations and offer reco
Argument Component Classification for Classroom Discussions
This paper focuses on argument component classification for transcribed
spoken classroom discussions, with the goal of automatically classifying
student utterances into claims, evidence, and warrants. We show that an
existing method for argument component classification developed for another
educationally-oriented domain performs poorly on our dataset. We then show that
feature sets from prior work on argument mining for student essays and online
dialogues can be used to improve performance considerably. We also provide a
comparison between convolutional neural networks and recurrent neural networks
when trained under different conditions to classify argument components in
classroom discussions. While neural network models are not always able to
outperform a logistic regression model, we were able to gain some useful
insights: convolutional networks are more robust than recurrent networks both
at the character and at the word level, and specificity information can help
boost performance in multi-task training
ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications
Personal assistants, automatic speech recognizers and dialogue understanding
systems are becoming more critical in our interconnected digital world. A clear
example is air traffic control (ATC) communications. ATC aims at guiding
aircraft and controlling the airspace in a safe and optimal manner. These
voice-based dialogues are carried between an air traffic controller (ATCO) and
pilots via very-high frequency radio channels. In order to incorporate these
novel technologies into ATC (low-resource domain), large-scale annotated
datasets are required to develop the data-driven AI systems. Two examples are
automatic speech recognition (ASR) and natural language understanding (NLU). In
this paper, we introduce the ATCO2 corpus, a dataset that aims at fostering
research on the challenging ATC field, which has lagged behind due to lack of
annotated data. The ATCO2 corpus covers 1) data collection and pre-processing,
2) pseudo-annotations of speech data, and 3) extraction of ATC-related named
entities. The ATCO2 corpus is split into three subsets. 1) ATCO2-test-set
corpus contains 4 hours of ATC speech with manual transcripts and a subset with
gold annotations for named-entity recognition (callsign, command, value). 2)
The ATCO2-PL-set corpus consists of 5281 hours of unlabeled ATC data enriched
with automatic transcripts from an in-domain speech recognizer, contextual
information, speaker turn information, signal-to-noise ratio estimate and
English language detection score per sample. Both available for purchase
through ELDA at http://catalog.elra.info/en-us/repository/browse/ELRA-S0484. 3)
The ATCO2-test-set-1h corpus is a one-hour subset from the original test set
corpus, that we are offering for free at https://www.atco2.org/data. We expect
the ATCO2 corpus will foster research on robust ASR and NLU not only in the
field of ATC communications but also in the general research community.Comment: Manuscript under review; The code will be available at
https://github.com/idiap/atco2-corpu
Corpus based classification of text in Australian contracts
Written contracts are a fundamental
framework for commercial and cooperative transactions and relationships. Limited research has been published on the application of machine learning and natural language processing (NLP) to contracts.
In this paper we report the classification of components of contract texts using machine learning and hand-coded methods.
Authors studying a range of domains have found that combining machine learning and rule based approaches increases accuracy of machine learning. We find similar results which suggest the utility of considering leveraging hand coded classification rules for machine learning. We attained an average accuracy of 83.48% on a multiclass labelling task on 20 contracts combining machine learning and rule based approaches, increasing performance over machine learning alone
- …