2,349 research outputs found
Arabic Tweet Act: A Weighted Ensemble Pre-Trained Transformer Model for Classifying Arabic Speech Acts on Twitter
Speech acts are a speaker's actions when performing an utterance within a
conversation, such as asking, recommending, greeting, or thanking someone,
expressing a thought, or making a suggestion. Understanding speech acts helps
interpret the intended meaning and actions behind a speaker's or writer's words.
This paper proposes a Twitter dialectal Arabic speech act classification
approach based on a transformer deep learning neural network. Twitter and
social media are becoming more and more integrated into daily life. As a
result, they have evolved into a vital source of information that represents
the views and attitudes of their users. We propose a BERT-based weighted
ensemble learning approach to integrate the advantages of various BERT models
in dialectal Arabic speech act classification. We compared the proposed model
against several variants of Arabic BERT models and sequence-based models. We
developed a dialectal Arabic tweet act dataset by annotating a subset of a
large existing Arabic sentiment analysis dataset (ASAD) based on six speech act
categories. We also evaluated the models on a previously developed Arabic Tweet
Act dataset (ArSAS). To overcome the class imbalance issue commonly observed in
speech act problems, a transformer-based data augmentation model was
implemented to generate an equal proportion of speech act categories. The
results show that the best BERT model is araBERTv2-Twitter, with a
macro-averaged F1 score of 0.73 and an accuracy of 0.84. Performance
improved with the BERT-based ensemble method, which reached an averaged
F1 score of 0.74 and an accuracy of 0.85 on our dataset.
Comment: 16 pages, 6 figures
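The abstract above describes a weighted ensemble of BERT models but does not give the weighting scheme. The following is a minimal sketch of one common way to build such an ensemble, weighted soft voting over per-model class probabilities. The function names, the example probabilities, and the weights are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def weighted_ensemble(
    probs_per_model: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Weighted average of per-model class probabilities for one tweet."""
    if len(probs_per_model) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(probs_per_model[0])
    combined = [0.0] * n_classes
    for probs, weight in zip(probs_per_model, weights):
        for k, p in enumerate(probs):
            # Normalise each weight so the result is still a distribution.
            combined[k] += (weight / total) * p
    return combined


def predict(
    probs_per_model: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> int:
    """Index of the class with the highest ensemble probability."""
    combined = weighted_ensemble(probs_per_model, weights)
    return max(range(len(combined)), key=combined.__getitem__)


# Hypothetical outputs of two models over two speech act classes.
model_probs = [[0.6, 0.4], [0.2, 0.8]]
print(predict(model_probs, weights=[4.0, 1.0]))  # first model dominates -> 0
print(predict(model_probs, weights=[1.0, 3.0]))  # second model dominates -> 1
```

In practice the weights would typically be chosen on a validation set, for example in proportion to each model's validation F1 score; that choice is an assumption here, not something the abstract states.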
Natural Language Processing: Emerging Neural Approaches and Applications
This Special Issue highlights the most recent research in the NLP field and discusses related open issues, with a particular focus on emerging approaches for language learning, understanding, production, and grounding, interactively or autonomously from data, in cognitive and neural systems, as well as on their potential or real applications in different domains.
Accessing spoken interaction through dialogue processing [online]
Summary
Our lives, our achievements, and our environment are currently all
documented through written language. The rapid development of the
technical means to record, store, and play back audio, images, and
video can be used to support, supplement, or even replace the
written documentation of human communication, for example
meetings. These new technologies can enable us to capture
information that would otherwise be lost, lower the cost of
documentation, and enrich high-quality documents with audiovisual
material. Indexing such recordings is the core technology for
exploiting this potential. This work presents effective
alternatives to keyword-based indices that restrict the search
space and can in part be computed with simple means.
Speech documents can be indexed at several levels: stylistically,
a document belongs to a particular database, which can be
determined automatically with high accuracy using very simple
features. This kind of classification can reduce the search space
by a factor on the order of 410. Applying topical features for
text classification to a news database results in a reduction by a
factor of 18. Because speech documents can be very long, they must
be divided into topical segments. A new probabilistic approach and
new features (speaker initiative and style) deliver results that
are comparable to or better than traditional keyword-based
approaches. These topical segments can be characterized by the
predominant activity (storytelling, discussing, planning, ...),
which can be detected by a neural network. The detection rates are
limited, however, because humans also determine these activities
only imprecisely. A maximum search space reduction by a factor of
6 is theoretically possible on the data used. A topical
classification of these segments was also carried out on one
database, but the detection rates for this index are low.
At the level of individual utterances, dialogue acts such as
statements, questions, backchannels (aha, oh yes, really?, ...),
etc. can be recognized with a discriminatively trained hidden
Markov model. This method can be extended to recognize short
sequences such as question/answer games (dialogue games).
Dialogue acts and games can be used to build classifiers for
global speaking styles. Likewise, a user might remember a
particular dialogue act sequence and try to find it again in a
graphical representation.
In a study with very pessimistic assumptions, users were able to
identify one of four similar and equiprobable conversations with
an accuracy of ~43% using a graphical representation of activity.
Dialogue acts could be equally useful in this scenario, but the
user study could not provide a conclusive answer because of the
small amount of data. The study was, however, unable to show any
effect for detailed basic features such as formality and speaker
identity.
Abstract
Written language is one of our primary means for documenting our
lives, achievements, and environment. Our capabilities to
record, store and retrieve audio, still pictures, and video are
undergoing a revolution and may support, supplement or even
replace written documentation. This technology enables us to
record information that would otherwise be lost, lower the cost
of documentation and enhance high-quality documents with
original audiovisual material.
The indexing of the audio material is the key technology to
realize those benefits. This work presents effective
alternatives to keyword-based indices which restrict the search
space and may in part be calculated with very limited resources.
Indexing speech documents can be done at various levels:
Stylistically, a document belongs to a certain database, which can
be determined automatically with high accuracy using very simple
features. The resulting factor in search space reduction is on
the order of 410, while topic classification yielded a factor
of 18 in a news domain.
Since documents can be very long, they need to be segmented into
topical regions. A new probabilistic segmentation framework as
well as new features (speaker initiative and style) prove to be
very effective compared to traditional keyword-based methods. At
the topical segment level, activities (storytelling, discussing,
planning, ...) can be detected using a machine learning approach
with limited accuracy; however, even human annotators do not
annotate them very reliably. A maximum search space reduction
factor of 6 is theoretically possible on the databases used. A
topical classification of these regions has been attempted
on one database; the detection accuracy for that index, however,
was very low.
At the utterance level, dialogue acts such as statements,
questions, backchannels (aha, yeah, ...), etc. are
recognized using a novel discriminatively trained HMM procedure.
The procedure can be extended to recognize short sequences such
as question/answer pairs, so-called dialogue games.
Dialogue acts and games are useful for building classifiers for
speaking style. Similarly, a user may remember a certain dialogue
act sequence and may search for it in a graphical
representation.
In a study with very pessimistic assumptions, users were able to
pick one out of four similar and equiprobable meetings correctly
with an accuracy of ~43% using graphical activity information.
Dialogue acts may be useful in this situation as well, but the
sample size did not allow final conclusions to be drawn. However, the
user study fails to show any effect for detailed basic features
such as formality or speaker identity.
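The abstract above mentions recognizing utterance-level dialogue acts with an HMM. As a rough illustration only, the sketch below decodes the most likely dialogue act sequence with standard Viterbi decoding. It does not reproduce the thesis's discriminative training procedure, and the states, observation symbols, and all probabilities are hand-set, hypothetical values.

```python
import math
from typing import Dict, List, Sequence

Probs = Dict[str, Dict[str, float]]


def viterbi(
    observations: Sequence[str],
    states: Sequence[str],
    start: Dict[str, float],
    trans: Probs,
    emit: Probs,
) -> List[str]:
    """Most likely hidden state sequence (all probabilities must be > 0)."""
    # best[t][s]: best log score of any path that ends in state s at time t.
    best = [{s: math.log(start[s]) + math.log(emit[s][observations[0]])
             for s in states}]
    back: List[Dict[str, str]] = []
    for obs in observations[1:]:
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for s in states:
            prev = max(states,
                       key=lambda p: best[-1][p] + math.log(trans[p][s]))
            scores[s] = (best[-1][prev] + math.log(trans[prev][s])
                         + math.log(emit[s][obs]))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: three dialogue acts, three coarse utterance cues.
STATES = ["question", "statement", "backchannel"]
START = {"question": 0.3, "statement": 0.6, "backchannel": 0.1}
TRANS = {
    "question": {"question": 0.1, "statement": 0.8, "backchannel": 0.1},
    "statement": {"question": 0.2, "statement": 0.5, "backchannel": 0.3},
    "backchannel": {"question": 0.2, "statement": 0.7, "backchannel": 0.1},
}
EMIT = {
    "question": {"wh_word": 0.8, "declarative": 0.15, "short_ack": 0.05},
    "statement": {"wh_word": 0.1, "declarative": 0.8, "short_ack": 0.1},
    "backchannel": {"wh_word": 0.05, "declarative": 0.05, "short_ack": 0.9},
}

utterance_cues = ["wh_word", "declarative", "short_ack"]
print(viterbi(utterance_cues, STATES, START, TRANS, EMIT))
# ['question', 'statement', 'backchannel']
```

The high question-to-statement transition probability in this toy model loosely mirrors the question/answer pairs that the abstract calls dialogue games.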
A Comprehensive Overview of Large Language Models
Large Language Models (LLMs) have shown excellent generalization capabilities
that have led to the development of numerous models. These models introduce
new architectures or refine existing ones with improved
training strategies, longer context lengths, high-quality training
data, and longer training times to outperform baselines. Analyzing new
developments is crucial for identifying changes that enhance training stability
and improve generalization in LLMs. This survey paper comprehensively analyses
LLM architectures and their categorization, training strategies, training
datasets, and performance evaluations and discusses future research directions.
Moreover, the paper also discusses the basic building blocks and concepts
behind LLMs, followed by a complete overview of LLMs, including their important
features and functions. Finally, the paper summarizes significant findings from
LLM research and consolidates essential architectural and training strategies
for developing advanced LLMs. Given the continuous advancements in LLMs, we
intend to regularly update this paper by incorporating new sections and
featuring the latest LLMs.
Participative Urban Health and Healthy Aging in the Age of AI
This open access book constitutes the refereed proceedings of the 18th International Conference on String Processing and Information Retrieval, ICOST 2022, held in Paris, France, in June 2022. The 15 full papers and 10 short papers presented in this volume were carefully reviewed and selected from 33 submissions. They cover topics such as design, development, deployment, and evaluation of AI for health, smart urban environments, assistive technologies, chronic disease management, and coaching and health telematics systems
Artificial Intelligence for Multimedia Signal Processing
Artificial intelligence technologies are actively applied to broadcasting and multimedia processing. Much research has been conducted across a wide variety of fields, such as content creation, transmission, and security, and in the past two to three years efforts have been made to improve the compression efficiency of image, video, speech, and other data in areas related to MPEG media processing technology. Additionally, technologies for media creation, processing, editing, and scenario creation are very important areas of research in multimedia processing and engineering. This book collects topics spanning advanced computational intelligence algorithms and technologies for emerging multimedia signal processing: computer vision, speech/sound/text processing, and content analysis/information mining.
A Conversational Movie Recommender System
Master's thesis in Electrical and Computer Engineering
The purpose of a Conversational Recommender System is to help users achieve their recommendation-specific goals through a multi-turn dialogue. In recent years, numerous studies have been conducted on improving the quality attributes of conversational recommender systems, and multiple conversational movie recommender systems have been proposed. However, there is still a need for a conversational movie recommendation system that can be used for research purposes.
The main goal of this thesis is to create Jarvis, an open-source, rule-based conversational movie recommendation system focusing on understanding the users' goals and adapting to their changing requirements. In order to understand the users' goals, a database is created that contains the attributes with higher coverage of possible users' goals. A multi-model chat interface is designed for Jarvis. This interface introduces components for better user interaction and for guiding users during the conversation.
The success of a conversational system is measured in terms of the quality of the conversation and the satisfaction of the users. To guarantee the success of Jarvis, the conversation of the system with different users is recorded. Moreover, the users are requested to rate their conversation and give feedback about the system. The behavior of the system during the conversation and user feedback is studied to improve Jarvis.
The results have shown that conversational data and users' feedback play an essential role in improving the performance of Jarvis. The users' satisfaction has improved, and the system adapts better to previously unknown scenarios in the conversation. However, to make the system more adjustable and user-friendly, more users are required to test the system.
Ubiquitous Technologies for Emotion Recognition
Emotions play a very important role in how we think and behave. As such, the emotions we feel every day can compel us to act and influence the decisions and plans we make about our lives. Being able to measure, analyze, and better comprehend how or why our emotions may change is thus of much relevance to understand human behavior and its consequences. Despite the great efforts made in the past in the study of human emotions, it is only now, with the advent of wearable, mobile, and ubiquitous technologies, that we can aim to sense and recognize emotions, continuously and in real time. This book brings together the latest experiences, findings, and developments regarding ubiquitous sensing, modeling, and the recognition of human emotions