CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap
After addressing the state of the art during the first year of CHORUS and establishing the existing landscape in multimedia search engines, we identified and analyzed gaps within the European research effort during our second year. In this period we focused on three directions, namely technological issues, user-centred issues and use-cases, and socio-economic and legal aspects. These were assessed through two central studies: first, a concerted vision of the functional breakdown of a generic multimedia search engine, and second, representative use-case descriptions with a related discussion of the requirements they impose on technological challenges. Both studies were carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations at international conferences, and surveys addressed to EU project coordinators as well as national initiative coordinators. Based on the feedback obtained, we identified two types of gaps, namely core technological gaps that involve research challenges, and "enablers", which are not necessarily technical research challenges but have an impact on innovation progress. New socio-economic trends are presented, as well as emerging legal challenges.
Spoken content retrieval: A survey of techniques and technologies
Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR, encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.
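The core SCR pipeline the survey describes, ASR transcripts fed into a text-IR index, can be sketched in a few lines. This is a toy illustration, not any system from the survey; the transcripts and the term-frequency scoring below are invented for the example.

```python
from collections import Counter, defaultdict

# Toy ASR output: document id -> recognized transcript (invented examples).
transcripts = {
    "ep1": "today we discuss speech recognition errors in broadcast news",
    "ep2": "the interview covers informal conversational speech in meetings",
    "ep3": "broadcast news segments are planned studio speech",
}

def tokenize(text):
    return text.lower().split()

# Build an inverted index: term -> {doc_id: term frequency}.
index = defaultdict(Counter)
for doc_id, text in transcripts.items():
    for term in tokenize(text):
        index[term][doc_id] += 1

def search(query):
    """Rank documents by summed term frequency over query terms."""
    scores = Counter()
    for term in tokenize(query):
        for doc_id, tf in index[term].items():
            scores[doc_id] += tf
    return [doc_id for doc_id, _ in scores.most_common()]

print(search("broadcast news speech"))  # ep1 and ep3 rank above ep2
```

A real SCR system would replace the hard-coded transcripts with ASR output (including recognition errors and word lattices), which is exactly where the survey's combination of speech processing and IR becomes necessary.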
Knowledge Augmentation in Language Models to Overcome Domain Adaptation and Scarce Data Challenges in Clinical Domain
The co-existence of two realities in the healthcare domain, namely "the massive amount of unstructured text data that humanity produces" and "the scarcity of sufficient training data to train language models," has greatly increased the need for intelligent tools and techniques to process, interpret, and extract different types of knowledge from the data.
My research goal in this thesis is to develop intelligent methods and models to automatically better interpret human language and sentiment, particularly their structure and semantics, in order to solve multiple higher-level Natural Language Processing (NLP) downstream tasks and beyond.
This thesis spans six chapters and is divided into two parts based on the contributions. The first part centres on best practices for modeling data and injecting domain knowledge to enrich data semantics, applied to several classification tasks in the healthcare domain and beyond. The contribution is to reduce training time, improve the performance of classification models, and use world knowledge as a source of domain knowledge when working with limited/small training data. The second part introduces AnnoMI, a first-of-its-kind high-quality dataset of Motivational Interviewing (MI), followed by an experimental benchmarking analysis of AnnoMI. The contribution is a publicly accessible dataset of Motivational Interviewing and methods to overcome data-scarcity challenges in complex domains such as mental health. The overall organization of the thesis is as follows:
The first chapter provides a high-level introduction to the tools and techniques applied in the scope of the thesis.
The second chapter presents optimal methods for (i) feature selection, (ii) eliminating irrelevant and superfluous attributes from the dataset, (iii) data preprocessing, and (iv) advanced data representation methods (word embeddings and bag-of-words) for modeling data.
The third chapter introduces K-LM, a language model combining Generative Pretrained Transformer (GPT)-2 and Bidirectional Encoder Representations from Transformers (BERT) that uses knowledge graphs to inject domain knowledge for domain adaptation tasks. The end goal of this chapter is to reduce training time and improve the performance of classification models when working with limited/small training data.
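The knowledge-injection step can be illustrated with a toy sketch: entity mentions in an input sentence are matched against knowledge-graph triples, which are verbalized and appended to the text before it is fed to a classifier. The triple store, entity names, and verbalization scheme below are invented for illustration and do not reproduce the K-LM architecture.

```python
# Toy knowledge graph: entity -> list of (relation, object) triples.
# Entities and triples are invented for this illustration.
kg = {
    "metformin": [("is_a", "antidiabetic drug"), ("treats", "type 2 diabetes")],
    "insulin": [("is_a", "hormone"), ("regulates", "blood glucose")],
}

def inject_knowledge(sentence, kg, max_triples=2):
    """Append verbalized KG triples for entities mentioned in the sentence."""
    facts = []
    for entity, triples in kg.items():
        if entity in sentence.lower():
            for relation, obj in triples[:max_triples]:
                facts.append(f"{entity} {relation.replace('_', ' ')} {obj}")
    # The augmented text would then be tokenized for BERT/GPT-2 fine-tuning.
    return (sentence + " [KNOWLEDGE] " + " ; ".join(facts)) if facts else sentence

print(inject_knowledge("The patient was started on metformin.", kg))
```

The point of augmenting the input rather than the model is that a pretrained LM needs no architectural change: the injected facts simply become part of the fine-tuning text, which is one common way to add domain knowledge when training data is scarce.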
The fourth chapter introduces AnnoMI, a high-quality dataset of expert-annotated MI comprising 133 therapy session transcriptions distributed over 44 topics (including smoking cessation, anxiety management, and weight loss), and provides an in-depth analysis of the dataset.
The fifth chapter presents the experimental analysis with AnnoMI, which includes (i) augmentation techniques to generate data and (ii) fairness and bias assessments of the employed Classical Machine Learning (CML) and Deep Learning (DL) approaches, with the aim of developing reliable classification models.
Finally, the sixth chapter provides the conclusion and outcomes of all the work presented in this thesis. The scientific contributions of this thesis include solutions to the challenges of scarce training data in complex domains and of domain adaptation in LMs. The practical contributions of the thesis are data resources and the language model for a range of quantitative and qualitative NLP applications.
Keywords: Natural Language Processing, Domain Adaptation, Motivational Interviewing, AI Fairness and Bias, Data Augmentation, GPT, BERT, Healthcare
Argument-based generation and explanation of recommendations
In the recommender systems literature, it has been shown that, in addition to improving system effectiveness, explaining recommendations may increase user satisfaction, trust, persuasion, and loyalty. In general, explanations focus on the filtering algorithms or on the users and items involved in the generation of recommendations. However, in certain domains that are rich in user-generated textual content, it would be valuable to justify recommendations by means of arguments that are explicit in, underlying, or related to the data used by the systems, e.g., the reasons behind customers' opinions in reviews on e-commerce sites, or the requests and claims in citizens' proposals and debates on e-participation platforms. In this context, automatically extracting and exploiting the arguments given for and against evaluated items is a necessary and challenging task. We thus advocate focusing not only on user preferences and item features, but also on associated arguments. In other words, we propose to consider not only what is said about items, but also why it is said. Hence, arguments would not only be part of the recommendation explanations, but could also be used by the recommendation algorithms themselves. To this end, in this thesis, we propose to use argument mining techniques and tools that allow retrieving and relating argumentative information from textual content, and we investigate recommendation methods that exploit that information before, during, and after their filtering processes.
The author thanks his supervisor Iván Cantador for his valuable support and guidance in defining this thesis project. The work is supported by the Spanish Ministry of Science and Innovation (PID2019-108965GB-I00).
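The central idea, letting arguments mined from reviews feed the ranking itself rather than only the explanation, can be sketched with a deliberately naive toy. The reviews, cue words, and scoring are invented for illustration; real argument mining uses trained extractors, not keyword matching.

```python
# Invented reviews per item; a real system would mine arguments with trained models.
reviews = {
    "hotel_a": ["great location because it is near the station",
                "noisy at night since the bar is below"],
    "hotel_b": ["quiet rooms because of double glazing",
                "friendly staff since they helped with luggage"],
}

PRO_CUES = {"great", "quiet", "friendly", "clean"}
CON_CUES = {"noisy", "dirty", "rude", "broken"}

def mine_arguments(text):
    """Naive 'argument mining': keep opinions that come with a reason marker."""
    words = set(text.lower().split())
    stance = "pro" if words & PRO_CUES else "con" if words & CON_CUES else None
    has_reason = any(marker in words for marker in ("because", "since"))
    return (stance, text) if stance and has_reason else None

def score(item):
    """Rank by (pro - con) argument count; the arguments double as explanations."""
    args = [a for a in (mine_arguments(r) for r in reviews[item]) if a]
    pros = [t for s, t in args if s == "pro"]
    cons = [t for s, t in args if s == "con"]
    return len(pros) - len(cons), pros, cons

ranked = sorted(reviews, key=lambda item: score(item)[0], reverse=True)
print(ranked)  # hotel_b first: two pro arguments, no con arguments
```

Note how `score` returns the supporting and attacking arguments alongside the number, which is exactly the dual use the thesis proposes: the same mined arguments drive the filtering and populate the explanation.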
Feature Augmentation for Improved Topic Modeling of YouTube Lecture Videos using Latent Dirichlet Allocation
The application of topic models to text mining of educational data, and more specifically to text obtained from lecture videos, is a largely unexplored area of research that holds great potential. This work seeks empirical evidence that topic modeling improves when bigram tokens are pre-extracted and added as additional features in the Latent Dirichlet Allocation (LDA) algorithm, a widely recognized topic modeling technique. The dataset considered for analysis is a collection of transcripts of video lectures on Machine Learning scraped from YouTube. Using the cosine similarity distance measure as a metric, the experiment showed a statistically significant improvement in topic model performance over the baseline topic model, which did not use extra features, thus confirming the hypothesis. By introducing explainable features before modeling and using deep-learning-based text representation only at the post-modeling evaluation stage, the overall model interpretability is retained. This empowers educators and researchers alike not only to benefit from the LDA model in their own fields but also to play a substantial role in efforts to improve model performance. It also sets the direction for future work, which could use the feature-augmented topic model as the input to other common text mining tasks like document categorization and information retrieval.
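The bigram pre-extraction step can be sketched as follows: frequent bigrams are joined into single tokens and appended to each document's unigram tokens, and the augmented token lists are then handed to an LDA implementation (e.g. gensim's `LdaModel`, not shown here). The frequency threshold and transcripts below are invented for illustration.

```python
from collections import Counter

# Toy lecture transcripts (invented).
docs = [
    "gradient descent updates weights using the gradient",
    "stochastic gradient descent samples one example per step",
    "neural network weights are trained by gradient descent",
]

def augment_with_bigrams(docs, min_count=2):
    """Append frequent bigrams (joined with '_') to each document's tokens."""
    tokenized = [d.split() for d in docs]
    counts = Counter(
        (a, b) for tokens in tokenized for a, b in zip(tokens, tokens[1:])
    )
    frequent = {bg for bg, c in counts.items() if c >= min_count}
    return [
        tokens + [f"{a}_{b}" for a, b in zip(tokens, tokens[1:]) if (a, b) in frequent]
        for tokens in tokenized
    ]

augmented = augment_with_bigrams(docs)
print(augmented[0])  # unigrams plus the joined token 'gradient_descent'
```

Because the bigram token (`gradient_descent`) is added alongside, not instead of, the unigrams, LDA can still assign the individual words to topics while the collocation gives the model a more interpretable, explainable feature.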
Accessing spoken interaction through dialogue processing [online]
Abstract
Written language is one of our primary means for documenting our
lives, achievements, and environment. Our capabilities to
record, store and retrieve audio, still pictures, and video are
undergoing a revolution and may support, supplement or even
replace written documentation. This technology enables us to
record information that would otherwise be lost, lower the cost
of documentation, and enhance high-quality documents with
original audiovisual material.
The indexing of the audio material is the key technology to
realize those benefits. This work presents effective
alternatives to keyword based indices which restrict the search
space and may in part be calculated with very limited resources.
Indexing speech documents can be done at various levels:
Stylistically a document belongs to a certain database which can
be determined automatically with high accuracy using very simple
features. The resulting search-space reduction factor is on the
order of 4 to 10, while topic classification yielded a factor
of 18 in a news domain.
Since documents can be very long, they need to be segmented into
topical regions. A new probabilistic segmentation framework as
well as new features (speaker initiative and style) prove to be
very effective compared to traditional keyword-based methods. At
the topical segment level, activities (storytelling, discussing,
planning, ...) can be detected using a machine learning approach
with limited accuracy; however, even human annotators do not
annotate them very reliably. A maximum search space reduction
factor of 6 is theoretically possible on the databases used. A
topical classification of these regions has been attempted on
one database; the detection accuracy for that index, however,
was very low.
At the utterance level, dialogue acts such as statements,
questions, backchannels (aha, yeah, ...), etc. are recognized
using a novel discriminatively trained HMM procedure.
The procedure can be extended to recognize short sequences such
as question/answer pairs, so-called dialogue games.
Dialogue acts and games are useful for building classifiers for
speaking style. Similarly, a user may remember a certain dialogue
act sequence and may search for it in a graphical
representation.
In a study with very pessimistic assumptions, users were able to
pick one out of four similar and equiprobable meetings correctly
with an accuracy of ~43% using graphical activity information.
Dialogue acts may be useful in this situation as well, but the
sample size did not allow final conclusions to be drawn. However,
the user study failed to show any effect for detailed basic
features such as formality or speaker identity.
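Utterance-level dialogue act tagging with an HMM reduces to Viterbi decoding over act labels. The thesis uses a discriminatively trained variant; the plain generative decoder below is a simplified stand-in, and all probabilities and the cue-word emission model are invented toy values.

```python
import math

# Toy HMM over dialogue acts (all probabilities invented for illustration).
ACTS = ["statement", "question", "backchannel"]
start = {"statement": 0.6, "question": 0.3, "backchannel": 0.1}
trans = {
    "statement":   {"statement": 0.5, "question": 0.3, "backchannel": 0.2},
    "question":    {"statement": 0.7, "question": 0.1, "backchannel": 0.2},
    "backchannel": {"statement": 0.6, "question": 0.3, "backchannel": 0.1},
}
# Emission: probability of a cue-word observation given the act.
emit = {
    "statement":   {"decl": 0.8, "wh": 0.1, "ack": 0.1},
    "question":    {"decl": 0.2, "wh": 0.7, "ack": 0.1},
    "backchannel": {"decl": 0.1, "wh": 0.1, "ack": 0.8},
}

def viterbi(obs):
    """Most likely dialogue-act sequence for a list of cue-word observations."""
    V = [{a: (math.log(start[a]) + math.log(emit[a][obs[0]]), [a]) for a in ACTS}]
    for o in obs[1:]:
        row = {}
        for a in ACTS:
            # Best predecessor act for landing in act `a` at this step.
            prev_score, prev_path = max(
                (V[-1][p][0] + math.log(trans[p][a]), V[-1][p][1]) for p in ACTS
            )
            row[a] = (prev_score + math.log(emit[a][o]), prev_path + [a])
        V.append(row)
    return max(V[-1].values())[1]

print(viterbi(["wh", "decl", "ack"]))  # question -> statement -> backchannel
```

Extending the label set from single acts to short act sequences (question/answer pairs) is how the dialogue-game recognition described above builds on the same decoding machinery.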
Arabic Text Classification Framework Based on Latent Dirichlet Allocation
In this paper, we present a new algorithm based on Latent Dirichlet Allocation (LDA) and the Support Vector Machine (SVM) for the classification of Arabic texts. Current research usually adopts the Vector Space Model to represent documents in text classification applications. In this way, a document is coded as a vector of words or n-grams. These features cannot capture semantic or textual content; the result is a huge feature space and semantic loss. The model proposed in this work instead adopts the "topics" sampled by an LDA model as text features, which effectively avoids the above problems. We extract the significant themes (topics) of all texts, where each theme is described by a particular distribution of descriptors, and each text is then represented as a vector over these topics. Experiments are conducted using an in-house corpus of Arabic texts. Precision, recall, and F-measure are used to quantify categorization effectiveness. The results show that the proposed LDA-SVM algorithm achieves high effectiveness for the Arabic text classification task (macro-averaged F1 88.1% and micro-averaged F1 91.4%).
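The LDA-to-SVM handoff can be sketched with scikit-learn: word counts become per-document topic distributions, which the SVM then classifies. The paper's own implementation and corpus are not available here, so the documents and labels below are invented, with English stand-ins used in place of Arabic text.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.svm import LinearSVC

# Invented toy corpus and labels (English stand-ins for Arabic documents).
docs = [
    "the match ended with a late goal by the striker",
    "the striker scored twice in the final match",
    "parliament passed the new budget law yesterday",
    "the minister defended the budget in parliament",
]
labels = ["sport", "sport", "politics", "politics"]

# Each document is reduced to a topic-distribution vector, which the SVM classifies.
pipeline = Pipeline([
    ("counts", CountVectorizer()),
    ("lda", LatentDirichletAllocation(n_components=2, random_state=0)),
    ("svm", LinearSVC()),
])
pipeline.fit(docs, labels)
print(pipeline.predict(["another goal in the match"]))
```

The design point the paper argues for is visible in the pipeline shape: the SVM never sees the raw n-gram space, only the low-dimensional topic vectors, which sidesteps the huge, sparse feature space of the Vector Space Model.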