601 research outputs found
Multilingual adaptive search for digital libraries
This paper describes a framework for Adaptive Multilingual Information Retrieval (AMIR) which allows multilingual resource discovery and delivery using on-the-fly machine translation of documents and queries. Result documents are presented to the user in a contextualised manner. Challenges and affordances of both Adaptive and Multilingual IR, with a particular focus on Digital Libraries, are detailed. The framework components are motivated by a series of results from experiments on query logs and documents from The European Library. We conclude that factoring adaptivity and multilinguality aspects into the search process can enhance the user’s experience with online Digital Libraries.
Multilingual Information Access: Practices and Perceptions of Bi/multilingual Academic Users
The research reported in this dissertation explored linguistic determinants in online information searching, and examined to what extent bi/multilingual academic users utilize Multilingual Information Access (MLIA) tools and what impact these have on their information searching behavior.
The aim of the study was three-pronged: to provide tangible data that can support recommendations for the effective user-centered design of Multilingual Information Retrieval (MLIR) systems; to provide a user-centered evaluation of existing MLIA tools; and to offer the basis of a framework for Library & Information Science (LIS) professionals in teaching information literacy and library skills to bi/multilingual academic users.
In the first phase of the study, 250 bi/multilingual students participated in a web survey that investigated their language choices while searching for information on the internet and in electronic databases. 31 of these participants took part in the second phase, which involved a controlled lab-based user experiment and a post-experiment questionnaire that investigated their use of MLIA tools on Google and WorldCat and their opinions of these tools. In the third phase, 19 students participated in focus group discussions and 6 librarians were interviewed to find out their perspectives on multilingual information literacy.
Results showed that although machine translation has alleviated some of the language-related challenges in online information searching, language barriers still exist for some users, especially at the query formulation stage. Captures from the experiment revealed great diversity in the way MLIA tools were utilized, while the focus group discussions and interviews revealed a general lack of awareness among both librarians and students of the tools that could help enhance and promote multilingual information literacy.
The study highlights the roles of both IR system designers and LIS professionals in enhancing and promoting multilingual information access and literacy: user-centered design and user modeling were found to be key aspects in the development of more effective multilingual information retrieval (MLIR) systems. The study also highlights the distinction between being multilingually information literate and being multilingual information literate. Suitable models of instruction for bi/multilingual academic users point towards Specialized Information Literacy Instruction (SILI) and Personalized Information Literacy Instruction (PILI).
Multilingual sentiment analysis in social media.
252 p. This thesis addresses the task of analysing sentiment in messages coming from social media. The ultimate goal was to develop a Sentiment Analysis system for Basque. However, because of the socio-linguistic reality of the Basque language, a tool providing only analysis for Basque would not be enough for a real-world application. Thus, we set out to develop a multilingual system, including Basque, English, French and Spanish.
The thesis addresses the following challenges to build such a system:
- Analysing methods for creating sentiment lexicons suitable for less-resourced languages.
- Analysis of social media (specifically Twitter): tweets pose several challenges in order to understand and extract opinions from such messages. Language identification and microtext normalization are addressed.
- Research the state of the art in polarity classification, and develop a supervised classifier that is tested against well-known social media benchmarks.
- Develop a social media monitor capable of analysing sentiment with respect to specific events, products or organizations.
IndoToD: A Multi-Domain Indonesian Benchmark For End-to-End Task-Oriented Dialogue Systems
Task-oriented dialogue (ToD) systems have been mostly created for
high-resource languages, such as English and Chinese. However, there is a need
to develop ToD systems for other regional or local languages to broaden their
ability to comprehend the dialogue contexts in various languages. This paper
introduces IndoToD, an end-to-end multi-domain ToD benchmark in Indonesian. We
extend two English ToD datasets to Indonesian, comprising four different
domains by delexicalization to efficiently reduce the size of annotations. To
ensure a high-quality data collection, we hire native speakers to manually
translate the dialogues. Along with the original English datasets, these new
Indonesian datasets serve as an effective benchmark for evaluating Indonesian
and English ToD systems as well as exploring the potential benefits of
cross-lingual and bilingual transfer learning approaches.
Comment: 2023 1st Workshop in South East Asian Language Processing (SEALP),
Co-located with AACL 202
NeMig -- A Bilingual News Collection and Knowledge Graph about Migration
News recommendation plays a critical role in shaping the public's worldviews
through the way in which it filters and disseminates information about
different topics. Given the crucial impact that media plays in opinion
formation, especially for sensitive topics, understanding the effects of
personalized recommendation beyond accuracy has become essential in today's
digital society. In this work, we present NeMig, a bilingual news collection on
the topic of migration, and corresponding rich user data. In comparison to
existing news recommendation datasets, which comprise a large variety of
monolingual news, NeMig covers articles on a single controversial topic,
published in both Germany and the US. We annotate the sentiment polarization of
the articles and the political leanings of the media outlets, in addition to
extracting subtopics and named entities disambiguated through Wikidata. These
features can be used to analyze the effects of algorithmic news curation beyond
accuracy-based performance, such as recommender biases and the creation of
filter bubbles. We construct domain-specific knowledge graphs from the news
text and metadata, thus encoding knowledge-level connections between articles.
Importantly, while existing datasets include only click behavior, we collect
user socio-demographic and political information in addition to explicit click
feedback. We demonstrate the utility of NeMig through experiments on the tasks
of news recommender benchmarking, analysis of biases in recommenders, and news
trends analysis. NeMig aims to provide a useful resource for the news
recommendation community and to foster interdisciplinary research into the
multidimensional effects of algorithmic news curation.
Comment: Accepted at the 11th International Workshop on News Recommendation
and Analytics (INRA 2023) in conjunction with ACM RecSys 202