Automatic information search for countering covid-19 misinformation through semantic similarity
Master's Thesis in Bioinformatics and Computational Biology. Information quality in social media is an increasingly important issue, and the misinformation problem has become even more critical
during the current COVID-19 pandemic, leaving people exposed to false and potentially harmful claims and rumours. Organizations such as the
World Health Organization have issued a global call for action to promote access to health
information and mitigate harm from health misinformation. Consequently, this project pursues
countering the spread of the COVID-19 infodemic and its potential health hazards.
In this work, we give an overall view of models and methods that have been employed in the
NLP field from its foundations to the latest state-of-the-art approaches. Focusing on deep learning methods, we propose applying multilingual Transformer models based on siamese networks,
also called bi-encoders, combined with ensemble and PCA dimensionality reduction techniques.
The goal is to counter COVID-19 misinformation by analyzing the semantic similarity between
a claim and tweets from a collection gathered from official fact-checkers verified by the International Fact-Checking Network of the Poynter Institute.
The number of Internet users increases every year, and the language a person speaks
determines their access to information online. For this reason, we devote special effort to applying multilingual models to tackle misinformation across the globe. Regarding semantic
similarity, we first evaluate these multilingual ensemble models and improve on the
STS-Benchmark results of monolingual and single models. Second, we enhance the interpretability of the models' performance through the SentEval toolkit. Lastly, we compare these
models' performance against biomedical models on TREC-COVID task round 1, using the Okapi BM25
ranking method as the baseline. Moreover, we are interested in understanding the ins
and outs of misinformation. For that purpose, we extend interpretability using machine learning
and deep learning approaches for sentiment analysis and topic modelling. Finally, we develop
a dashboard to ease visualization of the results.
In our view, the results obtained in this project constitute an excellent initial step toward
incorporating multilingualism and will assist researchers and the public in countering COVID-19
misinformation.
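The retrieval pipeline the abstract describes (bi-encoder embeddings, ensembling, PCA reduction, then cosine-similarity ranking of fact-checked tweets against a claim) can be sketched as follows. This is a minimal illustration with hand-made toy vectors standing in for the outputs of two multilingual Transformer bi-encoders; the embedding values, dimensions, and the two-model ensemble are assumptions for the example, not the thesis's actual models.

```python
import numpy as np

def pca_reduce(X, k):
    """Project rows of X onto their top-k principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy sentence embeddings standing in for two bi-encoder models' outputs;
# real multilingual Transformer embeddings would have hundreds of dimensions.
claim_a  = np.array([1.0, 0.1, 0.0, 0.1])           # model A: the claim
tweets_a = np.array([[0.9, 0.2, 0.0, 0.1],          # fact-checked tweet, related
                     [0.0, 0.1, 1.0, 0.8]])         # fact-checked tweet, unrelated
claim_b  = np.array([0.8, 0.2, 0.1, 0.0])           # model B: the claim
tweets_b = np.array([[0.9, 0.1, 0.0, 0.0],
                     [0.1, 0.0, 0.8, 0.9]])

# Ensemble: concatenate the two models' embeddings per sentence.
corpus = np.vstack([np.hstack([claim_a, claim_b]),
                    np.hstack([tweets_a, tweets_b])])

reduced = pca_reduce(corpus, k=2)                   # PCA dimensionality reduction
claim_vec, tweet_vecs = reduced[0], reduced[1:]

sims = [cosine(claim_vec, t) for t in tweet_vecs]
ranking = np.argsort(sims)[::-1]                    # most similar tweet first
```

Ranking the fact-checked collection by this score surfaces the verified tweet most semantically similar to the claim, which is the matching step the project builds on.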
ENHANCING LITERATURE REVIEW METHODS - TOWARDS MORE EFFICIENT LITERATURE RESEARCH WITH LATENT SEMANTIC INDEXING
Nowadays, the facilitated access to increasing amounts of information and scientific resources means that more and more effort is required to conduct comprehensive literature reviews. Literature search, as a fundamental, complex, and time-consuming step in every literature research process, is part of many established scientific methods. However, it is still predominantly supported by search techniques based on conventional term-matching methods. We address the lack of semantic approaches in this context by proposing an enhancement of established literature review methods. For this purpose, we followed design science research (DSR) principles in order to develop artifacts and implement a prototype of our Tool for Semantic Indexing and Similarity Queries (TSISQ) based on the core concepts of latent semantic indexing (LSI). Its applicability is demonstrated and evaluated in a case study. Results indicate that the presented approach can help save valuable time in finding basic literature in a desired research field or increasing the comprehensiveness of a review by efficiently identifying sources that otherwise would not have been taken into account. The target audience for our findings includes researchers who need to efficiently gain an overview of a specific research field, deepen their knowledge, or refine the theoretical foundations of their research.
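The core LSI mechanism behind such a tool can be shown in a few lines: decompose a term-document matrix with SVD, keep the top latent dimensions, and fold queries into that space. The tiny corpus below is an illustrative stand-in (not TSISQ's actual data); it shows the key property that motivates LSI over term matching, namely that a query can retrieve a document sharing no terms with it.

```python
import numpy as np

# Tiny term-document matrix (rows: terms, columns: documents).
# d0: "car engine", d1: "automobile engine", d2: "banana fruit"
terms = ["car", "automobile", "engine", "banana", "fruit"]
A = np.array([[1, 0, 0],
              [0, 1, 0],
              [1, 1, 0],
              [0, 0, 1],
              [0, 0, 1]], dtype=float)

k = 2                                     # number of latent dimensions kept
U, S, Vt = np.linalg.svd(A, full_matrices=False)
doc_vecs = (np.diag(S[:k]) @ Vt[:k]).T    # documents in latent space

# Fold the one-term query "car" into the same latent space.
q = np.array([1.0, 0, 0, 0, 0])
q_vec = q @ U[:, :k] / S[:k]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sims = [cosine(q_vec, d) for d in doc_vecs]
# d1 ("automobile engine") shares no term with the query "car", yet LSI
# rates it far more similar than the unrelated d2 ("banana fruit").
```

This term co-occurrence effect is what lets a semantic index surface sources that conventional term-matching search would miss.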
Knowledge modelling with the open source tool myCBR
Building knowledge-intensive Case-Based Reasoning applications requires tools that support this on-going process between domain experts and knowledge engineers. In this paper we introduce how the open source tool myCBR 3 allows for flexible knowledge elicitation and formalisation by CBR and non-CBR experts. We detail myCBR 3's versatile approach to similarity modelling and give an overview of the Knowledge Engineering workbench, which provides the tools for the modelling
process. We underline our presentation with three case studies of knowledge modelling for technical diagnosis and recommendation systems
using myCBR 3.
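The kind of similarity knowledge modelled in such a workbench can be sketched in plain Python: per-attribute local similarity functions (a linear falloff for numbers, a hand-built table for symbols) combined into a weighted global similarity. The diagnosis attributes, table entries, and weights below are hypothetical illustrations, not taken from myCBR or its case studies.

```python
# Hypothetical technical-diagnosis case base; attribute names, values,
# and weights are illustrative only.
cases = [
    {"id": "c1", "temperature": 92, "noise": "grinding",  "runtime_h": 1200},
    {"id": "c2", "temperature": 45, "noise": "none",      "runtime_h": 300},
    {"id": "c3", "temperature": 88, "noise": "whistling", "runtime_h": 1100},
]

# Local similarity for a numeric attribute: linear falloff over its range.
def numeric_sim(q, c, attr_range):
    return max(0.0, 1.0 - abs(q - c) / attr_range)

# Local similarity for a symbolic attribute: a hand-modelled table, the kind
# a knowledge engineer would fill in via a similarity editor.
NOISE_SIM = {
    ("grinding", "grinding"): 1.0, ("grinding", "whistling"): 0.4,
    ("grinding", "none"): 0.0, ("whistling", "whistling"): 1.0,
    ("whistling", "none"): 0.1, ("none", "none"): 1.0,
}
def noise_sim(q, c):
    return NOISE_SIM.get((q, c), NOISE_SIM.get((c, q), 0.0))

# Global similarity: weighted combination (amalgamation) of local similarities.
WEIGHTS = {"temperature": 0.4, "noise": 0.4, "runtime_h": 0.2}
def global_sim(query, case):
    return (WEIGHTS["temperature"] * numeric_sim(query["temperature"], case["temperature"], 100)
            + WEIGHTS["noise"] * noise_sim(query["noise"], case["noise"])
            + WEIGHTS["runtime_h"] * numeric_sim(query["runtime_h"], case["runtime_h"], 2000))

query = {"temperature": 90, "noise": "grinding", "runtime_h": 1150}
best = max(cases, key=lambda c: global_sim(query, c))  # retrieve most similar case
```

Separating local measures from the global amalgamation function is what lets domain experts adjust one attribute's similarity semantics without touching the rest of the model.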
A quantum geometric model of similarity
No other study has had as great an impact on the development of the similarity literature as that of Tversky (1977), which provided compelling demonstrations against all the fundamental assumptions of the popular, and extensively employed, geometric similarity models. Notably, similarity judgments were shown to violate symmetry and the triangle inequality, and also to be subject to context effects, so that the same pair of items would be rated differently, depending on the presence of other items. Quantum theory provides a generalized geometric approach to similarity and can address several of Tversky's (1977) main findings. Similarity is modeled as quantum probability, so that asymmetries emerge as order effects, and the triangle inequality violations and the diagnosticity effect can be related to the context-dependent properties of quantum probability. We thus demonstrate the promise of the quantum approach for similarity and discuss the implications for representation theory in general.
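The "asymmetry as order effect" idea can be made concrete with a toy computation: model concepts A and B as projectors, model the judgement as sequential projection of a context state, and observe that the two orders give different probabilities. The vectors below are illustrative choices, not the paper's actual representation; they merely exhibit the mechanism.

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def projector(v):
    """Rank-1 projector onto the ray spanned by v."""
    v = normalize(v)
    return np.outer(v, v)

# Concepts A and B as rays in a small feature space; the initial state
# psi encodes the judgement context. All vectors here are illustrative.
a   = np.array([1.0, 0.0, 0.0])
b   = normalize(np.array([1.0, 1.0, 0.0]))
psi = normalize(np.array([0.9, 0.1, 0.4]))

P_a, P_b = projector(a), projector(b)

# Sim(A, B): think of A first, then B -- sequential projection.
sim_ab = np.linalg.norm(P_b @ P_a @ psi) ** 2
# Sim(B, A): reverse the order of thought.
sim_ba = np.linalg.norm(P_a @ P_b @ psi) ** 2
```

Because the projectors do not commute, the two orders yield different values, reproducing qualitatively the asymmetric similarity judgments Tversky reported.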
Semantic Source Code Models Using Identifier Embeddings
The emergence of online open source repositories in the recent years has led
to an explosion in the volume of openly available source code, coupled with
metadata that relate to a variety of software development activities. As an
effect, in line with recent advances in machine learning research, software
maintenance activities are switching from symbolic formal methods to
data-driven methods. In this context, the rich semantics hidden in source code
identifiers provide opportunities for building semantic representations of code
which can assist tasks of code search and reuse. To this end, we deliver in the
form of pretrained vector space models, distributed code representations for
six popular programming languages, namely, Java, Python, PHP, C, C++, and C#.
The models are produced using fastText, a state-of-the-art library for learning
word representations. Each model is trained on data from a single programming
language; the code mined for producing all models amounts to over 13,000
repositories. We indicate dissimilarities between natural language and source
code, as well as variations in coding conventions in between the different
programming languages we processed. We describe how these heterogeneities
guided the data preprocessing decisions we took and the selection of the
training parameters in the released models. Finally, we propose potential
applications of the models and discuss their limitations.
Comment: 16th International Conference on Mining Software Repositories (MSR 2019): Data Showcase Track
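The fastText idea the abstract relies on, representing a word (here, a source-code identifier) as the average of its character n-gram vectors, can be sketched without the library itself. Below, deterministic pseudo-random vectors stand in for learned subword embeddings; the identifiers, the n-gram length, and the hashing scheme are assumptions for the example, not the released models' parameters.

```python
import hashlib
import numpy as np

DIM, N = 64, 3  # embedding size and character n-gram length (illustrative)

def ngram_vector(gram):
    """Deterministic pseudo-random vector per n-gram, standing in for a
    learned fastText subword embedding."""
    seed = int.from_bytes(hashlib.md5(gram.encode()).digest()[:8], "little")
    return np.random.default_rng(seed).standard_normal(DIM)

def embed(identifier):
    """fastText-style identifier vector: mean of its character n-gram vectors."""
    grams = [identifier[i:i + N] for i in range(len(identifier) - N + 1)]
    return np.mean([ngram_vector(g) for g in grams], axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Identifiers sharing subwords ("get", "User") end up close in vector space,
# which is what makes such representations useful for code search and reuse.
sim_related   = cosine(embed("getUserName"), embed("getUserId"))
sim_unrelated = cosine(embed("getUserName"), embed("parseJson"))
```

Because overlapping identifiers share most of their n-grams, their vectors overlap heavily even when the full tokens never co-occur, which is the property that lets subword models handle the long tail of rarely repeated source-code identifiers.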