Improving Scientific Article Visibility by Neural Title Simplification
The rapidly growing amount of data that scientific content providers must
deliver to users pushes them to build effective recommendation tools. The title
of an article is often the only element shown to attract people's attention. We
offer an approach to automatically generating titles with various levels of
informativeness to benefit from different categories of users. Statistics from
ResearchGate, used to bias the training datasets, and a specially designed
post-processing step applied to neural sequence-to-sequence models make it
possible to reach the desired variety of simplified titles and to trade off the
attractiveness and transparency of recommendations. Comment: Contribution to
the Proceedings of the 8th International Workshop on
Bibliometric-enhanced Information Retrieval (BIR 2019) as part of the 41st
European Conference on Information Retrieval (ECIR 2019), Cologne, Germany,
April 14, 2019. CEUR Workshop Proceedings, CEUR-WS.org 2019. Keywords:
Scientific Text Summarization, Machine Translation, Recommender Systems,
Personalized Simplificatio
Building semantic user profile for Polish web news portal
We present our research at Onet, the largest Polish news portal, aimed at constructing meaningful user profiles that are most descriptive of their interests in the context of the media content they browse. We used two distinct state-of-the-art numerical text-representation techniques: LDA topic modeling and Word2Vec word embeddings. We trained our models on corpora of articles in Polish and compared them with a baseline model built on a general-language corpus. We compared the performance of the algorithms on two distinct tasks - similar article retrieval and user gender classification. Our results show that the choice of text representation depends on the task - Word2Vec is more suitable for text comparison, especially for short texts such as titles. In the user profiling task, the best performance was obtained with a combination of features: topics from the article text and word embeddings from the title
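A minimal sketch of the title-comparison idea described in this abstract, assuming each title is represented by the average of its word embeddings and titles are ranked by cosine similarity. The vectors below are made-up toy values standing in for trained Word2Vec embeddings, and the names `EMBEDDINGS`, `title_vector` and `most_similar` are hypothetical; the abstract does not say how the authors aggregate or compare vectors.

```python
import math

# Toy 3-dimensional vectors standing in for trained Word2Vec embeddings.
# The values are invented for illustration only.
EMBEDDINGS = {
    "election": [0.9, 0.1, 0.0],
    "vote": [0.8, 0.2, 0.1],
    "results": [0.6, 0.3, 0.1],
    "football": [0.0, 0.9, 0.2],
    "match": [0.1, 0.8, 0.3],
}


def title_vector(title):
    """Average the embeddings of the known words in a title (None if none are known)."""
    vectors = [EMBEDDINGS[w] for w in title.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query, candidates):
    """Return the candidate titles sorted by decreasing similarity to the query."""
    q = title_vector(query)
    scored = []
    for candidate in candidates:
        v = title_vector(candidate)
        if q is not None and v is not None:
            scored.append((cosine(q, v), candidate))
    return [candidate for _, candidate in sorted(scored, reverse=True)]


ranking = most_similar("Election results", ["Football match", "Vote results"])
print(ranking)
```

Averaging is only one simple way to turn word vectors into a title vector; it is used here because it needs no extra training and works on very short texts.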
Personalized News Recommendation: A Survey
Personalized news recommendation is an important technique to help users find
news that interests them and to alleviate their information overload. It
has been extensively studied over decades and has achieved notable success in
improving users' news reading experience. However, there are still many
unsolved problems and challenges that need to be further studied. To help
researchers master the advances in personalized news recommendation over the
past years, in this paper we present a comprehensive overview of personalized
news recommendation. Instead of following the conventional taxonomy of news
recommendation methods, in this paper we propose a novel perspective to
understand personalized news recommendation based on its core problems and the
associated techniques and challenges. We first review the techniques for
tackling each core problem in a personalized news recommender system and the
challenges they face. Next, we introduce the public datasets and evaluation
methods for personalized news recommendation. We then discuss the key points on
improving the responsibility of personalized news recommender systems. Finally,
we raise several research directions that are worth investigating in the
future. This paper can provide up-to-date and comprehensive views to help
readers understand the personalized news recommendation field. We hope this
paper can facilitate research on personalized news recommendation as well
as related fields in natural language processing and data mining
Combating Fake News: A Survey on Identification and Mitigation Techniques
The proliferation of fake news on social media has opened up new directions
of research for timely identification and containment of fake news, and
mitigation of its widespread impact on public opinion. While much of the
earlier research was focused on identification of fake news based on its
contents or by exploiting users' engagements with the news on social media,
there has been a rising interest in proactive intervention strategies to
counter the spread of misinformation and its impact on society. In this survey,
we describe the modern-day problem of fake news and, in particular, highlight
the technical challenges associated with it. We discuss existing methods and
techniques applicable to both identification and mitigation, with a focus on
the significant advances in each method and their advantages and limitations.
In addition, research has often been limited by the quality of existing
datasets and their specific application contexts. To alleviate this problem, we
comprehensively compile and summarize characteristic features of available
datasets. Furthermore, we outline new directions of research to facilitate
future development of effective and interdisciplinary solutions
A Topic-Agnostic Approach for Identifying Fake News Pages
Fake news and misinformation have been increasingly used to manipulate
popular opinion and influence political processes. To better understand fake
news, how it is propagated, and how to counter its effect, it is necessary
to first identify it. Recently, approaches have been proposed to
automatically classify articles as fake based on their content. An important
challenge for these approaches comes from the dynamic nature of news: as new
political events are covered, topics and discourse constantly change and thus,
a classifier trained using content from articles published at a given time is
likely to become ineffective in the future. To address this challenge, we
propose a topic-agnostic (TAG) classification strategy that uses linguistic and
web-markup features to identify fake news pages. We report experimental results
using multiple data sets which show that our approach attains high accuracy in
the identification of fake news, even as topics evolve over time. Comment: Accepted for publication in the Companion Proceedings of the 2019
World Wide Web Conference (WWW'19 Companion). Presented in the 2019
International Workshop on Misinformation, Computational Fact-Checking and
Credible Web (MisinfoWorkshop2019). 6 page
Dirichlet belief networks for topic structure learning
Recently, considerable research effort has been devoted to developing deep
architectures for topic models to learn topic structures. Although several deep
models have been proposed to learn better topic proportions of documents, how
to leverage the benefits of deep structures for learning word distributions of
topics has not yet been rigorously studied. Here we propose a new multi-layer
generative process on word distributions of topics, where each layer consists
of a set of topics and each topic is drawn from a mixture of the topics of the
layer above. As the topics in all layers can be directly interpreted by words,
the proposed model is able to discover interpretable topic hierarchies. As a
self-contained module, our model can be flexibly adapted to different kinds of
topic models to improve their modelling accuracy and interpretability.
Extensive experiments on text corpora demonstrate the advantages of the
proposed model. Comment: accepted in NIPS 201
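The multi-layer generative process on word distributions described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's exact model: the abstract only says that each topic is drawn from a mixture of the topics of the layer above, so the flat Dirichlet prior on mixing weights, the `concentration` scale, and the function names `sample_dirichlet` and `sample_topic_hierarchy` are all hypothetical choices made here.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_topic_hierarchy(vocab_size, layer_sizes, eta=1.0, concentration=50.0, seed=0):
    """Sample word distributions for every topic in every layer.

    layer_sizes[0] is the number of topics in the top layer. Each topic in a
    lower layer is drawn from a Dirichlet centred on a mixture of the topics
    of the layer above (assumed parameterisation, see the lead-in).
    """
    rng = random.Random(seed)
    # Top layer: topics drawn from a symmetric Dirichlet prior over the vocabulary.
    layers = [[sample_dirichlet([eta] * vocab_size, rng) for _ in range(layer_sizes[0])]]
    for size in layer_sizes[1:]:
        parents = layers[-1]
        children = []
        for _ in range(size):
            # Mixing weights over the parent topics (assumed flat Dirichlet).
            weights = sample_dirichlet([1.0] * len(parents), rng)
            mixture = [
                sum(w * parent[v] for w, parent in zip(weights, parents))
                for v in range(vocab_size)
            ]
            # Floor the parameters so every Gamma shape stays strictly positive.
            alpha = [max(concentration * m, 1e-6) for m in mixture]
            children.append(sample_dirichlet(alpha, rng))
        layers.append(children)
    return layers


hierarchy = sample_topic_hierarchy(vocab_size=6, layer_sizes=[2, 3, 4])
for depth, topics in enumerate(hierarchy):
    print(depth, len(topics))
```

Because every layer holds distributions over the same vocabulary, each topic at each depth can be read directly as a ranked word list, which is the interpretability property the abstract points to.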
Linking Tweets with Monolingual and Cross-Lingual News using Transformed Word Embeddings
Social media platforms have grown into an important medium to spread
information about an event published by the traditional media, such as news
articles. Grouping such diverse sources of information that discuss the same
topic from varied perspectives provides new insights. But the gap in word usage
between informal social media content such as tweets and diligently written
content (e.g. news articles) makes such assembling difficult. In this paper, we
propose a transformation framework to bridge the word usage gap between tweets
and online news articles across languages by leveraging their word embeddings.
Using our framework, word embeddings extracted from tweets and news articles
are aligned closer to each other across languages, thus facilitating the
identification of similarity between news articles and tweets. Experimental
results show a notable improvement over baselines for monolingual tweets and
news articles comparison, while new findings are reported for cross-lingual
comparison. Comment: Presented at CICLing 2017 (18th International Conference on
Intelligent Text Processing and Computational Linguistics). To appear in
International Journal of Computational Linguistics and Applications (IJLCA
Grammar practice: theory and practice
Fil: Luque Colombres, María Candelaria. Universidad Nacional de Córdoba. Facultad de Lenguas; Argentina. Fil: Meehan, Patricia. Universidad Nacional de Córdoba. Facultad de Lenguas; Argentina. Fil: Oliva, María Belén. Universidad Nacional de Córdoba. Facultad de Lenguas; Argentina. Fil: Rius, Natalia. Universidad Nacional de Córdoba. Facultad de Lenguas; Argentina. Fil: de Maussion, Ana. Universidad Nacional de Córdoba. Facultad de Lenguas; Argentina. Fil: Neyra, Vanina Pamela. Universidad Nacional de Córdoba. Facultad de Lenguas; Argentina.
Our main objective when writing this handbook has been to design some kind of material that
would provide the first-year university student at Facultad de Lenguas with the basic foundations of
English grammar. Although this handout could be used as a self-study grammar guide, the student
should bear in mind it is meant to be used as a complement of class work. Therefore, the material
included in the present publication has not been organized according to the level of difficulty, but
rather in accordance with the syllabus of the subject. Each chapter brings along graded exercises
which have been carefully designed to improve and consolidate the grammar topics included in the
syllabus of the subject. Finally, we would like to point out that to round off each unit, we have
decided to include texts (often authentic ones) in an attempt to offer the student a new perspective
on the subject: one which relates grammatical structure systematically to meaning and use.
FakeNewsNet: A Data Repository with News Content, Social Context and Spatialtemporal Information for Studying Fake News on Social Media
Social media has become a popular means for people to consume news.
Meanwhile, it also enables the wide dissemination of fake news, i.e., news with
intentionally false information, which has significant negative effects on
society. Thus, fake news detection is attracting increasing attention.
However, fake news detection is a non-trivial task, which requires multi-source
information such as news content, social context, and dynamic information.
First, fake news is written to fool people, which makes it difficult to detect
fake news simply based on news contents. In addition to news contents, we need
to explore social contexts such as user engagements and social behaviors. For
example, a credible user's comment that "this is a fake news" is a strong
signal for detecting fake news. Second, dynamic information, such as how fake
news and true news propagate and how users' opinions toward news pieces change, is
very important for extracting useful patterns for (early) fake news detection
and intervention. Thus, comprehensive datasets which contain news content,
social context, and dynamic information could facilitate research on fake news
propagation, detection, and mitigation; yet, to the best of our knowledge, existing
datasets only contain one or two aspects. Therefore, in this paper, to
facilitate fake news related research, we provide a fake news data repository,
FakeNewsNet, which contains two comprehensive datasets that include news
content, social context, and dynamic information. We present a comprehensive
description of the dataset collection, demonstrate an exploratory analysis of this
data repository from different perspectives, and discuss the benefits of
FakeNewsNet for potential applications on fake news study on social media. Comment: 11 pages; the dataset structure and API function are update