10,856 research outputs found

    Improving Scientific Article Visibility by Neural Title Simplification

    The rapidly growing amount of data that scientific content providers must deliver to users pushes them to build effective recommendation tools. An article's title is often the only element shown to attract a reader's attention. We offer an approach to automatically generating titles with various levels of informativeness to benefit different categories of users. Statistics from ResearchGate, used to bias the training datasets, and a specially designed post-processing step applied to neural sequence-to-sequence models make it possible to reach the desired variety of simplified titles and to strike a trade-off between the attractiveness and transparency of recommendations. Comment: Contribution to the Proceedings of the 8th International Workshop on Bibliometric-enhanced Information Retrieval (BIR 2019) as part of the 41st European Conference on Information Retrieval (ECIR 2019), Cologne, Germany, April 14, 2019. CEUR Workshop Proceedings, CEUR-WS.org 2019. Keywords: Scientific Text Summarization, Machine Translation, Recommender Systems, Personalized Simplification

    Building semantic user profile for Polish web news portal

    We present our research at Onet, the largest Polish news portal, aimed at constructing meaningful user profiles that best describe users' interests in the context of the media content they browse. We used two distinct state-of-the-art numerical text-representation techniques: LDA topic modeling and Word2Vec word embeddings. We trained our models on corpora of Polish articles and compared them with a baseline model built on a general-language corpus. We compared the performance of the algorithms on two distinct tasks: similar-article retrieval and user gender classification. Our results show that the choice of text representation depends on the task: Word2Vec is more suitable for text comparison, especially for short texts such as titles. In the user profiling task, the best performance was obtained with a combination of features: topics from the article text and word embeddings from the title.
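
The feature combination reported above (article topics plus title word embeddings) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes topic proportions have already been inferred by an LDA model and word vectors by Word2Vec, averages the title's word vectors, concatenates the two, and builds a hypothetical user profile as the mean of the feature vectors of the articles a user browsed. All helper names are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def title_embedding(title_tokens: Sequence[str],
                    word_vectors: Dict[str, Vector], dim: int) -> Vector:
    """Average the word vectors of in-vocabulary title tokens (zero vector if none match)."""
    known = [word_vectors[t] for t in title_tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def article_features(topic_proportions: Vector, title_tokens: Sequence[str],
                     word_vectors: Dict[str, Vector], dim: int) -> Vector:
    """Concatenate the article's topic vector (assumed LDA output) with its averaged title embedding."""
    return list(topic_proportions) + title_embedding(title_tokens, word_vectors, dim)


def user_profile(history: List[Vector]) -> Vector:
    """Hypothetical profile: element-wise mean of the feature vectors of browsed articles."""
    if not history:
        raise ValueError("empty browsing history")
    n = len(history)
    return [sum(column) / n for column in zip(*history)]
```

The resulting profile vector could then feed a classifier, for example for the gender classification task; averaging over the browsing history is only one plausible aggregation choice.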

    Personalized News Recommendation: A Survey

    Personalized news recommendation is an important technique to help users find news they are interested in and to alleviate information overload. It has been studied extensively for decades and has achieved notable success in improving users' news reading experience. However, many unsolved problems and challenges still need further study. To help researchers master the advances in personalized news recommendation over the past years, in this paper we present a comprehensive overview of the field. Instead of following the conventional taxonomy of news recommendation methods, we propose a novel perspective for understanding personalized news recommendation based on its core problems and the associated techniques and challenges. We first review the techniques for tackling each core problem in a personalized news recommender system and the challenges they face. Next, we introduce the public datasets and evaluation methods for personalized news recommendation. We then discuss key points for improving the responsibility of personalized news recommender systems. Finally, we raise several research directions worth investigating in the future. We hope this paper provides an up-to-date and comprehensive view of the field and facilitates research on personalized news recommendation as well as related fields in natural language processing and data mining.

    Combating Fake News: A Survey on Identification and Mitigation Techniques

    The proliferation of fake news on social media has opened up new directions of research for timely identification and containment of fake news, and mitigation of its widespread impact on public opinion. While much of the earlier research was focused on identification of fake news based on its contents or by exploiting users' engagements with the news on social media, there has been a rising interest in proactive intervention strategies to counter the spread of misinformation and its impact on society. In this survey, we describe the modern-day problem of fake news and, in particular, highlight the technical challenges associated with it. We discuss existing methods and techniques applicable to both identification and mitigation, with a focus on the significant advances in each method and their advantages and limitations. In addition, research has often been limited by the quality of existing datasets and their specific application contexts. To alleviate this problem, we comprehensively compile and summarize characteristic features of available datasets. Furthermore, we outline new directions of research to facilitate future development of effective and interdisciplinary solutions

    The Scottish corpus of texts and speech

    A Topic-Agnostic Approach for Identifying Fake News Pages

    Fake news and misinformation have been increasingly used to manipulate popular opinion and influence political processes. To better understand fake news, how it propagates, and how to counter its effect, it is necessary to first identify it. Recently, approaches have been proposed to automatically classify articles as fake based on their content. An important challenge for these approaches comes from the dynamic nature of news: as new political events are covered, topics and discourse constantly change, and thus a classifier trained on articles published at a given time is likely to become ineffective in the future. To address this challenge, we propose a topic-agnostic (TAG) classification strategy that uses linguistic and web-markup features to identify fake news pages. We report experimental results on multiple datasets showing that our approach attains high accuracy in identifying fake news, even as topics evolve over time. Comment: Accepted for publication in the Companion Proceedings of the 2019 World Wide Web Conference (WWW'19 Companion). Presented at the 2019 International Workshop on Misinformation, Computational Fact-Checking and Credible Web (MisinfoWorkshop2019). 6 pages
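
The abstract names two feature families, linguistic and web-markup, without listing them. The following is a minimal sketch of what topic-agnostic features of that kind might look like; the specific features (counts of link, script, iframe, and image tags, exclamation marks per word, all-caps word ratio, average word length) are hypothetical choices, not the TAG feature set. None of them depend on which topics a page covers, which is the property the approach relies on. Only Python's standard `html.parser` module is used.

```python
import re
from html.parser import HTMLParser
from typing import Dict, List


class _MarkupCounter(HTMLParser):
    """Count selected tags and collect visible text (script/style content is skipped)."""
    TRACKED = ("a", "script", "iframe", "img")

    def __init__(self) -> None:
        super().__init__()
        self.tag_counts = {tag: 0 for tag in self.TRACKED}
        self.text_parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.tag_counts:
            self.tag_counts[tag] += 1
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.text_parts.append(data)


def page_features(html: str) -> Dict[str, float]:
    """Hypothetical topic-agnostic features: markup counts plus simple style statistics."""
    parser = _MarkupCounter()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text_parts)
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(len(words), 1)  # avoid division by zero on text-free pages
    features = {f"count_{tag}": float(c) for tag, c in parser.tag_counts.items()}
    features["exclamations_per_word"] = text.count("!") / n_words
    features["all_caps_ratio"] = sum(1 for w in words if len(w) > 1 and w.isupper()) / n_words
    features["avg_word_length"] = sum(len(w) for w in words) / n_words
    return features
```

A feature vector like this could be passed to any standard classifier; whether these particular features generalize across topics would have to be checked empirically.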

    Dirichlet belief networks for topic structure learning

    Recently, considerable research effort has been devoted to developing deep architectures for topic models to learn topic structures. Although several deep models have been proposed to learn better topic proportions of documents, how to leverage the benefits of deep structures for learning word distributions of topics has not yet been rigorously studied. Here we propose a new multi-layer generative process on word distributions of topics, where each layer consists of a set of topics and each topic is drawn from a mixture of the topics of the layer above. As the topics in all layers can be directly interpreted by words, the proposed model is able to discover interpretable topic hierarchies. As a self-contained module, our model can be flexibly adapted to different kinds of topic models to improve their modelling accuracy and interpretability. Extensive experiments on text corpora demonstrate the advantages of the proposed model. Comment: accepted in NIPS 201
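
The layered generative process described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each lower-layer topic is drawn from a Dirichlet whose concentration is a weighted sum of the upper-layer topic distributions, with hypothetical Gamma-distributed mixing weights and a small positive floor on the concentration. The helper names (`sample_dirichlet`, `sample_topic_layers`) are hypothetical. Dirichlet draws are built from normalized Gamma samples using only the Python standard library.

```python
import random
from typing import List


def sample_dirichlet(alpha: List[float], rng: random.Random) -> List[float]:
    """Draw one Dirichlet sample by normalizing independent Gamma(alpha_i, 1) draws."""
    while True:
        draws = [rng.gammavariate(a, 1.0) for a in alpha]
        total = sum(draws)
        # Very small shape parameters can underflow every draw to 0.0; redraw in that case.
        if total > 0.0:
            return [d / total for d in draws]


def sample_topic_layers(layer_sizes: List[int], vocab_size: int,
                        top_concentration: float, seed: int = 0,
                        floor: float = 1e-3) -> List[List[List[float]]]:
    """Sample word distributions of topics layer by layer, top layer first.

    layer_sizes[0] is the number of topics in the top layer; each later entry is
    the number of topics in the next layer down.
    """
    rng = random.Random(seed)
    # Top layer: symmetric Dirichlet prior over the vocabulary.
    layers = [[sample_dirichlet([top_concentration] * vocab_size, rng)
               for _ in range(layer_sizes[0])]]
    for size in layer_sizes[1:]:
        above = layers[-1]
        current = []
        for _ in range(size):
            # Hypothetical non-negative weights linking this topic to each topic above.
            weights = [rng.gammavariate(1.0, 1.0) for _ in above]
            # Mixture of upper-layer topics; the floor keeps every Gamma shape strictly positive.
            concentration = [floor + sum(w * topic[v] for w, topic in zip(weights, above))
                             for v in range(vocab_size)]
            current.append(sample_dirichlet(concentration, rng))
        layers.append(current)
    return layers
```

Because every topic at every layer is a distribution over the same vocabulary, each one can be read off directly as a ranked word list, which is the interpretability property the abstract highlights.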

    Linking Tweets with Monolingual and Cross-Lingual News using Transformed Word Embeddings

    Social media platforms have grown into an important medium for spreading information about events published by traditional media, such as news articles. Grouping such diverse sources of information that discuss the same topic from varied perspectives provides new insights. However, the gap in word usage between informal social media content such as tweets and carefully written content (e.g., news articles) makes such grouping difficult. In this paper, we propose a transformation framework that bridges the word usage gap between tweets and online news articles across languages by leveraging their word embeddings. Using our framework, word embeddings extracted from tweets and news articles are aligned closer to each other across languages, thus facilitating the identification of similarity between news articles and tweets. Experimental results show a notable improvement over baselines for monolingual comparison of tweets and news articles, while new findings are reported for cross-lingual comparison. Comment: Presented at CICLing 2017 (18th International Conference on Intelligent Text Processing and Computational Linguistics). To appear in International Journal of Computational Linguistics and Applications (IJLCA)
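
The exact transformation used by the framework is not described in this abstract. As a minimal sketch under stated assumptions: a hypothetical matrix `matrix` maps tweet-space embeddings into the news-article embedding space (learned offline; how it is learned is out of scope here), each text is represented as the average of its word vectors, and articles are ranked for a tweet by cosine similarity after mapping. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def average_embedding(tokens: Sequence[str], embeddings: Dict[str, Vector]) -> Vector:
    """Average the vectors of in-vocabulary tokens; `embeddings` must be non-empty."""
    dim = len(next(iter(embeddings.values())))
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def apply_linear_map(matrix: List[Vector], vec: Vector) -> Vector:
    """Multiply the (hypothetical) tweet-to-news transformation matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def rank_news_for_tweet(tweet_tokens: Sequence[str],
                        news_docs: Dict[str, Sequence[str]],
                        tweet_emb: Dict[str, Vector],
                        news_emb: Dict[str, Vector],
                        matrix: List[Vector]) -> List[str]:
    """Map the tweet vector into news space, then rank article ids by cosine similarity."""
    tweet_vec = apply_linear_map(matrix, average_embedding(tweet_tokens, tweet_emb))
    scores = {doc_id: cosine(tweet_vec, average_embedding(tokens, news_emb))
              for doc_id, tokens in news_docs.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

A linear map is used here only because it is the simplest alignment to illustrate; in a cross-lingual setting, the same step would map tweet embeddings in one language into news embeddings in another.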

    Grammar practice : theory and practice

    Affiliations: Luque Colombres, María Candelaria; Meehan, Patricia; Oliva, María Belén; Rius, Natalia; de Maussion, Ana; Neyra, Vanina Pamela. All: Universidad Nacional de Córdoba, Facultad de Lenguas, Argentina. Our main objective in writing this handbook has been to design material that provides first-year university students at the Facultad de Lenguas with the basic foundations of English grammar. Although this handout could be used as a self-study grammar guide, students should bear in mind that it is meant to be used as a complement to class work. Therefore, the material in this publication has not been organized by level of difficulty, but rather in accordance with the syllabus of the subject. Each chapter includes graded exercises carefully designed to improve and consolidate the grammar topics in the syllabus. Finally, to round off each unit, we have included texts (often authentic ones) in an attempt to offer students a new perspective on the subject: one that relates grammatical structure systematically to meaning and use.

    FakeNewsNet: A Data Repository with News Content, Social Context and Spatialtemporal Information for Studying Fake News on Social Media

    Social media has become a popular means for people to consume news. Meanwhile, it also enables the wide dissemination of fake news, i.e., news with intentionally false information, which has significant negative effects on society. Thus, fake news detection is attracting increasing attention. However, fake news detection is a non-trivial task that requires multi-source information such as news content, social context, and dynamic information. First, fake news is written to fool people, which makes it difficult to detect based on news content alone. In addition to news content, we need to explore social context such as user engagements and social behaviors. For example, a credible user's comment that "this is fake news" is a strong signal for detecting fake news. Second, dynamic information, such as how fake and true news propagate and how users' opinions toward news pieces change over time, is very important for extracting useful patterns for (early) fake news detection and intervention. Thus, comprehensive datasets that contain news content, social context, and dynamic information could facilitate research on fake news propagation, detection, and mitigation; yet, to the best of our knowledge, existing datasets contain only one or two of these aspects. Therefore, to facilitate fake news research, in this paper we provide a fake news data repository, FakeNewsNet, which contains two comprehensive datasets that include news content, social context, and dynamic information. We present a comprehensive description of the dataset collection, demonstrate an exploratory analysis of this data repository from different perspectives, and discuss the benefits of FakeNewsNet for potential applications in fake news research on social media. Comment: 11 pages; the dataset structure and API function are updated