10 research outputs found

    Heterformer: Transformer-based Deep Node Representation Learning on Heterogeneous Text-Rich Networks

    Representation learning on networks aims to derive a meaningful vector representation for each node, thereby facilitating downstream tasks such as link prediction, node classification, and node clustering. In heterogeneous text-rich networks, this task is more challenging due to (1) presence or absence of text: some nodes are associated with rich textual information, while others are not; (2) diversity of types: nodes and edges of multiple types form a heterogeneous network structure. As pretrained language models (PLMs) have demonstrated their effectiveness in obtaining widely generalizable text representations, a substantial amount of effort has been made to incorporate PLMs into representation learning on text-rich networks. However, few of them can jointly consider heterogeneous structure (network) information as well as rich textual semantic information of each node effectively. In this paper, we propose Heterformer, a Heterogeneous Network-Empowered Transformer that performs contextualized text encoding and heterogeneous structure encoding in a unified model. Specifically, we inject heterogeneous structure information into each Transformer layer when encoding node texts. Meanwhile, Heterformer is capable of characterizing node/edge type heterogeneity and encoding nodes with or without texts. We conduct comprehensive experiments on three tasks (i.e., link prediction, node classification, and node clustering) on three large-scale datasets from different domains, where Heterformer outperforms competitive baselines significantly and consistently. Comment: KDD 2023. (Code: https://github.com/PeterGriffinJin/Heterformer)

    Experimental IR Meets Multilinguality, Multimodality, and Interaction


    Power without responsibility: the content moderation policies of the online platforms in the response to infodemic

    The paper provides an overview of the policies rolled out by the leading online platforms in response to misinformation and the infodemic around the COVID-19 pandemic. It further discusses the consequences these policies may have for media pluralism and informed citizenship, whether direct or indirect, short-term or long-term, given the role the platforms play in shaping the information environment. The method is an analysis of documents that selected platforms published to communicate changes in their content moderation policies between the start of the novel coronavirus crisis in February and 31 July 2020. It also includes the reports the platforms submitted to the European Commission in August 2020 on the measures they had implemented to limit the spread of COVID-19-related misinformation. The analysis suggests that the platforms have undoubtedly taken a number of potentially useful measures to address the challenges of false, misleading, and harmful information in a pandemic, but almost every one of these measures also carries a potential risk to freedom of expression and media pluralism. These risks arise not so much from the initiatives themselves as from the lack of a regulatory framework and the strong imbalance between the platforms' accountability and the power they hold.

    Diversité et recommandation : une investigation sur l’apport de la fouille d’opinions pour la distinction d’articles d’opinion dans une controverse médiatique

    Web-based reading services such as Google News and Yahoo! News have become increasingly popular with the growth of online news consumption. To help users cope with information overload, these platforms build automated filtering mechanisms, known as recommender systems, into their search engines. These systems help users find content that matches their personal interests and tastes, using their browsing history and past behavior as a basis for recommendations. However, recommender systems can limit diversity of thought and the range of political perspectives that circulate within the informational environment. As a consequence, relevant ideas and questions may not be seen, debatable assumptions may be taken as facts, and overspecialized recommendations may reinforce confirmation bias, special interests, tribalism, and extremist opinions. When the informational environment is insufficiently diverse, there is a loss of open inquiry, dialogue, and constructive disagreement and, as a result, an overall degradation of public discourse. Studies within the artificial intelligence field that try to solve the diversity problem for news recommender systems are confronted by many questions, including how to represent digital texts in the vector model from a set of statistically discriminating words, and how to develop a statistical measure that maximizes the difference between similar articles proposed to the user by the recommendation process. Studies based on opinion mining techniques propose to tackle the diversity problem in a different manner, by automatically detecting differences of perspective between news articles on the same topic during the recommendation process. In this latter approach, the vector model representation of digital texts relies on a set of words associated with the expression of opinion, such as adjectives or emotion words. However, those techniques are less effective in detecting differences of opinion in a publicly argued debate, because the expression of opinion in political discussion is not necessarily linked with the journalist's subjectivity or emotions. The aims of our doctoral research are (1) to systematize and validate an opinion mining method that helps identify divergent opinions within a controversy in the press and (2) to explore the applicability of this method in a news recommender system. We equate controversy to a type of opinion debate in the press in which explicitly opposed camps form around how to view and understand a question of importance to the community. Our research raises questions about how to define opinion in this context and discusses the relevance of using discursive and enunciative theoretical approaches in opinion mining. The experimental corpus consists of 495 opinion articles about the 2012 student protest in Quebec against the tuition fee increase announced by the government of Jean Charest. Articles were classified into two categories, ETUD and GOUV, according to the type of opinion they convey: those that favored the students and the continuation of the strike, and those that favored the government and criticized the strike movement. Methodologically, our research is based on the approach of previous studies that explore techniques from corpus linguistics in the context of opinion mining, as well as concepts from François Rastier's Interpretative Semantics. Our research systematizes the steps of this approach, advocating a contrastive and interpretative description of the corpus in order to identify and interpret the specific words that distinguish the types of opinion to be classified. This work allows us to select textual features that are interpretable and descriptive of the enunciative phenomena in the corpus, which are then used to represent the digital texts in the vector model. The approach proposed by these previous works was validated on the press corpus assembled for the experiment. The results show that a selection of 447 textual features obtained through an interpretative approach to the corpus performs better for the automatic classification of the opinion articles than a set of words selected without regard to linguistic factors tied to the corpus. Our research also evaluated the possibility of applying this approach to news recommender systems, by studying the chronological evolution of the vocabulary in the corpus. We show that features selected at the beginning of the controversy effectively predict the opinion of articles published later, suggesting that the selection of interpretable features can benefit a recommender system that proposes opinion articles from a media controversy.

    A survey on challenges and methods in news recommendation

    Institute for Systems and Technologies of Information, Control and Communication (INSTICC); 10th International Conference on Web Information Systems and Technologies, WEBIST 2014 -- 3 April 2014 through 5 April 2014 -- Barcelona -- 105611. Recommender systems are built to surface the most suitable item or piece of information from the huge amount of data on the internet without manual effort from users. As a specific application domain, news recommender systems aim to give users the most relevant news article recommendations according to their personal interests and preferences. News recommendation has specific challenges compared to other domains. From a technical point of view, there are many different methods for building a recommender system. Thus, while general methods are used in news recommendation, researchers also need new methods to make suitable news recommendations. In this paper we present the different approaches to news recommender systems and the challenges of news recommendation.

    A Survey on Challenges and Methods in News Recommendation
