624 research outputs found

    Relating Web pages to enable information-gathering tasks

    We argue that relationships between Web pages are functions of the user's intent. We identify a class of Web tasks - information-gathering - that can be facilitated by a search engine that provides links to pages related to the page the user is currently viewing. We define three kinds of intentional relationships that correspond to whether the user is a) seeking sources of information, b) reading pages that provide information, or c) surfing through pages as part of an extended information-gathering process. We show that these three relationships can be productively mined using a combination of textual and link information, and we provide three corresponding scoring mechanisms: SeekRel, FactRel and SurfRel. We build a set of capacitated subnetworks - each corresponding to a particular keyword - that mirror the interconnection structure of the World Wide Web, and we compute the scores as flows on these subnetworks. The capacities of the links are derived from the hub and authority values of the nodes they connect, following the work of Kleinberg (1998) on assigning authority to pages in hyperlinked environments. We evaluated our scoring mechanisms by running experiments on four data sets taken from the Web. We present user evaluations of the relevance of the top results returned by our scoring mechanisms and compare them to the top results returned by Google's Similar Pages feature and the Companion algorithm proposed by Dean and Henzinger (1999). Comment: In Proceedings of ACM Hypertext 200
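
    The abstract does not include the authors' implementation, so the following is only a minimal, hypothetical sketch of the general idea it describes: derive link capacities from Kleinberg-style hub and authority scores and score the relatedness of two pages as a maximum flow on the capacitated graph. It uses networkx; the capacity formula, the toy graph, and the page names are illustrative assumptions, and the keyword-specific subnetwork construction is omitted.

```python
import networkx as nx


def build_capacitated_graph(web_graph: nx.DiGraph) -> nx.DiGraph:
    """Assign each link a capacity derived from HITS hub/authority scores."""
    hubs, auths = nx.hits(web_graph, max_iter=200, normalized=True)
    capacitated = nx.DiGraph()
    for u, v in web_graph.edges():
        # One plausible choice: a link carries more capacity when it leaves a
        # good hub and points at a good authority.
        capacitated.add_edge(u, v, capacity=hubs[u] * auths[v])
    return capacitated


def relatedness_score(web_graph: nx.DiGraph, source: str, target: str) -> float:
    """Score how strongly `target` relates to `source` as a max-flow value."""
    capacitated = build_capacitated_graph(web_graph)
    if source not in capacitated or target not in capacitated:
        return 0.0
    return nx.maximum_flow_value(capacitated, source, target, capacity="capacity")


# Toy usage on a tiny hand-made link graph.
g = nx.DiGraph([("hub_page", "page_a"), ("hub_page", "page_b"), ("page_a", "page_b")])
print(relatedness_score(g, "hub_page", "page_b"))
```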

    Credibility analysis of textual claims with explainable evidence

    Despite being a vast resource of valuable information, the Web has been polluted by the spread of false claims. Increasing hoaxes, fake news, and misleading information on the Web have given rise to many fact-checking websites that manually assess these doubtful claims. However, the rapid speed and large scale of misinformation spread have become the bottleneck for manual verification. This calls for credibility assessment tools that can automate the verification process. Prior works in this domain make strong assumptions about the structure of the claims and the communities where they are made. Most importantly, the black-box techniques proposed in prior works lack the ability to explain why a certain statement is deemed credible or not. To address these limitations, this dissertation proposes a general framework for automated credibility assessment that does not make any assumption about the structure or origin of the claims. Specifically, we propose a feature-based model, which automatically retrieves relevant articles about a given claim and assesses its credibility by capturing the mutual interaction between the language style of the relevant articles, their stance towards the claim, and the trustworthiness of the underlying web sources. We further enhance our credibility assessment approach and propose a neural-network-based model. Unlike the feature-based model, this model does not rely on feature engineering and external lexicons. Both models make their assessments interpretable by extracting explainable evidence from judiciously selected web sources. We use our models to develop a Web interface, CredEye, which enables users to automatically assess the credibility of a textual claim and examine the assessment by browsing through judiciously and automatically selected evidence snippets. In addition, we study the problem of stance classification and propose a neural-network-based model for predicting the stance of diverse user perspectives regarding controversial claims. Given a controversial claim and a user comment, our stance classification model predicts whether the user comment supports or opposes the claim.
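
    As a rough illustration of the kind of feature-based credibility model described above (not the dissertation's actual system), the sketch below represents each claim by aggregate signals over its retrieved articles - a language-style proxy, stance, and source trustworthiness - and combines them with a logistic regression. The feature names, toy values, and aggregation scheme are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def claim_features(articles: list[dict]) -> np.ndarray:
    """Aggregate per-article signals into a fixed-length claim representation."""
    style = np.mean([a["objectivity"] for a in articles])   # language-style proxy
    stance = np.mean([a["stance"] for a in articles])        # +1 supports, -1 refutes
    trust = np.mean([a["source_trust"] for a in articles])   # prior source reliability
    # Include a stance/trust interaction: support from trusted sources counts more.
    return np.array([style, stance, trust, stance * trust])


# Toy training data: four claims with retrieved-article signals and labels
# (1 = credible, 0 = not credible).
claims = [
    [{"objectivity": 0.9, "stance": 1.0, "source_trust": 0.8}],
    [{"objectivity": 0.4, "stance": -1.0, "source_trust": 0.3}],
    [{"objectivity": 0.8, "stance": 0.5, "source_trust": 0.9}],
    [{"objectivity": 0.3, "stance": -0.5, "source_trust": 0.2}],
]
labels = [1, 0, 1, 0]

X = np.vstack([claim_features(c) for c in claims])
model = LogisticRegression().fit(X, labels)
print(model.predict_proba(X)[:, 1])  # per-claim credibility scores
```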

    A Micro-Interaction Tool for Online Text Analysis

    Mobile devices allow users to remain connected to the world in a ubiquitous way, creating new contexts of media use. Considering the structural changes in the journalistic market, media organizations are trying to lead this digital transition and (re)gain the attention of the public [WS15]. This digital evolution can bring many advantages but can also open the door to rushed journalism, such as the publication of fake news and malicious content, which can have critical effects on both individuals and society as a whole. For this reason, it is becoming increasingly important to fact-check sources of information. Misinformation is incorrect or misleading information, which can distort people's opinions on several matters and lead to unintended consequences. Thus, fact-checking claims against reliable information from credible sources is perhaps the best way to fight the spread of misinformation. By double-checking a claim, you can verify whether or not it is true. However, it is important to use verifiable and reputable sources for that fact-checking; otherwise, you risk perpetuating the cycle [Ohi]. To help fight this global issue, we can leverage the interaction between Internet users and content producers/journalists: users interact with Web content by validating it, commenting on it, or expressing emotions about it, which helps decrease the share of false, malicious, or questionable content, while profiles of those users and content producers are built through the application of reputation rules. With this strategy, online content producers obtain dynamic interaction and feedback from the public about the published content, so they can fact-check it and achieve a greater degree of truthfulness. This Master's dissertation presents a Web tool that enables users to perform fast fact-checking by interacting with the media outlet responsible for the news item or text. The work starts by presenting a study of the main tools and techniques used in journalism to fact-check information. It then describes in detail the implementation of the developed tool, a Web extension that supports this fact-checking domain. Finally, the dissertation presents the assessment and tests conducted to evaluate the feasibility of the solution.
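
    The dissertation's actual reputation rules are not spelled out in the abstract; the following is a small, hypothetical sketch of how reader interactions (validations, flags, comments) could be turned into scores for both a piece of content and its producer. The interaction types, weights, and field names are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Reputation:
    content_scores: dict = field(default_factory=dict)   # content id -> score
    producer_scores: dict = field(default_factory=dict)  # producer id -> score

    def register(self, item_id: str, producer_id: str, interaction: str,
                 reader_weight: float = 1.0) -> None:
        """Update scores from one reader interaction, weighted by the reader's own standing."""
        deltas = {"validate": +1.0, "flag_false": -2.0, "comment": +0.2}
        delta = deltas.get(interaction, 0.0) * reader_weight
        self.content_scores[item_id] = self.content_scores.get(item_id, 0.0) + delta
        # The producer inherits a fraction of the content's gain or loss.
        self.producer_scores[producer_id] = (
            self.producer_scores.get(producer_id, 0.0) + 0.5 * delta
        )


rep = Reputation()
rep.register("article-1", "outlet-A", "validate")
rep.register("article-1", "outlet-A", "flag_false", reader_weight=0.5)
print(rep.content_scores, rep.producer_scores)
```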

    Assessing the Impact Participation in Science Journalism Activities Has on Scientific Literacy Among High School Students

    As part of the National Science Foundation Science Literacy through Science Journalism (SciJourn) initiative (http://www.scijourn.org; Polman, Saul, Newman, and Farrar, 2008), a quasi-experimental design was used to investigate what impact incorporating science journalism activities had on students' scientific literacy. Over the course of a school year, students participated in a variety of activities culminating in the production of science news articles for Scijourner (http://www.scijourner.org). Participating teachers and SciJourn team members collaboratively developed activities focused on five aspects of scientific literacy: contextualizing information, recognizing relevance, evaluating factual accuracy, using multiple credible sources, and engaging in information-seeking processes. This study details the development process for the Scientific Literacy Assessment (SLA), including validity and reliability studies; evaluates student scientific literacy using the SLA; examines student SLA responses to provide a description of high school students' scientific literacy; and outlines implications of the findings in relation to the National Research Council's A Framework for K-12 Science Education: Practices, Crosscutting Concepts, and Core Ideas (2012) and classroom science teaching practices. Scientifically literate adults acting as experts in the assessment development phase informed the creation of a scoring guide that was used to analyze student responses. The expert/novice comparison provides a rough description of a developmental continuum of scientific literacy. The SciJourn Scientific Literacy Assessment was used in a balanced crossover design to measure changes in student scientific literacy. The findings of this study, including student results and Generalized Linear Mixed Modeling, suggest that the incorporation of science journalism activities focused on STEM issues can improve student scientific literacy. Incorporation of a wide variety of strategies raised scores on the SLA, and teachers who included a writing and revision process that prioritized content saw significantly larger gains in student scores. Future studies could broaden the description of high school students' scientific literacy as measured by the SLA and provide alternative pathways for developing scientific literacy as envisioned by SciJourn and the NRC Framework.
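
    The abstract mentions a balanced crossover design analysed with Generalized Linear Mixed Modeling. As a rough, hypothetical illustration only, the sketch below fits a simpler linear mixed-effects model (statsmodels) to synthetic crossover data with students as random effects; the variable names, effect size, and noise are invented for demonstration and do not reflect the study's data or its exact model.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Balanced crossover: half the students see the control condition first,
# half see the SciJourn condition first; every student is measured twice.
rows = []
for student in range(40):
    order = ["control", "scijourn"] if student % 2 == 0 else ["scijourn", "control"]
    for period, condition in enumerate(order, start=1):
        rows.append({"student": student, "period": period, "condition": condition})
data = pd.DataFrame(rows)

# Synthetic SLA scores: a small bump for the journalism condition plus noise.
data["sla_score"] = 50 + 3 * (data["condition"] == "scijourn") + rng.normal(0, 5, len(data))

# Mixed-effects model with a random intercept per student.
model = smf.mixedlm("sla_score ~ condition + period", data, groups=data["student"])
print(model.fit().summary())
```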

    Trust in Collaborative Web Applications

    Collaborative functionality is increasingly prevalent in web applications. Such functionality permits individuals to add - and sometimes modify - web content, often with minimal barriers to entry. Ideally, large bodies of knowledge can be amassed and shared in this manner. However, such software also provides a medium for nefarious persons to operate. By determining the extent to which participating content/agents can be trusted, one can identify useful contributions. In this work, we define the notion of trust for Collaborative Web Applications and survey the state of the art for calculating, interpreting, and presenting trust values. Though the techniques can be applied broadly, Wikipedia's archetypal nature makes it a focal point for discussion.

    Opinion spam detection: using multi-iterative graph-based model

    The demand to detect opinion spam, using opinion mining applications to prevent its damaging effects on e-commerce reputations, is on the rise in many business sectors globally. Existing spam detection techniques consider only one or two types of spam entities, such as the review, the reviewer, a group of reviewers, or the product. In addition, they use a limited number of features related to behaviour, content, and the relations between entities, which reduces detection accuracy. These techniques also mostly rely on synthetic datasets to analyse their models and cannot be applied in real-world settings. To address this, a novel graph-based model called “Multi-iterative Graph-based opinion Spam Detection” (MGSD) is proposed, in which all types of entities are considered simultaneously within a unified structure. Using this approach, the model reveals both implicit (i.e., between similar entities) and explicit (i.e., between different entities) relationships. The MGSD model can evaluate the ‘spamicity’ effects of entities more efficiently because it applies a novel multi-iterative algorithm that considers different sets of factors to update the spamicity scores of entities. To enhance the accuracy of the MGSD detection model, a larger number of existing weighted features, along with newly proposed features from different categories, were selected using a combination of feature fusion techniques and machine learning (ML) algorithms. The MGSD model can also be generalised and applied to various opinionated documents because it employs domain-independent features. The results showed that our feature selection and feature fusion techniques yield a remarkable improvement in detecting spam. The findings of this study show that MGSD improves the accuracy of state-of-the-art ML and graph-based techniques by around 5.6% and 4.8%, respectively, achieving an accuracy of 93% for spam detection on our synthetic crowdsourced dataset and 95.3% on Ott's crowdsourced dataset.
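
    The MGSD algorithm itself is not given in the abstract; the sketch below is a simplified, hypothetical illustration of iterative spamicity propagation over a multi-entity review graph in the same spirit: reviewer, review, and product scores are repeatedly updated from their neighbours. The update rules, priors, and weights are assumptions, not the authors' formulation.

```python
def propagate_spamicity(reviews, n_iter=20):
    """reviews: list of (reviewer_id, product_id, review_id, prior) with prior in [0, 1]."""
    reviewer_s = {r[0]: 0.5 for r in reviews}   # start reviewers and products at a neutral score
    product_s = {r[1]: 0.5 for r in reviews}
    review_s = {r[2]: r[3] for r in reviews}

    for _ in range(n_iter):
        # A review looks spammier when its own prior, its reviewer, and its product all do.
        for reviewer, product, review, prior in reviews:
            review_s[review] = (prior + reviewer_s[reviewer] + product_s[product]) / 3
        # A reviewer's (or product's) score is the mean spamicity of its linked reviews.
        for entity_scores, idx in ((reviewer_s, 0), (product_s, 1)):
            for entity in entity_scores:
                linked = [review_s[r[2]] for r in reviews if r[idx] == entity]
                entity_scores[entity] = sum(linked) / len(linked)
    return reviewer_s, review_s, product_s


# Toy example: one reviewer posts two suspicious reviews, another posts one benign review.
toy = [("u1", "p1", "r1", 0.9), ("u1", "p2", "r2", 0.8), ("u2", "p1", "r3", 0.1)]
print(propagate_spamicity(toy))
```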

    Calculating and Presenting Trust in Collaborative Content

    Collaborative functionality is increasingly prevalent in Internet applications. Such functionality permits individuals to add -- and sometimes modify -- web content, often with minimal barriers to entry. Ideally, large bodies of knowledge can be amassed and shared in this manner. However, such software also provides a medium for biased individuals, spammers, and nefarious persons to operate. By computing trust/reputation for participating agents and/or the content they generate, one can identify quality contributions. In this work, we survey the state of the art for calculating trust in collaborative content. In particular, we examine four proposals from the literature based on: (1) content persistence, (2) natural-language processing, (3) metadata properties, and (4) incoming link quantity. Though each technique can be applied broadly, Wikipedia provides a focal point for discussion. Finally, having critiqued how trust values are calculated, we analyze how the presentation of these values can benefit end users and application security.
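
    As a toy illustration of the first surveyed technique, content persistence (in the spirit of WikiTrust-style systems, not any specific proposal's implementation), the sketch below credits an author when text they introduced survives later revisions and penalises them when it is removed. The revision representation and weights are assumptions.

```python
from collections import defaultdict


def persistence_trust(revisions):
    """revisions: ordered list of (author, set_of_token_ids_present_after_their_edit)."""
    owner = {}                   # token id -> author who introduced it
    trust = defaultdict(float)   # author -> accumulated trust
    prev_tokens = set()
    for author, tokens in revisions:
        for tok in tokens - prev_tokens:    # newly added tokens are owned by this author
            owner[tok] = author
        for tok in prev_tokens - tokens:    # tokens removed by this revision hurt their owner
            trust[owner[tok]] -= 1.0
        for tok in tokens & prev_tokens:    # surviving tokens reward their owner
            trust[owner[tok]] += 0.1
        prev_tokens = tokens
    return dict(trust)


# Toy history: alice's text survives bob's edit; mallory's addition is reverted.
history = [
    ("alice", {1, 2, 3}),
    ("mallory", {1, 2, 3, 4}),
    ("bob", {1, 2, 3}),
]
print(persistence_trust(history))
```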