
    Credibility analysis of textual claims with explainable evidence

Despite being a vast resource of valuable information, the Web has been polluted by the spread of false claims. Increasing hoaxes, fake news, and misleading information on the Web have given rise to many fact-checking websites that manually assess these doubtful claims. However, the rapid speed and large scale of misinformation spread have become the bottleneck for manual verification. This calls for credibility assessment tools that can automate the verification process. Prior works in this domain make strong assumptions about the structure of the claims and the communities where they are made. Most importantly, the black-box techniques proposed in prior works lack the ability to explain why a certain statement is deemed credible or not. To address these limitations, this dissertation proposes a general framework for automated credibility assessment that makes no assumptions about the structure or origin of the claims. Specifically, we propose a feature-based model, which automatically retrieves relevant articles about the given claim and assesses its credibility by capturing the mutual interaction between the language style of the relevant articles, their stance towards the claim, and the trustworthiness of the underlying web sources. We further enhance our credibility assessment approach and propose a neural-network-based model. Unlike the feature-based model, this model does not rely on feature engineering and external lexicons. Both our models make their assessments interpretable by extracting explainable evidence from judiciously selected web sources. We utilize our models to develop a Web interface, CredEye, which enables users to automatically assess the credibility of a textual claim and dissect the assessment by browsing through judiciously and automatically selected evidence snippets. In addition, we study the problem of stance classification and propose a neural-network-based model for predicting the stance of diverse user perspectives on controversial claims. Given a controversial claim and a user comment, our stance classification model predicts whether the comment supports or opposes the claim.
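To make the stance-classification task concrete, here is a minimal sketch of predicting whether a user comment supports or opposes a given claim. It uses a simple TF-IDF plus logistic regression baseline rather than the neural architecture the dissertation proposes, and the training pairs and labels are hypothetical placeholders.

```python
# Sketch of the stance task: given a (claim, comment) pair, predict
# whether the comment supports or opposes the claim. This is a toy
# TF-IDF baseline, NOT the dissertation's neural model; the data
# below is an illustrative placeholder.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_pairs = [
    ("Vaccines cause autism", "Large studies found no such link at all."),
    ("Vaccines cause autism", "Exactly, the evidence has been hidden for years."),
]
train_labels = ["oppose", "support"]

# Join claim and comment so the vectorizer sees both sides of the pair.
texts = [f"{claim} [SEP] {comment}" for claim, comment in train_pairs]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, train_labels)

print(model.predict(["Vaccines cause autism [SEP] That claim was debunked long ago."]))
```

In a real system the lexical baseline would be replaced by a model that encodes claim and comment jointly, but the input/output contract of the task stays the same.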

    Identification of Informativeness in Text using Natural Language Stylometry

In this age of information overload, one experiences a rapidly growing over-abundance of written text. To assist with handling this bounty, this plethora of texts is now widely used to develop and optimize statistical natural language processing (NLP) systems. Surprisingly, using more fragments of text to train these statistical NLP systems may not necessarily lead to improved performance. We hypothesize that the fragments that help the most with training are those that contain the desired information. Therefore, determining informativeness in text has become a central issue in our view of NLP. Recent developments in this field have spawned a number of solutions to identify informativeness in text. Nevertheless, a shortfall of most of these solutions is their dependency on the genre and domain of the text. In addition, most of them do not perform consistently across natural language processing problem areas. Therefore, we attempt to provide a more general solution to this NLP problem. This thesis takes a different approach to this problem by considering the underlying theme of a linguistic theory known as the Code Quantity Principle. This theory suggests that humans codify information in text so that readers can retrieve this information more efficiently. During the codification process, humans usually change elements of their writing ranging from characters to sentences. Examples of such elements are the use of simple words, complex words, function words, content words, syllables, and so on. This theory suggests that these elements have reasonable discriminating strength and can play a key role in distinguishing informativeness in natural language text. In another vein, stylometry is a modern method to analyze literary style and deals largely with the aforementioned elements of writing. With this as background, we model text using a set of stylometric attributes to characterize variations in the writing style present in it, and we explore their effectiveness in determining informativeness in text. To the best of our knowledge, this is the first use of stylometric attributes to determine informativeness in statistical NLP. In doing so, we use texts of different genres, viz., scientific papers, technical reports, emails and newspaper articles, selected from assorted domains like agriculture, physics, and biomedical science. The variety of NLP systems that have benefitted from incorporating these stylometric attributes somewhere in their computational realm suggests that these attributes can be regarded as an effective solution to identify informativeness in text. In addition to the variety of text genres and domains, the potential of stylometric attributes is also explored in several NLP application areas, including biomedical relation mining, automatic keyphrase indexing, spam classification, and text summarization, where performance improvement is both important and challenging. The success of the attributes in all these areas further highlights their usefulness.
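As an illustration of the kind of stylometric attributes discussed above (function words, content words, complex words, syllables), the following sketch computes a few simple writing-style features. The feature set, the function-word list, and the syllable heuristic are simplified assumptions for illustration, not the exact attributes used in the thesis.

```python
# Illustrative stylometric feature extraction. The function-word list
# and syllable heuristic are deliberately simplified assumptions.
import re

FUNCTION_WORDS = {"the", "a", "an", "of", "in", "on", "and", "or", "to",
                  "is", "are", "was", "were", "it", "that", "this", "with"}

def count_syllables(word: str) -> int:
    # Rough heuristic: count groups of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def stylometric_features(text: str) -> dict:
    words = re.findall(r"[a-zA-Z']+", text.lower())
    n = len(words) or 1
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return {
        "function_word_ratio": sum(w in FUNCTION_WORDS for w in words) / n,
        "content_word_ratio": sum(w not in FUNCTION_WORDS for w in words) / n,
        "complex_word_ratio": len(complex_words) / n,
        "avg_syllables_per_word": sum(count_syllables(w) for w in words) / n,
        "type_token_ratio": len(set(words)) / n,
    }

print(stylometric_features("The spectrophotometer quantified chlorophyll concentration."))
```

Feature vectors of this kind can then be fed to any standard classifier to score text fragments by informativeness.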

    Revista Mediterránea de Comunicación. Vol. 11, n. 2 (2020)


    Fuzzy-based machine learning for predicting narcissistic traits among Twitter users.

Doctoral Degree. University of KwaZulu-Natal, Pietermaritzburg. Social media has provided a platform for people to share views and opinions they identify with or which are significant to them. Similarly, social media enables individuals to express themselves authentically and divulge their personal experiences in a variety of ways. This behaviour, in turn, reflects the user's personality. Social media has in recent times been used to perpetuate various forms of crime, and a narcissistic personality trait has been linked to violent criminal activities. This negative side effect of social media calls for multiple ways to respond to and prevent the damage instigated. Eysenck's theory on personality and crime postulated that various forms of crime are caused by a mixture of environmental and neurological factors. This theory suggests that certain people are more likely to commit a crime, and that personality is the principal factor in criminal behaviour. Twitter is a widely used social media platform for sharing news, opinions, feelings, and emotions. Given that narcissists have an inflated self-view and engage in a variety of strategies aimed at bringing attention to themselves, features unique to Twitter are more appealing to narcissists than those on sites such as Facebook. This study adopted the design science research methodology to develop a fuzzy-based machine learning predictive model that identifies traces of narcissism on Twitter using data obtained from a user's activities. Performance evaluation of various classifiers was conducted, and an optimal classifier with 95% accuracy was obtained. The research found that the size of the dataset and the input variables influence classifier accuracy. In addition, the research developed an updated process model and recommended a research model for narcissism classification.
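A minimal sketch of the general "fuzzy-based" pattern follows: raw activity measures are fuzzified through membership functions before classification. The membership functions, the feature (rate of first-person pronouns), the thresholds, and the toy labels are all illustrative assumptions, not the model developed in the thesis.

```python
# Sketch: fuzzify a Twitter activity measure into "low/medium/high"
# membership degrees, then train a classifier on the fuzzified features.
# All values below are hypothetical placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def triangular(x, a, b, c):
    """Triangular membership: rises from a to the peak b, falls to c."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def fuzzify(self_ref_rate):
    # Degrees of membership in "low", "medium", "high" self-reference.
    return [triangular(self_ref_rate, -0.1, 0.0, 0.3),
            triangular(self_ref_rate, 0.1, 0.35, 0.6),
            triangular(self_ref_rate, 0.4, 0.7, 1.1)]

# Hypothetical users: rate of first-person pronouns in their tweets.
X = np.array([fuzzify(r) for r in [0.05, 0.12, 0.55, 0.68, 0.30, 0.72]])
y = [0, 0, 1, 1, 0, 1]  # 0 = not flagged, 1 = flagged (toy labels)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict([fuzzify(0.6)]))
```

Fuzzification makes the input representation tolerant of borderline values, which matches the study's finding that input variables strongly influence classifier accuracy.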

    Towards Consumer 4.0 Insights and Opportunities under the Marketing 4.0 Scenario

This Research Topic is a sequel to our previous Research Topic “From Consumer Experience to Affective Loyalty: Challenges and Prospects in the Psychology of Consumer Behavior 3.0”. That first article collection was devoted to analyzing the changes that appeared in different industries and companies, fostered mainly by the development of technologies. The evolution from consumer 3.0 to consumer 4.0 represents an opportunity to characterize the changes that have been occurring lately and to gain insight into the future of consumer behavior. Markets are currently experiencing several transformations in consumer behavior. These changes have been fueled by several trends: processes of globalization that produced an extraordinary assortment of diverse products and brand alternatives; new business models based on the intensive use of technology; advances in communication and mobile technologies that enhance customers' capacity to participate easily in co-creation processes with companies; and big data developments. In this scenario, customers acquired more power than ever before thanks to the availability of the information required to choose among better-priced product-brand alternatives, as well as the technological means to access those alternatives. Thus, customers evolved from simply receiving the offers proposed by companies to a position of power where they had the last word in the decision process, that is, the position of consumer 3.0. These consumers were characterized by their ability to adopt and use new technologies to meet their individual needs. Moreover, these consumers no longer responded easily to traditional mass marketing techniques. Instead, this generation of consumers demanded a highly customized approach across all facets of business, including new product development, communication, and customer service, among others. Nevertheless, with the advent of Marketing 4.0, a new type of consumer is observed, namely the consumer 4.0. The transition from consumer 3.0 to consumer 4.0 is becoming evident not only in consumers' behavior but also in companies' behavior. Regarding the first, consumers 4.0 are hyper-connected through different technologies, including not only the well-known mobile and digital technologies but also technologies such as IoT, nanotech, and artificial intelligence. Hence, their behavior is characterized by a demand for technology that integrates the facets of Marketing 4.0, such as geolocation, virtual reality, and augmented reality. Regarding the second, companies face a digital transformation affecting not only value areas but also the way businesses interact with the environment. In particular, companies need to incorporate systems and applications that allow them to collect and analyze information while supporting decision making, since in the long run these capabilities constitute the cornerstone on which to build a successful Marketing 4.0 strategy. This Research Topic welcomes scientific papers that cover the following topics (but is not limited to them):
- Consumer 4.0 behavior in different countries, industries, products, brands, etc.;
- Digital transformations of industries and companies due to new consumption patterns;
- New devices launched by companies to meet the demands of consumer 4.0 (e.g., IoT), as well as the use consumers make of such devices;
- The latest technology trends in business areas that facilitate consumer-company relationships (processing, communication, or other digital technologies).

    Detecting deceptive behaviour in the wild: text mining for online child protection in the presence of noisy and adversarial social media communications

A real-life application of text mining research “in the wild”, i.e. in online social media, differs from more general applications in that its defining characteristics are both domain and process dependent. This gives rise to a number of challenges of which contemporary research has only scratched the surface. More specifically, a text mining approach applied in the wild typically has no control over the dataset size. Hence, the system has to be robust towards limited data availability, a variable number of samples across users and a highly skewed dataset. Additionally, the quality of the data cannot be guaranteed. As a result, the approach needs to be tolerant to a certain degree of linguistic noise. Finally, it has to be robust towards deceptive behaviour or adversaries. This thesis examines the viability of a text mining approach for supporting cybercrime investigations pertaining to online child protection. The main contributions of this dissertation are as follows. A systematic study of different aspects of the methodological design of a state-of-the-art text mining approach is presented to assess its scalability towards a large, imbalanced and linguistically noisy social media dataset. In this framework, three key automatic text categorisation tasks are examined, namely the feasibility of (i) identifying a social network user's age group and gender based on the textual information found in only a single message; (ii) aggregating predictions on the message level to the user level without neglecting potential clues of deception, in order to detect false user profiles on social networks; and (iii) identifying child sexual abuse media among thousands of other, legal media, including adult pornography, based on their filenames. Finally, a novel approach is presented that combines age group predictions with advanced text clustering techniques and unsupervised learning to identify online child sex offenders' grooming behaviour. The methodology presented in this thesis was extensively discussed with law enforcement to assess its forensic readiness. Additionally, each component was evaluated on actual child sex offender data. Despite the challenging characteristics of these text types, the results show high degrees of accuracy for false profile detection, grooming behaviour identification and child sexual abuse media identification.
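As a concrete illustration of task (ii) above, the sketch below aggregates per-message age-group predictions into a user-level judgement by averaging class probabilities. The classifier, features, and toy messages are illustrative placeholders, not the system evaluated in the thesis; character n-grams are used here because they tolerate the spelling noise typical of chat text.

```python
# Sketch: per-message age-group prediction, then user-level aggregation.
# Training data and labels below are hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: single chat messages labelled by author age group.
messages = ["omg lol thats so kewl!!", "haha ikr, skool sux",
            "Please find the report attached.", "See you at the meeting."]
labels = ["teen", "teen", "adult", "adult"]

# Character n-grams are robust to misspellings and chat abbreviations.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(),
).fit(messages, labels)

# User-level decision: average the per-message class probabilities, so a
# profile's claimed age group can be checked against all of its messages.
user_msgs = ["lol gtg, ttyl!!", "my teacher gave us sooo much homework"]
probs = model.predict_proba(user_msgs).mean(axis=0)
print(dict(zip(model.classes_, probs.round(2))))
```

Averaging probabilities rather than taking a majority vote keeps uncertain messages from being silently discarded, which matters when isolated messages may carry the clues of deception the thesis highlights.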