94 research outputs found

    Automated fact-checking: A survey

    As online false information continues to grow, automated fact-checking has gained increasing attention in recent years. Researchers in Natural Language Processing (NLP) have contributed to the task by building fact-checking datasets, devising automated fact-checking pipelines, and proposing NLP methods to advance the development of the individual components. This article reviews relevant research on automated fact-checking, covering both the claim detection and claim validation components.
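The two-stage pipeline the survey describes can be illustrated with a minimal sketch: claim detection (is a sentence check-worthy?) followed by claim validation (does retrieved evidence support it?). All functions and heuristics below are hypothetical toy stand-ins for the trained NLP models a real system would use.

```python
# Toy two-stage fact-checking pipeline sketch. The cue-word heuristic
# and overlap score stand in for learned claim-detection and
# evidence-entailment models; they are illustrative only.

def detect_claims(sentences):
    """Keep sentences that look check-worthy (a factual cue word
    stands in for a trained claim-detection classifier)."""
    cues = {"is", "are", "was", "were", "has", "have"}
    return [s for s in sentences if cues & set(s.lower().split())]

def validate_claim(claim, evidence):
    """Toy validation: label by lexical overlap with evidence.
    A real system would use retrieval plus entailment models."""
    overlap = set(claim.lower().split()) & set(evidence.lower().split())
    return "SUPPORTED" if len(overlap) >= 3 else "NOT ENOUGH INFO"

sentences = ["The Eiffel Tower is in Paris.", "What a nice day!"]
claims = detect_claims(sentences)
verdict = validate_claim(claims[0], "The Eiffel Tower is located in Paris, France.")
```

Only the declarative sentence survives detection, and its heavy overlap with the evidence sentence yields a SUPPORTED verdict.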

    Mining complex trees for hidden fruit: a graph-based computational solution to detect latent criminal networks: a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Information Technology at Massey University, Albany, New Zealand.

    The detection of crime is a complex and difficult endeavour. Public and private organisations – focusing on law enforcement, intelligence, and compliance – commonly apply the rational isolated actor approach premised on observability and materiality. This is manifested largely as entity-level risk management sourcing ‘leads’ from reactive covert human intelligence sources and/or proactive sources by applying simple rules-based models. Focusing on discrete observable and material actors ignores that criminal activity exists within a complex system deriving its fundamental structural fabric from the complex interactions between actors, with the least observable actors likely to be both criminally proficient and influential. The graph-based computational solution developed to detect latent criminal networks responds to the inadequacy of the rational isolated actor approach, which ignores the connectedness and complexity of criminality. The core computational solution, written in the R language, consists of novel entity resolution, link discovery, and knowledge discovery technology. Entity resolution enables the fusion of multiple datasets with high accuracy (mean F-measure of 0.986 versus competitors' 0.872), generating a graph-based expressive view of the problem. Link discovery comprises link prediction and link inference, enabling the high-performance detection (accuracy of ~0.8 versus relevant published models' ~0.45) of unobserved relationships such as identity fraud. Knowledge discovery applies the “GraphExtract” algorithm to the fused graph to create a set of subgraphs representing latent functional criminal groups, and a mesoscopic graph representing how this set of criminal groups is interconnected. Latent knowledge is generated from a range of metrics, including the “Super-broker” metric and attitude prediction.
The computational solution has been evaluated on a range of datasets that mimic an applied setting, demonstrating a scalable (tested on ~18-million-node graphs) and performant (~33 hours runtime on a non-distributed platform) solution that successfully detects relevant latent functional criminal groups in around 90% of cases sampled, and enables contextual understanding of the broader criminal system through the mesoscopic graph and associated metadata. The augmented data assets generated provide a multi-perspective systems view of criminal activity that enables advanced, informed decision-making across the microscopic, mesoscopic, and macroscopic spectrum.
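The link-discovery idea above can be sketched in miniature: on a fused entity graph, score non-adjacent node pairs by shared contacts, so that high scores flag candidate unobserved relationships. This is a common-neighbours baseline offered purely as an illustration of link prediction; the thesis's actual R models and metrics are not reproduced here, and the graph is invented.

```python
# Common-neighbours link prediction on a toy fused entity graph.
# A hedged stand-in for the thesis's link-discovery stage: pairs of
# unconnected entities with many shared contacts are candidate
# latent links (e.g., a hidden shared identity).

from itertools import combinations

edges = {("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d")}
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def common_neighbours(u, v):
    """Number of contacts shared by two entities."""
    return len(adj[u] & adj[v])

# Score every non-adjacent pair; high scores are candidate latent links.
nodes = sorted(adj)
candidates = {
    (u, v): common_neighbours(u, v)
    for u, v in combinations(nodes, 2)
    if v not in adj[u]
}
```

In this toy graph the only unconnected pair, ("a", "d"), shares two contacts, making it the sole candidate latent link.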

    A Comprehensive Collection and Analysis Model for the Drone Forensics Field

    Unmanned aerial vehicles (UAVs) are adaptable, rapid mobile platforms that can be applied to several purposes, especially in smart cities, including traffic observation, environmental monitoring, and public safety. The need for effective drone forensic processes has been reinforced mainly by the growth of drone-based evidence. Drone-based evidence collection and preservation entails gathering digital evidence from the victim's drone for subsequent analysis and presentation. Digital evidence must, however, be collected and analyzed in a forensically sound manner, using appropriate collection and analysis methodologies and tools, to preserve its integrity. For this purpose, various collection and analysis models have been proposed for drone forensics in the existing literature; however, several of these models are inclined towards specific scenarios and drone systems. As a result, the literature lacks a suitable, standardized drone-based collection and analysis model that consolidates their commonalities and can address future problems arising in the drone forensics field. Therefore, this paper makes three contributions: (a) it reviews the machine learning approaches in the literature for handling drone data to discover criminal actions, (b) it highlights the existing forensic models proposed for drone forensics, and (c) it proposes a novel comprehensive collection and analysis forensic model (CCAFM) applicable to the drone forensics field, using the design science research approach. The proposed CCAFM consists of three main processes: (1) acquisition and preservation, (2) reconstruction and analysis, and (3) post-investigation. CCAFM contextually leverages the previously proposed models reviewed in this study. It allows digital forensic investigators to collect, protect, rebuild, and examine volatile and nonvolatile items from a suspected drone using scientifically sound forensic techniques, and thus enables the sharing of knowledge on drone forensic investigation among practitioners working in the forensics domain.
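The three-process structure of CCAFM can be sketched as an ordered workflow. The step names follow the abstract; the handler bodies and the evidence record are hypothetical placeholders, not the paper's actual procedures.

```python
# Minimal sketch of the three CCAFM processes as an ordered pipeline.
# Each step receives the evidence record and annotates it; the
# lambda bodies are illustrative placeholders only.

PROCESSES = [
    ("acquisition and preservation", lambda e: {**e, "preserved": True}),
    ("reconstruction and analysis", lambda e: {**e, "analysed": True}),
    ("post-investigation", lambda e: {**e, "reported": True}),
]

def run_investigation(evidence):
    """Run the evidence record through each process in order,
    keeping a log of completed stages."""
    log = []
    for name, step in PROCESSES:
        evidence = step(evidence)
        log.append(name)
    return evidence, log

evidence, log = run_investigation({"source": "drone flight logs"})
```

The ordering matters: preservation must precede analysis so that later stages operate on forensically sound copies of the data.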

    Imaging biomarkers extraction and classification for Prion disease

    Prion diseases are a group of rare neurodegenerative conditions characterised by a high rate of progression and highly heterogeneous phenotypes. Whilst the most common form of prion disease occurs sporadically (sporadic Creutzfeldt-Jakob disease, sCJD), other forms are caused by inheritance of prion protein gene mutations or exposure to prions. To date, there are no accurate imaging biomarkers that can be used to predict the future diagnosis of a subject or to quantify the progression of symptoms over time. Moreover, CJD is commonly mistaken for other forms of dementia. Owing to the large heterogeneity of prion disease phenotypes and the lack of a consistent spatial pattern of disease progression, the approaches used to study other types of neurodegenerative diseases are not adequate to capture the progression of the human form of prion disease. Using a tailored framework, I extracted quantitative imaging biomarkers for the characterisation of patients with prion diseases. Following the extraction of patient-specific imaging biomarkers from multiple images, I implemented a Gaussian process approach to correlate symptoms with disease types and stages. The model was applied to three different tasks: diagnosis, differential diagnosis, and stratification, addressing an unmet need to automatically identify patients with, or at risk of developing, prion disease. The work presented in this thesis has been extensively validated on a unique prion disease cohort comprising both the inherited and sporadic forms of the disease. The model has been shown to be effective in the prediction of this illness. Furthermore, this approach may be used in other disorders with heterogeneous imaging features, adding value to the understanding of neurodegenerative diseases. Lastly, given the rarity of this disease, I also addressed the issue of missing data and the limitations it raises.
Overall, this work presents progress towards the modelling of prion diseases and identifies computational methodologies potentially suitable for their characterisation.
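The Gaussian-process idea can be shown on a deliberately tiny scale: given two training observations of a single biomarker, the posterior mean at a new point is k_*ᵀ (K + σ²I)⁻¹ y under an RBF kernel. This is a generic one-dimensional GP regression sketch, not the thesis's multi-task model; the data, kernel length-scale, and noise level are all invented, and the 2×2 kernel matrix is inverted by hand to stay dependency-free.

```python
# One-dimensional Gaussian-process regression sketch with an RBF
# kernel and two training points. Illustrates the GP posterior mean
# only; all numbers are hypothetical.

import math

def rbf(x1, x2, length=1.0):
    """Squared-exponential (RBF) covariance between two inputs."""
    return math.exp(-((x1 - x2) ** 2) / (2 * length ** 2))

# Toy training data: biomarker value -> symptom severity score.
xs, ys = [0.0, 2.0], [1.0, 3.0]
noise = 1e-6

# (K + noise*I) for the two training points, inverted analytically.
a = rbf(xs[0], xs[0]) + noise
b = rbf(xs[0], xs[1])
d = rbf(xs[1], xs[1]) + noise
det = a * d - b * b
inv = [[d / det, -b / det], [-b / det, a / det]]

def predict(x_star):
    """Posterior mean: k_*^T (K + noise*I)^{-1} y."""
    k = [rbf(x_star, xs[0]), rbf(x_star, xs[1])]
    alpha = [
        inv[0][0] * ys[0] + inv[0][1] * ys[1],
        inv[1][0] * ys[0] + inv[1][1] * ys[1],
    ]
    return k[0] * alpha[0] + k[1] * alpha[1]
```

With near-zero noise the posterior mean interpolates the training data almost exactly, while predictions away from the data shrink smoothly back towards the prior mean of zero.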

    A Deep Multi-View Learning Framework for City Event Extraction from Twitter Data Streams

    Cities have been a thriving place for citizens over the centuries due to their complex infrastructure. The emergence of Cyber-Physical-Social Systems (CPSS) and context-aware technologies has boosted a growing interest in analysing, extracting, and ultimately understanding city events, which can subsequently be used to leverage citizens' observations of their cities. In this paper, we investigate the feasibility of using Twitter textual streams for extracting city events. We propose a hierarchical multi-view deep learning approach to contextualise citizen observations of various city systems and services. Our goal has been to build a flexible architecture that can learn representations useful across tasks, thus avoiding excessive task-specific feature engineering. We apply our approach to a real-world dataset consisting of event reports and tweets spanning over four months from the San Francisco Bay Area, and additional datasets collected from London. The results of our evaluations show that our proposed solution outperforms the existing models and can be used for extracting city-related events with an averaged accuracy of 81% over all classes. To further evaluate the impact of our Twitter event extraction model, we used two sources of authorised reports: road traffic disruption data collected from the Transport for London API, and sociocultural events parsed from the Time Out London website. The analysis showed that 49.5% of the Twitter traffic comments are reported approximately five hours prior to the authorities' official records. Moreover, we discovered that, amongst the scheduled sociocultural event topics, tweets reporting transportation, cultural, and social events are 31.75% more likely to influence the distribution of the Twitter comments than sport, weather, and crime topics.
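The event-extraction step can be caricatured with a keyword-evidence baseline: assign a tweet to a city-event class by counting class-specific cue words. This is a toy stand-in for the paper's hierarchical multi-view deep model; the class names and keyword lists are invented for illustration.

```python
# Toy city-event classifier for tweets: score each class by keyword
# evidence and pick the best. A deliberately simple baseline, not the
# paper's deep multi-view architecture.

KEYWORDS = {
    "transport": {"traffic", "road", "delay", "closure", "tube"},
    "culture": {"concert", "festival", "exhibition", "museum"},
    "crime": {"theft", "assault", "police", "robbery"},
}

def classify_tweet(text):
    """Return the class with the most keyword hits, or 'other'."""
    tokens = set(text.lower().split())
    scores = {label: len(tokens & kw) for label, kw in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"

label = classify_tweet("Heavy traffic and road closure near the bridge")
```

A learned model replaces these hand-picked keyword sets with representations trained jointly across event classes, which is precisely the feature engineering the paper's architecture aims to avoid.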

    Automated Detection of Anomalous Patterns in Validation Scores for Protein X-Ray Structure Models

    Structural bioinformatics is a subdomain of data mining focused on identifying structural patterns relevant to functional attributes in repositories of biological macromolecular structure models. This research focused on structures determined via x-ray crystallography and deposited in the Protein Data Bank (PDB). Protein structures deposited in the PDB are products of experimental processes, and only approximately model physical reality. Structural biologists address accuracy and precision concerns via community-enforced consensus standards of accepted practice for proper building, refinement, and validation of models. Validation scores are quantitative partial indicators of the likelihood that a model contains serious systematic errors. The PDB recently convened a panel of experts, which placed renewed emphasis on troubling anomalies among deposited structure models. This study set out to detect such anomalies. I hypothesized that community consensus standards would be evident in patterns of validation scores, and deviations from those standards would appear as unusual combinations of validation scores. Validation attributes were extracted from PDB entry headers and multiple software tools (e.g., WhatCheck, SFCheck, and MolProbity). Independent component analysis (ICA) was used for attribute transformation to increase contrast between inliers and outliers. Unusual patterns were sought in regions of locally low density in the space of validation score profiles, using a novel standardization of Local Outlier Factor (LOF) scores. Validation score profiles associated with the most extreme outlier scores were demonstrably anomalous according to domain theory. Among these were documented fabrications, possible annotation errors, and complications in the underlying experimental data. Analysis of deep inliers revealed promising support for the hypothesized link between consensus standard practices and common validation score values. 
Unfortunately, with numerical anomaly detection methods that operate simultaneously on numerous continuous-valued attributes, it is often quite difficult to know why a case receives a particular outlier score. Therefore, I hypothesized that IF-THEN rules could be used to post-process outlier scores to make them comprehensible and explainable. Inductive rule extraction was performed using RIPPER. Results were mixed, but they represent a promising proof of concept. The methods explored are general and applicable beyond this problem; indeed, they could be used to detect structural anomalies using physical attributes.
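The density-based outlier scoring described above can be illustrated with a much simpler proxy: rank validation-score profiles by their average distance to their k nearest neighbours, so that profiles in sparse regions score high. This is a crude stand-in for the study's standardized Local Outlier Factor (and omits the ICA transformation entirely); the profiles below are invented two-attribute examples.

```python
# k-NN-distance outlier scoring over toy validation-score profiles.
# A simplified proxy for LOF: profiles far from their neighbours
# (locally low density) receive high scores.

import math

profiles = [
    (0.10, 0.20), (0.20, 0.10), (0.15, 0.15), (0.10, 0.10),  # consensus cluster
    (0.90, 0.95),                                             # anomalous model
]

def knn_distance(i, k=2):
    """Mean distance from profile i to its k nearest neighbours."""
    dists = sorted(
        math.dist(profiles[i], p) for j, p in enumerate(profiles) if j != i
    )
    return sum(dists[:k]) / k

scores = [knn_distance(i) for i in range(len(profiles))]
most_anomalous = scores.index(max(scores))
```

The clustered profiles, standing in for models built to consensus practice, score low; the isolated profile scores an order of magnitude higher, which is the pattern the study's LOF-based approach exploits at scale.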

    Application of Common Sense Computing for the Development of a Novel Knowledge-Based Opinion Mining Engine

    The ways people express their opinions and sentiments have radically changed in the past few years thanks to the advent of social networks, web communities, blogs, wikis and other online collaborative media. The distillation of knowledge from this huge amount of unstructured information can be a key factor for marketers who want to create an image or identity in the minds of their customers for their product, brand, or organisation. These online social data, however, remain hardly accessible to computers, as they are specifically meant for human consumption. The automatic analysis of online opinions, in fact, involves a deep understanding of natural language text by machines, from which we are still very far. Hitherto, online information retrieval has been mainly based on algorithms relying on the textual representation of web-pages. Such algorithms are very good at retrieving texts, splitting them into parts, checking the spelling and counting their words. But when it comes to interpreting sentences and extracting meaningful information, their capabilities are known to be very limited. Existing approaches to opinion mining and sentiment analysis, in particular, can be grouped into three main categories: keyword spotting, in which text is classified into categories based on the presence of fairly unambiguous affect words; lexical affinity, which assigns arbitrary words a probabilistic affinity for a particular emotion; and statistical methods, which calculate the valence of affective keywords and word co-occurrence frequencies on the basis of a large training corpus. Early works aimed to classify entire documents as containing overall positive or negative polarity, or to predict the rating scores of reviews. Such systems were mainly based on supervised approaches relying on manually labelled samples, such as movie or product reviews where the opinionist's overall positive or negative attitude was explicitly indicated.
However, opinions and sentiments do not occur only at document level, nor are they limited to a single valence or target. Contrary or complementary attitudes toward the same topic or multiple topics can be present across the span of a document. In more recent works, text analysis granularity has been taken down to segment and sentence level, e.g., by using the presence of opinion-bearing lexical items (single words or n-grams) to detect subjective sentences, or by exploiting association rule mining for a feature-based analysis of product reviews. These approaches, however, are still far from being able to infer the cognitive and affective information associated with natural language, as they mainly rely on knowledge bases that are still too limited to efficiently process text at sentence level. In this thesis, common sense computing techniques are further developed and applied to bridge the semantic gap between word-level natural language data and the concept-level opinions conveyed by these. In particular, the ensemble application of graph mining and multi-dimensionality reduction techniques on two common sense knowledge bases was exploited to develop a novel intelligent engine for open-domain opinion mining and sentiment analysis. The proposed approach, termed sentic computing, performs a clause-level semantic analysis of text, which allows the inference of both the conceptual and emotional information associated with natural language opinions and, hence, a more efficient passage from (unstructured) textual information to (structured) machine-processable data. The engine was tested on three different resources, namely a Twitter hashtag repository, a LiveJournal database and a PatientOpinion dataset, and its performance was compared both with results obtained using standard sentiment analysis techniques and using different state-of-the-art knowledge bases such as Princeton's WordNet, MIT's ConceptNet and Microsoft's Probase.
Unlike most currently available opinion mining services, the developed engine does not base its analysis on a limited set of affect words and their co-occurrence frequencies, but rather on common sense concepts and the cognitive and affective valence conveyed by them. This allows the engine to be domain-independent and, hence, to be embedded in any opinion mining system for the development of intelligent applications in multiple fields such as the Social Web, HCI, and e-health. Looking ahead, the combined novel use of different knowledge bases and common sense reasoning techniques for opinion mining proposed in this work will eventually pave the way for the development of more bio-inspired approaches to the design of natural language processing systems capable of handling knowledge, retrieving it when necessary, making analogies, and learning from experience.
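The first of the three categories described above, keyword spotting, can be made concrete with a minimal polarity classifier: count unambiguous affect words and take the sign of the balance. The lexicon here is a tiny invented stand-in, and the sketch also makes plain the limitation the thesis addresses, since it carries no concept-level or common-sense information at all.

```python
# Minimal keyword-spotting polarity baseline. Counts hits against a
# tiny illustrative affect lexicon; concept-level analysis, as in
# sentic computing, is deliberately absent.

POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "awful", "hate", "terrible", "sad"}

def keyword_spotting_polarity(text):
    """Sign of (positive hits - negative hits) over the token list."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

polarity = keyword_spotting_polarity("I love this great phone")
```

Such a baseline fails on negation ("not great"), domain-specific language, and any sentence whose sentiment is conveyed by concepts rather than surface affect words, which is exactly the gap concept-level approaches aim to close.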
