
    Web Data Extraction, Applications and Techniques: A Survey

    Web Data Extraction is an important problem that has been studied with a range of scientific tools and across a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims to provide a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes: applications at the Enterprise level and applications at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool for data analysis in Business and Competitive Intelligence systems, as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather large amounts of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, which offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential for cross-fertilization, i.e., the possibility of reusing Web Data Extraction techniques originally designed for one domain in other domains. Comment: Knowledge-based System

    OpenMinTeD: A Platform Facilitating Text Mining of Scholarly Content

    The OpenMinTeD platform aims to bring together, in an integrated environment, full-text Open Access scholarly content from a wide range of providers and Text and Data Mining (TDM) tools from various Natural Language Processing frameworks and TDM developers. In this way, it gives users who want to mine scientific literature easy access to relevant content and allows them to run scalable TDM workflows in the cloud.

    Do Social Bots Dream of Electric Sheep? A Categorisation of Social Media Bot Accounts

    So-called 'social bots' have garnered considerable attention recently. Previous research showed that they attempted to influence political events such as the Brexit referendum and the US presidential elections. It remains, however, somewhat unclear what exactly the term 'social bot' refers to. This paper addresses the need to better understand the intentions of bots on social media and to develop a shared understanding of how 'social' bots differ from other types of bots. We therefore describe a systematic review of publications that researched bot accounts on social media. Based on the results of this literature review, we propose a scheme for categorising bot accounts on social media sites. Our scheme groups bot accounts along two dimensions: imitation of human behaviour and intent. Comment: Accepted for publication in the Proceedings of the Australasian Conference on Information Systems, 201

    Troping the Enemy: Metaphor, Culture, and the Big Data Black Boxes of National Security

    This article considers how cultural understanding is being brought into the work of the Intelligence Advanced Research Projects Activity (IARPA) through an analysis of its Metaphor program. It examines the type of social science underwriting this program, unpacks the implications of the agency’s conception of metaphor for understanding so-called cultures of interest, and compares IARPA’s account with competing accounts of how metaphor works to create cultural meaning. The article highlights some risks posed by key deficits in the Intelligence Community’s (IC) approach to culture, which relies on the cognitive linguistic theories of George Lakoff and colleagues. It also explores the problem of the opacity of these risks for analysts, even as such predictive cultural analytics are becoming part of intelligence forecasting. The article examines the problem of information secrecy in two ways: by unpacking the opacity of “black box,” algorithm-based social science of culture for end users who have little appreciation of its potential biases, and by evaluating the IC’s nontransparent approach to foreign cultures as it underwrites national security assessments.