50,651 research outputs found
Web Data Extraction, Applications and Techniques: A Survey
Web Data Extraction is an important problem that has been studied by means of
different scientific tools and in a broad range of applications. Many
approaches to extracting data from the Web have been designed to solve specific
problems and operate in ad-hoc domains. Other approaches, instead, heavily
reuse techniques and algorithms developed in the field of Information
Extraction.
This survey aims at providing a structured and comprehensive overview of the
literature in the field of Web Data Extraction. We provided a simple
classification framework in which existing Web Data Extraction applications are
grouped into two main classes, namely applications at the Enterprise level and
at the Social Web level. At the Enterprise level, Web Data Extraction
techniques emerge as a key tool to perform data analysis in Business and
Competitive Intelligence systems as well as for business process
re-engineering. At the Social Web level, Web Data Extraction techniques allow
to gather a large amount of structured data continuously generated and
disseminated by Web 2.0, Social Media and Online Social Network users and this
offers unprecedented opportunities to analyze human behavior at a very large
scale. We discuss also the potential of cross-fertilization, i.e., on the
possibility of re-using Web Data Extraction techniques originally designed to
work in a given domain, in other domains.Comment: Knowledge-based System
OpenMinTeD: A Platform Facilitating Text Mining of Scholarly Content
The OpenMinTeD platform aims to bring full text Open Access scholarly content from a wide range of providers together with Text and Data Mining (TDM) tools from various Natural Language Processing frameworks and TDM developers in an integrated environment. In this way, it supports users who want to mine scientific literature with easy access to relevant content and allows running scalable TDM workflows in the cloud
Do Social Bots Dream of Electric Sheep? A Categorisation of Social Media Bot Accounts
So-called 'social bots' have garnered a lot of attention lately. Previous
research showed that they attempted to influence political events such as the
Brexit referendum and the US presidential elections. It remains, however,
somewhat unclear what exactly can be understood by the term 'social bot'. This
paper addresses the need to better understand the intentions of bots on social
media and to develop a shared understanding of how 'social' bots differ from
other types of bots. We thus describe a systematic review of publications that
researched bot accounts on social media. Based on the results of this
literature review, we propose a scheme for categorising bot accounts on social
media sites. Our scheme groups bot accounts by two dimensions - Imitation of
human behaviour and Intent.Comment: Accepted for publication in the Proceedings of the Australasian
Conference on Information Systems, 201
Troping the Enemy: Metaphor, Culture, and the Big Data Black Boxes of National Security
This article considers how cultural understanding is being brought into the work of the Intelligence Advanced Research Projects Activity (IARPA), through an analysis of its Metaphor program. It examines the type of social science underwriting this program, unpacks implications of the agency’s conception of metaphor for understanding so-called cultures of interest, and compares IARPA’s to competing accounts of how metaphor works to create cultural meaning. The article highlights some risks posed by key deficits in the Intelligence Community\u27s (IC) approach to culture, which relies on the cognitive linguistic theories of George Lakoff and colleagues. It also explores the problem of the opacity of these risks for analysts, even as such predictive cultural analytics are becoming a part of intelligence forecasting. This article examines the problem of information secrecy in two ways, by unpacking the opacity of “black box,” algorithm-based social science of culture for end users with little appreciation of their potential biases, and by evaluating the IC\u27s nontransparent approach to foreign cultures, as it underwrites national security assessments
- …