47,137 research outputs found

    iCrawl: Improving the Freshness of Web Collections by Integrating Social Web and Focused Web Crawling

    Full text link
    Researchers in the Digital Humanities and journalists need to monitor, collect and analyze fresh online content regarding current events such as the Ebola outbreak or the Ukraine crisis on demand. However, existing focused crawling approaches only consider topical aspects while ignoring temporal aspects and therefore cannot achieve thematically coherent and fresh Web collections. Especially Social Media provide a rich source of fresh content, which is not used by state-of-the-art focused crawlers. In this paper we address the issues of enabling the collection of fresh and relevant Web and Social Web content for a topic of interest through seamless integration of Web and Social Media in a novel integrated focused crawler. The crawler collects Web and Social Media content in a single system and exploits the stream of fresh Social Media content for guiding the crawler.Comment: Published in the Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries 201

    A European research roadmap for optimizing societal impact of big data on environment and energy efficiency

    Full text link
    We present a roadmap to guide European research efforts towards a socially responsible big data economy that maximizes the positive impact of big data in environment and energy efficiency. The goal of the roadmap is to allow stakeholders and the big data community to identify and meet big data challenges, and to proceed with a shared understanding of the societal impact, positive and negative externalities, and concrete problems worth investigating. It builds upon a case study focused on the impact of big data practices in the context of Earth Observation that reveals both positive and negative effects in the areas of economy, society and ethics, legal frameworks and political issues. The roadmap identifies European technical and non-technical priorities in research and innovation to be addressed in the upcoming five years in order to deliver societal impact, develop skills and contribute to standardization.Comment: 6 pages, 2 figures, 1 tabl

    Engineering Crowdsourced Stream Processing Systems

    Full text link
    A crowdsourced stream processing system (CSP) is a system that incorporates crowdsourced tasks in the processing of a data stream. This can be seen as enabling crowdsourcing work to be applied on a sample of large-scale data at high speed, or equivalently, enabling stream processing to employ human intelligence. It also leads to a substantial expansion of the capabilities of data processing systems. Engineering a CSP system requires the combination of human and machine computation elements. From a general systems theory perspective, this means taking into account inherited as well as emerging properties from both these elements. In this paper, we position CSP systems within a broader taxonomy, outline a series of design principles and evaluation metrics, present an extensible framework for their design, and describe several design patterns. We showcase the capabilities of CSP systems by performing a case study that applies our proposed framework to the design and analysis of a real system (AIDR) that classifies social media messages during time-critical crisis events. Results show that compared to a pure stream processing system, AIDR can achieve a higher data classification accuracy, while compared to a pure crowdsourcing solution, the system makes better use of human workers by requiring much less manual work effort

    Mining Social Media for Newsgathering: A Review

    Get PDF
    Social media is becoming an increasingly important data source for learning about breaking news and for following the latest developments of ongoing news. This is in part possible thanks to the existence of mobile devices, which allows anyone with access to the Internet to post updates from anywhere, leading in turn to a growing presence of citizen journalism. Consequently, social media has become a go-to resource for journalists during the process of newsgathering. Use of social media for newsgathering is however challenging, and suitable tools are needed in order to facilitate access to useful information for reporting. In this paper, we provide an overview of research in data mining and natural language processing for mining social media for newsgathering. We discuss five different areas that researchers have worked on to mitigate the challenges inherent to social media newsgathering: news discovery, curation of news, validation and verification of content, newsgathering dashboards, and other tasks. We outline the progress made so far in the field, summarise the current challenges as well as discuss future directions in the use of computational journalism to assist with social media newsgathering. This review is relevant to computer scientists researching news in social media as well as for interdisciplinary researchers interested in the intersection of computer science and journalism.Comment: Accepted for publication in Online Social Networks and Medi

    Scraping the Social? Issues in live social research

    Get PDF
    What makes scraping methodologically interesting for social and cultural research? This paper seeks to contribute to debates about digital social research by exploring how a ‘medium-specific’ technique for online data capture may be rendered analytically productive for social research. As a device that is currently being imported into social research, scraping has the capacity to re-structure social research, and this in at least two ways. Firstly, as a technique that is not native to social research, scraping risks to introduce ‘alien’ methodological assumptions into social research (such as an pre-occupation with freshness). Secondly, to scrape is to risk importing into our inquiry categories that are prevalent in the social practices enabled by the media: scraping makes available already formatted data for social research. Scraped data, and online social data more generally, tend to come with ‘external’ analytics already built-in. This circumstance is often approached as a ‘problem’ with online data capture, but we propose it may be turned into virtue, insofar as data formats that have currency in the areas under scrutiny may serve as a source of social data themselves. Scraping, we propose, makes it possible to render traffic between the object and process of social research analytically productive. It enables a form of ‘real-time’ social research, in which the formats and life cycles of online data may lend structure to the analytic objects and findings of social research. By way of a conclusion, we demonstrate this point in an exercise of online issue profiling, and more particularly, by relying on Twitter to profile the issue of ‘austerity’. Here we distinguish between two forms of real-time research, those dedicated to monitoring live content (which terms are current?) and those concerned with analysing the liveliness of issues (which topics are happening?)

    Chasing Sustainability on the Net : International research on 69 journalistic pure players and their business models

    Get PDF
    This report outlines how online-based journalistic startups have created their economical locker in the evolving media ecology. The research introduces the ways that startups have found sustainability in the markets of ten countries. The work is based on 69 case studies from Europe, USA and Japan. The case analysis shows that business models can be divided into two groups. The storytelling-oriented business models are still prevalent in our findings. These are the online journalistic outlets that produce original content – news and stories for audiences. But the other group, service-oriented business models, seems to be growing. This group consists of sites that don’t try to monetize the journalistic content as such but rather focus on carving out new functionality. The project was able to identify several revenue sources: advertising, paying for content, affiliate marketing, donations, selling data or services, organizing events, freelancing and training or selling merchandise. Where it was hard to evidence entirely new revenue sources, it was however possible to find new ways in which revenue sources have been combined or reconfigured. The report also offers practical advice for those who are planning to start their own journalistic site
    • …
    corecore