Preserving Social Media: the Problem of Access

Author: Kilbride William
Thomson Sara Day
Publication date: 03/07/2015

As the applications and services made possible through Web 2.0 continue to proliferate and influence the way individuals exchange information, the landscape of social science research, as well as research in the humanities and the arts, has the potential to change dramatically and to be enriched by a wealth of new, user-generated data. In response to this phenomenon, the UK Data Service have commissioned the Digital Preservation Coalition to undertake a 12-month study into the preservation of social media as part of the ‘Big Data Network’ programme funded by the Economic and Social Research Council (ESRC). The larger study focuses on the potential uses and accompanying challenges of data generated by social networking applications. This paper, ‘Preserving Social Media: the Problem of Access’, comprises an excerpt of that longer study, allowing the authors a space to explore in closer detail the issue of making social media archives accessible to researchers and students now and in the future. To do this, the paper addresses use cases that demonstrate the potential value of social media to academic social science. Furthermore, it examines how researchers and collecting institutions acquire and preserve social media data within a context of curatorial and legislative restrictions that may prove an even greater obstacle to access than any technical restrictions. Based on analysis of these obstacles, it will first examine existing methods of curating and preserving social media archives and, second, make some recommendations for how collecting institutions might approach the long-term preservation of social media in a way that protects the individuals represented in the data and complies with the conditions of third party platforms.
With the understanding that web-based communication technologies will continue to evolve, this paper will focus on the overarching properties of social media, analysing and comparing current methods of curation and preservation that provide sustainable solutions

Enlighten

Web Data Extraction, Applications and Techniques: A Survey

Author: Emilio Ferrara
Pasquale De Meo
Giacomo Fiumara
Robert Baumgartner
Publication venue: 'Elsevier BV'
Publication date: 09/06/2014

Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims at providing a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool to perform data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather a large amount of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, and this offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential of cross-fertilization, i.e., the possibility of re-using Web Data Extraction techniques originally designed to work in a given domain, in other domains. Comment: Knowledge-based System

arXiv.org e-Print Archive

Crossref

iCrawl: Improving the Freshness of Web Collections by Integrating Social Web and Focused Web Crawling

Author: Diligenti M.
Mohr G.
Psallidas F.
Risse T.
Tannier X.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 19/12/2016

Researchers in the Digital Humanities and journalists need to monitor, collect and analyze fresh online content regarding current events such as the Ebola outbreak or the Ukraine crisis on demand. However, existing focused crawling approaches only consider topical aspects while ignoring temporal aspects and therefore cannot achieve thematically coherent and fresh Web collections. Social Media in particular provide a rich source of fresh content, which is not used by state-of-the-art focused crawlers. In this paper we address the issues of enabling the collection of fresh and relevant Web and Social Web content for a topic of interest through seamless integration of Web and Social Media in a novel integrated focused crawler. The crawler collects Web and Social Media content in a single system and exploits the stream of fresh Social Media content for guiding the crawler. Comment: Published in the Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries 201

arXiv.org e-Print Archive

Crossref

Describing Web Archives: A Computer-Assisted Approach

Author: Wiedeman Gregory
Publication venue: EliScholar – A Digital Platform for Scholarly Publishing at Yale
Publication date: 09/12/2019

Currently, web archives are challenging for users to discover and use. Many archives and libraries are actively collecting web archives, but description in this area has been dominated by bibliographic approaches, which do not connect web archives to existing description or contextual information, and have often resulted in format-based silos. This is primarily because web archiving tools such as Archive-It arrange materials by seeds and groups of seeds, which reflect the complex technical process of web crawling or web recording, and are often not very meaningful to users or helpful for discovery. This article makes the case for arranging and describing web archives in meaningful aggregates according to established standards—showing how archival practices allow archivists to arrange the diversity of web content according to their common forms and functions while empowering them to be creative with their time and thoughtful with their labor. It provides a path to exposing important provenance information to users and demonstrates an existing proof of concept. Finally, it outlines a possible integration between ArchivesSpace and Archive-It that is feasible to implement for many archives and would automate the repetitive parts of creating and updating description for new web crawls

Yale University

Methods of collecting Facebook data and their effects on later analysis

Author: Brügger Niels
Laursen Ditte
Sandvik Kjetil
Publication date: 01/01/2013

Copenhagen University Research Information System

Unruly Records: Personal Archives, Sociotechnical Infrastructure, and Archival Practice

Author: Gunn Chelsea
Publication date: 26/08/2020

Personal records have long occupied a complicated space within archival theory and practice. The archival profession, as it is practiced in the United States today, developed with organizational records, such as those created by governments and businesses, in mind. Personal records were considered to fall beyond the bounds of archival work and were primarily cared for by libraries and other cultural heritage institutions. Since the mid-20th century, this divide has become less pronounced, and it has become common to find personal records within archival institutions. As a result of these conditions in the development of the profession, the archivists who work with personal records have had to reconcile the specific characteristics of personal materials with theoretical and practical approaches that were designed not only to accommodate organizational records but to explicitly exclude personal records. These conditions have been further complicated by the continually changing technological landscape in which personal records are now created. As ownership of personal computers, access to the World Wide Web, and the use of networked social platforms have grown, personal records have increasingly come to be created, stored, and accessed within complex socio-technical systems. The infrastructures that support personal digital record creation today precipitate new methods and strategies, and an abundance of new questions, for the archivists who are responsible for collecting and preserving digital cultural heritage. This dissertation considers how both the history of excluding personal records in the archival profession and the socio-technical systems that support contemporary personal record creation impact archival practice today. This research considers archival approaches to working with personal records created within three environments: personal computers, the open web, and networked social platforms. 
Ultimately, this dissertation seeks to reevaluate the role that personal records have previously occupied, and to center the personal in archival practice today

D-Scholarship@Pitt

Web archives: the future

Author: Arthur Thomas
Eric T. Meyer
Ralph Schroeder
Publication venue: International Internet Preservation Consortium (IIPC)

This report is structured first, to engage in some speculative thought about the possible futures of the web as an exercise in prompting us to think about what we need to do now in order to make sure that we can reliably and fruitfully use archives of the web in the future. Next, we turn to considering the methods and tools being used to research the live web, as a pointer to the types of things that can be developed to help understand the archived web. Then, we turn to a series of topics and questions that researchers want or may want to address using the archived web. In this final section, we identify some of the challenges individuals, organizations, and international bodies can target to increase our ability to explore these topics and answer these questions. We end the report with some conclusions based on what we have learned from this exercise

Analysis and Policy Observatory (APO)

Towards a sustainable social media archiving strategy for Belgium: BESOCIAL: WP1 report: an international review of Social Media Archiving initiatives

Author: Birkholz Julie
Chambers Sally
Geeraert Friedel
Lieber Sven
Mechant Peter
Messens Fien
Michel Alejandra
Pranger Jessica
Vlassenroot Evelyne
Publication venue: s.n.
Publication date: 01/01/2021

Repository of the University of Namur

Accessing Web Archives: Integrating an Archive-It Collection into EBSCO Discovery Service

Author: Beis Christina A.
Harris Kayla
Shreffler Stephanie
Publication venue: eCommons
Publication date: 08/06/2019

Effective collaboration between archives and technical services can increase the discoverability of special collection materials. Archivists at the University of Dayton Libraries began using Archive-It to capture websites relevant to their collecting policies in 2015. However, the collections were only made available to users from the University of Dayton page on the Archive-It website. Content was isolated in a separate platform and was not promoted to users. Working together, the team of archivists and technical services librarians incorporated the web archive collections into the Libraries’ EBSCO Discovery Service (EDS) discovery layer. A local data dictionary was created based on OCLC’s Descriptive Metadata for Web Archiving report (2018), and metadata was added at the seed and collection levels. The result was indexed content on a single, user-friendly platform. The web archive collections were then marketed to the University of Dayton community, and statistics were generated on their use

University of Dayton