Libraries and Museums in the Flat World: Are They Becoming Virtual Destinations?
In his recent book, “The World is Flat”, Thomas L. Friedman reviews the impact of networks on globalization. The emergence of the Internet, web browsers, computer applications talking to each other through the Internet, and open-source software, among others, made the world flatter and created an opportunity for individuals to collaborate and compete globally. Friedman predicts that “connecting all the knowledge centers on the planet together into a single global network…could usher in an amazing era of prosperity and innovation”. Networking is also changing the ways in which libraries and museums provide access to information sources and services. In the flat world, libraries and museums are no longer only a physical “place”: they are becoming “virtual destinations”. This paper discusses the implications of this transformation for the digitization and preservation of, and access to, cultural heritage resources.
JISC Preservation of Web Resources (PoWR) Handbook
Handbook of web preservation produced by the JISC-PoWR project, which ran from April to November 2008.
The handbook specifically addresses digital preservation issues that are relevant to the UK HE/FE web management community.
The project was undertaken jointly by UKOLN at the University of Bath and the ULCC Digital Archives department.
Scraping SERPs for Archival Seeds: It Matters When You Start
Event-based collections are often started with a web search, but the search
results you find on Day 1 may not be the same as those you find on Day 7. In
this paper, we consider collections that originate from extracting URIs
(Uniform Resource Identifiers) from Search Engine Result Pages (SERPs).
Specifically, we seek to provide insight about the retrievability of URIs of
news stories found on Google, and to answer two main questions: first, can one
"refind" the same URI of a news story (for the same query) from Google after a
given time? Second, what is the probability of finding a story on Google over a
given period of time? To answer these questions, we issued seven queries to
Google every day for over seven months (2017-05-25 to 2018-01-12) and collected
links from the first five SERPs to generate seven collections for each query.
The queries represent public interest stories: "healthcare bill," "manchester
bombing," "london terrorism," "trump russia," "travel ban," "hurricane harvey,"
and "hurricane irma." We tracked each URI in all collections over time to
estimate the discoverability of URIs from the first five SERPs. Our results
showed that the daily average rate at which stories were replaced on the
default Google SERP ranged from 0.21 - 0.54, and the weekly rate from 0.39 -
0.79, suggesting the fast replacement of older stories by newer ones. The
probability of finding the same URI of a news story after one day from the
initial appearance on the SERP ranged from 0.34 - 0.44. After a week, the
probability of finding the same news stories diminishes rapidly to 0.01 - 0.11.
Our findings suggest that due to the difficulty in retrieving the URIs of news
stories from Google, collection building that originates from search engines
should begin as soon as possible in order to capture the first stages of
events, and should persist in order to capture the evolution of the events...Comment: This is an extended version of the ACM/IEEE Joint Conference on
Digital Libraries (JCDL 2018) full paper:
https://doi.org/10.1145/3197026.3197056. Some of the figure numbers have
change
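The two measurements described in the abstract, the daily replacement rate and the probability of refinding a URI after a given lag, can be computed roughly as follows. This is a minimal sketch: the `serps_by_day` layout and the function names are illustrative assumptions, not taken from the paper's actual code.

```python
# Sketch of the two SERP-turnover measurements. `serps_by_day` maps an
# ISO date string to the set of URIs extracted from the first five SERPs
# for one query on that day (a hypothetical layout for illustration).

def daily_replacement_rate(serps_by_day):
    """Fraction of day-N URIs absent on day N+1, averaged over all days."""
    days = sorted(serps_by_day)
    rates = []
    for prev, curr in zip(days, days[1:]):
        old = serps_by_day[prev]
        if old:
            rates.append(len(old - serps_by_day[curr]) / len(old))
    return sum(rates) / len(rates) if rates else 0.0

def refind_probability(serps_by_day, lag):
    """Probability that a URI seen on one day is still present `lag` days later."""
    days = sorted(serps_by_day)
    hits = total = 0
    for i, day in enumerate(days[:-lag] if lag else days):
        later = serps_by_day[days[i + lag]]
        for uri in serps_by_day[day]:
            total += 1
            hits += uri in later
    return hits / total if total else 0.0
```

A replacement rate near 0.5 would mean roughly half of one day's stories are gone from the default SERP by the next day, matching the range the study reports.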
Data Scraping as a Cause of Action: Limiting Use of the CFAA and Trespass in Online Copying Cases
In recent years, online platforms have used claims such as the Computer Fraud and Abuse Act (“CFAA”) and trespass to curb data scraping, or copying of web content accomplished using robots or web crawlers. However, as the term “data scraping” implies, the content typically copied is data or information that is not protected by intellectual property law, and the means by which the copying occurs is not considered to be hacking. Trespass and the CFAA are both concerned with authorization, but in data scraping cases, these torts are used in such a way that implies that real property norms exist on the Internet, a misleading and harmful analogy.
To correct this imbalance, the CFAA must be interpreted in its native context, that of computers, computer networks, and the Internet, and given contextual meaning. Alternatively, the CFAA should be amended. Because data scraping is fundamentally copying, copyright offers the correct means for litigating data scraping cases. This Note additionally offers proposals for creating enforceable terms of service online and for strengthening copyright to make it applicable to user-based online platforms.
CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines
Based on the information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS Think-Tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and socio-economic perspective.
The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives for measuring the performance of multimedia search engines.
From a socio-economic perspective, we inventory the impact and legal consequences of these technical advances and point out future directions of research.
Online Search And Society: Could Your Best Friend Be Your Worst Enemy?
Online search is becoming the main source individuals
use to find information about sports, politics, health, religion,
world issues, and other subjects that shape our views on
the world and how we live our lives. Of all internet users, 92%
use online search, and they do so on both desktop and mobile,
with an average of 129 searches per month per person.
Search is designed to keep users engaged and serviced with
speed and brevity. As search engine usage increases around
the world and its impact on behaviours becomes more of a
concern, we must understand how the design of search
engine algorithms might be affecting society’s ability to shape the
way we see the world. Is commerce compromising community
in user experience and design? Are we unknowingly being
sent into echo chambers by predictive and personalized
search algorithms? Is the fast and wide internet actually narrowing
the doors of perception we have been walking through
online for the last 30 years?
It is the right time for thorough exploratory research to better
understand the current and potential future impacts and
implications of search on society and citizens. I will employ
a literature review, first party participant research and document
a chronology of knowledge discovery and capture
in context to searching, sharing and storing of information,
along with a horizon scanning exercise with a focus on trends
research. The first-party human-based research will involve
the segmentation of Digital Natives and Digital Immigrants to
explore whether there are patterns emerging within distinct
age groups. These methods will be deployed and findings will
be analyzed to ascertain what the issues might be and whether
people understand the complexities, powers, and abilities
of search engines.
The Corpus Expansion Toolkit: finding what we want on the web
This thesis presents the Corpus Expansion Toolkit (CET), a generally applicable toolkit that allows researchers to build domain-specific corpora from the web. The main purpose of the work presented in this thesis, and of the development of the CET, is to provide a solution to discovering desired content on the web from possibly unknown locations or a poorly defined domain. Using an iterative process, the CET is able to solve the problem of discovering domain-specific online content and expand a corpus using only a very small number of example documents or characteristic phrases taken from the target domain. Using a human-in-the-loop strategy and a chain of discrete software components, the CET also allows the concept of a domain to be iteratively defined using the very online resources used to expand the original corpus. The CET combines feature extraction, search, web crawling, and machine learning methods to collect, store, filter, and perform information extraction on collected documents. Using a small number of example ‘seed’ documents, the CET is able to expand the original corpus by finding more relevant documents from the web, and it provides a number of tools to support their analysis. This thesis presents a case-study-based methodology that introduces the various contributions and components of the CET through the discussion of five case studies, covering a wide variety of domains and requirements to which the CET has been applied. These case studies illustrate three main use cases, listed below, where the CET is applicable:
1. Domain known – source known
2. Domain known – source unknown
3. Domain unknown – source unknown
First, use cases where the sites for document collection are known and the topic of research is clearly defined. Second, instances where the topic of research is clearly defined but where to find relevant documents on the web is unknown. Third, the most extreme use case, where the domain is poorly defined or unknown to the researcher and the location of the information is also unknown. This thesis presents a solution that allows researchers to begin with very little information on a specific topic, iteratively build a clear conception of a domain, and translate it into a computational system.
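The iterative expand-and-filter loop the abstract describes can be sketched as follows. This is a simplified illustration under stated assumptions: the frequency-based term extraction, the pluggable `search` hook, and the term-overlap relevance filter are stand-ins for the CET's actual feature-extraction, crawling, and machine-learning components.

```python
# Illustrative sketch of seed-driven corpus expansion: extract characteristic
# terms from the current corpus, search for candidates, keep relevant ones,
# and repeat. Not the CET's real implementation.
from collections import Counter
import re

def top_terms(docs, n=10):
    """The n most frequent words (4+ letters) across the current corpus."""
    counts = Counter()
    for doc in docs:
        counts.update(re.findall(r"[a-z]{4,}", doc.lower()))
    return [term for term, _ in counts.most_common(n)]

def relevance(doc, terms):
    """Fraction of characteristic terms present in a candidate document."""
    words = set(re.findall(r"[a-z]{4,}", doc.lower()))
    return sum(t in words for t in terms) / len(terms)

def expand_corpus(seeds, search, rounds=3, threshold=0.3):
    """Iteratively grow the corpus from seed documents."""
    corpus = list(seeds)
    for _ in range(rounds):
        terms = top_terms(corpus)
        candidates = search(terms)  # stand-in for SERP scraping / crawling
        added = [d for d in candidates
                 if relevance(d, terms) >= threshold and d not in corpus]
        if not added:
            break  # the domain definition has stabilized
        corpus.extend(added)
    return corpus
```

Recomputing the characteristic terms each round is what lets the growing corpus itself refine the definition of the domain, mirroring the human-in-the-loop refinement the thesis describes.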