Search CORE


How to Search the Internet Archive Without Indexing It

Author: Kanhabua Nattiya
Kemkes Philipp
Nejdl Wolfgang
Nguyen Tu Ngoc
Reis Felipe
Tran Nam Khanh
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Significant parts of our cultural heritage have been produced on the web over the last decades. While easy access to the current web is a good baseline, optimal access to the past web faces several challenges. These include dealing with large-scale web archive collections and the lack of usage logs, which contain the implicit human feedback most relevant to today's web search. In this paper, we propose an entity-oriented search system to support retrieval and analytics on the Internet Archive. We use Bing to retrieve a ranked list of results from the current web. In addition, we link the retrieved results to the Wayback Machine, thus allowing keyword search on the Internet Archive without processing and indexing its raw archived content. Our search system complements existing web archive search tools through a user-friendly interface that comes close to the functionality of modern web search engines (e.g., keyword search, query auto-completion and related query suggestion), and it has the benefit of taking user feedback on the current web into account for web archive search as well. Through extensive experiments, we conduct quantitative and qualitative analyses that provide insights enabling further research on, and practical applications of, web archives.
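
As an illustration of the linking step only, the sketch below maps live-web result URLs to Wayback Machine links. It assumes the public URL pattern `https://web.archive.org/web/<timestamp>/<url>`, where `<timestamp>` is a `YYYYMMDDhhmmss` value (the archive redirects to the nearest capture) and `*` requests the list of all captures of that URL. The helper names `wayback_url` and `link_results` are hypothetical, and the paper's own linking may work differently.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# Assumed public Wayback Machine URL prefix.
WAYBACK_PREFIX = "https://web.archive.org/web/"


def wayback_url(url: str, when: Optional[datetime] = None) -> str:
    """Map a live-web result URL to a Wayback Machine link (hypothetical helper).

    With a datetime, the link asks for the capture closest to that moment
    (YYYYMMDDhhmmss timestamp). Without one, '*' requests the listing of
    all captures of the URL.
    """
    stamp = when.strftime("%Y%m%d%H%M%S") if when else "*"
    return f"{WAYBACK_PREFIX}{stamp}/{url}"


def link_results(ranked_urls: List[str],
                 when: Optional[datetime] = None) -> List[Tuple[int, str, str]]:
    """Keep the live-web ranking (e.g. from a Bing query) and attach archive links."""
    return [(rank, url, wayback_url(url, when))
            for rank, url in enumerate(ranked_urls, start=1)]
```

For example, `link_results(["http://example.com/"], datetime(2010, 5, 1))` keeps the live-web rank 1 and pairs it with `https://web.archive.org/web/20100501000000/http://example.com/`. Reusing the current-web ranking is what lets the archive be searched without building an index over its content.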

arXiv.org e-Print Archive

VBN

Digital and programmable economy applications: A smart cities congestion case by fuzzy sets

Author: Arroyo Cañada Francisco Javier
Herraiz Faixó Ferran
Lauroba Pérez Ana M.
López-Jurado González María Pilar
Publication venue: IOS Press
Publication date: 09/03/2022

Cities currently face great challenges such as population growth, citizen wellbeing, the management of externalities and environmental deterioration. The search for solutions is driving the incorporation of ICT into cities and their subsequent large-scale digitalization, for example through programmable economy (PE) applications, which offer the possibility of developing new approaches to these issues, in particular those related to sustainability management. Using fuzzy numbers and a fuzzy inference system (FIS), this exploratory work presents a new approach to urban congestion management that deploys PE applications. These include disruptive inputs such as the Internet of value, blockchain/DLT (distributed ledger technology), smart contracts, digital assets and monetization, all combined with human motivation.
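
The abstract does not specify the system's inputs, rules or membership functions, so the following is only a generic sketch of a fuzzy inference system built from triangular fuzzy numbers, using zero-order Sugeno rules (min for AND, weighted-average defuzzification). The inputs `occupancy` and `incentive` (a hypothetical PE reward level), their [0, 1] ranges, the fuzzy sets `LOW` and `HIGH` and the rule outputs are all assumptions for illustration, not the authors' model.

```python
# Hypothetical triangular fuzzy numbers (a, b, c) over an assumed [0, 1] range.
LOW = (0.0, 0.0, 0.6)
HIGH = (0.4, 1.0, 1.0)


def tri(x: float, a: float, b: float, c: float) -> float:
    """Membership of x in the triangular fuzzy number (a, b, c), a <= b <= c."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)  # rising edge; b > a is guaranteed here
    return (c - x) / (c - b)      # falling edge; c > b is guaranteed here


def congestion_index(occupancy: float, incentive: float) -> float:
    """Zero-order Sugeno inference over a toy, hypothetical rule base.

    Each rule pairs a firing strength (min implements AND) with a crisp
    output; the result is the strength-weighted average of the outputs.
    """
    rules = [
        # IF occupancy is LOW THEN index = 0.1
        (tri(occupancy, *LOW), 0.1),
        # IF occupancy is HIGH AND incentive is LOW THEN index = 0.9
        (min(tri(occupancy, *HIGH), tri(incentive, *LOW)), 0.9),
        # IF occupancy is HIGH AND incentive is HIGH THEN index = 0.5
        (min(tri(occupancy, *HIGH), tri(incentive, *HIGH)), 0.5),
    ]
    total = sum(strength for strength, _ in rules)
    if total == 0:
        raise ValueError("no rule fired; inputs are outside the modelled range")
    return sum(strength * out for strength, out in rules) / total
```

With high occupancy, raising the incentive from 0 to 1 lowers the index from 0.9 to 0.5 in this toy rule base. A Mamdani system with fuzzy output sets would be an equally valid design choice.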

Diposit Digital de la Universitat de Barcelona

Secondary Liability and the Fragmentation of Digital Copyright Law

Author: Lipton Jacqueline D.
Publication venue: IdeaExchange@UAkron
Publication date: 25/03/2016

The digital age brought many challenges for copyright law. While offering enticing new formats for the production and dissemination of copyright content, it also raised the specter of large-scale digital piracy. Since the end of the 20th century, content industries have struggled to keep up with technological developments that offer significant promise as well as threats of large-scale piracy. There has always been some tension between promoting innovation in content creation and promoting innovation in technologies that enable the enjoyment of copyright works, such as photocopiers, audio tape recorders, video tape recorders, and peer-to-peer file sharing systems. The manufacturers and distributors of these technologies have had to tread a fine line in their marketing and distribution efforts to avoid liability for secondary copyright infringement based on direct infringements by their customers. To this list of technologies, we may now add Internet search engines and online payment systems. This paper considers ways in which copyright law has addressed the secondary liability question in an increasingly digital marketplace. It suggests that the realities of this marketplace necessitate a new look at broader policy issues underlying digital copyright law in order to meaningfully address questions of secondary liability online.

The University of Akron

A scalable Peer-to-Peer System for Music Content and Information Retrieval

Author: George Tzanetakis
Jun Gao
Peter Steenkiste
Publication venue: Johns Hopkins University
Publication date: 01/01/2003

Currently, a large percentage of Internet traffic consists of music files, typically stored in the MP3 compressed audio format, that are shared and exchanged over Peer-to-Peer (P2P) networks. Searching for music is performed by specifying keywords and using naive string matching techniques. In recent years, the emerging research area of Music Information Retrieval (MIR) has produced a variety of new ways of looking at the problem of music search. Such MIR techniques can significantly enhance the ways users search for music over P2P networks. For that to happen, two main challenges need to be addressed: 1) scalability to large collections and numbers of peers, and 2) a richer set of search semantics that can support MIR, especially when retrieval is content-based. In this paper, we describe a scalable P2P system that uses Rendezvous Points (RPs) for music metadata registration and query resolution and that supports attribute-value search semantics as well as content-based retrieval. The performance of the system has been evaluated in large-scale usage scenarios using "real" automatically calculated musical content descriptors.
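
A common way to realize rendezvous points, and the assumption behind this sketch, is to hash each attribute-value pair onto a ring of peer identifiers so that registrations and queries for the same pair meet at the same peer. The class name `RendezvousRing`, the use of SHA-1 and the metadata fields are illustrative assumptions; the paper's actual mapping and its content-based retrieval over continuous descriptors are not reproduced here.

```python
import bisect
import hashlib
from collections import defaultdict


def _h(key: str) -> int:
    """Map a string to a 160-bit integer position on the ring (SHA-1)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


class RendezvousRing:
    """Hypothetical attribute-value rendezvous layer over a hash ring."""

    def __init__(self, peers):
        # Sort peers by ring position so lookups can use binary search.
        self._ring = sorted((_h(p), p) for p in peers)
        self._keys = [pos for pos, _ in self._ring]
        # peer -> (attribute, value) -> ids of songs registered there
        self.index = defaultdict(lambda: defaultdict(set))

    def rp_for(self, attribute: str, value: str) -> str:
        """Peer responsible for a pair: first peer clockwise from its hash."""
        pos = _h(f"{attribute}={value}")
        i = bisect.bisect_left(self._keys, pos) % len(self._keys)  # wrap around
        return self._ring[i][1]

    def register(self, song_id: str, metadata: dict) -> None:
        """Register each attribute-value pair of a song at its rendezvous point."""
        for attr, val in metadata.items():
            self.index[self.rp_for(attr, val)][(attr, val)].add(song_id)

    def query(self, attribute: str, value: str) -> set:
        """Resolve a query by contacting only the pair's rendezvous point."""
        peer = self.rp_for(attribute, value)
        return set(self.index.get(peer, {}).get((attribute, value), set()))

    def query_all(self, **conditions) -> set:
        """Conjunctive attribute-value query: intersect the per-pair results."""
        sets = [self.query(a, v) for a, v in conditions.items()]
        return set.intersection(*sets) if sets else set()
```

Registering a song once per attribute-value pair means a query such as `ring.query_all(genre="jazz", tempo="slow")` contacts only two peers instead of flooding the network; the ring placement is a design assumption here, not a detail confirmed by the abstract.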

CiteSeerX

JScholarship

When Things Matter: A Data-Centric View of the Internet of Things

Author: Dustdar Schahram
Falkner Nickolas J. G.
Qin Yongrui
Sheng Quan Z.
Vasilakos Athanasios V.
Wang Hua
Publication venue
Publication date: 01/01/2014

With recent advances in radio-frequency identification (RFID), low-cost wireless sensor devices, and Web technologies, the Internet of Things (IoT) approach has gained momentum in connecting everyday objects to the Internet and facilitating machine-to-human and machine-to-machine communication with the physical world. While IoT offers the capability to connect and integrate both digital and physical entities, enabling a whole new class of applications and services, several significant challenges need to be addressed before these applications and services can be fully realized. A fundamental challenge centers on managing IoT data, which is typically produced in dynamic and volatile environments and is not only extremely large in scale and volume but also noisy and continuous. This article surveys the main techniques and state-of-the-art research efforts in IoT from a data-centric perspective, including data stream processing, data storage models, complex event processing, and searching in IoT. Open research issues for IoT data management are also discussed.

arXiv.org e-Print Archive

Victoria University Eprints Repository

CHORUS Deliverable 4.5: Report of the 3rd CHORUS Conference

Author: Boujemaa Nozha
Compañó Ramón
Dosch Christoph
Geurts Joost
Karlgren Jussi
Kauber Markus
Köhler Joachim
Ortgies Robert
Sebe Nicu
Publication venue: Chorus Project Consortium
Publication date: 01/01/2009

The third and final CHORUS conference on Multimedia Search Engines took place from the 26th to the 27th of May 2009 in Brussels, Belgium. About 100 participants from 15 European countries, the US, Japan and Australia learned about the latest developments in the field. An exhibition of 13 stands presented 16 ongoing research projects from around the world.

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Scuttling Web Opportunities By Application Cramming

Author: Alahari Hanumat Prasad
Dhulipalla Vijaya Sree
Publication venue: Kakinada Institute of Engineering and Technology for Women
Publication date: 28/10/2014

The web contains vast amounts of data spread across innumerable websites, which are monitored by a tool or program known as a crawler. The main goal of this paper is to focus on web forum crawling techniques. It discusses the various techniques used by web forum crawlers and the challenges of crawling, and it gives an overview of web crawling and web forums. The Internet is growing exponentially, and it has become difficult to retrieve relevant information from it. This rapid growth poses unprecedented scaling challenges for general-purpose crawlers and search engines. In this paper, we present a novel Forum Crawler under Supervision (FoCUS) method, a supervised, Internet-scale forum crawler. FoCUS aims to crawl relevant forum information from the Internet with minimal overhead: rather than collecting and indexing all accessible web documents in order to answer every possible ad hoc query, it selectively seeks out pages that are pertinent to a predefined set of topics. FoCUS crawls the Internet continuously, finding pages that have been newly added and pages that have been removed. Because of the Internet's growth and dynamic activity, it has become more challenging to navigate all the URLs in web documents and to handle them. We take one seed URL and a keyword as input; the search result is based on the keyword, and the crawler fetches the pages in which it finds that keywor
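
The keyword-driven crawl described at the end of the abstract can be sketched as a breadth-first focused crawl: starting from a seed URL, only pages containing the keyword are kept and have their outlinks expanded. To stay self-contained, the sketch crawls a hypothetical in-memory link graph through a caller-supplied `fetch` function instead of fetching live pages, and the `max_pages` budget is an assumption. It does not reproduce the supervised FoCUS method itself.

```python
from collections import deque
from typing import Callable, List, Optional, Tuple

# fetch(url) -> (page text, outlinks), or None if the page is unreachable.
Fetch = Callable[[str], Optional[Tuple[str, List[str]]]]


def focused_crawl(seed: str, keyword: str, fetch: Fetch,
                  max_pages: int = 100) -> List[str]:
    """Breadth-first crawl from `seed` that follows links only out of pages
    containing `keyword` (case-insensitive). Returns matching URLs in visit order."""
    keyword = keyword.lower()
    seen = {seed}
    frontier = deque([seed])
    matches = []
    fetched = 0
    while frontier and fetched < max_pages:
        url = frontier.popleft()
        page = fetch(url)
        fetched += 1
        if page is None:
            continue  # dead link, e.g. a page removed since it was discovered
        text, links = page
        if keyword not in text.lower():
            continue  # off-topic: do not spend the budget on its outlinks
        matches.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return matches
```

Pruning the outlinks of off-topic pages is what keeps the overhead low in a focused crawl; a real crawler would also need politeness delays, robots.txt handling and periodic re-crawls to detect removed pages.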

International Journal of Science Engineering and Advance Technology (IJSEAT)

Moving from Data-Constrained to Data-Enabled Research: Experiences and Challenges in Collecting, Validating and Analyzing Large-Scale e-Commerce Data

Author: Bapna Ravi
Goes Paulo
Gopal Ram
Marsden James R.
Publication venue: Institute of Mathematical Statistics
Publication date: 01/05/2006

Widespread e-commerce activity on the Internet has led to new opportunities to collect vast amounts of micro-level market and nonmarket data. In this paper we share our experiences in collecting, validating, storing and analyzing large Internet-based data sets in the areas of online auctions, music file sharing and online retailer pricing. We demonstrate how such data can advance knowledge by facilitating sharper and more extensive tests of existing theories and by offering observational underpinnings for the development of new theories. Just as experimental economics pushed the frontiers of economic thought by enabling the testing of numerous theories of economic behavior in the environment of a controlled laboratory, we believe that observing, often over extended periods of time, real-world agents participating in market and nonmarket activity on the Internet can lead us to develop and test a variety of new theories. Internet data gathering is not controlled experimentation. We cannot randomly assign participants to treatments or determine event orderings. Internet data gathering does offer potentially large data sets with repeated observation of individual choices and actions. In addition, automated data collection holds promise for greatly reduced cost per observation. Our methods rely on technological advances in automated data collection agents. Significant challenges remain in developing appropriate sampling techniques, integrating data from heterogeneous sources in a variety of formats, constructing generalizable processes and understanding legal constraints.
Despite these challenges, the early evidence from those who have harvested and analyzed large amounts of e-commerce data points toward a significant leap in our ability to understand the functioning of electronic commerce.
Comment: Published at http://dx.doi.org/10.1214/088342306000000231 in the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

Crossref

Caltech Authors