How to Search the Internet Archive Without Indexing It
Significant parts of our cultural heritage have been produced on the web during the
last decades. While easy access to the current web is a good baseline,
optimal access to the past web faces several challenges. These include dealing
with large-scale web archive collections and the lack of usage logs that contain
implicit human feedback most relevant for today's web search. In this paper, we
propose an entity-oriented search system to support retrieval and analytics on
the Internet Archive. We use Bing to retrieve a ranked list of results from the
current web. In addition, we link retrieved results to the WayBack Machine;
thus allowing keyword search on the Internet Archive without processing and
indexing its raw archived content. Our search system complements existing web
archive search tools through a user-friendly interface, which comes close to
the functionalities of modern web search engines (e.g., keyword search, query
auto-completion and related query suggestion), and provides the great benefit of
also taking user feedback on the current web into account for web archive
search. Through extensive experiments, we conduct quantitative and qualitative
analyses in order to provide insights that enable further research on and
practical applications of web archives
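
The linking step described in the abstract above can be illustrated with a minimal sketch. It assumes the ranked list of current-web URLs has already been obtained (the Bing retrieval itself is not shown), and it relies on the public Wayback Machine address scheme `https://web.archive.org/web/<timestamp>/<url>`. The function names `wayback_url` and `link_results_to_archive` are hypothetical and are not taken from the paper.

```python
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(url: str, timestamp: str) -> str:
    """Build a Wayback Machine address for `url` near `timestamp`.

    `timestamp` is a string of 1 to 14 digits (a prefix of YYYYMMDDhhmmss).
    """
    if not timestamp.isdigit() or len(timestamp) > 14:
        raise ValueError("timestamp must be 1 to 14 digits")
    return f"{WAYBACK_PREFIX}/{timestamp}/{url}"


def link_results_to_archive(ranked_urls, timestamp):
    """Pair each ranked current-web result with an archive address.

    The ranking comes from the current-web search engine, so no archived
    content has to be processed or indexed here.
    """
    return [
        (rank, url, wayback_url(url, timestamp))
        for rank, url in enumerate(ranked_urls, start=1)
    ]


if __name__ == "__main__":
    # Hypothetical ranked results standing in for a search engine response.
    results = ["http://example.com/", "http://example.org/news"]
    for rank, live, archived in link_results_to_archive(results, "2010"):
        print(rank, live, archived)
```

The design point is that ranking is inherited entirely from the current-web results; the archive is only addressed by URL and date.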
Digital and programmable economy applications: A smart cities congestion case by fuzzy sets
Cities currently face great challenges such as population growth, citizen wellbeing, externalities management and environmental deterioration. The search for solutions is making significant inroads into the incorporation of ICT in cities and their subsequent large-scale digitalization, for example through programmable economy (PE) applications, which offer the possibility of developing new approaches to these issues, in particular those related to sustainability management. Operating under a fuzzy numbers methodology and a FIS (Fuzzy Inference System), the present exploratory work shows a new approach to urban congestion management by deploying PE applications, which include some disruptive inputs such as the Internet of value, blockchain/DLT (distributed ledger technology), smart contracts, digital assets and monetization, all of this combined with human motivation
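
As a rough illustration of how a fuzzy inference step can turn a crisp measurement into a graded decision, the sketch below uses triangular fuzzy numbers and weighted-average defuzzification. The input variable (traffic load from 0 to 100), the three fuzzy sets, and the rule outputs are all assumptions made for this example; the abstract does not state the variables or rules used in the paper.

```python
def triangular(x, a, b, c):
    """Membership of x in the triangular fuzzy number (a, b, c), a <= b <= c."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets over traffic load (0..100), each paired with a
# crisp rule output: "if load is <set> then score is <value>".
RULES = [
    ((0, 0, 50), 0.0),      # low load    -> no congestion response
    ((0, 50, 100), 0.5),    # medium load -> moderate response
    ((50, 100, 100), 1.0),  # high load   -> strong response
]


def congestion_score(load):
    """Evaluate all rules and defuzzify by the firing-strength-weighted average."""
    load = min(max(load, 0.0), 100.0)  # clamp so at least one rule fires
    strengths = [(triangular(load, *fuzzy_set), out) for fuzzy_set, out in RULES]
    total = sum(mu for mu, _ in strengths)
    return sum(mu * out for mu, out in strengths) / total


if __name__ == "__main__":
    for load in (0, 25, 75, 100):
        print(load, congestion_score(load))
```

In a PE setting the resulting score could, for instance, parameterize a smart contract, but that connection is only suggested here and is not described in the abstract.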
Secondary Liability and the Fragmentation of Digital Copyright Law
The digital age brought many challenges for copyright law. While offering enticing new formats for the production and dissemination of copyright content, it also raised the specter of large scale digital piracy. Since the end of the 20th century, content industries have reeled to keep up with technological developments that offer significant promise as well as threats of large scale piracy. There has always been some tension between promoting innovation in content creation and promoting innovation in technologies that enable the enjoyment of copyright works, such as photocopiers, audio tape recorders, video tape recorders, and peer-to-peer file sharing systems. The manufacturers and distributors of these technologies have had to tread a fine line in their marketing and distribution efforts to avoid liability for secondary copyright infringement based on direct infringements by their customers. To this list of technologies, we may now add Internet search engines and online payment systems. This paper considers ways in which copyright law has addressed the secondary liability question in an increasingly digital marketplace. It suggests that the realities of this marketplace necessitate a new look at broader policy issues underlying digital copyright law in order to meaningfully address questions of secondary liability online
A scalable Peer-to-Peer System for Music Content and Information Retrieval
Currently a large percentage of internet traffic consists of music files, typically stored in MP3 compressed audio format, shared and exchanged over Peer-to-Peer (P2P) networks. Searching for music is performed by specifying keywords and using naive string matching techniques. In the past years the emerging research area of Music Information Retrieval (MIR) has produced a variety of new ways of looking at the problem of music search. Such MIR techniques can significantly enhance the ways users search for music over P2P networks. For that to happen, two main challenges need to be addressed: 1) scalability to large collections and numbers of peers, and 2) a richer set of search semantics that can support MIR, especially when retrieval is content-based. In this paper, we describe a scalable P2P system that uses Rendezvous Points (RPs) for music metadata registration and query resolution, and that supports attribute-value search semantics as well as content-based retrieval. The performance of the system has been evaluated in large-scale usage scenarios using "real" automatically calculated musical content descriptors
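
The idea of registering metadata at Rendezvous Points and resolving attribute-value queries against them can be sketched as follows. This is a single-process toy under stated assumptions: each attribute-value pair is mapped to an RP by hashing, which is one plausible scheme and not necessarily the one used in the paper, and the class name `RendezvousNetwork` is hypothetical. Content-based retrieval is not covered.

```python
import hashlib
from collections import defaultdict


class RendezvousNetwork:
    """Toy model: each RP stores peers per (attribute, value) pair."""

    def __init__(self, num_rps):
        self.num_rps = num_rps
        self.rps = [defaultdict(set) for _ in range(num_rps)]

    @staticmethod
    def _normalize(attribute, value):
        return attribute.lower(), str(value).lower()

    def _rp_for(self, key):
        # Assumed mapping: hash the pair and pick an RP by modulo.
        digest = hashlib.sha1("=".join(key).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_rps

    def register(self, peer_id, metadata):
        """Register every attribute-value pair of a file held by `peer_id`."""
        for attribute, value in metadata.items():
            key = self._normalize(attribute, value)
            self.rps[self._rp_for(key)][key].add(peer_id)

    def query(self, **constraints):
        """Return peers matching all attribute-value constraints."""
        result = None
        for attribute, value in constraints.items():
            key = self._normalize(attribute, value)
            peers = self.rps[self._rp_for(key)].get(key, set())
            result = set(peers) if result is None else result & peers
        return result or set()


if __name__ == "__main__":
    net = RendezvousNetwork(num_rps=4)
    net.register("peerA", {"artist": "Miles Davis", "genre": "jazz"})
    net.register("peerB", {"artist": "John Coltrane", "genre": "jazz"})
    print(net.query(genre="jazz", artist="miles davis"))
```

Because a query only contacts the RPs responsible for its constraints, the lookup cost depends on the number of constraints rather than on the number of peers, which is the scalability property the abstract is concerned with.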
When Things Matter: A Data-Centric View of the Internet of Things
With the recent advances in radio-frequency identification (RFID), low-cost
wireless sensor devices, and Web technologies, the Internet of Things (IoT)
approach has gained momentum in connecting everyday objects to the Internet and
facilitating machine-to-human and machine-to-machine communication with the
physical world. While IoT offers the capability to connect and integrate both
digital and physical entities, enabling a whole new class of applications and
services, several significant challenges need to be addressed before these
applications and services can be fully realized. A fundamental challenge
centers around managing IoT data, typically produced in dynamic and volatile
environments, which is not only extremely large in scale and volume, but also
noisy and continuous. This article surveys the main techniques and
state-of-the-art research efforts in IoT from data-centric perspectives,
including data stream processing, data storage models, complex event
processing, and searching in IoT. Open research issues for IoT data management
are also discussed
CHORUS Deliverable 4.5: Report of the 3rd CHORUS Conference
The third and last CHORUS conference on Multimedia Search Engines took place from the 26th to the 27th of May 2009 in Brussels, Belgium. About 100 participants from 15 European countries, the US, Japan and Australia learned about the latest developments in the domain. An exhibition of 13 stands presented 16 research projects currently ongoing around the world
Scuttling Web Opportunities By Application Cramming
The web contains a vast amount of data spread across innumerable websites, which are monitored by a tool or program known as a crawler. The main goal of this paper is to focus on web forum crawling techniques. The paper discusses the various techniques of web forum crawling and the challenges they face, and gives an overview of web crawling and web forums. The internet is growing exponentially, and it is now difficult to retrieve relevant information from it. The rapid growth of the internet poses unprecedented scaling challenges for general-purpose crawlers and search engines. In this paper, we present a novel Forum Crawler under Supervision (FoCUS) method, a supervised internet-scale forum crawler. The intention of FoCUS is to crawl relevant forum information from the internet with minimal overhead: the crawler selectively seeks out pages that are pertinent to a predefined set of topics, rather than collecting and indexing all accessible web documents in order to answer all possible ad-hoc questions. FoCUS crawls the internet continuously and detects pages that have been added to or removed from it. Because of the growing and vibrant activity of the internet, it has become more challenging to navigate and handle all the URLs in web documents. We take one seed URL as input and search with a keyword; the search result is based on the keyword, and the crawler fetches the internet pages where it finds that keywor
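
The seed-and-keyword procedure mentioned at the end of the abstract above can be sketched as a breadth-first crawl that keeps only pages containing the keyword. This is a generic illustration, not the FoCUS method itself: the `fetch` callback, the `max_pages` limit, and the in-memory `TOY_WEB` pages are assumptions introduced so that the example runs without network access.

```python
from collections import deque


def crawl(seed, keyword, fetch, max_pages=100):
    """Breadth-first crawl from `seed`, returning URLs whose text has `keyword`.

    `fetch(url)` must return a (text, links) tuple, or None if the page
    is unavailable.
    """
    keyword = keyword.lower()
    queue = deque([seed])
    seen = {seed}          # URLs already queued, to avoid revisiting
    matches = []
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        page = fetch(url)
        if page is None:   # removed or unreachable page
            continue
        visited += 1
        text, links = page
        if keyword in text.lower():
            matches.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return matches


# Hypothetical in-memory forum used in place of real HTTP requests.
TOY_WEB = {
    "http://forum.example/": ("Welcome to the forum index",
                              ["http://forum.example/t1", "http://forum.example/t2"]),
    "http://forum.example/t1": ("Thread about crawler design",
                                ["http://forum.example/", "http://forum.example/t3"]),
    "http://forum.example/t2": ("Thread about cooking", []),
    "http://forum.example/t3": ("More crawler tips", ["http://forum.example/missing"]),
}

if __name__ == "__main__":
    print(crawl("http://forum.example/", "crawler", TOY_WEB.get))
```

A topic-focused crawler would typically go further and decide which links to follow based on relevance; here every discovered link is followed and only the reported pages are filtered.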
Moving from Data-Constrained to Data-Enabled Research: Experiences and Challenges in Collecting, Validating and Analyzing Large-Scale e-Commerce Data
Widespread e-commerce activity on the Internet has led to new opportunities
to collect vast amounts of micro-level market and nonmarket data. In this paper
we share our experiences in collecting, validating, storing and analyzing large
Internet-based data sets in the area of online auctions, music file sharing and
online retailer pricing. We demonstrate how such data can advance knowledge by
facilitating sharper and more extensive tests of existing theories and by
offering observational underpinnings for the development of new theories. Just
as experimental economics pushed the frontiers of economic thought by enabling
the testing of numerous theories of economic behavior in the environment of a
controlled laboratory, we believe that observing, often over extended periods
of time, real-world agents participating in market and nonmarket activity on
the Internet can lead us to develop and test a variety of new theories.
Internet data gathering is not controlled experimentation. We cannot randomly
assign participants to treatments or determine event orderings. Internet data
gathering does offer potentially large data sets with repeated observation of
individual choices and action. In addition, the automated data collection holds
promise for greatly reduced cost per observation. Our methods rely on
technological advances in automated data collection agents. Significant
challenges remain in developing appropriate sampling techniques, integrating
data from heterogeneous sources in a variety of formats, constructing
generalizable processes and understanding legal constraints. Despite these
challenges, the early evidence from those who have harvested and analyzed large
amounts of e-commerce data points toward a significant leap in our ability to
understand the functioning of electronic commerce. Comment: Published at http://dx.doi.org/10.1214/088342306000000231 in the
Statistical Science (http://www.imstat.org/sts/) by the Institute of
Mathematical Statistics (http://www.imstat.org