57,075 research outputs found
Creation of a Style Independent Intelligent Autonomous Citation Indexer to Support Academic Research
This paper describes the current state of RUgle, a system for
classifying and indexing papers made available on the
World Wide Web, in a domain-independent and universal
manner. By building RUgle with the most relaxed
restrictions possible on the formatting of the documents it
can process, we hope to create a system that can combine
the best features of currently available closed library
searches that are designed to facilitate academic research
with the inclusive nature of general purpose search engines
that continually crawl the web and add documents to their
indexed database
Melody based tune retrieval over the World Wide Web
In this paper we describe the steps taken to develop a Web-based version of an existing stand-alone, single-user digital library application for melodical searching of a collection of music. For the three key components: input, searching, and output, we assess the suitability of various Web-based strategies that deal with the now distributed software architecture and explain the decisions we made. The resulting melody indexing service, known as MELDEX, has been in operation for one year, and the feed-back we have received has been favorable
Indexing relations on the web
Journal ArticleThere has been a substantial increase in the volume of (semi) structured data on the Web. This opens new opportunities for exploring and querying these data that goes beyond the keyword-based queries traditionally used on the Web. But supporting queries over a very large number of apparently disconnected Web sources is challenging. In this paper we propose index methods that capture both the structure of the sources and connections between them. The indexes are designed for data that is represented as relations, such as HTML tables, and support queries with predicates. We show how associations between overlapping sources are discovered, captured in the indexes, and used to derive query rewritings that join multiple sources. We demonstrate, through an experimental evaluation
A Brief History of Web Crawlers
Web crawlers visit internet applications, collect data, and learn about new
web pages from visited pages. Web crawlers have a long and interesting history.
Early web crawlers collected statistics about the web. In addition to
collecting statistics about the web and indexing the applications for search
engines, modern crawlers can be used to perform accessibility and vulnerability
checks on the application. Quick expansion of the web, and the complexity added
to web applications have made the process of crawling a very challenging one.
Throughout the history of web crawling many researchers and industrial groups
addressed different issues and challenges that web crawlers face. Different
solutions have been proposed to reduce the time and cost of crawling.
Performing an exhaustive crawl is a challenging question. Additionally
capturing the model of a modern web application and extracting data from it
automatically is another open question. What follows is a brief history of
different technique and algorithms used from the early days of crawling up to
the recent days. We introduce criteria to evaluate the relative performance of
web crawlers. Based on these criteria we plot the evolution of web crawlers and
compare their performanc
- …