Search CORE

498,984 research outputs found

Query-related data extraction of hidden web documents

Author: Hedley Y.
James A.
Sanderson M.
Younas M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2004
Field of study

The larger amount of information on the Web is stored in document databases and is not indexed by general-purpose search engines (i.e., Google and Yahoo). Such information is dynamically generated through querying databases — which are referred to as Hidden Web databases. Documents returned in response to a user query are typically presented using templategenerated Web pages. This paper proposes a novel approach that identifies Web page templates by analysing the textual contents and the adjacent tag structures of a document in order to extract query-related data. Preliminary results demonstrate that our approach effectively detects templates and retrieves data with high recall and precision

Crossref

White Rose Research Online

Information extraction from template-generated hidden web documents

Author: Hedley Y.
James A.
Sanderson M.
Younas M.
Publication venue
Publication date: 01/01/2004
Field of study

The larger amount of information on the Web is stored in document databases and is not indexed by general-purpose search engines (such as Google and Yahoo). Databases dynamically generate a list of documents in response to a user query – which are referred to as Hidden Web databases. Such documents are typically presented to users as templategenerated Web pages. This paper presents a new approach that identifies Web page templates in order to extract queryrelated information from documents. We propose two forms of representation to analyse the content of a document – Text with Immediate Adjacent Tag Segments (TIATS) and Text with Neighbouring Adjacent Tag Segments (TNATS). Our techniques exploit tag structures that surround the textual contents of documents in order to detect Web page templates thereby extracting query-related information. Experimental results demonstrate that TNATS detects Web page templates most effectively and extracts information with high recall and precision

White Rose Research Online

A Comparison between Two Main Academic Literature Collections: Web of Science and Scopus Databases

Author: Ale Ebrahim Nader
Publication venue
Publication date: 27/06/2014
Field of study

Nowadays, the world’s scientific community has been publishing an enormous number of papers in different scientific fields. In such environment, it is essential to know which databases are equally efficient and objective for literature searches. It seems that two most extensive databases are Web of Science and Scopus. Besides searching the literature, these two databases used to rank journals in terms of their productivity and the total citations received to indicate the journals impact, prestige or influence. This article attempts to provide a comprehensive comparison of these databases to answer frequent questions which researchers ask, such as: How Web of Science and Scopus are different? In which aspects these two databases are similar? Or, if the researchers are forced to choose one of them, which one should they prefer? For answering these questions, these two databases will be compared based on their qualitative and quantitative characteristics.Cite as: Aghaei Chadegani, A., Salehi, H., Yunus, M. M., Farhadi, H., Fooladi, M., Farhadi, M., & Ale Ebrahim, N. (2013). A Comparison between Two Main Academic Literature Collections: Web of Science and Scopus Databases. Asian Social Science, 9(5), 18-26. doi: 10.5539/ass.v9n5p1

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Poster Presentation: Xcerpt and XChange – Logic Programming Languages for Querying and Evolution on the Web

Author: Bry François
Patranjan Paula-Lavinia
Schaffert Sebastian
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2004
Field of study

age Xcerpt and provides advanced, Web-specific capabilities, such as propagation of changes on the Web (change) and event-based communications between Web sites (exchange). Xcerpt: Querying Data on the Web Xcerpt is a declarative, rule-based query language for Web data (i.e. XML documents or semistructured databases) based on logic programming. An Xcerpt program contains at least one goal and some (maybe zero) rules. Rules and goals consist of query and construction patterns, called terms in analogy to other logic programming languages. Terms represent tree-like (or graph-like) structures. The children of a node may be either ordered (as in standard XML) or unordered (as is common in databases). Data terms are used to represent XML documents and the data items of a semistructured database. They are similar to ground functional programming expressions and logical atoms. A database is a (multi-)set of data terms (e.g. the Web). Query terms are patterns matched against Web resources

CiteSeerX

Crossref

Open Access LMU

A Comparison between Two Main Academic Literature Collections: Web of Science and Scopus Databases

Author: Aghaei Chadegani Arezoo (409257)
Ale Ebrahim Nader (409259)
Hadi Farhadi (409258)
Hadi Salehi (409255)
Maryam Farhadi (120516)
Masood Fooladi (120520)
Melor Md Yunus (409256)
Nader Ale Ebrahim (100797)
Publication venue: 'Canadian Center of Science and Education'
Publication date: 01/01/2013
Field of study

Revistas de la Universidad de Valladolid

E-LIS

UM Digital Repository

FigShare

arXiv.org e-Print Archive

Munich RePEc Personal Archive

CiteSeerX

Crossref

Repository of the Academy's Library

DIALNET

A three-year study on the freshness of Web search engine databases

Author: Lewandowski Dirk
Publication venue
Publication date: 01/01/2008
Field of study

This paper deals with one aspect of the index quality of search engines: index freshness. The purpose is to analyse the update strategies of the major Web search engines Google, Yahoo, and MSN/Live.com. We conducted a test of the updates of 40 daily updated pages and 30 irregularly updated pages, respectively. We used data from a time span of six weeks in the years 2005, 2006, and 2007. We found that the best search engine in terms of up-to-dateness changes over the years and that none of the engines has an ideal solution for index freshness. Frequency distributions for the pages’ ages are skewed, which means that search engines do differentiate between often- and seldom-updated pages. This is confirmed by the difference between the average ages of daily updated pages and our control group of pages. Indexing patterns are often irregular, and there seems to be no clear policy regarding when to revisit Web pages. A major problem identified in our research is the delay in making crawled pages available for searching, which differs from one engine to another

E-LIS

REPOSIT

Recommended from our members

Hierarchical classification for multiple, distributed web databases

Author: Yang Hui
Zhang Minjie
Publication venue
Publication date: 01/01/2004
Field of study

The proliferation of online information resources increases the importance of effective and efficient distributed searching. Our research aims to provide an alternative hierarchical categorization and search capability based on a Bayesian network learning algorithm. Our proposed approach, which is grounded on automatic textual analysis of subject content of online web databases, attempts to address the database selection problem by first classifying web databases into a hierarchy of topic categories. The experimental results reported demonstrate that such a classification approach not only effectively reduces the class search space, but also helps to significantly improve the accuracy of classification performance

Open Research Online (The Open University)

White Rose Research Online