226,684 research outputs found

Building a domain-specific document collection for evaluating metadata effects on information retrieval

Authors: Jones Gareth J.F.; Leveling Johannes; Magdy Walid; Min Jinming
Publication venue: European Language Resources Association
Publication date: 01/05/2010

This paper describes the development of a structured document collection containing user-generated text and numerical metadata for exploring how metadata can be exploited in information retrieval (IR). The collection consists of more than 61,000 documents extracted from YouTube video pages on basketball in general and the NBA (National Basketball Association) in particular, together with a set of 40 topics and their relevance judgements. In addition, a collection of nearly 250,000 user profiles related to the NBA collection is available. Several baseline IR experiments report the effect of using video-associated metadata on retrieval effectiveness. The results surprisingly show that searching the video titles alone performs significantly better than searching additional metadata text fields of the videos, such as the tags or the description.

CiteSeerX

Irish Universities

DCU Online Research Access Service

STRUCTURED DOCUMENT LOGIC

Author: Szegő Dániel
Publication venue: Periodica Polytechnica Electrical Engineering (Archives)
Publication date: 01/01/2003

This paper describes some practical and theoretical foundations of Structured Document Logic (SDL), a logical methodology for analyzing properties of Web documents such as XML or HTML. SDL can be useful for searching HTML pages or for defining filters for Web documents. Both the syntax and the semantics of SDL are described, and an efficient evaluation algorithm is introduced.

Periodica Polytechnica (Budapest University of Technology and Economics)

Methods and means used in programming intelligent searches of technical documents

Author: Gross David L.

To meet the data research requirements of the Safety, Reliability & Quality Assurance activities at Kennedy Space Center (KSC), a new computer search method for technical data documents was developed. By their very nature, technical documents are partially encrypted because of the author's use of acronyms, abbreviations, and shortcut notations. This problem of computerized searching is compounded at KSC by the volume of documentation produced during normal Space Shuttle operations. The Centralized Document Database (CDD) is designed to solve this problem. It provides a common interface to an unlimited number of files of various sizes, with the capability to perform diverse types and levels of data searches. The heart of the CDD is the nature and capability of its search algorithms. The most complex form of search that the program performs uses a domain-specific database of acronyms, abbreviations, synonyms, and word frequency tables. This database, along with basic sentence parsing, is used to convert a request for information into a relational network. This network is used as a filter on the original document file to determine the most likely locations for the requested data. This type of search will locate information that traditional techniques (i.e., Boolean structured keyword searching) would not find.

NASA Technical Reports Server

Combining Concept- with Content-based Multimedia Retrieval

Authors: Windhouwer M.A. (Menzo); Zwol R. van
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 01/01/2003

The arrival of the XML standard opened new doors for structured document search. A common approach in XML retrieval is to directly exploit the document's structure. However, this is likely to fail for two reasons. First, it neglects the rich multimedia character of documents on the Internet, where a wide variety of multimedia objects can be found, such as text, images, and streaming video. Second, using the document structure as the basis for searching the content of a document can easily lead to semantic misinterpretation of the document's content. This chapter discusses an approach for searching rich multimedia document collections that tackles these two problems using a combination of conceptual search and content-based retrieval.

CWI's Institutional Repository

Adding Hierarchical Objects to Relational Database General-Purpose XML-Based Information Managements

Authors: Bell David; Gawdiak Yuri; Knight Chris; La Tracy; Lin Shu-Chun; Maluf David; Tran Khai Peter

NETMARK is a flexible, high-throughput software system for managing, storing, and rapidly searching unstructured and semi-structured documents. NETMARK transforms such documents from their original highly complex, constantly changing, heterogeneous data formats into well-structured, common data formats using Hypertext Markup Language (HTML) and/or Extensible Markup Language (XML). The software implements an object-relational database system that combines the best practices of the relational model, utilizing Structured Query Language (SQL), with those of the object-oriented, semantic database model for creating complex data. In particular, NETMARK takes advantage of the Oracle 8i object-relational database model, using physical-address data types for very efficient keyword searches of records across both context and content. NETMARK also supports multiple international standards, such as WEBDAV for drag-and-drop file management and SOAP for integrated information management using Web services. The document-organization and -searching capabilities afforded by NETMARK are likely to make this software attractive for use in disciplines as diverse as science, auditing, and law enforcement.

NASA Technical Reports Server

A Compressive Survey on New Technique Towards Successful Document Research Using Key Phrase Annotations Together with Querying Benefit

Author: Miss. Jadhav Priyanka
Publication venue: Auricle Technologies, Pvt., Ltd.
Publication date: 30/11/2015

It is generally challenging to find the pertinent data inside unstructured text documents, because this information remains buried within unstructured text and terminology. Annotations in the form of attribute name-value pairs are more useful for retrieving such documents. This system proposes a novel alternative approach to document retrieval that includes annotation identification. The system identifies the values of structured attributes by reading, analyzing, and parsing the uploaded documents, with the aim of efficient document retrieval. Its main benefit is that when users perform a query-based search, they get a small set of distinct, accurate results, which makes it easy to retrieve data from the database. By using these two techniques, the workload of the system can be reduced by a large amount. In addition, searching annotated documents becomes more efficient because of the query-based searching technique or content value searching.

International Journal on Recent and Innovation Trends in Computing and Communication

A Semantic Portal for Fund Finding in the EU: Semantic Upgrade, Integration and Publication of Heterogeneous Legacy Data

Authors: J. Contreras; J.C. Arpírez; R. Studer; S. Staab; V.R. Benjamins; Ó. Corcho
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2006

FundFinder is a Semantic Web portal for searching and navigating information about funding opportunities. The application was created by following a set of techniques and using a set of tools for upgrading legacy content, including databases and semi-structured documents, to the Semantic Web. This process consists of extracting and populating knowledge from heterogeneous information sources and making it available on the Web.

Crossref

Archivo Digital UPM

The decisions and processes involved in a systematic search strategy: a hierarchical framework

Authors: Beller Elaine; Glasziou Paul; Michael Clark Justin; Sanders Sharon
Publication venue: University Library System, University of Pittsburgh
Publication date: 01/04/2021

OBJECTIVE: The decisions and processes that may compose a systematic search strategy have not been formally identified and categorized. This study aimed to (1) identify all decisions that could be made and processes that could be used in a systematic search strategy and (2) create a hierarchical framework of those decisions and processes. METHODS: The literature was searched for documents or guides on conducting a literature search for a systematic review or other evidence synthesis. The decisions or processes for locating studies were extracted from eligible documents and categorized into a structured hierarchical framework. Feedback from experts was sought to revise the framework. The framework was revised iteratively and tested using recently published literature on systematic searching. RESULTS: Guidance documents were identified from expert organizations and a search of the literature and Internet. Data were extracted from 74 eligible documents to form the initial framework. The framework was revised based on feedback from 9 search experts and further review and testing by the authors. The hierarchical framework consists of 119 decisions or processes sorted into 17 categories and arranged under 5 topics. These topics are “Skill of the searcher,” “Selecting information to identify,” “Searching the literature electronically,” “Other ways to identify studies,” and “Updating the systematic review.” CONCLUSIONS: The work identifies and classifies the decisions and processes used in systematic searching. Future work can now focus on assessing and prioritizing research on the best methods for successfully identifying all eligible studies for a systematic review

Bond University Research Portal

Directory of Open Access Journals

PubMed Central