226,684 research outputs found

    Building a domain-specific document collection for evaluating metadata effects on information retrieval

    This paper describes the development of a structured document collection containing user-generated text and numerical metadata for exploring the use of metadata in information retrieval (IR). The collection consists of more than 61,000 documents extracted from YouTube video pages on basketball in general and the NBA (National Basketball Association) in particular, together with a set of 40 topics and their relevance judgements. In addition, a collection of nearly 250,000 user profiles related to the NBA collection is available. Several baseline IR experiments report the effect of using video-associated metadata on retrieval effectiveness. Surprisingly, the results show that searching only the video titles performs significantly better than searching additional metadata text fields of the videos, such as the tags or the description.
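The title-only versus multi-field comparison in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the field names (`title`, `tags`, `description`), the toy documents, and the simple term-count scoring are hypothetical and are not the collection's schema or the paper's retrieval model.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def search(query, docs, fields):
    """Rank document ids by how often query terms occur in the chosen fields."""
    query_terms = set(tokenize(query))
    results = []
    for doc_id, doc in docs.items():
        tokens = []
        for field in fields:
            tokens.extend(tokenize(doc.get(field, "")))
        score = sum(1 for token in tokens if token in query_terms)
        if score > 0:
            results.append((score, doc_id))
    # Highest score first; ties are broken by document id.
    results.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in results]

# Hypothetical toy documents, not taken from the described collection.
DOCS = {
    "v1": {"title": "NBA finals highlights", "tags": "basketball nba",
           "description": "Best plays"},
    "v2": {"title": "Cooking pasta", "tags": "nba funny",
           "description": "A recipe video"},
}

title_only = search("nba highlights", DOCS, ("title",))
title_and_tags = search("nba highlights", DOCS, ("title", "tags"))
```

In this toy data, adding the `tags` field pulls in a document whose title is unrelated to the query. That is one way extra fields could add noise; whether it explains the reported result is not stated in the abstract.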

    STRUCTURED DOCUMENT LOGIC

    This paper describes some practical and theoretical foundations of Structured Document Logic (SDL), a logical methodology for analyzing properties of Web documents such as XML or HTML. SDL can be of benefit when searching HTML pages or defining filters for web documents. Both the syntax and the semantics of SDL are described, and an efficient evaluation algorithm is introduced.

    Methods and means used in programming intelligent searches of technical documents

    To meet the data research requirements of the Safety, Reliability & Quality Assurance activities at Kennedy Space Center (KSC), a new computer search method for technical data documents was developed. By their very nature, technical documents are partially encrypted because of the author's use of acronyms, abbreviations, and shortcut notations. This problem of computerized searching is compounded at KSC by the volume of documentation produced during normal Space Shuttle operations. The Centralized Document Database (CDD) is designed to solve this problem. It provides a common interface to an unlimited number of files of various sizes, with the capability to perform diverse types and levels of data searches. The heart of the CDD is the nature and capability of its search algorithms. The most complex form of search the program performs uses a domain-specific database of acronyms, abbreviations, synonyms, and word frequency tables. This database, along with basic sentence parsing, is used to convert a request for information into a relational network. This network is used as a filter on the original document file to determine the most likely locations for the requested data. This type of search will locate information that traditional techniques (i.e., Boolean structured keyword searching) would not find.
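The idea of expanding a request through an acronym table before filtering the document can be sketched as follows. This is a minimal illustration under stated assumptions, not the CDD's algorithm: the `EXPANSIONS` table, the function names, and the overlap-count scoring are all hypothetical, and the sketch omits the sentence parsing, word frequency tables, and relational network that the abstract mentions.

```python
import re

# Hypothetical expansion table; the contents of the CDD's real
# domain-specific database are not given in the abstract.
EXPANSIONS = {
    "ksc": ["kennedy space center"],
    "cdd": ["centralized document database"],
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def expand_query(query, table=EXPANSIONS):
    """Return the query terms plus the tokens of any known expansions."""
    terms = set(tokenize(query))
    for term in list(terms):
        for phrase in table.get(term, []):
            terms.update(tokenize(phrase))
    return terms

def rank_passages(query, passages, table=EXPANSIONS):
    """Return indices of matching passages, ordered by expanded-term overlap."""
    terms = expand_query(query, table)
    scored = []
    for index, passage in enumerate(passages):
        overlap = terms & set(tokenize(passage))
        if overlap:
            scored.append((len(overlap), index))
    # Largest overlap first; ties keep the original passage order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored]

passages = [
    "The Kennedy Space Center handles launch operations.",
    "Unrelated text about the weather.",
]
hits = rank_passages("KSC launch", passages)
```

A plain keyword match on "KSC" would miss the first passage because the acronym never appears in it; the expanded term set finds it through the spelled-out name.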

    Combining Concept- with Content-based Multimedia Retrieval

    The arrival of the XML standard opened new doors for structured document search. A common approach in XML retrieval is to exploit the document's structure directly. However, this is likely to fail for two reasons. First, it neglects the rich multimedia character of documents on the Internet, where a wide variety of multimedia objects can be found, such as text, images and streaming video. Second, using the document structure as the basis for searching the content of a document can easily lead to semantic misinterpretation of the document's content. This chapter discusses an approach for searching rich multimedia document collections that tackles these two problems using a combination of conceptual search and content-based retrieval.

    Adding Hierarchical Objects to Relational Database General-Purpose XML-Based Information Managements

    NETMARK is a flexible, high-throughput software system for managing, storing, and rapidly searching unstructured and semi-structured documents. NETMARK transforms such documents from their original highly complex, constantly changing, heterogeneous data formats into well-structured, common data formats using Hypertext Markup Language (HTML) and/or Extensible Markup Language (XML). The software implements an object-relational database system that combines the best practices of the relational model, utilizing Structured Query Language (SQL), with those of the object-oriented, semantic database model for creating complex data. In particular, NETMARK takes advantage of the Oracle 8i object-relational database model, using physical-address data types for very efficient keyword searches of records across both context and content. NETMARK also supports multiple international standards such as WebDAV for drag-and-drop file management and SOAP for integrated information management using Web services. The document-organization and document-searching capabilities afforded by NETMARK are likely to make this software attractive for use in disciplines as diverse as science, auditing, and law enforcement.

    A Compressive Survey on New Technique Towards Successful Document Research Using Key Phrase Annotations Together with Querying Benefit

    It is generally challenging to find the pertinent data inside unstructured text documents, because this information remains buried within unstructured text and terminology. Annotations in the form of attribute name-value pairs are more useful for retrieving such documents. This system proposes a novel approach to document retrieval that includes annotation identification. The system identifies the values of structured attributes by reading, analyzing and parsing the uploaded documents. Its main benefit is that when users or authors perform a query-based search, they get a small set of distinct, accurate results, which makes it easy to retrieve data from the database. By using these two techniques, the workload of the system can be reduced by a large amount. Searching annotated documents also becomes faster because of the query-based or content-value searching technique.

    A Semantic Portal for Fund Finding in the EU: Semantic Upgrade, Integration and Publication of Heterogeneous Legacy Data

    FundFinder is a Semantic Web portal for searching and navigating information about funding opportunities. The application was created by following a set of techniques and using a set of tools for upgrading legacy content, including databases and semi-structured documents, to the Semantic Web. This process consists of extracting and populating knowledge from heterogeneous information sources and making it available on the Web.

    The decisions and processes involved in a systematic search strategy: a hierarchical framework

    OBJECTIVE: The decisions and processes that may compose a systematic search strategy have not been formally identified and categorized. This study aimed to (1) identify all decisions that could be made and processes that could be used in a systematic search strategy and (2) create a hierarchical framework of those decisions and processes. METHODS: The literature was searched for documents or guides on conducting a literature search for a systematic review or other evidence synthesis. The decisions or processes for locating studies were extracted from eligible documents and categorized into a structured hierarchical framework. Feedback from experts was sought to revise the framework. The framework was revised iteratively and tested using recently published literature on systematic searching. RESULTS: Guidance documents were identified from expert organizations and a search of the literature and Internet. Data were extracted from 74 eligible documents to form the initial framework. The framework was revised based on feedback from 9 search experts and further review and testing by the authors. The hierarchical framework consists of 119 decisions or processes sorted into 17 categories and arranged under 5 topics. These topics are “Skill of the searcher,” “Selecting information to identify,” “Searching the literature electronically,” “Other ways to identify studies,” and “Updating the systematic review.” CONCLUSIONS: The work identifies and classifies the decisions and processes used in systematic searching. Future work can now focus on assessing and prioritizing research on the best methods for successfully identifying all eligible studies for a systematic review