
    The State-of-the-arts in Focused Search

    The continuous influx of text data on the Web requires search engines to improve their ability to retrieve more specific information. The need for results relevant to a user’s topic of interest has moved beyond searching for domain- or type-specific documents to more focused results (e.g. document fragments or answers to a query). The introduction of XML provides a standard format for data representation, storage, and exchange. It allows focused search to be carried out at different granularities of a structured document with XML markup. This report reviews the state of the art in focused search, particularly techniques for topic-specific document retrieval, passage retrieval, XML retrieval, and entity ranking. It concludes by highlighting open problems.

    A multi-layered Bayesian network model for structured document retrieval

    New standards in document representation, such as SGML, XML, and MPEG-7, compel Information Retrieval to design and implement models and tools to index, retrieve, and present documents according to the given document structure. The paper presents the design of an Information Retrieval system for structured multimedia documents, such as journal articles, e-books, and MPEG-7 videos. The system is based on Bayesian Networks, since this class of mathematical models makes it possible to represent and quantify the relations between the structural components of a document. Some preliminary results on the system implementation are also presented.

    Implementation of the SimNoMerge algorithm for information retrieval with a collection of XML-structured documents

    ABSTRACT: XML (Extensible Markup Language) is a general specification for creating custom markup languages. It is classified as an extensible language because it allows users to define their own markup elements. The purpose of XML is to support information systems across a variety of data structures, particularly over the internet; it is also used to encode documents and to serialize data. As a result XML has been widely adopted, which has prompted researchers to develop information retrieval for XML documents. This final project builds a structured information retrieval system over XML documents, using the SimNoMerge algorithm for ranking. Three kinds of weighting are used within SimNoMerge: TF, IDF, and TF-IDF. These weightings are tested in combination with and without preprocessing, and the outputs are analyzed to measure the algorithm's performance under each weighting. The results show that SimNoMerge can be used to rank XML documents, although its precision tends to be low while its recall is better, which indicates that the ranking returns many irrelevant documents. Comparing the average precision and recall across the tests shows that applying preprocessing, with any of the three weightings, is preferable for a structured information retrieval system to applying no preprocessing. Keywords: Structured Information Retrieval, SimNoMerge, TF, IDF, TF-IDF
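
The three weightings and the two evaluation measures named in this abstract can be illustrated with a minimal Python sketch. It does not reproduce SimNoMerge itself, whose details are not given in the abstract. The function names, the whitespace tokenizer, and the log(N/df) form of IDF are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def term_weights(docs):
    """Return, per document, the TF, IDF and TF-IDF weight of each term.

    Assumptions (not from the abstract): whitespace tokenization,
    raw term counts for TF, and idf = log(N / df).
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append(
            {t: {"tf": tf[t], "idf": idf[t], "tfidf": tf[t] * idf[t]} for t in tf}
        )
    return weights


def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

A result list that contains every relevant document along with as many irrelevant ones, for example `precision_recall(["a", "b", "c", "d"], ["a", "b"])`, gives full recall but only 50% precision, which is the pattern the abstract reports.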

    Investigating the document structure as a source of evidence for multimedia fragment retrieval

    Multimedia objects can be retrieved using their context, for instance the text surrounding them in documents. This text may be either near to or far from the searched objects. Our goal in this paper is to study the impact, in terms of effectiveness, of the position of text relative to the searched objects. The multimedia objects we consider are described in structured documents such as XML documents, so the document structure is exploited to provide this text position. Although structural information has been shown to be an effective source of evidence in textual information retrieval, only a few works have investigated its value in multimedia retrieval. More precisely, the task we address in this paper is the retrieval of multimedia fragments (i.e. XML elements containing at least one multimedia object). Our general approach is built on two steps: we first retrieve XML elements containing multimedia objects, and we then explore the surrounding information to retrieve relevant multimedia fragments. In both cases, we study the impact of the surrounding information using the document structure. Our work is carried out on images, but it can be extended to any other media, since the physical content of multimedia objects is not used. We conducted several experiments in the context of the Multimedia track of the INEX evaluation campaign. Results showed that structural evidence is of high interest for tuning the importance of textual context for multimedia retrieval. Moreover, the proposed approach outperforms state-of-the-art approaches.

    Using Proximity and Tag Weights for Focused Retrieval in Structured Documents

    Focused information retrieval is concerned with the retrieval of small units of information. In this context, the structure of the documents as well as the proximity among query terms have been found useful for improving retrieval effectiveness. In this article, we propose an approach combining the proximity of the terms and the tags which mark these terms. Our approach is based on a Fetch and Browse method where the fetch step is performed with BM25 and the browse step with a structure-enhanced proximity model. In this way, the ranking of a document depends not only upon the existence of the query terms within the document but also upon the tags which mark these terms. Thus, a document tends to be highly relevant when query terms are close together and are emphasized by tags. The evaluation of this model on a large XML-structured collection provided by the INEX 2010 XML IR evaluation campaign shows that the use of term proximity and structure improves the retrieval effectiveness of BM25 in the context of focused information retrieval.
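
For readers unfamiliar with the BM25 baseline used in the fetch step, the following is a minimal Python sketch of the standard Okapi BM25 scoring function. It covers only the fetch side; the authors' structure-enhanced proximity model for the browse step is not described in enough detail here to reproduce. The function name, the whitespace tokenizer, the parameter defaults (`k1=1.2`, `b=0.75`), and the "+1" smoothed IDF variant are assumptions for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with Okapi BM25.

    Assumptions: whitespace tokenization and the smoothed IDF
    log((N - df + 0.5) / (df + 0.5) + 1), which is never negative.
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency of every term in the collection.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalization: longer-than-average documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

In a fetch-and-browse setting, scores like these would select candidate documents, and a second model would then rank elements inside each fetched document.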

    Accessing Information Based on a Combination of Document Structure and Content: Exploiting XML tags in indexing and searching to enhance content retrieval of online document-centric XML encoded texts

    This study explores the challenges of using traditional information retrieval methods to retrieve document-centric XML encoded text. It demonstrates how coupling structure and content in query and index formulation improves retrieval performance. Native XML database (NXD) and search engine technologies were evaluated in a baseline experiment, and in a second test after alterations were made to their respective indexes. Documents were retrieved for simple and complex forms of 30 XPath and keyword queries from a corpus of 95 XML/TEI encoded texts. Overall results indicated that query augmentation using document structure improves retrieval performance. Complex queries submitted to the NXD produced the most satisfying results, with an average precision of 93.3% and an average recall of 86.3%. Performance improvements were also achieved using complex, structured queries and indexes in the search engine. Study findings suggest that effective XML retrieval models might result from a combination of unstructured and structured retrieval techniques.
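
The idea of augmenting a keyword query with document structure can be sketched in Python with the standard library's `xml.etree.ElementTree`, which supports a limited XPath subset. This is not the study's NXD or search engine setup. The two sample documents, the `search` helper, and the simplified TEI-like markup (no namespaces) are hypothetical and serve only to show how a structural constraint narrows a keyword match.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified TEI-like documents (real TEI uses a namespace).
DOCS = {
    "doc1": "<text><body><div type='chapter'><head>Whaling</head>"
            "<p>The ship sailed.</p></div></body></text>",
    "doc2": "<text><body><div type='preface'><head>Note</head>"
            "<p>On whaling voyages.</p></div></body></text>",
}


def search(docs, xpath, keyword):
    """Return ids of documents where an element matched by `xpath`
    contains `keyword` (case-insensitive) in its text content."""
    hits = []
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        for elem in root.findall(xpath):
            text = "".join(elem.itertext()).lower()
            if keyword.lower() in text:
                hits.append(doc_id)
                break  # one matching element is enough for this document
    return hits


# Keyword only: "." selects the root, so the whole document text is searched.
keyword_only = search(DOCS, ".", "whaling")

# Structure plus keyword: only chapter headings are searched.
structured = search(DOCS, ".//div[@type='chapter']/head", "whaling")
```

Here the keyword-only query matches both documents, while the structured query matches only the one whose chapter heading contains the term.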

    Visual exploration and retrieval of XML document collections with the generic system X2

    This article reports on the XML retrieval system X2 which has been developed at the University of Munich over the last five years. In a typical session with X2, the user first browses a structural summary of the XML database in order to select interesting elements and keywords occurring in documents. Using this intermediate result, queries combining structure and textual references are composed semiautomatically. After query evaluation, the full set of answers is presented in a visual and structured way. X2 largely exploits the structure found in documents, queries and answers to enable new interactive visualization and exploration techniques that support mixed IR and database-oriented querying, thus bridging the gap between these three views on the data to be retrieved. Another salient characteristic of X2 which distinguishes it from other visual query systems for XML is that it supports various degrees of detailedness in the presentation of answers, as well as techniques for dynamically reordering and grouping retrieved elements once the complete answer set has been computed