
    Computer-aided Semantic Signature Identification and Document Classification via Semantic Signatures

    In this era of textual data explosion on the World Wide Web, it can be very hard to find documents similar to those that interest us. To overcome this problem, we have developed a type of semantic signature that captures the semantics of target content (text). Semantic signatures are derived from a text or document of interest using the software package Semantic Signature Mining Tool (SSMinT), which was developed as part of this thesis work in collaboration with Sri Ramya Peddada. These semantic signatures are used to search for and retrieve documents with similar semantic patterns. The effects of different representations of semantic signatures on document classification outcomes are illustrated. The classification accuracies of the Euclidean and spherical K-means clustering algorithms on the retrieved documents are compared. A Chi-square test is presented to show that the observed and expected numbers of documents retrieved (from a corpus) are not significantly different. From this test we conclude that the semantic signature concept is capable of retrieving documents of interest with high probability. Our findings indicate that this concept has potential for use in commercial text and document search applications.
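The comparison between Euclidean and spherical K-means comes down to how a document vector is assigned to a centroid. The following is a minimal sketch of that assignment step only, not the thesis implementation; the vectors and function names are hypothetical and chosen purely for illustration.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def assign_euclidean(vector, centroids):
    """Euclidean K-means assignment: index of the nearest centroid."""
    return min(range(len(centroids)),
               key=lambda i: euclidean_distance(vector, centroids[i]))

def assign_spherical(vector, centroids):
    """Spherical K-means assignment: index of the most similar direction."""
    return max(range(len(centroids)),
               key=lambda i: cosine_similarity(vector, centroids[i]))

# Hypothetical two-term weight vectors (illustrative values, not thesis data).
centroids = [[1.0, 0.0], [5.0, 5.0]]
short_doc = [1.0, 1.0]  # same term proportions as centroid 1, but much shorter

euclidean_cluster = assign_euclidean(short_doc, centroids)
spherical_cluster = assign_spherical(short_doc, centroids)
```

In this toy case the short document is closest to the first centroid in Euclidean distance but points in the same direction as the second, so the two assignment rules disagree. Length-insensitive assignment of this kind is one reason the two algorithms can yield different classification accuracies on text.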

    KIIT Digital Library: An open hypermedia Application

    The massive use of Web technologies has spurred a new revolution in information storage and retrieval. It has long been debated whether to embed hyperlinks in a document or to store them separately in a link base. Research effort has concentrated on the development of link services that enable hypermedia functionality to be integrated into the general computing environment and allow linking from all tools on the browser or desktop. KIIT digital library is such an application; it focuses mainly on the architecture and protocols of Open Hypermedia Systems (OHS), providing online document authoring, browsing, cataloguing, searching and updating features. The WWW needs fundamentally new frameworks and concepts to support new search and indexing functionality, because digital archives are used frequently and huge volumes of databases and documents must be maintained. These digital materials range from electronic versions of books and journals offered by traditional publishers, to manuscripts, photographs, maps, sound recordings and similar materials digitized from libraries' own special collections, to new electronic scholarly and scientific databases developed through the collaboration of researchers, computer and information scientists, and librarians. Metadata in catalogue systems are an indispensable tool for finding information and services in networks. Technological advances provide new opportunities to facilitate collecting and maintaining metadata and using catalogue systems. The overall objective is to make the best use of catalogue systems. Information systems such as the World Wide Web, digital libraries, inventories of satellite images and other repositories contain more data than ever before, are globally distributed and easy to use, and have therefore become accessible to huge, heterogeneous user groups.
For KIIT Digital Library, we have used the Resource Description Framework (RDF) and Dublin Core (DC) standards to incorporate metadata. Overall, KIIT digital library provides electronic access to information in many different forms. Recent technological advances make the storage and transmission of digital information possible. This project designs and implements a cataloguing system for the digital library that is suitable for storing, indexing and retrieving information and for providing that information across the Internet. The goal is to allow users to quickly search indices to locate segments of interest and to view and manipulate these segments on their remote computers.
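As an illustration of what Dublin Core metadata expressed in RDF can look like, here is a minimal sketch that serializes one catalogue record as RDF/XML using only the Python standard library. The resource URI, the field values and the function name `build_record` are assumptions made for this example; the abstract does not describe the library's actual schema or tooling.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF and the Dublin Core element set.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

def build_record(resource_uri, fields):
    """Return an RDF/XML string describing one resource with DC elements.

    `fields` maps Dublin Core element names (e.g. "title") to text values.
    """
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": resource_uri},
    )
    for name, value in fields.items():
        ET.SubElement(description, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")

# Hypothetical record; the URI and values are placeholders.
record_xml = build_record(
    "http://example.org/library/item/1",
    {"title": "Sample Thesis", "creator": "A. Author", "format": "application/pdf"},
)
```

A catalogue built this way can index the `dc:title`, `dc:creator` and similar elements for search while keeping the metadata separate from the document itself.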

    A Study on Ranking Method in Retrieving Web Pages Based on Content and Link Analysis: Combination of Fourier Domain Scoring and Pagerank Scoring

    The ranking module is an important component of the search process: it sorts the relevant pages. Because a collection of Web pages carries additional information in the hyperlink structure of the Web, that information can be represented as a link score and combined with the content score produced by conventional information retrieval techniques. In this paper we report our study of a ranking score for Web pages that combines link analysis (PageRank Scoring) with content analysis (Fourier Domain Scoring). Our experiments use a collection of Wikipedia pages related to the subject of statistics, with the objective of checking the correctness and evaluating the performance of the combined ranking method. Evaluation of PageRank Scoring shows that the highest-scoring pages are not always related to statistics. Because links within Wikipedia articles exist so that users are always one click away from more information on any linked point, it is possible that topics unrelated to statistics are frequently mentioned in the collection. The combined method, in which the link score is weighted in proportion to the content score of the Web pages, does affect the retrieval results.
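The idea of combining a link score with a content score can be sketched as follows. This is a simplified illustration under stated assumptions: the link score is a basic power-iteration PageRank, the content scores are placeholder numbers standing in for Fourier Domain Scoring (which is not implemented here), and the linear weighting in `combined_score` is one plausible combination rule rather than the one used in the paper. The toy graph and page names are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Basic power-iteration PageRank.

    `links` maps every page to the list of pages it links to; every link
    target must also appear as a key.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # A page without outlinks spreads its rank over all pages.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

def combined_score(content, link, weight):
    """Linear mix: weight = 0 uses content only, weight = 1 uses links only."""
    return {page: (1.0 - weight) * content[page] + weight * link[page]
            for page in content}

# Hypothetical toy collection (not the Wikipedia data from the paper).
links = {
    "statistics": ["mean", "variance"],
    "mean": ["statistics"],
    "variance": ["statistics"],
    "calendar": ["statistics"],
}
# Placeholder content scores standing in for Fourier Domain Scoring output.
content_scores = {"statistics": 0.6, "mean": 0.9, "variance": 0.7, "calendar": 0.1}

link_scores = pagerank(links)
mixed = combined_score(content_scores, link_scores, weight=0.5)
ranking = sorted(mixed, key=mixed.get, reverse=True)
```

Varying `weight` shows how strongly the hyperlink structure is allowed to reorder pages that the content score alone would rank differently.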

    Semantic web-based document: editing and browsing in AktiveDoc

    This paper presents a tool for supporting the sharing and reuse of knowledge in document creation (writing) and use (reading). Semantic Web technologies are used to support the production of ontology-based annotations while the document is written. Free-text annotations (comments) can be added to integrate the knowledge in the document. In addition, the tool uses external services (e.g. a Semantic Web harvester) to propose relevant content to the writing user, enabling easy knowledge reuse. Similar facilities are provided for readers when their task does not coincide with the author's. The tool is specifically designed for Knowledge Management in organisations. In this paper we present and discuss how Semantic Web technologies are designed and integrated in the system.

    A Vertical PRF Architecture for Microblog Search

    In microblog retrieval, query expansion can be essential to obtain good search results because queries and posts are short. Since information in microblogs is highly dynamic, an up-to-date index coupled with pseudo-relevance feedback (PRF) on an external corpus has a higher chance of retrieving more relevant documents and improving ranking. In this paper, we focus on the research question: how can we reduce the computational cost of query expansion while maintaining the same retrieval precision as standard PRF? We therefore propose to accelerate the query expansion step of pseudo-relevance feedback. The hypothesis is that using an expansion corpus organized into verticals to expand the query will lead to a more efficient query expansion process and improved retrieval effectiveness. Thus, the proposed query expansion method uses a distributed search architecture and resource selection algorithms to provide an efficient query expansion process. Experiments on the TREC Microblog datasets show that the proposed approach can match or outperform standard PRF in MAP and NDCG@30, with a computational cost that is three orders of magnitude lower. Comment: To appear in ICTIR 201
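The general shape of vertical-based pseudo-relevance feedback can be sketched as below: pick the vertical that best matches the query, retrieve its top documents, and add their most frequent new terms to the query. This is a toy illustration only. The term-count scoring, the single-vertical selection rule, the function names and the tiny corpus are all assumptions made for the example, not the resource selection algorithms or data from the paper.

```python
from collections import Counter

def score(query_terms, doc_terms):
    """Toy relevance score: total occurrences of query terms in the document."""
    counts = Counter(doc_terms)
    return sum(counts[term] for term in query_terms)

def select_vertical(query_terms, verticals):
    """Pick the vertical whose documents match the query best (toy rule)."""
    return max(
        verticals,
        key=lambda name: sum(score(query_terms, doc)
                             for doc in verticals[name].values()),
    )

def expand_query(query_terms, corpus, feedback_docs=2, new_terms=1):
    """Pseudo-relevance feedback: add frequent terms from the top documents."""
    ranked = sorted(corpus, key=lambda doc_id: score(query_terms, corpus[doc_id]),
                    reverse=True)
    counts = Counter()
    for doc_id in ranked[:feedback_docs]:
        counts.update(t for t in corpus[doc_id] if t not in query_terms)
    return list(query_terms) + [term for term, _ in counts.most_common(new_terms)]

# Hypothetical expansion corpus, already organized into two verticals.
verticals = {
    "sports": {
        "s1": ["football", "match", "goal", "goal"],
        "s2": ["match", "score", "goal"],
    },
    "tech": {
        "t1": ["phone", "launch", "battery"],
        "t2": ["phone", "battery", "battery", "review"],
    },
}

query = ["phone"]
chosen = select_vertical(query, verticals)
expanded = expand_query(query, verticals[chosen])
```

Because feedback documents are drawn from a single selected vertical instead of the whole expansion corpus, far fewer documents need to be scored, which is the intuition behind the cost reduction described above.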

    PRES: A score metric for evaluating recall-oriented information retrieval applications

    Information retrieval (IR) evaluation scores are generally designed to measure the effectiveness with which relevant documents are identified and retrieved. Many scores have been proposed for this purpose over the years. These have primarily focused on aspects of precision and recall, and while the two are often discussed as equally important, in practice most attention has been given to precision-focused metrics. Even for recall-oriented IR tasks of growing importance, such as patent retrieval, these precision-based scores remain the primary evaluation measures. Our study examines different evaluation measures for a recall-oriented patent retrieval task and demonstrates the limitations of the current scores in comparing different IR systems for this task. We introduce PRES, a novel evaluation metric for this type of application that takes account of recall and the user's search effort. The behaviour of PRES is demonstrated on 48 runs from the CLEF-IP 2009 patent retrieval track. A full analysis of the performance of PRES shows its suitability for measuring the retrieval effectiveness of systems from a recall-focused perspective, taking into account the user's expected search effort.
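The abstract does not state the PRES formula, so the sketch below relies on an assumption: the commonly cited definition PRES = 1 - (mean rank of relevant documents - (n + 1) / 2) / N_max, where n is the number of relevant documents, N_max is the maximum number of results the user is willing to check, and each relevant document not found within N_max is placed at rank N_max + i (i being its index among the n relevant documents). The function and parameter names are hypothetical.

```python
def pres(relevant_ranks, n_relevant, n_max):
    """PRES score under the definition assumed in the lead-in.

    relevant_ranks: 1-based ranks at which relevant documents were retrieved.
    n_relevant:     total number of relevant documents for the query (n).
    n_max:          maximum number of results the user will check (N_max).
    """
    # Only relevant documents ranked within the user's cutoff count as found.
    found = sorted(rank for rank in relevant_ranks if rank <= n_max)
    # Missing relevant documents are assumed to sit just past the cutoff.
    missing = [n_max + i for i in range(len(found) + 1, n_relevant + 1)]
    mean_rank = (sum(found) + sum(missing)) / n_relevant
    return 1.0 - (mean_rank - (n_relevant + 1) / 2.0) / n_max

# Boundary cases with hypothetical inputs:
best_case = pres([1, 2, 3], n_relevant=3, n_max=100)   # all relevant at the top
worst_case = pres([], n_relevant=3, n_max=100)          # nothing relevant found
late_case = pres([98, 99, 100], n_relevant=3, n_max=100)
```

Under this definition the score rewards both finding all relevant documents and finding them early: a run that retrieves every relevant document only at the very end of the user's cutoff has perfect recall yet scores close to zero.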