9,888 research outputs found

    Online visualization of bibliography Using Visualization Techniques

    Visualization represents raw data as graphs, images, charts, and similar forms, helping the end user correlate data elements and understand the relationships between them on a single screen. Representing the bibliographic information of computer science journals and proceedings with visualization techniques would let a user choose a particular author and navigate through the hierarchy to find, on a single page, which papers the author has published, the keywords of those papers, which papers cite them, the co-authors, and how many papers the selected author has published. This information is currently scattered, and the user has to search websites such as Google Scholar [1] and CiteSeer [2] to get these bibliographic records. With visualization techniques, all the information can be accessed on a single page as a graph of points, where the user searches for a particular author and the author and their co-authors are represented as points. The goal of this project is to enhance current bibliography web services with an intuitive interactive visualization interface and to improve user understanding and conceptualization. In this project, we develop a simple web interface that takes a search query from the user and finds related information such as the author's name, the co-authors, the number of papers the author has published, related keywords, and the citations referred to. The project uses the bibliographic records available as XML files from the CiteSeer database [2], extracts the data into a database, and then queries the database for results through a web service. The extracted data is then presented visually so that the user can conceptualize the results better and find articles of interest easily.
In addition, the user can interactively navigate the visual results to get more information about any article or author displayed. We present both a paper-centric view and an author-centric view, representing the data as graphs. The nodes in both views are color-coded based on the paper's weight parameter (the popularity of the paper). In the paper-centric view, a paper that refers to another paper is connected to it by a directed arrow from the referring paper to the referenced paper. Overall, the idea was to represent this related data in the form of a tree, so that the user can correlate all the data and see the relationships between them.
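The paper-centric view described in this abstract can be illustrated with a minimal Python sketch. It is not the project's implementation: the function names, the use of incoming citation count as the "weight", the direction of the edges (citing paper to cited paper), and the three color buckets are all assumptions made for illustration.

```python
from collections import defaultdict


def build_citation_graph(citations):
    """Build a directed graph from (citing_id, cited_id) pairs.

    Returns the adjacency sets and, for every paper, the number of
    distinct papers that cite it (used here as a stand-in "weight").
    """
    edges = defaultdict(set)
    weights = defaultdict(int)
    for citing, cited in citations:
        if cited not in edges[citing]:  # ignore duplicate citation records
            edges[citing].add(cited)
            weights[cited] += 1
        weights.setdefault(citing, 0)  # papers nobody cites still get a node
    return edges, dict(weights)


def color_nodes(weights):
    """Map each paper to a color bucket relative to the most-cited paper."""
    max_weight = max(weights.values(), default=0)
    colors = {}
    for paper, weight in weights.items():
        ratio = weight / max_weight if max_weight else 0.0
        if ratio >= 2 / 3:
            colors[paper] = "red"      # most popular papers
        elif ratio >= 1 / 3:
            colors[paper] = "orange"   # moderately popular papers
        else:
            colors[paper] = "grey"     # rarely or never cited
    return colors


if __name__ == "__main__":
    # Hypothetical records: p2 and p3 cite p1, and p3 also cites p2.
    edges, weights = build_citation_graph(
        [("p2", "p1"), ("p3", "p1"), ("p3", "p2")]
    )
    print(color_nodes(weights))
```

A graph-drawing front end could then render `edges` as arrows and use `color_nodes` to style each node.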

    A Progressive Clustering Algorithm to Group the XML Data by Structural and Semantic Similarity

    Since XML became popular for data representation and exchange over the Web, the distribution of XML documents has rapidly increased. It has become a challenge for researchers to turn these documents into a more useful information utility. In this paper, we introduce a novel clustering algorithm, PCXSS, that groups heterogeneous XML documents according to their similar structural and semantic representations. We develop a global criterion function, CPSim, that progressively measures the similarity between an XML document and existing clusters, avoiding the need to compute the similarity between two individual documents. The experimental analysis shows the method to be fast and accurate.
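The idea of comparing each incoming document with existing clusters, rather than with every other document, can be sketched as follows. This is not PCXSS and the similarity used is not CPSim: as a stand-in, the sketch uses Jaccard similarity over sets of root-to-element tag paths, covers structure only (no semantic matching), and the function names and `threshold` value are assumptions.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Return the set of root-to-element tag paths in an XML document."""
    paths = set()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def progressive_cluster(documents, threshold=0.5):
    """Assign each document to the most similar existing cluster.

    Each cluster keeps a summary (the union of its members' paths), so a
    new document is compared once per cluster, never once per document.
    """
    clusters = []
    for index, xml_text in enumerate(documents):
        paths = tag_paths(xml_text)
        best, best_score = None, 0.0
        for cluster in clusters:
            score = jaccard(paths, cluster["paths"])
            if score > best_score:
                best, best_score = cluster, score
        if best is not None and best_score >= threshold:
            best["members"].append(index)
            best["paths"] |= paths  # update the cluster summary in place
        else:
            clusters.append({"paths": set(paths), "members": [index]})
    return clusters


if __name__ == "__main__":
    docs = [
        "<book><title/><author/></book>",
        "<book><title/><author/><year/></book>",
        "<blog><post><body/></post></blog>",
    ]
    for cluster in progressive_cluster(docs):
        print(cluster["members"])
```

Because each cluster is summarized once, the cost per document grows with the number of clusters rather than the number of documents already seen, which is the property the abstract highlights.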

    BlogForever D2.6: Data Extraction Methodology

    This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.

    STARGATE : Static Repository Gateway and Toolkit. Final Project Report

    STARGATE (Static Repository Gateway and Toolkit) was funded by the Joint Information Systems Committee (JISC) and is intended to demonstrate the ease of use of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) Static Repository technology, and the potential benefits offered to publishers in making their metadata available in this way. This technology offers a simpler method of participating in many information discovery services than creating fully-fledged OAI-compliant repositories. It does this by allowing the infrastructure and technical support required to participate in OAI-based services to be shifted from the data provider (the journal) to a third party, and it allows a single third-party gateway provider to provide intermediation for many data providers (journals). Specifically, STARGATE has created a series of Static Repositories of publisher metadata provided by a selection of Library and Information Science journals. It has demonstrated the interoperability of these repositories by exposing their metadata via a Static Repository Gateway for harvesting and cross-searching by external service providers. The project has conducted a critical evaluation of the Static Repository approach in conjunction with the participating publishers and service providers. The technology works. The project has demonstrated that Static Repositories are easy to create and that the differences between fully-fledged and static OAI Repositories have no impact on the participation of small journal publishers in OAI-based services. The problems for a service that arise out of the use of Static Repositories are parallel to those created by any other repository dealing with journal articles. Problems arise from the diversity of metadata element sets provided by a given journal and the lack of specific metadata elements for the articles' volume and issue details.
Another issue for the use of publishers' metadata arises because the collection policies of some existing services allow only Open Access materials to be included. The project recommends that the use of Static Repositories continue to be explored, in particular as a flexible way to expose existing sets of structured information to OAI services and to create the opportunity to enhance the metadata as part of the process. The project further recommends that the publishing community consider the creation or adoption of an application profile for journal articles to support information discovery that can search by volume and issue. Significant further use of the Static Repository technology by small journal publishers will require the future creation and maintenance of a community-specific Static Repository Gateway. Further use will also require advocacy within the publishing community but might initially be most effectively kick-started through the creation of OAI repositories based on metadata held by the commercial services which publish or mediate access to electronic copies of journals on behalf of small publishers.
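For readers unfamiliar with harvesting, the following Python sketch shows how a service provider might form an OAI-PMH `ListRecords` request against a gateway. The `verb`, `metadataPrefix`, `from`, and `resumptionToken` arguments come from the OAI-PMH specification; the function name and the gateway base URL are hypothetical and do not refer to the STARGATE gateway.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: when it is
    present, no other argument besides the verb may be sent.
    """
    params = {"verb": "ListRecords"}
    if resumption_token is not None:
        params["resumptionToken"] = resumption_token
    else:
        params["metadataPrefix"] = metadata_prefix
        if from_date is not None:
            params["from"] = from_date  # e.g. "2006-01-01"
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical gateway address, used only for illustration.
    gateway = "https://gateway.example.org/oai"
    print(build_list_records_url(gateway))
    print(build_list_records_url(gateway, from_date="2006-01-01"))
```

The sketch only constructs the request; fetching the response and parsing the returned XML records are left out.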