32 research outputs found

    Text categorization and similarity analysis: similarity measure, literature review

    Get PDF
    Document classification and provenance has become an important area of computer science as the amount of digital information is growing significantly. Organisations are storing documents on computers rather than in paper form. Software is now required that will show the similarities between documents (i.e. document classification) and to point out duplicates and possibly the history of each document (i.e. provenance). Poor organisation is common and leads to situations like above. There exists a number of software solutions in this area designed to make document organisation as simple as possible. I'm doing my project with Pingar who are a company based in Auckland who aim to help organise the growing amount of unstructured digital data. This reports analyses the existing literature in this area with the aim to determine what already exists and how my project will be different from existing solutions

    Text categorization and similarity analysis: similarity measure, architecture and design

    Get PDF
    This research looks at the most appropriate similarity measure to use for a document classification problem. The goal is to find a method that is accurate in finding both semantically and version related documents. A necessary requirement is that the method is efficient in its speed and disk usage. Simhash is found to be the measure best suited to the application and it can be combined with other software to increase the accuracy. Pingar have provided an API that will extract the entities from a document and create a taxonomy displaying the relationships and this extra information can be used to accurately classify input documents. Two algorithms are designed incorporating the Pingar API and then finally an efficient comparison algorithm is introduced to cut down the comparisons required

    Browsing and book selection in the physical library shelves

    Get PDF
    Library users should be conveniently interact with collections and be able to easily choose books of interest as they explore and browse a physical book collection. While there exists a growing body of naturalistic studies of browsing and book selection in digital collections, the corresponding literature on behaviour in the physical stacks is surprisingly sparse. We add to this literature in this paper, by conducting observations of patrons in a university library as they selected books from the shelves. Our aim is to further our understanding of patterns of behaviour in browsing and selection in physical collections

    requirements and use cases

    Get PDF
    In this report, we introduce our initial vision of the Corporate Semantic Web as the next step in the broad field of Semantic Web research. We identify requirements of the corporate environment and gaps between current approaches to tackle problems facing ontology engineering, semantic collaboration, and semantic search. Each of these pillars will yield innovative methods and tools during the project runtime until 2013. Corporate ontology engineering will improve the facilitation of agile ontology engineering to lessen the costs of ontology development and, especially, maintenance. Corporate semantic collaboration focuses the human-centered aspects of knowledge management in corporate contexts. Corporate semantic search is settled on the highest application level of the three research areas and at that point it is a representative for applications working on and with the appropriately represented and delivered background knowledge. We propose an initial layout for an integrative architecture of a Corporate Semantic Web provided by these three core pillars

    prototypical implementations ; working packages in project phase II

    Get PDF
    In this technical report, we present the concepts and first prototypical imple- mentations of innovative tools and methods for personalized and contextualized (multimedia) search, collaborative ontology evolution, ontology evaluation and cost models, and dynamic access and trends in distributed (semantic) knowledge. The concepts and prototypes are based on the state of art analysis and identified requirements in the CSW report IV

    prototypical implementations

    Get PDF
    In this technical report, we present prototypical implementations of innovative tools and methods developed according to the working plan outlined in Technical Report TR-B-09-05 [23]. We present an ontology modularization and integration framework and the SVoNt server, the server-side end of an SVN- based versioning system for ontologies in the Corporate Ontology Engineering pillar. For the Corporate Semantic Collaboration pillar, we present the prototypical implementation of a light-weight ontology editor for non-experts and an ontology based expert finder system. For the Corporate Semantic Search pillar, we present a prototype for algorithmic extraction of relations in folksonomies, a tool for trend detection using a semantic analyzer, a tool for automatic classification of web documents using Hidden Markov models, a personalized semantic recommender for multimedia content, and a semantic search assistant developed in co-operation with the Museumsportal Berlin. The prototypes complete the next milestone on the path to an integral Cor- porate Semantic Web architecture based on the three pillars Corporate Ontol- ogy Engineering, Corporate Semantic Collaboration, and Corporate Semantic Search, as envisioned in [23]

    Validation and Evaluation

    Get PDF
    In this technical report, we present prototypical implementations of innovative tools and methods for personalized and contextualized (multimedia) search, collaborative ontology evolution, ontology evaluation and cost models, and dynamic access and trends in distributed (semantic) knowledge, developed according to the working plan outlined in Technical Report TR-B-12-04. The prototypes complete the next milestone on the path to an integral Corporate Semantic Web architecture based on the three pillars Corporate Ontology Engineering, Corporate Semantic Collaboration, and Corporate Semantic Search, as envisioned in TR-B-08-09
    corecore