32 research outputs found
Text categorization and similarity analysis: similarity measure, literature review
Document classification and provenance have become an important area of computer science as the amount of digital information grows significantly. Organisations now store documents on computers rather than in paper form. Software is therefore required that shows the similarities between documents (i.e. document classification) and points out duplicates and, where possible, the history of each document (i.e. provenance). Poor organisation is common and leads to problems such as duplicated documents and lost version history. A number of software solutions exist in this area, designed to make document organisation as simple as possible. I am doing my project with Pingar, a company based in Auckland that aims to help organise the growing amount of unstructured digital data. This report analyses the existing literature in this area with the aim of determining what already exists and how my project will differ from existing solutions
Text categorization and similarity analysis: similarity measure, architecture and design
This research looks at the most appropriate similarity measure to use for a document classification problem. The goal is to find a method that accurately finds both semantically related and version-related documents. A necessary requirement is that the method is efficient in both speed and disk usage. Simhash is found to be the measure best suited to the application, and it can be combined with other software to increase accuracy. Pingar have provided an API that extracts the entities from a document and creates a taxonomy displaying their relationships; this extra information can be used to classify input documents accurately. Two algorithms are designed that incorporate the Pingar API, and finally an efficient comparison algorithm is introduced to cut down the number of comparisons required
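As an illustration of the Simhash measure named in this abstract, the following is a minimal sketch of the general technique, not the report's implementation. It assumes plain word tokens with equal weights and a 64-bit fingerprint taken from MD5; the function names (`tokens`, `simhash`, `hamming_distance`) are hypothetical, and the Pingar API is not used.

```python
import hashlib
import re

BITS = 64  # fingerprint width (an assumption; other widths are possible)


def tokens(text):
    """Lowercase word tokens; a deliberately simple feature extractor."""
    return re.findall(r"[a-z0-9]+", text.lower())


def simhash(text):
    """Return a 64-bit Simhash fingerprint of `text`."""
    weights = [0] * BITS
    for tok in tokens(text):
        # Stable per-token hash (Python's built-in hash() is salted per process).
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    # A fingerprint bit is set where the positive votes outweigh the negative.
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

Two documents are then compared by the Hamming distance between their fingerprints, with a small distance suggesting near-duplicate or version-related text. The threshold that separates related from unrelated documents depends on the data and is not specified here.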
Browsing and book selection in the physical library shelves
Library users should be able to interact conveniently with collections and to easily choose books of interest as they explore and browse a physical book collection. While there exists a growing body of naturalistic studies of browsing and book selection in digital collections, the corresponding literature on behaviour in the physical stacks is surprisingly sparse. We add to this literature in this paper by conducting observations of patrons in a university library as they selected books from the shelves. Our aim is to further our understanding of patterns of behaviour in browsing and selection in physical collections
requirements and use cases
In this report, we introduce our initial vision of the Corporate Semantic Web
as the next step in the broad field of Semantic Web research. We identify
requirements of the corporate environment and gaps between current approaches
to tackle problems facing ontology engineering, semantic collaboration, and
semantic search. Each of these pillars will yield innovative methods and tools
during the project runtime until 2013. Corporate ontology engineering will
improve the facilitation of agile ontology engineering to lessen the costs of
ontology development and, especially, maintenance. Corporate semantic
collaboration focuses on the human-centered aspects of knowledge management in
corporate contexts. Corporate semantic search is situated at the highest
application level of the three research areas and, as such, is
representative of applications working on and with the appropriately
represented and delivered background knowledge. We propose an initial layout
for an integrative architecture of a Corporate Semantic Web provided by these
three core pillars
prototypical implementations; working packages in project phase II
In this technical report, we present the concepts and first prototypical
implementations of innovative tools and methods for personalized and
contextualized (multimedia) search, collaborative ontology evolution, ontology
evaluation and cost models, and dynamic access and trends in distributed
(semantic) knowledge. The concepts and prototypes are based on the
state-of-the-art analysis and the requirements identified in the CSW report IV
prototypical implementations
In this technical report, we present prototypical implementations of
innovative tools and methods developed according to the working plan outlined
in Technical Report TR-B-09-05 [23]. We present an ontology modularization and
integration framework and the SVoNt server, the server-side end of an
SVN-based versioning system for ontologies in the Corporate Ontology Engineering
pillar. For the Corporate Semantic Collaboration pillar, we present the
prototypical implementation of a lightweight ontology editor for non-experts
and an ontology-based expert finder system. For the Corporate Semantic Search
pillar, we present a prototype for algorithmic extraction of relations in
folksonomies, a tool for trend detection using a semantic analyzer, a tool for
automatic classification of web documents using Hidden Markov models, a
personalized semantic recommender for multimedia content, and a semantic
search assistant developed in co-operation with the Museumsportal Berlin. The
prototypes complete the next milestone on the path to an integral Corporate
Semantic Web architecture based on the three pillars Corporate Ontology
Engineering, Corporate Semantic Collaboration, and Corporate Semantic Search,
as envisioned in [23]
Validation and Evaluation
In this technical report, we present prototypical implementations of
innovative tools and methods for personalized and contextualized (multimedia)
search, collaborative ontology evolution, ontology evaluation and cost models,
and dynamic access and trends in distributed (semantic) knowledge, developed
according to the working plan outlined in Technical Report TR-B-12-04. The
prototypes complete the next milestone on the path to an integral Corporate
Semantic Web architecture based on the three pillars Corporate Ontology
Engineering, Corporate Semantic Collaboration, and Corporate Semantic Search,
as envisioned in TR-B-08-09