
    WIKINGER – Wiki Next Generation Enhanced Repositories

    Conventional indexing of text documents is based on their textual representation and does not evaluate the actual document content. In the semantic web approach, human-authored text documents are transformed into machine-readable content data, which can be used to create semantic relations among documents. In this paper, we present ongoing work in the WIKINGER project, which aims to build a web-based system for semantic indexing of text documents by evaluating manual and semi-automatic annotations. A particular feature is the continuous refinement of the automatically generated semantic network based on community feedback. The feasibility of the approach will be validated in a pilot application.
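
    A minimal Python sketch of the kind of pipeline the abstract describes: entity annotations in documents become weighted relations in a semantic network, and community feedback raises or lowers the confidence of automatically extracted relations. The class, the example entities and the confidence-update rule are illustrative assumptions, not the actual WIKINGER design.

        from collections import defaultdict

        class SemanticNetwork:
            def __init__(self):
                # (subject, relation, object) -> confidence score in [0, 1]
                self.relations = defaultdict(float)

            def add_annotation(self, subject, relation, obj, confidence=0.5):
                """Register a manually or semi-automatically extracted relation."""
                key = (subject, relation, obj)
                self.relations[key] = max(self.relations[key], confidence)

            def apply_feedback(self, subject, relation, obj, confirmed):
                """Community feedback adjusts a relation's confidence (assumed rule)."""
                key = (subject, relation, obj)
                delta = 0.1 if confirmed else -0.2
                self.relations[key] = min(1.0, max(0.0, self.relations[key] + delta))

        net = SemanticNetwork()
        net.add_annotation("doc_17", "mentions", "Konrad Adenauer", confidence=0.6)
        net.apply_feedback("doc_17", "mentions", "Konrad Adenauer", confirmed=True)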

    Semantic heterogeneity: comparing new semantic web approaches with those of digital libraries

    The purpose is to demonstrate that newer developments in the Semantic Web community, particularly those based on ontologies (e.g., the Simple Knowledge Organization System, SKOS), mitigate common arguments from the digital library (DL) community against participation in the Semantic Web. The approach is a Semantic Web discussion focusing on the weak structure of the Web and the lack of consideration given to semantic content during indexing. The points criticised by the Semantic Web and ontology approaches are the same as those of the DL "Shell model approach" from the mid-1990s, with emphasis on the centrality of its heterogeneity components (used, for example, in vascoda). The Shell model argument began with the "invisible web", necessitating the restructuring of DL approaches. The conclusion is that both approaches fit well together and that the Shell model, with its semantic heterogeneity components, can be reformulated on a Semantic Web basis. A reinterpretation of the DL approaches to semantic heterogeneity, adapted to the standards and tools supported by the W3C, appears to be the best solution. It is therefore recommended that, although most Semantic Web standards are not yet technologically mature for commercial applications, all individual DL developments be checked for their adaptability to the W3C standards of the Semantic Web. The paper offers a unique conceptual analysis of the parallel developments emanating from the digital library and Semantic Web communities. (Author's abstract)
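
    The heterogeneity components discussed here amount to crosswalks between controlled vocabularies; a small Python sketch with rdflib shows how one crosswalk entry can be restated as a standard SKOS mapping triple, which is the W3C-supported form the paper recommends. The vocabulary URIs and terms are invented for illustration.

        from rdflib import Graph, URIRef, Literal
        from rdflib.namespace import SKOS

        g = Graph()
        a = URIRef("http://example.org/vocabA/InformationRetrieval")
        b = URIRef("http://example.org/vocabB/DocumentRetrieval")

        g.add((a, SKOS.prefLabel, Literal("Information Retrieval", lang="en")))
        g.add((b, SKOS.prefLabel, Literal("Document Retrieval", lang="en")))
        # The crosswalk entry becomes a W3C-standard SKOS mapping relation.
        g.add((a, SKOS.closeMatch, b))

        print(g.serialize(format="turtle"))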

    An Approach and an Eclipse Based Environment for Enhancing the Navigation Structure of Web Sites

    This paper presents an approach based on information retrieval and clustering techniques for automatically enhancing the navigation structure of a Web site and improving its navigability. The approach augments the set of navigation links provided in each page of the site with a semantic navigation map, i.e., a set of links enabling navigation from a given page to other pages of the site with similar or related content. The approach uses Latent Semantic Indexing to compute a dissimilarity measure between the pages of the site and a graph-theoretic clustering algorithm to group pages with similar or related content according to the calculated dissimilarity measure. AJAX code is finally used to extend each Web page with its associated semantic navigation map. The paper also presents a prototype of a tool developed to support the approach and the results of a case study conducted to assess the validity and feasibility of the proposal.
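
    A rough Python sketch of the pipeline described above: LSI over the pages' text, a pairwise similarity matrix, and a graph-theoretic grouping (here, connected components over a similarity threshold, standing in for the paper's actual clustering algorithm). The page contents and the parameter values are illustrative assumptions.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.decomposition import TruncatedSVD
        from sklearn.metrics.pairwise import cosine_similarity
        from scipy.sparse import csr_matrix
        from scipy.sparse.csgraph import connected_components

        pages = {
            "index.html":    "welcome products services contact",
            "products.html": "products catalog prices ordering",
            "prices.html":   "prices ordering products discounts",
            "about.html":    "company history mission team",
        }
        names = list(pages)

        tfidf = TfidfVectorizer().fit_transform(pages.values())
        lsi = TruncatedSVD(n_components=2).fit_transform(tfidf)  # latent space

        sim = cosine_similarity(lsi)         # dissimilarity would be 1 - sim
        adjacency = csr_matrix(sim > 0.5)    # assumed similarity threshold
        n, labels = connected_components(adjacency, directed=False)

        # Pages sharing a cluster become candidate links in each other's
        # semantic navigation map.
        for cluster in range(n):
            print([names[i] for i in range(len(names)) if labels[i] == cluster])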

    Just an Update on PMING Distance for Web-based Semantic Similarity in Artificial Intelligence and Data Mining

    One of the main problems that emerges in the classic approach to semantics is the difficulty of acquiring and maintaining ontologies and semantic annotations. On the other hand, the Internet explosion and the massive diffusion of mobile smart devices have led to the creation of a worldwide system whose information is checked and fueled daily by the contributions of millions of users interacting in a collaborative way. Search engines, continually exploring the Web, are a natural source of information on which to base a modern approach to semantic annotation. A promising idea is that it is possible to generalize semantic similarity, under the assumption that semantically similar terms behave similarly, and to define collaborative proximity measures based on the indexing information returned by search engines. The PMING Distance is a proximity measure used in data mining and information retrieval whose collaborative information expresses the degree of relationship between two terms, using only the number of documents returned as the result of a query on a search engine. In this work, the PMING Distance is updated with a novel formal algebraic definition, which corrects previous works. The new point of view underlines that PMING is a locally normalized linear combination of the Pointwise Mutual Information and the Normalized Google Distance. The analyzed measure dynamically reflects collaborative changes made to Web resources.
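
    A Python sketch of the ingredients named in the abstract: PMI and NGD computed from search-engine hit counts, combined linearly. The PMI and NGD formulas are the standard ones; the combination weight rho and the local normalization constant mu are assumptions standing in for the paper's exact definition.

        from math import log

        def pmi(fx, fy, fxy, n):
            """Pointwise mutual information from hit counts (n = pages indexed)."""
            return log((fxy * n) / (fx * fy), 2)

        def ngd(fx, fy, fxy, n):
            """Normalized Google Distance from hit counts."""
            lx, ly, lxy = log(fx, 2), log(fy, 2), log(fxy, 2)
            return (max(lx, ly) - lxy) / (log(n, 2) - min(lx, ly))

        def pming(fx, fy, fxy, n, rho=0.5, mu=1.0):
            """Illustrative weighted combination; rho and mu are local parameters."""
            return rho * (1 - pmi(fx, fy, fxy, n) / mu) + (1 - rho) * ngd(fx, fy, fxy, n)

        # Hypothetical hit counts for two terms and their co-occurrence:
        print(pming(fx=5_000_000, fy=3_000_000, fxy=800_000, n=10_000_000_000, mu=12))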

    Constructing Large-Scale Semantic Web Indices for the Six RDF Collation Orders

    The Semantic Web community collects masses of valuable and publicly available RDF data in order to drive the success story of the Semantic Web. Efficient processing of these datasets requires indexing them. Semantic Web indices exploit the simple data model of RDF: the basic construct of RDF is the triple of subject, predicate and object, which hence admits only 3! = 6 different collation orders. On the one hand, having all 6 collation orders indexed means fast merge joins (consuming the sorted input of the indices) can be applied as often as possible during query processing. On the other hand, constructing the indices for 6 different collation orders is very time-consuming for large-scale datasets. Hence the focus of this paper is efficient Semantic Web index construction for large-scale datasets on today's multi-core computers. We complete our discussion with a comprehensive performance evaluation, in which our approach efficiently constructs the indices for over 1 billion triples of real-world data.
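
    A toy Python illustration of the six collation orders: each permutation of (subject, predicate, object) yields one sorted index, so merge joins can consume whichever order a query needs. Real systems build these in parallel over billions of triples; this sketch only shows the index layout, not the paper's construction algorithm.

        from itertools import permutations

        triples = [
            ("alice", "knows",   "bob"),
            ("bob",   "knows",   "carol"),
            ("alice", "worksAt", "acme"),
        ]

        ORDERS = list(permutations((0, 1, 2)))  # SPO, SOP, PSO, POS, OSP, OPS

        indices = {
            order: sorted(tuple(t[i] for i in order) for t in triples)
            for order in ORDERS
        }

        # e.g. the POS index (predicate, object, subject):
        for entry in indices[(1, 2, 0)]:
            print(entry)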

    Semantic Indexing and Retrieval based on Formal Concept Analysis

    Semantic indexing and retrieval has become an important research area, as the amount of information available on the Web keeps growing. In this paper, we introduce an original approach to semantic indexing and retrieval based on Formal Concept Analysis. The concept lattice is used as a semantic index, and we propose an original algorithm for traversing the lattice and answering user queries. This framework has been applied and evaluated on song datasets.
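
    A minimal Python sketch of Formal Concept Analysis as a semantic index: documents are objects, descriptors are attributes, and each formal concept pairs the documents (extent) that share a descriptor set (intent). The song data is invented, and the naive enumeration below stands in for the paper's lattice-traversal algorithm, which is not reproduced here.

        from itertools import combinations

        context = {  # object -> set of attributes
            "song1": {"rock", "guitar"},
            "song2": {"rock", "guitar", "vocals"},
            "song3": {"jazz", "vocals"},
        }
        attributes = set().union(*context.values())

        def extent(intent):
            """Objects having every attribute in the intent."""
            return {o for o, attrs in context.items() if intent <= attrs}

        def intent_of(objs):
            """Attributes shared by all given objects."""
            return set.intersection(*(context[o] for o in objs)) if objs else set(attributes)

        # Enumerate formal concepts naively: closed (extent, intent) pairs.
        concepts = set()
        for r in range(len(attributes) + 1):
            for combo in combinations(sorted(attributes), r):
                ext = extent(set(combo))
                concepts.add((frozenset(ext), frozenset(intent_of(ext))))

        for ext, inten in sorted(concepts, key=lambda c: -len(c[0])):
            print(sorted(ext), sorted(inten))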

    Colour Text Segmentation in Web Images Based on Human Perception

    There is a significant need to extract and analyse the text in images on Web documents, for effective indexing, semantic analysis and even presentation by non-visual means (e.g., audio). This paper argues that the challenging segmentation stage for such images benefits from a human perspective on colour perception in preference to RGB colour space analysis. The proposed approach enables the segmentation of text in complex situations, such as in the presence of varying colour and texture (in both characters and background). More precisely, characters are segmented as distinct regions with separate chromaticity and/or lightness by performing a layer decomposition of the image. The method described here is the result of the authors' systematic approach to approximating human colour perception characteristics for the identification of character regions. In this instance, the image is decomposed by performing histogram analysis of hue and lightness in the HLS colour space and merging using information on human discrimination of wavelength and luminance.
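
    A simplified Python sketch of the layer-decomposition idea: convert pixels to HLS and histogram hue and lightness so each populated hue band becomes a candidate chromatic layer. The fixed bin count and the labelling rule are assumptions; the paper instead derives its merging from human wavelength and luminance discrimination data.

        import colorsys
        import numpy as np

        def hls_layers(rgb_image, hue_bins=36):
            """rgb_image: H x W x 3 array in [0, 255]. Returns hue-bin labels and counts."""
            norm = rgb_image.astype(float) / 255.0
            h, w, _ = norm.shape
            hue = np.empty((h, w))
            light = np.empty((h, w))
            for y in range(h):
                for x in range(w):
                    hh, ll, ss = colorsys.rgb_to_hls(*norm[y, x])
                    hue[y, x], light[y, x] = hh, ll
            # Histogram analysis: each populated hue bin is a candidate layer.
            labels = (hue * hue_bins).astype(int) % hue_bins
            counts = np.bincount(labels.ravel(), minlength=hue_bins)
            return labels, counts

        # Tiny synthetic "image": red text pixels above a blue background row.
        img = np.zeros((2, 4, 3), dtype=np.uint8)
        img[0, :] = (200, 30, 30)   # reddish row
        img[1, :] = (30, 30, 200)   # bluish row
        labels, counts = hls_layers(img)
        print(labels)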

    Development of an Enhanced Knowledge Retrieval System Using Web 2.0 Technology and Vector Space Model

    There is an increasing pool of information on the Web, and a major contributor is the Web 2.0 technology on which social media is based. Searching for specific information in this pool is always taxing; hence the need to harness this information as a means of enhancing the retrieval and reuse of relevant content. Research and development in the field of knowledge retrieval has used the Vector Space Model (VSM) and Latent Semantic Indexing (LSI), but the approach used is based on the large pool of information available online, which makes getting the most relevant information relatively difficult at the point of retrieval; this is a major setback. In this work, collaborations on Facebook and Twitter (Web 2.0 technology) were harvested using APIs and stored in a Knowledge Repository; this social-media collaboration served as the repository's source of information. An Enhanced Knowledge Retrieval System (EKRS) applying VSM was developed and implemented. VSM was used to calculate cosine similarity and term frequency to aid effective retrieval of relevant documents from the repository based on users' needs. In this project, we achieved the aim of retrieving relevant documents: EKRS employed both Web 2.0 and VSM to meet specific users' information needs.
    Keywords: Web 2.0, knowledge retrieval, Vector Space Model, Latent Semantic Indexing, Knowledge Repository, cosine similarity, term frequency
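
    A compact Python sketch of the VSM retrieval step described above: term-frequency vectors for stored documents and cosine similarity against the user's query. The example posts are invented, standing in for the harvested Facebook and Twitter collaborations.

        import math
        from collections import Counter

        docs = {
            "post1": "exam timetable released for second semester",
            "post2": "lecture notes shared for database systems course",
            "post3": "database exam moved to next week",
        }

        def tf(text):
            """Term-frequency vector as a bag of words."""
            return Counter(text.lower().split())

        def cosine(a, b):
            """Cosine similarity between two term-frequency vectors."""
            dot = sum(a[t] * b[t] for t in set(a) & set(b))
            na = math.sqrt(sum(v * v for v in a.values()))
            nb = math.sqrt(sum(v * v for v in b.values()))
            return dot / (na * nb) if na and nb else 0.0

        query = tf("database exam")
        ranked = sorted(docs, key=lambda d: cosine(query, tf(docs[d])), reverse=True)
        for d in ranked:
            print(d, round(cosine(query, tf(docs[d])), 3))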