30 research outputs found

    Ontology selection for reuse: Will it ever get easier?

    Ontologists and knowledge engineers tend to examine different aspects of ontologies when assessing their suitability for reuse. However, most of the evaluation metrics and frameworks introduced in the literature are based on a limited set of internal characteristics of ontologies and disregard how the community uses and evaluates them. This paper used a survey questionnaire to explore, clarify and confirm the importance of a set of quality-related metrics previously identified in the literature and in an interview study. According to the 157 responses collected from ontologists and knowledge engineers, the process of ontology selection for reuse depends on a range of social and community-related metrics and metadata. We believe that the findings of this research can help to facilitate the process of selecting an ontology for reuse.

    Entity Ranking in Wikipedia

    The traditional entity extraction problem lies in the ability to extract named entities from plain text using natural language processing techniques and intensive training on large document collections. Examples of named entities include organisations, people, locations, or dates. There are many research activities involving named entities; we are interested in entity ranking in the field of information retrieval. In this paper, we describe our approach to identifying and ranking entities from the INEX Wikipedia document collection. Wikipedia offers a number of interesting features for entity identification and ranking, which we first introduce. We then describe the principles and the architecture of our entity ranking system, and introduce our methodology for evaluation. Our preliminary results show that the use of categories and the link structure of Wikipedia, together with entity examples, can significantly improve retrieval effectiveness.
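
    The abstract does not spell out how category and link evidence are combined, so the sketch below is only illustrative: it re-ranks candidate Wikipedia pages by mixing a full-text score with category overlap and link co-occurrence against example entities. The Candidate class, the weights and the toy data are assumptions for illustration, not the authors' system.

```python
# Illustrative re-ranking of candidate Wikipedia pages using a full-text
# score, category overlap with example entities, and link co-occurrence.
# All names, weights and data are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Candidate:
    title: str
    text_score: float                      # score from a full-text search run
    categories: set = field(default_factory=set)
    outlinks: set = field(default_factory=set)

def rank_entities(candidates, examples, w_text=0.5, w_cat=0.3, w_link=0.2):
    """Re-rank candidates against example entities (highest score first)."""
    example_cats = set().union(*(e.categories for e in examples)) if examples else set()
    example_titles = {e.title for e in examples}

    def score(c):
        cat_overlap = len(c.categories & example_cats) / len(example_cats) if example_cats else 0.0
        link_cooc = len(c.outlinks & example_titles) / len(example_titles) if example_titles else 0.0
        return w_text * c.text_score + w_cat * cat_overlap + w_link * link_cooc

    return sorted(candidates, key=score, reverse=True)

# Tiny usage example with made-up pages: the example entity pulls "Berlin"
# (shared category, co-occurring link) above the higher full-text match.
examples = [Candidate("Paris", 0.0, {"Capitals in Europe"}, {"France"})]
pool = [
    Candidate("Paris Hilton", 0.8, {"American socialites"}, {"Hilton Hotels"}),
    Candidate("Berlin", 0.6, {"Capitals in Europe"}, {"Germany", "Paris"}),
]
print([c.title for c in rank_entities(pool, examples)])   # ['Berlin', 'Paris Hilton']
```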

    Automatic Concept Extraction in Semantic Summarization Process

    The Semantic Web offers a generic infrastructure for the interchange, integration and creative reuse of structured data, which can help to cross some of the boundaries that Web 2.0 is facing. Currently, Web 2.0 offers poor query possibilities beyond searching by keywords or tags. There has been a great deal of interest in the development of semantic-based systems to facilitate knowledge representation, extraction and content integration [1], [2]. A semantic-based approach to retrieving relevant material can be useful for addressing issues such as determining the type or the quality of the information suggested by a personalized environment. In this context, standard keyword search has very limited effectiveness: for example, it cannot filter by the type, level or quality of information. Potentially, one of the biggest application areas of content-based exploration is the personalized searching framework (e.g., [3], [4]). Whereas search engines nowadays provide largely anonymous information, such a framework might highlight or recommend web pages related to key concepts.

    We can consider semantic information representation as an important step towards wide and efficient manipulation and retrieval of information [5], [6], [7]. In the digital library community, a flat list of attribute/value pairs is often assumed to be available; in the Semantic Web community, annotations are often assumed to be instances of an ontology. Through ontologies, a system can express the key entities and relationships describing resources in a formal, machine-processable representation. An ontology-based knowledge representation can be used for content analysis and object recognition, for reasoning processes, and for enabling user-friendly and intelligent multimedia content search and retrieval.

    Text summarization has been an interesting and active research area since the 1960s. The underlying assumption is that a small portion, or several keywords, of the original long document can represent the whole informatively and/or indicatively, so reading or processing this shorter version saves time and other resources [8]. This is especially true, and urgently needed, at present due to the vast availability of information. A concept-based approach to representing dynamic and unstructured information can be useful for determining the key concepts and summarizing the information exchanged within a personalized environment. In this context, a concept is represented by a Wikipedia article: with millions of articles and thousands of contributors, this online repository of knowledge is the largest and fastest growing encyclopedia in existence. The problem described above can then be divided into three steps:
    • Mapping each of a series of terms to the most appropriate Wikipedia article (disambiguation).
    • Assigning a score to each identified item on the basis of its importance in the given context.
    • Extracting the n items with the highest scores.
    Text summarization can be applied to many fields, from information retrieval to text mining and text display, and it could also be very useful in a personalized searching framework.

    The chapter is organized as follows: the next Section introduces the personalized searching framework as one of the possible application areas of automatic concept extraction systems. Section three describes the summarization process, providing details on the system architecture, methodology and tools. Section four provides an overview of recently developed document summarization approaches. Section five summarizes a number of real-world applications which might benefit from WSD. Section six introduces Wikipedia and WordNet as used in our project. Section seven describes the logical structure of the project, its software components and databases. Finally, Section eight provides some concluding considerations.
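
    The three steps above (disambiguation, scoring, top-n extraction) can be pictured as a small pipeline. The sketch below is a deliberately naive stand-in: the term-to-article table, the first-sense disambiguation and the frequency-based scoring are placeholder assumptions, not the Wikipedia/WordNet-based method the chapter describes.

```python
# Illustrative three-step concept extraction pipeline:
# (1) map terms to candidate Wikipedia article titles (disambiguation),
# (2) score each mapped article for importance in the given context,
# (3) return the n highest-scoring articles.
# The mapping and scoring here are naive placeholders, not the chapter's method.
from collections import Counter

def disambiguate(term, term_to_articles):
    """Pick an article for a term from a candidate table (naively: first sense)."""
    candidates = term_to_articles.get(term.lower(), [])
    return candidates[0] if candidates else None

def score_articles(articles):
    """Score each article by how often it was reached from the context."""
    return Counter(a for a in articles if a is not None)

def extract_concepts(text, term_to_articles, n=3):
    terms = text.lower().split()
    mapped = [disambiguate(t, term_to_articles) for t in terms]
    scores = score_articles(mapped)
    return [article for article, _ in scores.most_common(n)]

# Hypothetical term-to-article table standing in for a Wikipedia lookup.
lookup = {
    "java": ["Java (programming language)", "Java (island)"],
    "code": ["Source code"],
    "island": ["Island"],
}
print(extract_concepts("java code and more java code", lookup, n=2))
# ['Java (programming language)', 'Source code']
```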

    Algorithms for Recollection of Search Terms Based on the Wikipedia Category Structure

    The common user interface for a search engine consists of a text field where the user can enter queries consisting of one or more keywords. Keyword-query-based search engines work well when users have a clear vision of what they are looking for and are able to articulate their query using the same terms as those indexed. For our multimedia database containing 202,868 items with text descriptions, we supplement such a search engine with a category-based interface whose category structure is tailored to the content of the database. This facilitates browsing and offers users the possibility of looking for named entities even if they have forgotten their names. We demonstrate that this approach allows users who fail to recollect the name of a named entity to retrieve data with little effort. In all our experiments, it takes one query on a category and on average 2.49 clicks, compared to 5.68 queries on the database's traditional text search engine for a 68.3% success probability, or 6.01 queries when the user also turns to Google, for a 97.1% success probability.

    Ontology evaluation approach for semantic web documents

    Ontology is a conceptual tool used for managing and capturing information related to domain knowledge, such as the travel, education and medical domains. Publicly available ontology repositories like Falcons and SWOOGLE support the growth of ontologies on the Web by providing a medium for ontology developers to publish their ontologies. In order to promote ontology reuse, a suitable approach for ontology evaluation is required that addresses ontology coverage of the domain, including a means of validating the ontology against a corpus of terms related to the domain knowledge. Since contributions to ontology evaluation have addressed different aspects, it is important to conceptualise the related information in order to build an evaluation approach that can help users to select an ontology. This work proposes OntoUji, an ontology that conceptualises information related to ontology evaluation. From the OntoUji conceptualisation, the work proceeds with the development of evaluation steps, which are then converted into ontology evaluation algorithms that evaluate ontology documents retrieved from selected repositories according to a data-driven evaluation approach. The data-driven approach focuses on evaluating the coverage of an ontology using a set of provided keywords, and additionally compares the ontological vocabulary with a pre-defined corpus, WordNet, following an information retrieval approach. The evaluation is then processed using the letter-pair similarity algorithm as the similarity measure to compute the ontology coverage result. The findings show that the OntoUji conceptualisation helps to define ontology evaluation steps that yield similarity results for ontology selection.
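
    The letter-pair similarity mentioned above is commonly formulated as a Dice coefficient over adjacent character pairs. The sketch below shows that standard formulation, together with a hypothetical coverage check of ontology vocabulary against domain keywords; it is not necessarily the exact variant or workflow used in this work.

```python
# Letter-pair (character-bigram) similarity, computed here as a Dice
# coefficient over adjacent letter pairs. Shown only to illustrate how
# ontology vocabulary might be matched against domain keywords.
def letter_pairs(word):
    word = word.lower()
    return [word[i:i + 2] for i in range(len(word) - 1)]

def pair_similarity(a, b):
    pairs_a = letter_pairs(a)
    pairs_b = letter_pairs(b)
    if not pairs_a and not pairs_b:
        return 1.0
    matches = 0
    remaining = list(pairs_b)
    for p in pairs_a:
        if p in remaining:
            matches += 1
            remaining.remove(p)        # each pair in b may match only once
    return 2.0 * matches / (len(pairs_a) + len(pairs_b))

def ontology_coverage(vocabulary, keywords, threshold=0.7):
    """Fraction of domain keywords matched by at least one ontology term."""
    covered = sum(
        1 for k in keywords
        if any(pair_similarity(k, term) >= threshold for term in vocabulary)
    )
    return covered / len(keywords) if keywords else 0.0

# Hypothetical ontology vocabulary and keyword list.
vocab = ["hotel", "accommodation", "itinerary", "flight"]
keys = ["hotels", "flights", "museum"]
print(ontology_coverage(vocab, keys))   # 2 of 3 keywords covered -> ~0.67
```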

    Using Wikipedia Categories and Links in Entity Ranking

    This paper describes the participation of the INRIA group in the INEX 2007 XML entity ranking and ad hoc tracks. We developed a system for ranking Wikipedia entities in answer to a query. Our approach utilises the known categories, the link structure of Wikipedia, as well as the link co-occurrences with the example entities (when provided) to improve the effectiveness of entity ranking. Our experiments on the training data set demonstrate that the use of categories and the link structure of Wikipedia, together with entity examples, can significantly improve entity retrieval effectiveness. We also use our system for the ad hoc tasks by inferring target categories from the title of the query. The results were worse than those obtained with a full-text search engine, which confirms our hypothesis that ad hoc retrieval and entity retrieval are two different tasks.
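
    As a rough illustration of inferring target categories from a query title, the sketch below ranks category names by lexical overlap with the title terms. The overlap heuristic, the top-k cut-off and the category list are assumptions for illustration only, not the INRIA system.

```python
# Minimal sketch of inferring target Wikipedia categories from a query title
# by lexical overlap between title terms and category names.
def infer_target_categories(query_title, category_names, top_k=2):
    title_terms = set(query_title.lower().split())

    def overlap(category):
        cat_terms = set(category.lower().split())
        return len(cat_terms & title_terms) / len(cat_terms)

    ranked = sorted(category_names, key=overlap, reverse=True)
    return [c for c in ranked[:top_k] if overlap(c) > 0]

# Hypothetical category list.
categories = ["European countries", "Countries", "Rivers of Europe", "Capitals in Europe"]
print(infer_target_categories("european countries with rivers", categories))
# ['European countries', 'Countries']
```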

    Leveraging the Wisdom of the Crowd to Address Societal Challenges: Revisiting the Knowledge Reuse for Innovation Process through Analytics

    Societal challenges can be addressed not only by experts but also by crowds. Crowdsourcing provides a way to engage a crowd to contribute to the solutions of some of the biggest challenges of our era: how to cut our carbon footprint, how to address the worldwide epidemic of chronic disease, and how to achieve sustainable development. Isolated crowd-based solutions in online communities are not always creative and innovative. Hence, remixing has been developed as a way to enable ideas to evolve and be integrated, and to harness reusable innovative solutions. Understanding the generativity of remixing is essential to leveraging the wisdom of the crowd to solve societal challenges. At its best, remixing can promote online community engagement, as well as support comprehensive and innovative solution generation. Organizers can maintain an active online community, community members can collectively innovate and learn, and, as a result, society can find new ways to solve important problems. We address what affects the generativity of a remix by revisiting the knowledge reuse for innovation process model. We analyze the reuse of proposals in Climate CoLab, an online innovation community that aims to address global climate change issues. Our application of several analytical methods to study the factors that may contribute to the generativity of a remix reveals that remixes which include prevalent topics and integration metaknowledge are more generative. We conclude by suggesting strategies and tools that can help online communities better harness collective intelligence to address societal challenges.

    Evaluating hierarchical organisation structures for exploring digital libraries

    Search boxes providing simple keyword-based search are insufficient when users have complex information needs or are unfamiliar with a collection, for example in large digital libraries. Browsing hierarchies can support these richer interactions, but many collections do not have a suitable hierarchy available. In this paper we present a number of approaches for automatically creating hierarchies and mapping items into them, including a novel technique which automatically adapts a Wikipedia-based taxonomy to the target collection. These approaches are applied to a large collection of cultural heritage items that is formed through the aggregation of other collections and for which no unified hierarchy is available. We investigate a number of novel user-evaluated metrics to quantify the hierarchies' quality and performance, showing that the proposed technique is preferred by users. From this we draw a number of conclusions as to what makes a hierarchy useful to the user.
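
    The paper's technique for adapting a Wikipedia-based taxonomy to a target collection is not detailed in the abstract; one plausible ingredient of such an adaptation is pruning taxonomy nodes whose subtrees cover too few collection items. The sketch below shows only that pruning step, with a made-up category tree and item counts, and should not be read as the paper's actual algorithm.

```python
# Illustrative pruning step for adapting a taxonomy to a target collection:
# keep only nodes whose subtree covers at least `min_items` collection items.
# The tree structure and counts below are hypothetical.
def prune_taxonomy(children, item_counts, node, min_items=2):
    """Return the pruned subtree rooted at `node`, or None if too sparse."""
    kept_children = []
    total = item_counts.get(node, 0)
    for child in children.get(node, []):
        pruned = prune_taxonomy(children, item_counts, child, min_items)
        if pruned is not None:
            kept_children.append(pruned)
            total += pruned["coverage"]
    if total < min_items:
        return None
    return {"name": node, "coverage": total, "children": kept_children}

# Hypothetical category tree and per-node item counts for a small collection.
tree = {
    "Culture": ["Painting", "Sculpture", "Numismatics"],
    "Painting": ["Frescoes"],
}
counts = {"Painting": 3, "Frescoes": 2, "Sculpture": 1, "Numismatics": 0}
print(prune_taxonomy(tree, counts, "Culture"))
# keeps Painting (with Frescoes) and drops Sculpture and Numismatics
```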