5 research outputs found

    Enabling entity retrieval by exploiting Wikipedia as a semantic knowledge source

    Get PDF
    This dissertation research, PanAnthropon FilmWorld, aims to demonstrate direct retrieval of entities and related facts by exploiting Wikipedia as a semantic knowledge source, with the film domain as its proof-of-concept domain of application. To this end, a semantic knowledge base concerning the film domain has been constructed with the data extracted/derived from 10,640 Wikipedia pages on films and additional pages on film awards. The knowledge base currently contains 209,266 entities and 2,345,931 entity-centric facts. Both the knowledge base and the corresponding semantic search interface are based on the coherent classification of entities. Entity-centric facts are also consistently represented as tuples. The semantic search interface (http://dlib.ischool.drexel.edu:8080/sofia/PA/) supports multiple types of semantic search functions, which go beyond the traditional keyword-based search function, including the main General Entity Retrieval Query (GERQ) function, which is concerned with retrieving all entities that match the specified entity type, subtype, and semantic conditions and thus corresponds to the main research problem. Two types of evaluation have been performed in order to evaluate (1) the quality of information extraction and (2) the effectiveness of information retrieval using the semantic interface. The first type of evaluation has been performed by inspecting 11,495 film-centric facts concerning 100 films. The results have confirmed high data quality with 99.96% average precision and 99.84% average recall. The second type of evaluation has been performed by conducting an experiment with human subjects. The experiment involved having the subjects perform a retrieval task by using both the PanAnthropon interface and the Internet Movie Database (IMDb) interface and comparing their task performance between the two interfaces. The results have confirmed higher effectiveness of the PanAnthropon interface vs. the IMDb interface (83.11% vs. 40.78% average precision; 83.55% vs. 40.26% average recall). Moreover, the subjects’ responses to the post-task questionnaire indicate that the subjects found the PanAnthropon interface to be highly usable and easily understandable as well as highly effective. The main contribution from this research therefore consists in achieving the set research goal, namely, demonstrating the utility and feasibility of semantics-based direct entity retrieval.Ph.D., Information Studies -- Drexel University, 201

    Special Libraries, December 1976

    Get PDF
    Volume 67, Issue 12https://scholarworks.sjsu.edu/sla_sl_1976/1009/thumbnail.jp

    The fiction problem in public libraries : a study with special reference to Cape Town City Library Service

    Get PDF
    Includes bibliography.The focal point of this study emanates from both personal observations formed in public library branch work and questions raised in the subject literature to the effect that tension appears to exist between the wants of the majority of users and the perception of the dominant goals of the public library by their staff, resulting in differing views as to the book selection policy of this institution. Book selection policies have been taken to reflect the attitudes of library staff towards users' wants in terms of their adherence to the tenets of Anglo-American public library objectives

    Manager’s and citizen’s perspective of positive and negative risks for small probabilities

    Get PDF
    So far „risk‟ has been mostly defined as the expected value of a loss, mathematically PL, being P the probability of an adverse event and L the loss incurred as a consequence of the event. The so called risk matrix is based on this definition. Also for favorable events one usually refers to the expected gain PG, being G the gain incurred as a consequence of the positive event. These “measures” are generally violated in practice. The case of insurances (on the side of losses, negative risk) and the case of lotteries (on the side of gains, positive risk) are the most obvious. In these cases a single person is available to pay a higher price than that stated by the mathematical expected value, according to (more or less theoretically justified) measures. The higher the risk, the higher the unfair accepted price. The definition of risk as expected value is justified in a long term “manager‟s” perspective, in which it is conceivable to distribute the effects of an adverse event on a large number of subjects or a large number of recurrences. In other words, this definition is mostly justified on frequentist terms. Moreover, according to this definition, in two extreme situations (high-probability/low-consequence and low-probability/high-consequence), the estimated risk is low. This logic is against the principles of sustainability and continuous improvement, which should impose instead both a continuous search for lower probabilities of adverse events (higher and higher reliability) and a continuous search for lower impact of adverse events (in accordance with the fail-safe principle). In this work a different definition of risk is proposed, which stems from the idea of safeguard: (1Risk)=(1P)(1L). According to this definition, the risk levels can be considered low only when both the probability of the adverse event and the loss are small. Such perspective, in which the calculation of safeguard is privileged to the calculation of risk, would possibly avoid exposing the Society to catastrophic consequences, sometimes due to wrong or oversimplified use of probabilistic models. Therefore, it can be seen as the citizen‟s perspective to the definition of risk
    corecore