
    Web knowledge bases

    Knowledge is key to natural language understanding. References to specific people, places and things in text are crucial to resolving ambiguity and extracting meaning. Knowledge Bases (KBs) codify this information for automated systems, enabling applications such as entity-based search and question answering. This thesis explores the idea that sites on the web may act as a KB, even if that is not their primary intent. Dedicated KBs like Wikipedia are a rich source of entity information, but are built and maintained at an ongoing cost in human effort. As a result, they are generally limited in the breadth and depth of knowledge they index about entities. Web knowledge bases offer a distributed solution to the problem of aggregating entity knowledge: social networks aggregate content about people, news sites describe events with tags for organizations and locations, and a diverse assortment of web directories aggregate statistics and summaries for long-tail entities notable within niche movie, music and sporting domains. We aim to develop the potential of these resources for both web-centric entity Information Extraction (IE) and structured KB population. We first investigate the problem of Named Entity Linking (NEL), where systems must resolve ambiguous mentions of entities in text to their corresponding node in a structured KB. We demonstrate that entity disambiguation models derived from inbound web links to Wikipedia can complement, and in some cases completely replace, resources typically derived from the KB itself. Building on this work, we observe that any page on the web which reliably disambiguates inbound web links may act as an aggregation point for entity knowledge. To uncover these resources, we formalize the task of Web Knowledge Base Discovery (KBD) and develop a system to automatically infer the existence of KB-like endpoints on the web.
While extending our framework to multiple KBs increases the breadth of available entity knowledge, we must still consolidate references to the same entity across different web KBs. We investigate this task of Cross-KB Coreference Resolution (KB-Coref) and develop models for efficiently clustering coreferent endpoints across web-scale document collections. Finally, assessing the gap between unstructured web knowledge resources and those of a typical KB, we develop a neural machine translation approach which transforms entity knowledge between unstructured textual mentions and traditional KB structures. The web has great potential as a source of entity knowledge. In this thesis we aim to first discover, then distill, and finally transform this knowledge into forms which will ultimately be useful in downstream language understanding tasks.
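The abstract refers to entity disambiguation models derived from inbound web links. One widely used model of this kind is a link-based prior (often called "commonness"): for a given anchor text, the fraction of links that point to each candidate entity. The Python sketch below illustrates that idea only, under that assumption; it is not the thesis's actual model, and the function names `build_link_prior` and `disambiguate`, as well as the toy entity identifiers, are hypothetical.

```python
from collections import Counter, defaultdict


def build_link_prior(links):
    """Estimate P(entity | anchor text) from (anchor_text, target_entity) link pairs."""
    counts = defaultdict(Counter)
    for anchor, entity in links:
        counts[anchor.lower()][entity] += 1  # case-fold anchors so "Jaguar" == "jaguar"
    prior = {}
    for anchor, entities in counts.items():
        total = sum(entities.values())
        prior[anchor] = {e: n / total for e, n in entities.items()}
    return prior


def disambiguate(mention, prior):
    """Return the most probable entity for a mention, or None if it never appeared as an anchor."""
    candidates = prior.get(mention.lower())
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Toy data (hypothetical): (anchor text, linked entity) pairs harvested from web pages
links = [
    ("Jaguar", "Jaguar_Cars"),
    ("jaguar", "Jaguar_(animal)"),
    ("Jaguar", "Jaguar_Cars"),
    ("Paris", "Paris"),
]
prior = build_link_prior(links)
print(disambiguate("JAGUAR", prior))  # Jaguar_Cars (2 of the 3 "jaguar" links)
```

A prior like this ignores the surrounding text entirely; practical linkers usually combine it with context-based features, which is where KB-derived resources would otherwise come in.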

    The role of context in image annotation and recommendation

    With the rise of smartphones, lifelogging devices (e.g. Google Glass) and the popularity of image sharing websites (e.g. Flickr), users are capturing and sharing every aspect of their lives online, producing a wealth of visual content. The majority of these uploaded images are poorly annotated or exist in complete semantic isolation, which makes building retrieval systems difficult, since one must first understand the meaning of an image in order to retrieve it. To alleviate this problem, many image sharing websites offer manual annotation tools which allow the user to “tag” their photos; however, these tools are laborious and as a result have been poorly adopted: Sigurbjörnsson and van Zwol (2008) showed that 64% of images uploaded to Flickr are annotated with fewer than 4 tags. Consequently, an entire body of research has focused on the automatic annotation of images (Hanbury, 2008; Smeulders et al., 2000; Zhang et al., 2012a), which attempts to bridge the semantic gap between an image’s appearance and its meaning (e.g. the objects present). Despite two decades of research, the semantic gap still largely exists, and as a result automatic annotation models often offer unsatisfactory performance for industrial deployment. Further, these techniques can only annotate what they see, ignoring the “bigger picture” surrounding an image (e.g. its location, the event, the people present). Much work has therefore focused on building photo tag recommendation (PTR) methods, which aid the user in the annotation process by suggesting tags related to those already present. These works have mainly focused on computing relationships between tags from historical images, e.g. NY and timessquare co-occur in many images and are therefore highly correlated. However, tags are inherently noisy, sparse and ill-defined, which often results in poor PTR accuracy: does NY refer to New York or New Year?
This thesis proposes exploiting an image’s context, which, unlike textual evidence, is always present, in order to alleviate this ambiguity in the tag recommendation process. Specifically, we exploit the “what, who, where, when and how” of the image capture process to complement textual evidence in various photo tag recommendation and retrieval scenarios. In part II, we combine text, content-based (e.g. number of faces present) and contextual (e.g. day of the week taken) signals for tag recommendation, achieving up to a 75% improvement in precision@5 over a text-only TF-IDF baseline. We then consider external knowledge sources (i.e. Wikipedia and Twitter) as an alternative to the (slower-moving) Flickr on which to build recommendation models, showing that similar accuracy can be achieved on these faster-moving, yet entirely textual, datasets. In part II, we also highlight the merits of diversifying tag recommendation lists before discussing at length various problems with existing automatic image annotation and photo tag recommendation evaluation collections. In part III, we propose three new image retrieval scenarios, namely “visual event summarisation”, “image popularity prediction” and “lifelog summarisation”. In the first scenario, we attempt to produce a ranking of relevant and diverse images for various news events by (i) removing irrelevant images such as memes and visual duplicates, and then (ii) semantically clustering images based on the tweets in which they were originally posted. Using this approach, we achieved over 50% precision for images in the top 5 ranks. In the second retrieval scenario, we show that by combining contextual and content-based features from images, we can predict whether an image will become “popular” with 74% accuracy using an SVM classifier.
Finally, in chapter 9 we employ blur detection and perceptual-hash clustering to remove noisy images from lifelogs, before combining visual and geo-temporal signals to capture a user’s “key moments” within their day. We believe that the results of this thesis represent an important step towards building effective image retrieval models when sufficient textual content is lacking (i.e. a cold start).
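The tag co-occurrence approach to photo tag recommendation described in this abstract can be illustrated with a short Python sketch. It assumes a simple asymmetric score, summing P(c | t) = cooc(t, c) / freq(t) over the tags t already on the photo; this is a text-only baseline of the kind the thesis argues against, not its context-aware method, and `build_cooccurrence`, `recommend` and the toy tag history are hypothetical.

```python
from collections import Counter
from itertools import permutations


def build_cooccurrence(tag_sets):
    """Count per-tag frequency and ordered co-occurrence of distinct tags on the same image."""
    freq, cooc = Counter(), Counter()
    for tags in tag_sets:
        unique = set(tags)
        freq.update(unique)
        cooc.update(permutations(unique, 2))  # both (a, b) and (b, a) are counted
    return freq, cooc


def recommend(input_tags, freq, cooc, k=5):
    """Rank candidate tags c by the sum over input tags t of cooc(t, c) / freq(t)."""
    given = set(input_tags)
    scores = Counter()
    for (t, c), n in cooc.items():
        if t in given and c not in given:
            scores[c] += n / freq[t]  # freq[t] > 0 whenever (t, c) was observed
    return [tag for tag, _ in scores.most_common(k)]


# Toy tag history (hypothetical): one set of tags per previously uploaded image
history = [
    {"ny", "timessquare", "night"},
    {"ny", "timessquare"},
    {"ny", "fireworks", "newyear"},
    {"newyear", "fireworks", "party"},
]
freq, cooc = build_cooccurrence(history)
print(recommend(["ny"], freq, cooc, k=1))  # ['timessquare']
```

With k=5, the suggestions for "ny" also include "newyear" and "fireworks", which shows the ambiguity the abstract describes: co-occurrence alone cannot tell whether NY means New York or New Year, which is what contextual signals are meant to resolve.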

    A study of interdisciplinary education at M.I.T. : the Concourse Program.

    Thesis (Ph.D.), Massachusetts Institute of Technology, Alfred P. Sloan School of Management, 1975. Vita. Bibliography: leaves 139-142.

    Speaking of Diversity

    Originally published in 1992. In this collection of essays, Philip Gleason explores the different linguistic tools that American scholars have used to write about ethnicity in the United States and analyzes how various vocabularies have played out in the political sphere. In doing this, he reveals tensions between terms used by academic groups and those preferred by the people whom the academics discuss. Gleason unpacks words and phrases—such as melting pot and plurality—used to visualize the multitude of ethnicities in the United States. And he examines debates over concepts such as "assimilation," "national character," "oppressed group," and "people of color." Gleason advocates for greater clarity of these concepts when discussed in America's national political arena. Gleason's essays are grouped into three parts. Part 1 focuses on linguistic analyses of specific terms. Part 2 examines the effect of World War II on national identity and American thought about diversity and intergroup relations. Part 3 discusses discourse on the diversity of religions. This collection of eleven essays sharpens our historical understanding of the evolution of language used to define diversity in twentieth-century America

    Marcus Dods : with special reference to his teaching ministry

    Today, when the name of Marcus Dods is mentioned, the first thought that comes to the minds of many associates him either with a long probation or with a heresy charge. Yet few facts pertaining to either of these experiences in his life are known. This is not surprising when we realize that a biography of this prominent Scotsman of the past has never been attempted, nor is there much information about his life and work available in such volumes as the Dictionary of National Biography. The purpose of this study is not only to shed light upon and interpret the significance of these two aspects of Dods' career, but also to focus attention upon his teaching ministry as the unifying feature of his life. In this way we can determine his particular contributions to the nineteenth-century church in Scotland.
    A complete account of the life of Marcus Dods has never been written. During his lifetime, various résumés were published in periodicals, but these accounts lacked accuracy and detail. The most important period of his life, the probation years, remained concealed throughout the nineteenth century because Dods' own silence regarding his probation meant that no reliable information was available prior to the posthumous publication of his early letters in 1910. Therefore, this chapter is devoted to his early life, with the most detailed treatment given to his hitherto little known and little understood probation years. His own letters and other writings supplied the major sources from which this chapter was drawn. Extensive use was also made of numerous periodicals of the late nineteenth century.
    Although the full extent of his influence is not well known today, he made a deep impression upon the age in which he lived. His contributions were great when measured against the needs of his time. But his personal greatness consisted not so much in any special brilliance of talent or achievement as in the superlative degree to which he exhibited the qualities of human character and Christian faith which are open to all men.

    Introduction to speech communication

    Introduction to Speech Communication is used to support teaching, learning and research for SPCH 2713 at Oklahoma State University (OSU). In addition to including original work authored by the editors to meet the needs of their course at OSU, the editors adapted portions of Exploring Public Speaking: 4th Edition, Stand Up, Speak Out, and Fundamentals of Public Speaking. Please see the Acknowledgements chapter for full citations. We at Oklahoma State University Libraries acknowledge our gratitude for the expertise and generosity of the scholars at Affordable Learning Georgia, College of the Canyons, the Open Education Network and elsewhere for creating and sharing customizable versions of their work.

    International self-report delinquency (ISRD4) study protocol: background, methodology, and mandatory items for the 2021/2022 survey

    This document describes the background and methodology of the fourth round of the International Self-Report Delinquency study (ISRD4). Drawing from the fields of criminology, public health and cross-national methodology, the ISRD is an ongoing multi-national research study that aims to describe and explain adolescents’ experiences with crime and victimization, to test criminological theories, and to develop recommendations for prevention and interventions. The project relies on a common research protocol, which standardizes questionnaire content and administration and prescribes comparable sampling procedures in participating countries, enabling the collection of common data across all of them. The ISRD4 Study Protocol describes the standard sections of the ISRD4 questionnaire (core and sweep-specific) for both the school-based and the internet-based samples. In addition to the core ISRD items, the ISRD4 questionnaire includes new items related to cyber-offending and cyber-victimization, discrimination, and perceptions of violence and revenge motives. The protocol also describes the rationale for including an internet-based survey as a complement to the school-based survey. The document aims to provide a detailed set of guidelines for participating national teams, but will also be of interest to researchers working on youth victimization and offending, theory testing, and cross-national methodology. Fieldwork in approximately 40 countries began in 2020 and will conclude by the end of 2022.