275 research outputs found

    Discovering Structural Similarities in Narrative Texts using Event Alignment Algorithms

    This thesis is about the discovery of structural similarities across narrative texts. We describe a method based on event alignments that are created automatically on automatically preprocessed texts. This opens up a path to large-scale empirical research on structural similarities across texts. Structural similarities are of interest for many areas in the humanities and social sciences. We focus on folkloristics and ritual research as application scenarios. Folkloristics studies folktales, i.e., tales that have been passed down orally for a long time. Similarities across different folktales have been observed, both at the level of individual events (being abandoned in the woods) or participants (the gingerbread house) and structurally: events do not happen at random, but in a certain order. Rituals are an omnipresent part of human behavior and are studied in ethnology, the social sciences and history. Similarities across types of rituals have been observed and have sparked a discussion about the structural principles that govern how individual ritual elements are combined into rituals. As descriptions of rituals feature many uncommon language constructions, we also discuss methods of domain adaptation for adapting existing NLP components to the domain of rituals. We mainly use supervised methods and employ retraining as a means of adaptation. This presupposes annotating small amounts of domain data. We discuss the following linguistic levels: part-of-speech tagging, chunking, dependency parsing, word sense disambiguation, semantic role labeling and coreference resolution. On all levels, we have achieved improvements. We also describe how these annotation levels are brought together in a single, integrated discourse representation that is the basis for further experiments. In order to discover structural similarities, we employ three different alignment algorithms and use them to align semantically similar events. 
Sequence alignment (Needleman-Wunsch) is a classic algorithm with limited capabilities. A graph-based event alignment system that was developed for newspaper texts is used for comparison. As a third algorithm, we employ Bayesian model merging, which induces a hidden Markov model from which we extract an alignment. We evaluate the algorithms in two experiments. In the first experiment, we evaluate against a gold standard of aligned descriptions of rituals. Bayesian model merging achieves the best results, measured using the BLANC metric. Due to difficulties in creating an event alignment gold standard, the second experiment is based on cluster induction. Although this is not a strict evaluation of structural similarities, it gives some insight into the behavior of the algorithms. We induce a document similarity measure from the generated alignments and use this measure to cluster the documents. The clustering is then compared against a gold standard classification of documents from both scenarios. In this experiment, the lemma alignment baseline achieves the best numerical performance on folktales (but as it aligns lemmas instead of event representations, its expressiveness is limited), followed by predicate alignment, Bayesian model merging and Needleman-Wunsch. On descriptions of rituals, the predicate alignment algorithm outperforms all shallow and more specialized baselines. Shallow measures of semantic similarity between texts outperform the alignment-based algorithms on folktales, but they do not allow the exact localization of similarities. Finally, we present a graph-based algorithm that ranks events according to their participation in structurally similar regions across documents. This allows us to direct researchers from the humanities to interesting cases that are worth manual inspection. 
Because the accessibility of results to humanities researchers is of utmost importance in digital humanities scenarios, we close the thesis with a showcase scenario in which we analyze descriptions of rituals using the alignment, clustering and event ranking algorithms described above. The showcase demonstrates how results can be visualized and interpreted by researchers of rituals.
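
As an illustration of the sequence alignment step named in this abstract, the following is a minimal Needleman-Wunsch sketch. It is not the thesis implementation: the event labels, the exact-match scoring function `match_score`, and the gap penalty are assumptions chosen for the example, and a real system would presumably score pairs by semantic similarity of event representations.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Pair = Tuple[Optional[str], Optional[str]]


def match_score(x: str, y: str) -> float:
    """Hypothetical similarity: +1 for identical event labels, -1 otherwise."""
    return 1.0 if x == y else -1.0


def needleman_wunsch(
    a: Sequence[str],
    b: Sequence[str],
    score: Callable[[str, str], float] = match_score,
    gap: float = -1.0,
) -> Tuple[float, List[Pair]]:
    """Globally align two event sequences; None marks an unaligned event."""
    n, m = len(a), len(b)
    # dp[i][j] holds the best score for aligning a[:i] with b[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # align the pair
                dp[i - 1][j] + gap,                            # skip a[i-1]
                dp[i][j - 1] + gap,                            # skip b[j-1]
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs


# Two invented event sequences standing in for two tellings of a tale.
tale_a = ["leave", "abandon", "find", "eat"]
tale_b = ["abandon", "wander", "find", "eat"]
total, alignment = needleman_wunsch(tale_a, tale_b)
for left, right in alignment:
    print(left, "<->", right)
```

Because the alignment is global and order-preserving, events that occur in a different order in the two texts cannot be paired, which is one reading of the "limited capabilities" the abstract mentions.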

    Integrating Cultural Knowledge into Artificially Intelligent Systems: Human Experiments and Computational Implementations

    With the advancement of Artificial Intelligence, it seems as if every aspect of our lives is affected by AI in one way or another. As AI is used for everything from driving vehicles to criminal justice, it becomes crucial that it overcome any biases that might hinder its fair application. We are constantly trying to make AI more like humans. But most AI systems so far fail to address one of the main aspects of humanity: our culture and the differences between cultures. We cannot truly consider AI to have understood human reasoning without understanding culture. So it is important for cultural information to be embedded into AI systems in some way, and for those systems to understand the differences across cultures. The main way I have chosen to do this is to use two cultural markers: motifs and rituals, because both are inherently part of any culture. Motifs are things that are repeated often and are grounded in well-known stories, and they tend to be very specific to individual cultures. Rituals are part of every culture in some way; while some are constant across all cultures, others are very specific to individual ones. This makes them well suited to comparison and contrast. The first two parts of this dissertation describe two cognitive psychology studies I conducted. The first examines how people understand motifs: is it true that in-culture people identify motifs better than out-culture people? My study shows this to indeed be the case. The second study tests whether motifs are recognizable in texts, regardless of whether people understand their meaning. The results confirm the hypothesis that motifs are recognizable. The third part of my work discusses the survey and data collection effort around rituals. I collected data about rituals from people from various national groups, and observed the differences in their responses. 
The main results of this effort were twofold: first, a demonstration that cultural differences across groups are quantifiable, prevalent, and observable with proper effort; and second, a substantial, curated, culturally sensitive dataset that can be put to a wide variety of uses across AI systems. The fourth part of the dissertation focuses on a system I built, called the motif association miner, which provides information about motifs present in input text, such as associations, sources of motifs, and connotations. This information enables future systems to use my output as their input and gain a better understanding of motifs, especially as it shows an approach for bringing the meaning of motifs specific to a certain culture into wider use. As the final contribution, this thesis details my efforts to use the curated ritual data to improve an existing Question Answering system, and shows that this method helps systems perform better in situations that vary by culture. This data and approach, which will be made publicly available, will enable others in the field to take advantage of the information contained within to try to combat some bias in their systems.

    Judicial decision-making and extra-legal influences: Neurolinguistic Programming as a candidate framework to understand persuasion in the legal context

    Trial advocates seek to influence the outcomes of judicial decision-making processes using persuasion, but the existing literature regarding persuasion in the courtroom is surprisingly piecemeal, focusing on individual techniques in isolation; no comprehensive frameworks for integrating these techniques, or for systematically analyzing advocates’ attempts to enact persuasion in the courtroom, have been developed. 
We propose a popular commercial technology for persuasion, Neurolinguistic Programming (NLP), as a candidate framework that might be modified and adapted to fill this gap. First we present a wide-ranging, discursive analysis of judicial decision-making processes and the extra-legal factors that influence them. Next, core aspects of NLP theory are subjected to careful examination. Finally, these threads are synthesized into a multifaceted assessment of NLP’s potential utility as a comprehensive and integrative framework for understanding and describing how litigators enact persuasion in the courtroom. We argue that NLP can describe these behaviors and strategies both by way of a self-reflexive logic resulting from its popular influence and as a more general, context-independent model by virtue of a large number of correspondences between NLP concepts and findings from the scholarly literature. Although these correspondences are superficial, the fact that NLP integrates its simplified, folk concepts into a coherent framework spanning argumentative and presentational dimensions of persuasion suggests that it might readily be adapted into a useful descriptive model for understanding persuasion in the courtroom. Further scholarly attention is indicated.

    Ontology-based approach to semantically enhanced question answering for closed domain: a review

    For many users of natural language processing (NLP), it can be challenging to obtain concise, accurate and precise answers to a question. Question answering (QA) systems enable users to pose questions in natural language and receive quick answers, rather than the lists of documents delivered by search engines. This task is challenging and involves complex semantic annotation and knowledge representation. This study reviews the literature on ontology-based methods that semantically enhance QA for a closed domain, covering relevant studies published between 2000 and 2020. The review reports that 83 of the 124 papers considered acknowledge the QA approach, and recommend its development and evaluation using different methods. These methods are evaluated according to accuracy, precision, and recall. An ontological approach to semantically enhancing QA is found to be adopted in a limited way, as many of the studies reviewed concentrated instead on NLP and information retrieval (IR) processing. While the majority of the studies reviewed focus on open domains, this study investigates the closed domain.

    Pattern-based design applied to cultural heritage knowledge graphs

    Ontology Design Patterns (ODPs) have become an established and recognised practice for guaranteeing good quality ontology engineering. There are several ODP repositories where ODPs are shared, as well as ontology design methodologies recommending their reuse. Rigorous testing is also recommended for supporting ontology maintenance and validating the resulting resource against its motivating requirements. Nevertheless, it is less than straightforward to find guidelines on how to apply such methodologies to the development of domain-specific knowledge graphs. ArCo is the knowledge graph of Italian Cultural Heritage (CH) and has been developed using eXtreme Design (XD), an ODP- and test-driven methodology. During its development, XD has been adapted to the needs of the CH domain, e.g., by gathering requirements from an open, diverse community of consumers; a new ODP has been defined and many have been specialised to address specific CH requirements. This paper presents ArCo and describes how to apply XD to the development and validation of a CH knowledge graph, also detailing the (intellectual) process implemented for matching the encountered modelling problems to ODPs. Relevant contributions also include a novel web tool for supporting unit-testing of knowledge graphs, a rigorous evaluation of ArCo, and a discussion of methodological lessons learned during ArCo development.

    Populating the semantic web: combining text and relational databases as RDF graphs

    The Semantic Web promises a way of linking distributed information at a granular level by interconnecting compact data items instead of complete HTML pages. New data is gradually being added to the Semantic Web, but there is a need to incorporate existing knowledge. This thesis explores ways to convert a coherent body of information from various structured and unstructured formats into the necessary graph form. The transformation work crosses several currently active disciplines, and there are further research questions that can be addressed once the graph has been built. Hybrid databases, such as the cultural heritage one used here, consist of structured relational tables associated with free text documents. Access to the data is hampered by complex schemas, confusing terminology and difficulties in searching the text effectively. This thesis describes how hybrid data can be unified by assembly into a graph. A major component task is the conversion of relational database content to RDF. This is an active research field, to which this work contributes by examining weaknesses in some existing methods and proposing alternatives. The next significant element of the work is an attempt to extract structure automatically from English text using natural language processing methods. The first claim made is that the semantic content of the text documents can be adequately captured as a set of binary relations forming a directed graph. It is shown that the data can then be grounded using existing domain thesauri, by building an upper ontology structure from these. A schema for cultural heritage data is proposed, intended to be generic for that domain and as compact as possible. Another hypothesis is that use of a graph will assist retrieval. The structure is uniform and very simple, and the graph can be queried even if the predicates (or edge labels) are unknown. 
Additional benefits of the graph structure are examined, such as using path length between nodes as a measure of relatedness (unavailable in a relational database, where there is no equivalent concept of locality), and building information summaries by grouping the attributes of nodes that share predicates. These claims are tested by comparing queries across the original and the new data structures. The graph must be able to answer correctly the queries that the original database dealt with, and should also demonstrate valid answers to queries that could not previously be answered or where the results were incomplete.
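
The two graph properties mentioned in this abstract, querying without knowing the predicates and using path length as a relatedness measure, can be sketched as follows. The triples, node names and predicate names are invented for illustration and do not come from the thesis; traversing edges in both directions for the path-length measure is also an assumption.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical binary relations, as might be derived from tables or text.
TRIPLES: List[Triple] = [
    ("axe_1", "made_of", "flint"),
    ("axe_1", "found_at", "site_A"),
    ("pot_7", "found_at", "site_A"),
    ("pot_7", "made_of", "clay"),
]


def edges_of(triples: List[Triple], node: str) -> List[Triple]:
    """Return every triple touching a node, whatever its predicate is."""
    return [t for t in triples if t[0] == node or t[2] == node]


def build_adjacency(triples: List[Triple]) -> Dict[str, Set[str]]:
    """Neighbour sets, ignoring edge direction (an assumption of this sketch)."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for subj, _pred, obj in triples:
        adj[subj].add(obj)
        adj[obj].add(subj)
    return adj


def path_length(adj: Dict[str, Set[str]], start: str, goal: str) -> Optional[int]:
    """Shortest number of edges between two nodes, or None if unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


adjacency = build_adjacency(TRIPLES)
print(edges_of(TRIPLES, "site_A"))               # all facts about site_A
print(path_length(adjacency, "axe_1", "pot_7"))  # shorter path = more related
```

A shorter path is read here as stronger relatedness; a relational schema has no direct counterpart to this notion of distance without knowing which joins to follow.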

    Digital Classical Philology

    The buzzwords “Information Society” and “Age of Access” suggest that information is now universally accessible without any form of hindrance. Indeed, the German constitution calls for all citizens to have open access to information. Yet in reality, there are multifarious hurdles to information access, whether physical, economic, intellectual, linguistic, political, or technical. Thus, while new methods and practices for making information accessible arise on a daily basis, we are nevertheless confronted by limitations to information access in various domains. This new book series assembles academics and professionals in various fields in order to illuminate the various dimensions of information's inaccessibility. While the series discusses principles and techniques for transcending the hurdles to information access, it also addresses necessary boundaries to accessibility. This book describes the state of the art of digital philology with a focus on ancient Greek and Latin. It addresses problems such as the accessibility of information about Greek and Latin sources and the data entry, collection and analysis of Classical texts, and describes the fundamental role of libraries in building digital catalogs and developing machine-readable citation systems.

    Contextual Social Networking

    The thesis centers on the multi-faceted research question of how contexts may be detected and derived that can be used for new context-aware Social Networking services and for improving the usefulness of existing Social Networking services, giving rise to the notion of Contextual Social Networking. In a first, foundational part, we characterize the closely related fields of Contextual, Mobile, and Decentralized Social Networking using different methods and focusing on different detailed aspects. A second part focuses on the question of how short-term and long-term social contexts, as especially interesting forms of context for Social Networking, may be derived. We focus on NLP-based methods for the characterization of social relations as a typical form of long-term social context, and on Mobile Social Signal Processing methods for deriving short-term social contexts on the basis of the geometry of interaction and audio. We furthermore investigate how personal social agents may combine such social context elements on various levels of abstraction. The third part discusses new and improved context-aware Social Networking service concepts. We investigate special forms of awareness services, new forms of social information retrieval, social recommender systems, context-aware privacy concepts and services, and platforms supporting Open Innovation and creative processes. This version of the thesis omits the included publications for copyright reasons. For the version with all included publications, contact Georg Groh, [email protected]