
    Knowledge Patterns for the Web: extraction, transformation and reuse

    This thesis investigates methods and software architectures for discovering the typical, frequently occurring structures used to organize knowledge on the Web. We call these structures Knowledge Patterns (KPs). KP discovery must address two main research problems: the heterogeneity of sources, formats and semantics on the Web (the knowledge soup problem), and the difficulty of drawing relevant boundaries around data so as to capture the knowledge that is meaningful in a given context (the knowledge boundary problem). We therefore introduce two methods that tackle KP discovery from two different perspectives: (i) the transformation of KP-like artifacts into KPs formalized as OWL2 ontologies; and (ii) the bottom-up extraction of KPs by analyzing how data are organized in Linked Data. The two methods address the knowledge soup and boundary problems in different ways. The first method relies on a purely syntactic transformation of the original source to RDF, followed by a refactoring step that adds semantics to the RDF by selecting meaningful RDF triples. The second method draws boundaries around RDF data in Linked Data by analyzing type paths. A type path is a possible route through an RDF graph that takes into account the types associated with the nodes along the path. We then present K~ore, a software architecture conceived as the basis for developing KP discovery systems and designed according to two architectural styles, Component-based and REST. Finally, we provide an example of KP reuse based on Aemoo, an exploratory search tool that exploits KPs to perform entity summarization.
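
    The type-path idea can be illustrated with a minimal sketch; it is not the thesis's implementation. Assumptions: only length-1 paths (subject type, property, object type) are considered; a path's popularity is taken to be the share of the subject type's instances that use it; untyped nodes default to owl:Thing; and the threshold that draws the knowledge boundary is arbitrary. The helpers `path_popularity` and `knowledge_boundary` and all `ex:` resources are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

RDF_TYPE = "rdf:type"
THING = "owl:Thing"  # fallback type for untyped nodes (assumption)


def path_popularity(triples):
    """Map each length-1 type path (subject type, property, object type) to the
    share of the subject type's instances that take part in it."""
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)
    # Number of declared instances per type.
    instances = Counter(t for ts in types.values() for t in ts)
    subjects = defaultdict(set)  # type path -> distinct subjects that use it
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        for ts, to in product(types.get(s) or {THING}, types.get(o) or {THING}):
            subjects[(ts, p, to)].add(s)
    # Paths whose subject type has no declared instances are skipped.
    return {
        path: len(subj) / instances[path[0]]
        for path, subj in subjects.items()
        if instances[path[0]]
    }


def knowledge_boundary(triples, subject_type, threshold):
    """Keep the type paths of one subject type whose popularity reaches the threshold."""
    return {
        path
        for path, score in path_popularity(triples).items()
        if path[0] == subject_type and score >= threshold
    }


# Hypothetical toy graph.
triples = [
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:bob", RDF_TYPE, "ex:Person"),
    ("ex:townA", RDF_TYPE, "ex:City"),
    ("ex:alice", "ex:birthPlace", "ex:townA"),
    ("ex:bob", "ex:birthPlace", "ex:townA"),
    ("ex:alice", "ex:knows", "ex:bob"),
]
print(knowledge_boundary(triples, "ex:Person", threshold=0.75))
# {('ex:Person', 'ex:birthPlace', 'ex:City')}
```

    Here every person has a birth place but only half of them know someone, so at this (arbitrary) threshold only the birth-place path falls inside the boundary.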

    Language representations for computational argumentation

    Argumentation is an essential feature and, arguably, one of the most exciting phenomena of natural language use. Accordingly, it has long fascinated scholars in fields such as linguistics and philosophy. Its computational analysis, known as computational argumentation, is useful across a variety of text domains and applications. For instance, it can help to understand users’ stances towards certain controversies in online discussion forums, to provide targeted feedback for argumentative writing support, and to automatically summarize scientific publications. As in all natural language processing pipelines, the text to be analyzed has to be presented to computational argumentation models as numeric features. Choosing suitable semantic representations is considered a core challenge in natural language processing. In this context, static and contextualized pretrained text embedding models have recently been shown to reach state-of-the-art performance on a range of natural language processing tasks. However, previous work has singled out language representations as one of the main bottlenecks in computational argumentation scenarios and has called for targeted research at the intersection of the two fields. Still, efforts focusing on the interplay between computational argumentation and representation learning have been few and far between. This is despite (a) the fast-growing body of work in both computational argumentation and representation learning in general, and (b) the fact that some of the open challenges are well known in the natural language processing community. In this thesis, we address this research gap and acknowledge the specific importance of research at the intersection of representation learning and computational argumentation. 
To this end, we (1) identify a series of challenges driven by inherent characteristics of argumentation in natural language and (2) present new analyses, corpora, and methods to address and mitigate each of the identified issues. Concretely, we focus on five main challenges pertaining to the current state of the art in computational argumentation: (C1) External knowledge: static and contextualized language representations encode only distributional knowledge. We propose two approaches to complement this knowledge with knowledge from external resources. First, we inject lexico-semantic knowledge through an additional prediction objective in the pretraining stage. In a second study, we demonstrate how to inject conceptual knowledge post hoc using the adapter framework. We show the effectiveness of these approaches on general natural language understanding and argumentative reasoning tasks. (C2) Domain knowledge: pretrained language representations are typically trained on large, general-domain corpora. We study the trade-off between such large, general-domain corpora and smaller, domain-specific corpora for training static word embeddings, which we evaluate on the analysis of scientific arguments. (C3) Complementarity of knowledge across tasks: many computational argumentation tasks are interrelated but are typically studied in isolation. In two case studies, we show the effectiveness of sharing knowledge across tasks. First, based on a corpus of scientific texts, which we extend with a new annotation layer reflecting fine-grained argumentative structures, we show that coupling argumentative analysis with other rhetorical analysis tasks improves performance on the higher-level tasks. In the second case study, we focus on assessing the argumentative quality of texts. To this end, we present a new multi-domain corpus annotated with ratings reflecting different dimensions of argument quality. 
We then demonstrate the effectiveness of sharing knowledge across the different quality dimensions in multi-task learning setups. (C4) Multilinguality: argumentation arguably exists in all cultures and languages around the globe. To foster inclusive computational argumentation technologies, we dissect the current state of the art in zero-shot cross-lingual transfer. We show large performance drops for resource-lean and typologically distant target languages. Based on this finding, we analyze the reasons for these losses and propose moving to inexpensive few-shot target-language transfer, which leads to consistent performance improvements in higher-level semantic tasks, e.g., argumentative reasoning. (C5) Ethical considerations: envisioned computational argumentation applications, e.g., systems for self-determined opinion formation, are highly sensitive. We first discuss which ethical aspects should be considered when representing natural language for computational argumentation tasks. Focusing on the issue of unfair stereotypical bias, we then conduct a multi-dimensional analysis of the amount of bias in monolingual and cross-lingual embedding spaces. Next, we devise a general framework for implicit and explicit bias evaluation and debiasing. Using intrinsic bias measures and benchmarks reflecting the semantic quality of the embeddings, we demonstrate the effectiveness of the new debiasing methods we propose. Finally, we complement this analysis by testing both the original and the debiased language representations for stereotypically unfair bias in argumentative inferences. We hope that our contributions to language representations for computational argumentation fuel more research at the intersection of the two fields and contribute to fair, efficient, and effective natural language processing technologies.
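
    As one concrete example of an intrinsic bias measure of the kind mentioned in (C5), the Word Embedding Association Test (WEAT; Caliskan et al., 2017) compares how strongly two target word sets X and Y associate with two attribute sets A and B. The sketch below computes its effect size in plain Python. It is not necessarily the measure used in the thesis; the two-dimensional vectors stand in for real embeddings, and using the sample standard deviation is an implementation choice.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, A, B):
    """s(w, A, B): mean cosine similarity of w to A minus mean similarity to B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)


def weat_effect_size(X, Y, A, B):
    """Difference in mean association of targets X and Y with attributes A vs. B,
    divided by the standard deviation of s(w, A, B) over all targets in X and Y."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Hypothetical 2-d "embeddings": X leans towards A, Y towards B.
A = [(1.0, 0.0)]
B = [(0.0, 1.0)]
X = [(1.0, 0.1), (1.0, 0.2)]
Y = [(0.1, 1.0), (0.2, 1.0)]
print(round(weat_effect_size(X, Y, A, B), 2))  # 1.73
```

    A positive effect size means X is more strongly associated with A (and Y with B) than the reverse; swapping X and Y flips the sign, and a debiased space should push the value towards zero.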

    Decompositional Semantics for Events, Participants, and Scripts in Text

    This thesis presents a sequence of practical and conceptual developments in decompositional meaning representations for events, participants, and scripts in text under the framework of Universal Decompositional Semantics (UDS) (White et al., 2016a). Part I of the thesis focuses on the semantic representation of individual events and their participants. Chapter 3 examines the feasibility of deriving semantic representations of events from dependency syntax; we demonstrate that predicate-argument structure may be extracted from syntax, but other desirable semantic attributes are not directly discernible. Accordingly, in Chapters 4 and 5 we present state-of-the-art models for predicting these semantic attributes from text. Chapter 4 presents a model for predicting semantic proto-role labels (SPRL), i.e., attributes of event participants based on Dowty’s seminal theory of thematic proto-roles (Dowty, 1991). In Chapter 5 we present a model of event factuality prediction (EFP), the task of determining whether an event mentioned in text happened (according to the meaning of the text). Both chapters include extensive experiments on multi-task learning for improving performance on each semantic prediction task. Taken together, Chapters 3, 4, and 5 represent the development of individual components of a UDS parsing pipeline. In Part II of the thesis, we shift to modeling sequences of events, or scripts (Schank and Abelson, 1977). Chapter 7 presents a case study in script induction using a collection of restaurant narratives from an online blog to learn the canonical “Restaurant Script.” In Chapter 8, we introduce a simple discriminative neural model for script induction based on narrative chains (Chambers and Jurafsky, 2008) that outperforms prior methods. 
Because much existing work on narrative chains employs semantically impoverished representations of events, Chapter 9 draws on the contributions of Part I to learn narrative chains with semantically rich, decompositional event representations. Finally, in Chapter 10, we observe that corpus-based approaches to script induction resemble the task of language modeling. We explore the broader question of the relationship between language modeling and the acquisition of common-sense knowledge, and introduce an approach that combines language modeling and light human supervision to construct datasets for common-sense inference.
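
    For context on the narrative chain setting, the count-based baseline of Chambers and Jurafsky (2008) represents events as (verb, dependency) pairs sharing a protagonist, scores event pairs by pointwise mutual information (PMI), and fills a narrative cloze slot with the candidate whose summed PMI to the chain is highest. The sketch below implements that baseline on hypothetical toy chains. It is not the discriminative neural model of Chapter 8, and scoring unseen pairs as 0 instead of smoothing is a simplifying assumption.

```python
import math
from collections import Counter
from itertools import combinations


def train_counts(chains):
    """Count events and unordered event pairs that co-occur in a chain."""
    single, pair = Counter(), Counter()
    for chain in chains:
        events = sorted(set(chain))
        single.update(events)
        pair.update(frozenset(p) for p in combinations(events, 2))
    return single, pair


def pmi(e1, e2, single, pair):
    """Pointwise mutual information of two events; unseen pairs score 0 (assumption)."""
    joint = pair[frozenset((e1, e2))]
    if joint == 0:
        return 0.0
    p_joint = joint / sum(pair.values())
    total = sum(single.values())
    return math.log(p_joint / ((single[e1] / total) * (single[e2] / total)))


def narrative_cloze(chain, candidates, single, pair):
    """Return the candidate event with the highest summed PMI to the chain."""
    return max(candidates, key=lambda c: sum(pmi(c, e, single, pair) for e in chain))


# Hypothetical chains of (verb, dependency) events, one protagonist each.
chains = [
    [("enter", "subj"), ("order", "subj"), ("eat", "subj"), ("pay", "subj")],
    [("order", "subj"), ("eat", "subj"), ("pay", "subj"), ("leave", "subj")],
    [("arrest", "obj"), ("charge", "obj"), ("convict", "obj")],
    [("arrest", "obj"), ("charge", "obj"), ("release", "obj")],
]
single, pair = train_counts(chains)
guess = narrative_cloze(
    [("order", "subj"), ("eat", "subj")],
    [("pay", "subj"), ("convict", "obj")],
    single,
    pair,
)
print(guess)  # ('pay', 'subj')
```

    The restaurant-style events co-occur in the training chains, so "pay" receives a positive PMI with the partial chain, while "convict" never co-occurs with it.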

    An inquiry into the typical and atypical language development of young transnational multilingual children in an international school

    This PhD thesis investigates some of the unique characteristics of young transnational multilingual children aged five to eleven from high-socioeconomic-status families who are educated at an international school in Switzerland. Its purpose is to improve understanding of typical and atypical language development in this group. It draws on sociolinguistic research on language variation and exposure, and on clinical linguistic research on the identification of developmental language disorder and on cross-linguistic considerations. The specific aim of the pilot study presented in this thesis is to measure and discuss seven multilingual children’s verbal language abilities in each of their languages, and to measure their combined bilingual and multilingual verbal abilities. The study is therefore informed by language acquisition theories concerned with complex and dynamic systems, such as the Dynamic Model of Multilingualism. It also identifies any common characteristics, familial language practices or experiences of the pilot group of children. The thesis develops a methodological design that could be replicated on a much larger scale to confirm, extend or dispute the findings from the pilot group. The pilot findings suggest that multilingual children from high-income families who attend international schools have significantly above-average verbal language abilities when these abilities are evaluated as one total language system (multilingual ability), in stark contrast to the ‘average’ results they receive when each language is evaluated on its own. The thesis concludes that research on multilingual children that does not take into account the variables unique to this group may fail to recognise important factors that can impact their language development.

    'Strange and Absurd Words:' Translation as Ethics and Poetics in the Transcultural U.S. 1830-1915

    Laura E. Lauth, PhD, 2011. Directed by Professor Martha Nell Smith, Department of English. This dissertation documents the emergence of "foreignizing" translation and its influence on poetic practice in the transcultural United States between 1830 and 1915, a period critical to the development of free verse in English. The study also explores the extent to which poetry translation constitutes a genre of special relevance to the multilingual U.S. In Lawrence Venuti's formulation, foreignizing signals the difference of the source text by disrupting cultural codes and literary norms in the target language (Translator's Invisibility 15). The innovative and ethically charged translations recuperated here played a vital role in the development of "American poetry" by introducing heterodox authors, genres, and discourses into print. Despite nationalist and English-only tendencies in U.S. scholarship, the literature of the United States has always exceeded the bounds of a single language or nation. More than a mere byproduct of foreign dependency, the nineteenth-century proliferation of literary translations and non-English literatures reflected a profoundly multilingual "nation of nations." Accordingly, this study emphasizes both the transnational and the multicultural character of U.S. poetry. In tracing this often invisible tradition of foreign-bent translation, I offer five case studies spanning eighty years, two centuries, three continents, and numerous languages. 
From the influential debut of Bettina Brentano-von Arnim's self-translated Goethe's Correspondence with a Child (1838) to Henry Wadsworth Longfellow's comparativist translation anthology, Poets and Poetry of Europe (1845); from Judith Gautier's pioneering vers libre variations on the Classical Chinese (1867) to binational poet Stuart Merrill's free-verse Englishing of Gautier (1890); from Pound's heteroclite Medievalism (1905-1910) to the inaugural volume of Harriet Monroe's transnational magazine, Poetry (1912-1913), the translations considered here challenged "literary canons, professional standards, and ethical norms in the target language" (Venuti, "Strategies of Translation" 242). Taken together, these chapters offer a new transcultural perspective on modern literary translation and the development of free verse in English.

    OM-2017: Proceedings of the Twelfth International Workshop on Ontology Matching

    Ontology matching is a key interoperability enabler for the Semantic Web, as well as a useful tactic in some classical data integration tasks dealing with the semantic heterogeneity problem. It takes ontologies as input and determines as output an alignment, that is, a set of correspondences between the semantically related entities of those ontologies. These correspondences can be used for various tasks, such as ontology merging, data translation, query answering or navigation on the web of data. Thus, matching ontologies enables the knowledge and data expressed with the matched ontologies to interoperate.
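
    An alignment in this sense, a set of correspondences between entities, can be illustrated with a minimal string-based matcher. This is a generic baseline, not a system from the proceedings. Assumptions: each correspondence is a (source, target, relation, confidence) tuple; labels are compared with Python's `difflib.SequenceMatcher` after splitting camelCase and underscores; pairs are selected greedily one-to-one; and the 0.8 threshold and the entity labels are hypothetical.

```python
import re
from difflib import SequenceMatcher
from typing import NamedTuple


class Correspondence(NamedTuple):
    source: str
    target: str
    relation: str  # "=" for equivalence
    confidence: float


def normalize(label):
    """Split camelCase and underscores, then lowercase."""
    return re.sub(r"([a-z])([A-Z])", r"\1 \2", label).replace("_", " ").lower()


def match(source, target, threshold=0.8):
    """Greedy one-to-one label matcher; returns an alignment (list of correspondences)."""
    scored = sorted(
        (
            (SequenceMatcher(None, normalize(s), normalize(t)).ratio(), s, t)
            for s in source
            for t in target
        ),
        reverse=True,
    )
    used_source, used_target, alignment = set(), set(), []
    for score, s, t in scored:
        if score < threshold:
            break  # remaining pairs score even lower
        if s in used_source or t in used_target:
            continue  # keep the alignment one-to-one
        used_source.add(s)
        used_target.add(t)
        alignment.append(Correspondence(s, t, "=", round(score, 2)))
    return alignment


# Hypothetical entity labels from two ontologies.
for c in match(["Person", "PaperTitle", "Conference"], ["person", "paper_title", "Event"]):
    print(c)
```

    Real matchers typically combine such lexical evidence with structural and semantic signals; "Conference" and "Event" stay unmatched here precisely because only string similarity is used.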

    Uses of English as a Lingua Franca in Domain-specific Contexts of Intercultural Communication

    This special issue of Lingue e Linguaggi collects the contributions presented at the International Conference Uses of English as a Lingua Franca in Domain-Specific Contexts of Intercultural Communication, held at the University of Salento, Italy, in December 2019. The Conference concluded a PRIN Project co-funded by the Italian Ministry of University and Research, which started from the assumption that ELF is an area needing more principled, systematic investigation, since it has so far not been recognized as a use of English independent of English as a Native Language. The chapters of this special issue concern ELF variations employed in: (a) institutional, professional, and ‘undeclared’ migration settings (UniSalento Unit); (b) digital media used for global communication (UniVerona Unit); and (c) the multicultural and multilingual classrooms characterizing contemporary western societies (UniRoma Tre Unit). The contributions enquire into ELF uses in domain-specific discourses, showing the extent to which English is appropriated by non-native speakers who do not experience it as an alien ‘foreign’ language, but rather as a ‘lingua franca’ through which they feel free to convey their own native linguacultural and experiential uses and narratives, rhetorical and specialized repertoires and, ultimately, their own socio-cultural identities. The contributors’ research provides evidence supporting the view that people from different linguacultural backgrounds appropriate English by drawing on their own native semantic, syntactic and pragmatic codes to convey their communicative needs.

    Simple identification tools in FishBase

    Simple identification tools for fish species have been part of the FishBase information system since its inception. Early tools made use of the relational model and of characters such as fin ray meristics. Pictures and drawings were soon added as a further aid, similar to a field guide. Later came the computerization of existing dichotomous keys, again combined with pictures and other information, and the ability to restrict the possible species by country, area, or taxonomic group. Today, www.FishBase.org offers four different ways to identify species. This paper describes these tools with their advantages and disadvantages, and suggests various options for further development. It explores the possibility of a holistic and integrated computer-aided strategy.
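
    A computerized dichotomous key combined with a restriction by country, two of the tools described above, might be sketched as follows. FishBase's own implementation is not described in the abstract, so this is only an illustration: the characters are generic, and the taxa ("Species A" to "Species C"), the countries and the `identify` and `species_in` helpers are placeholders, not FishBase data or API.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Union


@dataclass
class Couplet:
    """One step of a dichotomous key: a yes/no character question."""
    question: str
    yes: "Node"
    no: "Node"


Node = Union[Couplet, str]  # a leaf is a species name


def identify(node: Node, answers: Dict[str, bool],
             allowed: Optional[Set[str]] = None) -> Optional[str]:
    """Walk the key using the observer's answers; optionally restrict the
    result to species recorded in a chosen area."""
    while isinstance(node, Couplet):
        node = node.yes if answers[node.question] else node.no
    if allowed is not None and node not in allowed:
        return None  # the key's answer is not recorded for the chosen area
    return node


def species_in(records: Dict[str, Set[str]], country: str) -> Set[str]:
    """Species whose occurrence records include the given country."""
    return {name for name, countries in records.items() if country in countries}


# Placeholder key, taxa and occurrence records (not real FishBase data).
KEY = Couplet(
    "Adipose fin present?",
    yes=Couplet("Two dorsal fins?", yes="Species A", no="Species B"),
    no="Species C",
)
RECORDS = {"Species A": {"Country X"}, "Species B": {"Country Y"}, "Species C": {"Country X"}}

answers = {"Adipose fin present?": True, "Two dorsal fins?": False}
print(identify(KEY, answers))  # Species B
print(identify(KEY, answers, allowed=species_in(RECORDS, "Country X")))  # None
```

    Returning None when the key's answer is not recorded for the chosen country flags a possible misidentification or a new record, rather than silently forcing a match.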