305 research outputs found

    Predicate Matrix: an interoperable lexical knowledge base for predicates

    183 p. The Predicate Matrix is a new lexical-semantic resource resulting from the integration of multiple knowledge sources, among them FrameNet, VerbNet, PropBank, and WordNet. The Predicate Matrix provides an extensive and robust lexicon that improves interoperability among these semantic resources. Its creation is based on the integration of Semlink and new mappings obtained with automatic methods that link semantic knowledge at the lexical and role levels. We have also extended the Predicate Matrix to cover nominal predicates (English, Spanish) and predicates in other languages (Spanish, Catalan, and Basque). As a result, the Predicate Matrix provides a multilingual lexicon that enables interoperable semantic analysis across multiple languages

    Workshop Proceedings of the 12th edition of the KONVENS conference

    The 2014 issue of KONVENS is even more of a forum for exchange: its main topic is the interaction between Computational Linguistics and Information Science, and the synergies that such interaction, cooperation, and integrated views can produce. This topic, at the crossroads of different research traditions that deal with natural language as a container of knowledge and with methods to extract and manage linguistically represented knowledge, is close to the heart of many researchers at the Institut für Informationswissenschaft und Sprachtechnologie of Universität Hildesheim: it has long been one of the institute’s research topics, and it has received even more attention over the last few years

    Entity Linking in Low-Annotation Data Settings

    Recent advances in natural language processing have focused on applying and adapting large pretrained language models to specific tasks. These models, such as BERT (Devlin et al., 2019) and BART (Lewis et al., 2020a), are pretrained on massive amounts of unlabeled text across a variety of domains. The impact of these pretrained models is visible in the task of entity linking, where a mention of an entity in unstructured text is matched to the relevant entry in a knowledge base. State-of-the-art linkers, such as Wu et al. (2020) and De Cao et al. (2021), leverage pretrained models as a foundation for their systems. However, these models are also trained on large amounts of annotated data, which is crucial to their performance. Often these large datasets consist of domains that are easily annotated, such as Wikipedia or newswire text. However, tailoring NLP tools to a narrow variety of textual domains severely restricts their use in the real world. Many other domains, such as medicine or law, do not have large amounts of entity linking annotations available. Entity linking, which serves to bridge the gap between massive unstructured amounts of text and structured repositories of knowledge, is equally crucial in these domains. Yet tools trained on newswire or Wikipedia annotations are unlikely to be well-suited for identifying medical conditions mentioned in clinical notes. As most annotation efforts focus on English, similar challenges can be noted in building systems for non-English text. There is often a relatively small amount of annotated data in these domains. With this being the case, looking to other types of domain-specific data, such as unannotated text or highly-curated structured knowledge bases, is often required. In these settings, it is crucial to translate lessons taken from tools tailored for high-annotation domains into algorithms that are suited for low-annotation domains. 
This requires both leveraging broader types of data and understanding the unique challenges present in each domain

    Aspect and Meaning in the Russian Future Tense: Corpus and Experimental Investigations

    This dissertation is a study of the Russian future tense within the framework of cognitive linguistics. I focus on the distribution of the perfective and imperfective future forms, their future and non-future meanings, and the use of future tense verb forms by both native and non-native speakers. In the Russian tense-aspect system, it is reasonable to operate with markedness on a local level of tense, rather than the level of the verb. Via local markedness it is possible to see that the perfective future is the unmarked member of the opposition, and the imperfective future is the marked one. The perfective future tense forms are approximately fourteen times more frequent than imperfective future tense forms in the Russian National Corpus. Both perfective and imperfective future tense forms express not only future meanings but also gnomic, directive, and other meanings. The (non-)future meanings form a radial category with the future meaning as a prototype and other meanings as extensions. Native speakers operate with frequency when they use future tense forms. Non-native speakers are not sensitive to frequency, and instruction in the use of the future tense forms in Russian could be improved

    Methods for improving entity linking and exploiting social media messages across crises

    Entity Linking (EL) is the task of automatically identifying entity mentions in texts and resolving them to a corresponding entity in a reference knowledge base (KB). A large number of tools are available for different types of documents and domains; however, the entity linking literature has shown that the quality of a tool varies across corpora and depends on specific characteristics of the corpus it is applied to. Moreover, the lack of precision on particularly ambiguous mentions often spoils the usefulness of automated disambiguation results in real-world applications. In the first part of this thesis I explore an approximation of the difficulty of linking entity mentions and frame it as a supervised classification task. Classifying difficult-to-disambiguate entity mentions can facilitate identifying critical cases as part of a semi-automated system, while detecting latent corpus characteristics that affect entity linking performance. Moreover, despite the large number of entity linking tools that have been proposed in recent years, some tools work better on short mentions while others perform better when there is more contextual information. To this end, I proposed a solution that exploits results from distinct entity linking tools on the same corpus by leveraging their individual strengths on a per-mention basis. The proposed solution proved effective and outperformed the individual entity linking systems employed in a series of experiments. An important component in the majority of entity linking tools is the probability that a mention links to one entity in a reference knowledge base, and this probability is usually computed over a static snapshot of a reference KB. However, an entity’s popularity is temporally sensitive and may change due to short-term events. Moreover, these changes might then be reflected in a KB, and EL tools can produce different results for a given mention at different times. 
I investigated how the prior probability changes over time and the overall disambiguation performance using different KBs from different time periods. The second part of this thesis is mainly concerned with short texts. Social media has become an integral part of modern society. Twitter, for instance, is one of the most popular social media platforms around the world, enabling people to share their opinions and post short messages about any subject on a daily basis. First, I presented an approach to identifying informative messages during catastrophic events using deep learning techniques. Automatically detecting informative messages posted by users during major events can enable professionals involved in crisis management to better estimate damages using only relevant information posted on social media channels, as well as to act immediately. I have also performed an analysis of Twitter messages posted during the Covid-19 pandemic. Initially I collected 4 million tweets posted in Portuguese since the beginning of the pandemic and provided an analysis of the debate around the pandemic. I used topic modeling, sentiment analysis, and hashtag recommendation techniques to provide insights into the online discussion of the Covid-19 pandemic

    Challenges and perspectives of hate speech research

    This book is the result of a conference that could not take place. It is a collection of 26 texts that address and discuss the latest developments in international hate speech research from a wide range of disciplinary perspectives. This includes case studies from Brazil, Lebanon, Poland, Nigeria, and India, theoretical introductions to the concepts of hate speech, dangerous speech, incivility, toxicity, extreme speech, and dark participation, as well as reflections on methodological challenges such as scraping, annotation, datafication, implicity, explainability, and machine learning. As such, it provides a much-needed forum for cross-national and cross-disciplinary conversations in what is currently a very vibrant field of research

    The universe without us: a history of the science and ethics of human extinction

    This dissertation consists of two parts. Part I is an intellectual history of thinking about human extinction (mostly) within the Western tradition. When did our forebears first imagine humanity ceasing to exist? Have people always believed that human extinction is a real possibility, or were some convinced that this could never happen? How has our thinking about extinction evolved over time? Why do so many notable figures today believe that the probability of extinction this century is higher than ever before in our 300,000-year history on Earth? Exploring these questions takes readers from the ancient Greeks, Persians, and Egyptians, through the 18th-century Enlightenment, past scientific breakthroughs of the 19th century like thermodynamics and evolutionary theory, up to the Atomic Age, the rise of modern environmentalism in the 1970s, and contemporary fears about climate change, global pandemics, and artificial general intelligence (AGI). Part II is a history of Western thinking about the ethical and evaluative implications of human extinction. Would causing or allowing our extinction be morally right or wrong? Would our extinction be good or bad, better or worse compared to continuing to exist? For what reasons? Under which conditions? Do we have a moral obligation to create future people? Would past “progress” be rendered meaningless if humanity were to die out? Does the fact that we might be unique in the universe—the only “rational” and “moral” creatures—give us extra reason to ensure our survival? 
I place these questions under the umbrella of Existential Ethics, tracing the development of this field from the early 1700s through Mary Shelley’s 1826 novel The Last Man, the gloomy German pessimists of the latter 19th century, and post-World War II reflections on nuclear “omnicide,” up to current-day thinkers associated with “longtermism” and “antinatalism.” In the dissertation, I call the first history “History #1” and the second “History #2.” A main thesis of Part I is that Western thinking about human extinction can be segmented into five distinct periods, each of which corresponds to a unique “existential mood.” An existential mood arises from a particular set of answers to fundamental questions about the possibility, probability, etiology, and so on, of human extinction. I claim that the idea of human extinction first appeared among the ancient Greeks, but was eclipsed for roughly 1,500 years with the rise of Christianity. A central contention of Part II is that philosophers have thus far conflated six distinct types of “human extinction,” each of which has its own unique ethical and evaluative implications. I further contend that it is crucial to distinguish between the process or event of Going Extinct and the state or condition of Being Extinct, which one should see as orthogonal to the six types of extinction that I delineate. My aim with the second part of the book is not only to trace the history of Western thinking about the ethics of annihilation, but also to lay the theoretical groundwork for future research on the topic. I then outline my own views within “Existential Ethics,” which combine ideas and positions to yield a novel account of the conditions under which our extinction would be bad, and why there is a sense in which Being Extinct might be better than Being Extant, or continuing to exist