19 research outputs found

    Ranking and Retrieval under Semantic Relevance

    This thesis presents a series of conceptual and empirical developments on the ranking and retrieval of candidates under semantic relevance. Part I of the thesis introduces the concept of uncertainty in various semantic tasks (such as recognizing textual entailment) in natural language processing, and the machine learning techniques commonly employed to model these semantic phenomena. A unified view of ranking and retrieval is presented, and the trade-off between model expressiveness, performance, and scalability in model design is discussed. Part II of the thesis focuses on applying these ranking and retrieval techniques to text: Chapter 3 examines the feasibility of ranking hypotheses given a premise with respect to a human's subjective probability of the hypothesis happening, effectively extending the traditional categorical task of natural language inference. Chapter 4 focuses on detecting situation frames for documents using ranking methods. Then we extend the ranking notion to retrieval, and develop both sparse (Chapter 5) and dense (Chapter 6) vector-based methods to facilitate scalable retrieval for potential answer paragraphs in question answering. Part III turns the focus to mentions and entities in text, while continuing the theme of ranking and retrieval: Chapter 7 discusses the ranking of fine-grained types that an entity mention could belong to, leading to state-of-the-art performance on hierarchical multi-label fine-grained entity typing. Chapter 8 extends the semantic relation of coreference to a cross-document setting, enabling models to retrieve from a large corpus, instead of from a single document, when resolving coreferent entity mentions.

    Reality in Perspectives

    This dissertation is about human knowledge of reality. In particular, it argues that scientific knowledge is bounded by historically available instruments and theories; nevertheless, the use of several independent instruments and theories can provide access to the persistent potentialities of reality. The replicability of scientific observations and experiments allows us to obtain explorable evidence of robust entities and properties. The dissertation includes seven chapters. It also studies three cases, namely Higgs bosons and hypothetical Ϝ-particles (section 2.4), the Ptolemaic and Kepler models of the planets (section 6.7), and the special theory of relativity (chapter 7). Chapter 1 is the introduction of the dissertation. Chapter 2 clarifies the notion of the real on the basis of two concepts: persistence and resistance. These concepts enable me to explain my ontological belief in the real potentialities of human-independent things and the implications of this view for the perceptual and epistemological levels of discussion. On the basis of the concept of “overlapping perspectives”, chapter 3 argues that entity realism and perspectivism are complementary. That is, an entity that manifests itself through several experimental/observational methods is something real, but our knowledge of its nature is perspectival. Critically studying the recent views of entity realism, chapter 4 extends the discussion of entity realism and provides a criterion for the reality of property tokens. Chapter 5, in contrast, develops the perspectival aspects of my view on the basis of the phenomenological-hermeneutical approaches to the philosophy of science. This chapter also elaborates my view of empirical evidence, as briefly expressed in sections 2.5 and 4.5. Chapter 6 concerns diachronic theoretical perspectives. It first explains my view of progress, according to which current perspectives are broader than past ones. Second, it argues that the successful explanations and predictions of abandoned theories can be accounted for from our currently acceptable perspectives. The case study of Ptolemaic astronomy supports the argument of this chapter. Chapter 7 serves as the conclusion of the dissertation by applying the central themes of the previous chapters to the case study of special relativity theory. I interpret frame-dependent properties, such as length and time duration, and the constancy of the speed of light according to realist perspectivism.

    Sensory Representation and Cognitive Architecture: An alternative to phenomenal concepts

    We present a cognitive-physicalist account of phenomenal consciousness. We argue that phenomenal concepts do not differ from other types of concepts. When explaining the peculiarities of conscious experience, the right place to look is sensory/perceptual representations and their interaction with general conceptual structures. We utilize Jerry Fodor’s psychosemantic theory to formulate our view. We compare and contrast our view with that of Murat Aydede and Güven Güzeldere, who, using Dretskean psychosemantic theory, arrived at a solution different from ours in some ways. We suggest that the representational atomism of certain sensory experiences plays a central role in reconstructing the epistemic gap associated with conscious experience; still, atomism is not the whole story. It needs to be supplemented by some additional principles. We also add an account of introspection, and suggest some cognitive features that might distinguish representational atoms with phenomenal character from those without it.

    Grounding event references in news

    Events are frequently discussed in natural language, and their accurate identification is central to language understanding. Yet they are diverse and complex in ontology and reference; computational processing hence proves challenging. News provides a shared basis for communication by reporting events. We perform several studies into news event reference. One annotation study characterises each news report in terms of its update and topic events, but finds that topic is better considered through explicit references to background events. In this context, we propose the event linking task which, analogous to named entity linking or disambiguation, models the grounding of references to notable events. It defines the disambiguation of an event reference as a link to the archival article that first reports it. When two references are linked to the same article, they need not be references to the same event. Event linking hopes to provide an intuitive approximation to coreference, erring on the side of over-generation in contrast with the literature. The task is also distinguished in considering event references from multiple perspectives over time. We diagnostically evaluate the task by first linking references to past, newsworthy events in news and opinion pieces to an archive of the Sydney Morning Herald. The intensive annotation results in only a small corpus of 229 distinct links. However, we observe that a number of hyperlinks targeting online news correspond to event links. We thus acquire two large corpora of hyperlinks at very low cost. From these we learn weights for temporal and term overlap features in a retrieval system. These noisy data lead to significant performance gains over a bag-of-words baseline. While our initial system can accurately predict many event links, most will require deep linguistic processing for their disambiguation.

    Anaphora resolution for Arabic machine translation: a case study of nafs

    PhD Thesis
    In the age of the internet, email, and social media there is an increasing need for processing online information, for example, to support education and business. This has led to the rapid development of natural language processing technologies such as computational linguistics, information retrieval, and data mining. As a branch of computational linguistics, anaphora resolution has attracted much interest. This is reflected in the large number of papers on the topic published in journals such as Computational Linguistics. Mitkov (2002) and Ji et al. (2005) have argued that the overall quality of anaphora resolution systems remains low, despite practical advances in the area, and that major challenges include dealing with real-world knowledge and accurate parsing. This thesis investigates the following research question: can an algorithm be found for the resolution of the anaphor nafs in Arabic text which is accurate to at least 90%, scales linearly with text size, and requires a minimum of knowledge resources? A resolution algorithm intended to satisfy these criteria is proposed. Testing on a corpus of contemporary Arabic shows that it does indeed satisfy the criteria.
    Egyptian Government

    A Survey on Semantic Processing Techniques

    Semantic processing is a fundamental research domain in computational linguistics. In the era of powerful pre-trained language models and large language models, the advancement of research in this domain appears to be decelerating. However, the study of semantics is multi-dimensional in linguistics. The research depth and breadth of computational semantic processing can be largely improved with new technologies. In this survey, we analyze five semantic processing tasks, namely word sense disambiguation, anaphora resolution, named entity recognition, concept extraction, and subjectivity detection. We study relevant theoretical research in these fields, advanced methods, and downstream applications. We connect the surveyed tasks with downstream applications because this may inspire future scholars to fuse these low-level semantic processing tasks with high-level natural language processing tasks. The review of theoretical research may also inspire new tasks and technologies in the semantic processing domain. Finally, we compare the different semantic processing techniques and summarize their technical trends, application trends, and future directions.
    Comment: Published at Information Fusion, Volume 101, 2024, 101988, ISSN 1566-2535. The equal contribution mark is missing from the published version due to the publication policies. Please contact Prof. Erik Cambria for details.

    Unsupervised Induction of Frame-Based Linguistic Forms

    This thesis studies the use of bulk, structured, linguistic annotations in order to perform unsupervised induction of meaning for three kinds of linguistic forms: words, sentences, and documents. The primary linguistic annotations I consider throughout this thesis are frames, which encode core linguistic, background or societal knowledge necessary to understand abstract concepts and real-world situations. I begin with an overview of linguistically-based structured meaning representation; I then analyze available large-scale natural language processing (NLP) and linguistic resources and corpora for their abilities to accommodate bulk, automatically-obtained frame annotations. I then proceed to induce meanings of the different forms, progressing from the word level, to the sentence level, and finally to the document level. I first show how to use these bulk annotations in order to better encode semantic expectations, backed by linguistics and cognitive science, within word forms. I then demonstrate a straightforward approach for learning large lexicalized and refined syntactic fragments, which encode and memoize commonly used phrases and linguistic constructions. Next, I consider two unsupervised models for document and discourse understanding; one is a purely generative approach that naturally accommodates layer annotations and is the first to capture and unify a complete frame hierarchy. The other conditions on limited amounts of external annotations, imputing missing values when necessary, and can more readily scale to large corpora. These discourse models help improve document understanding and type-level understanding.

    On Coreferring Text-extracted Event Descriptions with the aid of Ontological Reasoning

    Systems for automatic extraction of semantic information about events from large textual resources are now available: these tools are capable of generating RDF datasets about text-extracted events, and this knowledge can be used to reason over the recognized events. On the other hand, text-based tasks for event recognition, such as event coreference (i.e. recognizing whether two textual descriptions refer to the same event), do not take into account ontological information about the extracted events in their process. In this paper, we propose a method to derive event coreference on text-extracted event data using semantic-based rule reasoning. We demonstrate our method considering a limited (yet representative) set of event types: we introduce a formal analysis of their ontological properties and, on the basis of this, we define a set of coreference criteria. We then implement these criteria as RDF-based reasoning rules to be applied to text-extracted event data. We evaluate the effectiveness of our approach over a standard coreference benchmark dataset.
