282 research outputs found

    Mining Large-scale Event Knowledge from Web Text

    This paper addresses the problem of automatically acquiring semantic relations between events. While previous work on the automatic acquisition of semantic relations relied on annotated text corpora, it is still unclear how to develop more generic methods for identifying related event pairs and extracting event arguments (especially the predicate, subject and object). Motivated by this limitation, we develop a three-phase approach that acquires causality from Web text. First, we use explicit connective markers (such as “because”) as linguistic cues to discover causally related events. Next, we extract the event arguments based on local dependency parse trees of event expressions. Finally, we propose a statistical model to measure the potential causal relations. The results of our empirical evaluations on a large-scale Web text corpus show that (a) the use of local dependency trees substantially improves both the accuracy and recall of the event-argument extraction task, and (b) our measure improves on the traditional PMI method
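The abstract above names pointwise mutual information (PMI) as the traditional baseline for scoring cause-effect pairs found through connectives such as "because". The following is a minimal sketch of that baseline only, not the statistical model the paper proposes. The function names `split_on_because` and `pmi_scores` and the toy event labels are assumptions made for illustration, and a real system would extract events from dependency parses rather than raw clause strings.

```python
import math
from collections import Counter


def split_on_because(sentence):
    """Return (effect, cause) clause strings, or None if 'because' is absent.

    Hypothetical helper: a naive string split standing in for the
    connective-based discovery step described in the abstract.
    """
    lowered = sentence.lower().rstrip(".")
    if " because " not in lowered:
        return None
    effect, cause = lowered.split(" because ", 1)
    return effect.strip(), cause.strip()


def pmi_scores(pairs):
    """Compute PMI (log base 2) for each observed (cause, effect) pair.

    PMI(c, e) = log2( P(c, e) / (P(c) * P(e)) ), with probabilities
    estimated from relative frequencies in the list of pairs.
    """
    total = len(pairs)
    pair_counts = Counter(pairs)
    cause_counts = Counter(cause for cause, _ in pairs)
    effect_counts = Counter(effect for _, effect in pairs)
    scores = {}
    for (cause, effect), n in pair_counts.items():
        joint = n / total
        independent = (cause_counts[cause] / total) * (effect_counts[effect] / total)
        scores[(cause, effect)] = math.log2(joint / independent)
    return scores


# Toy (cause, effect) observations, invented for illustration.
observed = [
    ("rain", "flood"),
    ("rain", "flood"),
    ("rain", "traffic"),
    ("sun", "drought"),
]
scores = pmi_scores(observed)
```

A pair that always co-occurs and is otherwise rare, such as `("sun", "drought")` in the toy data, receives the highest score. This bias toward rare pairs is a known weakness of plain PMI.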

    A contrastive systemic functional analysis of causality in Japanese and English academic articles

    Typological differences between languages have been a much debated topic in linguistic studies. Despite their usefulness in understanding syntactic features of various languages, such contrastive analyses have yet to thoroughly explore semantic variation among languages; furthermore, the results obtained have not been practically utilized in other areas of applied linguistics. This situation may come from the fact that a large number of contrastive studies have eclectically examined isolated areas of language variation from syntactic, morphological, or pragmatic perspectives. Viewing this issue from another angle, Systemic Functional Linguistics (SFL) focuses on language from a multi-dimensional perspective, where language is a realization of interpersonal, textual, and social contextual factors. In recent years, SFL has demonstrated its applicability to neglected areas in applied linguistics such as translation studies and foreign language pedagogy. In line with current SFL research into the language of various text types or genres, the purpose of this study is to investigate the ways in which the concept of causality is realized in syntactically distinct patterns and how such syntactic variations serve different discourse functions in Japanese and English academic articles. From the various realizations of causality, this thesis focuses on explicit logical and ideational causality and its lexicogrammatical realizational patterns and functions as used in published journal articles on second language acquisition. This study indicates that, contrary to the current claim about the function of causality-oriented grammatical metaphors (Halliday and Matthiessen, 1999), causality and its realizational patterns are language-specific phenomena

    Toward a Discourse Theory for Annotating Causal Relations in Japanese

    We present a revised discourse theory based on segmented discourse representation theory and provide a method for building a Japanese corpus suitable for causal relation extraction. This extends and refines the framework proposed in Kaneko and Bekki (2014), and we evaluate our corpus and compare it with that work.

    The encoding of irrelevance in discourse: tanto between concession and justification

    This paper sets out to investigate the linguistic expression of irrelevance in discourse, by focusing on the functions and uses of the marker tanto in present-day Italian. The marker encodes the irrelevance of a given condition, thus conveying a concessive meaning not distant from the value expressed by concessive conditionals or ‘no matter’ predicates. In the construction [p, tanto q], the marker tanto conveys the fact that q holds in any case, namely whether p, non-p, or any value of p is the case. After discussing how the notion of irrelevance has been treated in the literature on conditionals, concessives and predicates of indifference, we will discuss the results of a corpus-based study on spoken Italian, identifying and annotating all the occurrences of irrelevance-tanto. We will show that irrelevance may be expressed by a number of different discourse patterns, explicitly mentioning the irrelevant proposition, the null-effect and the motivation for irrelevance, or omitting one or more of these components. It will be argued that the cases in which speakers simply refer to the irrelevance of a given proposition are rare in our sample, whereas it is more frequent that they also mention the motivation underlying irrelevance, justifying their indifference and thus crucially acting at the intersubjective level. We will show that tanto may also be used alone as a discourse marker encoding the speaker’s attitude of indifference: in these cases, tanto is pronounced with suspensive intonation, and subsumes under its semantics the adversative, justificative, and indiscriminative value, activating meanings that speakers are supposed to share

    The Semantics and Acquisition of Time in Language

    This dissertation is about the structure of temporal semantics and children’s acquisition of temporal language. It argues for the importance of investigating semantics both at the abstract level of linguistic structures and at the concrete level of the time-course of acquisition, as these two levels provide natural constraints for each other. With respect to semantics, it provides a computationally inspired analysis of tense, grammatical aspect and lexical aspect that uses finite state automata to dynamically calculate the progress of an event over a time interval. It is shown that the analysis can account for many well-known temporal phenomena, such as the different entailments of telic and atelic predicates in the imperfective aspect (the imperfective paradox), and the various unified and serial interpretations of sentences involving a cardinally quantified phrase, such as Three Ringlings visited Florida. With respect to children’s acquisition of temporal language, the dissertation investigates the Aspect First hypothesis which states that children initially use tense and grammatical aspect morphology to mark the lexical aspect property of telicity. Two forced-choice comprehension experiments were conducted with children aged 2.5 to 5 years old to test children’s understanding of tense and grammatical aspect morphology; in a control condition, open class cues were used to test children’s conceptual competence with tense and grammatical aspect information independently of their competence with the relevant morphology (e.g., in the middle of and in a few seconds were the open class cues for imperfective aspect and future tense, respectively). Results showed that even the youngest children understood the concepts underlying tense and grammatical aspect as measured by their performance with the open class cues but they did not demonstrate adult competence with the closed class morphology for grammatical aspect and did so only marginally for tense. 
Comprehension of tense morphology preceded that of grammatical aspect morphology and in particular, children showed an early facility with markers of the future tense

    Unsupervised extraction of semantic relations using discourse information

    Natural language understanding often relies on common-sense reasoning, for which knowledge about semantic relations, especially between verbal predicates, may be required. This thesis addresses the challenge of using a distributional method to automatically extract the necessary semantic information for common-sense inference. Typical associations between pairs of predicates and a targeted set of semantic relations (causal, temporal, similarity, opposition, part/whole) are extracted from large corpora, by exploiting the presence of discourse connectives which typically signal these semantic relations. In order to appraise these associations, we provide several significance measures inspired by the literature as well as a novel measure specifically designed to evaluate the strength of the link between the two predicates and the relation. 
The relevance of these measures is evaluated by computing their correlations with human judgments, based on a sample of verb pairs annotated in context. The application of this methodology to French and English corpora leads to the construction of a freely available resource, Lecsie (Linked Events Collection for Semantic Information Extraction), which consists of triples: pairs of event predicates associated with a relation; each triple is assigned significance scores based on our measures. From this resource, vector-based representations of pairs of predicates can be induced and used as lexical semantic features to build models for external applications. We assess the potential of these representations for several applications. Regarding discourse analysis, the tasks of predicting attachment of discourse units, as well as predicting the specific discourse relation linking them, are investigated. Using only features from our resource, we obtain significant improvements for both tasks in comparison to several baselines, including ones using other representations of the pairs of predicates. We also propose to define optimal sets of connectives better suited for large corpus applications by performing a dimension reduction in the space of the connectives, instead of using manually composed groups of connectives corresponding to predefined relations. Another promising application pursued in this thesis concerns relations between semantic frames (e.g. FrameNet): the resource can be used to enrich this sparse structure by providing candidate relations between verbal frames, based on associations between their verbs. These diverse applications aim to demonstrate the promising contributions provided by our approach, namely allowing the unsupervised extraction of typed semantic relations

    Aspects of Linguistic Variation

    This volume brings together papers on linguistic variation. It takes a broad perspective, covering not only crosslinguistic and diachronic but also intralinguistic and interspeaker variation, and examines phenomena ranging from negation and TAM through connectives and the lexicon to definite articles and comparative concepts in well- and lesser-known languages. The collection thus contributes to our understanding of variation in general

    Tagging kausaler Relationen

    Causal relations between events, and explanatory relations among these events in which causal relations play an important role, are the main topic of the present diploma thesis. Temporal relations between events have recently received more attention, both because they are easier to formalize and because they play a visible role in grammar (notably tense and aspect, as well as temporal conjunctions). I will argue that causal relations and the explanations they provide play the greater role in the coherence of a text. In contrast to “deep” approaches that rely on a fine-grained semantic representation of the text and are consequently, in my view, unsuitable for unrestricted text, I will investigate how to reach this goal without requiring an expensive hand-coded knowledge base

    Use of the Knowledge-Based System LOG-IDEAH to Assess Failure Modes of Masonry Buildings, Damaged by L'Aquila Earthquake in 2009

    This article first discusses the decision-making process typically used by trained engineers to assess failure modes of masonry buildings, and then presents the rule-based model required to build a knowledge-based system for post-earthquake damage assessment. The acquisition of the engineering knowledge and the implementation of the rule-based model led to the development of the knowledge-based system LOG-IDEAH (Logic trees for Identification of Damage due to Earthquakes for Architectural Heritage), a web-based tool which assesses failure modes of masonry buildings by interpreting both crack pattern and damage severity, recorded on site by visual inspection. Assuming that failure modes detected by trained engineers for a sample of buildings are the correct ones, these are used to validate the predictions made by LOG-IDEAH. The prediction robustness of the proposed system is assessed by computing Precision and Recall measures for failure modes predicted for a set of buildings selected in the city center of L’Aquila (Italy), damaged by an earthquake in 2009. To provide an independent means of verification for LOG-IDEAH, random generations of outputs are created to obtain baselines of failure modes for the same case study. For the baseline output to be compatible and consistent with the observations on site, failure modes are randomly generated with the same probability of occurrence as observed for the building samples inspected in the city center of L’Aquila. The comparison between Precision and Recall measures, calculated on the output provided by LOG-IDEAH and on the output of the random generations, shows that the proposed knowledge-based system has a high ability to predict failure modes of masonry buildings, and has the potential to support surveyors in post-earthquake assessments
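The validation scheme in the abstract above (per-class Precision and Recall against engineer-assigned labels, compared with a random baseline drawn at the observed class frequencies) can be sketched as follows. This is an illustrative sketch and not LOG-IDEAH's implementation. The function names, the failure-mode labels `"overturning"` and `"shear"`, and the toy data are all assumptions.

```python
import random
from collections import Counter


def precision_recall(predicted, actual, label):
    """Precision and recall for one class, treating `actual` as ground truth."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == label and a == label for p, a in pairs)
    fp = sum(p == label and a != label for p, a in pairs)
    fn = sum(p != label and a == label for p, a in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def random_baseline(actual, rng):
    """Draw one random label per building, weighted by observed frequencies."""
    counts = Counter(actual)
    labels = list(counts)
    weights = [counts[label] for label in labels]
    return rng.choices(labels, weights=weights, k=len(actual))


# Toy data: labels assigned by engineers versus labels predicted by a system.
actual = ["overturning", "overturning", "shear", "shear"]
predicted = ["overturning", "shear", "shear", "shear"]

system_scores = precision_recall(predicted, actual, "shear")

# Seeded generator so the baseline draw is reproducible.
baseline = random_baseline(actual, random.Random(0))
baseline_scores = precision_recall(baseline, actual, "shear")
```

In practice the baseline would be averaged over many random draws before comparing it with the system's scores, since a single draw is noisy on a small sample.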

    Causality Management and Analysis in Requirement Manuscript for Software Designs

    For software design tasks involving natural language, the results of a causal investigation provide valuable and robust semantic information, especially for identifying key variables during product (software) design and product optimization. As the interest in analytical data science shifts from correlations to a better understanding of causality, there is an equal need to extract causality accurately from textual artifacts to aid requirement engineering (RE) decisions. This thesis focuses on identifying, extracting, and classifying causal phrases using word and sentence labeling based on the Bidirectional Encoder Representations from Transformers (BERT) deep learning language model and five machine learning models. The aim is to understand the form and degree of causality based on their impact and prevalence in RE practice. Methodologically, our analysis is centered around RE practice, and we considered 12,438 sentences extracted from 50 requirement engineering manuscripts (REM) for training our machine models. Our research reports that causal expressions constitute about 32% of sentences from REM. We applied four evaluation metrics, namely recall, accuracy, precision, and F1, to assess our machine models’ performance and to ensure the results’ conformity with our study goal. Further, we computed the highest model accuracy to be 85%, attributed to Naive Bayes. Finally, we noted that our causal analytic framework is relevant to practitioners for different functionalities, such as generating test cases for requirement engineers and software developers and product performance auditing for management stakeholders