9 research outputs found

    Distant Co-occurrence Patterns of Connectives: a Corpus Study of Formulaicity in Japanese

    Using corpus research methods, this study aims to establish whether there are two-item and, more generally, multi-item distant co-occurrence patterns of connectives in written Japanese, and to clarify the role these patterns play in discourse. The study is based on a hybrid corpus of written Japanese that includes humanities and social science papers, science and technology papers, and general written language data. A pair of connectives was counted as co-occurring when its co-occurrence frequency exceeded 10, its PMI (pointwise mutual information) value exceeded 2, and its Dice coefficient exceeded 0.01. The distribution of the observed co-occurring pairs differed by genre. Visualization of the connectivity potential of co-occurring pairs as directed graphs showed that these pairs form longer co-occurrence chains, which can be interpreted as ready-made co-occurrence patterns. Two-item and multi-item co-occurrence patterns are considered a type of Bourdieu’s habitus and contribute to both discourse development and discourse prediction.
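
    The three thresholds in this abstract can be illustrated with a minimal sketch. It assumes the standard textbook definitions of PMI (with a base-2 logarithm) and the Dice coefficient; the abstract does not state the exact formulas or the counting unit used in the study, so the function names, the `total` parameter, and the example counts below are hypothetical.

```python
import math


def pmi(pair_count, count_a, count_b, total):
    """Pointwise mutual information (log base 2) of a pair of connectives.

    Assumption: `total` is the number of counting units (e.g. texts or
    windows) in the corpus; the abstract does not specify the unit.
    """
    p_ab = pair_count / total
    p_a = count_a / total
    p_b = count_b / total
    return math.log2(p_ab / (p_a * p_b))


def dice(pair_count, count_a, count_b):
    """Dice coefficient: 2 * f(a, b) / (f(a) + f(b))."""
    return 2 * pair_count / (count_a + count_b)


def passes_thresholds(pair_count, count_a, count_b, total,
                      min_freq=10, min_pmi=2.0, min_dice=0.01):
    """Apply the thresholds quoted in the abstract:
    frequency > 10, PMI > 2, Dice > 0.01 (all strict inequalities)."""
    return (pair_count > min_freq
            and pmi(pair_count, count_a, count_b, total) > min_pmi
            and dice(pair_count, count_a, count_b) > min_dice)


# Hypothetical counts: the pair occurs 20 times, its members 100 and 200
# times, in a corpus of 10,000 counting units.
print(passes_thresholds(20, 100, 200, 10_000))
```

    With these hypothetical counts the pair clears all three thresholds, while a pair that co-occurs exactly 10 times would be rejected because the frequency condition is strict.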

    Unsupervised extraction of semantic relations using discourse information

    Natural language understanding often relies on common-sense reasoning, for which knowledge about semantic relations, especially between verbal predicates, may be required. This thesis addresses the challenge of using a distributional method to automatically extract the semantic information needed for common-sense inference. Typical associations between pairs of predicates and a targeted set of semantic relations (causal, temporal, similarity, opposition, part/whole) are extracted from large corpora by exploiting the presence of discourse connectives that typically signal these relations. To appraise these associations, we provide several significance measures inspired by the literature, as well as a novel measure specifically designed to evaluate the strength of the link between the two predicates and the relation. 
The relevance of these measures is evaluated by computing their correlations with human judgments, based on a sample of verb pairs annotated in context. The application of this methodology to French and English corpora leads to the construction of a freely available resource, Lecsie (Linked Events Collection for Semantic Information Extraction), which consists of triples: pairs of event predicates associated with a relation; each triple is assigned significance scores based on our measures. From this resource, vector-based representations of pairs of predicates can be induced and used as lexical semantic features to build models for external applications. We assess the potential of these representations for several applications. Regarding discourse analysis, the tasks of predicting attachment of discourse units, as well as predicting the specific discourse relation linking them, are investigated. Using only features from our resource, we obtain significant improvements for both tasks in comparison to several baselines, including ones using other representations of the pairs of predicates. We also propose to define optimal sets of connectives better suited for large corpus applications by performing a dimension reduction in the space of the connectives, instead of using manually composed groups of connectives corresponding to predefined relations. Another promising application pursued in this thesis concerns relations between semantic frames (e.g. FrameNet): the resource can be used to enrich this sparse structure by providing candidate relations between verbal frames, based on associations between their verbs. These diverse applications aim to demonstrate the promising contributions provided by our approach, namely allowing the unsupervised extraction of typed semantic relations.

    Kayardild Morphology, Phonology and Morphosyntax

    Kayardild possesses one of the most exuberant systems of morphological concord known to linguists, if not the most exuberant, and a phonological system that is intricately sensitive to its morphology. This dissertation provides a comprehensive description of the phonology of Kayardild, an investigation of its phonetics and its intonation, and a formal analysis of its inflectional morphology. A key component of the latter is the existence of a ‘morphomic’ level of representation intermediate between morphosyntactic features and underlying phonological forms. Chapter 2 introduces the segmental inventory of Kayardild, the phonetic realisations of surface segments, and their phonotactics. Chapter 3 provides an introduction to the empirical facts of Kayardild word structure, outlining the kinds of morphs of which words are composed, their formal shapes and their combinations. Chapter 4 treats the segmental phonology of Kayardild. After a survey of the mappings between underlying and (lexical) surface forms, the primary topic is the interaction of the phonology with morphology, although major generalisations within the phonology itself are also identified and discussed. Chapter 5 examines Kayardild stress and presents a constraint-based analysis, before turning to an empirical and analytical discussion of intonation. Chapter 6, on the syntax and morphosyntax of Kayardild, is the most substantial chapter of the dissertation. In association with the examination of a large corpus of new and newly collated data, mutually compatible analyses of the syntax and morphosyntactic features of Kayardild are built up and compared against less favourable alternatives. A critical review of Evans’ (1995a) analysis of similar phenomena is also provided. 
Chapter 7 turns to the realisational morphology, the component of the grammar that ties the morphosyntax to the phonology by realising morphosyntactic feature structures as morphomic representations, and then morphomic representations as underlying phonological representations. A formalism is proposed to express these mappings within a constraint-based grammar. In addition to enriching our understanding of Kayardild, the dissertation presents data and analyses that will be of interest for theories of the interface between morphology on the one hand and phonology and syntax on the other, as well as for morphological and phonological theory more narrowly.

    Meaning versus Grammar

    This volume investigates the complicated relationship between grammar, computation, and meaning in natural languages. It details conditions under which meaning-driven processing of natural language is feasible, discusses an operational and accessible implementation of the grammatical cycle for Dutch, and offers analyses of a number of further conjectures about constituency and entailment in natural language.

    Proceedings of the Commemorative Lecture Meeting for the Completion of the "Corpus of Contemporary Written Japanese"

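    Commemorative Lecture Meeting for the Completion of the "Balanced Corpus of Contemporary Written Japanese", JA Kyosai Building, 2-3 August 2011. Priority-Area Research "Japanese Corpus", general coordination.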

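    Proceedings of the Fiscal Year 2008 (Heisei 20) Public Workshop (Research Results Report Meeting) of the Priority-Area Research "Japanese Corpus"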

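    Fiscal Year 2008 (Heisei 20) Public Workshop of the Priority-Area Research "Japanese Corpus", Digital Multi-purpose Hall, Ookayama Campus, Tokyo Institute of Technology, 15-16 March 2009. Priority-Area Research "Japanese Corpus", general coordination.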

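    Proceedings of the Fiscal Year 2010 (Heisei 22) Public Workshop (Research Results Report Meeting) of the Priority-Area Research "Japanese Corpus"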

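    Fiscal Year 2010 (Heisei 22) Public Workshop of the Priority-Area Research "Japanese Corpus", Jiji Press Hall, 14-16 March 2011. Priority-Area Research "Japanese Corpus", general coordination.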