20 research outputs found

    ANNODIS : une approche outillée de l'annotation de structures discursives

    The ANNODIS project has two interconnected objectives: to produce a corpus of texts annotated at the discourse level, and to develop tools for corpus annotation and exploitation. Two sets of annotations are proposed, representing two complementary perspectives on discourse organisation: a bottom-up approach starting from minimal discourse units and building complex structures via a set of discourse relations, and a top-down approach envisaging the text as a whole and using pre-identified cues to detect discourse macro-structures. The construction of the corpus goes hand in hand with the development of two interfaces: the first supports manual annotation of discourse structures and allows different views of the texts using NLP-based pre-processing; a second will support the exploitation of the annotations. We present the discourse models and annotation protocols, and the interface that embodies them.
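    The bottom-up layer described above, in which minimal discourse units are combined into complex structures via discourse relations, can be sketched with a toy data structure. This is a minimal illustration, not the ANNODIS annotation schema; the class names and relation labels are assumptions.

    ```python
    from dataclasses import dataclass

    @dataclass
    class EDU:
        """Elementary (minimal) discourse unit: a text span."""
        text: str

    @dataclass
    class Relation:
        """A discourse relation combining two structures (bottom-up layer)."""
        label: str          # illustrative labels, e.g. "Elaboration"
        left: object        # an EDU or a nested Relation
        right: object

    def units(node):
        """Recursively collect the minimal units covered by a structure."""
        if isinstance(node, EDU):
            return [node]
        return units(node.left) + units(node.right)

    # Bottom-up construction: minimal units build a complex structure.
    e1 = EDU("The project has two objectives.")
    e2 = EDU("First, it produces an annotated corpus.")
    e3 = EDU("Second, it develops annotation tools.")
    tree = Relation("Elaboration", e1, Relation("Continuation", e2, e3))

    print(len(units(tree)))  # -> 3
    ```

    The top-down layer would instead attach macro-structure spans over whole stretches of such trees, guided by pre-identified cues.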

    Quelle(s) relation(s) de discours pour les structures énumératives ?

    The enumerative structures studied in this article are hierarchical structures at the text level and are frequently associated with typographical and layout surface cues. Our study explores the contribution of a somewhat neglected source of information (even though we do not directly study typographical and layout cues) to the establishment of discourse coherence, and in particular the contribution of this source of information to the detection of the Elaboration relation, a notoriously difficult task.

    Knowledge Graph and Deep Learning-based Text-to-GQL Model for Intelligent Medical Consultation Chatbot

    Text-to-GQL (Text2GQL) is the task of converting a user's question into GQL (Graph Query Language) for a given graph database. It is a semantic-parsing task that transforms natural-language questions into logical expressions, enabling more efficient direct communication between humans and machines. Existing work focuses mainly on Text-to-SQL, and no semantic-parsing method or dataset is available for graph databases. To fill this gap and better serve medical Human–Robot Interaction (HRI), we propose this task together with a pipeline solution. The solution uses an Adapter, pre-trained on the linking of GQL schemas and the corresponding utterances, as a plug-in that introduces external knowledge. Inserting the Adapter into the language model lets the mapping between logical language and natural language be introduced faster and more directly, better realising the end-to-end human–machine language translation task. The proposed Text2GQL model is built on an improved pipeline composed of a Language Model, the pre-trained Adapter plug-in, and a Pointer Network. This enables the model to copy tokens of objects from utterances, generate corresponding GQL statements for graph-database retrieval, and adjust the final output with a dedicated mechanism. Experiments show that the proposed method is competitive on counterpart datasets (Spider, ATIS, GeoQuery, and 39.net) converted from the Text2SQL task, and that it is practical in medical scenarios.
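    The copy step the Pointer Network contributes can be illustrated with a toy sketch: at each decoding step, the decoder either generates a token from the GQL vocabulary or copies a token verbatim from the user's utterance, whichever scores higher. The vocabulary, scores, and function names below are made-up assumptions for illustration, not the paper's model.

    ```python
    # Toy GQL vocabulary; a real system would use the schema-aware one.
    GQL_VOCAB = ["MATCH", "(", ")", "WHERE", "RETURN", "n", ".", "name", "="]

    def decode_step(utterance_tokens, gen_scores, copy_scores):
        """Pick the highest-scoring action: generate a vocabulary token,
        or copy an utterance token (the pointer mechanism)."""
        best_gen = max(range(len(gen_scores)), key=gen_scores.__getitem__)
        best_copy = max(range(len(copy_scores)), key=copy_scores.__getitem__)
        if gen_scores[best_gen] >= copy_scores[best_copy]:
            return GQL_VOCAB[best_gen]
        return utterance_tokens[best_copy]  # copied verbatim from the input

    utterance = ["what", "causes", "diabetes"]
    # Toy scores: the decoder prefers copying the entity "diabetes",
    # which is absent from the GQL vocabulary.
    gen = [0.1] * len(GQL_VOCAB)
    copy = [0.0, 0.0, 0.9]
    print(decode_step(utterance, gen, copy))  # -> diabetes
    ```

    Copying keeps out-of-vocabulary entity mentions (diseases, symptoms) intact in the generated query, which is why pointer mechanisms suit entity-heavy medical questions.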

    DĂ©tection automatique de structures fines de texte

    In this paper, we present a system for the Detection of fine-grained Text Structures (called DST). DST uses a predictive model, obtained with a machine-learning algorithm, which, for a given configuration of discourse cues, predicts the type of dependency relation holding between two utterances. Three types of discourse cues were considered (lexical relations, connectives, and syntactic-semantic parallelism); they are spotted using heuristics. We show that our system ranks among the best-performing ones.
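    The pipeline just described, heuristic cue spotting feeding a predictive model, can be sketched as follows. The cue lists, rules, and relation labels are illustrative stand-ins, not DST's heuristics or its learned model.

    ```python
    # Toy cue inventories (assumptions; DST's heuristics are richer).
    CONTRAST_CONNECTIVES = {"but", "however", "yet"}
    ELABORATION_CONNECTIVES = {"specifically", "namely", "notably"}

    def extract_cues(u1, u2):
        """Heuristic cue spotting between two utterances."""
        w1 = {t.strip(".,;") for t in u1.lower().split()}
        w2 = {t.strip(".,;") for t in u2.lower().split()}
        return {
            "connective_contrast": bool(w2 & CONTRAST_CONNECTIVES),
            "connective_elab": bool(w2 & ELABORATION_CONNECTIVES),
            "lexical_overlap": len(w1 & w2) >= 2,  # crude lexical relation
        }

    def predict_relation(cues):
        """Hand-set rules standing in for the learned predictive model."""
        if cues["connective_contrast"]:
            return "contrast"
        if cues["connective_elab"] or cues["lexical_overlap"]:
            return "elaboration"
        return "continuation"

    c = extract_cues("The system parses text.", "However, it ignores layout.")
    print(predict_relation(c))  # -> contrast
    ```

    In DST itself, the mapping from cue configurations to relation types is learned from annotated data rather than hand-written as here.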

    Accounting for Discourse Relations: Constituency and Dependency

    At the start of my career, I had the good fortune of working wit

    Processing Units in Conversation: A Comparative Study of French and Mandarin Data


    RST Signalling Corpus: A Corpus of Signals of Coherence Relations

    We present the RST Signalling Corpus (Das et al. in RST signalling corpus, LDC2015T10. https://catalog.ldc.upenn.edu/LDC2015T10, 2015), a corpus annotated for signals of coherence relations. The corpus is developed over the RST Discourse Treebank (Carlson et al. in RST Discourse Treebank, LDC2002T07. https://catalog.ldc.upenn.edu/LDC2002T07, 2002), which is annotated for coherence relations. In the RST Signalling Corpus, these relations are further annotated with signalling information. The corpus includes annotation not only for discourse markers, which are considered the most typical (or sometimes the only) type of signal in discourse, but also for a wide array of other signals such as reference, lexical, semantic, syntactic, graphical and genre features as potential indicators of coherence relations. We describe the research underlying the development of the corpus and the annotation process, and provide details of the corpus. We also present the results of an inter-annotator agreement study, illustrating the validity and reproducibility of the annotation. The corpus is available through the Linguistic Data Consortium, and can be used to investigate the psycholinguistic mechanisms behind the interpretation of relations through signalling, and also to develop discourse-specific computational systems such as discourse parsing applications.
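    The layering described above, a coherence relation from the treebank further annotated with one or more signals of various types, can be sketched as a small data structure. The field names and type/subtype labels are illustrative assumptions, not the LDC release format.

    ```python
    from dataclasses import dataclass

    @dataclass
    class Signal:
        signal_type: str   # e.g. "discourse marker", "reference", "semantic"
        subtype: str       # e.g. "connective", "antonymy"
        tokens: tuple      # surface tokens carrying the signal

    @dataclass
    class SignalledRelation:
        relation: str      # coherence relation from the treebank layer
        signals: list      # the signalling layer added on top

    # One relation may be signalled by several, possibly non-marker, signals.
    ann = SignalledRelation(
        relation="Contrast",
        signals=[Signal("discourse marker", "connective", ("but",)),
                 Signal("semantic", "antonymy", ("rise", "fall"))],
    )

    print(len(ann.signals))  # -> 2
    ```

    Representing signals separately from the relation layer mirrors the corpus design: the relation annotation is inherited from the RST Discourse Treebank, while the signalling annotation is added on top.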