6 research outputs found

    Analysis of Identifying Linguistic Phenomena for Recognizing Inference in Text

    Recognizing Textual Entailment (RTE) is a task in which a system processes two text fragments to determine whether the meaning of a hypothesis is entailed by another text. Although a considerable number of studies have addressed recognizing textual entailment, little is known about the power of linguistic phenomena for recognizing inference in text. The objective of this paper is to provide a comprehensive analysis of identifying linguistic phenomena for Recognizing Inference in Text (RITE). We focus on the RITE-VAL System Validation subtask and propose a model based on this analysis, using the development dataset of the NTCIR-11 RITE-VAL subtask. The experimental results suggest that a well-identified linguistic phenomenon category could enhance the accuracy of a textual entailment system. Sponsorship: IEEE. Citation index: EI. Conference type: international. Conference dates: 2014-08-13 to 2014-08-15. Book type: electronic edition. Call for papers: Y. Conference location: San Francisco, California, US

    ParaBank: Monolingual Bitext Generation and Sentential Paraphrasing via Lexically-constrained Neural Machine Translation

    We present ParaBank, a large-scale English paraphrase dataset that surpasses prior work in both quantity and quality. Following the approach of ParaNMT, we train a Czech-English neural machine translation (NMT) system to generate novel paraphrases of English reference sentences. By adding lexical constraints to the NMT decoding procedure, however, we are able to produce multiple high-quality sentential paraphrases per source sentence, yielding an English paraphrase resource with more than 4 billion generated tokens and exhibiting greater lexical diversity. Using human judgments, we also demonstrate that ParaBank's paraphrases improve over ParaNMT on both semantic similarity and fluency. Finally, we use ParaBank to train a monolingual NMT model with the same support for lexically-constrained decoding for sentence rewriting tasks. Comment: To be presented at AAAI 2019. 8 page

    Context Aware Textual Entailment

    In conversations, stories, news reporting, and other forms of natural language, understanding requires participants to make assumptions (hypotheses) based on background knowledge, a process called entailment. These assumptions may then be supported, contradicted, or refined as a conversation or story progresses, additional facts become known, and the context changes. Often we do not know an aspect of the story with certainty but rather believe it to be the case; i.e., what we know is associated with uncertainty or ambiguity. In this research, a method has been developed to identify the different contexts of raw input text, along with specific features of those contexts such as time, location, and objects. The method uses a two-phase SVM classifier, with a voting mechanism in the second phase, to identify the contexts. Rule-based algorithms were used to extract the context elements. This research also develops a new context-aware text representation, which maintains the semantic aspects of sentences as well as textual contexts and context elements. The method can offer both a graph representation and a First-Order Logic (FOL) representation of the text, and it also extracts an FOL and XML representation of a text or series of texts. The method includes entailment using background knowledge from external sources (VerbOcean and WordNet), with resolution of conflicts between extracted clauses, and handling the role of context in resolving uncertain truth

    Predicate Normalization and Synonymy Judgment Using Linguistic Features (言語学的特徴を用いた述部の正規化と同義性判定)

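    Kyoto University (0048). New system, course doctorate. Doctor of Informatics. Degree number: Kō No. 17991; Jōhaku No. 513. Call number: 新制||情||91 (University Library); 80835. Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University. Examiners: Prof. Sadao Kurohashi (chief examiner), Prof. Toru Ishida, Prof. Tatsuya Kawahara. Conferred under Article 4, Paragraph 1 of the Degree Regulations. Doctor of Informatics, Kyoto University. DFA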

    Unsupervised extraction of semantic relations using discourse information

    Natural language understanding often relies on common-sense reasoning, for which knowledge about semantic relations, especially between verbal predicates, may be required. This thesis addresses the challenge of using a distributional method to automatically extract the semantic information needed for common-sense inference. Typical associations between pairs of predicates and a targeted set of semantic relations (causal, temporal, similarity, opposition, part/whole) are extracted from large corpora by exploiting the presence of discourse connectives that typically signal these relations. To appraise these associations, we provide several significance measures inspired by the literature, as well as a novel measure specifically designed to evaluate the strength of the link between the two predicates and the relation. The relevance of these measures is evaluated by computing their correlations with human judgments, based on a sample of verb pairs annotated in their discourse context. Applying this methodology to French and English corpora leads to the construction of a freely available resource, Lecsie (Linked Events Collection for Semantic Information Extraction), which consists of triples: pairs of event predicates associated with a relation, with each triple assigned significance scores based on our measures. From this resource, vector-based representations of pairs of predicates can be induced and used as lexical semantic features to build models for external applications. We assess the potential of these representations for several applications. Regarding discourse analysis, we investigate the tasks of predicting the attachment of discourse units and predicting the specific discourse relation linking them. Using only features from our resource, we obtain significant improvements on both tasks over several baselines, including ones that use other lexical semantic representations of the pairs of predicates. We also propose to define optimal sets of connectives better suited to large-corpus applications by performing a dimension reduction in the space of the connectives, instead of using manually composed groups of connectives corresponding to predefined relations. Another promising application pursued in this thesis concerns relations between semantic frames (e.g. FrameNet): the resource can be used to enrich this sparse structure by providing candidate relations between verbal frames, based on associations between their verbs. These diverse applications aim to demonstrate the promising contributions of our approach, namely the unsupervised extraction of typed semantic relations